Technology:


KS performs conversions in two ways: through the SPS/SPKS paper and wav conversion systems, and through crawler file-type conversion plug-ins for documents, databases and email. The discussion below addresses both areas, with emphasis on OCR and recorded-voice conversion.

Databases and XML
CrawlScape contains a plug-in and filter for crawling databases in a multi-step process. The process first segments tables into individual records. An XML file is then created to link records across related tables (a polytuplet), which allows PS technology to recombine the individual pieces into full inter-related records. Real-time search on a dynamic database is possible if the database administrator writes XML records for all database transactions: CrawlScape monitors the changes and applies indexing in real time to mirror them. All of this is done without letting users manipulate or view database applications, which is ideal for privacy and security while still providing a research environment on the data for the pattern-profiling searcher.
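
As a rough illustration of the record-linking step, the sketch below builds one XML element that ties a row to its related rows in another table. The table names (patients, visits), the element and attribute names, and the function build_polytuplet are all hypothetical; the actual CrawlScape XML layout is not described here.

```python
import sqlite3
import xml.etree.ElementTree as ET


def build_polytuplet(conn, patient_id):
    """Return one XML element linking a patient row to its related visit rows."""
    conn.row_factory = sqlite3.Row
    patient = conn.execute(
        "SELECT id, name FROM patients WHERE id = ?", (patient_id,)
    ).fetchone()
    # Hypothetical layout: one <polytuplet> per key, one <record> per table row.
    root = ET.Element("polytuplet", {"key": str(patient["id"])})
    rec = ET.SubElement(root, "record", {"table": "patients"})
    ET.SubElement(rec, "name").text = patient["name"]
    visits = conn.execute(
        "SELECT id, note FROM visits WHERE patient_id = ? ORDER BY id",
        (patient_id,),
    )
    for visit in visits:
        rec = ET.SubElement(
            root, "record", {"table": "visits", "id": str(visit["id"])}
        )
        ET.SubElement(rec, "note").text = visit["note"]
    return root


def demo():
    """Build a tiny in-memory database and return its polytuplet as XML text."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE patients (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE visits (id INTEGER PRIMARY KEY, patient_id INTEGER, note TEXT);
        INSERT INTO patients VALUES (1, 'B Smith');
        INSERT INTO visits VALUES (10, 1, 'Sore throat');
        INSERT INTO visits VALUES (11, 1, 'Follow-up');
        """
    )
    return ET.tostring(build_polytuplet(conn, 1), encoding="unicode")
```

Calling demo() returns a single XML string in which the patient record and both visit records sit under one polytuplet element, so an indexer could treat them as one inter-related record.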

Paper (OCR) and Voice conversion (Speak to Text)
SPS is an automation infrastructure that requires a network scanner and the ScapeShape system with the SPS option. The user simply presses the send button on the scanner; the rest is automated. Paper is fed into the scanner and automatically sent to a conversion repository, where documents are converted and saved in the index of the search system. All scanned documents are created in a triplet structure (polytuplet), so that during search a user may open the original scanned Tiff document, a converted PDF version, or the page-based segmented PDFs. The system is designed to be an easy-to-use document storage and conversion system. Scanned documents do not need to be intercepted for cleaning and editing, as the SPS application converts, stores and splits the documents automatically. Refer to the sections below for details.

A scanner and OCR client interface can be used for creative formatting of the output PDF documents and for zone-based scanning; otherwise SPS is an automated scan-and-conversion system that eliminates the need for time-wasting document manipulation.

OCR is performed by a Microsoft Windows application, while collation, mapping and sorting occur on UNIX systems. The following performance specifications are typical:
  • 5-20 documents per second conversion performance per server
  • 70-99.5% accuracy depending on print and voice quality
  • error correction eliminating 80-100% of errors during search
  • collation and sorting using the PS vector-mapping content analyzer
  • 1-bit depth Tiff conversion, with 24-bit depth for images and maps

Automated OCR conversion (post scan)
SPS monitors a multi-user folder system for incoming scanned documents and triggers processes that transform new documents into searchable assets. PatternScape pattern recognition technology collates and sorts documents based on content, thereby providing an automatic folder subsystem. This is done transparently. The system can operate across LAN, WAN and Internet connections for remote use such as home-office scanners, remote-location scanners and fax feeds.
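
The folder-monitoring idea can be sketched with simple polling. This is only an illustration under stated assumptions: the function name find_new_scans, the choice of file suffixes, and the polling approach itself are hypothetical, and how SPS actually detects incoming files is not specified.

```python
from pathlib import Path


def find_new_scans(inbox, seen, suffixes=(".tif", ".tiff")):
    """Return scan files under `inbox` that are not yet in `seen`, and record them.

    `inbox` is the root of the multi-user folder tree; every user subfolder
    is searched recursively. `seen` is a set that persists between calls.
    """
    new_files = []
    for path in sorted(Path(inbox).rglob("*")):
        is_scan = path.is_file() and path.suffix.lower() in suffixes
        if is_scan and path not in seen:
            seen.add(path)
            new_files.append(path)
    return new_files
```

A monitoring loop would call find_new_scans at a fixed interval and hand each returned path to the conversion step; because the seen set is kept between calls, a file is reported only once.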

Error correction
Error correction is performed by a synonym-mapping process in which error terms (misrecognized words) are mapped to their correctly spelled equivalents. Consequently, a document is found during search even when a search term is stored incorrectly in it, because the stored error is "synonym-mapped" to the correct word.
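
A minimal sketch of synonym-mapped search follows. The mapping table contents, the function names, and the whitespace tokenization are assumptions made for illustration; the real table would be built and maintained by the system.

```python
# Hypothetical table: OCR error term -> correctly spelled word.
ERROR_SYNONYMS = {
    "pharyngltis": "pharyngitis",
    "thr0at": "throat",
}


def normalize(term):
    """Lowercase a term and replace a known error term with its correct form."""
    term = term.lower()
    return ERROR_SYNONYMS.get(term, term)


def search(documents, query):
    """Return ids of documents containing the query term after synonym mapping."""
    wanted = normalize(query)
    return [
        doc_id
        for doc_id, text in documents.items()
        if wanted in {normalize(word) for word in text.split()}
    ]
```

With this sketch, a search for "pharyngitis" also matches a document whose stored text contains the misrecognized "pharyngltis", since both normalize to the same term. The stored text itself is never altered.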

File types
SPS generates multiple segmented Tiff files (1-bit depth, with 24-bit depth optional), multiple segmented PDFs, and text documents, as well as an XML file (tuplet) that associates them during search. When a user searches for a document, all formats are presented for use and review, eliminating the need to track file locations and relationships. Additionally, the original scanned Tiff is available as a persistent original record.

Index process control
SPS passes each converted document to collation processing, which produces a folder storage structure. The user can locate scanned documents either through folder navigation or through a pattern-based search. All files converted in SPS are renamed from the scanner's meaningless naming convention to content-based vector names. For example:

Dr. Chan folder
Document vector name: Sore-throat-Acute-Pharyngitis.pdf
Folder location: ../DrChan/Assessment/Patients/B Smith/Physical Exams/
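
The renaming step in the example above can be sketched as follows, assuming the content analyzer has already selected the key phrases for a document. How PS chooses those phrases is not shown; vector_name and its parameters are hypothetical names.

```python
import re


def vector_name(phrases, extension=".pdf"):
    """Join key phrases into a hyphenated, filesystem-safe file name."""
    words = []
    for phrase in phrases:
        # Keep runs of letters and digits (with in-word hyphens); drop the rest.
        words.extend(re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", phrase))
    return "-".join(words) + extension
```

For instance, vector_name(["Sore throat", "Acute Pharyngitis"]) yields Sore-throat-Acute-Pharyngitis.pdf, the document vector name used in the Dr. Chan example.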

Automated Speak-to-Text conversion
SPKS operates in the same way as SPS, except that it processes wav files, either in real time or in batch (uploaded). The voice conversion technology uses a Microsoft API to convert speech to text. Error correction is provided through a synonym-mapping search table, as noted above.

Large wav file splitter
Wav files are split into snippets (forming a polytuplet in search) to speed up searching and listening. The polytuplet presentation includes the original wav file, its snippet, and a txt file. This way a user does not need to listen to long recordings to find results; one may instead listen to snippets for confirmation before working with the large converted recordings. A transcriptionist also saves time by starting from the converted document in cases where reports must be formatted.
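
Splitting a recording into fixed-length snippets can be sketched with Python's standard wave module. The fixed snippet length, the function name split_wav, and the output naming pattern are assumptions; the criteria SPKS actually uses to choose snippet boundaries are not described.

```python
import wave


def split_wav(source, snippet_seconds, out_pattern):
    """Split `source` into snippets and return the list of snippet file paths.

    `out_pattern` is a format string with an `index` field,
    for example "rec-{index:03d}.wav".
    """
    paths = []
    with wave.open(source, "rb") as src:
        params = src.getparams()
        frames_per_snippet = int(src.getframerate() * snippet_seconds)
        index = 0
        while True:
            frames = src.readframes(frames_per_snippet)
            if not frames:
                break  # end of recording
            path = out_pattern.format(index=index)
            with wave.open(path, "wb") as dst:
                # Copy channels, sample width and rate; the frame count in
                # the header is corrected when the snippet file is closed.
                dst.setparams(params)
                dst.writeframes(frames)
            paths.append(path)
            index += 1
    return paths
```

The original file is left untouched, so the original recording, its snippets, and the converted text can all be presented together in search.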

Automated sort and collating folders
All wav documents are saved in folders based on the content of the document. This is done using the PS DocMap process, which maps phrases and creates a folder system based on the mapped array. The administrator needs only to create a reference table for the system to sort documents correctly. For example, in a clinic application, doctor names would be root folders and patient folders would be sub-branches; assessment, billing, treatment, Rx, notes, and instruction folders would propagate below the patient folder. This is all done automatically by comparing document content to the mapping reference table. The result is automated scan-and-convert archiving with collation and folder structuring, all wrapped in an extensive search infrastructure. Regardless of the size of the archival project, the system can convert and process large repositories and search vast numbers of documents, thereby providing an alternative to scan-and-save-to-DVD (CD) solutions.
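
The clinic example can be sketched as a lookup against a reference table. Everything named here is hypothetical: the table contents, the three folder levels, the "Unsorted" fallback, and the simple substring matching, which stands in for the PS DocMap phrase mapping.

```python
from pathlib import PurePosixPath

# Hypothetical reference table an administrator might maintain:
# phrase found in the document -> folder name at that level.
REFERENCE = {
    "doctors": {"dr. chan": "DrChan"},
    "patients": {"b smith": "B Smith"},
    "categories": {"physical exam": "Physical Exams", "invoice": "Billings"},
}


def folder_for(text, reference=REFERENCE, default="Unsorted"):
    """Return the doctor/patient/category folder path for a document's text."""
    lowered = text.lower()
    parts = []
    for level in ("doctors", "patients", "categories"):
        # Take the first phrase at this level that occurs in the text.
        match = next(
            (
                folder
                for phrase, folder in reference[level].items()
                if phrase in lowered
            ),
            default,
        )
        parts.append(match)
    return PurePosixPath(*parts)
```

A transcript reading "Dr. Chan saw B Smith for a physical exam." would be filed under DrChan/B Smith/Physical Exams, while a document that matches nothing falls back to the Unsorted folders for later review.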

