Indexing:

CrawlScape
CrawlScape (CS) is a graphical user interface that provides an extensive set of functions for crawling, indexing and search deployment. Advanced controls let users construct crawling and indexing projects and a variety of user-defined search sites. The system is an index management framework with a rich feature set for building, managing and maintaining index and search assets for the enterprise, and its scheduler simplifies administration of the enterprise search infrastructure.
The system can crawl and index documents, databases, email systems, and documents that form relationship pairs or triplets, also known here as polytuplets. For example, a JPEG and a text document can form a searchable crime-evidence pair (a tuplet). The crawler is controlled by an application that uses MySQL to manage the complex features of server management, crawl controls, scheduling of crawl and index processes, and user management.
CrawlScape simplifies the integration of crawl and index processes in complicated, hard-to-manage search environments. Screenshots of the functions mentioned are given at the bottom of this page, while the following table summarizes the options and configurations of the framework.
| CrawlScape | Server Install | Multiple Sessions Manager | Distributed Host Servers | Multi-Search Deployment | Scheduler | User Mgmt | URL Crawl | Database Crawl | Email Crawl | Index Merge | Tuplets |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CS-1 | Single | | | | | | x | | | | |
| CS-2 | Multiple | | | | | x | x | | | | |
| CS-3 | Multiple | | | | x | x | x | | | x | |
| CS-4 | Multiple | x | x | x | x | x | x | | | x | |
| CS-Enterprise | Multiple | x | x | x | x | x | x | x | x | x | x |
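The tuplet concept described earlier can be sketched as a small data holder that groups related files into one searchable unit. The class and member names below are illustrative assumptions, not the actual CrawlScape API:

```java
import java.util.List;

// Hypothetical sketch: a "tuplet" groups related files (e.g. a JPEG and its
// text description) into a single searchable unit. Triplets and larger
// "polytuplets" simply carry more members.
class Tuplet {
    private final String id;
    private final List<String> memberPaths;

    Tuplet(String id, List<String> memberPaths) {
        this.id = id;
        this.memberPaths = memberPaths;
    }

    String id() { return id; }

    // 2 for a pair, 3 for a triplet, and so on.
    int arity() { return memberPaths.size(); }

    List<String> members() { return memberPaths; }
}
```

For instance, a crime-evidence pair might be modeled as `new Tuplet("evidence-042", List.of("scene.jpg", "scene-notes.txt"))`, which the indexer would then treat as one retrievable unit.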
Index technology
CrawlScape's indexing technology employs an open-source, Java-based API together with PatternScape and CrawlScape Java servlets. CrawlScape combines these applications into an indexing and search-deployment framework for system administrators and site-management professionals. Merging and moving multiple indexes are additional processing functions of the framework, while multi-site search deployment is central to its power. The framework is built on Java servlets and also uses MySQL and Tomcat, with an automatic scheduling system for session administration.
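The index-merge function mentioned above can be illustrated with a minimal in-memory inverted-index merge. CrawlScape's real merge operates on on-disk index files through its Java API, so this sketch only shows the idea; all names here are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Merge two inverted indexes (term -> sorted set of document IDs) into one.
// Posting lists for terms present in both indexes are unioned.
class IndexMerge {
    static Map<String, SortedSet<String>> merge(Map<String, SortedSet<String>> a,
                                                Map<String, SortedSet<String>> b) {
        Map<String, SortedSet<String>> out = new TreeMap<>();
        for (Map<String, SortedSet<String>> idx : List.of(a, b)) {
            for (Map.Entry<String, SortedSet<String>> e : idx.entrySet()) {
                out.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                   .addAll(e.getValue());
            }
        }
        return out;
    }
}
```

Because the merge is a pure union of posting lists, many small indexes built in separate sessions can be folded into one master index in any order.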
File types (documents, text, PDF, HTML, XML, JPEG and other image files, databases, email)
CrawlScape indexes the content of virtually any readable document or file type for which a plug-in exists (please refer to the crawling section). KS runs a developer partner program through which it distributes plug-ins from developers who specialize in document-type filtering and recognition. KS has also developed special plug-ins for file tuplets, databases and email, as mentioned throughout this web site.
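A document-type plug-in of the kind described above could take the shape of a small interface that declares which extensions it handles and extracts indexable text. These names are assumptions for illustration, not the actual KS plug-in API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical plug-in contract: supports() filters by file extension,
// extractText() converts the raw file stream into indexable text.
interface DocumentPlugin {
    boolean supports(String extension);
    String extractText(InputStream in);
}

// Trivial plain-text plug-in as an example implementation; a PDF or JPEG
// plug-in would replace the body of extractText() with format-specific parsing.
class TextPlugin implements DocumentPlugin {
    public boolean supports(String extension) {
        return extension.equalsIgnoreCase("txt");
    }

    public String extractText(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The crawler would ask each registered plug-in whether it supports a file's extension and hand matching files to that plug-in for text extraction.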
Index process control
Indexing is a CPU- and RAM-intensive process, so trouble-free deployment requires load management and task scheduling. CrawlScape is furnished with a scheduling facility that helps users plan the load profile across 24-hour periods. Because an index-merging facility exists, users can schedule small index sessions and merge them in a batch period, making optimum use of server and network availability.
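The off-peak batching just described hinges on computing when the next batch window opens. A minimal sketch follows; the window hour and class names are illustrative choices, not CrawlScape defaults:

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Compute how long to defer a batch merge so it runs in an off-peak window.
class BatchWindow {
    // Minutes from 'now' until the next occurrence of the given hour (0-23).
    static long minutesUntil(LocalDateTime now, int batchHour) {
        LocalDateTime next = now.toLocalDate().atTime(batchHour, 0);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // window already passed today
        }
        return Duration.between(now, next).toMinutes();
    }
}
```

The resulting delay could then feed a standard `ScheduledExecutorService.schedule(mergeTask, delayMinutes, TimeUnit.MINUTES)` call, so small daytime index sessions accumulate and a single merge runs overnight.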
Importantly, KS clients enjoy features that have been put through rigorous testing and deployment scenarios to ensure success in the real world. Please refer to the pricing and configuration section for details about pre-configured test kits for your network.