Indexing:

CrawlScape
CrawlScape (CS) is a graphical user interface that provides an extensive set of functions for crawling, indexing and search deployment. Advanced controls let users construct crawling and indexing projects and a variety of user-defined search sites. The system is an index management framework with a rich feature set for building, managing and maintaining index and search assets for the enterprise, and its scheduler simplifies administration of the enterprise search infrastructure.
The system can crawl and index documents, databases, email systems, and documents that form relationship pairs or triplets, also known here as polytuplets. For example, a JPEG and a text document can form a searchable crime-evidence pair (a tuplet). The crawler is controlled by an application that uses MySQL to manage the complex features of server management, crawl controls, scheduling of crawl and index processes, and user management.
CrawlScape simplifies the integration of crawl and index processes in complicated, hard-to-manage search environments. Screenshots of the functions mentioned are given at the bottom of this page, while the following table summarizes the options and configurations of the framework.
| CrawlScape | Server Install | Multiple Sessions Manager | Distributed Host Servers | Multi-Search Deployment | Scheduler | User Mgmt | URL Crawl | Database Crawl | Email Crawl | Index Merge | Tuplets |
|---|---|---|---|---|---|---|---|---|---|---|---|
| CS-1 | Single | | | | | | x | | | | |
| CS-2 | Multiple | | | | | x | x | | | | |
| CS-3 | Multiple | | | | x | x | x | | | x | |
| CS-4 | Multiple | x | x | x | x | x | x | | | x | |
| CS-Enterprise | Multiple | x | x | x | x | x | x | x | x | x | x |
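The tuplet concept described earlier can be sketched as a small data holder that groups related files into one searchable unit. The class and member names below are illustrative assumptions, not the actual CrawlScape API:

```java
import java.util.List;

// Hypothetical sketch: a "tuplet" groups related files (e.g. a JPEG and its
// text description) into a single searchable unit. Triplets and larger
// "polytuplets" simply carry more members.
class Tuplet {
    private final String id;
    private final List<String> memberPaths;

    Tuplet(String id, List<String> memberPaths) {
        this.id = id;
        this.memberPaths = memberPaths;
    }

    String id() { return id; }

    // 2 for a pair, 3 for a triplet, and so on.
    int arity() { return memberPaths.size(); }

    List<String> members() { return memberPaths; }
}
```

For instance, a crime-evidence pair might be modeled as `new Tuplet("evidence-042", List.of("scene.jpg", "scene-notes.txt"))`, which the indexer would then treat as one retrievable unit.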
Index technology
CrawlScape's indexing technology employs an open-source, Java-based API together with PatternScape and CrawlScape Java servlets. CrawlScape combines these applications into an indexing and search-deployment framework for system administrators and site-management professionals. Merging and moving multiple indexes are additional processing functions of the framework, while multi-site search deployment is central to its power. The framework is built on Java servlets and also uses MySQL and Tomcat, with an automatic scheduling system for session administration.
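The index-merge function mentioned above can be illustrated with a minimal in-memory inverted-index merge. CrawlScape's real merge operates on on-disk index files through its Java API, so this sketch only shows the idea; all names here are hypothetical:

```java
import java.util.List;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

// Merge two inverted indexes (term -> sorted set of document IDs) into one.
// Posting lists for terms present in both indexes are unioned.
class IndexMerge {
    static Map<String, SortedSet<String>> merge(Map<String, SortedSet<String>> a,
                                                Map<String, SortedSet<String>> b) {
        Map<String, SortedSet<String>> out = new TreeMap<>();
        for (Map<String, SortedSet<String>> idx : List.of(a, b)) {
            for (Map.Entry<String, SortedSet<String>> e : idx.entrySet()) {
                out.computeIfAbsent(e.getKey(), k -> new TreeSet<>())
                   .addAll(e.getValue());
            }
        }
        return out;
    }
}
```

Because the merge is a pure union of posting lists, many small indexes built in separate sessions can be folded into one master index in any order.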
File types (documents, text, PDF, HTML, XML, JPEG and other image files, databases, email)
CrawlScape indexes the content of virtually any readable document or file type for which a plug-in exists (please refer to the crawling section). KS runs a developer partner program through which it distributes plug-ins from developers who specialize in document-type filtering and recognition. KS has also developed special plug-ins for file tuplets, databases and email, as mentioned throughout this web site.
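A document-type plug-in of the kind described above could take the shape of a small interface that declares which extensions it handles and extracts indexable text. These names are assumptions for illustration, not the actual KS plug-in API:

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical plug-in contract: supports() filters by file extension,
// extractText() converts the raw file stream into indexable text.
interface DocumentPlugin {
    boolean supports(String extension);
    String extractText(InputStream in);
}

// Trivial plain-text plug-in as an example implementation; a PDF or JPEG
// plug-in would replace the body of extractText() with format-specific parsing.
class TextPlugin implements DocumentPlugin {
    public boolean supports(String extension) {
        return extension.equalsIgnoreCase("txt");
    }

    public String extractText(InputStream in) {
        try {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

The crawler would ask each registered plug-in whether it supports a file's extension and hand matching files to that plug-in for text extraction.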
Index process control
Indexing is a CPU- and RAM-intensive process, so trouble-free deployment requires load management and task scheduling. CrawlScape is furnished with a scheduling facility that helps users plan the load profile across 24-hour periods. Because an index-merging facility exists, users can schedule small index sessions and merge them in a batch period, making optimum use of server and network availability.
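The off-peak batching just described hinges on computing when the next batch window opens. A minimal sketch follows; the window hour and class names are illustrative choices, not CrawlScape defaults:

```java
import java.time.Duration;
import java.time.LocalDateTime;

// Compute how long to defer a batch merge so it runs in an off-peak window.
class BatchWindow {
    // Minutes from 'now' until the next occurrence of the given hour (0-23).
    static long minutesUntil(LocalDateTime now, int batchHour) {
        LocalDateTime next = now.toLocalDate().atTime(batchHour, 0);
        if (!next.isAfter(now)) {
            next = next.plusDays(1); // window already passed today
        }
        return Duration.between(now, next).toMinutes();
    }
}
```

The resulting delay could then feed a standard `ScheduledExecutorService.schedule(mergeTask, delayMinutes, TimeUnit.MINUTES)` call, so small daytime index sessions accumulate and a single merge runs overnight.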
Importantly, KS clients enjoy features that have been put through rigorous testing and deployment scenarios to ensure success in the real world. Please refer to the pricing and configuration section for details about pre-configured test kits for your network.