Search Imaged Text
ContentCrawler is an integrated analysis, processing and reporting framework that gives document management professionals peace of mind that their content is 100% searchable.
The automated end-to-end process assesses image-based documents in the content repository for conversion to a searchable format. Converted documents are then re-profiled into the repository.
- Increase organizational productivity
- Simplify management of image-based documents
- Reduce non-compliance risks
- Increase efficiency through automation
- Leverage investment in DMS and search technology
- Reduce the cost of managing OCR technology
Key ContentCrawler Features
Search and Find
ContentCrawler finds image-based documents, even ones within email attachments stored in a content repository. These documents will be OCR’ed and converted to text-searchable PDFs.
Super Smart Technology
ContentCrawler will not OCR documents that have a text layer, or documents that have been identified as having little or no text. The “text threshold” can be set by the Administrator to ignore documents with minimal text.
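The skip rules above can be illustrated with a minimal sketch. This is not ContentCrawler's actual implementation or API: the `Document` fields, the `needs_ocr` function and the character-count interpretation of the "text threshold" are all hypothetical names and assumptions chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class Document:
    """Hypothetical summary of a repository document (not a ContentCrawler type)."""
    name: str
    has_text_layer: bool   # True if the file already contains searchable text
    estimated_chars: int   # assumed estimate of how much text the image holds


def needs_ocr(doc: Document, text_threshold: int) -> bool:
    """Return True only for image-based documents with enough text to convert."""
    if doc.has_text_layer:
        # Already searchable, so conversion would be redundant.
        return False
    if doc.estimated_chars < text_threshold:
        # Little or no text (for example a photo); the Administrator's
        # threshold tells the process to ignore it.
        return False
    return True
```

For example, under these assumptions a scanned contract with no text layer and plenty of text would be queued, while a photo with almost no text would be skipped.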
Processing can run in either or both of two modes: Convert Backlog and Active Monitoring. Convert Backlog converts all legacy documents to text-searchable PDFs, while Active Monitoring converts documents as soon as they are profiled into the content repository.
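As a rough illustration of how the two modes combine, the sketch below models them as flags that can be enabled separately or together. The `Mode` enum and `select_documents` function are hypothetical and do not reflect ContentCrawler's real configuration options.

```python
from enum import Flag, auto


class Mode(Flag):
    """Hypothetical processing modes; either or both may be enabled."""
    CONVERT_BACKLOG = auto()
    ACTIVE_MONITORING = auto()


def select_documents(mode, legacy, newly_profiled):
    """Return the documents to convert for the enabled mode(s)."""
    selected = []
    if Mode.CONVERT_BACKLOG in mode:
        # Backlog mode covers documents already in the repository.
        selected.extend(legacy)
    if Mode.ACTIVE_MONITORING in mode:
        # Monitoring mode covers documents as they are profiled.
        selected.extend(newly_profiled)
    return selected
```

Enabling both would be written as `Mode.CONVERT_BACKLOG | Mode.ACTIVE_MONITORING` in this sketch.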
Documents are converted to text-searchable PDFs and automatically saved as New Versions, Attachments or Related Documents in the DMS, ready to be found by your DMS search technology.
Automated or Manual Process
Converting image-based documents to text-searchable PDFs can be an automated end-to-end process or a manual one with built-in “Hold for Review” stages before Convert to PDF and/or Save Back into the DMS.
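The difference between the automated and manual workflows can be sketched as an ordered list of steps with optional review holds. The step names and the `plan_stages` function are hypothetical and serve only to illustrate where the "Hold for Review" stages would sit.

```python
def plan_stages(hold_before_convert=False, hold_before_save=False):
    """Return the ordered steps for one document under the chosen hold settings."""
    stages = ["identify"]
    if hold_before_convert:
        # Manual checkpoint before Convert to PDF.
        stages.append("hold_for_review")
    stages.append("convert_to_pdf")
    if hold_before_save:
        # Manual checkpoint before Save Back into the DMS.
        stages.append("hold_for_review")
    stages.append("save_to_dms")
    return stages
```

With both holds left off, the plan has no review steps, which corresponds to the fully automated end-to-end process.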
ContentCrawler integrates with the following systems:
- Autonomy iManage
- HP TRIM
- MS SharePoint
- OpenText Content Server
- OpenText eDOCS DM
- OpenText Livelink