OCR File Splitter 2.0
26.01.2011, 18:04
Automatically Separate TIFF Images or Searchable PDFs by Their Text Content
OCR File Splitter is a program that splits files based on their text content. It works on TIFF images (requires Microsoft Office Document Imaging) or searchable PDF files.

The program separates a multi-page file into individual files by applying rules to each page of the document. If a page contains text that matches a rule, it becomes the first page of a new document; otherwise, the page is added to the previous document.
Because the program can process an unlimited number of rules when searching for the first page of a document, different kinds of documents can be mixed together in one file: the first page of one document may contain "Acme Corp" and the first page of another may contain "Consolidated Corp", and so on.
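The page-grouping logic described above can be sketched in a few lines of Python. This is only an illustration, not the program's actual code: the function name `split_pages`, the representation of a rule as a callable, and the choice to let an unmatched first page start a document anyway are all assumptions.

```python
from typing import Callable, List

# Hypothetical rule type: a function that takes a page's text and
# returns True when that page should start a new document.
Rule = Callable[[str], bool]


def split_pages(page_texts: List[str], rules: List[Rule]) -> List[List[int]]:
    """Group page indexes into documents.

    A page that matches any rule starts a new document; any other page
    is appended to the previous document. Assumption: if the very first
    page matches no rule, it still starts a document.
    """
    documents: List[List[int]] = []
    for index, text in enumerate(page_texts):
        starts_new = any(rule(text) for rule in rules)
        if starts_new or not documents:
            documents.append([index])
        else:
            documents[-1].append(index)
    return documents


# Example input: already-extracted text for five pages of a mixed batch.
pages = [
    "Acme Corp Invoice 1001",
    "line items continued",
    "Consolidated Corp Invoice 77",
    "Acme Corp Invoice 1002",
    "terms and conditions",
]
rules = [
    lambda text: "Acme Corp" in text,
    lambda text: "Consolidated Corp" in text,
]
groups = split_pages(pages, rules)
print(groups)
```

With the sample pages, pages 0, 2, and 3 each match a rule, so the batch is grouped into three documents: `[[0, 1], [2], [3, 4]]`.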

The program monitors (watches) file folders for images to process. Any number of folders can be watched, each with its own set of rules applied to its files. This makes setup easier when some manual separation can be done earlier in the workflow. For instance, the program could separate all invoices and purchase orders in one batch; however, setup would be easier if purchase orders and invoices were placed in separate input folders.
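One way to picture the per-folder setup is a mapping from each watched folder to its own rule set, scanned on a schedule. The sketch below is an assumption about how such a watcher could be structured, not a description of the program's internals; `scan_watched_folders`, the polling approach, and the rule phrases are hypothetical.

```python
from pathlib import Path
from typing import Dict, List, Set, Tuple

# File types the description mentions: TIFF images and searchable PDFs.
SUPPORTED_SUFFIXES = {".tif", ".tiff", ".pdf"}


def scan_watched_folders(
    watched: Dict[Path, List[str]], seen: Set[Path]
) -> List[Tuple[Path, List[str]]]:
    """Return (file, rule phrases) pairs for supported files not seen before.

    `watched` maps each folder to the rule phrases that apply to it.
    `seen` is updated in place so a later scan skips files already queued.
    """
    work: List[Tuple[Path, List[str]]] = []
    for folder, phrases in watched.items():
        for path in sorted(folder.glob("*")):
            if (
                path.is_file()
                and path.suffix.lower() in SUPPORTED_SUFFIXES
                and path not in seen
            ):
                seen.add(path)
                work.append((path, phrases))
    return work
```

Calling this function in a loop with a short sleep between scans gives a simple polling watcher. Placing invoices and purchase orders in separate folders means each folder needs only its own short rule list.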

How it works:
To determine where a document begins, the program either uses the OCR engine in Microsoft Office Document Imaging to obtain the page's text or extracts the text from a searchable PDF. The extracted text is then searched to see whether it matches a rule. A rule can be as simple as requiring a certain word or phrase, or it can require some words or phrases to be present and others to be absent. To help correct for OCR errors, the program utilizes fuzzy logic and EasyPatterns.
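A "must contain / must not contain" rule with tolerance for OCR errors can be approximated as follows. This sketch substitutes Python's standard `difflib.SequenceMatcher` for the program's own fuzzy logic and does not model EasyPatterns at all; the function names and the 0.8 similarity threshold are assumptions chosen for illustration.

```python
from difflib import SequenceMatcher
from typing import Iterable


def fuzzy_contains(text: str, phrase: str, threshold: float = 0.8) -> bool:
    """True if some phrase-length window of `text` is similar enough to `phrase`.

    Comparison is case-insensitive. The threshold is a similarity ratio
    between 0 and 1 (an assumed default, not taken from the program).
    """
    text, phrase = text.lower(), phrase.lower()
    size = len(phrase)
    if size == 0:
        return True
    if len(text) < size:
        return SequenceMatcher(None, text, phrase).ratio() >= threshold
    # Slide a window over the text and compare each slice to the phrase.
    for start in range(len(text) - size + 1):
        window = text[start:start + size]
        if SequenceMatcher(None, window, phrase).ratio() >= threshold:
            return True
    return False


def rule_matches(
    text: str,
    must_contain: Iterable[str],
    must_not_contain: Iterable[str] = (),
    threshold: float = 0.8,
) -> bool:
    """A page matches when every required phrase is found and no excluded one is."""
    required_ok = all(fuzzy_contains(text, p, threshold) for p in must_contain)
    excluded_hit = any(fuzzy_contains(text, p, threshold) for p in must_not_contain)
    return required_ok and not excluded_hit
```

For example, `rule_matches("Acme C0rp Invoice 1001", ["Acme Corp"], ["Credit Memo"])` returns `True` even though the OCR text has a zero in place of the letter "o", while a page that also contains "Credit Memo" is rejected. The sliding window compares fixed-length slices, so it tolerates substituted characters better than inserted or dropped ones.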

Category: Misc Communications Tools | Added by: File-Post