Forms often contain preprinted elements such as field boxes or character boxes. Scanned documents sometimes also contain horizontal or vertical lines caused by banding or by folds in the paper. Handling these artifacts is a challenge for any system that has to detect boxes while processing business documents.
Document structure analysis and character recognition are usually done in several phases:
scanning
thresholding
skew detection and correction
despeckle or speckle removal
line removal
border removal
detection of preprinted elements (like boxes)
page orientation detection and correction
layout analysis
classification
character recognition
Each step must work well enough for the sequence as a whole, and its final result, to succeed. The steps that follow box removal perform poorly if that removal fails.
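The dependency between phases can be illustrated with a minimal sketch of a sequential pipeline. This is an illustration only, not how BoxesHelper or any particular OCR engine is implemented; the names `run_pipeline`, `Image`, and `Stage` are hypothetical, and an image is assumed to be a plain list of pixel rows.

```python
from typing import Callable, List, Tuple

# Hypothetical types for illustration: an image is a list of pixel rows,
# and a stage is a named function that maps an image to an image.
Image = List[List[int]]
Stage = Tuple[str, Callable[[Image], Image]]


def run_pipeline(image: Image, stages: List[Stage]) -> Image:
    """Apply each stage in order and stop at the first stage that fails."""
    for name, stage in stages:
        try:
            image = stage(image)
        except Exception as exc:
            # Later stages depend on this one, so they are not run at all.
            raise RuntimeError(f"stage '{name}' failed") from exc
    return image


def invert(image: Image) -> Image:
    """Toy stage: swap 0 and 1 in a binary image."""
    return [[1 - pixel for pixel in row] for row in image]
```

Stopping at the first failure is one possible design choice; it makes the failing phase visible instead of letting later phases run on unusable input.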
BoxesHelper searches for boxes so that the characters inside them can be extracted and recognized. Boxes can also serve as anchor elements, that is, as features for form identification and recognition.
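One common building block for box detection is finding long runs of black pixels, since box sides appear as long horizontal and vertical lines. The sketch below shows that idea only; it is not the algorithm BoxesHelper uses, which is not described here. The function name `find_horizontal_runs` and the convention that 1 means black are assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed convention: 1 = black pixel, 0 = white


def find_horizontal_runs(image: Image, min_length: int) -> List[Tuple[int, int, int]]:
    """Return (row, first_col, last_col) for each run of black pixels
    that is at least min_length pixels long."""
    runs = []
    for y, row in enumerate(image):
        start = None
        # A trailing 0 acts as a sentinel so a run touching the right edge is closed.
        for x, pixel in enumerate(list(row) + [0]):
            if pixel and start is None:
                start = x
            elif not pixel and start is not None:
                if x - start >= min_length:
                    runs.append((y, start, x - 1))
                start = None
    return runs


# A 5x4 box outline with one stray pixel inside it.
image = [
    [0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 1, 1, 1, 0],
    [0, 1, 0, 0, 0, 1, 0],
    [0, 1, 0, 1, 0, 1, 0],
    [0, 1, 1, 1, 1, 1, 0],
]

horizontal = find_horizontal_runs(image, min_length=4)

# Vertical runs: transpose the image and reuse the same function.
transposed = [list(column) for column in zip(*image)]
vertical = find_horizontal_runs(transposed, min_length=4)
```

Here `horizontal` holds the top and bottom sides of the box and `vertical` holds the left and right sides (as column, first row, last row). The stray pixel is ignored because its run is shorter than `min_length`.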
BoxesHelper expects a monochrome image as input.
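A monochrome image is usually produced by the thresholding phase. A minimal sketch of fixed thresholding follows, assuming an 8-bit grayscale image stored as a list of rows; the function name `to_monochrome`, the default threshold of 128, and the 1 = black convention are assumptions for the example, not part of BoxesHelper.

```python
from typing import List


def to_monochrome(gray: List[List[int]], threshold: int = 128) -> List[List[int]]:
    """Map 8-bit gray values to a binary image.

    Pixels darker than the threshold become 1 (black, assumed to be ink);
    all other pixels become 0 (white, assumed to be background).
    """
    return [[1 if pixel < threshold else 0 for pixel in row] for row in gray]


gray = [
    [0, 200],
    [127, 128],
]
mono = to_monochrome(gray)
```

A fixed threshold is the simplest option; scans with uneven lighting typically need an adaptive method instead.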