OCR is used to convert text embedded in scanned documents, images, or videos into a format that is easily editable, searchable, and ready for downstream NLP analytics tasks. It requires a combination of computer vision (CV) modules, recognition (ML) modules, and text modules to extract the text into a readily usable structured form (Figure 2). The two main components of the CV modules are the preprocessing and segmentation stages. These two components prepare the input for the OCR engine, while the postprocessing stage converts the raw text into structured text output.

Figure 2. A high-level generalized OCR pipeline with the computer vision modules, ML-based or traditional feature extraction and character/word recognition, and text processing modules.

Real-world scanned documents and text embedded in images and videos are rarely ready for OCR to be applied directly. Most often, the input needs to pass through several preprocessing stages first. Documents may be in various orientations, deformations can occur during scanning, and noise may be introduced by the scanning camera or the environment in which the scan took place. The input scanned documents or images are usually not in an ideal size, shape, or orientation. Several preprocessing stages need to be applied to increase the overall recognition accuracy of the OCR system. This is the most crucial stage next to the actual OCR engine itself. Among the most commonly used preprocessing algorithms are cropping, alignment correction, distortion correction, binarization, and denoising (filtering noise out) of the input document or image.

Cropping

Cropping the relevant region of interest (ROI) that contains the text of interest is the first step in preprocessing for OCR. Automatic cropping may be achieved by training a dedicated cropping model, or by using existing OCR engines together with some heuristics and image processing to detect the rough boundaries of all detected text in the image. Figure 3 shows an example of cropping a receipt from the background.

Figure 3.

Most scanned documents and text source images are not properly aligned, and this has significant implications for the accuracy of the OCR result. Depending on the complexity of the input document, anything from simple histogram projection (see Figure 4) to model-based alignment correction may be needed.

Figure 4. Skew correction using histogram projection.

Several factors can create distortion in the scanned image or document that contains the text of interest. Geometric shape rectification methods such as trapezoidal distortion correction and line straightening may be needed before the image is sent to the OCR engine (see Figure 5).

Figure 5.