For decades, OCR was the sole means of transforming printouts into computer-processable data, and it is still the preferred method for turning paper invoices into extractable data that can be linked into financial systems, for example. However, electronic document submission now gives organizations a significantly improved approach to areas like invoicing and sales processing, lowering costs and allowing employees to focus on higher-value activities.

All thresholding techniques fall under the class of "data-reduction" algorithms. That is, they seek to compress or reduce the information contained in a given image in order to reduce computational complexity. The purpose is to remove or reduce the unwanted information, or "noise," while leaving the "important" information intact, for example by removing the background of a bank check so that the OCR application can read the signature panel. For OCR applications, data reduction is usually confined to reducing a grayscale or color image to a black-and-white (binary or bi-tonal) image. This reduction is accomplished by calculating an intensity level, the threshold, against which each individual pixel value is compared: pixels on one side of the threshold become black and those on the other side become white.
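The reduction described above can be illustrated with a minimal sketch of global thresholding. The names `mean_threshold` and `binarize` are hypothetical, and using the mean intensity as the threshold is an assumption made purely for illustration; the text does not say how the level is calculated, and practical OCR systems often rely on other choices such as Otsu's method or locally adaptive thresholds.

```python
from typing import List

# A grayscale image as rows of intensities, 0 (black) to 255 (white).
Image = List[List[int]]


def mean_threshold(image: Image) -> float:
    """Return the mean intensity of the image (an assumed threshold rule)."""
    pixels = [p for row in image for p in row]
    if not pixels:
        raise ValueError("image has no pixels")
    return sum(pixels) / len(pixels)


def binarize(image: Image, threshold: float) -> Image:
    """Compare every pixel to the threshold: 1 = white, 0 = black."""
    return [[1 if p > threshold else 0 for p in row] for row in image]


# A light background (values near 200) with a few dark "ink" pixels (near 40).
sample = [
    [200, 210, 205, 198],
    [202,  40,  35, 207],
    [199,  38, 204, 201],
]

level = mean_threshold(sample)
binary = binarize(sample, level)

for row in binary:
    print(row)
```

In this toy image the three dark pixels fall below the computed level and map to 0, while the background maps to 1, so each pixel is reduced from 256 possible values to two.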
This work is licensed under a Creative Commons Attribution 4.0 International License.