What is Optical Character Recognition?

Optical character recognition (OCR) is a method of automatic data entry. OCR software converts text, whether handwritten, typewritten or printed, into data that can be edited on a computer. In its simplest form, a paper document is scanned with an image scanner. The OCR software then examines the image and compares the shapes of the letters with letter images it has already stored. When this is done, the software creates a text file that anyone can edit with an ordinary text editor.
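The comparison step described above can be illustrated with a toy example. The following is only a minimal sketch, not how any particular OCR product works: it assumes each scanned letter has already been cut out and reduced to a tiny 3x5 black-and-white bitmap, and the names TEMPLATES, mismatches, recognize_glyph and recognize_line are hypothetical, chosen for this illustration. Each scanned letter is compared pixel by pixel with the stored letter images, and the stored letter with the fewest differing pixels is chosen.

```python
# Hypothetical stored letter images: 3x5 bitmaps where '#' is ink
# and '.' is background. Real systems store far more detailed shapes.
TEMPLATES = {
    "I": ["###", ".#.", ".#.", ".#.", "###"],
    "L": ["#..", "#..", "#..", "#..", "###"],
    "T": ["###", ".#.", ".#.", ".#.", ".#."],
    "O": ["###", "#.#", "#.#", "#.#", "###"],
}


def mismatches(glyph, template):
    """Count the pixels where the scanned glyph and the template differ."""
    return sum(
        g != t
        for glyph_row, template_row in zip(glyph, template)
        for g, t in zip(glyph_row, template_row)
    )


def recognize_glyph(glyph):
    """Return the stored letter whose image differs from the glyph the least."""
    return min(TEMPLATES, key=lambda letter: mismatches(glyph, TEMPLATES[letter]))


def recognize_line(glyphs):
    """Recognize a sequence of glyphs and join the letters into plain text."""
    return "".join(recognize_glyph(glyph) for glyph in glyphs)


# A scanned "T" with one stray pixel of noise in the fourth row.
noisy_t = ["###", ".#.", ".#.", "##.", ".#."]

scanned = [TEMPLATES["L"], TEMPLATES["O"], noisy_t]
text = recognize_line(scanned)
print(text)
```

Because the closest stored image wins, a small amount of scanning noise does not change the result, but a badly smudged or unfamiliar letter shape can easily be matched to the wrong letter.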

More sophisticated programs look at more than the shape of the letters in the original document. They also analyze other elements of the page, such as images and layout. They produce output that closely resembles the original document but can be changed and edited. As always, OCR produces its best work when used with clean, neatly printed materials.

OCR is used mainly to create electronic versions of books and documents. It can also be used to computerize office records or to publish text on a website. Once a document has been processed with OCR, one can edit its contents, search for a word, store it in a smaller file, and display a copy free of scanning artifacts. Other techniques, such as machine translation, text mining and text-to-speech, can then be applied to the recognized text. OCR is also a field of research in pattern recognition, computer vision and artificial intelligence.

It is important to know that some OCR systems need to be calibrated before they can read certain fonts. Some early versions of OCR could read only one font at a time. There are now "smarter" OCR systems that have a high degree of recognition accuracy and can discern most fonts.