Suppose you want to digitize a magazine article or a printed contract. You could spend hours retyping the text and then correcting typos. Or you could convert all the material into digital format in a few minutes using a scanner (or a digital camera) and Optical Character Recognition software.
What exactly is meant by OCR?
Optical Character Recognition, or OCR, is a technology that converts different types of documents, such as scanned paper documents, PDF files, or images captured by a digital camera, into editable and searchable data.
Imagine you have a paper document, for example a magazine article or a brochure, or a PDF contract your partner sent you by email. A scanner alone is not enough to make this information available for editing, say, in Microsoft Word. All a scanner can do is create an image, or snapshot, of the document: nothing more than a collection of black-and-white or colour dots, known as a raster image. To extract and repurpose data from scanned documents, camera images, or image-only PDFs, you need OCR software that singles out the letters in the image, assembles them into words and the words into sentences, and so lets you access and edit the content of the original document.
What technology lies behind OCR?
The exact mechanisms that allow humans to recognize objects are not yet fully understood, but three basic principles are well known to scientists: integrity, purposefulness and adaptability (IPA*). These principles form the core of ABBYY FineReader OCR, allowing it to replicate natural, human-like recognition.
Let’s take a look at how FineReader OCR recognizes text. First, the program analyzes the structure of the document image. It divides the page into elements such as blocks of text, tables, and images. Lines are divided into words, and words into characters. Once the characters have been singled out, the program compares them with a set of pattern images and advances numerous hypotheses about what each character is. Based on these hypotheses, the program analyzes different ways of breaking lines into words and words into characters. After processing a huge number of such probabilistic hypotheses, the program makes its final decision and presents you with the recognized text.
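The comparison of a character with pattern images can be illustrated with a deliberately tiny sketch. This is not ABBYY's algorithm: the 3x3 patterns, the pixel-agreement score, and the names PATTERNS, similarity and hypotheses are all assumptions made up for illustration. It only shows the idea of producing several ranked hypotheses for one character instead of a single answer.

```python
# Toy 3x3 "pattern images" (hypothetical); real pattern sets are far richer.
PATTERNS = {
    "I": ("010", "010", "010"),
    "L": ("100", "100", "111"),
    "T": ("111", "010", "010"),
}


def similarity(glyph, pattern):
    """Fraction of pixels on which the glyph and the pattern agree."""
    total = sum(len(row) for row in pattern)
    same = sum(
        g == p
        for glyph_row, pattern_row in zip(glyph, pattern)
        for g, p in zip(glyph_row, pattern_row)
    )
    return same / total


def hypotheses(glyph):
    """Return (character, score) pairs, best match first."""
    scored = [(ch, similarity(glyph, pat)) for ch, pat in PATTERNS.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


noisy_t = ("111", "010", "011")  # a "T" with one stray pixel
ranked = hypotheses(noisy_t)
print(ranked[0][0])  # the top hypothesis for this glyph is "T"
```

Keeping the lower-ranked hypotheses, rather than discarding them, is what lets a later stage reconsider a character when the surrounding word does not make sense.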
In addition, ABBYY FineReader provides dictionary support for 36 languages. This enables secondary analysis of text elements at the word level. With dictionary support, the program analyzes and recognizes documents more accurately, and recognition results are easier to verify afterwards.
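How a dictionary can refine character-level hypotheses is sketched below. This is a simplified illustration, not FineReader's implementation: the three-word DICTIONARY, the scores, and the function best_word are hypothetical. Each character position carries several scored hypotheses, and candidate words found in the dictionary are preferred over higher-scoring strings that are not words.

```python
from itertools import product

DICTIONARY = {"cat", "cot", "eat"}  # hypothetical word list


def best_word(position_hypotheses, dictionary=DICTIONARY):
    """Pick the best candidate word, preferring dictionary words."""
    candidates = []
    for combo in product(*position_hypotheses):
        word = "".join(ch for ch, _ in combo)
        score = 1.0
        for _, char_score in combo:
            score *= char_score
        # Tuples compare by dictionary membership first, then by score.
        candidates.append((word in dictionary, score, word))
    return max(candidates)[2]


# Per-position hypotheses as (character, score) pairs, scores assumed.
hyps = [
    [("c", 0.9), ("e", 0.1)],
    [("a", 0.4), ("o", 0.6)],
    [("l", 0.6), ("t", 0.4)],
]
print(best_word(hyps))                    # cot
print(best_word(hyps, dictionary=set()))  # col
```

Without the dictionary, the highest-scoring string "col" wins; with it, the lower-scoring but valid word "cot" is chosen instead.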
* IPA
What principles is FineReader OCR based on?
The most advanced recognition systems, such as ABBYY FineReader OCR, focus on replicating natural, or “animal-like”, recognition. At the heart of these systems lie three fundamental principles: Integrity, Purposefulness and Adaptability. The principle of integrity says that the observed object must always be considered as a “whole” consisting of many interrelated parts. The principle of purposefulness holds that any interpretation of data must always serve some purpose. And the principle of adaptability means that the program must be capable of self-learning.
One does not have to be an OCR specialist to see the advantages of an OCR application built on the IPA principles. These principles endow the program with maximum flexibility and intelligence, bringing it as close as possible to human recognition.
After years of research, ABBYY was able to implement the IPA principles described above in its OCR technologies.
Recognition of Digital Camera Images
Images captured by a digital camera differ from scanned documents or image-only PDFs. They often have defects such as distortion at the edges and poor lighting, which make it difficult for most OCR applications to recognize the text correctly. The latest version of ABBYY FineReader supports adaptive recognition technology designed specifically for processing camera images. It offers a range of features that improve the quality of such images, so you can make full use of your digital devices.
How to use OCR software?
Using ABBYY FineReader OCR is easy. The process generally consists of three stages: Open (Scan) the document, Recognize it, and then Save it in a convenient format (DOC, RTF, XLS, PDF, HTML, TXT, etc.) or export the data directly to an application such as Microsoft Word, Excel or Adobe Acrobat.
In addition, the latest version of ABBYY FineReader supports an Automated Tasks mode, which is essential when you deal with routine tasks regularly. With this feature, recognition tasks run automatically, without your having to perform each of the above steps manually.
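The Open, Recognize, Save sequence, and the idea of running it automatically over many files, can be pictured with the generic sketch below. It does not use or imitate the FineReader interface: run_task and placeholder_recognize are hypothetical names, and the recognition step is a stand-in that any real OCR engine would replace.

```python
import tempfile
from pathlib import Path


def run_task(image_paths, recognize, out_dir):
    """Open each image, recognize it, and save the text as a .txt file."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    for path in map(Path, image_paths):
        image_bytes = path.read_bytes()          # Open
        text = recognize(image_bytes)            # Recognize
        target = out_dir / (path.stem + ".txt")  # Save
        target.write_text(text, encoding="utf-8")
        saved.append(target)
    return saved


def placeholder_recognize(image_bytes):
    # Stand-in for a real OCR engine (assumption): no recognition happens here.
    return f"[recognized text for {len(image_bytes)} bytes of image data]"


with tempfile.TemporaryDirectory() as tmp:
    scan = Path(tmp) / "contract.jpg"
    scan.write_bytes(b"\xff\xd8 not a real photo")
    saved = run_task([scan], placeholder_recognize, Path(tmp) / "out")
    saved_names = [p.name for p in saved]
    saved_text = saved[0].read_text(encoding="utf-8")
```

Passing the recognition step in as a function keeps the three stages separate, so the same loop can serve any engine and any number of input files.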
What benefits does OCR bring to you?
Digital cameras are becoming more and more popular and truly multipurpose. In addition to everything else, you can use your camera as a portable “scanner” to capture text from hardcopy documents, books, newspapers, as well as from banners, posters and other media. Then, with ABBYY FineReader, you can convert your camera images into electronic text files for editing, archiving, creating new documents and for other purposes.
Where can I use my digital camera to capture text?
A digital camera is an ideal alternative to a scanner if you don’t want to deal with a scanner each time you need to convert a document into a text file, especially if you don’t do it very often. Camera images can be easily opened in ABBYY FineReader, ready for processing.
If you are working with books (for example, in a library), you can simply take your digital camera and capture all the necessary text and images for further processing on your PC or notebook (even from books that cannot be scanned at all).
When you are traveling away from the office (for example, on a business trip) and need to digitize important documents for editing, archiving, or creating other documents, a digital camera can be used as a portable scanner.
A digital camera can also be used to capture text outdoors from banners, posters, billboards, walls, timetables and so on.
Finally, you will probably find your own ways to use a digital camera and its new capabilities. But if you are going to use it like James Bond does, please don’t forget about intellectual property rights and copyright laws.
Tips & Tricks for shooting text with a digital camera
Even skilled photographers need to learn how to shoot documents effectively to get the best OCR results. Camera images differ from scanned images in a number of characteristics, but ABBYY FineReader, with its adaptive recognition technology for camera images, makes them suitable for OCR and conversion into text formats.
So, if you have FineReader installed on your PC and know a few simple “secrets” of shooting documents and books, you can expect good results. The secrets are:
Your Digital Camera
Use a digital camera with 5-megapixel resolution or higher, ideally equipped with the following features:
General Tips
Take 2-3 shots of the same document in case your hands were unsteady and an image came out blurry, or a corner of the document was cut off.
Use the “close-up” or “macro” mode. In most cameras it is indicated by a flower icon.
Camera Positioning and Focus
Position the lens parallel to the plane of the document.
Fit the entire document into the frame.
Focus on the center of a page.
Use the camera’s optical zoom to zoom in and frame the document tightly.
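To see why sensor resolution and tight framing matter, it helps to estimate how many pixels per inch actually land on the page. The sketch below is a back-of-the-envelope calculation under stated assumptions: a 4:3 sensor, an A4 page (11.69 x 8.27 inches) shot in matching orientation, and the hypothetical helper names sensor_pixels and effective_dpi.

```python
import math


def sensor_pixels(megapixels, aspect=(4, 3)):
    """Approximate (long_side, short_side) of a sensor in pixels."""
    w, h = aspect
    unit = math.sqrt(megapixels * 1_000_000 / (w * h))
    return round(w * unit), round(h * unit)


def effective_dpi(megapixels, page_inches=(11.69, 8.27), fill=1.0):
    """Pixels per inch along the page's long side.

    fill is the fraction of the frame's long side that the page occupies
    (1.0 means the page is framed as tightly as possible).
    """
    long_px, _ = sensor_pixels(megapixels)
    return long_px * fill / max(page_inches)


print(round(effective_dpi(5)))            # tightly framed A4 page
print(round(effective_dpi(5, fill=0.5)))  # page fills only half the frame
```

With these assumptions, a tightly framed A4 page on a 5-megapixel camera works out to roughly 220 pixels per inch, and the figure halves if the page fills only half of the frame, which is why loose framing costs so much detail.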
Lighting and Flash
Make sure there is sufficient lighting. Natural light is the best.
Disable the flash (in most point-and-shoot digital cameras, the flash is on auto mode by default).
If you have to take a picture of a document in poor lighting and need the flash, try to use the flash from 20 inches away and try to find additional light sources.
Don’t use the flash on glossy paper.
Extra Tips for Advanced Users
And finally, if you know your camera “inside out” and wish to improve your skills in photographing documents and books or wish to achieve good results in some special conditions, there are a few extra tips to follow:
Use the white balance feature. If your camera has manual white balance, use a white sheet of paper to set white balance. Otherwise, select the appropriate balance mode for your lighting conditions.
Enable the anti-shake setting; otherwise, use a tripod.
In poor lighting conditions: