OCR Overview

Optical Character Recognition (OCR) is a set of methods and techniques for converting text printed on scanned, image-based documents into editable computer text (for example, ASCII or Unicode characters rather than image files).

The OCR environment analyzes the text and generates multiple hypotheses about each character. Each hypothesis is assigned a rating (or “weight”) that reflects the degree of confidence the OCR environment has in that value. Characters are analyzed by several different classifiers to obtain the maximum level of accuracy and confidence.
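The idea of weighted hypotheses can be illustrated with a minimal sketch. It assumes each classifier returns a (character, weight) pair and that weights are combined by simple summation; the source does not describe how the real engine combines its classifiers, so the function name `pick_character` and the voting scheme are illustrative only.

```python
from collections import defaultdict


def pick_character(hypotheses):
    """Choose the most likely character from weighted hypotheses.

    hypotheses: list of (character, weight) pairs, one or more per
    classifier, with non-negative weights. Summing the weights is an
    assumed combination rule, not the product's documented behaviour.
    """
    if not hypotheses:
        raise ValueError("at least one hypothesis is required")

    totals = defaultdict(float)
    for char, weight in hypotheses:
        totals[char] += weight

    best = max(totals, key=totals.get)
    total_weight = sum(totals.values())
    # Confidence is the winning character's share of all the weight.
    confidence = totals[best] / total_weight if total_weight else 0.0
    return best, confidence


# Three hypothetical classifier outputs for one ambiguous glyph.
votes = [("O", 0.6), ("0", 0.3), ("O", 0.7), ("Q", 0.1)]
best, confidence = pick_character(votes)  # best == "O"
```

Here two classifiers agree on "O", so it wins with the largest share of the total weight, while the competing "0" and "Q" hypotheses lower the resulting confidence.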

Intelligent Optical Character Recognition takes the OCR process a step further by analyzing the image in specifically defined areas. This speeds up the analysis process whilst providing a more accurate assessment of the image.

The Blue Prism Cloud Optical Character Recognition (OCR) application is used in different data capture scenarios and supports up to 113 different languages.

The Blue Prism Cloud OCR environment consists of three main applications: a document template design application, a project management application and an operational (verification) application. Underlying these applications is a processing engine (similar to a Windows Service) that controls and manages the flow of documents, from ingestion through recognition and verification to the export of data.

The three applications, which are used throughout the overall process and during this training, are detailed below:

  • FlexiLayout Studio – used to identify/create the specific areas or regions within a scanned document requiring capture during the recognition process;
  • Project Setup Station – used to establish the profile and configuration for the specific project;
  • Verification Station – used to verify documents or correct verification errors after they have been processed.

The image below illustrates the overall system structure and process.

Projects are established to control the overall process. A configured project instructs the OCR applications where critical components are located and how data is processed.

The OCR applications operate by scanning what is referred to as a hot folder; this part of the service is referred to as ingestion. The ingestion process runs periodically to check whether any new documents have landed in the hot folder. Any documents found are picked up by the OCR applications and sent for recognition. The OCR application analyzes and checks each document against a Document Definition template, which is created in the design application, FlexiLayout Studio.

A single hot folder could be used to process all the images. In that case, images are added to the one folder and the appropriate Document Definition template is applied during the recognition stage according to the information found in each image. Alternatively, multiple hot folders could be set up on a per-project basis to segregate incoming work where no similarities exist.
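The periodic hot folder check can be sketched in a few lines. This is only an illustration of the polling idea, not the product's ingestion code: the function names, the polling interval and the use of file names to track what has already been picked up are all assumptions.

```python
import time
from pathlib import Path


def scan_hot_folder(hot_folder, already_seen):
    """Return files in hot_folder that have not been picked up before.

    already_seen is a set of file names and is updated in place, so a
    later scan only reports documents that arrived in the meantime.
    """
    new_files = []
    for path in sorted(Path(hot_folder).iterdir()):
        if path.is_file() and path.name not in already_seen:
            already_seen.add(path.name)
            new_files.append(path)
    return new_files


def poll(hot_folder, handle, interval_seconds=5.0, max_cycles=None):
    """Check the hot folder periodically and pass new files to handle().

    max_cycles limits the number of checks (None means run forever);
    it exists mainly to make the sketch easy to try out.
    """
    seen = set()
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        for path in scan_hot_folder(hot_folder, seen):
            handle(path)  # e.g. send the document for recognition
        cycles += 1
        if max_cycles is None or cycles < max_cycles:
            time.sleep(interval_seconds)
```

With multiple hot folders, the same `poll` function could be started once per folder, each with its own handler for the corresponding project.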

A Document Definition template can either scan the whole document for the required text and associated fields, or be tailored to scan only specific areas for the required text and data fields. Applying a restricted Document Definition template to a scanned document gives a higher degree of confidence in the extracted data and improves processing speed.

The OCR applications then verify the document and determine, using a level of confidence, whether the required information has been captured. If the verification has been performed with 100% confidence, the document is exported using an agreed format.

The level of confidence can be set to 40%, 60% or 80%. The default value is 60%, but setting it to 80% is recommended to give a higher level of accuracy, though this may create additional verification errors if the quality of the scanned image is poor.

Once a document reaches export, a file is created in the export folder in an agreed format, for example XML. To enable traceability, the output file name typically matches the name the file had when it was added to the hot folder.
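The naming convention can be illustrated with a short sketch that derives the export file name from the original document name and writes the captured fields as XML. The XML layout (`document` and `field` elements) and the function name `export_document` are hypothetical; the real export format is whatever has been agreed for the project.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def export_document(source_name, fields, export_folder):
    """Write captured fields to <export_folder>/<source stem>.xml.

    source_name: name of the file as it was added to the hot folder.
    fields: mapping of field name to captured text.
    The element and attribute names are illustrative assumptions.
    """
    # Reuse the original file name (minus extension) for traceability.
    export_path = Path(export_folder) / (Path(source_name).stem + ".xml")

    root = ET.Element("document", {"source": source_name})
    for name, value in fields.items():
        ET.SubElement(root, "field", {"name": name}).text = value

    ET.ElementTree(root).write(
        str(export_path), encoding="utf-8", xml_declaration=True
    )
    return export_path
```

For example, a scanned file named `invoice_001.tif` would produce `invoice_001.xml`, so the exported data can be traced back to the image it came from.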

Outside of the OCR applications is the Blue Prism Cloud Folder Watcher. This is a Windows Service, created in-house, which polls the export folder. When a file is created, it compiles the XML file and sends the payload (the data) to the IADA Loader method (a function), which then adds it to a Queue for processing by a Digital Worker. This area is not covered within the OCR training course material.
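Although the Folder Watcher is outside the scope of the course, the general pattern of reading an exported XML file and queuing its data can be sketched as follows. This is not the Folder Watcher or the IADA Loader: Python's standard `queue.Queue` stands in for the work Queue, and the XML layout (`field` elements with a `name` attribute) and both function names are hypothetical.

```python
import queue
import xml.etree.ElementTree as ET
from pathlib import Path


def read_payload(xml_path):
    """Turn an exported XML file into a plain dictionary of field values.

    Assumes a hypothetical layout of <field name="...">text</field>
    elements anywhere under the root.
    """
    root = ET.parse(xml_path).getroot()
    return {f.get("name"): (f.text or "") for f in root.iter("field")}


def forward_exports(export_folder, work_queue, already_sent):
    """Queue the payload of every XML file not forwarded before.

    already_sent is a set of file names, updated in place.
    Returns the number of payloads added to work_queue.
    """
    sent = 0
    for path in sorted(Path(export_folder).glob("*.xml")):
        if path.name in already_sent:
            continue
        work_queue.put({"file": path.name, "data": read_payload(path)})
        already_sent.add(path.name)
        sent += 1
    return sent
```

Calling `forward_exports` repeatedly on a timer gives the polling behaviour; each queued item keeps the file name alongside the data so that it remains traceable.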

If the verification was performed with characters captured at a lower level of confidence, the document is sent to the Verification Station for manual checking and correction. This application enables an operator to verify the data items, characters and words identified with a lower level of confidence. If the recognition was performed correctly, the operator can simply accept the captured value. If it was performed incorrectly, the operator can correct the value and then accept the changes. Once completed, the operator can mark the document as ready for export.

If the verification was performed and key fields (blocks) were not identified, the document is identified as “unknown”. At this stage the document can either be used for Training or be Trashed; for example, an erroneous document that was inadvertently added to the hot folder would be Trashed. In the Training scenario, the logic used to map the fields is attributed to an existing Document Definition template so that it learns new variants of the scanned document type.
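The three possible outcomes after recognition (export, manual verification, or "unknown") can be summarized in a small routing sketch. It assumes each captured field carries a single confidence percentage and that a field counts as confidently captured when it meets the configured level; the `Field` type, the function name and this per-field rule are illustrative assumptions rather than the product's actual logic.

```python
from dataclasses import dataclass
from typing import Optional

ALLOWED_LEVELS = (40, 60, 80)  # confidence levels available, in percent


@dataclass
class Field:
    name: str
    value: Optional[str]  # None when the field was not identified
    confidence: float     # percent, 0 to 100 (assumed representation)


def route_document(fields, key_field_names, level=60):
    """Decide what happens to a document after recognition.

    Returns "unknown" if any key field is missing, "export" if every
    captured field meets the confidence level, else "verification".
    """
    if level not in ALLOWED_LEVELS:
        raise ValueError(f"level must be one of {ALLOWED_LEVELS}")

    captured = [f for f in fields if f.value is not None]
    found = {f.name for f in captured}
    if not set(key_field_names) <= found:
        return "unknown"  # candidate for Training or Trash

    if all(f.confidence >= level for f in captured):
        return "export"
    return "verification"  # operator checks low-confidence values
```

For instance, a document whose fields score 95% and 50% would be exported at the 40% level but sent to verification at the 80% level, which shows why a higher setting can increase the manual verification workload on poor-quality scans.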