Optical character recognition (OCR)

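Blue Prism provides several OCR capabilities for on-screen text: native character recognition, either through surface automation or through OCR Plus, and Tesseract OCR. Each is described in the sections that follow.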

Native character recognition

Using surface automation

Where the Blue Prism Application Modeller cannot be used directly to identify application elements, a technique called surface automation can be used to capture an image of the application screen and map the location of key elements on it. Such applications can be modelled by using screen regions, image matching, and character recognition. This technique is useful for spying applications that are not running on the same machine as Blue Prism Enterprise, and where other spy modes are not available.

Native character recognition based on font matching is leveraged through the Recognise Text action in a Read stage when used against a previously captured Application Modeller region. This extracts text data from the region and stores it in a Data item. The input parameters for the Recognise Text action are font, foreground color, and background color.

Native character recognition requires the font to be generated before it can be used. For more details, see Fonts.

Using OCR Plus

OCR Plus provides enhanced character recognition with improved accuracy and robustness by:

  • Automatically identifying foreground and background colors.

  • Distinguishing between seemingly identical characters (such as the letter O and the number 0) and enabling disambiguation via regular expression (RegEx) patterns.

  • Improving the font matching algorithm.

This is leveraged through the Recognise Text (OCR Plus) action in a Read stage. The input parameters are font and optional RegEx. If no input parameters are specified, the system will still recognize the font and attempt to match the word as closely as possible. However, if there is any ambiguity, a default RegEx is used which accepts one of the following word patterns:

  • Uppercase then lowercase
  • Only uppercase or only lowercase
  • Only numbers

Examples of typical RegEx patterns:

  • Number: “[0-9]+”
  • Uppercase then lowercase word: “[A-Z][a-z]*”
  • Uppercase and number string: “[0-9A-Z]+”
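As an illustration of how a RegEx can resolve look-alike characters, the following Python sketch picks, from a set of ambiguous readings, the first one that fully matches a pattern. It is not Blue Prism's implementation: the `disambiguate` function, the candidate lists, and the `DEFAULT_LIKE` pattern (a rough approximation of the default word patterns listed above) are assumptions made for this example.

```python
import re
from itertools import product

# Example patterns quoted in the text above.
NUMBER = r"[0-9]+"
CAPITALISED_WORD = r"[A-Z][a-z]*"
UPPER_AND_DIGITS = r"[0-9A-Z]+"

# Assumption: an approximation of the default word patterns
# (uppercase then lowercase, only uppercase, only lowercase, only numbers).
DEFAULT_LIKE = r"[A-Z][a-z]*|[A-Z]+|[a-z]+|[0-9]+"


def disambiguate(candidates, pattern):
    """Return the first reading that fully matches pattern, or None.

    candidates holds one list per character position; each list contains
    the look-alike characters that could not be told apart.
    """
    regex = re.compile(pattern)
    for combo in product(*candidates):
        text = "".join(combo)
        if regex.fullmatch(text):
            return text
    return None


# Hypothetical ambiguous reading: "1" or "l", then "O" or "0" twice.
ambiguous = [["1", "l"], ["O", "0"], ["O", "0"]]
print(disambiguate(ambiguous, NUMBER))            # 100
print(disambiguate(ambiguous, UPPER_AND_DIGITS))  # 1OO
print(disambiguate(ambiguous, CAPITALISED_WORD))  # None
```

The same region reads as 100 when a number is expected, and no reading is returned when the pattern cannot be satisfied. This is why a specific RegEx tends to be more useful than a permissive one when the expected format of the text is known.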

Both native character recognition and OCR Plus require the font to be generated before being used. The corresponding options for this are available in the Font Generator dialog, accessed from the System - Fonts screen, and in the Generate a Blue Prism Font dialog, accessed from the Blue Prism Region Editors screen. For more details, see Fonts.

Tesseract OCR

Where the native character recognition engine is not appropriate for interacting with on-screen text (for example, where smoothed text is enforced, or when working with scanned or otherwise restricted copies of electronic documents), Blue Prism can use an embedded Tesseract OCR engine to recognize text using pattern matching and complex, language-based text recognition.

To maximize the effectiveness of the text recognition, a minimum of 300 dots per inch (dpi) is required. For images where the dpi is lower than this, such as on-screen text, a Scale parameter artificially increases the size of the captured region before passing it to the engine. Generally, setting the scale factor to 4 or 5 provides successful results.
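The arithmetic behind the scale factor can be sketched as follows. The source densities used here (96 dpi and 72 dpi) are assumptions about typical screen captures, not values taken from Blue Prism, and the helper names are hypothetical.

```python
import math

TARGET_DPI = 300  # minimum recommended in the text above


def minimum_scale(source_dpi, target_dpi=TARGET_DPI):
    """Smallest whole-number scale that lifts source_dpi to at least target_dpi."""
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    return max(1, math.ceil(target_dpi / source_dpi))


def effective_dpi(source_dpi, scale):
    """Approximate density of the region after it has been enlarged."""
    return source_dpi * scale


# Assumed screen densities, for illustration only.
print(minimum_scale(96), effective_dpi(96, 4))  # 4 384
print(minimum_scale(72), effective_dpi(72, 5))  # 5 360
```

Under these assumptions, the result agrees with the guidance that a scale factor of 4 or 5 is usually sufficient for on-screen text.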

The Tesseract OCR engine is leveraged through the Read Text with OCR action in a Read stage when used against a previously captured Application Modeller region, and includes options to read text, lists, and grids. It is also possible to output the pre-worked images to a specific diagnostics location, to verify that the scaling being applied is sufficient for the selected region.

Language packs

Language packs for use with Tesseract can be obtained from the internet. Blue Prism works with Tesseract version 4.0.0, and it is imperative that the correct major version of the language files is used with it. Currently, the version 4.0.0 language files can be downloaded from the Tesseract website.

To add support for another language, download the appropriate files and copy them to the Tesseract\tessdata folder (usually C:\Program Files\Blue Prism Limited\Blue Prism Automate\Tesseract\tessdata).

The language files are prefixed with a language code, for example, fra (French), deu (German), jpn (Japanese), chi_tra (Traditional Chinese). Once the files are installed on each of the required devices, this code can be specified in the Language parameter of the Read Text with OCR action within a Read stage, to instruct the engine to use the required pack.
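Before setting the Language parameter, it can help to confirm which language codes are actually present on a device. The following Python sketch lists them, assuming the standard Tesseract file naming of `<code>.traineddata`. The function names are hypothetical, and the demonstration uses a temporary folder in place of the real tessdata folder.

```python
from pathlib import Path
import tempfile


def installed_languages(tessdata_dir):
    """Return the language codes found in a tessdata folder, sorted."""
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def is_language_available(code, tessdata_dir):
    """True if a language file for the given code exists in the folder."""
    return code in installed_languages(tessdata_dir)


# Demonstration against a temporary folder standing in for Tesseract\tessdata.
with tempfile.TemporaryDirectory() as folder:
    for name in ("eng.traineddata", "fra.traineddata", "notes.txt"):
        (Path(folder) / name).touch()
    print(installed_languages(folder))           # ['eng', 'fra']
    print(is_language_available("deu", folder))  # False
```

On a real device, the folder passed in would be the tessdata path given above. A code that is missing from the list indicates that the language pack still needs to be copied to that machine.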

Page segmentation mode

The Read Text with OCR action within a Read stage has an optional text parameter Page Segmentation Mode, allowing a Tesseract-defined value to be specified. The values which can be entered in this parameter are shown below, along with a brief description of their action.

If no value is entered for the Page Segmentation Mode, then the default value of Auto will be used.

Parameter            Description

OSD                  Orientation and script detection (OSD) only.
AutoWithOSD          Automatic page segmentation with OSD.
AutoNoOCR            Automatic page segmentation, but no OSD or OCR.
Auto                 Fully automatic page segmentation, but no OSD. (Default)
Column               Assume a single column of text of variable sizes.
VerticalBlock        Assume a single uniform block of vertically aligned text.
Block                Assume a single uniform block of text.
Line                 Treat the image as a single text line.
Word                 Treat the image as a single word.
CircledWord          Treat the image as a single word in a circle.
Character            Treat the image as a single character.
SparseText           Find as much text as possible in no particular order.
SparseTextWithOSD    Sparse text with OSD.
RawLine              Treat the image as a single text line, bypassing workarounds that are Tesseract-specific.

For further information on segmentation modes, consult the official documentation on the Tesseract website.
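For readers cross-referencing the Tesseract documentation, the Python sketch below pairs each parameter value listed above with the numeric page segmentation mode that Tesseract uses on its command line. The pairing is an assumption made by matching the descriptions; how Blue Prism passes the value to the engine internally is not described here, and `resolve_mode` is a hypothetical helper.

```python
# Assumption: parameter names from this page mapped to Tesseract's numeric
# page segmentation modes by matching their descriptions.
PAGE_SEGMENTATION_MODES = {
    "OSD": 0,
    "AutoWithOSD": 1,
    "AutoNoOCR": 2,
    "Auto": 3,
    "Column": 4,
    "VerticalBlock": 5,
    "Block": 6,
    "Line": 7,
    "Word": 8,
    "CircledWord": 9,
    "Character": 10,
    "SparseText": 11,
    "SparseTextWithOSD": 12,
    "RawLine": 13,
}

DEFAULT_MODE = "Auto"  # used when no value is entered


def resolve_mode(value=None):
    """Return (name, number) for a mode, falling back to the default when empty."""
    name = (value or "").strip() or DEFAULT_MODE
    if name not in PAGE_SEGMENTATION_MODES:
        raise ValueError(f"Unknown page segmentation mode: {name!r}")
    return name, PAGE_SEGMENTATION_MODES[name]


print(resolve_mode())        # ('Auto', 3)
print(resolve_mode("Line"))  # ('Line', 7)
```

A region containing a single field label, for instance, would typically suit Line or Word rather than the default, because the engine then does not attempt to detect a page layout.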