Optical character recognition (OCR)

An OCR engine is available within Blue Prism for situations where it is not appropriate to use the native character recognition engine to interact with on-screen text. Common scenarios include applications where smoothed text is enforced, and scanned or otherwise restricted copies of electronic documents.

The Blue Prism capability uses an embedded Tesseract OCR engine to recognise text using pattern matching and complex, language-based text recognition.

For effective recognition, the engine requires images of at least 300 dots per inch (dpi). For images with a lower dpi, such as on-screen text, the Scale parameter artificially increases the size of the captured region before passing it to the engine. A scale factor of 4 or 5 generally gives good results.
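The arithmetic behind the suggested scale factors can be sketched as follows. This is an illustration only: it assumes the Scale parameter acts as a simple linear multiplier on the effective dpi, and the screen densities of 96 and 72 dpi are typical values chosen for the example rather than figures from Blue Prism. The function name min_scale_factor is hypothetical.

```python
import math


def min_scale_factor(source_dpi: float, target_dpi: float = 300.0) -> int:
    """Smallest whole-number scale that lifts source_dpi to at least target_dpi.

    Assumption: scaling multiplies the effective dpi linearly.
    """
    if source_dpi <= 0:
        raise ValueError("source_dpi must be positive")
    # Never scale below 1: an image already at or above the target is left as-is.
    return max(1, math.ceil(target_dpi / source_dpi))


print(min_scale_factor(96))  # 4
print(min_scale_factor(72))  # 5
```

Under these assumptions, a 96 dpi capture needs a factor of 4 and a 72 dpi capture needs a factor of 5, which is consistent with the guidance above.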

The OCR engine is used through a Read stage against a previously captured Application Modeller region, and includes options to read text, lists, and grids. The pre-processed images can also be output to a specified diagnostics location, to verify that the scaling applied is sufficient for the selected region.

Language packs

Language packs for use with Tesseract can be obtained from the internet. Blue Prism works with Tesseract version 4.0.0, and it is imperative that language files of the matching major version are used with it. Currently, the version 4.0.0 language files can be downloaded from the Tesseract website.

To add support for another language, download the appropriate files and copy them to the Tesseract\tessdata folder (usually C:\Program Files\Blue Prism Limited\Blue Prism Automate\Tesseract\tessdata).

The language files are prefixed with a language code, for example fra (French), deu (German), jpn (Japanese), or chi_tra (Traditional Chinese). Once the files are installed on each of the required devices, this code can be specified in the Language parameter of the "Read Text with OCR" action within a Read stage, to instruct the engine to use the required pack.
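Before relying on a language code in a process, it can help to confirm that the matching file is present on the device. The following is a minimal sketch, not part of Blue Prism: it assumes the language files follow the usual Tesseract naming of a language code followed by the .traineddata extension, and the function names are hypothetical.

```python
from pathlib import Path


def installed_language_codes(tessdata_dir):
    """Return the sorted language codes found in a tessdata folder.

    Assumption: each pack is a file named <code>.traineddata.
    """
    return sorted(p.stem for p in Path(tessdata_dir).glob("*.traineddata"))


def missing_languages(tessdata_dir, required):
    """Return the required codes that have no matching file in the folder."""
    installed = set(installed_language_codes(tessdata_dir))
    return [code for code in required if code not in installed]
```

For example, calling missing_languages with the tessdata folder path for the device and the list ["fra", "deu"] returns whichever of those two codes still needs to be installed there.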

Page segmentation mode

The "Read Text with OCR" action within a Read stage has an optional text parameter Page Segmentation Mode, allowing a Tesseract-defined value to be specified. The values which can be entered in this parameter are shown below, along with a brief description of their action.

If no value is entered for the Page Segmentation Mode, then the default value of Auto will be used.

Parameter	Description
OSD	Orientation and script detection (OSD) only.
AutoWithOSD	Automatic page segmentation with OSD.
AutoNoOCR	Automatic page segmentation, but no OSD or OCR.
Auto	Fully automatic page segmentation, but no OSD (default).
Column	Assume a single column of text of variable sizes.
VerticalBlock	Assume a single uniform block of vertically aligned text.
Block	Assume a single uniform block of text.
Line	Treat the image as a single text line.
Word	Treat the image as a single word.
CircledWord	Treat the image as a single word in a circle.
Character	Treat the image as a single character.
SparseText	Find as much text as possible in no particular order.
SparseTextWithOSD	Sparse text with OSD.
RawLine	Treat the image as a single text line, bypassing workarounds that are Tesseract-specific.

For further information on segmentation modes please consult the official documentation provided by Tesseract on their website.
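When cross-referencing the Tesseract documentation, it may help to know which numeric page segmentation mode each value corresponds to. The mapping below is a sketch inferred from the matching descriptions in Tesseract's own list of modes (0 to 13). It is an assumption, not something confirmed by Blue Prism, and the names TESSERACT_PSM and psm_number are hypothetical.

```python
# Blue Prism parameter value -> Tesseract page segmentation mode number.
# Assumption: the pairing is inferred from the matching descriptions.
TESSERACT_PSM = {
    "OSD": 0,
    "AutoWithOSD": 1,
    "AutoNoOCR": 2,
    "Auto": 3,
    "Column": 4,
    "VerticalBlock": 5,
    "Block": 6,
    "Line": 7,
    "Word": 8,
    "CircledWord": 9,
    "Character": 10,
    "SparseText": 11,
    "SparseTextWithOSD": 12,
    "RawLine": 13,
}


def psm_number(mode: str = "Auto") -> int:
    """Look up the Tesseract mode number; Auto is used when no value is given."""
    try:
        return TESSERACT_PSM[mode]
    except KeyError:
        raise ValueError(f"Unknown page segmentation mode: {mode!r}") from None


print(psm_number())        # 3
print(psm_number("Line"))  # 7
```

The lookup is case-sensitive in this sketch; whether Blue Prism itself treats the parameter value as case-sensitive is not stated here.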

© 2024 Blue Prism Limited | Last updated on 05 Feb 2024