Training models overview

Decipher uses training models to learn, and gradually improve, document processing. This topic provides an overview of the different types of training available in Decipher:

Rules-based training (default)

Each document uploaded to Decipher trains the rules-based model and, once verified, feeds into an overall training pool. This model is often accurate enough for structured documents without enabling the additional machine learning functionality. The training is based on the document layout and is not directly connected to the document form definition (DFD) or document type. Training from similar document layouts is combined where Decipher has matched 60% of the text fields. This percentage can be modified using the TemplateMinMatchPercent miscellaneous parameter. For more details, see Miscellaneous parameters.
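The threshold idea can be illustrated with a minimal Python sketch. It is not Decipher's implementation: the function names, the field labels, and the assumption that a match at exactly the threshold counts are all hypothetical. Only the 60% default and the TemplateMinMatchPercent parameter name come from this topic.

```python
# Illustrative sketch only: Decipher's actual layout-matching logic is not shown here.
TEMPLATE_MIN_MATCH_PERCENT = 60  # default from this topic; set via TemplateMinMatchPercent


def match_percent(layout_fields, document_fields):
    """Return the percentage of a known layout's text fields found in a document."""
    layout = set(layout_fields)
    if not layout:
        return 0.0
    return 100.0 * len(layout & set(document_fields)) / len(layout)


def shares_training(layout_fields, document_fields,
                    min_percent=TEMPLATE_MIN_MATCH_PERCENT):
    """Assumption: training is combined when the match is at or above the threshold."""
    return match_percent(layout_fields, document_fields) >= min_percent


# Hypothetical field labels, for illustration only.
known_layout = ["Invoice No", "Date", "Supplier", "Net", "Total"]
new_document = ["Invoice No", "Date", "Total", "PO Number"]
print(match_percent(known_layout, new_document))    # 3 of 5 layout fields matched
print(shares_training(known_layout, new_document))
```

Under these assumptions, raising the threshold makes layouts less likely to share training, and lowering it makes sharing more likely.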

The rules-based training captures data from the following elements, referred to as hints, defined in the DFD:

  • Keywords

  • Data types

  • Lists

  • Regex

  • Formula

  • Location (after training)

With the exception of location data, this information is available in the DFD and is used in combination with the training data (where it exists). Location data is captured after the first document has been trained.

Document classification

Document classification is trained by uploading a group of documents for a document type. This informs how Decipher separates batches of documents when required. Classification is only needed when more than one document type is selected in your batch type. It ensures that the correct document form definition (DFD) is assigned and the requested data is extracted.

For more details, see Training classification models.

Structured machine learning

Enabling this model in a document type adds an extra layer of document-specific learning for structured documents, supplementing the rules-based training to improve results. The scale of the improvement depends on how successful the original rules-based training was.

The default training size is 1,000 documents, but this can be set to any value between 50 and 5,000. The model can be set to train automatically, or periodically after a configured number of documents. For more details, see Machine learning.
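The limits above can be expressed as a simple range check. The following Python sketch is illustrative only: the StructuredMLSettings class and its training_size attribute are hypothetical names, not Decipher settings. Only the default of 1,000 and the 50 to 5,000 range come from this topic, and the range is assumed to be inclusive.

```python
from dataclasses import dataclass

MIN_TRAINING_SIZE = 50      # lower limit stated in this topic
MAX_TRAINING_SIZE = 5000    # upper limit stated in this topic


@dataclass
class StructuredMLSettings:
    """Hypothetical container for the training size; not a Decipher API."""
    training_size: int = 1000  # default stated in this topic

    def __post_init__(self):
        # Assumption: both limits are inclusive.
        if not MIN_TRAINING_SIZE <= self.training_size <= MAX_TRAINING_SIZE:
            raise ValueError(
                f"training_size must be between {MIN_TRAINING_SIZE} "
                f"and {MAX_TRAINING_SIZE}, got {self.training_size}"
            )


print(StructuredMLSettings().training_size)                    # uses the default
print(StructuredMLSettings(training_size=250).training_size)   # custom value in range
```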

Changing the ID of any field causes the training associated with that field to be lost, because the model identifies fields by their IDs. If new fields are added, the model initially has knowledge of the existing fields only, and gradually learns the new ones.
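The effect of keying training on field IDs can be pictured with a small Python sketch. The dictionary, the field IDs, and the sample counts are all hypothetical and only illustrate the lookup behavior described above. They do not reflect how Decipher stores its training.

```python
# Hypothetical store: number of training samples learned per field ID.
trained_samples = {"invoice_number": 120, "total": 118}


def available_training(field_ids, store):
    """Look up training by field ID; IDs the store has never seen start at zero."""
    return {field_id: store.get(field_id, 0) for field_id in field_ids}


# "total" has been renamed to "grand_total" and a new "currency" field added.
current_fields = ["invoice_number", "grand_total", "currency"]
print(available_training(current_fields, trained_samples))
```

In this sketch the renamed field and the new field both start with no training, while the unchanged field keeps what it has learned.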