Machine learning

Decipher uses two methods of learning:

  • Rule-based learning system – This is enabled by default and can be used straight away when processing documents in Decipher IDP. The rules are updated as you verify a document, by selecting fields from the pre-defined document form definition and associating them with regions in the document being processed. The system learns specific words and the positioning of data in a document, as applied to the document form definition.
  • Machine learning – This needs to be enabled in the Edit Document Type dialog (it is not visible in the Add Document Type dialog) and allows Decipher IDP to classify documents and extract data. Once the specified number of documents has been verified, the model is ready for use, but it continues to learn as further documents are verified.

Machine learning offers a higher level of accuracy than the rule-based learning system. When processing a document, Decipher IDP applies the machine learning model first, before referencing the rule-based system where required.
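
The order described above can be pictured with a minimal Python sketch. It is an illustration only and not Decipher IDP code: the function names and the sample values are hypothetical, and it assumes that "where required" means the model returned no value for a field.

```python
from typing import Callable, Optional

# Hypothetical extractor signature: returns a field value, or None when
# the extractor has no answer for that field.
Extractor = Callable[[str, str], Optional[str]]


def extract_field(document: str, field_id: str,
                  ml_extract: Extractor,
                  rule_extract: Extractor) -> Optional[str]:
    """Try the machine learning model first, then the rule-based system."""
    value = ml_extract(document, field_id)
    if value is not None:
        return value
    # Assumption: the model had no answer, so the rules are referenced.
    return rule_extract(document, field_id)


# Illustrative stand-ins for the two learning methods.
def demo_ml(document: str, field_id: str) -> Optional[str]:
    return {"invoice_number": "INV-001"}.get(field_id)


def demo_rules(document: str, field_id: str) -> Optional[str]:
    return {"invoice_number": "INV-OLD", "total": "120.00"}.get(field_id)
```

In this sketch, a field that both methods know is answered by the model, and a field that only the rules know falls through to the rule-based result.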

Machine learning models are applied to document types, but they are also applied to the document form definition (DFD) associated with the document type. This means that any document types that use the same DFD also use the same machine learning model. Best practice is for each document type to have its own DFD and, consequently, its own machine learning model. This prevents issues where changes to the DFD reset the machine learning model, or where the machine learning model isn't always relevant to the document type.

A model is assigned through the document type in the UI, but internally it is assigned to a DFD. Any document types that share the same DFD therefore also share the machine learning model, which is not obvious to the user.

Solution: Create a separate DFD for each document type that needs a different machine learning model.

In almost all cases, one DFD is used by one document type, and setting the model in the document type settings is then more intuitive for users. However, in the rare cases where a DFD is shared between multiple document types and the user needs different machine learning models, this can be confusing.
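
The sharing behaviour can be illustrated with a short Python sketch. This is a simplified picture under stated assumptions, not the product's internal data model: the class names, the registry, and the sample document types are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class DocumentType:
    name: str
    dfd_name: str  # the DFD this document type uses


@dataclass
class ModelRegistry:
    # Hypothetical structure: models are stored per DFD, not per document type.
    models_by_dfd: Dict[str, str] = field(default_factory=dict)

    def assign_model(self, doc_type: DocumentType, model: str) -> None:
        # Assigned through the document type, but stored against its DFD.
        self.models_by_dfd[doc_type.dfd_name] = model

    def model_for(self, doc_type: DocumentType) -> Optional[str]:
        return self.models_by_dfd.get(doc_type.dfd_name)


registry = ModelRegistry()
invoices = DocumentType("Invoices", dfd_name="Shared DFD")
credit_notes = DocumentType("Credit notes", dfd_name="Shared DFD")
receipts = DocumentType("Receipts", dfd_name="Receipts DFD")

# Only the Invoices document type is given a model explicitly.
registry.assign_model(invoices, "invoice-model")
```

Because the two document types point at the same DFD, looking up the model for "Credit notes" returns the model that was assigned through "Invoices", while "Receipts", with its own DFD, is unaffected.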

Train machine learning models

1000 documents need to be verified to train a machine learning model. After the initial training, the model is retrained by default after a further 1000 documents have been verified. This quantity can be changed in the Edit Document Type dialog to any value between 50 and 5000. The training from the last 5000 verifications is retained in the machine learning model.
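
The numbers above can be worked through with a small Python sketch. It is not how Decipher IDP implements training: the class and attribute names are hypothetical, and it assumes that the same configurable quantity governs both the initial training and each retraining.

```python
from collections import deque

MIN_INTERVAL, MAX_INTERVAL = 50, 5000  # allowed range for the quantity
RETAINED_VERIFICATIONS = 5000          # size of the retained window


class TrainingSchedule:
    """Counts verified documents and reports when training would run."""

    def __init__(self, interval: int = 1000) -> None:
        if not MIN_INTERVAL <= interval <= MAX_INTERVAL:
            raise ValueError("interval must be between 50 and 5000")
        self.interval = interval
        self.since_last_training = 0
        self.training_runs = 0
        # A bounded deque drops the oldest entry once it holds 5000 items.
        self.retained = deque(maxlen=RETAINED_VERIFICATIONS)

    def record_verification(self, document_id: str) -> bool:
        """Record one verified document; return True if training is due."""
        self.retained.append(document_id)
        self.since_last_training += 1
        if self.since_last_training >= self.interval:
            self.since_last_training = 0
            self.training_runs += 1
            return True
        return False
```

With the default quantity, verifying 6000 documents in this sketch triggers training six times, and only the most recent 5000 verifications remain in the retained window.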

Changing the IDs of any fields in the document form definition causes the training associated with them to be lost, because the model identifies fields by ID. If new fields are added, the model initially knows only the previously existing fields, but it gradually learns the new ones.
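
The effect of changing or adding field IDs can be pictured with a minimal Python sketch. The function name, the field IDs, and the use of a per-field count of verified examples are hypothetical and serve only to show why training keyed by ID does not follow a renamed field.

```python
from typing import Dict, List


def carry_over_training(trained: Dict[str, int],
                        current_field_ids: List[str]) -> Dict[str, int]:
    """Return training per current field ID; unknown IDs start from zero."""
    return {field_id: trained.get(field_id, 0)
            for field_id in current_field_ids}


# Training accumulated so far, keyed by field ID (illustrative values).
trained = {"invoice_number": 1000, "total": 1000}

# The ID "total" is changed to "grand_total" and "due_date" is added.
after_change = carry_over_training(
    trained, ["invoice_number", "grand_total", "due_date"]
)
```

Here "invoice_number" keeps its training, while "grand_total" starts from zero even though it is the same field under a new ID, and the new "due_date" field also starts from zero until further documents are verified.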

Use existing machine learning models

You can load a pre-trained machine learning model from the Batch Type. The trained model must be associated with the document form definition that was used when it was trained. You can export and import machine learning models as MLD files; see Edit Document Type for details.