Document types

A document type is a category of document, such as an invoice, purchase order, or loan application. The Document Types page enables you to create new document types and associate each one with a document form definition (DFD). Multiple document types can then be associated with a batch type, enabling you to process more than one document type in the same batch.

You can activate machine learning by editing an existing document type and associating it with a machine learning model.

Machine learning models are applied to document types. However, they are also applied to the DFD associated with the document type, which means that any document types that use the same DFD will also use the same machine learning model. Best practice recommends that each document type has its own DFD and consequently its own machine learning model. This prevents issues where changes to the DFD reset the machine learning model, or where the machine learning isn’t always relevant to the document type.

To manage document types, click Admin Panel > Document Types.

Three options are available to manage document types:

  1. Create – click Add document type and enter details for the document type.
  2. Edit – click the edit button to update a document type and activate machine learning.
  3. Delete – click the delete button to remove the document type from the database.

Document type details

Document types use the fields listed below.

The machine learning options are only visible in the Edit Document Type dialog and not the Add Document Type dialog.

Type name

The name of your document type. For example, Invoice.

Document form definition

Select a Document Form Definition from the drop-down.

Type description

Enter an overview of the document type.

Classification confidence threshold

The confidence threshold for bypassing Class Verify. If bypassing Class Verify has been enabled and all the documents in a batch have been classified with high confidence, the batch will go directly to the next step in the workflow, without an operator having to manually verify the classification.
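The bypass rule described above can be sketched as a simple check. This is only an illustration, not Decipher IDP code: the function name, the use of a greater-than-or-equal comparison, and the handling of an empty batch are all assumptions, and the confidence values are hypothetical numbers on the same scale as the threshold.

```python
def can_bypass_class_verify(confidences, threshold, bypass_enabled=True):
    """Return True if a batch could skip Class Verify.

    confidences: one classification confidence per document in the batch.
    threshold:   the document type's classification confidence threshold.
    Assumption: a document counts as "high confidence" when its
    confidence is at or above the threshold.
    """
    if not bypass_enabled or not confidences:
        # Bypass is switched off, or there is nothing to evaluate
        # (treating an empty batch as "no bypass" is an assumption).
        return False
    # Every document in the batch must meet the threshold.
    return all(c >= threshold for c in confidences)
```

In this sketch, a single low-confidence document is enough to send the whole batch to Class Verify, which matches the requirement that all documents in the batch are classified with high confidence.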

Machine Learning

The default setting is Off. When machine learning is switched on, the machine learning options display.

Regardless of this setting, machine learning training is disabled by default in the SsiDataCaptureClient.exe.config file. See Enable machine learning training for details.

ML Model

Select the machine learning model that you want to associate with the selected document form definition, or click the Create new model link below the field to create a new machine learning model.

You can also use this dialog to upload an existing machine learning model (an MLD file). However, the model must relate directly to the selected document form definition; otherwise, the learning in the model will automatically reset.

After you have created a new model, you will need to select it from the ML Model drop-down.

Only one machine learning model can be associated with a DFD, but a single DFD can be associated with multiple document types. A warning message will therefore display if you attempt to apply a machine learning model that is different from the model already applied to the DFD through a different document type. If you accept the warning message and continue to update the document type, the machine learning model will be automatically updated in the other document type(s) that are associated with the same DFD.
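The relationship between document types, DFDs, and machine learning models can be pictured with a small model. This is a hypothetical sketch for illustration only: the DocumentType class, the apply_model function, and the example names are assumptions and do not reflect how Decipher IDP stores this data.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DocumentType:
    name: str
    dfd: str                        # name of the associated DFD
    ml_model: Optional[str] = None  # name of the associated model, if any


def apply_model(doc_types, target_name, new_model):
    """Apply new_model to one document type and to every type sharing its DFD.

    Returns the names of the other document types whose model changed,
    that is, the ones the warning message would concern.
    """
    target = next(dt for dt in doc_types if dt.name == target_name)
    affected = []
    for dt in doc_types:
        if dt.dfd != target.dfd:
            continue  # a different DFD keeps its own model
        if dt is not target and dt.ml_model != new_model:
            affected.append(dt.name)
        dt.ml_model = new_model
    return affected


doc_types = [
    DocumentType("Invoice", dfd="FinanceDFD", ml_model="model_a"),
    DocumentType("Credit note", dfd="FinanceDFD", ml_model="model_a"),
    DocumentType("Loan application", dfd="LoanDFD", ml_model="model_b"),
]
changed = apply_model(doc_types, "Invoice", "model_c")
```

Here, updating Invoice also changes Credit note because both share FinanceDFD, while Loan application is untouched. Giving each document type its own DFD keeps the affected list empty.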

Capture Mode

The default setting for normal Decipher IDP OCR functionality is Structured. Select Unstructured if you are using the NLP plugin for completely unstructured documents.

If you change a structured machine learning model to unstructured and update the document type, any existing learning will be removed.

Training Size

This is the number of documents to use when training a machine learning model. The default is 1,000. However, if you are training an NLP model, fewer documents will be needed to produce good results.

If you mark the model for immediate training and fewer than the specified number of documents have been processed, documents that have previously been used for training will be added to the new documents to reach the specified figure.
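The top-up behavior can be sketched as follows. This is an illustration under stated assumptions: the function name is hypothetical, and the source does not say which previously used documents are chosen or in what order, so this sketch simply takes them from the front of the list.

```python
def select_training_documents(new_docs, previously_used, training_size=1000):
    """Build a training set of up to training_size documents.

    New documents are used first. If there are not enough of them,
    previously used documents fill the shortfall (the selection order
    of those older documents is an assumption).
    """
    selected = list(new_docs[:training_size])
    shortfall = training_size - len(selected)
    if shortfall > 0:
        selected.extend(previously_used[:shortfall])
    return selected
```

For example, with a Training Size of 5, three new documents, and ten previously used documents, the sketch returns the three new documents followed by two of the older ones.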

Periodic training

Periodic training of the machine learning model is disabled by default. When you enable it, the model is retrained after a specified number of documents have been processed. The predefined setting is to retrain the model after every 1,000 new documents have been processed.
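As a rough sketch of the periodic training rule, the check below decides whether a retrain is due. The function and parameter names are hypothetical, and treating the interval as reached when the count is equal to or greater than the setting is an assumption.

```python
def periodic_training_due(new_documents, periodic_enabled=False, interval=1000):
    """Return True when periodic training should run.

    new_documents:    documents processed since the model was last trained.
    periodic_enabled: periodic training is off by default.
    interval:         retrain after this many new documents (1,000 by default).
    """
    return periodic_enabled and new_documents >= interval
```

With the defaults, the function never reports that training is due, which mirrors periodic training being disabled until you switch it on.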

Train now

The Last training field displays the date the selected machine learning model was last trained. The New documents available field displays the number of documents that have been processed using this document type since the machine learning model was last trained.

Select Mark for training if you want to train the machine learning model now. This training will be processed in the background. You will need to clear this check box after the training has been carried out to avoid continuous retraining of the model.

Invisible document type

This option is not currently used by Decipher IDP.

Attachment document type

Select this option to mark this document type as an attachment.

An attachment is one or more pages that are part of the document but from which no data needs to be extracted. You can mark a page as an attachment, and Decipher IDP can also automatically detect attachments.

After the software detects this document type, during classification, it automatically attaches the document to the preceding document.
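The attaching step can be pictured with a short sketch. This is not Decipher IDP code: the function name and the data shapes are hypothetical, and keeping an attachment as its own document when nothing precedes it is an assumption made so the example is complete.

```python
def attach_to_preceding(classified):
    """Group attachment documents with the document that precedes them.

    classified: list of (doc_id, is_attachment) pairs in batch order.
    Returns a list of dicts, each with an "id" and its "attachments".
    """
    result = []
    for doc_id, is_attachment in classified:
        if is_attachment and result:
            # Attach to the most recent document in the batch.
            result[-1]["attachments"].append(doc_id)
        else:
            # A normal document, or an attachment with no preceding
            # document (assumed here to stay as its own entry).
            result.append({"id": doc_id, "attachments": []})
    return result


batch = [("inv-1", False), ("terms-1", True), ("inv-2", False)]
grouped = attach_to_preceding(batch)
```

In this example, terms-1 is classified as an attachment document type, so it ends up attached to inv-1, and inv-2 has no attachments.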

Watch a video about attachment documents.

Primary recognition language

The primary language to use when processing documents using this document type. If configured, this setting will override the equivalent batch type setting.

Primary locale

The primary locale is used to validate dates and parse addresses during processing using this document type. If configured, this setting will override the equivalent batch type setting.

Secondary recognition language

An additional language to be used for processing this document type. If configured, this setting will override the equivalent batch type setting.

Secondary locale

An additional locale for this document type. If configured, this setting will override the equivalent batch type setting.
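The override rule shared by the four language and locale settings can be sketched as a simple fallback. The setting keys, function name, and example values below are hypothetical and chosen only for illustration; the sketch assumes an unconfigured setting is represented as missing or empty.

```python
# Hypothetical keys for the four settings described above.
SETTINGS = (
    "primary_language",
    "primary_locale",
    "secondary_language",
    "secondary_locale",
)


def effective_settings(document_type, batch_type):
    """Resolve each setting, preferring the document type's value.

    A value configured on the document type overrides the equivalent
    batch type value; otherwise the batch type value is used.
    """
    return {
        key: document_type.get(key) or batch_type.get(key)
        for key in SETTINGS
    }


batch_type = {"primary_language": "English", "primary_locale": "en-GB"}
document_type = {"primary_language": "German"}
resolved = effective_settings(document_type, batch_type)
```

Here the document type overrides only the primary recognition language, so the primary locale still comes from the batch type and the secondary settings remain unset.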

Exception reasons

Add exception reasons and descriptions for use during data verification.

Videos

Attachment documents

This video describes how to use attachment documents and how they are handled by Decipher IDP.