Data enrichments

Data enrichments enhance data before it is exported. Avoid enrichments that rest on unsupported assumptions about what will be written on the form, and avoid transformations that create unexplained differences between the final digitized value and the original paper copy, since these can lead to misunderstandings later on. It is best not to destroy the original data too early.

Types of data enrichments

What is a transformation?

A transformation is a formatter that can be applied to a digitized value. It modifies the final output returned to the user by cleaning up or parsing the data.

What is a validation?

Validations do not alter the digitized value; rather, they flag any fields that do not pass the validation in the REVIEW Portal. Validations are therefore only relevant if the REVIEW Portal is used.

Default settings and additional enrichments

  • READ does not take case into account, so it always returns results in lowercase letters.
  • Punctuation and special characters are common characters that are currently challenging to digitize. Examples include periods (.), commas (,), slashes (/), and dashes (-).
  • The original output string is returned if the transform produces a non-data value (such as "--blank--").
  • Fields with less than 85% confidence will, by default, be flagged for review if REVIEW is enabled for the workflow.

The following enrichments can be applied alongside the transformations and validations:

  • Capitalization (text fields only): Lower Case (default), Title Case, Upper Case.
  • Confidence Thresholds (REVIEW workflows only): The confidence threshold can be adjusted at the account level to fit the needs of the workflow. It can also be customized at the field level for select fields critical to the workflow.
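
The threshold behavior described above can be sketched in a few lines of Python. This is only an illustration, not the platform's implementation: the function names are hypothetical, and the assumption that a field-level threshold takes precedence over an account-level one is ours. Only the 85% default comes from this article.

```python
from typing import Optional

DEFAULT_THRESHOLD = 0.85  # default confidence threshold stated in this article


def effective_threshold(account_threshold: Optional[float] = None,
                        field_threshold: Optional[float] = None) -> float:
    """Pick the most specific threshold available.

    Assumption: field level overrides account level, which overrides the default.
    """
    if field_threshold is not None:
        return field_threshold
    if account_threshold is not None:
        return account_threshold
    return DEFAULT_THRESHOLD


def needs_review(confidence: float, threshold: float = DEFAULT_THRESHOLD) -> bool:
    """Return True when a field's confidence falls below the threshold."""
    return confidence < threshold
```

For example, `needs_review(0.80)` flags the field under the default threshold, while `needs_review(0.80, effective_threshold(field_threshold=0.75))` does not.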

CSV column descriptions

Transformations are applied by field id, field name, and template. Editing these columns will not update the fields or templates on the platform. It will only disassociate the field from the template, and any changes to the other data enrichment columns will not be applied to the field.

Column name

Description

schema_field_id

This is a system-generated field id.

field_name

This is the name of the field when created in the Field Library.

field_type

This is the data type. You will find under this column:

  1. Text field includes all text fields, signature fields, and presence-of-text fields
  2. Radio Buttons are all "Select One" fields
  3. Checkboxes are all "Select Many" fields

field_constraints

These are the multiple choice selection values assigned when the field was created.

template

This is the template name the field appears on.

required

False is the default value in this column. Setting it to True will flag the field in REVIEW if "--blank--" is returned.

threshold

If left blank, the default confidence threshold of 85% applies. To use a different threshold, fill in the cell with a decimal value in the form 0.## (for example, 0.90).

Confidence thresholds should only be changed after the field has been monitored for at least a month. If the confidence threshold is decreased, fewer fields are expected to be flagged in REVIEW. If the confidence threshold is increased, more fields will be flagged in REVIEW.

case

Fill in with Upper, Lower, or Title to change the casing of the returned output.

transform_type

Fill in with the data enrichment type: transformation, validation or parser.

transform_task

Fill in with the data enrichment task. See the list of transformation tasks below.

transform_params

Fill in with regex customizations in JSON format.

List of transformation tasks

Transformation

Task Description

Alpha only

Remove non-alphabetic characters and spaces from the original output.

Alphanumeric only

Remove any non-alphanumeric characters. If the string is one of the following: "--impossible--", "--inactive--", "--missing--", then it returns that value. But if the string is empty or "--blank--", then it returns "--blank--".
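
As a rough illustration of the rules above, the Alphanumeric only task could be sketched as follows. The function name is hypothetical, and the sketch assumes that "alphanumeric" means ASCII letters and digits and that spaces are removed along with other characters. What happens when stripping leaves an empty string is not stated in this article, so the sketch simply returns the stripped result.

```python
import re

# Non-data values that are passed through unchanged, per the description above.
NON_DATA_VALUES = {"--impossible--", "--inactive--", "--missing--"}


def alphanumeric_only(value: str) -> str:
    """Strip everything except ASCII letters and digits (assumed character set)."""
    if value in NON_DATA_VALUES:
        return value
    if value == "" or value == "--blank--":
        return "--blank--"
    # Spaces are treated as non-alphanumeric here, which is an assumption.
    return re.sub(r"[^A-Za-z0-9]", "", value)
```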

Numbers only

The Numbers Only transform will remove all non-numeric characters.

Phone number format

Reduce the original output to a 10-digit value. The only exception is an 11-digit value whose first digit is 1 (the US country code), in which case the leading 1 is dropped and the final 10 digits are kept. The transform does not validate whether the result is a real phone number, only that the original output reduces to exactly 10 digits. If the original output does not result in a 10-digit value, "--blank--" will be returned by default.
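
The phone number rule can be read as a short procedure: strip non-digits, drop a leading US country code, then check the length. A minimal sketch, with a hypothetical function name and no claim to match the platform's actual code:

```python
import re


def format_phone(value: str) -> str:
    """Reduce a raw string to a 10-digit phone number, or return "--blank--"."""
    digits = re.sub(r"[^0-9]", "", value)
    # An 11-digit value starting with 1 is treated as carrying the US country code.
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]
    if len(digits) == 10:
        return digits
    return "--blank--"
```

For example, both `format_phone("(555) 123-4567")` and `format_phone("1-555-123-4567")` reduce to the same 10 digits, while a 7-digit input falls through to "--blank--".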

SSN transform

The SSN transform will reduce the original output to a 9-digit value. If a valid 9-digit SSN cannot be determined, "--blank--" will be returned by default.

Date formatting

The Date Formatting transform will find a valid date, modify the format, and return a MM/DD/YYYY value. If a valid date cannot be determined, "--impossible--" will be returned by default.

Advanced amounts

The Advanced Amounts transform will convert the number value that appears in the original output into a consistent decimal value. This transform can accommodate negative values. The precision parameter sets the number of decimal places in the result. It defaults to 2, the usual value for monetary amounts.

Spaces or other punctuation prior to the final two digits will always be interpreted as indicating a break between dollars and cents. A dash, or dashes, after the final digit is interpreted as 00 cents.

Transform percentage

The Percentage transform is meant to handle fractions that may be found in percentage fields. It will remove any non-numeric characters such as the % sign and ensure that there is a space between the whole number and the fraction (for example, 331/3% will be broken into 33 1/3).

Zip formatter

The Zip Formatter transform will truncate the number values in the original output into a zip format. The 5-digit zip code (xxxxx) will be returned by default. If the original output does not result in a 5- or 9-digit value, "--impossible--" will be returned by default.
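
A minimal sketch of the default zip behavior, assuming that a 9-digit value is truncated to its first 5 digits. The function name is hypothetical and the sketch does not cover any non-default output format:

```python
import re


def format_zip(value: str) -> str:
    """Return the 5-digit zip from a 5- or 9-digit value, else "--impossible--"."""
    digits = re.sub(r"[^0-9]", "", value)
    if len(digits) in (5, 9):
        # Default output is the 5-digit form, so ZIP+4 input is truncated.
        return digits[:5]
    return "--impossible--"
```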

US state formatter

The US State Formatter transform will convert the medium text generated US state fields into the appropriate two letter abbreviation. Any non-data value will be returned as itself (blanks, impossibles, inactive, missing, etc.). If there is data present and it cannot be transformed into a valid two letter US state abbreviation, "--impossible--" will be returned by default.

Find and replace

The Find and Replace transform will identify up to two text values in the original output and replace each with its respective replacement value.

The order of the new values matters since the transform will go from left to right and replace all instances of the "find" value with the "replacement" value. If the transform results in an empty string, then "--blank--" will be returned.
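
To show why the order matters, here is a small sketch in which the find/replace pairs are applied one after the other. The function name and the list-of-pairs input are hypothetical, since the actual parameter format is not described here:

```python
from typing import Sequence, Tuple


def find_and_replace(value: str, pairs: Sequence[Tuple[str, str]]) -> str:
    """Apply up to two (find, replacement) pairs in the order given."""
    if len(pairs) > 2:
        raise ValueError("at most two find/replace pairs are supported")
    for find, replacement in pairs:
        # Each pair replaces every instance before the next pair runs.
        value = value.replace(find, replacement)
    return value if value else "--blank--"
```

With the input "a b", the pairs `[("a", "b"), ("b", "c")]` give "c c", while the reversed order `[("b", "c"), ("a", "b")]` gives "b c", because the second pair sees the output of the first.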

List of parsers

Parser task

Description

Basic name parser

The basic name parser will break down the original output of a full name field into its respective full name component fields. The full name field can be broken into the following component fields: "_title", "_first", "_middle", "_middle_initial", "_last", "_suffix".

The expected format of the full name output is "First Middle Last Suffix" OR "Last [Suffix], First Middle" ordering. If the "Last [Suffix], First Middle" format is expected to appear in the original output without the comma, please specify in the notes column so the parser can accommodate the format.

Basic address parser

The basic address parser will break down the original output of a full address field into its respective full address component fields. The full address field can be broken into the following component fields: "_street", "_unit", "_city", "_state", "_zipcode".

List of validations

Validation task

Description

Validate no more than one

The Validate no more than one validation will flag the field if more than one multiple choice value is returned in the original output. It should be applied on Select Many fields where Select One functionality is needed.

Validate email

The Validate email validation will flag the field if the email in the original output contains any unacceptable characters or spaces.

Validate character count

The Validate character count validation will flag the final value if it fails the character count comparison.

If you set exact, it will validate that the provided string is exactly that many characters long. It will ignore the other parameters if exact is provided.

If you provide both min and max, it will validate that the provided string is between min and max characters long (inclusive).

If you provide solely min, it will validate the provided string is at least min characters long.

If you provide solely max, it will validate the provided string is no more than max characters long.
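
The four rules above can be summarized in one small function. This is a sketch only: the function name is hypothetical, and the parameters are written as `min_len` and `max_len` to avoid shadowing Python built-ins, whereas the article calls them min and max. Passing when no parameter is supplied is an assumption.

```python
from typing import Optional


def passes_character_count(value: str,
                           exact: Optional[int] = None,
                           min_len: Optional[int] = None,
                           max_len: Optional[int] = None) -> bool:
    """Return True when the value passes the character count comparison."""
    length = len(value)
    # exact takes priority; min and max are ignored when it is provided.
    if exact is not None:
        return length == exact
    if min_len is not None and length < min_len:
        return False
    if max_len is not None and length > max_len:
        return False
    return True
```

A field would be flagged when this returns False, for example `passes_character_count("abcd", max_len=3)`.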

If you have additional data formatting requirements, please contact an SS&C Chorus team member.