Infoworks 6.1.3

Fuzzy Matching

Transformation designers can use the pre-built Fuzzy Matching transformation component to match and score similar data despite spelling variations, phonetic differences, and other data quality issues.

NOTE The Fuzzy Match node is not supported for pipelines in Snowflake environments.

To apply the Fuzzy Match node in a pipeline:

  1. Double-click the Fuzzy Match node. The properties page is displayed.
  2. Select the Input Port.
  3. Click Add Match Property, enter the required details and click Save.

Properties

Field: Description

Type: A Fuzzy Match node can have multiple match properties, and each match property can have multiple column mappings. The options are Exact and Fuzzy. A match property of type Exact performs an inner join on the column mappings provided.

Match Function: The fuzzy match function. The options are Soundex and Levenshtein.

  Levenshtein returns a match score for the input and lookup tables. With the Levenshtein algorithm, the lower the match score, the better the match.

  The Soundex result dataset contains only those values of the input and lookup columns for which the Soundex algorithm returns true. Soundex accepts input port and lookup port columns of type String only.

Score Column Name: Displayed only if the selected match function is Levenshtein.

Threshold - High: Displayed only if the selected match function is Levenshtein. The upper limit for the match score. Rows with a score higher than this value are filtered out, and only rows with a score less than or equal to the threshold are displayed.

Input Column: The input column to compare against the lookup column.

Lookup Column: The lookup table column to compare with the given input column.
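
The Levenshtein scoring and Threshold - High filtering described above can be illustrated with a small Python sketch. This is not the product's implementation. It assumes the match score is the plain edit distance and that every input value is compared with every lookup value. The names levenshtein, fuzzy_match, and threshold_high are hypothetical and chosen for illustration only.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance: minimum number of single-character insertions,
    deletions, or substitutions needed to turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(
                previous[j] + 1,         # delete ca
                current[j - 1] + 1,      # insert cb
                previous[j - 1] + cost,  # substitute ca with cb
            ))
        previous = current
    return previous[-1]


def fuzzy_match(input_values, lookup_values, threshold_high):
    """Assumed behaviour: keep only pairs whose score is less than
    or equal to threshold_high (lower score means a closer match)."""
    matches = []
    for left in input_values:
        for right in lookup_values:
            score = levenshtein(left, right)
            if score <= threshold_high:
                matches.append((left, right, score))
    return matches


matches = fuzzy_match(
    ["Jon Smith", "Acme Corp"],
    ["John Smith", "Acme Corp.", "Globex"],
    threshold_high=2,
)
for left, right, score in matches:
    print(left, "|", right, "|", score)
```

In this sketch, the third element of each tuple plays the role of the score column, and raising threshold_high admits looser matches.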

NOTE The Soundex algorithm accepts columns of string type only.
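
As a rough illustration of phonetic matching, the Python sketch below implements the standard American Soundex code. It assumes that "returns true" means the input and lookup values share the same Soundex code. The product's internal implementation may differ, and the names soundex and soundex_match are hypothetical.

```python
# Letter-to-digit table for American Soundex; vowels, H, W, and Y have no code.
_CODES = {}
for _letters, _digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                         ("L", "4"), ("MN", "5"), ("R", "6")):
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(value: str) -> str:
    """Return the four-character Soundex code, or "" if value has no letters."""
    letters = [ch for ch in value.upper() if "A" <= ch <= "Z"]
    if not letters:
        return ""
    encoded = letters[0]
    previous = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        if ch in "HW":
            continue  # H and W do not separate letters with the same code
        digit = _CODES.get(ch, "")
        if digit and digit != previous:
            encoded += digit
        previous = digit  # a vowel resets this, so a repeated code counts again
    return (encoded + "000")[:4]


def soundex_match(input_values, lookup_values):
    """Assumed behaviour: keep only pairs whose Soundex codes are equal."""
    pairs = []
    for left in input_values:
        for right in lookup_values:
            if not isinstance(left, str) or not isinstance(right, str):
                raise TypeError("Soundex accepts string values only")
            code = soundex(left)
            if code and code == soundex(right):
                pairs.append((left, right))
    return pairs


pairs = soundex_match(["Robert", "Smith"], ["Rupert", "Smyth", "Jones"])
for left, right in pairs:
    print(left, "|", right)
```

Unlike the Levenshtein case, this sketch produces no score: a pair either matches or is dropped, which is consistent with the score and threshold fields being shown only for Levenshtein.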

  Last updated by Prerana Dutta