Transformation designers can use the pre-built Fuzzy Matching transformation component to match and score similar data despite spelling, phonetic, and other data quality issues.
Configure the following fields to apply a Fuzzy Match node in a pipeline:
Field | Description |
---|---|
Type | The type of the match property: Exact or Fuzzy. A fuzzy node can have multiple match properties, and each match property can have multiple column mappings. A match property of type Exact performs an inner join on the column mappings provided. |
Match Function | The fuzzy match function: Soundex or Levenshtein. Levenshtein returns a match score for the input and lookup values; the lower the score, the better the match. With Soundex, the result dataset contains only those input and lookup column values for which the Soundex algorithm returns true. Soundex accepts input port and lookup port columns of type String only. |
Score Column Name | The name of the column that holds the match score. This field is displayed only if the match function selected is Levenshtein. |
Threshold - High | The upper limit on the match score. Rows with a match score higher than this value are filtered out; only rows with a score less than or equal to the threshold are displayed. This field is displayed only if the match function selected is Levenshtein. |
Input Column | The input column to compare against the lookup column. |
Lookup Column | The lookup table column to compare with the given input column. |
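
The two match functions can be illustrated with a small, self-contained Python sketch. This is not the component's implementation: the function names (`levenshtein`, `soundex`, `levenshtein_matches`, `soundex_matches`) are hypothetical, the Soundex variant shown is the common American Soundex, and the product may differ in details such as case handling or how it pairs rows. The sketch only shows why a lower Levenshtein score means a closer match, how a high threshold filters pairs, and how Soundex keeps only pairs whose codes agree.

```python
# Illustrative sketch only; names and pairing logic are assumptions,
# not the Fuzzy Match node's actual API.

def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character inserts, deletes, or substitutions."""
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # delete from a
                               current[j - 1] + 1,       # insert into a
                               previous[j - 1] + cost))  # substitute
        previous = current
    return previous[-1]


# Letter-to-digit table for American Soundex; vowels, H, W, and Y have no digit.
_GROUPS = {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT", "4": "L", "5": "MN", "6": "R"}
_CODES = {letter: digit for digit, letters in _GROUPS.items() for letter in letters}


def soundex(name: str) -> str:
    """Four-character phonetic code: first letter plus up to three digits."""
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    code = letters[0]
    prev = _CODES.get(letters[0], "")
    for c in letters[1:]:
        digit = _CODES.get(c, "")
        if digit and digit != prev:
            code += digit
        if c not in "HW":  # H and W do not separate letters with the same digit
            prev = digit
    return (code + "000")[:4]


def levenshtein_matches(inputs, lookups, threshold_high):
    """Keep (input, lookup, score) pairs whose score is <= threshold_high."""
    return [(i, l, levenshtein(i, l))
            for i in inputs for l in lookups
            if levenshtein(i, l) <= threshold_high]


def soundex_matches(inputs, lookups):
    """Keep (input, lookup) pairs whose Soundex codes are equal."""
    return [(i, l) for i in inputs for l in lookups
            if soundex(i) == soundex(l)]


input_names = ["Jon Smith", "Robert"]
lookup_names = ["John Smith", "Rupert", "Alice"]
scored = levenshtein_matches(input_names, lookup_names, threshold_high=2)
phonetic = soundex_matches(input_names, lookup_names)
```

With these sample values, `scored` contains `("Jon Smith", "John Smith", 1)` and `("Robert", "Rupert", 2)`: both pairs are within the threshold of 2, and the pair with score 1 is the closer match. `phonetic` contains the same two pairs without a score, because Soundex only reports whether the codes agree.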