Performing In NotIn Operation


	The In NotIn node has two inputs: inner table and outer table. The outer table is the table to be filtered. Columns of only this node will be propagated downstream. The inner table is the filtering table.

A column of the outer table (outer column), will be matched against a column of the inner table (inner column) to filter records of the outer table.

Selecting the IN filter type returns only those records from the outer table where the outer column value exists in the list of values in inner column. Selecting Not IN filter type returns only those records from the outer table where the outer column value does not exist in the list of values in inner column.

Following are the steps to apply In NotIn node in pipeline:

Double-click the In NotIn node. The properties page is displayed.
Click Add Filter, enter the required details and click Save.

To allow multiple columns and data modifying expressions for inner and outer ports, select the Advanced mode, enter the expression and click Save.

NOTE This feature is supported on Spark, Snowflake and BigQuery execution engines. SQL import of the query is also supported.

NOTES

The NOT-IN transformation produces an empty result, if the inner-column is nullable. This is because NOT IN is equivalent to ALL. If the inner column has a null, its comparison with any outer column value always produces null. Work-around: To obtain the required result, remove NULL values from inner-column by inserting a filter node (IS NOT NULL) between inner-table node and IN-NOTIN node.
Even after renaming a column in the Schema section, the original column name will be displayed in the Properties. However, the renamed column will be propagated in the downstream nodes.

Last updated on Sep 26, 2022

Was this page helpful?