Infoworks 5.5.0
Onboard Data

Synchronizing Data to External Target

You can configure the Target connections and sync data as described in the following sections.

Configure a Target -> Sync Data to Target

Configure a Target

The following are the steps to configure a Target.

  1. From the Data Sources menu, select one of the tables and click the View Source/Ingest button.
  2. Select the source table to be synchronized to the Target.
  3. Click the configuration link, and then click the Sync to External Target tab.
  4. Configure the parameters by selecting one of the following options from the Sync Type drop-down menu.
  • Disabled: Default value.
  • Overwrite: Drops and recreates the target.
  • Merge: Merges data to the existing table based on the natural key.
  • Append: Appends data to the existing target.

NOTE If the Sync Type is set to Append and the configured table uses the Incremental Merge mode of ingestion, the exported table does not include any duplicates.

If the table is configured for the Incremental Merge mode of ingestion and for the Merge mode of Sync to Target, the exported table may or may not include duplicate data.

However, records deleted during ingestion will still be available in the exported table. (For a sketch of how the sync types differ, see the example after Step 6.)

Step 5: Select one of the Target types and enter all the mandatory fields as described in the following sections:

Configuring BigQuery Target

Configuring Azure Synapse Analytics Target

Configuring Snowflake Target

Configuring Cosmos DB Target

Configuring Delimited File Target

Configuring Postgres Target

Configuring Teradata Target

Step 6: Click Save to save the configuration settings.
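
The three enabled sync types differ in how incoming rows are combined with rows already in the target. The following is a minimal PySpark sketch of the semantics only; it is not Infoworks internals, and the "id" natural key and sample rows are hypothetical.

```python
# A minimal PySpark sketch of the three enabled sync types. Illustrates
# the semantics only (not Infoworks internals); "id" is an assumed
# natural key and the sample rows are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sync-type-demo").getOrCreate()

target = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
incoming = spark.createDataFrame([(2, "B"), (3, "c")], ["id", "val"])

# Overwrite: drop and recreate -- the target becomes exactly the incoming data.
overwrite = incoming

# Append: add rows as-is -- a duplicate of id=2 is now possible.
append = target.unionByName(incoming)

# Merge: upsert on the natural key -- incoming rows replace matching rows.
merge = target.join(incoming, on="id", how="left_anti").unionByName(incoming)

merge.orderBy("id").show()  # id=2 now carries "B"; id=3 is added
```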

Configuring BigQuery Target

Following are the BigQuery configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select BigQuery.

Load Job Configuration

Export Type: The export type, selected from the drop-down: BQ Load or Spark. Default value: Spark.

Source File Format: The source file type to be exported. Note: This field is present only for a metadata source.

Target Cloud Type: The type of cloud the data will be exported to.

Max Size Per Load Job: The maximum size of files, in bytes, per load job. It must be a whole number. If the maximum size per load job is m and the total file size is n, where n >= m, then internally n/m BQ load jobs run. If the maximum size per load job is less than the size of a specific file, that file is skipped. Default value: 10000000000000, or approximately 9 TB; the field is editable.

NOTE For BQ Load, schema sync is not supported and the option to persist data on GCS is not available. Complex types are also not supported in BQ Load.

Connection Parameters

Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Project ID: The ID of the project in the source BigQuery.

Authentication Mechanism: The mechanism using which the security information is stored. The options include System and Service Account Credential File. If Service Account Credential File is selected, you can upload the file or enter the file location. Note: You can upload only one file.

GCS Temporary Storage Bucket: The path to the Google Cloud Storage folder accessible by Infoworks. This is used as temporary storage during the load to BigQuery. Ensure that the service account has read/write access to this storage path.
Target Configurations
Dataset Name: Dataset name of the BigQuery target.

Create a dataset if it does not exist: Enable this option to create a new dataset with the name provided above. Ensure that the user has sufficient privileges to create a dataset in the BigQuery target.

Table Name: Table name of the BigQuery target.

Natural Keys: The required natural keys for the BigQuery target.

Is Existing Table: When enabled, the existing table is treated as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks.

LIMITATION Once the export has completed successfully, any configuration change (schema or partition change) will result in job failure.

Partition Type: The type of partitioning for the table. The options include BigQuery Load Time, Date Column, Timestamp Column, and Integer Column.

Partition Type -> BigQuery Load Time: Includes the options for BigQuery load time. For example, Day or Hour.

Partition Type -> Date Column -> Partition Column: You can select the partition column, for example, last_modified (date).

Partition Type -> Timestamp Column -> Partition Time Interval: This field is visible when the Partition Type is BigQuery Load Time, Date Column, or Timestamp Column. Note: For Date Column, the available options are Day, Month, and Year. For Timestamp Column, the available options are Hour, Day, Month, and Year. For BigQuery Load Time, the available options are Hour, Day, Month, and Year.

Partition Type -> Integer Column: Includes the fields with the options set to custid (int), postal code (int), or gender (int), along with the Start/End/Range options.

Clustering Columns (up to 4 columns allowed): The columns to be used for clustering. You can select up to four columns.

Persist Staging Data in GCS: The option to indicate whether the temporary staging data must be persisted in GCS. The default is to not persist data.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

BigQuery Labels: Enables you to add labels to your tables in BigQuery. For information about BigQuery labels, refer to BigQuery Labels.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to BigQuery, see the section Sync Data to Target.
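
For orientation, the following is a hedged Python sketch of what the BQ Load export type roughly corresponds to on the Google Cloud side: staged files are grouped so that no batch exceeds Max Size Per Load Job, then loaded with the configured partitioning and clustering. This is not the Infoworks implementation, and all project, bucket, dataset, table, and column names are hypothetical.

```python
# A hedged sketch of a BQ Load-style export using the google-cloud-bigquery
# client; not the Infoworks implementation. All names are placeholders.
from google.cloud import bigquery

MAX_BYTES_PER_LOAD_JOB = 10_000_000_000_000  # the documented default (~9 TB)

def plan_load_jobs(files, max_bytes=MAX_BYTES_PER_LOAD_JOB):
    """Group (uri, size) pairs so no batch exceeds max_bytes.

    Mirrors the documented rule: with total size n >= max size m, roughly
    n/m load jobs run, and a single file larger than m is skipped.
    """
    batches, batch, batch_bytes = [], [], 0
    for uri, size in files:
        if size > max_bytes:                 # file alone exceeds the cap
            continue                         # -> skipped, per the docs
        if batch and batch_bytes + size > max_bytes:
            batches.append(batch)
            batch, batch_bytes = [], 0
        batch.append(uri)
        batch_bytes += size
    if batch:
        batches.append(batch)
    return batches

client = bigquery.Client(project="my-project")
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # "Overwrite"
    time_partitioning=bigquery.TimePartitioning(                 # Date Column
        type_=bigquery.TimePartitioningType.DAY, field="last_modified"),
    clustering_fields=["custid", "postal_code"],                 # up to 4 columns
)
staged = [("gs://tmp-bucket/export/part-00000.parquet", 1 << 30)]
for batch in plan_load_jobs(staged):
    client.load_table_from_uri(
        batch, "my-project.my_dataset.my_table", job_config=job_config
    ).result()  # waits for the load job to finish
```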

Configuring Azure Synapse Analytics Target

Following are the Azure Synapse Analytics Target configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Azure Synapse Analytics.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Server Name: The server name of the Azure Synapse Analytics instance.

User Name: The user name for the Azure Synapse Analytics instance.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Warehouse: The warehouse name of the Azure Synapse Analytics instance.

Additional Params: The optional JDBC parameters.
Target Configurations
Schema Name: Schema name of the Azure Synapse Analytics target.

Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in the Azure Synapse Analytics target.

Table Name: Table name of the Azure Synapse Analytics target.

Indexing Type: The indexing options include clustered columnstore index, clustered index, nonclustered index, and a non-index option, heap. Note: Clustered columnstore and nonclustered indexes are not supported for tables containing columns of the STRING data type, and columns of the STRING and TIMESTAMP data types cannot be used as partitioning columns. For more information, see https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15.

Distribution Type: Distribution methods for designing distributed tables. The options include Hash, Replication, and Round Robin.

Natural Keys: The required natural keys for the Azure Synapse Analytics target.

Partition Column Name: The column name for the partition.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Azure Synapse Analytics, see the section Sync Data to Target.
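
As context for the Distribution Type and Indexing Type fields, the following is a hedged Python sketch of the standard Azure Synapse dedicated SQL pool DDL they map onto. The server, database, credentials, and table names are placeholders, and the DDL is ordinary Synapse syntax rather than anything Infoworks-specific.

```python
# A hedged sketch of the Azure Synapse dedicated SQL pool DDL implied by
# the Distribution Type and Indexing Type fields. Server, database,
# credentials, and table names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=mydb;UID=loader;PWD=<secret>"
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE dbo.orders
    (
        order_id   BIGINT NOT NULL,
        custid     INT,
        order_date DATE
    )
    WITH
    (
        DISTRIBUTION = HASH(order_id),   -- or REPLICATE / ROUND_ROBIN
        CLUSTERED COLUMNSTORE INDEX      -- or CLUSTERED INDEX (...) / HEAP
    )
""")
conn.commit()
conn.close()
```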

Configuring Snowflake Target

Following are the Snowflake configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Snowflake.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Connection URL: The URL to connect to Snowflake, which specifies the hostname for the account in the format <account_name>.snowflakecomputing.com.

Account Name: The Snowflake account name. The full account name might include additional segments that identify the region and cloud platform where the account is hosted. For example, xy12345 for US West (Oregon), xy12345.us-east-2.aws for US East (Ohio), and xy12345.us-east-1 for US East (N. Virginia).

User Name: Username of the Snowflake account. For example, mark@example.com.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Warehouse: Snowflake warehouse name. For example, sales.

Additional Params: The optional JDBC parameters.
Target Configurations
Database Name: Database name of the Snowflake target.

Create a database if it does not exist: Enable this option to create a new database with the name provided above. Ensure that the user has sufficient privileges to create a database in the Snowflake target.

Schema Name: Schema name of the Snowflake target.

Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in the Snowflake target.

Table Name: Table name of the Snowflake target.

Natural Keys: The required natural keys for the Snowflake target.

Query Tags: A string that is added to the Snowflake query and can be accessed via Query History in Snowflake.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Snowflake, see the section Sync Data to Target.

LIMITATION To run the Append or Merge sync type with a Snowflake target, a table with the same database name and schema name as configured in the Sync to Target configuration must already exist on the Snowflake target. If the table is not available on the Snowflake target, you must create it manually.
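
Because Append and Merge require the table to pre-exist, one way to create it up front is with the Snowflake Python connector. The following is a minimal sketch under that assumption; the account, credentials, warehouse, and object names are placeholders.

```python
# A minimal sketch that pre-creates the database, schema, and table that
# the Append/Merge sync types expect to find on Snowflake. The account,
# credentials, warehouse, and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-2.aws",
    user="mark@example.com",
    password="<secret>",
    warehouse="sales",
)
cur = conn.cursor()
try:
    cur.execute("CREATE DATABASE IF NOT EXISTS analytics")
    cur.execute("CREATE SCHEMA IF NOT EXISTS analytics.public")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS analytics.public.orders (
            order_id NUMBER,
            custid   NUMBER,
            val      STRING
        )
    """)
finally:
    cur.close()
    conn.close()
```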

Configuring Cosmos DB Target

Following are the Cosmos DB configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Cosmos DB.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

End Point: The URL for the Cosmos DB target.

Authentication Type for Master Key: The master key token is the all-access key token that allows users full control of the Cosmos DB resources in a particular account. Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Master Key. If you select External Secret Store, select the Secret that contains the master key.

Target Configurations
Database Name: Database name of the Cosmos DB target.

Collection Name: The collection to which the documents must be written.

Is Existing Collection: Enable this option if the documents must be appended or merged to the existing collection. If unchecked, a new collection is created.

Provision Throughput (400 - 10000): The provisioned throughput for the container created. The default value is 400.

Partition Key: The partition key for the Cosmos DB collection to be created. The partition key must be in path format for the JSON document that is written to Cosmos DB, for example, /employee/employee_age.

Unique Keys: A unique key policy is created when an Azure Cosmos container is created. Unique keys ensure that one or more values within a logical partition are unique. A unique key must be in path format for the JSON document that is written to Cosmos DB, for example, /employee/employee_age. Click Add to add a unique key or Remove to remove one.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Cosmos DB, see the section Sync Data to Target.

NOTE For Dataproc 1.4 (Spark 2.x and Scala 2.11), you must set internalIpOnly to true in the {INFOWORKS_HOME}/conf/dataproc_defaults.json file to successfully run Sync to Target with Cosmos DB.

LIMITATIONS The following are some of the limitations for Sync to Target jobs with a Cosmos DB target.

  • Export to a Cosmos DB target is not supported on Spark 2.x and Scala 2.12.
  • A column named "id" is mandatory in the data to be written when running jobs on Spark 3.x and Scala 2.12. Otherwise, CDC merge jobs writing data to Cosmos DB will fail. A sketch of one way to satisfy this requirement follows.
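
The following hedged PySpark sketch derives a string "id" column from the natural key before writing with the Azure Cosmos DB Spark 3 connector (format "cosmos.oltp"). The endpoint, key, database, container, and column names are placeholders, and this shows the general connector usage rather than the Infoworks job.

```python
# A hedged PySpark sketch of one way to satisfy the "id" requirement:
# derive a string "id" from the natural key, then write with the Azure
# Cosmos DB Spark 3 connector. All names and credentials are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(101, "a"), (102, "b")], ["custid", "val"])

# Cosmos DB documents need a unique string "id"; reuse the natural key.
df = df.withColumn("id", F.col("custid").cast("string"))

(df.write.format("cosmos.oltp")
   .option("spark.cosmos.accountEndpoint",
           "https://myaccount.documents.azure.com:443/")
   .option("spark.cosmos.accountKey", "<master-key>")
   .option("spark.cosmos.database", "mydb")
   .option("spark.cosmos.container", "mycollection")
   .mode("append")
   .save())
```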

Configuring Delimited File Target

Following are the Delimited File configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Delimited File.

Connection Parameters

Environment: Select the environment where you want to run the Sync to Target job.

Storage: Select the storage where the exported data will reside.

Target Path: Select the path for the stored data.

Target Configurations

Delimiter Character: The field separator or delimiter used to separate fields in the file.

Escape Character: The character used in delimited files to escape occurrences of Column Separator and Column Enclosed By in the data. All files in the specified source path must use the same value. This value must be a single character. The default value is "\".

Null Values: The value that replaces null values in the final CSV file.

Additional Params: Includes the additional parameters.

Partition Column: The column to partition the data by.

Enable Text Enclosing: Encloses the exported fields in quotes.

Add Header Row: Adds a header row to the exported CSV file.
Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Delimited File, see the section Sync Data to Target.

LIMITATION Complex data types are not supported for Sync Data to Target on delimited files.
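
The delimited-file fields above correspond closely to options on Spark's CSV writer. The following minimal sketch shows the option semantics, not the Infoworks job itself; the output path and columns are placeholders.

```python
# A minimal PySpark sketch mapping the fields above onto Spark CSV writer
# options; it shows the option semantics, not the Infoworks job itself.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a", None), (2, "b", "x")],
                           ["id", "val", "notes"])

(df.write
   .option("sep", ",")            # Delimiter Character
   .option("escape", "\\")        # Escape Character (default "\")
   .option("nullValue", "NULL")   # Null Values replacement
   .option("quoteAll", "true")    # Enable Text Enclosing
   .option("header", "true")      # Add Header Row
   .partitionBy("id")             # Partition Column
   .mode("overwrite")
   .csv("/tmp/export/orders"))
```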

Configuring Postgres Target

Following are the Postgres configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Postgres.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

JDBC URL: The URL that JDBC uses to connect to the database.

Username: The username for the connection to the database.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Additional Params: The optional JDBC parameters.
Target Configurations
Schema Name: Schema name of the Postgres target.

Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in Postgres.

Table Name: Table name of the Postgres target.

Natural Keys: The required natural keys for the Postgres target.

Is Existing Table: When enabled, the existing table is treated as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks.

Partition Type: The type of partition for the table. By default, Partition Type is set to None. You can also select the following partition types:

  • List: Partitions a table by a list of known values.
  • Range: Partitions a table by a range of values.
  • Hash: Partitions a table using a hash function on the partition key.

Partition Column: The column used to partition the data in the target.

Child Partition Table: The name of the child partition table. This field is displayed for the list, range, and hash partition types. You can add multiple child partition tables. While creating the child partition tables, make sure that each row of the source dataset maps to one of the child partitions. If a row does not have a matching child partition, the job fails.

Is Default: Select this check box to use the child partition as the default partition. A default partition is supported only for the list partition type.

Partition Type -> List -> Values: Enter the values on which the child partition is created.

Partition Type -> Range -> Min: The minimum value for the range.

Partition Type -> Range -> Max: The maximum value for the range.

Partition Type -> Hash -> Partition Modulus: The partition modulus for the hash partition type.

Partition Type -> Hash -> Remainder: The remainder associated with the child partition.

Remove: Click Remove to remove a child partition table.

Add: Click Add to add a new child partition table.

Index Type: The indexing option. By default, Index Type is set to None. You can also select the following index types:

  • Hash: Used where equality comparison matters most; you can only look up data that matches exactly.
  • BTree: Used for data that can be sorted; handles both equality and range queries.
  • GIST: Used when the data to be indexed is more complex.
  • SP-GIST: Used when the data can be grouped into non-overlapping groupings.
  • GIN: Used when indexing data that consists of multiple elements in a single column, such as arrays, JSON documents (jsonb), or text search documents (tsvector).
  • BRIN: Used for large tables that have columns with some natural sort order.

Note: GIST, SP-GIST, and GIN indexes are not supported with primitive data types. The following data types can be used with these indexes: GIST and SP-GIST with TSTZRANGE; GIN with arrays.

Index Column: The index column for the target table to be created.

Is Unique Index: Enable this option to define a unique constraint on the BTree index. This field is available only for BTree indexing. Note: Unique constraints on partitioned tables must include all the partition key columns.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

LIMITATIONS

  • For Postgres, the master-slave selection during failover cannot be automated; it has to be set up manually.
  • Airflow does not provide configurations to select where reads/writes should be directed.

To sync data to Postgres, see the section Sync Data to Target.
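
As context for the partition and index fields above, the following is a hedged Python sketch of the standard PostgreSQL DDL they map onto; it is not generated by Infoworks, and the connection details, schema, and column names are placeholders.

```python
# A hedged sketch of standard PostgreSQL partition and index DDL implied
# by the fields above; not generated by Infoworks. All names are placeholders.
import psycopg2

conn = psycopg2.connect("postgresql://loader:<secret>@pghost:5432/mydb")
cur = conn.cursor()

# Range partitioning: the parent table plus one child per Min/Max pair.
cur.execute("""
    CREATE TABLE sales.orders (
        order_id   BIGINT,
        order_date DATE,
        tags       TEXT[]
    ) PARTITION BY RANGE (order_date)
""")
cur.execute("""
    CREATE TABLE sales.orders_2023 PARTITION OF sales.orders
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
""")
# A hash child partition would instead use:
#   FOR VALUES WITH (MODULUS 4, REMAINDER 0)
# A list child partition: FOR VALUES IN ('US', 'EU'); list partitioning
# also allows a DEFAULT partition for unmatched rows.

# GIN index on the array column (GIN suits multi-element values such as
# arrays, jsonb, or tsvector).
cur.execute("CREATE INDEX orders_tags_gin ON sales.orders USING gin (tags)")

conn.commit()
cur.close()
conn.close()
```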

Configuring Teradata Target

Following are the Teradata configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Teradata.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Server Name: The server name of the Teradata instance.

User Name: The user name for the Teradata instance.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Additional Params: The optional JDBC parameters.
Target Configurations
Database Name: Database name of the Teradata target.

Natural Keys: The required natural keys for the Teradata target.

Target Table Type: Classifies tables as SET or MULTISET based on how duplicate records are handled. SET tables do not allow duplicate rows to be inserted, and a duplicate row cannot be created with an UPDATE statement. MULTISET tables do not have these restrictions.

Partition Type: The type of partition for the table. By default, Partition Type is set to None. You can also select the Range_N partition type. Note: For other partition types, you can create the table with the desired partitions in Teradata and use the Is Existing Table check box for exporting data.

Partition Column: The column used to partition tables in the target. The column must be of the Date data type.

Partition Start: The start value for the Range_N partition.

Partition End: The end value for the Range_N partition.

Partition Interval Value: The interval value for the Range_N partition.

Partition Interval Period: The interval period for the Range_N partition.

Indexing Columns: The primary index column(s) for the target table to be created.

Unique Index: Enable this option to define a unique constraint on the index.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

NOTE

  • Indexing and partition columns must be included in the natural keys for the Merge sync type.
  • TERADATA FASTLOAD is used by default, and it does not support duplicate records. If you want to load duplicate records, set TYPE=DEFAULT in Additional Params.
  • (Applicable only for Fastload) If a table export fails with the error "Details of the failure can be found in the exception chain that is accessible with getNextException", you can handle it in either of the following ways:
    • Set TYPE=DEFAULT in Additional Params under Connection Parameters.
    • Set the advanced configuration with the key set to export_datawriter_conf and the value set to repartition_for_td_export=true.

To sync data to Teradata, see the section Sync Data to Target.
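
As context for the Target Table Type, index, and Range_N fields above, the following is a hedged Python sketch of the standard Teradata DDL they map onto; it is not generated by Infoworks, and the host, credentials, and object names are placeholders.

```python
# A hedged sketch of standard Teradata DDL implied by the fields above;
# not generated by Infoworks. Host, credentials, and names are placeholders.
import teradatasql

with teradatasql.connect(host="tdhost", user="loader",
                         password="<secret>") as conn:
    with conn.cursor() as cur:
        cur.execute("""
            CREATE MULTISET TABLE mydb.orders (      -- SET would reject duplicates
                order_id   BIGINT,
                order_date DATE
            )
            PRIMARY INDEX (order_id)                 -- Indexing Columns
            PARTITION BY RANGE_N(
                order_date BETWEEN DATE '2023-01-01' -- Partition Start
                           AND     DATE '2024-12-31' -- Partition End
                           EACH INTERVAL '1' MONTH   -- Interval Value / Period
            )
        """)
```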

Sync Data to Target

After the Target is configured, perform the following steps to sync data to target.

Step 1: From the Data Sources menu, select one of the tables, and click the View Source button.

Step 2: Select the source table to be synchronized to Target.

Step 3: Click the Sync Data to Target button.

Step 4: Enter the mandatory fields as listed in the table below:

Job Name: The name of the ingestion job.

Max Parallel Tables: The maximum number of tables that can be crawled at a given time.

Compute Cluster: The template based on which the cluster spins up for each table. The compute clusters that are created by the admin and accessible to the user are listed in the drop-down.

Overwrite Worker Count: The option to override the maximum and minimum number of worker nodes configured in the compute template.

Number of Worker Nodes: The number of worker nodes that will spin up in the cluster.

Save as a Table Group: The option to save the list of tables as a table group.

Step 5: Click the Sync Data to Target button.
