Infoworks 5.5.0
Onboard Data

Synchronizing Data to External Target

You can configure the Target connections and sync data as described in the following sections.

Configure a Target -> Sync Data to Target

Configure a Target

The following are the steps to configure a Target.

  1. From the Data Sources menu, select one of the tables and click the View Source/Ingest button.
  2. Select the source table to be synchronized to the Target.
  3. Click the configuration link, and then click the Sync to External Target tab.
  4. Configure the parameters by selecting one of the following options from the Sync Type drop-down menu.
  • Disabled: Default value.
  • Overwrite: Drops and recreates the target.
  • Merge: Merges data to the existing table based on the natural key.
  • Append: Appends data to the existing target.

NOTE If the Sync Type is set to Append and the configured table uses the Incremental Merge mode of ingestion, the exported table does not include any duplicates.

If the table is configured for the Incremental Merge mode of ingestion and for the Merge mode of Sync to Target, the exported table may or may not include duplicate data.

However, records deleted during ingestion will still be available in the exported table. (For a sketch of how the sync types differ, see the example after Step 6.)

Step 5: Select one of the Target types and enter all the mandatory fields as described in the following sections:

Configuring BigQuery Target

Configuring Azure Synapse Analytics Target

Configuring Snowflake Target

Configuring Cosmos DB Target

Configuring Delimited File Target

Configuring Postgres Target

Configuring Teradata Target

Step 6: Click Save to save the configuration settings.
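
The three enabled sync types differ in how incoming rows are combined with rows already in the target. The following is a minimal PySpark sketch of the semantics only; it is not Infoworks internals, and the "id" natural key and sample rows are hypothetical.

```python
# A minimal PySpark sketch of the three enabled sync types. Illustrates
# the semantics only (not Infoworks internals); "id" is an assumed
# natural key and the sample rows are hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("sync-type-demo").getOrCreate()

target = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])
incoming = spark.createDataFrame([(2, "B"), (3, "c")], ["id", "val"])

# Overwrite: drop and recreate -- the target becomes exactly the incoming data.
overwrite = incoming

# Append: add rows as-is -- a duplicate of id=2 is now possible.
append = target.unionByName(incoming)

# Merge: upsert on the natural key -- incoming rows replace matching rows.
merge = target.join(incoming, on="id", how="left_anti").unionByName(incoming)

merge.orderBy("id").show()  # id=2 now carries "B"; id=3 is added
```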

Configuring BigQuery Target

Following are the BigQuery configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select BigQuery.

Load Job Configuration

Export Type: The export type, selected from the drop-down: BQ Load or Spark. Default value: Spark.

Source File Format: The source file type to be exported. Note: This field is present only for a metadata source.

Target Cloud Type: The type of cloud the data will be exported to.

Max Size Per Load Job: The maximum size of files, in bytes, per load job. It must be a whole number. If the maximum size per load job is m and the total file size is n, where n >= m, then internally n/m BQ load jobs run. If the maximum size per load job is less than the size of a specific file, that file is skipped. Default value: 10000000000000, or approximately 9 TB; the field is editable.

NOTE For BQ Load, schema sync is not supported and the option to persist data on GCS is not available. Complex types are also not supported in BQ Load.

Connection Parameters

Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Project ID: The ID of the project in the source BigQuery.

Authentication Mechanism: The mechanism using which the security information is stored. The options include System and Service Account Credential File. If Service Account Credential File is selected, you can upload the file or enter the file location. Note: You can upload only one file.

GCS Temporary Storage Bucket: The path to the Google Cloud Storage folder accessible by Infoworks. This is used as temporary storage during the load to BigQuery. Ensure that the service account has read/write access to this storage path.
Target Configurations
Dataset Name: Dataset name of the BigQuery target.

Create a dataset if it does not exist: Enable this option to create a new dataset with the name provided above. Ensure that the user has sufficient privileges to create a dataset in the BigQuery target.

Table Name: Table name of the BigQuery target.

Natural Keys: The required natural keys for the BigQuery target.

Is Existing Table: When enabled, the existing table is treated as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks.

LIMITATION Once the export has completed successfully, any configuration change (schema or partition change) will result in job failure.

Partition Type: The type of partitioning for the table. The options include BigQuery Load Time, Date Column, Timestamp Column, and Integer Column.

Partition Type -> BigQuery Load Time: Includes the options for BigQuery load time. For example, Day or Hour.

Partition Type -> Date Column -> Partition Column: You can select the partition column, for example, last_modified (date).

Partition Type -> Timestamp Column -> Partition Time Interval: This field is visible when the Partition Type is BigQuery Load Time, Date Column, or Timestamp Column. Note: For Date Column, the available options are Day, Month, and Year. For Timestamp Column, the available options are Hour, Day, Month, and Year. For BigQuery Load Time, the available options are Hour, Day, Month, and Year.

Partition Type -> Integer Column: Includes the fields with the options set to custid (int), postal code (int), or gender (int), along with the Start/End/Range options.

Clustering Columns (up to 4 columns allowed): The columns to be used for clustering. You can select up to four columns.

Persist Staging Data in GCS: The option to indicate whether the temporary staging data must be persisted in GCS. The default is to not persist data.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

BigQuery Labels: Enables you to add labels to your tables in BigQuery. For information about BigQuery labels, refer to BigQuery Labels.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to BigQuery, see the section Sync Data to Target.
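
For orientation, the following is a hedged Python sketch of what the BQ Load export type roughly corresponds to on the Google Cloud side: staged files are grouped so that no batch exceeds Max Size Per Load Job, then loaded with the configured partitioning and clustering. This is not the Infoworks implementation, and all project, bucket, dataset, table, and column names are hypothetical.

```python
# A hedged sketch of a BQ Load-style export using the google-cloud-bigquery
# client; not the Infoworks implementation. All names are placeholders.
from google.cloud import bigquery

MAX_BYTES_PER_LOAD_JOB = 10_000_000_000_000  # the documented default (~9 TB)

def plan_load_jobs(files, max_bytes=MAX_BYTES_PER_LOAD_JOB):
    """Group (uri, size) pairs so no batch exceeds max_bytes.

    Mirrors the documented rule: with total size n >= max size m, roughly
    n/m load jobs run, and a single file larger than m is skipped.
    """
    batches, batch, batch_bytes = [], [], 0
    for uri, size in files:
        if size > max_bytes:                 # file alone exceeds the cap
            continue                         # -> skipped, per the docs
        if batch and batch_bytes + size > max_bytes:
            batches.append(batch)
            batch, batch_bytes = [], 0
        batch.append(uri)
        batch_bytes += size
    if batch:
        batches.append(batch)
    return batches

client = bigquery.Client(project="my-project")
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,  # "Overwrite"
    time_partitioning=bigquery.TimePartitioning(                 # Date Column
        type_=bigquery.TimePartitioningType.DAY, field="last_modified"),
    clustering_fields=["custid", "postal_code"],                 # up to 4 columns
)
staged = [("gs://tmp-bucket/export/part-00000.parquet", 1 << 30)]
for batch in plan_load_jobs(staged):
    client.load_table_from_uri(
        batch, "my-project.my_dataset.my_table", job_config=job_config
    ).result()  # waits for the load job to finish
```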

Configuring Azure Synapse Analytics Target

Following are the Azure Synapse Analytics Target configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Azure Synapse Analytics.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Server Name: The server name of the Azure Synapse Analytics instance.

User Name: The user name for the Azure Synapse Analytics instance.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Warehouse: The warehouse name of the Azure Synapse Analytics instance.

Additional Params: The optional JDBC parameters.
Target Configurations
Schema Name: Schema name of the Azure Synapse Analytics target.

Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in the Azure Synapse Analytics target.

Table Name: Table name of the Azure Synapse Analytics target.

Indexing Type: The indexing options include clustered columnstore index, clustered index, nonclustered index, and a non-index option, heap. Note: Clustered columnstore and nonclustered indexes are not supported for tables containing columns of the STRING data type, and columns of the STRING and TIMESTAMP data types cannot be used as partitioning columns. For more information, see https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15.

Distribution Type: Distribution methods for designing distributed tables. The options include Hash, Replication, and Round Robin.

Natural Keys: The required natural keys for the Azure Synapse Analytics target.

Partition Column Name: The column name for the partition.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Azure Synapse Analytics, see the section Sync Data to Target.
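
As context for the Distribution Type and Indexing Type fields, the following is a hedged Python sketch of the standard Azure Synapse dedicated SQL pool DDL they map onto. The server, database, credentials, and table names are placeholders, and the DDL is ordinary Synapse syntax rather than anything Infoworks-specific.

```python
# A hedged sketch of the Azure Synapse dedicated SQL pool DDL implied by
# the Distribution Type and Indexing Type fields. Server, database,
# credentials, and table names are placeholders.
import pyodbc

conn = pyodbc.connect(
    "DRIVER={ODBC Driver 18 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=mydb;UID=loader;PWD=<secret>"
)
cur = conn.cursor()
cur.execute("""
    CREATE TABLE dbo.orders
    (
        order_id   BIGINT NOT NULL,
        custid     INT,
        order_date DATE
    )
    WITH
    (
        DISTRIBUTION = HASH(order_id),   -- or REPLICATE / ROUND_ROBIN
        CLUSTERED COLUMNSTORE INDEX      -- or CLUSTERED INDEX (...) / HEAP
    )
""")
conn.commit()
conn.close()
```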

Configuring Snowflake Target

Following are the Snowflake configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Snowflake.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Connection URL: The URL to connect to Snowflake, which specifies the hostname for the account in the format <account_name>.snowflakecomputing.com.

Account Name: The Snowflake account name. The full account name might include additional segments that identify the region and cloud platform where the account is hosted. For example, xy12345 for US West (Oregon), xy12345.us-east-2.aws for US East (Ohio), and xy12345.us-east-1 for US East (N. Virginia).

User Name: Username of the Snowflake account. For example, mark@example.com.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Warehouse: Snowflake warehouse name. For example, sales.

Additional Params: The optional JDBC parameters.
Target Configurations
Database Name: Database name of the Snowflake target.

Create a database if it does not exist: Enable this option to create a new database with the name provided above. Ensure that the user has sufficient privileges to create a database in the Snowflake target.

Schema Name: Schema name of the Snowflake target.

Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in the Snowflake target.

Table Name: Table name of the Snowflake target.

Natural Keys: The required natural keys for the Snowflake target.

Query Tags: A string that is added to the Snowflake query and can be accessed via Query History in Snowflake.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Snowflake, see the section Sync Data to Target.

LIMITATION To run the Append or Merge sync type with a Snowflake target, a table with the same database name and schema name as configured in the Sync to Target configuration must already exist on the Snowflake target. If the table is not available on the Snowflake target, you must create it manually.
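
Because Append and Merge require the table to pre-exist, one way to create it up front is with the Snowflake Python connector. The following is a minimal sketch under that assumption; the account, credentials, warehouse, and object names are placeholders.

```python
# A minimal sketch that pre-creates the database, schema, and table that
# the Append/Merge sync types expect to find on Snowflake. The account,
# credentials, warehouse, and object names are placeholders.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-2.aws",
    user="mark@example.com",
    password="<secret>",
    warehouse="sales",
)
cur = conn.cursor()
try:
    cur.execute("CREATE DATABASE IF NOT EXISTS analytics")
    cur.execute("CREATE SCHEMA IF NOT EXISTS analytics.public")
    cur.execute("""
        CREATE TABLE IF NOT EXISTS analytics.public.orders (
            order_id NUMBER,
            custid   NUMBER,
            val      STRING
        )
    """)
finally:
    cur.close()
    conn.close()
```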

Configuring Cosmos DB Target

Following are the Cosmos DB configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Cosmos DB.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

End Point: The URL for the Cosmos DB target.

Authentication Type for Master Key: The master key token is the all-access key token that allows users full control of the Cosmos DB resources in a particular account. Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Master Key. If you select External Secret Store, select the Secret that contains the master key.

Target Configurations
Database Name: Database name of the Cosmos DB target.

Collection Name: The collection to which the documents must be written.

Is Existing Collection: Enable this option if the documents must be appended or merged to the existing collection. If unchecked, a new collection is created.

Provision Throughput (400 - 10000): The provisioned throughput for the container created. The default value is 400.

Partition Key: The partition key for the Cosmos DB collection to be created. The partition key must be in path format for the JSON document that is written to Cosmos DB, for example, /employee/employee_age.

Unique Keys: A unique key policy is created when an Azure Cosmos container is created. Unique keys ensure that one or more values within a logical partition are unique. A unique key must be in path format for the JSON document that is written to Cosmos DB, for example, /employee/employee_age. Click Add to add a unique key or Remove to remove one.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Cosmos DB, see the section Sync Data to Target.

NOTE For Dataproc 1.4 (Spark 2.x and Scala 2.11), you must set internalIpOnly to true in the {INFOWORKS_HOME}/conf/dataproc_defaults.json file to successfully run Sync to Target with Cosmos DB.

LIMITATIONS The following are some of the limitations for Sync to Target jobs with a Cosmos DB target.

  • Export to a Cosmos DB target is not supported on Spark 2.x and Scala 2.12.
  • A column named "id" is mandatory in the data to be written when running jobs on Spark 3.x and Scala 2.12. Otherwise, CDC merge jobs writing data to Cosmos DB will fail. A sketch of one way to satisfy this requirement follows.
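
The following hedged PySpark sketch derives a string "id" column from the natural key before writing with the Azure Cosmos DB Spark 3 connector (format "cosmos.oltp"). The endpoint, key, database, container, and column names are placeholders, and this shows the general connector usage rather than the Infoworks job.

```python
# A hedged PySpark sketch of one way to satisfy the "id" requirement:
# derive a string "id" from the natural key, then write with the Azure
# Cosmos DB Spark 3 connector. All names and credentials are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(101, "a"), (102, "b")], ["custid", "val"])

# Cosmos DB documents need a unique string "id"; reuse the natural key.
df = df.withColumn("id", F.col("custid").cast("string"))

(df.write.format("cosmos.oltp")
   .option("spark.cosmos.accountEndpoint",
           "https://myaccount.documents.azure.com:443/")
   .option("spark.cosmos.accountKey", "<master-key>")
   .option("spark.cosmos.database", "mydb")
   .option("spark.cosmos.container", "mycollection")
   .mode("append")
   .save())
```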

Configuring Delimited File Target

Following are the Delimited File configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Delimited File.

Connection Parameters

Environment: Select the environment where you want to run the Sync to Target job.

Storage: Select the storage where the exported data will reside.

Target Path: Select the path for the stored data.

Target Configurations

Delimiter Character: The field separator or delimiter used to separate fields in the file.

Escape Character: The character used in delimited files to escape occurrences of Column Separator and Column Enclosed By in the data. All files in the specified source path must use the same value. This value must be a single character. The default value is "\".

Null Values: The value that replaces null values in the final CSV file.

Additional Params: Includes the additional parameters.

Partition Column: The column to partition the data by.

Enable Text Enclosing: Encloses the exported fields in quotes.

Add Header Row: Adds a header row to the exported CSV file.
Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

To sync data to Delimited File, see the section Sync Data to Target.

LIMITATION Complex data types are not supported for Sync Data to Target on delimited files.
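
The delimited-file fields above correspond closely to options on Spark's CSV writer. The following minimal sketch shows the option semantics, not the Infoworks job itself; the output path and columns are placeholders.

```python
# A minimal PySpark sketch mapping the fields above onto Spark CSV writer
# options; it shows the option semantics, not the Infoworks job itself.
from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
df = spark.createDataFrame([(1, "a", None), (2, "b", "x")],
                           ["id", "val", "notes"])

(df.write
   .option("sep", ",")            # Delimiter Character
   .option("escape", "\\")        # Escape Character (default "\")
   .option("nullValue", "NULL")   # Null Values replacement
   .option("quoteAll", "true")    # Enable Text Enclosing
   .option("header", "true")      # Add Header Row
   .partitionBy("id")             # Partition Column
   .mode("overwrite")
   .csv("/tmp/export/orders"))
```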

Configuring Postgres Target

Following are the Postgres configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Postgres.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

JDBC URL: The URL that JDBC uses to connect to the database.

Username: The username for the connection to the database.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Additional Params: The optional JDBC parameters.
Target Configurations
Schema Name: Schema name of the Postgres target.

Create a schema if it does not exist: Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in Postgres.

Table Name: Table name of the Postgres target.

Natural Keys: The required natural keys for the Postgres target.

Is Existing Table: When enabled, the existing table is treated as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks.

Partition Type: The type of partition for the table. By default, Partition Type is set to None. You can also select the following partition types:

  • List: Partitions a table by a list of known values.
  • Range: Partitions a table by a range of values.
  • Hash: Partitions a table using a hash function on the partition key.

Partition Column: The column used to partition the data in the target.

Child Partition Table: The name of the child partition table. This field is displayed for the list, range, and hash partition types. You can add multiple child partition tables. While creating the child partition tables, make sure that each row of the source dataset maps to one of the child partitions. If a row does not have a matching child partition, the job fails.

Is Default: Select this check box to use the child partition as the default partition. A default partition is supported only for the list partition type.

Partition Type -> List -> Values: Enter the values on which the child partition is created.

Partition Type -> Range -> Min: The minimum value for the range.

Partition Type -> Range -> Max: The maximum value for the range.

Partition Type -> Hash -> Partition Modulus: The partition modulus for the hash partition type.

Partition Type -> Hash -> Remainder: The remainder associated with the child partition.

Remove: Click Remove to remove a child partition table.

Add: Click Add to add a new child partition table.

Index Type: The indexing option. By default, Index Type is set to None. You can also select the following index types:

  • Hash: Used where equality comparison matters most; you can only look up data that matches exactly.
  • BTree: Used for data that can be sorted; handles both equality and range queries.
  • GIST: Used when the data to be indexed is more complex.
  • SP-GIST: Used when the data can be grouped into non-overlapping groupings.
  • GIN: Used when indexing data that consists of multiple elements in a single column, such as arrays, JSON documents (jsonb), or text search documents (tsvector).
  • BRIN: Used for large tables that have columns with some natural sort order.

Note: GIST, SP-GIST, and GIN indexes are not supported with primitive data types. The following data types can be used with these indexes: GIST and SP-GIST with TSTZRANGE; GIN with arrays.

Index Column: The index column for the target table to be created.

Is Unique Index: Enable this option to define a unique constraint on the BTree index. This field is available only for BTree indexing. Note: Unique constraints on partitioned tables must include all the partition key columns.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

LIMITATIONS

  • For Postgres, the master-slave selection during failover cannot be automated; it has to be set up manually.
  • Airflow does not provide configurations to select where reads/writes should be directed.

To sync data to Postgres, see the section Sync Data to Target.
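
As context for the partition and index fields above, the following is a hedged Python sketch of the standard PostgreSQL DDL they map onto; it is not generated by Infoworks, and the connection details, schema, and column names are placeholders.

```python
# A hedged sketch of standard PostgreSQL partition and index DDL implied
# by the fields above; not generated by Infoworks. All names are placeholders.
import psycopg2

conn = psycopg2.connect("postgresql://loader:<secret>@pghost:5432/mydb")
cur = conn.cursor()

# Range partitioning: the parent table plus one child per Min/Max pair.
cur.execute("""
    CREATE TABLE sales.orders (
        order_id   BIGINT,
        order_date DATE,
        tags       TEXT[]
    ) PARTITION BY RANGE (order_date)
""")
cur.execute("""
    CREATE TABLE sales.orders_2023 PARTITION OF sales.orders
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
""")
# A hash child partition would instead use:
#   FOR VALUES WITH (MODULUS 4, REMAINDER 0)
# A list child partition: FOR VALUES IN ('US', 'EU'); list partitioning
# also allows a DEFAULT partition for unmatched rows.

# GIN index on the array column (GIN suits multi-element values such as
# arrays, jsonb, or tsvector).
cur.execute("CREATE INDEX orders_tags_gin ON sales.orders USING gin (tags)")

conn.commit()
cur.close()
conn.close()
```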

Configuring Teradata Target

Following are the Teradata configuration details:

Sync Type: The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the other options to enable it.

  • Overwrite: Drops and recreates the target.
  • Append: Appends data to the existing target.
  • Merge: Merges data to the existing table based on the natural key.

Target Type: The type of connection. Select Teradata.
Connection Parameters
Custom: Use a custom Target Data Connection by entering connection details for the current target type.

Target Data Connection: Use one of the existing Target Data Connections for the current target type.

Server Name: The server name of the Teradata instance.

User Name: The user name for the Teradata instance.

Authentication Type for Password: Select the authentication type from the drop-down: Infoworks Managed or External Secret Store. If you select Infoworks Managed, provide the Authentication Password for Password. If you select External Secret Store, select the Secret that contains the password.

Additional Params: The optional JDBC parameters.
Target Configurations
Database Name: Database name of the Teradata target.

Natural Keys: The required natural keys for the Teradata target.

Target Table Type: Classifies tables as SET or MULTISET based on how duplicate records are handled. SET tables do not allow duplicate rows to be inserted, and a duplicate row cannot be created with an UPDATE statement. MULTISET tables do not have these restrictions.

Partition Type: The type of partition for the table. By default, Partition Type is set to None. You can also select the Range_N partition type. Note: For other partition types, you can create the table with the desired partitions in Teradata and use the Is Existing Table check box for exporting data.

Partition Column: The column used to partition tables in the target. The column must be of the Date data type.

Partition Start: The start value for the Range_N partition.

Partition End: The end value for the Range_N partition.

Partition Interval Value: The interval value for the Range_N partition.

Partition Interval Period: The interval period for the Range_N partition.

Indexing Columns: The primary index column(s) for the target table to be created.

Unique Index: Enable this option to define a unique constraint on the index.

Sync Table Schema: The option to synchronize the exported table schema with the source table schema. Note: Any new columns need to be manually selected and added to the column list.

Columns

Exclude Columns: Select the appropriate column name and click Exclude Columns to exclude columns from the target table.

Include Columns: Select the appropriate column name and click Include Columns to include columns in the target table.

NOTE

  • Indexing and partition columns must be included in the natural keys for the Merge sync type.
  • TERADATA FASTLOAD is used by default, and it does not support duplicate records. If you want to load duplicate records, set TYPE=DEFAULT in Additional Params.
  • (Applicable only for Fastload) If a table export fails with the error "Details of the failure can be found in the exception chain that is accessible with getNextException", you can handle it in either of the following ways:
    • Set TYPE=DEFAULT in Additional Params under Connection Parameters.
    • Set the advanced configuration with the key set to export_datawriter_conf and the value set to repartition_for_td_export=true.

To sync data to Teradata, see the section Sync Data to Target.
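
As context for the Target Table Type, index, and Range_N fields above, the following is a hedged Python sketch of the standard Teradata DDL they map onto; it is not generated by Infoworks, and the host, credentials, and object names are placeholders.

```python
# A hedged sketch of standard Teradata DDL implied by the fields above;
# not generated by Infoworks. Host, credentials, and names are placeholders.
import teradatasql

with teradatasql.connect(host="tdhost", user="loader",
                         password="<secret>") as conn:
    with conn.cursor() as cur:
        cur.execute("""
            CREATE MULTISET TABLE mydb.orders (      -- SET would reject duplicates
                order_id   BIGINT,
                order_date DATE
            )
            PRIMARY INDEX (order_id)                 -- Indexing Columns
            PARTITION BY RANGE_N(
                order_date BETWEEN DATE '2023-01-01' -- Partition Start
                           AND     DATE '2024-12-31' -- Partition End
                           EACH INTERVAL '1' MONTH   -- Interval Value / Period
            )
        """)
```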

Sync Data to Target

After the Target is configured, perform the following steps to sync data to target.

Step 1: From the Data Sources menu, select one of the tables, and click the View Source button.

Step 2: Select the source table to be synchronized to Target.

Step 3: Click the Sync Data to Target button.

Step 4: Enter the mandatory fields as listed in the table below:

Job Name: The name of the ingestion job.

Max Parallel Tables: The maximum number of tables that can be crawled at a given time.

Compute Cluster: The template based on which the cluster spins up for each table. The compute clusters that are created by the admin and accessible to the user are listed in the drop-down.

Overwrite Worker Count: The option to override the maximum and minimum number of worker nodes configured in the compute template.

Number of Worker Nodes: The number of worker nodes that will spin up in the cluster.

Save as a Table Group: The option to save the list of tables as a table group.

Step 5: Click the Sync Data to Target button.
