You can configure Target connections and sync data to them as described in the following sections:
Configure a Target
Sync Data to Target
The following are the steps to configure a Target.
Sync Type | Description |
---|---|
Disabled | Default Value. |
Overwrite | Drops and recreates the target. |
Merge | Merges data to the existing table based on the natural key. |
Append | Appends data to the existing target. |
If the table is configured for the Incremental Merge mode of ingestion and selected for the Merge mode of Sync to Target, the exported table can include duplicate data. However, records deleted during ingestion will still be available in the exported table.
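A purely illustrative Python sketch of how the three sync types treat rows that share a natural key (not Infoworks internals; all names and values are hypothetical):

```python
# Illustrative only: how each Sync Type treats rows sharing a natural key.
existing = {("cust-1",): {"id": "cust-1", "city": "Austin"}}
incoming = [{"id": "cust-1", "city": "Dallas"},
            {"id": "cust-2", "city": "Boston"}]

def merge(target, rows, key_cols=("id",)):
    """Merge: upsert each incoming row keyed on the natural key,
    so the target keeps exactly one row per key."""
    for row in rows:
        target[tuple(row[c] for c in key_cols)] = row
    return target

print(merge(dict(existing), incoming))
# -> cust-1 is updated in place, cust-2 is added; no duplicates.
# Append would simply add both rows (cust-1 would then appear twice);
# Overwrite would drop the target and rebuild it from incoming alone.
```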
Step 5: Select one of the Target types and enter all the mandatory fields as provided under the following sections:
Configuring BigQuery Target
Configuring Azure Synapse Analytics Target
Configuring Snowflake Target
Configuring Cosmos DB Target
Configuring Delimited File Target
Configuring Postgres Target
Configuring Teradata Target
Step 6: Click Save to save the configuration settings.
Following are the BigQuery configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select BigQuery. |
Load Job Configuration | |
Export Type | You can select the export type from the dropdown: BQ Load or Spark. Default value: Spark. |
Source File Format | The file format of the source data that is to be exported. |
Target Cloud Type | The type of cloud platform to which the data will be exported. |
Max Size Per Load Job | The maximum size of files, in bytes, per load job. It must be a whole number. If the maximum size per load job is m and the total file size is n, where n >= m, then approximately n/m BQ load jobs run internally (see the sketch after this table). Any single file larger than the maximum size per load job is skipped. Default value: 10000000000000 bytes (approximately 9 TB). The field is editable. |
Connection Parameters | |
Custom | Use custom Target Data Connection by entering connection details for the current target type. |
Target Data Connection | Use one of the existing Target Data connections for the current target type. |
Project ID | The ID of the project in the source BigQuery. |
Authentication Mechanism | The mechanism used to store the security information. The options include System and Service Account Credential File. If the Authentication Mechanism selected is Service Account Credential File, you can upload the file or enter the file location. |
GCS Temporary Storage Bucket | The path to the Google Cloud Storage folder accessible by Infoworks. This is used as temporary storage during load to BigQuery. Ensure that the service account has read/write access to this storage path. |
Target Configurations | |
Dataset Name | Dataset name of the BigQuery target. |
Create a dataset if it does not exist | Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a dataset in BigQuery target. |
Table Name | Table name of the BigQuery target. |
Natural Keys | The required natural keys for the BigQuery target. |
Is Existing Table | When enabled, the existing table behaves as an external table to Infoworks. The table will not be created, dropped, or managed by Infoworks. |
Partition Type | The type of partitioning for the target table. The options include BigQuery Load Time, Date Column, Timestamp Column, and Integer Column. |
Partition Type -> BigQuery Load Time | Includes the load-time partitioning granularity options. For example, Day or Hour. |
Partition Type -> Date Column -> Partition Column | The date column to partition by. For example, last_modified (date). |
Partition Type -> Timestamp Column -> Partition Time Interval | This field is visible when the Partition Type is BigQuery Load Time, Date Column, or Timestamp Column. |
Partition Type -> Integer Column | Includes the Partition Column field with integer columns such as custid (int), postal code (int), or gender (int), along with the Start, End, and Range options. |
Clustering Columns (up to 4 columns allowed) | The columns to be used for clustering. You can select up to four columns. |
Persist Staging Data in GCS | The option to indicate whether the temporary staging data must be persisted in GCS. The default option is to not persist data. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
BigQuery Labels | This field enables you to add labels to your tables in BigQuery. For information about BigQuery labels, refer to BigQuery Labels. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To sync data to BigQuery, see the section Sync Data to Target.
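As a point of reference for the Max Size Per Load Job rule above, here is a hedged Python sketch of the arithmetic; the function name and batching are illustrative, not the Infoworks implementation:

```python
import math

def plan_bq_load_jobs(file_sizes_bytes, max_size_per_job=10_000_000_000_000):
    """Sketch of the rule described above: files larger than the limit
    are skipped, and the remaining bytes are split across roughly
    total_size / max_size load jobs."""
    skipped = [s for s in file_sizes_bytes if s > max_size_per_job]
    loadable = sum(s for s in file_sizes_bytes if s <= max_size_per_job)
    jobs = math.ceil(loadable / max_size_per_job) if loadable else 0
    return jobs, skipped

# 4 TB and 7 TB fit under the ~9 TB default and need 2 jobs in total;
# the 20 TB file exceeds the per-job limit and is skipped.
print(plan_bq_load_jobs([4 * 10**12, 7 * 10**12, 20 * 10**12]))
```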
Following are the Azure Synapse Analytics Target configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the following options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select Azure Synapse Analytics. |
Connection Parameters | |
Custom | Use custom Target Data Connection by entering connection details for the current target type. |
Target Data Connection | Use one of the existing Target Data connections for the current target type. |
Server Name | The server name of the Azure Synapse Analytics. |
User Name | The user name of the Azure Synapse Analytics. |
Authentication Type for Password | Select the authentication type from the dropdown. For example, Infoworks Managed or External Secret Store. If you select Infoworks Managed, then provide Authentication Password for Password. If you select External Secret Store, then select the Secret which contains the password. |
Warehouse | The warehouse name of the Azure Synapse Analytics. |
Additional Params | The optional JDBC parameters. |
Target Configurations | |
Schema Name | Schema name of the Azure Synapse Analytics target. |
Create a schema if it does not exist | Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in Azure Synapse Analytics Target. |
Table Name | Table name of the Azure Synapse Analytics target. |
Indexing Type | The indexing options include clustered columnstore index, clustered index, nonclustered index, and a no-index option, heap. Note: Clustered columnstore and nonclustered indexes are not supported for tables containing column(s) of the STRING data type, and columns of the STRING or TIMESTAMP data types cannot be used as partitioning columns. For details, see https://docs.microsoft.com/en-us/sql/relational-databases/partitions/partitioned-tables-and-indexes?view=sql-server-ver15. A DDL sketch follows this table. |
Distribution Type | Distribution methods for designing distributed tables. The options include Hash, Replication, and Round Robin. |
Natural Keys | The required natural keys for the Azure Synapse Analytics target. |
Partition Column Name | The column name for partition. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To sync data to Azure Synapse Analytics, see the section Sync Data to Target.
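The following hedged sketch shows how the Indexing Type and Distribution Type options above map to plain T-SQL if you were to create a Synapse table by hand; the server, credentials, and table names are placeholders, and this is not the DDL Infoworks itself issues:

```python
import pyodbc

# Placeholders: substitute your own workspace, database, and credentials.
conn = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};"
    "SERVER=myworkspace.sql.azuresynapse.net;"
    "DATABASE=mydb;UID=loader;PWD=<password>",
    autocommit=True,
)
ddl = """
CREATE TABLE dbo.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(18, 2)
)
WITH (
    CLUSTERED COLUMNSTORE INDEX,      -- Indexing Type
    DISTRIBUTION = HASH(customer_id)  -- Distribution Type: Hash
);
"""
conn.cursor().execute(ddl)
```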
Following are the Snowflake configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the following options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select Snowflake. |
Connection Parameters | |
Custom | Use custom Target Data Connection by entering connection details for the current target type. |
Target Data Connection | Use one of the existing Target Data connections for the current target type. |
Connection URL | The URL to connect to Snowflake, which specifies the hostname for the account in the format <account_name>.snowflakecomputing.com. |
Account Name | The Snowflake account name. The full account name might include additional segments that identify the region and cloud platform where the account is hosted. For example, xy12345 for US West (Oregon), xy12345.us-east-2.aws for US East (Ohio), and xy12345.us-east-1 for US East (N. Virginia). |
User Name | Username of the Snowflake account. For example, mark@example.com. |
Authentication Type for Password | Select the authentication type from the dropdown. For example, Infoworks Managed or External Secret Store. If you select Infoworks Managed, then provide Authentication Password for Password. If you select External Secret Store, then select the Secret which contains the password. |
Warehouse | Snowflake warehouse name. For example, sales. |
Additional Params | Optional JDBC parameters. |
Target Configurations | |
Database Name | Database name of the Snowflake target. |
Create a database if it does not exist | Enable this option to create a new database with the name provided above. Ensure that the user has sufficient privileges to create a database in the Snowflake target. |
Schema Name | Schema name of the Snowflake target. |
Create a schema if it does not exist | Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in the Snowflake target. |
Table Name | Table name of the Snowflake target. |
Natural Keys | The required natural keys for the Snowflake target. |
Query Tags | A string that is added to the Snowflake query and can be accessed via Query history in Snowflake. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To sync data to Snowflake, see the section Sync Data to Target.
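As a hedged illustration of the Account Name, Warehouse, and Query Tags fields above, this is how the same values surface in a plain snowflake-connector-python session; the account, user, password, and tag values are placeholders:

```python
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-2.aws",   # Account Name, with region/cloud suffix
    user="mark@example.com",           # User Name
    password="<password>",
    warehouse="sales",                 # Warehouse
)
cur = conn.cursor()
# Query Tags: the tag is attached to the session's queries and is
# visible later in Snowflake's Query History.
cur.execute("ALTER SESSION SET QUERY_TAG = 'iwx_sync_to_target'")
cur.execute("SELECT CURRENT_WAREHOUSE()")
print(cur.fetchone())
```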
Following are the Cosmos DB configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the following options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select Cosmos DB. |
Connection Parameters | |
Custom | Use custom Target Data Connection by entering connection details for the current target type. |
Target Data Connection | Use one of the existing Target Data connections for the current target type. |
End Point | The URL for the Cosmos DB target. |
Authentication Type for Master Key | The master key token is the all-access key token that allows users to have full control of Cosmos DB resources in a particular account. Select the authentication type from the dropdown. For example, Infoworks Managed or External Secret Store. If you select Infoworks Managed, then provide the Master Key as the authentication password. If you select External Secret Store, then select the Secret which contains the Master Key. |
Target Configurations | |
Database Name | Database name of the Cosmos DB target. |
Collection Name | Collection to which the documents must be written. |
Is Existing Collection | Enable this option if the documents must be appended/merged to the existing/provided collection. If unchecked, a new collection will be created. |
Provision Throughput (400 - 10000) | Provision throughput for the container created. The default value is 400. |
Partition Key | The partition key on the Cosmos DB collection to be created. The partition key must be in path format for the JSON document that is written to Cosmos DB, for example, /employee/employee_age (see the sketch after this table). |
Unique Keys | A unique key policy is created when an Azure Cosmos container is created. Unique keys ensure that one or more values within a logical partition are unique. A unique key must be in path format for the JSON document that is written to Cosmos DB, for example, /employee/employee_age. Click Add to add a unique key or click Remove to remove one. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To sync data to Cosmos DB, see the section Sync Data to Target.
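For reference, the Partition Key, Unique Keys, and Provision Throughput fields above correspond to the following azure-cosmos container settings; the endpoint, key, and names are placeholders, and Infoworks performs this creation for you when Is Existing Collection is unchecked:

```python
from azure.cosmos import CosmosClient, PartitionKey

# Placeholders: substitute your own endpoint, master key, and names.
client = CosmosClient("https://myaccount.documents.azure.com:443/",
                      credential="<master-key>")
database = client.create_database_if_not_exists(id="hr")        # Database Name
container = database.create_container(
    id="employees",                                             # Collection Name
    partition_key=PartitionKey(path="/employee/employee_age"),  # Partition Key
    unique_key_policy={
        "uniqueKeys": [{"paths": ["/employee/employee_id"]}]    # Unique Keys
    },
    offer_throughput=400,                                       # Provision Throughput
)
```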
Following are the Delimited File configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the following options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select Delimited File. |
Connection Parameters | |
Environment | Select the environment where you want to run the Sync to Target job. |
Storage | Select the storage where the exported data will reside. |
Target Path | Select the path for the stored data. |
Target Configurations | |
Delimiter Character | The field separator or delimiter to separate out the fields in the file. |
Escape Character | The character used in the delimited files to escape occurrences of the Column Separator and Column Enclosed By characters in data. All the files in the mentioned source path must have the same value. This value must be a single character. The default value is the backslash (\). |
Null Values | The value that replaces null values in the final CSV file. |
Additional Params | Includes the additional parameters. |
Partition Column | The column used to partition the data. |
Enable Text Enclosing | Encloses the exported field values in quotes. |
Add Header Row | Adds a header row to the exported CSV file. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To sync data to Delimited File, see the section Sync Data to Target.
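To make the options above concrete, here is a small illustrative sketch that mimics the same knobs with Python's standard csv module; Infoworks writes the files itself, so this is only a model of the behavior:

```python
import csv

rows = [{"id": 1, "name": "Ann", "city": None},
        {"id": 2, "name": "Bob", "city": "Austin"}]

with open("export.csv", "w", newline="") as f:
    writer = csv.writer(
        f,
        delimiter="|",          # Delimiter Character
        escapechar="\\",        # Escape Character
        quoting=csv.QUOTE_ALL,  # Enable Text Enclosing
    )
    writer.writerow(["id", "name", "city"])  # Add Header Row
    for row in rows:
        # Null Values: replace missing values with a chosen token.
        writer.writerow(["NULL" if v is None else v for v in row.values()])
```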
Following are the Postgres configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the following options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select Postgres. |
Connection Parameters | |
Custom | Use custom Target Data Connection by entering connection details for the current target type. |
Target Data Connection | Use one of the existing Target Data connections for the current target type. |
JDBC URL | The URL that the JDBC driver uses to connect to the database. |
Username | The username for the connection to the database. |
Authentication Type for Password | Select the authentication type from the dropdown. For example, Infoworks Managed or External Secret Store. If you select Infoworks Managed, then provide Authentication Password for Password. If you select External Secret Store, then select the Secret which contains the password. |
Additional Params | The optional JDBC parameters. |
Target Configurations | |
Schema Name | Schema name of the Postgres target. |
Create a schema if it does not exist | Enable this option to create a new schema with the name provided above. Ensure that the user has sufficient privileges to create a schema in Postgres. |
Table Name | Table name of the Postgres target. |
Natural Keys | The required natural keys for the Postgres target. |
Is Existing Table | When enabled, existing table behavior works as an external table to Infoworks. The table will not be created/dropped or managed by Infoworks. |
Partition Type | The type of partition for the table. By default, the Partition Type is set to None. You can also select the following partition types: List: To partition a table by a list of known values. Range: To partition a table by a range of values. Hash: To partition a table using a hash function on the partition key. |
Partition Column | The column used to partition the data in the target. |
Child Partition Table | The name of the child partition table. This field is displayed for the list, range, and hash partition types. You can add multiple child partition tables. While creating the child partition tables, ensure that each row of the source dataset maps to one of the child partitions; if a row does not have a matching child partition, the job fails (see the sketch after this table). |
Is Default | Select this check box to use the default partitioning for the child partition. Default partition is supported only for list partition. |
Partition Type -> List -> Values | Enter the value based on which child partition is created. |
Partition Type -> Range -> Min | The minimum value for the range. |
Partition Type -> Range -> Max | The maximum value for the range. |
Partition Type -> Hash -> Partition Modulus | The partition modulus for the Hash Partition type. |
Partition Type -> Hash -> Remainder | The remainder associated with the child partition. |
Remove | Click Remove to remove the child partition table. |
Add | Click Add to add a new Child Partition table. |
Index Type | The indexing option. By default, the Index Type is set to None. You can also select the following index types: Hash: Used where equality comparison is most important; you can only look up data that matches exactly. BTree: Used for data that can be sorted; handles both equality and range queries. GIST: Used when the data to be indexed is more complex. SP-GIST: Used when the data can be grouped into non-overlapping groupings. GIN: Used when indexing data that consists of multiple elements in a single column, such as arrays, JSON documents (jsonb), or text search documents (tsvector). BRIN: Used for large tables that have columns with some natural sort order. Note: GIST, SP-GIST, and GIN indexes are not supported with primitive data types; for the data types these indexes support, refer to the PostgreSQL documentation. |
Index Column | The Index column for the target table to be created. |
Is Unique Index | Enable this option to define a unique constraint on the btree index. This field is available only for BTree type indexing. Note: Unique constraints on partitioned tables must include all the partition key columns. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To sync data to Postgres, see the section Sync Data to Target.
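As a hedged sketch of what the Partition Type, Child Partition Table, and Index Type options amount to in plain PostgreSQL DDL (issued here through psycopg2 with placeholder connection details; not the DDL Infoworks itself generates):

```python
import psycopg2

conn = psycopg2.connect("host=localhost dbname=analytics user=loader")
cur = conn.cursor()
# Partition Type = Range on the Partition Column sale_date.
cur.execute("""
    CREATE TABLE sales (
        sale_id   BIGINT,
        sale_date DATE NOT NULL
    ) PARTITION BY RANGE (sale_date)
""")
# Child Partition Table with Range Min/Max: every exported row must fall
# into one of these ranges, or the job fails.
cur.execute("""
    CREATE TABLE sales_2023 PARTITION OF sales
        FOR VALUES FROM ('2023-01-01') TO ('2024-01-01')
""")
# Index Type = BTree on the Index Column.
cur.execute("CREATE INDEX sales_date_idx ON sales USING btree (sale_date)")
conn.commit()
```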
Following are the Teradata configuration details:
Field | Description |
---|---|
Sync Type | The options include Overwrite, Append, Merge, or Disabled. The Sync to Target functionality is disabled by default. Choose one of the following options to enable it. Overwrite: Drops and recreates the target. Append: Appends data to the existing target. Merge: Merges data to the existing table based on the natural key. |
Target Type | The type of connection. Select Teradata. |
Connection Parameters | |
Custom | Use custom Target Data Connection by entering connection details for the current target type. |
Target Data Connection | Use one of the existing Target Data connections for the current target type. |
Server Name | The server name of the Teradata target. |
User Name | The user name of the Teradata target. |
Authentication Type for Password | Select the authentication type from the dropdown. For example, Infoworks Managed or External Secret Store. If you select Infoworks Managed, then provide Authentication Password for Password. If you select External Secret Store, then select the Secret which contains the password. |
Additional Params | The optional JDBC parameters. |
Target Configurations | |
Database Name | Database name of the Teradata target. |
Natural Keys | The required natural keys for the Teradata target. |
Target Table Type | This option classifies tables as SET or MULTISET tables based on how the duplicate records are handled. SET tables do not allow duplicate rows to be inserted, and it is not allowed to create a duplicate row with an UPDATE statement. MULTISET tables do not have these restrictions. |
Partition Type | The type of partition for the table. By default, the Partition Type is set to None. You can also select Range_N partition type. |
Partition Column | The column to be used to partition tables in the target. The column must be of the Date data type (see the sketch after this table). |
Partition Start | The start value for the Range_N partition. |
Partition End | The end value for the Range_N partition. |
Partition Interval Value | The interval value for the Range_N partition. |
Partition Interval Period | The interval period for the Range_N partition. |
Indexing Columns | The Primary Index column(s) for the target table to be created. |
Unique Index | Enable this option to define a unique constraint on the index. |
Sync Table Schema | The option to synchronize exported table schema with source table schema. |
Columns | |
Exclude Columns | Select the appropriate column name and click Exclude Columns to exclude columns in the target table. |
Include Columns | Select the appropriate column name and click Include Columns to include columns in the target table. |
To repartition the data during export to Teradata, set the configuration key export_datawriter_conf with the value repartition_for_td_export=true.
To sync data to Teradata, see the section Sync Data to Target.
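For reference, a hedged sketch of how the Target Table Type, Range_N partitioning, and Primary Index options above read in Teradata DDL, sent through the teradatasql driver; the host, credentials, and names are placeholders:

```python
import teradatasql

# Placeholders: substitute your own host, credentials, and database.
with teradatasql.connect(host="tdhost", user="loader",
                         password="<password>") as conn:
    conn.cursor().execute("""
        CREATE MULTISET TABLE analytics.sales (    -- Target Table Type
            sale_id   BIGINT,
            sale_date DATE
        )
        PRIMARY INDEX (sale_id)                    -- Indexing Columns
        PARTITION BY RANGE_N(
            sale_date BETWEEN DATE '2023-01-01'    -- Partition Start
                      AND     DATE '2023-12-31'    -- Partition End
            EACH INTERVAL '1' MONTH                -- Interval Value / Period
        )
    """)
```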
After the Target is configured, perform the following steps to sync data to target.
Step 1: From the Data Sources menu, select one of the sources, and click the View Source button.
Step 2: Select the source table to be synchronized to Target.
Step 3: Click the Sync Data to Target button.
Step 4: Enter the mandatory fields as listed in the table below:
Field | Description |
---|---|
Job Name | The name of the ingestion job. |
Max Parallel Tables | The maximum number of tables that can be crawled at a given instant. |
Compute Cluster | The template based on which the cluster spins up for each table. The compute clusters created by the admin and accessible to the user are listed in the dropdown. |
Overwrite Worker Count | The option to override the maximum and minimum number of worker node values as configured in the compute template. |
Number of Worker Nodes | The number of worker nodes that will spin up in the cluster. |
Save as a Table Group | The option to save the list of tables as a table group. |
Step 5: Click the Sync Data to Target button.