Infoworks 6.1.3
Onboard Data
Additional Connectors

Onboarding Data from Amazon Redshift

Creating an Amazon Redshift Source

For creating an Amazon Redshift source, see Configuring Additional Connectors. Ensure that the Source Type selected is Amazon Redshift.

For configuring an Amazon Redshift Table, see Configuring a Table.

Amazon Redshift Configurations

FieldDescription
Fetch Data UsingThe mechanism through which Infoworks fetches data from the database.
Connection URLThe connection URL through which Infoworks connects to the database.
UsernameThe username for the connection to the database.
PasswordThe password for the username provided.
Source SchemaThe schema in the database to be crawled. The schema value is case sensitive.

Once the settings are saved, you can test the connection.

Configuring an Amazon Redshift Table

With the source metadata in the catalog, you can now configure the table for CDC and incremental synchronization.

  • Click the Configure Tables tab.
  • Click on the required table from the list of tables available for the source.
  • Provide the ingestion configuration details.
FieldDescription
Ingestion Configuration
Ingest TypeThe type of synchronization for the table. The options include full refresh and incremental.
Natural KeysThe combination of keys to uniquely identify the row. This field is mandatory in incremental ingestion tables. It helps in identifying and merging incremental data with the already existing data on target.
Incremental ModeThe option to indicate if the incremental data must be appended or merged to the base table. This field is displayed only for incremental ingestion. The options include append and merge.
Watermark ColumnThe watermark column to identify the incremental records.
Ingest subset of dataThe option to configure filter conditions to ingest a subset of data. For more details, see Filter Query for RDBMS Sources.
SQL Queries
Select QueryThe additional inputs required with a select clause, like optimization hints or lock changes. The default value is SELECT ${columnList} FROM ., where, ${columnList} is the placeholder for all the columns to be queried and and will be pre-filled with the appropriate values.<br><br>$inline[badge,Note,primary] Ensure to provide the correct SQL query syntax as per the DB used.
Target Configuration
Target Table NameThe name of the target table. The name cannot contain special characters except underscore.
Schema NameThe name of the target schema. The name cannot contain special characters except underscore.
Storage FormatThe format in which the tables must be stored. The options include Read Optimised (Delta) and Write Optimised (Avro).
Partition ColumnThe column used to partition the data in target. Selecting the Create Derived Column option allows you to derive a column from the partition column. This option is enabled only if the partition column datatype is date or timestamp. <br><br>Provide the Derived Column Function and Derived Column Name. The data will be partitioned based on this derived column.
Optimization Configuration
Split By ColumnThe column used to crawl the table in parallel with multiple connections to database. Split-by column can be an existing column in the data. Any column for which minimum and maximum values can be computed, can be a split-by key. <br><br>Select the Create Derived Split Column option and provide the Derived Split Column Function to derive a column from the split-by column. This option is enabled only if the split-by column datatype is date or timestamp. The data will be split based on the derived value.
Generate History ViewThe option to preserve data in the history table. If enabled, after each CDC, the data will be appended to the history table.

Establishing a Connection

To create a JDBC URL, see Establishing a Connection.