Onboarding Data from Amazon Redshift

Creating an Amazon Redshift Source

For creating an Amazon Redshift source, see Configuring Additional Connectors. Ensure that the Source Type selected is Amazon Redshift.

For configuring an Amazon Redshift Table, see Configuring a Table.

Amazon Redshift Configurations

Field	Description
Fetch Data Using	The mechanism through which Infoworks fetches data from the database.
Connection URL	The connection URL through which Infoworks connects to the database.
Username	The username for the connection to the database.
Password	The password for the username provided.
Source Schema	The schema in the database to be crawled. The schema value is case sensitive.

Once the settings are saved, you can test the connection.

Configuring an Amazon Redshift Table

With the source metadata in the catalog, you can now configure the table for CDC and incremental synchronization.

Click the Configure Tables tab.
Click on the required table from the list of tables available for the source.
Provide the ingestion configuration details.

Field	Description
Ingestion Configuration
Ingest Type	The type of synchronization for the table. The options include full refresh and incremental.
Natural Keys	The combination of keys to uniquely identify the row. This field is mandatory in incremental ingestion tables. It helps in identifying and merging incremental data with the already existing data on target.
Incremental Mode	The option to indicate if the incremental data must be appended or merged to the base table. This field is displayed only for incremental ingestion. The options include append and merge.
Watermark Column	The watermark column to identify the incremental records.
Ingest subset of data	The option to configure filter conditions to ingest a subset of data. For more details, see Filter Query for RDBMS Sources.
SQL Queries
Select Query	The additional inputs required with a select clause, like optimization hints or lock changes. The default value is `SELECT ${columnList} FROM .`, where, `${columnList}` is the placeholder for all the columns to be queried and `and` will be pre-filled with the appropriate values.<br><br> Note Ensure to provide the correct SQL query syntax as per the DB used.
Target Configuration
Target Table Name	The name of the target table. The name cannot contain special characters except underscore.
Schema Name	The name of the target schema. The name cannot contain special characters except underscore.
Storage Format	The format in which the tables must be stored. The options include Read Optimised (Delta) and Write Optimised (Avro).
Partition Column	The column used to partition the data in target. Selecting the Create Derived Column option allows you to derive a column from the partition column. This option is enabled only if the partition column datatype is date or timestamp. <br><br>Provide the Derived Column Function and Derived Column Name. The data will be partitioned based on this derived column.
Optimization Configuration
Split By Column	The column used to crawl the table in parallel with multiple connections to database. Split-by column can be an existing column in the data. Any column for which minimum and maximum values can be computed, can be a split-by key. <br><br>Select the Create Derived Split Column option and provide the Derived Split Column Function to derive a column from the split-by column. This option is enabled only if the split-by column datatype is date or timestamp. The data will be split based on the derived value.
Generate History View	The option to preserve data in the history table. If enabled, after each CDC, the data will be appended to the history table.

Establishing a Connection

To create a JDBC URL, see Establishing a Connection.

Last updated on

Was this page helpful?