To configure a table, perform the following steps:
- Click the Configure Tables tab.
- Enter the ingestion configuration details.
- Click Save.
Select Query: Additional inputs required with a select clause, such as optimization hints or lock changes. The default value is SELECT ${columnList} FROM <schemaName>.<tableName>, where ${columnList} is a placeholder for all the columns to be queried, and <schemaName> and <tableName> are pre-filled with the appropriate values. This option is available only for Generic JDBC sources.
NOTE
Ensure that the SQL query syntax is correct for the database in use.
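For example, a customized Select Query for a SQL Server source might append a table hint. The hint shown here is illustrative; keep the ${columnList}, <schemaName>, and <tableName> placeholders intact so Infoworks can substitute them.

```sql
-- Illustrative override for a SQL Server source: the NOLOCK hint lets the
-- ingestion query read without taking shared locks. Use syntax valid for
-- your database.
SELECT ${columnList} FROM <schemaName>.<tableName> WITH (NOLOCK)
```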
User Managed Table: With Data
Available for: Incremental Sync only
Use this option when the target table already contains historical data, and you want to load only incremental data from the source via Infoworks. Key Points:
- Ideal for scenarios where the target table has been populated by another system (e.g., external system or a previous Infoworks Ingestion Job).
- Merge operations or similar actions (such as insert-overwrite) can be performed from the first job onward.
- There is no need to truncate the target table.
- Truncate privileges are not required.
- For append-only use cases, only new rows will be inserted, leaving existing rows unchanged. Only insert privileges are needed in such cases.
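As an illustration of the kind of merge a first incremental job can perform against a pre-populated target, consider the following sketch. Table and column names are hypothetical; the point is that only DML privileges are needed, not TRUNCATE.

```sql
-- Hypothetical merge of staged incremental rows into a target table that
-- already holds historical data. No truncation is performed.
MERGE INTO target_schema.customers AS t
USING staged_increment AS s
  ON t.customer_id = s.customer_id
WHEN MATCHED THEN
  UPDATE SET t.email = s.email, t.updated_at = s.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, email, updated_at)
  VALUES (s.customer_id, s.email, s.updated_at);
```

For append-only use cases, the job would instead issue plain INSERT statements, which is why only insert privileges are required there.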
User Managed Table: Without Data
Available for: Both Full Refresh and Incremental Sync
Use this option when Infoworks is responsible for ingesting all data into the target table, and you do not wish to grant DDL privileges (such as table creation) to Infoworks. Key Points:
- You must manually create the empty target table, ensuring that the schema matches the source data.
- If the target table already contains data, Infoworks will truncate the table before the first job.
- For Full Refresh, each job will truncate and reload the entire dataset.
- For Incremental Sync, the first job will perform a full refresh, truncating the table before loading incremental data.
- Truncate privileges are required on the target table.
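For example, the empty target table might be created manually ahead of time, as sketched below. The names and types are illustrative; match them to your source schema exactly.

```sql
-- Illustrative manual creation of the empty target table (Databricks SQL
-- types shown). Infoworks then needs only INSERT and TRUNCATE privileges
-- on it, not CREATE TABLE.
CREATE TABLE target_schema.customers (
  customer_id BIGINT,
  email       STRING,
  updated_at  TIMESTAMP
);
```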
If you select Delimited Text Files from the Storage Format dropdown, the following fields will appear.
NOTE
Infoworks does not support schema synchronization for the Delimited Text Files storage format.
NOTE
For a Streaming Source, if a target table is Hive-compatible, the column datatype is set to String.
You can view the following pages under table schema:
You can edit the target datatype of columns using the following steps:
- Click the Data Sources menu and click View Source for the required source.
- Click the Configure button for the required table.
- Click the Table Schema tab and edit the target data type for required columns. The Databricks metastore table will be created with the specified datatypes.
- The Options field allows you to edit the precision and scale of the specific column.
- You can select a relevant column and click Manage Column to Exclude, Include or Bulk Edit.
- Select Include if you want the column to be a part of the target.
- Select Exclude if you want the column to be excluded from the target.
- Select Bulk Edit to edit the precision and scale of all the columns.
NOTE
Excluded columns cannot be part of Sync to Target.
- Click Save Schema.
Following are the datatype conversions supported:
You can apply Spark Expressions on the columns in the Ingestion table. Spark expressions can be applied in the Transform Derivation field. The Transform Derivation field appears only if Transform Mode is set to Advanced.
To apply Spark expressions on a newly created column:
Step 1: Go to the Table Schema tab.
Step 2: Click Add Column.
Step 3: Set the Transform Mode to Advanced.
Step 4: Provide the Spark Expression in the Transform Derivation field.
NOTE
In the Transform Derivation field, Spark functions and column names appear in the auto-suggestion dropdown.
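For example, a new column could derive its value by combining existing columns. The expression below uses Spark SQL syntax; the column names are illustrative.

```sql
-- Illustrative Spark SQL expression for the Transform Derivation field:
-- concatenates two existing columns into the new column's value.
concat(FNAME, ' ', LNAME)
```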
To apply Spark expressions for an existing column:
Step 1: Click the Add button under the Transform Derivation column.
Step 2: Set the Transform Mode to Advanced.
Step 3: Provide the Spark Expression in the Transform Derivation field.
NOTE
For sources created on Snowflake environments, the time data type is supported. Additionally, a textbox accepts the precision for time data type columns; its value is prefilled during metacrawl but can be edited.
The Sample Data page includes the sample data of the source table.
The Data page includes a visual representation of all the Databricks metastore tables.
The Lineage page displays all the instances where the data of the table is being used.
To view Source Audits, perform the following steps:
- Click the Data Sources menu and click View Source for the required source.
- Click the Source Setup tab and scroll down to the Audits section. The audit logs for the source are displayed.
Alternatively, click the Tables tab > Configuration > Table Schema > Audits to view the audits section.
NOTE
This screen displays only the modified properties. If a property was left unchanged during an update, it is not displayed as a change, but the property is still stored.
The following operators appear in source audit logs:
- $push - displayed when an entity is added.
- $pull - displayed when an entity is deleted.
In table audit logs, the following entries are displayed when column details are updated:
- columns.<columnIndexInTable>.<key>:<UpdatedValue> - indicates a single-column update.
- columns:[<list of all the columns with details>] - indicates a bulk column update.
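For instance, a single-column update that changes the precision of the fourth column (index 3) might be logged as follows; the key and value here are illustrative.

```
columns.3.precision:10
```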
Audit Trail
The audit trail displays records of all fields that have been created, updated, or deleted. Each entry includes:
- Action Type - Indicates the type of operation performed by the user: Created, Updated, or Deleted.
- Modification - Shows the details of the change made to a field. For updated fields, both the previous value and the updated value are displayed.
You can use the Target Column Order feature to set the target table columns in a specific order.
Consider the following scenario: FNAME is assigned target column order 1 and LNAME order 2. The FNAME column will then appear at position 1 in the target table, LNAME at position 2, and so on.
NOTE
The Target Column Order value is unique for each Target Column.
You can reorder columns during ingestion from the UI by specifying the target column order in which the columns should land in the target.
To reorder target columns:
Step 1: Go to the Table Schema tab.
Step 2: Click Reorder Columns.
Step 3: Change the order of the columns by double-clicking a column row. You can also change the order by clicking the arrows on the right side of each row.
Step 4: Click Apply.
Step 5: To view the target column order, click Target View in the top-left corner of the Edit Schema page.
NOTE
This section applies to all ingestion sources except Streaming sources.
Enter the following fields under Target Configuration.
Following are the steps to set advanced configuration for a table:
- Click the Data Catalog menu and click Ingest for the required source.
NOTE
For an already ingested table, click View Source, click the Tables tab, click Configure for the required table, and click the Advanced Configurations tab.
- Click the Configure Tables tab, click the Advanced Configurations tab, and click Add Configuration.
- Enter key, value, and description. You can also select the configuration from the list displayed.
For common ingestion job advanced configurations, refer to the following:
Partitioning of Temporary Storage
This feature is not used in the 5.2 pilot release.
Subscribers
You can add subscribers to be notified about ingestion jobs at the table level.
Click Add Subscriber, and enter the following fields:
- Emails: The email ID of the subscriber.
- Notify Via: The means of notification. Available options are Email and Slack.
- Notify For: The jobs for which subscribers must be notified.
With the source metadata in the catalog, you can now configure the table for CDC and incremental synchronization. Two Cloud Data Warehouse types are supported: Snowflake and BigQuery. Click the Configuration link for the desired table.
In the General Configurations tab, provide the following Ingestion Configuration details:
NOTE
- If the combination of schema name, database name, and table name is the same across different tables in the same environment, an error appears indicating that the combination conflicts with another table.
- For Snowflake, the combination of table name, schema name, and database name is checked.
- For BigQuery, the combination of dataset name and table name is checked.
- For all other environments, the combination of schema name and table name is checked.