Infoworks 6.1.2
Onboard Data

Creating a Source

To create a source, perform the following steps:

  1. Log in to Infoworks.
  2. In the left navigation pane, click the Data Sources icon. The Data Sources page appears.
  3. Click Onboard New Data. The Source Connectors page appears with the list of all available connectors.

NOTE You can filter the list to show only licensed connectors by selecting the Show Licensed Connectors only check box. You can also filter by category or search for a specific connector.

  4. Click the required connector. The Setup Source page of the selected connector appears.
  5. On the Setup Source page, enter the following details:
  • Source Name: A name for the source that appears in the Infoworks user interface. The source name must be unique and must not contain spaces or special characters other than underscore. For example, Customer_Details. (A sketch of these naming rules appears after the steps below.)
  • Source File Location: The storage system location where your files are stored. You can select one of the following options:
      • Databricks File System (DBFS)
      • Remote Server (using SFTP)
      • Cloud Storage
  • Data Environment: The environment on which the source is defined. Select the environment from the drop-down list. The list includes the environments created by the Admin. For more details, refer to the Managing Environments section.
  • Catalog Name: The name of the catalog in the data lake. The name can only contain letters, numbers, hyphens, and underscores.

NOTE The Catalog Name field is available only for Unity Catalog-enabled data environments.

  • Schema Name: The name of the schema in the data lake. The name can only contain letters, numbers, hyphens, and underscores.
  • Storage: The target storage where the data will be stored. Select the storage from the drop-down list. The storage options displayed depend on the environment selected.
  • Base Location: The location of the target on the file system. The path must start with / and must only contain letters, numbers, hyphens, and underscores. For example, /iw/sources/samplesource.
  • Staging Schema Name: The name of the schema where all the temporary tables (such as CDC and segment tables) managed by Infoworks are created. Ensure that Infoworks has ALL permissions on this schema. This field is optional; if a staging schema name is not provided, Infoworks uses the Target Schema. This field appears only when the Data Environment is Snowflake. Ensure either that target table names are unique across schemas or that two tables with the same name do not run in parallel.
  • BigQuery Dataset Name: The dataset name of the BigQuery target. This field appears only when the Data Environment is BigQuery.
  • Staging Dataset Name: The name of the dataset where all the temporary tables (such as CDC and segment tables) managed by Infoworks are created. Ensure that Infoworks has ALL permissions on this dataset. This field is optional; if a staging dataset name is not provided, Infoworks uses the Target Dataset. This field appears only when the Data Environment is BigQuery.

  6. Click Save.

NOTE For information on managing the access control of sources created, refer to Managing Access Control of Sources.
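The naming rules above (Source Name, Schema Name, Catalog Name, and Base Location) can be checked before saving. The following is a minimal Python sketch of those rules as stated; it is illustrative only, and Infoworks performs the authoritative validation when you click Save. The uniqueness requirement on Source Name is out of scope here.

```python
import re

# Illustrative sketch of the naming rules described above; Infoworks
# performs the authoritative validation when you click Save.
SOURCE_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")         # no spaces; underscore is the only special character
SCHEMA_OR_CATALOG_RE = re.compile(r"^[A-Za-z0-9_-]+$")  # letters, numbers, hyphen, underscore
BASE_LOCATION_RE = re.compile(r"^/[A-Za-z0-9_/-]*$")    # must start with /; slashes separate path segments

def validate(source_name: str, schema_name: str, base_location: str) -> list:
    """Return a list of human-readable validation errors (empty if all valid)."""
    errors = []
    if not SOURCE_NAME_RE.match(source_name):
        errors.append("Source Name may contain only letters, numbers, and underscores.")
    if not SCHEMA_OR_CATALOG_RE.match(schema_name):
        errors.append("Schema Name may contain only letters, numbers, hyphens, and underscores.")
    if not BASE_LOCATION_RE.match(base_location):
        errors.append("Base Location must start with / and contain only letters, numbers, hyphens, and underscores.")
    return errors

# Example using the values from the field descriptions above:
print(validate("Customer_Details", "sample_schema", "/iw/sources/samplesource"))  # []
```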

BigQuery Labels

This feature enables you to add labels to your tables in BigQuery.

NOTE The BigQuery Labels section appears only when you select a BigQuery data environment from the drop-down list.

  • Key: The key for the specific BigQuery label.

NOTE The Key field cannot be empty.

  • Value: The value for the specific BigQuery label.
  • Description: A description of the BigQuery label.
  • Actions: The actions users can perform on the label, for example, Add or Delete.
  • Add Labels: Adds a new BigQuery label.
  • Infoworks Managed Labels: If this check box is selected, these labels overwrite the existing BigQuery labels. If this check box is not selected, these labels are appended to the existing BigQuery labels.

For more information, refer to BigQuery Labels.
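To see what these labels look like on the BigQuery side, the following is a minimal sketch using the google-cloud-bigquery Python client. The project, dataset, and table identifiers and the label keys and values are placeholders; in practice, Infoworks applies the labels for you during ingestion.

```python
from google.cloud import bigquery

# Illustration only: Infoworks applies labels during ingestion. The
# project/dataset/table identifiers below are placeholders.
client = bigquery.Client()
table = client.get_table("my-project.my_dataset.customer_details")

# BigQuery labels are key/value string pairs, matching the Key and
# Value fields described above.
table.labels = {"team": "data_engineering", "env": "prod"}
client.update_table(table, ["labels"])  # persist only the labels field

print(table.labels)
```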

Job Hooks

Job Hooks are external Python or Bash scripts that can run before or after a source ingestion job in the data plane. These scripts can access various job properties as environment variables. For more information about job hooks, refer to Using a Job Hook.
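The following is a minimal Python sketch of a pre- or post-ingestion job hook. The environment variable names (IW_JOB_ID, IW_JOB_STATUS) are placeholders for illustration; the actual job properties Infoworks exposes are listed in Using a Job Hook.

```python
#!/usr/bin/env python3
"""Minimal job hook sketch run before or after a source ingestion job.

The environment variable names below are placeholders; refer to
Using a Job Hook for the actual job properties Infoworks exposes.
"""
import os
import sys

def main() -> int:
    # Job properties arrive as environment variables in the data plane.
    job_id = os.environ.get("IW_JOB_ID", "<unknown>")          # placeholder name
    job_status = os.environ.get("IW_JOB_STATUS", "<unknown>")  # placeholder name

    print(f"Job hook invoked for job {job_id} (status: {job_status})")

    # Exit code 0 conventionally signals success; non-zero signals failure.
    return 0

if __name__ == "__main__":
    sys.exit(main())
```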
