Configuring Additional Connectors
This section explains how to onboard data from a CDATA source to the data lake by configuring the compute environment and base location.
Before initiating the onboarding process, you must define an Environment. For more information on how to define the Environment, refer to the $link[page,321250,Managing Environments] section.
The Infoworks onboarding process includes the following steps:
$link[page,321304,Configure Source and Target,configure-source-and-target]
$link[page,321304,Select Tables,select-tables]
$link[page,321304,Configure Synchronization,configure-synchronization]
$link[page,321304,Onboard Data,onboard-data]
Configure Source and Target
After defining the compute environment, you can set up a CDATA source for ingestion. This includes configuring the connection URL, user credentials, and the target location where you want to onboard the data.
To define the source and target, you must perform the following steps:
- Select the Data Sources icon on the left navigation and click Onboard New Data.
- In Source Type, select the required CDATA source from the Categories list. For example, to create an Act CRM data source, select Act CRM.
Now, you can configure the source connection properties and select the data environment where you want to onboard the data.
Configure the following fields:
Optionally, click the Save and Test Connection button to save the settings and verify that Infoworks can connect to the source system.
Click Next to proceed to the next step.
Select Tables
Tables can be selected in two ways: by browsing the source to choose the tables, or by configuring a custom query.
- Browse Source: You can browse the source to select the tables to be onboarded as per requirement. You can add more tables later.
- In the Browse Source tab, select the source tables you want to onboard.
- Filter the tables by Source schema, Table Name, and Table Type and refresh the list.
- Specify the Target Schema Name and Target Table Name in the respective fields for the data environment.
- Click Next to proceed.
- Add Query As Table: You can create a table using a custom query so that you can ingest a subset of the data from a table in the source or data belonging to more than one table in the source.
- Click the Add Query As Table tab, and then click the Add Query As Table button. A pop-up window appears.
- Enter the Query, Target Schema Name, and Target Table Name in the relevant fields, and click Save.
- If you want to add more tables, click Add Query As Table and repeat the previous step.
$inline[badge,NOTE,primary] Click the Edit button to edit the entered fields, and the Preview button to preview the schema and sample data.
- Click Next to start the metadata crawl.
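To illustrate the kind of query that Add Query As Table is meant for, the following sketch runs a custom query that both filters rows and combines two tables. It uses an in-memory SQLite database purely as a stand-in; the table names, column names, and data are hypothetical, and the SQL dialect accepted by a given CDATA source may differ.

```python
import sqlite3

# In-memory SQLite database used only as a stand-in for a source system.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT, company_id INTEGER);
    CREATE TABLE companies (id INTEGER PRIMARY KEY, name TEXT, country TEXT);
    INSERT INTO companies VALUES (1, 'Acme', 'US'), (2, 'Globex', 'DE');
    INSERT INTO contacts VALUES (10, 'Ada', 1), (11, 'Linus', 2), (12, 'Grace', 1);
""")

# A custom query that selects a subset of rows and combines two source tables.
query = """
    SELECT c.id AS contact_id, c.name AS contact_name, co.name AS company_name
    FROM contacts c
    JOIN companies co ON co.id = c.company_id
    WHERE co.country = 'US'
    ORDER BY c.id
"""
rows = conn.execute(query).fetchall()
print(rows)
```

The result set of such a query is what would be onboarded as a single target table.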
Configure Synchronization
The tables are automatically set to Full refresh by default.
To modify the configuration and synchronize an onboarded table:
- Click the Configuration link for the desired table.
- Enter the configuration details as described in the Configuring a Table section below.
Configuring a Table
With the source metadata in the catalog, you can now configure the table for CDC and incremental synchronization.
- Click the Configure Tables tab.
- Click on the required table from the list of tables available for the source.
- Provide the ingestion configuration details.
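As a conceptual aid only, the difference between a full refresh and an incremental synchronization can be sketched as follows. This is not the Infoworks implementation; the function names, the `updated_at` watermark column, and the append-only behavior are assumptions chosen for illustration.

```python
def full_refresh(source_rows):
    """Replace the target with a complete copy of the source."""
    return list(source_rows)


def incremental_sync(target_rows, source_rows, watermark_column, last_watermark):
    """Append only the source rows whose watermark is newer than the last one seen."""
    new_rows = [r for r in source_rows if r[watermark_column] > last_watermark]
    merged = target_rows + new_rows
    # Keep the old watermark when nothing new arrived.
    new_watermark = max((r[watermark_column] for r in new_rows), default=last_watermark)
    return merged, new_watermark


# Hypothetical source data with an ISO-formatted date as the watermark column.
source = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 2, "updated_at": "2024-01-05"},
]
target = full_refresh(source)

# A new row arrives in the source; only that row is picked up.
source.append({"id": 3, "updated_at": "2024-01-09"})
target, watermark = incremental_sync(target, source, "updated_at", "2024-01-05")
print(len(target), watermark)
```

This sketch only appends new rows; handling updates to existing rows would additionally require a merge on a key column.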
$inline[badge,Tips and Tricks,success]
There are some constraints on the exposed SQL query that differ from generic SQL rules:
- If the column value matched in the WHERE clause is of string data type, the value must be enclosed in single quotes, not double quotes.
Query that fails: `select ${columnList} from "infoworks-new"."active_tokens" where '_id'="5f843fde639afc3ab0f2bf39"`
Query that succeeds: `select ${columnList} from "infoworks-new"."active_tokens" where '_id'='5f843fde639afc3ab0f2bf39'`
- If a column name starts with '_', the column name must be enclosed in single quotes in the exposed query.
Query that fails: `select ${columnList} from "infoworks-new"."access_logs" WHERE _id ='f2e08b8df0dbf1fd113bbc73'`
Query that succeeds: `select ${columnList} from "infoworks-new"."access_logs" WHERE '_id' ='f2e08b8df0dbf1fd113bbc73'`
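The two quoting rules above can be applied mechanically when queries are generated by a script. The following is a minimal sketch; the helper names are hypothetical, and because the handling of embedded single quotes is not documented here, the sketch rejects such values rather than guessing at an escape rule.

```python
def quote_column(name: str) -> str:
    """Wrap a column name in single quotes when it starts with an underscore."""
    return f"'{name}'" if name.startswith("_") else name


def quote_string_value(value: str) -> str:
    """Enclose a string value in single quotes (never double quotes)."""
    if "'" in value:
        # Escaping rules for embedded quotes are not covered here, so fail loudly.
        raise ValueError("values containing single quotes are not handled by this sketch")
    return f"'{value}'"


def build_where(column: str, value: str) -> str:
    """Build a WHERE clause that follows both quoting rules."""
    return f"WHERE {quote_column(column)} = {quote_string_value(value)}"


query = (
    'select ${columnList} from "infoworks-new"."access_logs" '
    + build_where("_id", "f2e08b8df0dbf1fd113bbc73")
)
print(query)
```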
Adding a column to the table
After the metadata crawl is complete, you can add a target column to the table.
A target column is a special column that you need in the target table in addition to the columns present in the source.
You can select the data type for the column.
You can select either of the following transformation modes: Simple or Advanced.
Simple Mode
In this mode, you must add a transformation function to be applied to the column. A target column with no transformation function applied will have null values in the target.
Advanced Mode
In this mode, you can provide a Spark expression for the column. For more information, refer to $link[page,321319,Adding Transform Derivation,editing-table-schema].
$inline[badge,NOTE,primary] When a table is in the ready state (already ingested), schema editing is disabled.
Onboard Data
The final step is to onboard the tables. You can also schedule the onboarding so that the tables are periodically synchronized.
Click Onboard Data at the bottom right of the screen to onboard the data.
On the Success message pop-up, click View Data Catalog to onboard additional data, or click View Job Status to monitor the status of the submitted onboarding job.
Additional Options
Refer to the sections below if you want to configure additional parameters for the source.
Additional Connection Parameters
You can set additional connection parameters to the source as key-value pairs. These values will be used when connecting to the source database.
To add additional connection parameters:
- Click Add and enter the Key and Value fields.
- Select Encrypt Value to encrypt the value. For example, Password.
- Select the Is Active check box to set the parameter as active.
- Click Save to save the configuration details. The parameters appear in the Additional Connection Parameters section.
- You can edit or delete the parameters using the Edit or Delete icons.
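The fields described above can be pictured as a small data model. The following sketch is illustrative only: the class and function names are hypothetical, `Timeout` is a made-up key, and the assumption that inactive parameters are not passed to the connection is not confirmed by this page.

```python
from dataclasses import dataclass


@dataclass
class ConnectionParam:
    key: str
    value: str
    encrypt: bool = False   # mirrors the Encrypt Value option
    is_active: bool = True  # mirrors the Is Active check box


def active_properties(params):
    """Return only the active parameters as a key-value mapping (assumed behavior)."""
    return {p.key: p.value for p in params if p.is_active}


def masked_view(params):
    """Return a display-safe view in which encrypted values are hidden."""
    return {p.key: ("********" if p.encrypt else p.value) for p in params}


params = [
    ConnectionParam("AuthScheme", "OAuth"),
    ConnectionParam("Password", "s3cret", encrypt=True),
    ConnectionParam("Timeout", "60", is_active=False),  # hypothetical key
]
print(active_properties(params))
print(masked_view(params))
```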
Configuration Migration
For migrating configurations of a CDATA source table, see $link[page,321324,auto$]
Optional: Source Extensions
To add a source extension to process the data before onboarding, see $link[page,321504,auto$]
Optional: Advanced Configuration
To set the Advanced Configurations at the source level, see $link[page,321325,Setting Source-Level Configurations,setting-source-level-configurations].
Optional: Subscribers
You can notify the Subscribers for the ingestion jobs at the source level. To configure the list of subscribers, see $link[page,321326,auto$]
Delete Source
Click Delete Source to delete the source configured.
Onboard Additional Tables
To onboard more tables from the same data source, follow these steps:
- Navigate to the already configured data source.
- Click the Onboard More Tables button.
- Select the tables and provide the necessary details.
Enabling OAuth for CDATA Drivers
The CData JDBC drivers provide support for retrieving, refreshing, and storing authentication tokens. CDATA offers connectors, such as REST-based connectors, that support OAuth authentication. OAuth authentication uses access tokens and refresh tokens to make connections to the source application. OAuth can be enabled for all the CDATA drivers that support it.
Since Infoworks jobs run on multiple workers, each worker needs the access token and refresh token so that it can access the data from the required application. If the access token expires or is invalid, the driver gets a new access token using the refresh token.
The OAuth support implements the ITokenStore interface offered by CDATA and uses this interface to retrieve or modify tokens. Infoworks uses MongoDB storage to implement this interface to read or write tokens, as all the workers have access to the MongoDB storage.
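The idea of a token store shared by all workers can be sketched as below. This is a conceptual illustration only and does not reproduce the ITokenStore interface or the Infoworks MongoDB schema: the class, function, and field names are hypothetical, an in-memory dictionary stands in for MongoDB, and `fake_refresh` stands in for the call a real driver would make to the OAuth server.

```python
import time
from dataclasses import dataclass


@dataclass
class Tokens:
    access_token: str
    refresh_token: str
    expires_at: float  # epoch seconds


class SharedTokenStore:
    """Stand-in for storage that every worker can reach (MongoDB in the text above)."""

    def __init__(self):
        self._docs = {}

    def read(self, source_id):
        return self._docs.get(source_id)

    def write(self, source_id, tokens):
        self._docs[source_id] = tokens


def get_access_token(store, source_id, refresh_fn, now=None):
    """Return a valid access token, refreshing and persisting it when expired."""
    now = time.time() if now is None else now
    tokens = store.read(source_id)
    if tokens is None:
        raise LookupError(f"no tokens stored for {source_id}")
    if tokens.expires_at <= now:
        tokens = refresh_fn(tokens.refresh_token, now)
        store.write(source_id, tokens)  # other workers now see the new token
    return tokens.access_token


def fake_refresh(refresh_token, now):
    """Hypothetical refresh call; a real driver would contact the OAuth server."""
    return Tokens("access-2", refresh_token, now + 3600)


store = SharedTokenStore()
store.write("rest-source", Tokens("access-1", "refresh-1", expires_at=100.0))

print(get_access_token(store, "rest-source", fake_refresh, now=50.0))   # token still valid
print(get_access_token(store, "rest-source", fake_refresh, now=200.0))  # expired, so refreshed
```

Because the refreshed token is written back to the shared store, a second worker reading the same entry would pick it up without performing its own refresh.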
Depending on the type of CDATA driver, there are different types of authentication that can be used to interact with the source application. You can find the details of different kinds of authentication that CDATA offers in the CDATA documentation for the driver you’re using.
The two most commonly used authentication mechanisms for the CDATA drivers, and how to enable them, are described below:
To enable OAuth
Follow the steps below to set up the source for the CDATA connector you want to use. This example shows how to enable OAuth for the CDATA REST connector:
- Create a source for the CDATA driver as described in this section above.
- Click Add under the Additional Connection Parameters section.
- The REST connector uses a .rsd file that defines the schema of the corresponding database. You must propagate the .rsd file to all the workers by using init scripts that run on each worker when the corresponding cluster is brought up.
- Enter the key-value pairs for all the additional parameters to be set. For example:
OAuthSettingsLocation: The location where the OAuthSettings file, which contains the access token and refresh token, is created. Since Infoworks jobs run on multiple workers, you must propagate the settings file to all the workers.
AuthScheme = OAuth: Setting this parameter to OAuth enables OAuth authentication for the source.
INITIATEOAUTH = GETANDREFRESH: The default value is GETANDREFRESH, but you can override this parameter at any time. When this parameter is set, the driver updates and refreshes the token when it expires.
Infoworks offers a solution in which you do not need to propagate the OAuthSettings file across all the workers: the access and refresh tokens are stored in Infoworks MongoDB, which is accessible by all the workers. To use this feature, set only the AuthScheme = OAuth parameter. The OAuthSettingsLocation and INITIATEOAUTH parameters are set from the Infoworks backend.
- Click Save to save the configuration parameters set.
- Configure the tables as described in the section Configuring a Table.
- Ingest the data by selecting the .rsd file and clicking Ingest.
- Enter the details in the Ingest the Data screen as described in the section Ingesting Data.
Step Result: On job completion, the data is ingested to the target storage.
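The two OAuth setups described in this procedure differ only in which parameters you supply yourself. The following sketch makes that choice explicit; the function names and the example file path are hypothetical, and the `Key=Value;` rendering is used here only for display.

```python
def oauth_parameters(use_infoworks_token_store, settings_location=None):
    """Return the additional connection parameters for the chosen OAuth setup."""
    params = {"AuthScheme": "OAuth"}
    if use_infoworks_token_store:
        # OAuthSettingsLocation and INITIATEOAUTH are set from the Infoworks backend.
        return params
    if not settings_location:
        raise ValueError("settings_location is required when you manage the settings file")
    params["INITIATEOAUTH"] = "GETANDREFRESH"
    params["OAuthSettingsLocation"] = settings_location
    return params


def as_property_string(params):
    """Join key-value pairs as 'Key=Value;' for display or logging."""
    return "".join(f"{key}={value};" for key, value in params.items())


# Tokens stored in Infoworks MongoDB: only AuthScheme is needed.
print(as_property_string(oauth_parameters(True)))

# Self-managed settings file; the path is a made-up example.
print(as_property_string(oauth_parameters(False, "/opt/cdata/OAuthSettings.txt")))
```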
To enable OAuthJWT
Follow the steps below to set up the source for the CDATA connector you want to use. This example shows how to enable OAuthJWT for the CDATA Google BigQuery connector using a service account JSON file.
- Create a source for the CDATA driver as described in this section above.
- Click Add under the Additional Connection Parameters section.
- Enter the key-value pairs for the following additional parameters.
AuthScheme = OAuthJWT: Setting this parameter to OAuthJWT enables OAuthJWT authentication for the source, which authenticates to a service account using a JWT certificate.
INITIATEOAUTH = GETANDREFRESH: The default value is GETANDREFRESH, but you can override this parameter. When this parameter is set, the driver updates and refreshes the token when it expires.
OAuthJWTCertType = GOOGLEJSON and OAuthJWTCert = location of the JSON file: With these parameters, the driver uses the JSON file at the location set in the OAuthJWTCert parameter. Since Infoworks jobs run on multiple workers, the JSON file must be present on all the workers at that same location.
OAuthJWTCertType = GOOGLEJSONBLOB and OAuthJWTCert = content of the JSON file: With these parameters, the contents of the JSON file are passed directly to the driver for authentication, so the JSON file does not need to be placed on the workers.
- Click Save to save the configuration parameters set.
- Configure the tables as described in the section Configuring a Table.
- Ingest the data by selecting the table and clicking Ingest.
- Enter the details in the Ingest the Data screen as described in the section Ingesting Data.
Step Result: On job completion, the data is ingested to the target storage.
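For the GOOGLEJSONBLOB option, the content of the service account file has to be supplied as the parameter value. The sketch below shows one way to turn the file into a compact single-line string that is easy to paste into a Value field. The helper names are hypothetical, the compact form is a convenience assumption rather than a documented requirement, and the demo file contains placeholder fields only.

```python
import json
import os
import tempfile


def service_account_blob(path):
    """Read a service account JSON file and return it as a compact single-line string."""
    with open(path, encoding="utf-8") as handle:
        content = json.load(handle)  # also checks that the file is well-formed JSON
    return json.dumps(content, separators=(",", ":"))


def jwt_blob_parameters(blob):
    """Return the key-value pairs for the GOOGLEJSONBLOB variant described above."""
    return {
        "AuthScheme": "OAuthJWT",
        "OAuthJWTCertType": "GOOGLEJSONBLOB",
        "OAuthJWTCert": blob,
    }


# Demo with a placeholder file; a real service account file has more fields.
with tempfile.NamedTemporaryFile("w", suffix=".json", delete=False) as handle:
    json.dump({"type": "service_account", "project_id": "example-project"}, handle, indent=2)
    demo_path = handle.name

params = jwt_blob_parameters(service_account_blob(demo_path))
os.unlink(demo_path)
print(params["OAuthJWTCert"])
```

Since the blob contains a private key in a real file, the corresponding value is a natural candidate for the Encrypt Value option.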
© UNIPHORE TECHNOLOGIES 2025 | Confidential