Onboarding Data from Confluent Cloud
Features
- Ingest data from multiple topics into a single table or multiple tables through Topic Mapping, using the Confluent Cloud platform.
- Ingest data in batches or in real time, using Spark Streaming.
- View a snapshot of the configured topics under Topic Preview in Raw or Structured format.
- Use Visual Schema Projection to create tables from a subset of data.
Ingestion Flow
- Infoworks creates a Confluent Cloud consumer for the topics, based on the user-provided configurations.
- The consumer reads records from the topic(s) based on the configuration provided under Topic Mappings.
- A configurable number of messages is read, and the schema for those records is crawled.
- Messages are crawled using Spark Structured Streaming, which converts them into a DataFrame; the data is appended continuously.
- A Delta target table is populated based on configurations such as the storage format and the path to the output directory.
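The sampling and schema crawl steps above can be pictured with a minimal, library-free sketch. This is not Infoworks code: `crawl_schema`, `type_name`, `sample_size`, and the type labels are hypothetical names chosen for illustration, and the product itself performs this step with Spark Structured Streaming.

```python
import json
from itertools import islice


def type_name(value):
    """Map a decoded JSON value to a coarse, illustrative type label."""
    if isinstance(value, bool):  # bool must be checked before int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if isinstance(value, str):
        return "string"
    if isinstance(value, list):
        return "array"
    if isinstance(value, dict):
        return "struct"
    return "null"


def crawl_schema(messages, sample_size=100):
    """Infer a flat field -> type mapping from the first `sample_size` JSON messages."""
    schema = {}
    for raw in islice(messages, sample_size):
        record = json.loads(raw)
        for field, value in record.items():
            inferred = type_name(value)
            previous = schema.get(field)
            if previous is None or previous == "null":
                schema[field] = inferred
            elif inferred not in ("null", previous):
                # Assumption: conflicting types across samples fall back to string.
                schema[field] = "string"
    return schema


sample = ['{"id": 1, "name": "a"}', '{"id": 2, "name": null, "score": 0.5}']
print(crawl_schema(sample))
```

Reading more sample messages makes it more likely that optional fields (such as `score` here) appear in the crawled schema.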
Prerequisites
- Confluent Cloud ingestion supports records in JSON, AVRO, and PROTOBUF formats only.
- Confluent Cloud ingestion cannot be integrated into a workflow, because it is a near-real-time, continuously streaming job.
Creating Confluent Cloud Source
To create a Confluent Cloud source, see Creating a Source. Ensure that the selected Source Type is Confluent Cloud.
Configuring the Confluent Cloud Source
- Select the Confluent Cloud data source that you created.
- The configuration flow is organized into four tabs as follows:
$inline[badge,NOTE,primary] Complete the configuration details in each tab, and then click the next tab name displayed at the top of the screen to continue the Confluent Cloud configuration.
$inline[badge,LIMITATION,error] For ingestion from Azure Event Hub using OAuth, the Confluent ingestion job must be submitted to a non-interactive cluster.
Set Up Source
To set up the source for Confluent Cloud, enter the following fields:
Target Field Descriptions
After entering the required details, click the Save and Test Connection button to save the settings and verify that Infoworks can connect to the source system. Clicking Save and Test Connection also populates the Topics list with suggestions for configuring Topic Mappings in the next step.
Topic Mappings
Using the Topic Mappings tab, you can map and organize the data in the form of topics. You can subscribe to the required topics using a list or regex patterns, and map them to a single table or multiple tables.
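As an illustration of list-based versus regex-based subscription, the following sketch resolves topics for each table mapping. The function names, the mapping dictionary keys (`table`, `topics`, `regex`), and the topic names are hypothetical, and matching the regex against the whole topic name is an assumption rather than documented Infoworks behavior.

```python
import re


def resolve_topics(available, topic_list=None, pattern=None):
    """Return the topics selected by an explicit list or by a regex pattern."""
    if topic_list is not None:
        wanted = set(topic_list)
        return [t for t in available if t in wanted]
    if pattern is not None:
        compiled = re.compile(pattern)
        # Assumption: the pattern must match the entire topic name.
        return [t for t in available if compiled.fullmatch(t)]
    return []


def map_topics_to_tables(mappings, available):
    """Build a table -> topics dictionary from hypothetical mapping entries."""
    return {
        m["table"]: resolve_topics(available, m.get("topics"), m.get("regex"))
        for m in mappings
    }


available_topics = ["orders_eu", "orders_us", "payments", "audit"]
mappings = [
    {"table": "orders", "regex": r"orders_.*"},
    {"table": "finance", "topics": ["payments", "audit"]},
]
print(map_topics_to_tables(mappings, available_topics))
```

Here two topics feed the `orders` table through one pattern, while the `finance` table is fed by an explicit list.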
Perform the following steps to configure Topic Mappings:
- Click Topic Mapping on the left navigation.
- Enter the following fields:
- Click Save and Generate Preview to save the configured settings.
The following tabs are displayed in this window:
Topic Preview Tab
The Topic Preview tab allows you to quickly view a snapshot of the topics.
$inline[badge,Note,primary] If the Topic Preview is not populated, check the broker connection details under the source configuration, or check the list or regex entered on the Topic Mappings page.
Click the + icon next to any displayed message to preview its content, and then click the Crawl Schema button.
The maximum number of message rows available for preview can be configured using the topic_preview_row_count parameter in the Advanced Configuration section. The default value is 100 rows.
In the topic preview, the following views are available:
- Raw: Displays the content as it is read.
- Pretty: Displays the content in a structured format.
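The difference between the two views, together with the row limit, can be sketched as follows. The `preview` function and its `view` argument are hypothetical; only the 100-row default mirrors the topic_preview_row_count default mentioned above, and JSON messages are assumed.

```python
import json


def preview(messages, row_count=100, view="raw"):
    """Return at most `row_count` messages, either as read or pretty-printed."""
    rows = messages[:row_count]
    if view == "pretty":
        # Re-serialize each JSON message with indentation for readability.
        return [json.dumps(json.loads(m), indent=2) for m in rows]
    return list(rows)


messages = ['{"id": 1, "address": {"city": "Pune"}}', '{"id": 2}']
print(preview(messages, row_count=1, view="raw")[0])
print(preview(messages, row_count=1, view="pretty")[0])
```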
Schema tab
Select any path to create a table with the corresponding columns in it. Click the required node. To select multiple nodes, hold the Shift key and click each required node. The child nodes of all the selected parent nodes become table columns.
For example, if you select only the address in the example above, the table created will consist of four columns: street address, city, state, and postal code.
To manage the nodes in the schema, use the Add Node, Edit Node, and Remove Node buttons in the top-right corner of the tab. To revert the edits to the schema, recrawl it from the Topic Preview tab.
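The projection rule (children of a selected node become columns) can be sketched on a nested dictionary. The schema below is a hypothetical stand-in: only the four address fields come from the text, while the `name` field, the underscore spellings, and the `columns_for` helper are assumptions for illustration.

```python
def columns_for(schema, selected_paths):
    """Return the child field names of every selected parent node."""
    columns = []
    for path in selected_paths:
        node = schema
        for part in path.split("."):
            node = node[part]  # walk down the dotted path
        if isinstance(node, dict):
            columns.extend(node.keys())  # children become columns
        else:
            columns.append(path.split(".")[-1])  # a leaf is its own column
    return columns


schema = {
    "name": "string",
    "address": {
        "street_address": "string",
        "city": "string",
        "state": "string",
        "postal_code": "string",
    },
}
print(columns_for(schema, ["address"]))
```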
$inline[badge,NOTE,primary]
- The Avro Schema crawl is not supported for Union data types with complex types such as Map and Array.
- The Map field for Avro is not supported by schema crawl using the record sampling method. The type is detected as Struct, so you must manually change the data type to Map from Edit Schema.
- For AVRO data sources, in the “Schema” screen, only the first “struct” can be selected. Creation of a table from the middle of the tree is not allowed.
Create Table Field Description
Click Create Table to create the table with the selected nodes. The following screen appears.
$inline[badge,Note,primary] In case of CDW onboarding (Snowflake environment), table names are converted to upper case once they are saved. To create tables with case-sensitive names, enter the table names within quotes.
After configuring the required details, click Save.
The left panel displays the list of tables created. Click a table name to view or edit the table schema and to view the sample data. Click the edit icon next to the table name to edit the table configuration.
Configure Synchronization
To configure the table, select Configure Synchronization, and enter the following fields under Ingestion Configuration.
Merge Details
Enter the following fields under Merge Details.
Edit User
You can edit the user details for either the current user or a different user.
- Select either Current User or Different User.
- Enter the e-mail address.
- Enter the Refresh Token available under My Profile -> Settings.
- Click Save.
Schedule Details
If you select Scheduler under Merge Details, you can set the recurrence details as follows.
Recurrence Type: Select one of the following recurrence types. By default, the recurrence type is set to Daily.
- Only Once
- By minutes
- Hourly
- Daily
- Weekly
- Monthly
Effective duration: Enter the effective start date of the schedule.
$inline[badge,NOTE,primary] If a scheduled job overlaps another running job, then it will be queued until the running job is completed.
Target Configuration
Enter the following fields under Target Configuration.
After configuring the required details, click Save to save the settings.
Adding a column to the table
After the metadata crawl is complete, you can add a target column to the table.
A target column is an additional column in the target table, used when you need special columns beyond those present in the source.
You can select the data type for the column.
You can select either of the following transformation modes: Simple or Advanced.
Simple Mode
In this mode, you must add a transformation function to be applied to the column. A target column with no transformation function applied will have null values in the target.
Advanced Mode
In this mode, you can provide a Spark expression for the column. For more information, refer to $link[page,321319,Adding Transform Derivation,editing-table-schema].
$inline[badge,NOTE,primary] When a table is in the ready state (already ingested), schema editing is disabled.
Onboard Data
Perform the following steps to onboard data from Confluent Cloud:
- Click the Onboard Data tab.
- Select the required table(s), and click Start to start streaming the data.
$inline[badge,Note,primary] You can also stop the data streaming by clicking the Stop button. The Truncate button allows you to delete a table.
On clicking Start, the following screen appears:
- Fill in the required details and then click Ingest. Ensure that the cluster template setup is configured for your source.
- Click the Click here to track progress link to view the ingestion status. This takes a few minutes. On clicking the link, the job status and summary are displayed on the tab.
- Click the Ingestion Metrics tab to view a detailed summary of the job. This tab provides helpful filters.
$inline[badge,NOTE,primary] In case the data plane application remains in the RUNNING state after the job is stopped, the data application has likely stopped and the RUNNING status is incorrect. You can verify that the job has stopped by checking that the number of batches in the job status does not increase over a short period of time. To mitigate this issue, you can set the following configuration in conf.properties, but note that it affects all jobs: iw_job_cancel_pod_deletion_delay=15
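The verification described in the note amounts to sampling the batch count a few times and confirming that it stays flat. A minimal sketch, assuming you record the batch counts by hand from the job status (the `job_has_stopped` helper is hypothetical and does not call any Infoworks API):

```python
def job_has_stopped(batch_counts):
    """Return True when the sampled batch count stayed flat across all samples."""
    if len(batch_counts) < 2:
        return False  # a single sample cannot show a trend
    return all(count == batch_counts[0] for count in batch_counts)


# Batch counts noted from the job status at short intervals.
print(job_has_stopped([42, 42, 42]))
print(job_has_stopped([42, 43, 45]))
```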
Configuration Migration
$inline[badge,NOTE,primary] The configuration for tables that are in the ready state will not be migrated.
For details on the configuration migration process, see Configuration Migration.
Advanced Configuration
The configuration fields for Confluent Cloud are as follows:
$inline[badge,NOTE,primary] If the value is of the format key1=value1;key2=value2;key3=value3, the entry delimiter ';' can be changed by setting entry_delimiter_streaming, and the key-value delimiter '=' can be changed by setting key_value_delimiter_streaming in the advanced configurations.
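To show how the two delimiters interact, here is a small parsing sketch. The `parse_entries` function is hypothetical; its two keyword arguments merely stand in for the entry_delimiter_streaming and key_value_delimiter_streaming settings, and how Infoworks parses the value internally is not documented here.

```python
def parse_entries(text, entry_delimiter=";", key_value_delimiter="="):
    """Split 'k1=v1;k2=v2' style text into a dictionary."""
    result = {}
    for entry in text.split(entry_delimiter):
        if not entry.strip():
            continue  # tolerate a trailing or doubled delimiter
        key, _, value = entry.partition(key_value_delimiter)
        result[key.strip()] = value.strip()
    return result


print(parse_entries("key1=value1;key2=value2;key3=value3"))
print(parse_entries("key1:value1|key2:value2", entry_delimiter="|", key_value_delimiter=":"))
```

Changing the delimiters is useful when a value itself contains ';' or '=', since the default delimiters would otherwise split it incorrectly.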
Subscribers
For more information on subscribers, see Subscribers.
© UNIPHORE TECHNOLOGIES 2025 | Confidential