Onboarding Data from Kafka
Kafka Ingestion
Features
- Ingest data from multiple topics into single or multiple tables via Topic Mapping.
- Ingest data in Batches or Real Time, using Spark Streaming.
- View snapshot of the configured topics under Topic Preview in Raw or Structured format.
- Visual Schema Projection to create tables from a subset of data.
Ingestion Flow
- Infoworks creates a Kafka consumer for the topics, based on the user-provided configurations.
- The consumer reads records from topic(s) based on the configuration provided under Topic Mappings.
- A configurable number of messages are read, and the schema for those records is crawled.
- Messages are crawled using Spark Structured Streaming, which converts the messages into a DataFrame to which new messages are appended continuously.
- The Value field in each message is parsed as JSON, and the detected schema is applied.
- A delta target table is populated based on configurations such as the storage format and the path to the output directory.
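The schema crawl described in the steps above can be sketched in plain Python. This is only an illustration of the idea (parse the Value of a configurable number of messages as JSON and merge the detected fields), not the Infoworks implementation, which uses Spark Structured Streaming. The names `infer_type`, `merge_schema`, and `crawl_schema`, as well as the type labels, are hypothetical, and each message value is assumed to be a JSON object.

```python
import json
from typing import Any


def infer_type(value: Any) -> Any:
    """Describe the type of a parsed JSON value (hypothetical type labels)."""
    if isinstance(value, dict):
        return {key: infer_type(child) for key, child in value.items()}
    if isinstance(value, list):
        # Describe an array by its first element; empty arrays stay untyped.
        return [infer_type(value[0])] if value else []
    if isinstance(value, bool):  # bool is a subclass of int, so test it first
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "double"
    if value is None:
        return "null"
    return "string"


def merge_schema(left: Any, right: Any) -> Any:
    """Merge two schemas, keeping every field seen in either one."""
    if isinstance(left, dict) and isinstance(right, dict):
        merged = dict(left)
        for key, child in right.items():
            merged[key] = merge_schema(left[key], child) if key in left else child
        return merged
    # For conflicting leaves, keep the first non-null description.
    return right if left == "null" else left


def crawl_schema(raw_values: list, sample_size: int) -> dict:
    """Parse up to sample_size message values as JSON and merge their schemas."""
    schema: dict = {}
    for raw in raw_values[:sample_size]:
        schema = merge_schema(schema, infer_type(json.loads(raw)))
    return schema


sample_messages = [
    '{"id": 1, "address": {"city": "Pune"}}',
    '{"id": 2, "name": "a", "address": {"state": "MH"}}',
    '{"ignored": true}',
]
# Only the first two messages are sampled, so "ignored" is never seen.
detected_schema = crawl_schema(sample_messages, sample_size=2)
```

Merging across several messages matters because a single JSON record may omit optional fields; a larger sample gives a more complete schema at the cost of a longer crawl.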
Prerequisites and Considerations
- Infoworks supports Kafka records in JSON format only.
- Kafka Ingestion cannot be integrated into a workflow because it is a near-real-time, continuously streaming job.
Creating a Kafka Source
To create a Kafka source, see $link[page,252043,Creating Source,creating-source]. Ensure that the Source Type selected is Kafka.
Configuring a Kafka Source
Click the Data Catalog menu and click the Ingest button for the source you created.
The configuration flow is organised into four tabs.
$inline[badge,NOTE,primary] Configure the details in each tab, then click the next tab name at the top of the window to continue the Kafka configuration.
Set Up Source
The Set Up Source screen is as follows:
Source Tab Fields and Description
Target Fields and Description
After entering the required details, click the Save and Test Connection button to save the settings and verify that Infoworks can connect to the source system. Clicking Save and Test Connection also populates the Topics list with suggestions for configuring Topic Mappings in the next step.
Other sections available under the Set Up Source tab are:
- $link[page,252059,Configuration Migration,configuration-migration]
- $link[page,252059,Advanced Configurations,advanced-configurations]
- $link[page,252059,Subscribers,subscribers]
Now, click the Topic Mappings tab.
Topic Mappings
This tab allows you to subscribe to the required topics using a list or regex patterns, and to map them to a single table or multiple tables.
You may preview the messages in the topics subscribed, crawl them, and configure tables.
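As a rough illustration of the difference between the two subscription styles, the sketch below resolves a subscription against a set of topic names. The function `resolve_topics` and the topic names are hypothetical, and the regex is assumed to match the whole topic name; Infoworks may apply patterns differently.

```python
import re
from typing import Iterable, List, Optional


def resolve_topics(
    available: Iterable[str],
    topic_list: Optional[List[str]] = None,
    topic_regex: Optional[str] = None,
) -> List[str]:
    """Return the available topics selected by an explicit list or a regex."""
    if topic_list is not None:
        wanted = set(topic_list)
        return [topic for topic in available if topic in wanted]
    if topic_regex is not None:
        pattern = re.compile(topic_regex)
        # Assumption: the pattern must match the full topic name.
        return [topic for topic in available if pattern.fullmatch(topic)]
    return []


# Hypothetical topic names on the broker.
available_topics = ["orders", "orders_eu", "payments"]

by_list = resolve_topics(available_topics, topic_list=["payments", "orders"])
by_regex = resolve_topics(available_topics, topic_regex=r"orders.*")
```

A list is explicit and stable, while a regex also picks up topics that match the pattern but were not known when the mapping was configured.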
Click the Topic Mapping button to configure topics. The following window appears:
Topic Mapping Fields and Description
After adding the required topics, click Save to save the configured settings.
The following tabs are displayed in the window that appears:
- $link[page,252059,Topic Preview Tab,topic-preview-tab]
- $link[page,252059,Schema Tab,schema-tab]
Topic Preview tab
The Topic Preview tab allows you to quickly view a snapshot of the topics.
Click the + icon next to a message to preview its content, and then click the Crawl Schema button.
The maximum number of message rows available for preview is set by the topic_preview_row_count parameter in the Advanced Configuration section. The default value is 100 rows.
The topic preview offers two views: Raw (displays the content as it is read) and Pretty (displays the content in a structured format).
$inline[badge,Note,primary] If the Topic Preview is not populated, check the broker connection details under the source configuration, or check the list or regex entered in the Topic Mappings tab.
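The difference between the two views, and the effect of the row limit, can be pictured with a small sketch. The function `preview_messages` is hypothetical; only the parameter name topic_preview_row_count and its default of 100 come from this page.

```python
import json
from typing import List

# Default taken from the topic_preview_row_count parameter described above.
DEFAULT_PREVIEW_ROW_COUNT = 100


def preview_messages(
    raw_messages: List[str],
    view: str = "raw",
    row_count: int = DEFAULT_PREVIEW_ROW_COUNT,
) -> List[str]:
    """Return at most row_count messages in the requested view."""
    rows = raw_messages[:row_count]
    if view == "raw":
        # Raw: the content exactly as it was read.
        return list(rows)
    if view == "pretty":
        # Pretty: re-serialise the JSON with indentation.
        return [json.dumps(json.loads(message), indent=2) for message in rows]
    raise ValueError(f"unknown view: {view!r}")


raw_view = preview_messages(['{"a":1}'], view="raw")
pretty_view = preview_messages(['{"a":1}'], view="pretty")
capped = preview_messages(["{}"] * 150)
```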
Schema tab
The Schema tab is then displayed as follows:
Select any path to create a table with the corresponding columns. Click the required node. You can also hold the Shift key and click multiple required nodes. The child nodes of all the selected parent nodes become table columns.
For example, if you select only the address node in the example above, the created table consists of four columns: street address, city, state, and postal code.
To manage the nodes in the schema, use the Add Node, Edit Node, and Remove Node buttons in the top-right corner of the tab. To revert the edits in the schema, recrawl it from the Topic Preview tab.
Click Create Table to create the table with the selected nodes. The following window is displayed:
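The projection in the address example can be sketched as a lookup in a nested schema: the children of the selected node become the column names. The schema below, its field names, and the function `project_columns` are hypothetical stand-ins for illustration.

```python
from typing import Any, Dict, List, Sequence


def project_columns(schema: Dict[str, Any], path: Sequence[str]) -> List[str]:
    """Return the child names of the struct node found at path."""
    node: Any = schema
    for part in path:
        node = node[part]
    if not isinstance(node, dict):
        raise ValueError(f"{'.'.join(path)} is not a struct node")
    return list(node.keys())


# Hypothetical crawled schema: struct nodes are dicts, leaves are type names.
person_schema = {
    "firstName": "string",
    "age": "long",
    "address": {
        "streetAddress": "string",
        "city": "string",
        "state": "string",
        "postalCode": "string",
    },
}

address_columns = project_columns(person_schema, ["address"])
```

Selecting a leaf such as firstName raises an error in this sketch, which mirrors the limitation that non-struct nodes cannot be the root element of a table.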
Create Table Field Description
$inline[badge,Note,primary] For CDW onboarding (Snowflake environment), table names are converted to upper case when they are saved. To create tables with case-sensitive names, enter the table names within quotes.
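The naming rule in the note above can be modelled roughly as follows. This is only a sketch of the described behaviour, assuming that "quotes" means double quotes around the whole name; the function `stored_table_name` is hypothetical and is not an Infoworks or Snowflake API.

```python
def stored_table_name(entered_name: str) -> str:
    """Model how an entered table name might be stored in a Snowflake environment."""
    is_quoted = (
        len(entered_name) >= 2
        and entered_name.startswith('"')
        and entered_name.endswith('"')
    )
    if is_quoted:
        # Quoted names keep their case; the surrounding quotes are dropped.
        return entered_name[1:-1]
    # Unquoted names are converted to upper case.
    return entered_name.upper()


unquoted_result = stored_table_name("customerOrders")
quoted_result = stored_table_name('"customerOrders"')
```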
After configuring the required details, click Save.
The left panel displays the list of created tables. Click a table name to view or edit the table schema and to view the sample data. Click the edit icon next to the table name to edit the table configuration.
Now, navigate to the Configure Tables tab.
Configure Synchronization
For configuring Kafka tables, select the required table, and then enter required details.
Configure Table Field Description
Merge Details
Enter the following fields under Merge Details.
Edit User
You can edit the user details for either the current user or a different user.
- Select either Current User or Different User.
- Enter the E-mail.
- Enter the Refresh Token available under My Profile -> Settings.
- Click Save.
Schedule Details
If you select Scheduler under Merge details, then you can set the recurrence details as follows.
Recurrence Type: Select one of the following recurrence types. The default recurrence type is Daily.
- Only Once
- By minutes
- Hourly
- Daily
- Weekly
- Monthly
Effective duration: Enter the effective start date of the schedule.
$inline[badge,NOTE,primary] If a scheduled job overlaps another running job, then it will be queued until the running job is completed.
For more information on table configuration, see $link[page,252078,auto$].
After configuring the required details, click Save to save the settings.
Target Configuration
Enter the following fields under Target Configuration.
Adding a column to the table
After the metadata crawl is complete, you can add a target column to the table.
A target column is an additional column in the target table that is not present in the source.
You can select the data type for the column.
You can select one of the following transformation modes: Simple or Advanced.
Simple Mode
In this mode, you must add the transformation function to be applied to the column. A target column with no transformation function applied will have null values in the target.
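The null behaviour described for Simple Mode can be illustrated with a small sketch in which each target column has an optional transformation function. The function `add_target_columns`, the column names, and the use of Python callables are hypothetical; the actual transformation functions are chosen in the Infoworks UI.

```python
from typing import Any, Callable, Dict, Optional

Row = Dict[str, Any]
Transform = Optional[Callable[[Row], Any]]


def add_target_columns(row: Row, target_columns: Dict[str, Transform]) -> Row:
    """Return a copy of row extended with the configured target columns."""
    result = dict(row)
    for name, transform in target_columns.items():
        # A column without a transformation function is filled with None (null).
        result[name] = transform(row) if transform is not None else None
    return result


source_row = {"city": "pune"}
target_row = add_target_columns(
    source_row,
    {
        "city_upper": lambda r: r["city"].upper(),  # transformation applied
        "audit_note": None,  # no transformation: stays null
    },
)
```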
Advanced Mode
In this mode, you can provide a Spark expression for the column. For more information, refer to $link[page,252078,Adding Transform Derivation,editing-table-schema].
$inline[badge,NOTE,primary] When a table is in the ready state (already ingested), schema editing is disabled.
Onboard Data
Perform the following steps to onboard data from Kafka:
- Click the Onboard Data tab.
- Select the required table(s), and then click Start to start streaming the data.
- You may also stop the data streaming by clicking the Stop button. The Truncate button allows you to delete a table.
- On clicking Start, the following window appears:
- Fill in the required details and then click Ingest. Ensure that the cluster template setup is configured for your source. For more information on field values, see section in the topic.
The following window appears:
Click the Click here to track progress link to view the ingestion status. This takes a few minutes. When you click the link, the job status and summary are displayed on the tab.
Click the Ingestion metrics tab to view a detailed summary of the job. This tab provides filters to narrow the results.
This summarises the complete Kafka ingestion process.
Configuration Migration
$inline[badge,NOTE,primary] The configuration for tables that are in the ready state will not be migrated.
For details on the configuration migration process, see $link[page,252044,Configuration Migration,configuration-migration].
Advanced Configurations
For setting up advanced configuration, see $link[page,252044,Advanced Configurations,setting-advanced-configuration].
Subscribers
For more information on subscribers, see $link[page,252044,Subscribers,setting-ingestion-notification-services].
Limitations
- Non-struct nodes cannot be selected as the root element of a table. For example, nodes such as id or type cannot be selected to create tables.
- Two different struct nodes that are not directly connected cannot be used to create table columns. For example, nodes such as item and batter cannot be selected to create the same table.
- Two struct nodes at the same level cannot be selected to create the same table. For example, nodes such as batters and topping cannot be selected to create the same table.
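The three limitations above can be read as a single rule: the selected nodes must all be struct nodes, and together they must form a direct parent-to-child chain. The sketch below checks a selection under that interpretation, which is an assumption rather than documented behaviour. The schema tree, the function `validate_selection`, and the error messages are hypothetical; arrays are modelled as plain struct nodes for simplicity.

```python
from typing import Any, Dict, List, Tuple

Path = Tuple[str, ...]


def validate_selection(schema: Dict[str, Any], paths: List[Path]) -> List[str]:
    """Return a list of problems with the selected node paths (empty if valid)."""
    errors: List[str] = []
    for path in paths:
        node: Any = schema
        for part in path:
            node = node[part]
        if not isinstance(node, dict):
            errors.append(f"{'.'.join(path)} is not a struct node")
    # Compare neighbours ordered by depth: each must be the parent of the next.
    ordered = sorted(paths, key=len)
    for parent, child in zip(ordered, ordered[1:]):
        pair = f"{'.'.join(parent)} and {'.'.join(child)}"
        if len(parent) == len(child):
            errors.append(f"{pair} are at the same level")
        elif child[:-1] != parent:
            errors.append(f"{pair} are not directly connected")
    return errors


# Hypothetical schema using the node names from the limitations above.
donut_schema = {
    "item": {
        "id": "string",
        "type": "string",
        "batters": {"batter": {"id": "string", "type": "string"}},
        "topping": {"id": "string", "type": "string"},
    }
}

leaf_errors = validate_selection(donut_schema, [("item", "id")])
gap_errors = validate_selection(
    donut_schema, [("item",), ("item", "batters", "batter")]
)
sibling_errors = validate_selection(
    donut_schema, [("item", "batters"), ("item", "topping")]
)
valid_errors = validate_selection(donut_schema, [("item",), ("item", "batters")])
```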
© UNIPHORE TECHNOLOGIES 2025 | Confidential