Infoworks Release Notes

v5.2.0

Date of Release: May 16, 2022

New Features and Enhancements

Infoworks now automates onboarding of data directly to Snowflake and supports data transformation in Snowflake (SQL pushdown). Infoworks can orchestrate data management across hybrid environments including Snowflake and data lakes.

Component: Platform

  • Databricks Support on GCP: Infoworks now supports Databricks on GCP. The Databricks version supported is 9.1 LTS across all environments - Azure, AWS and GCP. For more details, see Configuring Infoworks with Databricks on GCP.
  • Configuring a Snowflake Data Environment: Infoworks enables you to easily configure and manage the Snowflake connection, data warehouses and related compute and storage resources needed to run data onboarding and preparation jobs. The associated Spark compute may be one of Azure Databricks, AWS Databricks or Databricks on GCP. For more details, see Configuring Infoworks with Snowflake.
  • OAuth Support for Snowflake Environments: This feature allows you to configure OAuth as an authentication mechanism for Snowflake connectivity. Infoworks supports the OAuth service provided by Snowflake and Azure AD as an external authorization provider (see the connection sketch after this list). For more details, see Configuring Infoworks with Snowflake.
  • Support for Hybrid Deployments: Infoworks now supports the orchestration of data management tasks across data lake and data warehouse environments. An Infoworks domain may span multiple data environments. Infoworks also supports the migration of pipelines between environments, for example, from a data lake to a cloud data warehouse. For more details, see Managing Domains.
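
As context for the OAuth item above, the snippet below is a minimal sketch of what token-based Snowflake connectivity looks like at the driver level, using the snowflake-connector-python package. The account, warehouse, and database names are placeholders, and this is not Infoworks' internal implementation; the token is assumed to have been issued by Azure AD or Snowflake's own OAuth service.

```python
# Minimal sketch: opening a Snowflake session with an externally issued OAuth
# access token. Account, warehouse, and database names are placeholders.
import snowflake.connector

def connect_with_oauth(access_token: str):
    """Open a Snowflake session authenticated with an OAuth access token."""
    return snowflake.connector.connect(
        account="my_account",      # placeholder Snowflake account locator
        authenticator="oauth",     # tells the driver to use token-based auth
        token=access_token,        # token obtained from Azure AD or Snowflake OAuth
        warehouse="MY_WH",
        database="MY_DB",
        schema="PUBLIC",
    )

conn = connect_with_oauth("<oauth-access-token>")
print(conn.cursor().execute("SELECT CURRENT_VERSION()").fetchone())
```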

Component: Onboard Data

  • Onboard Data directly to Snowflake: Infoworks now supports onboarding data directly to Snowflake from a variety of sources including RDBMS, files and SaaS systems. For more details, see Onboarding an RDBMS Source.
  • Support for SCD on Snowflake Target in a Snowflake Environment: Snowflake onboarding now supports SCD options including SCD1 and SCD2 (see the SCD2 sketch after this list). For more details, see Configuring a table.
  • Support to Update Teradata JDBC Driver from Outside Infoworks: Infoworks now supports updating the Teradata JDBC driver with the latest driver file downloaded outside Infoworks. For more details, see Upgrading to 5.2.0.
  • Streaming Support in Snowflake Environment: Infoworks now supports onboarding data from streaming sources, such as Kafka and Confluent Cloud, to Snowflake. For more details, see Onboarding Data from Kafka and Onboarding Data from Confluent Cloud.
  • Support for Adding Columns to CSV Table: Infoworks now supports adding columns to a CSV table after the schema is inferred. This feature allows you to define source columns or custom target columns that might be useful downstream. For more details, see Onboarding Data from Structured Files.
  • SSH Key based Authentication for Ingestion using SFTP: This feature supports authenticating CSV, JSON, and Mainframe data file sources using SSH keys when onboarding data from a remote server via SFTP (see the SFTP sketch after this list). For more details, see the Onboarding Data page for JSON, Structured files, and Mainframe Data Files.
  • Support for Reading CSV Data from ADLS Gen 2: Infoworks now supports reading CSV data from ADLS Gen 2 in addition to Azure Blob Storage (see the read sketch after this list). For more details, see Onboarding Data from Structured Files.
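
To make the SCD2 option above concrete: SCD2 keeps history by closing the open version of every changed row and inserting a fresh version. The sketch below shows that pattern as two Snowflake statements against a hypothetical DIM_CUSTOMER target and STG_CUSTOMER staging table; the table, key, hash, and audit column names are illustrative only, and Infoworks generates and runs the equivalent SQL automatically.

```python
# Illustrative SCD2 maintenance on a Snowflake target. All table and column
# names are hypothetical; Infoworks generates the equivalent SQL for you.
CLOSE_CHANGED_VERSIONS = """
MERGE INTO DIM_CUSTOMER d
USING STG_CUSTOMER s
  ON d.CUSTOMER_ID = s.CUSTOMER_ID AND d.ACTIVE_FLAG = TRUE
WHEN MATCHED AND d.ROW_HASH <> s.ROW_HASH THEN
  UPDATE SET ACTIVE_FLAG = FALSE, EFFECTIVE_END = CURRENT_TIMESTAMP();
"""

INSERT_NEW_VERSIONS = """
INSERT INTO DIM_CUSTOMER (CUSTOMER_ID, NAME, ROW_HASH,
                          ACTIVE_FLAG, EFFECTIVE_START, EFFECTIVE_END)
SELECT s.CUSTOMER_ID, s.NAME, s.ROW_HASH,
       TRUE, CURRENT_TIMESTAMP(), NULL
  FROM STG_CUSTOMER s
  LEFT JOIN DIM_CUSTOMER d
    ON d.CUSTOMER_ID = s.CUSTOMER_ID AND d.ACTIVE_FLAG = TRUE
 WHERE d.CUSTOMER_ID IS NULL;  -- no open version => key is new or was just closed
"""
# Run in order inside one transaction: first close the open version of each
# changed key, then insert a new version for changed and brand-new keys.
```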
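
For the SSH key item above, the following is a minimal sketch of SSH key based SFTP access using the paramiko library; the host, user, key path, and landing directory are placeholders, and Infoworks performs the equivalent steps internally once the private key is configured on the source.

```python
# Minimal sketch of SSH key based SFTP access. Host, user, key file, and
# directory are placeholders; Infoworks handles this once the key is configured.
import paramiko

key = paramiko.RSAKey.from_private_key_file("/path/to/id_rsa")  # private key, no password
client = paramiko.SSHClient()
client.set_missing_host_key_policy(paramiko.AutoAddPolicy())
client.connect(hostname="sftp.example.com", username="ingest_user", pkey=key)

sftp = client.open_sftp()
for name in sftp.listdir("/data/incoming"):  # list files in the landing directory
    print(name)
sftp.close()
client.close()
```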
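
And for the ADLS Gen 2 item above, reading CSV files from ADLS Gen 2 with Spark boils down to pointing the reader at an abfss:// path, assuming the cluster already has credentials for the storage account. The storage account, container, and path below are placeholders, not Infoworks' actual implementation.

```python
# Minimal sketch of reading CSV data from ADLS Gen 2 via the abfss:// scheme.
# Assumes the cluster is already configured with credentials for the account.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("adls-gen2-csv").getOrCreate()

df = (spark.read
        .option("header", "true")       # first row holds column names
        .option("inferSchema", "true")  # infer column types from the data
        .csv("abfss://raw@mystorageaccount.dfs.core.windows.net/landing/customers/"))

df.printSchema()
```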

Component: Prepare Data

  • Snowflake Pushdown: Infoworks now supports automated transformation pipelines that run in Snowflake warehouses. Infoworks automates the generation and execution of the required SQL code (SQL pushdown). Transformed data may be written to existing Snowflake tables in append, overwrite, or merge modes (see the write-mode sketch after this list). For more details, see Snowflake Node Configurations.
  • Preview SQL: This feature allows you to preview the SQL code to be executed in order to transform data in Snowflake. For more details, see Node Settings.
  • SQL Import to Pipelines: Infoworks supports SQL import into pipelines that run in a Snowflake environment.
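
To make the pushdown write modes concrete, the sketch below shows the general shape of SQL that an append, overwrite, or merge write would issue against Snowflake. STAGE_VIEW, TARGET_TBL, and the key and value columns are placeholders; the actual statements are generated and executed by Infoworks.

```python
# Illustrative shapes of the SQL issued for each Snowflake pushdown write mode.
# STAGE_VIEW, TARGET_TBL, ID, and AMOUNT are placeholders; Infoworks generates
# and executes the real statements automatically.
WRITE_MODE_SQL = {
    "append": """
        INSERT INTO TARGET_TBL
        SELECT * FROM STAGE_VIEW;
    """,
    "overwrite": """
        INSERT OVERWRITE INTO TARGET_TBL
        SELECT * FROM STAGE_VIEW;
    """,
    "merge": """
        MERGE INTO TARGET_TBL t
        USING STAGE_VIEW s
          ON t.ID = s.ID
        WHEN MATCHED THEN UPDATE SET AMOUNT = s.AMOUNT
        WHEN NOT MATCHED THEN INSERT (ID, AMOUNT) VALUES (s.ID, s.AMOUNT);
    """,
}
```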

Known Issues

  • IPD-17862: Sync to target on a Databricks interactive cluster may fail if both the ingestion and pipeline jobs run on the same interactive cluster. To resolve the issue, refer to this document.
  • IPD-17596: An error sometimes appears when the Preview Data tab is switched from a Snowflake pipeline to a pipeline in a different environment (such as Dataproc), or vice versa. To resolve this: a. Close the Preview Data tab and open it again. b. If that does not work, restart the DT services.
  • IPD-17907: Pipeline build notifications do not work for pipelines built in a Snowflake environment.
  • IPD-17906: The reference pipeline table state is not updated to "Crawling" during the first pipeline build.

Resolved Issues

  • IPD-16684 (Highest): Ingestion jobs do not merge "pending cdc" tables when there are no incremental records for the current run.
  • IPD-16918 (Highest): Sync to target to Postgres fails for tables in incremental mode.
  • IPD-16922 (Highest): Ingestion jobs from SQL Server fail while converting date and/or time from a character string.
  • IPD-16958 (Highest): The Infoworks test connection job fails with an enum constant error on the Google Ads connector.
  • IPD-17318 (Highest): The ingestion service in the Prod and Dev environments crashes frequently.
  • IPD-17370 (Highest): The fetch metadata API picks a header rows count of 1 by default even when 0 is passed in the body.
  • IPD-16703 (High): The pipeline export configuration API does not set/insert the key is_existing_dataset into metadata.
  • IPD-16650 (High): Custom audit columns are not added on pipelines created through the SQL-Import API.
  • IPD-16759 (High): Pipelines with a BigQuery target fail when the decimal datatype scale is greater than 9.
  • IPD-16800 (High): Hive Arraystring columns are not ingested when exporting a table from a Hive metadata sync source to BigQuery.
  • IPD-16917 (High): Primary keys and indexes are missing after export from Infoworks 5.0 to Postgres.
  • IPD-16924 (High): In 5.0 sync to a Postgres target, enclosing table and column names in quotes while executing DDL makes them case sensitive in the Postgres DB.
  • IPD-16947 (High): Metadata crawl on a Hive metadata sync source does not work using the API.
  • IPD-16984 (High): A change in export configurations overwrites the existing target table.
  • IPD-16971 (High): In TPT-based Teradata ingestion, millisecond precision is not stored in the mongo key last_ingested_cdc_value.
  • IPD-16972 (High): BigQuery limitation on source URIs, which allows only 10K part files.
  • IPD-16975 (High): The error message "Cannot read property 'toHexString' of null" appears when importing a source JSON file on a new source using the config migration option.
  • IPD-16994 (High): The sync table schema page in a pipeline does not respond when the table has more than 1200 columns.
  • IPD-17190 (High): The error message "table id doesn't exist in the source" appears while adding a table to a table group.
  • IPD-17077 (High): The refresh token for a user created through the API flow does not work.
  • IPD-17418 (High): TPT ingestion jobs for Teradata views randomly fail with an error indicating that the length of a received record is greater than the length defined in the TPT script.
  • IPD-16900 (Medium): Data transformation interactive jobs fail with a connection refused error.

Installation

For the installation procedure, see Infoworks Installation on Azure, Infoworks Installation on AWS, and Infoworks Installation on GCP.

For more information, contact support@infoworks.io.

Upgrade

For upgrading from lower versions to 5.2.0, see Upgrading to 5.2.0.

PAM

For the PAM, see Product Availability Matrix.