Date of Release: August 2022
Infoworks 5.3 automates onboarding data directly to BigQuery and supports transformation of data in BigQuery.
This section describes the new features and enhancements introduced in this release.
- Configuring a BigQuery Environment: Infoworks now supports configuring an environment with BigQuery as a data warehouse. You can configure Google Dataproc clusters as the associated compute used to onboard data into BigQuery. For more details, see Configuring Infoworks with BigQuery.
- Onboarding Data directly to BigQuery: Infoworks supports onboarding data directly to BigQuery from a variety of sources, including RDBMS, files, streaming sources, and SaaS systems, with no need for persistent data lake storage. For more details, see General Configurations (BigQuery).
- Metadata Crawl for BigQuery: Infoworks now supports crawling metadata of existing BigQuery tables so that they can be used in pipelines downstream and in conjunction with tables ingested from other sources. For more details, see Metadata Crawl for BigQuery.
- Transformation in BigQuery: Infoworks 5.3 supports data transformation in BigQuery using SQL pushdown. The target node in BigQuery pipelines can overwrite, append, or merge data into an existing schema/dataset/table in BigQuery.
- Google Dataproc Support: Infoworks 5.3 supports Google Dataproc 2.0 for BigQuery data environment. For data lake environments, Dataproc 1.5 continues to be supported.
- Operations Analyst Role: Infoworks has now introduced a separate role for an operations analyst who monitors and manages production data management tasks/jobs. These users may use the Operations Dashboard to monitor jobs. For more details, see Managing Infoworks Users.
- Operations Dashboard Support: Infoworks now includes a specialized dashboard for the Operations Analyst role. This dashboard provides a focused view of Workflows, Onboarding Jobs, and Pipeline Builds with advanced filters and drill-down capabilities to quickly identify problem areas such as failed workflows and jobs, and jobs that are taking a long time. For more details, see Operations Analyst Dashboard.
- Onboarding Fixed-width Structured File: Infoworks allows you to ingest data from fixed-width structured file formats. You can fetch the fixed-width structured files from DBFS, SFTP, and cloud storage. For more details, see Onboarding Data from Fixed-width Structured Files.
- Snowflake Connector: Infoworks now supports ingesting data from a Snowflake data warehouse into other data environments in a scalable and parallelized way. For more details, see Onboarding Data from Snowflake Source.
- Support for Schema Evolution while Synchronizing Data to External Target: Infoworks now enables synchronizing exported table schema with source table schema. For more details, see Synchronizing Data to External Target.
- Support Schema Evolution for delimited (CSV) file sources: You can now enable or disable the automatic detection of schema drift and take relevant action based on your business needs. For more details, see Configuring a table.
- Scalability Improvements: This release reduces resource utilization and provides increased scalability.
- Support for Metadata Backup on Cloud Storage Bucket: Infoworks now allows you to store metadata backups directly on a cloud storage bucket (GCS/S3/WASB) in addition to local VM storage. For more details, see Metadata Backup.
- Associating Custom Tags with Job Clusters: Infoworks now supports associating custom tags with job clusters, enabling business use cases such as drilling down into compute costs by tag for internal cost centers. For more details, see Managing Custom Tags.
- Moved Target Data Connections Under Admin & Operations: Target Data Connections have been moved to the Admin and Operations section so that connection settings can be reused across data environments. This enables admins to set up configurations that data engineers can reuse.
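The overwrite, append, and merge modes that the BigQuery target node supports can be illustrated with a small, self-contained sketch. The helper functions below are hypothetical stand-ins operating on in-memory rows; they are not Infoworks or BigQuery APIs, and the column names are invented:

```python
# Illustrative sketch of the three target write modes (overwrite, append,
# merge) using in-memory rows keyed by a natural key. These helpers are
# hypothetical stand-ins, NOT Infoworks or BigQuery APIs.

def overwrite(target, incoming):
    """Replace the target contents entirely with the incoming batch."""
    return list(incoming)

def append(target, incoming):
    """Add incoming rows to the target without deduplication."""
    return target + list(incoming)

def merge(target, incoming, key):
    """Upsert: update rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        merged[row[key]] = row  # update-or-insert by key
    return list(merged.values())

existing = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
batch = [{"id": 2, "val": "B"}, {"id": 3, "val": "c"}]

print(overwrite(existing, batch))    # only the new batch survives
print(append(existing, batch))       # all four rows, duplicates kept
print(merge(existing, batch, "id"))  # id 2 updated, id 3 inserted
```

In BigQuery itself, the merge mode corresponds to a SQL `MERGE` statement pushed down to the warehouse, which is what makes the SQL-pushdown approach avoid moving data out of BigQuery.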
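Fixed-width ingestion relies on a column layout (field name, start offset, width) to slice each record. A minimal stdlib-only sketch of the idea, with a layout and sample records invented for illustration (real ingestion would take the layout from the source's table configuration):

```python
# Minimal sketch of fixed-width record parsing using only the stdlib.
# The layout (field name, start offset, width) and the sample records
# are invented for illustration.

LAYOUT = [("id", 0, 4), ("name", 4, 10), ("amount", 14, 8)]

def parse_record(line, layout):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:start + width].strip()
            for name, start, width in layout}

# Build two sample lines that conform to LAYOUT (22 characters each).
records = [
    "0001" + "Alice".ljust(10) + "12.50".rjust(8),
    "0002" + "Bob".ljust(10) + "7.25".rjust(8),
]

for line in records:
    print(parse_record(line, LAYOUT))
```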
Resolved Issues
This section lists the issues resolved in this release:
| JIRA ID | Issue | Severity |
|---|---|---|
| IPD-18128 | Metadata recrawl jobs are blocked when any pipeline that uses a source table as a source node is in the running state. | Highest |
| IPD-18236 | Customer-Managed Encryption Keys (CMEK) are not parsed and passed while creating the Dataproc cluster from Infoworks. | Highest |
| IPD-18289 | Pipeline merge for a BigQuery target fails when the Project ID contains a numeric value. | Highest |
| IPD-18311 | An error appears while adding a table due to the mandatory table_type key. | Highest |
| IPD-17553 | Unable to download cluster logs from the Infoworks console. | High |
| IPD-17944 | Consumer Group ID is not configurable for Confluent Kafka. | High |
| IPD-17968 | After ingestion, the table status does not change to "Data Ingested". | High |
| IPD-17970 | The Snowflake export pipeline fails when run as part of a workflow. | High |
| IPD-18068 | The Initialisation Action Timeout field is not configurable for GCP Dataproc cluster creation. | High |
| IPD-18067 | When Dataproc job submission and cluster creation happen simultaneously, the timeout (5 minutes) is exceeded, resulting in job failure. | High |
| IPD-18075 | Hive Metadata Sync source allows users to onboard the same table multiple times. | High |
| IPD-17954 | The "Add_tables_Source" API for RDBMS fails with the error "One or more table(s) do not exist at the source or are already added." | High |
| IPD-17988 | The API endpoint is unable to update the key "snowflake_warehouse": "TEST_WH" in pipelines. | High |
| IPD-18171 | Import Pipeline configurations do not map the data connection for a Snowflake target. | High |
| IPD-18167 | Jobs with a completion time of less than 30 seconds cannot be cancelled. | High |
| IPD-18124 | Operational issues occur in the customer environment when multiple drivers are in the running state. | High |
| IPD-18524 | Configuring connections using the v3 REST API cannot set the Snowflake warehouse. | High |
| IPD-17737 | Dataproc jobs submitted by Infoworks for sample data sometimes do not complete. | High |
| IPD-18475 | Browse source does not fetch any tables for a Db2 for z/OS source. | Medium |
| IPD-17532 | The UI displays sensitive sample data globally. | Medium |
| IPD-18559 | Table status does not change to "Data Ingested" after an empty table's incremental TPT ingestion. | Medium |
Known Issues
The following section lists known issues that Infoworks is aware of and is working to fix in an upcoming release:

| JIRA ID | Issue | Severity |
|---|---|---|
| IPD-18583 | Some running workflows fail when the Postgres master goes down. | Medium |
| IPD-18692 | A BigQuery pipeline fails at target node validation when the source table name contains a numeric or special character. | Medium |
Limitations
- Export of targets with complex data types is not yet supported; tables containing only primitive data types can be exported using sync to target. For exports from pipeline targets, complex data types can be parsed, converted to primitive types, and then exported.
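The parse-and-convert workaround above amounts to flattening complex values into primitive columns before export. A hedged, stdlib-only sketch of that idea (the field names and the JSON-string convention for arrays are invented for illustration, not an Infoworks behavior):

```python
import json

# Sketch of converting a complex (nested, struct-like) value into
# primitive columns so the row can be exported. Field names and the
# choice to serialize lists as JSON strings are illustrative only.

def flatten(row, sep="_"):
    """Recursively flatten nested dicts; serialize lists to JSON strings."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub_key, sub_val in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_val
        elif isinstance(value, list):
            flat[key] = json.dumps(value)  # arrays become JSON text columns
        else:
            flat[key] = value
    return flat

row = {"id": 7, "address": {"city": "Pune", "zip": "411001"}, "tags": ["a", "b"]}
print(flatten(row))
```

After flattening, every column holds a primitive value (number or string), so the row fits the primitive-only constraint of sync to target.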
Installation
For more information, contact support@infoworks.io.
Upgrade
For upgrading from lower versions to 5.3.0, see Upgrading to 5.3.0.
PAM
For the PAM, see Product Availability Matrix.