Date of Release: August 2022
Infoworks 5.3 automates onboarding data directly to BigQuery and supports transformation of data in BigQuery.
This section describes the new features and enhancements introduced in this release.
- Configuring a BigQuery Environment: Infoworks now supports configuring an environment with BigQuery as a data warehouse. You can configure Google Dataproc clusters as the associated compute used to onboard data into BigQuery. For more details, see Configuring Infoworks with BigQuery.
- Onboarding Data directly to BigQuery: Infoworks supports onboarding data directly to BigQuery from a variety of sources, including RDBMS, files, streaming sources, and SaaS systems, with no need for persistent data lake storage. For more details, see General Configurations (BigQuery).
- Metadata Crawl for BigQuery: Infoworks now supports crawling metadata of existing BigQuery tables so that they can be used in pipelines downstream and in conjunction with tables ingested from other sources. For more details, see Metadata Crawl for BigQuery.
- Transformation in BigQuery: Infoworks 5.3 supports data transformation in BigQuery using SQL pushdown. The target node in BigQuery pipelines can overwrite, append, or merge data into an existing schema/dataset/table in BigQuery.
- Google Dataproc Support: Infoworks 5.3 supports Google Dataproc 2.0 for BigQuery data environment. For data lake environments, Dataproc 1.5 continues to be supported.
- Operations Analyst Role: Infoworks has now introduced a separate role for an operations analyst who monitors and manages production data management tasks/jobs. These users may use the Operations Dashboard to monitor jobs. For more details, see Managing Infoworks Users.
- Operations Dashboard Support: Infoworks now includes a specialized dashboard for the Operations Analyst role. This dashboard provides a focused view of Workflows, Onboarding Jobs, and Pipeline Builds with advanced filters and drill-down capabilities to quickly identify problem areas such as failed workflows and jobs, and jobs that are taking a long time. For more details, see Operations Analyst Dashboard.
- Onboarding Fixed-width Structured File: Infoworks allows you to ingest data from fixed-width structured file formats. You can fetch the fixed-width structured files from DBFS, SFTP, and cloud storage. For more details, see Onboarding Data from Fixed-width Structured Files.
- Snowflake Connector: Infoworks now supports ingesting data from a Snowflake data warehouse into other data environments in a scalable and parallelized way. For more details, see Onboarding Data from Snowflake Source.
- Support for Schema Evolution while Synchronizing Data to External Target: Infoworks now enables synchronizing exported table schema with source table schema. For more details, see Synchronizing Data to External Target.
- Support Schema Evolution for delimited (CSV) file sources: You can now enable or disable the automatic detection of schema drift and take relevant action based on your business needs. For more details, see Configuring a table.
- Scalability Improvements: This release reduces resource utilization and provides increased scalability.
- Support for Metadata Backup on Cloud Storage Bucket: Infoworks now allows you to store metadata backups directly on a cloud storage bucket (GCS/S3/WASB) in addition to local VM storage. For more details, see Metadata Backup.
- Associating Custom Tags with Job Clusters: Infoworks now supports associating custom tags with job clusters, enabling business use cases such as drilling down into compute costs by tag for internal cost centers. For more details, see Managing Custom Tags.
- Moved Target Data Connections Under Admin & Operations: Target Data Connections have been moved to the Admin and Operations section so that connection settings can be reused across data environments. This enables admins to set up configurations that data engineers can reuse.
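The overwrite, append, and merge modes that the BigQuery target node supports can be illustrated with a small, self-contained sketch. The helper functions below are hypothetical stand-ins operating on in-memory rows; they are not Infoworks or BigQuery APIs, and the column names are invented:

```python
# Illustrative sketch of the three target write modes (overwrite, append,
# merge) using in-memory rows keyed by a natural key. These helpers are
# hypothetical stand-ins, NOT Infoworks or BigQuery APIs.

def overwrite(target, incoming):
    """Replace the target contents entirely with the incoming batch."""
    return list(incoming)

def append(target, incoming):
    """Add incoming rows to the target without deduplication."""
    return target + list(incoming)

def merge(target, incoming, key):
    """Upsert: update rows whose key matches, insert the rest."""
    merged = {row[key]: row for row in target}
    for row in incoming:
        merged[row[key]] = row  # update-or-insert by key
    return list(merged.values())

existing = [{"id": 1, "val": "a"}, {"id": 2, "val": "b"}]
batch = [{"id": 2, "val": "B"}, {"id": 3, "val": "c"}]

print(overwrite(existing, batch))    # only the new batch survives
print(append(existing, batch))       # all four rows, duplicates kept
print(merge(existing, batch, "id"))  # id 2 updated, id 3 inserted
```

In BigQuery itself, the merge mode corresponds to a SQL `MERGE` statement pushed down to the warehouse, which is what makes the SQL-pushdown approach avoid moving data out of BigQuery.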
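Fixed-width ingestion relies on a column layout (field name, start offset, width) to slice each record. A minimal stdlib-only sketch of the idea, with a layout and sample records invented for illustration (real ingestion would take the layout from the source's table configuration):

```python
# Minimal sketch of fixed-width record parsing using only the stdlib.
# The layout (field name, start offset, width) and the sample records
# are invented for illustration.

LAYOUT = [("id", 0, 4), ("name", 4, 10), ("amount", 14, 8)]

def parse_record(line, layout):
    """Slice one fixed-width line into a dict of stripped field values."""
    return {name: line[start:start + width].strip()
            for name, start, width in layout}

# Build two sample lines that conform to LAYOUT (22 characters each).
records = [
    "0001" + "Alice".ljust(10) + "12.50".rjust(8),
    "0002" + "Bob".ljust(10) + "7.25".rjust(8),
]

for line in records:
    print(parse_record(line, LAYOUT))
```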
Resolved Issues
This section lists the issues resolved in this release:
| JIRA ID | Issue | Severity |
|---|---|---|
| IPD-18128 | Metadata recrawl jobs are blocked when any pipeline that uses a source table as a source node is in the running state. | Highest |
| IPD-18236 | Customer-Managed Encryption Keys (CMEK) are not parsed and passed while creating the Dataproc cluster from Infoworks. | Highest |
| IPD-18289 | Pipeline merge for a BigQuery target fails when the Project ID contains a numeric value. | Highest |
| IPD-18311 | An error appears while adding a table due to the mandatory table_type key. | Highest |
| IPD-17553 | Unable to download cluster logs from the Infoworks console. | High |
| IPD-17944 | Consumer Group ID is not configurable for Confluent Kafka. | High |
| IPD-17968 | After ingestion, the table status does not change to "Data Ingested". | High |
| IPD-17970 | The Snowflake export pipeline fails when run as part of a workflow. | High |
| IPD-18068 | The Initialisation Action Timeout field is not configurable for GCP Dataproc cluster creation. | High |
| IPD-18067 | When Dataproc job submission and cluster creation happen simultaneously, the timeout (5 minutes) is exceeded, resulting in job failure. | High |
| IPD-18075 | Hive Metadata Sync source allows users to onboard the same table multiple times. | High |
| IPD-17954 | The "Add_tables_Source" API for RDBMS fails with the error "One or more table(s) do not exist at the source or are already added." | High |
| IPD-17988 | The API endpoint is unable to update the key "snowflake_warehouse": "TEST_WH" in pipelines. | High |
| IPD-18171 | Import Pipeline configurations do not map the data connection for a Snowflake target. | High |
| IPD-18167 | Jobs with a completion time of less than 30 seconds cannot be cancelled. | High |
| IPD-18124 | Operational issues occur in the customer environment when multiple drivers are in the running state. | High |
| IPD-18524 | Configuring connections using the v3 REST API cannot set the Snowflake warehouse. | High |
| IPD-17737 | Dataproc jobs submitted by Infoworks for sample data sometimes do not complete. | High |
| IPD-18475 | Browse source does not fetch any tables for a Db2 for z/OS source. | Medium |
| IPD-17532 | The UI displays sensitive sample data globally. | Medium |
| IPD-18559 | Table status does not change to "Data Ingested" after an empty table's incremental TPT ingestion. | Medium |
Known Issues
The following section lists known issues that Infoworks is aware of and is working to fix in an upcoming release:

| JIRA ID | Issue | Severity |
|---|---|---|
| IPD-18583 | Some running workflows fail when the Postgres master goes down. | Medium |
| IPD-18692 | A BigQuery pipeline fails at target node validation when the source table name contains a numeric or special character. | Medium |
Limitations
- Export of targets with complex data types is not yet supported; tables containing only primitive data types can be exported using sync to target. For exports from pipeline targets, complex data types can be parsed, converted to primitive types, and then exported.
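The parse-and-convert workaround above amounts to flattening complex values into primitive columns before export. A hedged, stdlib-only sketch of that idea (the field names and the JSON-string convention for arrays are invented for illustration, not an Infoworks behavior):

```python
import json

# Sketch of converting a complex (nested, struct-like) value into
# primitive columns so the row can be exported. Field names and the
# choice to serialize lists as JSON strings are illustrative only.

def flatten(row, sep="_"):
    """Recursively flatten nested dicts; serialize lists to JSON strings."""
    flat = {}
    for key, value in row.items():
        if isinstance(value, dict):
            for sub_key, sub_val in flatten(value, sep).items():
                flat[f"{key}{sep}{sub_key}"] = sub_val
        elif isinstance(value, list):
            flat[key] = json.dumps(value)  # arrays become JSON text columns
        else:
            flat[key] = value
    return flat

row = {"id": 7, "address": {"city": "Pune", "zip": "411001"}, "tags": ["a", "b"]}
print(flatten(row))
```

After flattening, every column holds a primitive value (number or string), so the row fits the primitive-only constraint of sync to target.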
Installation
For more information, contact support@infoworks.io.
Upgrade
For upgrading from lower versions to 5.3.0, see Upgrading to 5.3.0.
PAM
For the PAM, see Product Availability Matrix.