Infoworks Release Notes

v5.1.0

Date of Release: 1 October 2021

New Features and Enhancements

Component: Onboard Data

  • Onboard Data from Mainframe Datafiles: Infoworks now supports onboarding data from complex Mainframe Datafiles (COBOL Copybook), enabling you to derive business insights from that data. For more information, see Onboarding Data from Mainframe Datafiles (COBOL Copybook).
  • Onboard Data from Db2 for AS/400 Connector: Infoworks now supports onboarding data using the Db2 for AS/400 connector. For more information, see Onboarding Data from Db2 for AS/400 Connector.
  • Managing Job Hooks: Job hooks execute predefined scripts either before or after the data is onboarded. Using job hooks, you can perform additional steps such as encrypting or decrypting source files, moving or deleting files, integrating with third-party tools, and so on. The scripts are written in Bash or Python 3.x and are executed on the data plane (compute cluster) where the job runs. For more information, see Managing Job Hooks.

Component: Admin and Operations

  • Infoworks Installation on Kubernetes: Infoworks can now be installed on Kubernetes. The installation enables failover of Infoworks nodes, providing better availability and scalability when a large number of concurrent jobs and workflows are executed.
  • Custom Audit Columns for Sources and Pipelines: This feature allows you to add additional columns to all new source targets or pipeline targets for auditing purposes. It helps you understand lineage, ensure compliance, and troubleshoot problems. For more information, see Custom Audit Columns.
  • Authentication Mechanism: This feature allows you to configure the authentication mechanism to define the system role credentials or service account credentials to access the cloud resources. For more information, see Configuring Infoworks with GCP Dataproc and Configuring Infoworks with Amazon EMR.
  • Support for Persistent Clusters in Databricks: Infoworks now supports multiple persistent clusters for Databricks environments. Jobs can be submitted to both ephemeral and persistent clusters. Running jobs on persistent clusters helps avoid delays caused by cluster initialization and ensures better administrative control of cluster operations and costs. For more information, see Configuring Infoworks with Azure Databricks and Configuring Infoworks with AWS Databricks.
  • Manage Cluster Actions using Workflows: This feature provides the flexibility to execute a business use case on a dedicated persistent cluster and control the cluster operations using the Manage Cluster Actions node in workflows. You can now launch a new persistent cluster to run a collection of logical jobs and then terminate that cluster after execution, thereby managing infrastructure costs. You can configure this node to create, terminate, start, and stop persistent clusters. For more information, see Manage Cluster Actions.
  • Support for Multi Tenant Cluster Deployments in GCP Dataproc: Infoworks now allows cluster compute (and storage, if desired) to be configured in department-specific GCP projects to enable infrastructure separation. This feature allows you to execute jobs with clusters in one project and the edge node and metastore in another project. Customers can now coordinate infrastructure use and scaling, and manage payments and chargebacks. For more information, see Configuring Infoworks with GCP Dataproc.
  • Configure EMR Environments using Instance Profile: This feature allows you to assign the EC2 instance profile instead of an IAM role while creating the environment template on EMR, aligning with enterprise security policies for managing infrastructure. For more information, see Configuring Infoworks with Amazon EMR.
  • Support for AWS Glue as a Metastore: Infoworks now allows you to configure and use AWS Glue Catalog as a metastore in the AWS EMR Environment Templates. The feature allows you to use Glue Catalog as a unified solution throughout the enterprise ecosystem, which enables defining business use cases spanning/combining data across applications. For more information, see Configuring Infoworks with Amazon EMR.
  • Spark Configuration Improvement: Infoworks now simplifies Spark configuration by providing a single configuration key across ingestion and transformation jobs. For more information, see General Configuration.
  • Restricted Source Visibility: This feature supports domain-based data source visibility. This feature allows you to configure a source as Private or Public, to determine if the data is accessible by the source owner only or by other users. For more information, see Managing Access Control of Sources and General Configuration.
  • Support for Daylight Savings in Orchestrator Service: The orchestrator service has now been enhanced to auto detect daylight savings and adjust to the local time zone. For more information, see Configure Daylight Savings Time on Job Scheduler.
  • Installing MongoDB on a Separate Machine: Infoworks recommends hosting MongoDB on a separate, dedicated instance rather than on the Infoworks application server, as the MongoDB service consumes a considerable amount of RAM due to in-memory caching. In addition, resource isolation between the MongoDB service and other Infoworks platform services improves performance. For more information, see Installing MongoDB.
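Conceptually, the custom audit columns described above amount to appending metadata columns to every row written to a target. The sketch below illustrates the idea in plain Python; the column names (ziw_job_id, ziw_loaded_at) are assumptions for illustration, not the product's actual target schema:

```python
"""Illustrative sketch of audit columns; column names are assumptions."""
from datetime import datetime, timezone

def add_audit_columns(rows, job_id):
    """Append audit metadata to each row (rows modeled as plain dicts).

    Every output row carries the originating job id and a UTC load
    timestamp, which is what makes lineage and troubleshooting possible.
    """
    loaded_at = datetime.now(timezone.utc).isoformat()
    return [{**row, "ziw_job_id": job_id, "ziw_loaded_at": loaded_at}
            for row in rows]
```

In the product, such columns are configured once and applied to all new source or pipeline targets rather than coded per job.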

Limitations: Infoworks Installation on Kubernetes

  • Automated scaling of IWX services is not supported. Each Infoworks service runs in its own Pod. When the load on a service instance increases, it can be scaled up or down by executing commands manually.

NOTE Cluster node resources can be scaled automatically based on the number of jobs running in parallel.

  • Infoworks services are not tested for an HA setup.

  • Postgres

    • For Postgres, the master-slave selection during failover cannot be automated; it must be set up manually.
    • Airflow does not provide configurations to select where reads and writes should be directed.
  • MongoDB

    • A base setup using a replica set is supported, in which the customer can configure an odd number of nodes. However, no other configurations can be overridden.
    • Sharding is not supported. To enable HA through sharding, you must configure it externally and not through Infoworks.
  • Migration of scripts used in existing bash nodes is not supported in Kubernetes deployments. However, it is supported for new installations, where new bash scripts will be authored.

  • Source and Pipeline extensions: Absolute system paths are not allowed in Kubernetes deployments; users must always upload these extension files. However, you can provide a relative path (for example, by using $IW_HOME) for the extensions to run on the data plane.
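To illustrate the path restriction above, an uploaded extension would be referenced relative to the Infoworks home rather than via an absolute node path. This is a sketch only; the default home location and directory names below are assumptions for illustration:

```python
"""Sketch of resolving an extension path relative to $IW_HOME."""
import os

def resolve_extension_path(relative_path, iw_home=None):
    """Build an extension file path under the Infoworks home directory.

    Absolute node paths are rejected in Kubernetes deployments, so the
    caller supplies a path relative to IW_HOME (taken from the
    environment if not passed explicitly).
    """
    if os.path.isabs(relative_path):
        raise ValueError("absolute paths are not allowed; use a path "
                         "relative to IW_HOME")
    iw_home = iw_home or os.environ.get("IW_HOME", "/opt/infoworks")
    return os.path.join(iw_home, relative_path)
```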

Known Issues

  • Limitation in connecting to RabbitMQ in Azure Kubernetes and AWS Kubernetes environments.
  • Custom Targets and Custom Transformations are not supported in Kubernetes deployments.
  • Details of the job hooks added using APIs are not available under "View Audit" section in the Infoworks UI.
  • A success message appears instead of an error message when adding a pre/post-ingestion job using the API fails.
  • Failure to fetch job status for Spark configuration keys with invalid/unsupported characters (for example, if the key contains a dot ".").
  • Inability to cancel jobs running on a Databricks cluster in the Azure Kubernetes environment.
  • Failure of interactive jobs in the Databricks environment when the Spark service is disabled on the cluster.
  • Test Connection fails on Databricks cluster in Azure Kubernetes Environment.

Installation

For the installation procedure, see the installation guides.

For more information, contact support@infoworks.io.

Upgrade

For upgrading from lower versions to 5.1, see Upgrading to 5.1.

PAM

For the PAM, see Product Availability Matrix.