Date of Release: October 2022
This section describes the new features and enhancements introduced in this release.
- Support for AKS: Infoworks’ control plane may now be installed and used on Azure Kubernetes Service (AKS). The deployment model enables failover of Infoworks pods, providing better availability and scalability when a large number of concurrent jobs and workflows are executed. For more details, refer to Infoworks Installation on Azure Kubernetes Service (AKS).
- Onboarding Data from Vertica: Infoworks now supports onboarding data from the Vertica data platform.
- Ability to Exclude Columns during Ingestion (Column Projection): Infoworks now provides the ability to ingest a table with a selected subset of columns. You can now choose to exclude certain columns before the ingestion job is submitted. For more details, refer to Configuring a Table.
- Added Advanced Mode in the In NotIn Node: Infoworks now allows multiple columns and data-modifying expressions for the inner and outer ports of the In NotIn node. This feature is supported on the Spark, Snowflake, and BigQuery execution engines. For more details, refer to Performing In NotIn Operation.
- Streaming Deserializers: Infoworks now allows you to implement your own deserializer for byte-serialized messages from streaming sources such as Kafka and Confluent; a sketch of one possible deserializer appears after this list. For more details, refer to Configuring Deserializers.
- Support for Bash Scripts on Kubernetes using Custom Images: When running bash scripts from workflows in a Kubernetes deployment, you can use custom Kubernetes containers built from images that include the libraries and tools your bash script needs. For more details, see Bash Scripts in Kubernetes using Custom Images.
- Added Staging Names for CDW Target Nodes: Infoworks now allows you to create views in a staging database (Snowflake) or a staging dataset (BigQuery). For more details, refer to Snowflake and BigQuery targets.
- Support for Readable Column Aliases in Pipeline-Generated Queries: A new key called dt_use_iwx_column_aliasing has been added. When this key is set to false, Infoworks uses the original column names as aliases. For more information, refer to Setting Pipeline Advanced Configurations.
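For illustration, the following is a minimal sketch of a custom deserializer written against the standard Kafka Deserializer interface (org.apache.kafka.common.serialization.Deserializer). The interface and registration steps that Infoworks actually requires are documented in Configuring Deserializers, so the class name and method shapes below are assumptions for illustration, not the Infoworks API.

```java
// Minimal sketch: a custom deserializer for pipe-delimited byte payloads,
// using the standard Kafka Deserializer interface. The exact contract
// Infoworks expects is described in "Configuring Deserializers".
import java.nio.charset.StandardCharsets;
import java.util.Map;
import org.apache.kafka.common.serialization.Deserializer;

public class PipeDelimitedDeserializer implements Deserializer<String[]> {

    @Override
    public void configure(Map<String, ?> configs, boolean isKey) {
        // Read custom settings (delimiter, charset, etc.) here if needed.
    }

    @Override
    public String[] deserialize(String topic, byte[] data) {
        if (data == null) {
            return null; // Tombstone or empty message.
        }
        // Decode the raw bytes and split the pipe-delimited payload
        // into individual fields, keeping trailing empty values.
        String payload = new String(data, StandardCharsets.UTF_8);
        return payload.split("\\|", -1);
    }

    @Override
    public void close() {
        // No resources to release in this sketch.
    }
}
```

Whatever the wire format (Avro, Protobuf, or a delimited encoding as above), the deserializer's only job is to turn the raw byte[] payload into a structured record that downstream ingestion can consume.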
Resolved Issues
This section lists the issues resolved in this release:
| JIRA ID | Issue | Severity |
|---|---|---|
| IPD-18695 | File Preview/Schema Crawl fails for CSV ingestion from S3 buckets hosted on Gov Cloud. | High |
| IPD-18623 | The DISTINCT function in SQL is not imported properly in Infoworks pipelines. | Highest |
| IPD-18777 | Direct incremental ingestion to an existing table on Snowflake without any audit column is not allowed. | Highest |
| IPD-18835 | The user-managed table configuration could not be enabled because "is_table_user_managed" was not present. | Highest |
| IPD-18911 | The REST API does not support "HIVE_UDF" source extensions for columns. | Highest |
| IPD-18921 | The "in_notin" node under pipelines does not support multiple columns. | Highest |
| IPD-18931 | The Infoworks metacrawl job for a CSV source fetches an incorrect number of columns. | Highest |
| IPD-19474 | The API POST call to Pipeline Config-Migration fails. | Highest |
| IPD-19542 | When running ingestion on a BigQuery environment, the error table is not created in the BigQuery dataset if the source has only one error record. | Medium |
| IPD-19579 | The workflow variable "job_id" is not preserved. Fetching it inside other nodes returns None. | High |
Known Issues
This section lists the known issues that Infoworks is aware of and is working to fix in an upcoming release:
| JIRA ID | Issue | Severity |
|---|---|---|
| IPD-19633 | Sometimes, when you import a SQL file via the SQL Import section on the pipeline Settings page, the pipeline version for the SQL file is created successfully, but the application is incorrectly redirected to the Mappings page. This is an intermittent issue. Workaround: Go to the Pipeline Overview page and check whether the version was created. If not, try importing again. | High |
Limitations
- Limitations for the Databricks Compute "Enable Elastic Disk" Option: Elastic Disk is available only on Databricks on AWS. Enabling or disabling this option for Databricks on GCP or Azure has no effect. On AWS Databricks, if an Instance Pool is used with Ephemeral Compute, this option is ignored because AWS does not support Elastic Disk together with Instance Pools.
Installation
For Kubernetes-based installation, refer to Infoworks Installation on Azure Kubernetes Service (AKS).
For more information, contact support@infoworks.io.
Upgrade
To upgrade from a 5.3.0 VM installation to 5.3.1 on Kubernetes, refer to Upgrade to 5.3.1 Kubernetes.
PAM
For the PAM, see Product Availability Matrix.