Infoworks Release Notes

v5.1.1

Date of Release: 31 October 2021

New Features and Enhancements

Component: Onboard Data

Onboarding Data from Teradata: While onboarding data from Teradata using JDBC, Infoworks now supports improved split options for the Derived Split Column. For float, integer, and long columns, the mod split function computes the column value modulo the number of connections and distributes the data among the connections based on the mod value. The hashamp_hashbucket_hashrow split function divides the data based on the hash computed on the user-selected column, the available hash buckets, and the available AMPs. Using the hashamp function saves the time spent computing the mod. For more information, see the section Onboarding Data from Teradata.
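
As a rough illustration of how these two split options partition a table across JDBC connections, below is a minimal Python sketch (not Infoworks code). The table name, column name, and connection count are hypothetical; the generated predicates only use Teradata's MOD operator and the HASHROW/HASHBUCKET/HASHAMP functions mentioned above.

```python
# Minimal sketch: building one WHERE predicate per JDBC connection for a
# derived split column on Teradata. Column names and counts are hypothetical.

def mod_split_predicates(column: str, num_connections: int) -> list[str]:
    """Mod split: rows where (value MOD number-of-connections) == i go to connection i."""
    return [f"({column} MOD {num_connections}) = {i}" for i in range(num_connections)]

def hashamp_split_predicates(column: str, num_connections: int) -> list[str]:
    """hashamp_hashbucket_hashrow split: bucket rows by the AMP that owns their
    row hash, so Teradata computes the distribution instead of a per-value mod."""
    expr = f"HASHAMP(HASHBUCKET(HASHROW({column})))"
    return [f"({expr} MOD {num_connections}) = {i}" for i in range(num_connections)]

if __name__ == "__main__":
    # Each predicate would be appended to the extraction query of one connection,
    # for example: SELECT * FROM sales.orders WHERE (order_id MOD 4) = 0
    for predicate in mod_split_predicates("order_id", 4):
        print(predicate)
    for predicate in hashamp_split_predicates("order_id", 4):
        print(predicate)
```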

Skipping of ORC/Parquet Conversion on Teradata TPT: Infoworks now supports skipping the data format conversion while ingesting data from Teradata using TPT. The data lake tables are created directly on the data dumped by TPT in CSV format. This feature significantly reduces the time taken to ingest tables from Teradata. You must set an advanced configuration at the source or global level to specify the default fetch mechanism for tables. This advanced configuration is used as the default for new tables onboarded to the source. For more information, see the section Onboarding Data from Teradata TPT.

Known Issues

| JIRA ID | Issue Description |
| --- | --- |
| IPD-15446 | Validation to restrict changing a cluster from single node to multi node, or from multi node to single node, is not available in the REST API. |
| IPD-15826 | Metadata job status is shown as success in the GCP Console although the jobs actually failed on the Dataproc interactive cluster. |
| IPD-15783 | Validation of the schema name and target table name in Browse Tables is not available in the REST API. |
| IPD-15899 | The Timestamp/DateTime column is not supported in SQL Server export in Append and Merge modes. As a workaround, you can typecast the timestamp/datetime column to a string column and exclude the original column from the target node (see the sketch after this table). |
| IPD-15904 | For the Postgres export target, Append mode does not work if the schema name is in uppercase. |
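
For IPD-15899, below is a minimal PySpark sketch of the workaround, assuming a hypothetical source table analytics.orders with a timestamp column order_ts; in Infoworks the equivalent would typically be a Derive node in the pipeline plus excluding the original column in the target node.

```python
# Sketch of the IPD-15899 workaround (table and column names are hypothetical):
# cast the timestamp to a string and drop the original timestamp column
# before exporting to the SQL Server target in Append or Merge mode.
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

spark = SparkSession.builder.appName("timestamp-export-workaround").getOrCreate()

orders = spark.table("analytics.orders")  # hypothetical source table
orders_for_export = (
    orders
    .withColumn("order_ts_str", col("order_ts").cast("string"))  # derived string column
    .drop("order_ts")                                            # exclude the original column
)
```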

Resolved Issues

| JIRA ID | Issue | Issue Description | Severity |
| --- | --- | --- | --- |
| IPD-15635 | Snowflake Sync to Target Append and Merge jobs are failing. | To run Sync to Target jobs with the sync type Append or Merge, a table with the same database name, schema name, table name, and schema as configured in the Sync to Target configuration must exist on the Snowflake target. If the table does not exist on the target, you must create it manually. | High |
| IPD-15630 | REST API: error while creating AWS and GCP Dataproc clusters with a single node. | Resolved the error that occurred while creating AWS and GCP Dataproc clusters with a single node through the v3 REST API. The clusters are now created successfully, with the Allow single node instance check box selected. | Highest |
| IPD-15568 | Infoworks should not ask for the natural key column when Sync to Target uses Append mode. | The Natural Key column is now optional during a Sync to Target job to BigQuery if the Sync Type is set to Append. This behavior applies to both sources and pipelines. | High |
| IPD-15564 | Interactive jobs are failing on the Databricks environment. | Interactive jobs were failing on the Databricks environment. Interactive jobs are successful if the Spark service is enabled on the cluster. | High |
| IPD-15750 | Deleting the compute cluster in the UI gives the message "Error in submitting delete request. Please try again". | In the Dataproc environment, if the cluster is not present and you try to delete it from the Infoworks UI, the Terminate API returns an error with the message "Error in submitting delete request. Please try again". This issue is resolved by incorporating platform-related changes. | Highest |
| IPD-15768 | 5.1.1: Job hook scripts are unable to access the 'SourceBasePath' variable for JSON and CSV sources when the source file is at a DBFS location. | The 'SourceBasePath' variable is not accessible for JSON and CSV sources in pre-hook and post-hook jobs if the source file is located on DBFS for Azure Databricks. The same works fine for GCS and S3 buckets. | High |
| IPD-15781 | Jobs failing when multiple segments are run in parallel in segmented load. | Jobs fail in segmented load ingestion when multiple segments are run in parallel. | High |
| IPD-15790 | 5.1.1: With the SQL import feature, it is not possible to edit a DERIVE expression, change a DERIVE column name, or add a new DERIVE expression. The Save button in the UI does not become active. | If a pipeline built with SQL import has a DERIVE node, further edits to the DERIVE node, such as changing the expression or the column name, are not possible. You must manually add the "properties" key to edit the DERIVE expression or change the column name. | Highest |
| IPD-15789 | 5.1.1: With the SQL import feature, audit columns (ziw_filename, ziw_file_modified_timestamp) from the source node cannot be included in pipelines. The Include Columns button in the UI does not become active. | Audit columns cannot be included in a transformation pipeline generated using the Import SQL feature, whereas audit columns can be included in a transformation pipeline generated using the GUI editor. To resolve this issue, delete the source node on the canvas and add the same source again after importing the SQL. | Highest |
| IPD-15825 | Create Table in the UI breaks when a decimal type node is added in the Topic Mapping schema of a JSON/Kafka/Confluent Cloud source. | When the sqlType value is decimal, the backend reads the 'scale' and 'precision' keys from json_schema instead of the 'targetSchema' and 'targetPrecision' keys. | Medium |
| IPD-15824 | Schema sync is not supported when the TPT CSV bypass advanced configuration is enabled. | Although Infoworks does not support schema sync when the TPT CSV bypass feature is enabled, the UI allows the option and the jobs execute successfully. | Medium |
| IPD-15815 | The ingestion job for a table timed out, but the table group job is marked as successful. | Although the ingestion job is marked as timed out, the Infoworks UI displays the job and workflow as successful. To resolve this issue, set the timeout to a high value so that jobs processing a large volume of data are not marked as timed out. Because the default timeout value is two hours, you can remove the timeout setting from the Databricks default configuration to disable the timeout for Databricks jobs. | Highest |

Limitations

  • The Timestamp/DateTime column is not supported in SQL Server export in Append and Merge modes.
  • The schema name in the Postgres export target is case-sensitive in Append mode; Append mode does not work if the schema name is in uppercase.

Installation

For the installation procedures, see Infoworks Installation on Azure, Infoworks Installation on AWS, and Infoworks Installation on GCP.

Upgrade

For upgrading from earlier versions to 5.1.1, see Upgrading to 5.1.1.

PAM

For the PAM, see Product Availability Matrix.