Infoworks Release Notes

v5.4.1.13

Date of Release: February 2024

Enhancements

JIRA ID | Issue
IPD-25542* | Support Enhanced Flexibility Mode (EFM) for Dataproc clusters.
IPD-21466 | Support for read isolation on SQL Server. (NOTE: Set the key query_hint to the value READUNCOMMITTED in the source/table Advanced Configuration to enable read isolation on a SQL Server source.)
IPD-24768 | Disable/hide the graph on the "Usage and License Details" page.
IPD-24010 | The "Custom Tags" field is now mandatory in the UI and REST API.
IPD-27319 | Capture additional logs.

Resolved Issues

NOTE The "*" symbol next to an ID refers to issues that have been resolved in the current release.

JIRA ID | Issue
IPD-25580* | Jobs failing randomly during cluster creation with a "File not found" error after switching to Dataproc 2.0. (NOTE: The advanced configuration dt_classpath_include_unique_jars=false should be removed once upgraded to 5.4.1.13.)
IPD-22878 | Recrawl metadata throws "Provided Table Ids are not present in the Source".
IPD-24824 | Enum constant error on the Microsoft Access source.
IPD-24751 | The import workflow configuration API does not configure the sync-to-target node properly.
IPD-24697 | The workflow completed, but its status was not updated.
IPD-24671 | Delay in viewing the pipeline list in the Workflow editor pipeline task.
IPD-24636 | The Custom Tags field is not present on the Onboard Data page.
IPD-24511 | dt_bigquery_session_project_id error on Dataproc.
IPD-24574 | The pipeline jobs list does not load and reports an error.
IPD-24556 | The ingestion job completed its execution, but the workflow task was marked as failed.
IPD-24503 | Job execution time vs. cluster creation time.
IPD-24493 | Workflow polling failed with a "502 Gateway Time-out" error.
IPD-24472 | The pipeline job completed, but the workflow status was not updated.
IPD-24431 | The job_object.json file of a segmentation ingestion job does not capture table details.
IPD-24430 | Unable to delete or deactivate an advanced configuration key if the key has a trailing space.
IPD-24407 | Export to BigQuery completed successfully, but the pipeline job was marked as failed.
IPD-24308 | Jobs not getting picked up by Hangman.
IPD-24258 | API calls failed with a "504 Gateway Time-out" error.
IPD-23719 | Pipeline versions are getting emptied in production.
IPD-24007 | View Run does not load the workflow run page beyond the most recent 20 workflow runs.
IPD-23613 | The Last Modified Date in workflows is shown incorrectly.
IPD-23636 | The pipeline jobs list is not visible.
IPD-23663 | The file archival process does not archive files containing only header records after upgrading to 5.4.
IPD-23647 | The staging table created by Infoworks during a pipeline build persists in BigQuery if the load job fails.
IPD-23754 | Records were missing in the BigQuery target table even though the pipeline completed successfully.
IPD-23822 | The schema name is null/empty for incremental pipelines in a BigQuery environment when using the use_ingestion_time_as_watermark key.
IPD-23831 | Around 50% of the pipeline documents in production have the same pipeline ID and active version ID.
IPD-23899 | Refresh tokens are not generated for newly created users in 5.4.1.6.
IPD-23515 | Ingestion jobs in a BigQuery environment run queries against the GCP project in the service account JSON instead of parent_project.
IPD-23526 | Issues when using an externally created partitioned BigQuery table in pipelines.
IPD-23452 | REST API: bug in the 5.4.1 config migration where a few keys are in camelCase instead of snake_case.
IPD-23418 | BigQuery pipelines no longer fail with a "java.lang.IllegalArgumentException: Provided query is null or empty" error after the upgrade.
IPD-23327 | Fixed the issue where inactive Spark advanced configurations took effect in 5.4.1.4.
IPD-23263 | The Infoworks scheduler does not submit jobs during periods of high load.
IPD-23498 | Project ID field issue on a Teradata source configured in a BigQuery data environment.
IPD-23216 | Unable to unlock locked entities.
IPD-22964 | The connection.schema_registry_authentication_details.password field is now part of the iw_migration script.
IPD-22983 | For a source table fetched from a Confluent source with incremental mode set to Append, the pipeline source query no longer brings in the entire dataset every time, irrespective of the provided query.
IPD-22963 | For the 5.3.x and 5.4.x versions, when Incremental Load is enabled and the Sync Type is set to Append, the second build of the pipeline no longer copies duplicate records.
IPD-22038 | Fixed the unlock functionality for Admin users.
IPD-22113 | Pipelines can now be created via the API by using the Environment Name, Environment Storage Name, and Environment Compute Template Name.
IPD-22817 | The batch_engine key now validates the user input during pipeline creation via the API.
IPD-22721 | For the Confluent Kafka source, the streaming_group_id_prefix configuration now works as expected.
IPD-22615 | The partition and clustering details now appear in BigQuery tables created via Infoworks pipelines.
IPD-22036 | The pipeline build now succeeds even when the target table for the BigQuery external target already exists and is clustered.
IPD-22090 | For the Delimited File target, the timestamp format can be configured per user requirements and applies to all timestamp columns of that table. For example: timestampFormat=yyyy-MM-dd HH:mm:ss.SSS or timestampFormat=yyyy-MM-dd HH:mm:ss.
IPD-22351 | Added advanced configurations for setting the BigQuery session project ID for Data Transformation and Ingestion jobs (dt_bigquery_session_project_id / ingestion_bigquery_session_project_id).
IPD-21449 | The Import SQL API now picks the correct table, even if a table with the same schema/table name is present in multiple data environments.
IPD-21534 | The Initialize & Ingest and Truncate jobs can now reset the value of the last_merged_watermark key.
IPD-21584 | The Import SQL command can now fetch queries that contain a backtick (`).
IPD-21700 | Fixed the pipeline deletion issue.
IPD-21792 | Duplicate tables can no longer be onboarded on a Hive metadata sync source.

Upgrade

The following steps assume that the IW_HOME variable is set to /opt/infoworks.

Prerequisite

To support rollback after metadata migration, you must take a backup of the metadata. Follow these steps:

Step 1: Install/download the MongoDB tool mongodump, if it is not already available.

Step 2: Create a directory to store the database backup dump using the below command.

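A minimal sketch, assuming /opt/infoworks/backup/mongodb as the backup location (an arbitrary example path; any directory with sufficient free space works):

mkdir -p /opt/infoworks/backup/mongodb   # example backup directory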

Step 3: Use the below command to take a dump (backup) of the databases from the MongoDB server.

If MongoDB is hosted on Atlas

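A hedged example, assuming a standard mongodump installation; the connection string, credentials, and output directory are placeholders to replace with your Atlas values:

mongodump --uri="mongodb+srv://<username>:<password>@<atlas-cluster-host>" --out=/opt/infoworks/backup/mongodb   # replace placeholders with your Atlas connection details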

If MongoDB is installed with Infoworks on the same VM

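A hedged example for a MongoDB instance running locally on the Infoworks VM; the host, port, and credentials shown are assumptions to adjust for your deployment:

mongodump --host 127.0.0.1 --port 27017 --username <username> --password <password> --authenticationDatabase admin --out /opt/infoworks/backup/mongodb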

Procedure

For upgrading from 5.4.1/5.4.1.x to 5.4.1.13, execute the following commands:

Step 1: Use the deployer to upgrade from 5.4.1 to 5.4.1.13.

Step 2: Go to the $IW_HOME/scripts folder on the machine.

Step 3: To ensure that there is no pre-existing update script, execute the following command:

[[ -f update_5.4.1.13.sh ]] && rm update_5.4.1.13.sh

Step 4: Download the update_5.4.1.13.sh script:

wget https://iw-saas-setup.s3.us-west-2.amazonaws.com/5.4/update_5.4.1.13.sh

Step 5: Give the update_5.4.1.13.sh script executable permission:

chmod +x update_5.4.1.13.sh

Step 6 (Optional): If the patch requires the Mongo metadata to be migrated, run export METADB_MIGRATION=Y to ensure that the metadata is migrated; otherwise, run export METADB_MIGRATION=N.

Alternatively, you can enter it in the prompt while running the script.

Step 7: Update the package to the hotfix

source $IW_HOME/bin/env.sh

./update_5.4.1.13.sh -v 5.4.1.13-ubuntu2004

You will receive a "Please select whether metadb migration needs to be done([Y]/N)" message. If you need to perform metadb migration, enter Y; otherwise, enter N.

Post Upgrade Steps

Steps to follow after upgrading Infoworks to 5.4.1.13:

NOTE Make sure to take a backup of the dataproc_defaults.json file before making any changes.

The dataproc_defaults.json file needs to be updated after upgrading to 5.4.1.13. The file is located in the /opt/infoworks/conf directory, where /opt/infoworks is IW_HOME.

The following changes are required in the dataproc_defaults.json file. To edit the file, change directory with cd /opt/infoworks/conf and run vi dataproc_defaults.json.

Step 1: Add the property config.masterConfig.diskConfig.numLocalSsds : 0

Step 2: Add the property config.workerConfig.diskConfig.numLocalSsds : 0

Step 3: Add the object config.secondaryWorkerConfig (see the sketch after Step 4).

Before the update, no key named secondaryWorkerConfig is present inside the config property; after the update, the secondaryWorkerConfig object appears inside config.

Step 4: Add the array num_local_ssds : [0,1,2,3,4,5,6,7,8,16,24]

Before the update, there is no num_local_ssds field; after the update, a new JSON array for num_local_ssds is present.
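For reference, the fragment below is a minimal sketch of how the keys touched by Steps 1 through 4 might look in dataproc_defaults.json. It shows only those keys; the contents of secondaryWorkerConfig (numInstances and its diskConfig) and the top-level placement of num_local_ssds are assumptions, so keep the structure and defaults that already exist in your file and add only the keys called out in the steps above.

{
  "config": {
    "masterConfig": { "diskConfig": { "numLocalSsds": 0 } },
    "workerConfig": { "diskConfig": { "numLocalSsds": 0 } },
    "secondaryWorkerConfig": { "numInstances": 0, "diskConfig": { "numLocalSsds": 0 } }
  },
  "num_local_ssds": [0, 1, 2, 3, 4, 5, 6, 7, 8, 16, 24]
}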

The UI and Platform services must be restarted after applying this configuration change.

Steps to Enable EFM on Dataproc

To add secondary workers to a Dataproc cluster, select the Enable Autoscale checkbox and then select the Enable Secondary Worker checkbox.

The secondary worker type can be one of spot VMs, standard preemptible VMs, or non-preemptible VMs.

As per the Dataproc documentation, the following properties need to be added to enable EFM:

--properties=dataproc:efm.spark.shuffle=primary-worker \
--properties=dataproc:efm.mapreduce.shuffle=hcfs

To add these properties, go to the Advanced Configurations in the Compute section and add the following key:

Key: iw_environment_cluster_dataproc_config

Value: efm.spark.shuffle=primary-worker;efm.mapreduce.shuffle=hcfs

Additionally, the YARN graceful decommission timeout must be set to zero when EFM is enabled. To set it, add the following advanced configuration:

Key: gracefulDecommissionTimeout

Value: 0 (zero)

Additional Notes

  • The number of allowed local SSDs might differ based on the selected machine type. Refer to https://cloud.google.com/compute/docs/disks/local-ssd for the allowed values.
  • Clusters with local SSDs cannot be stopped.
  • Clusters with secondary workers cannot be stopped. To stop such a cluster, the secondary workers must first be scaled down to zero.
  • Existing clusters cannot be updated from single-node to multi-node or vice versa.
  • When autoscale is enabled and the EFM advanced configurations are set, secondary workers must be enabled; otherwise, cluster creation will fail, because primary workers cannot be autoscaled when Spark primary-worker shuffle is enabled.

Rollback

Prerequisite

To roll back the migrated metadata:

Step 1: Install/download the MongoDB tool mongorestore, if it is not already available.

Step 2: Switch to the directory where the backup is saved on the local system.

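A minimal sketch, assuming the backup was taken to /opt/infoworks/backup/mongodb as in the prerequisite above (an example path):

cd /opt/infoworks/backup/mongodb   # directory containing the mongodump output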

Step 3: Use the below command to restore the dump (backup) of the databases to the MongoDB server.

If MongoDB is hosted on Atlas

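A hedged example, assuming a standard mongorestore installation; the connection string and credentials are placeholders to replace with your Atlas values, and the path points at the mongodump output from the backup step:

mongorestore --uri="mongodb+srv://<username>:<password>@<atlas-cluster-host>" /opt/infoworks/backup/mongodb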

If MongoDB is installed with Infoworks on the same VM

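A hedged example for a MongoDB instance running locally on the Infoworks VM; adjust the host, port, and credentials for your deployment:

mongorestore --host 127.0.0.1 --port 27017 --username <username> --password <password> --authenticationDatabase admin /opt/infoworks/backup/mongodb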

Procedure

To go back to the previous checkpoint version:

Step 1: In a web browser, go to your Infoworks system, scroll down to the bottom, and click the Infoworks icon.

Step 2: The Infoworks Manifest Information page opens in a new tab. Scroll down and check the Last Checkpoint Version.

Step 3: SSH to the Infoworks VM and switch to {{IW_USER}}.

Step 4: Initialize the variables in the bash shell.

full_version=5.4.1.13

major_version=$(echo $full_version | cut -d "." -f 1-2)

previous_version=<Previous Version> # Last Checkpoint Version from Step 2

os_suffix=<OS Suffix> # One of [ ubuntu2004 amazonlinux2 rhel8 ]

Step 5: Download the required deployer for the currently applied patch:

https://iw-saas-setup.s3-us-west-2.amazonaws.com/${major_version}/deploy_${full_version}.tar.gz
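For example, on a machine with outbound internet access, the deployer can be fetched with wget; the URL below is the above pattern expanded for full_version=5.4.1.13 and major_version=5.4:

wget https://iw-saas-setup.s3-us-west-2.amazonaws.com/5.4/deploy_5.4.1.13.tar.gz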

Step 6: Copy (scp) the downloaded file to the following path on the Infoworks VM:

${IW_HOME}/scripts/

NOTE Remove any previously downloaded copy of the deploy_${full_version}.tar.gz file from the ${IW_HOME}/scripts/ directory first.
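A minimal sketch of the copy, assuming the tarball was downloaded to the current directory on another machine; <user> and <infoworks-host> are placeholders, and /opt/infoworks is IW_HOME as stated above:

scp deploy_5.4.1.13.tar.gz <user>@<infoworks-host>:/opt/infoworks/scripts/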

Step 7: Extract the deployer tar file (removing any existing iw-installer directory first):

cd ${IW_HOME}/scripts

[[ -d iw-installer ]] && rm -rf iw-installer

tar xzf deploy_${full_version}.tar.gz

cd iw-installer

Step 8: Initialize the environment variables.

source ${IW_HOME}/bin/env.sh

export IW_PLATFORM=saas

Step 9: Run the Rollback command.

./rollback.sh -v ${previous_version}-${os_suffix}