OutOfMemorySparkException while running Infoworks Jobs

Problem Description

When running incremental ingestion jobs or pipeline jobs, you might occasionally see the following error, which is related to broadcast joins, in the Databricks logs of the failed job.

org.apache.spark.sql.execution.OutOfMemorySparkException: Size of broadcasted table far exceeds estimates and exceeds limit of spark.driver.maxResultSize=4294967296. You can disable broadcasts for this query using set spark.sql.autoBroadcastJoinThreshold=-1
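
The limit quoted in the message is expressed in bytes. As a minimal sketch (the helper name `parse_limit_bytes` is hypothetical and only for illustration), the following Python snippet extracts the value from the error text and converts it to GiB, which helps when deciding how far to raise it:

```python
import re

# Error text copied from the failed job log above.
MESSAGE = (
    "org.apache.spark.sql.execution.OutOfMemorySparkException: "
    "Size of broadcasted table far exceeds estimates and exceeds limit of "
    "spark.driver.maxResultSize=4294967296. You can disable broadcasts for "
    "this query using set spark.sql.autoBroadcastJoinThreshold=-1"
)


def parse_limit_bytes(message: str) -> int:
    """Return the spark.driver.maxResultSize value (in bytes) quoted in the message."""
    match = re.search(r"spark\.driver\.maxResultSize=(\d+)", message)
    if match is None:
        raise ValueError("no spark.driver.maxResultSize value found in message")
    return int(match.group(1))


limit_bytes = parse_limit_bytes(MESSAGE)
limit_gib = limit_bytes / 2**30  # 1 GiB = 2**30 bytes
print(f"{limit_bytes} bytes = {limit_gib:g} GiB")  # prints "4294967296 bytes = 4 GiB"
```

In this case the job hit a 4 GiB limit on the driver.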

Solution

As the exception message suggests, there are two options: increase the driver's maximum result size (spark.driver.maxResultSize) or disable broadcast joins (spark.sql.autoBroadcastJoinThreshold=-1).

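You can set the advanced configurations below at the table level or at the pipeline level. These entries raise spark.driver.maxResultSize above the 4294967296-byte limit reported in the error.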

For ingestion tables:

key: ingestion_spark_configs

value: spark.driver.maxResultSize=8294967296

For pipelines:

key: spark.driver.maxResultSize

value: 8294967296
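
The entries above cover only the first option. The sketch below shows how the key/value pair might be assembled for either option and either scope. The helper `advanced_config` is hypothetical, and applying the same key/value shape to spark.sql.autoBroadcastJoinThreshold is an assumption based on the pattern above, not something this article confirms. It also shows that the value 8294967296 used above is about 7.7 GiB, while exactly 8 GiB is 8589934592 bytes.

```python
GIB = 2**30  # bytes in one GiB


def advanced_config(scope: str, spark_key: str, spark_value: str) -> dict:
    """Return a {"key": ..., "value": ...} advanced-configuration entry.

    scope is "ingestion" (table level) or "pipeline" (pipeline level).
    """
    if scope == "ingestion":
        # Ingestion tables wrap the Spark setting inside ingestion_spark_configs.
        return {"key": "ingestion_spark_configs", "value": f"{spark_key}={spark_value}"}
    if scope == "pipeline":
        # Pipelines use the Spark property name directly as the key.
        return {"key": spark_key, "value": spark_value}
    raise ValueError(f"unknown scope: {scope!r}")


# Option 1: raise the driver result size limit (value taken from this article).
print(advanced_config("ingestion", "spark.driver.maxResultSize", "8294967296"))
print(advanced_config("pipeline", "spark.driver.maxResultSize", "8294967296"))

# Option 2 (assumed to follow the same shape): disable broadcast joins.
print(advanced_config("ingestion", "spark.sql.autoBroadcastJoinThreshold", "-1"))
print(advanced_config("pipeline", "spark.sql.autoBroadcastJoinThreshold", "-1"))

# Size check: the article's value versus an exact 8 GiB.
article_value_gib = 8294967296 / GIB
exact_8_gib_bytes = 8 * GIB
print(round(article_value_gib, 2), exact_8_gib_bytes)
```

Disabling broadcast joins avoids collecting the broadcast table on the driver at all, usually at the cost of slower shuffle-based joins, so raising the limit is typically tried first when the driver has memory to spare.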

Applicable Infoworks Data Foundry Versions

IWX 4.x
