Ingestion job failing with error org.apache.spark.sql.execution.OutOfMemorySparkException during the merge phase

Problem Description:

Ingestion jobs fail during the merge phase with org.apache.spark.sql.execution.OutOfMemorySparkException in Infoworks DataFoundry 3.x and 4.x.

Root Cause:

This is caused by a limitation of Spark's size estimator.

If the estimated size of one of the DataFrames in a join is below spark.sql.autoBroadcastJoinThreshold, Spark may use a BroadcastHashJoin to perform the join. If the available nodes do not have enough resources to hold the broadcast DataFrame, the job fails with an out-of-memory error.
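The decision described above can be sketched in a few lines of plain Python. This is a simplified model for illustration, not Spark's planner code: the function names, the "non-broadcast join" label, and the sizes in the example are all hypothetical. The only value taken from Spark is its documented default threshold of 10 MB. The point is that the strategy is picked from the estimated size, while memory pressure depends on the actual size.

```python
# Simplified model of the broadcast decision; this is NOT Spark's planner code.
# Spark's documented default for spark.sql.autoBroadcastJoinThreshold is 10 MB.
DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD = 10 * 1024 * 1024


def choose_join_strategy(estimated_size_bytes,
                         threshold_bytes=DEFAULT_AUTO_BROADCAST_JOIN_THRESHOLD):
    """Return the join strategy this simplified model would pick."""
    # In Spark, a threshold of -1 disables automatic broadcast joins.
    if threshold_bytes < 0:
        return "non-broadcast join"
    if 0 <= estimated_size_bytes <= threshold_bytes:
        return "BroadcastHashJoin"
    return "non-broadcast join"


def broadcast_fits(actual_size_bytes, available_memory_bytes):
    """Hypothetical check: does the real DataFrame fit in the memory available?"""
    return actual_size_bytes <= available_memory_bytes


# Hypothetical numbers: the estimator reports 8 MiB, the real data is 6 GiB,
# and only 4 GiB of memory is available for the broadcast.
estimated = 8 * 1024 * 1024
actual = 6 * 1024 ** 3
available = 4 * 1024 ** 3

strategy = choose_join_strategy(estimated)
print(strategy)
print(broadcast_fits(actual, available))
```

With these made-up numbers the model selects a broadcast join because the estimate is under the threshold, even though the actual data does not fit in the available memory. That is the failure mode described above.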

Solution:

To resolve this, increase the value of spark.driver.maxResultSize by setting the advanced configuration below at the table or source level.

key: ingestion_spark_configs
value: spark.driver.maxResultSize=8294967296
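As a quick sanity check on the setting above, the short Python sketch below builds the key/value pair and converts the number into GiB. It assumes Spark reads a bare number for spark.driver.maxResultSize as bytes, as its configuration documentation describes. The helper names are hypothetical and are not part of any Infoworks or Spark API.

```python
# Value taken from the article; assumed to be interpreted by Spark as bytes.
MAX_RESULT_SIZE_BYTES = 8294967296


def to_gib(size_bytes):
    """Convert a byte count to GiB (1 GiB = 1024**3 bytes)."""
    return size_bytes / 1024 ** 3


def build_advanced_config(key, spark_property, value):
    """Hypothetical helper: format one advanced-config entry as key/value."""
    return {"key": key, "value": f"{spark_property}={value}"}


config = build_advanced_config(
    "ingestion_spark_configs",
    "spark.driver.maxResultSize",
    MAX_RESULT_SIZE_BYTES,
)
print(config["key"])
print(config["value"])
print(f"{to_gib(MAX_RESULT_SIZE_BYTES):.2f} GiB")
```

The conversion shows that the configured limit is a little under 8 GiB, so the driver needs at least that much memory available for the setting to be useful.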

Applicable versions:

IWX 3.2, 4.0, 4.2
