Title
Create new category
Edit page index title
Edit category
Edit link
Ingestion job failing with error org.apache.spark.sql.execution.OutOfMemorySparkException during the merge phase
Ingestion job failing with error org.apache.spark.sql.execution.OutOfMemorySparkException during the merge phase
Problem Description:
Ingestion job fails with the below error in Infoworks DataFoundry 3.X,4.X,
Caused by: java.util.concurrent.ExecutionException: org.apache.spark.sql.execution.OutOfMemorySparkException: Size of broadcasted table far exceeds estimates and exceeds limit of spark.driver.maxResultSize=4294967296. You can disable broadcasts for this query using set spark.sql.autoBroadcastJoinThreshold=-1 at java.util.concurrent.FutureTask.report(FutureTask.java:122) at java.util.concurrent.FutureTask.get(FutureTask.java:206) at org.apache.spark.sql.execution.exchange.BroadcastExchangeExec.doExecuteBroadcast(BroadcastExchangeExec.scala:182) ... 227 moreRoot Cause:
This is due to a limitation with Spark’s size estimator.
If the estimated size of one of the DataFrames is less than the autoBroadcastJoinThreshold, spark may use BroadCastHashJoin to perform the join. If the available nodes do not have enough resources to accommodate the broadcast DataFrame, your job fails due to an out-of-memory error.
Solution:
To resolve this we can increase the value of spark.driver.maxResultSize by setting below advanced configs at the table or the source level. key: ingestion_spark_configs value:spark.driver.maxResultSize=8294967296
Applicable versions:
IWX 3.2,4.0,4.2.
For more details, refer to our Knowledge Base and Best Practices!
For help, contact our support team!
© UNIPHORE TECHNOLOGIES 2025 | Confidential