Infoworks 6.1.3
Knowledge Base Articles

Intermittent Failure of CDC SCD2 Pipeline Builds Due to Timestamp Casting Issue

Affected Versions

6.1.1

Description

CDC SCD2 pipeline builds fail intermittently because of incorrect timestamp casting. Spark 3.0 introduced a new datetime parser with stricter validation, so pipelines that parse timestamps can fail during the build when a value or pattern does not satisfy the new parser. This article describes the root cause and the steps to resolve the issue.

Root Cause

The error stems from a change in datetime parsing behavior in Spark 3.0 and later. By default, Spark uses the parser introduced in version 3.0, which rejects some datetime strings and patterns that the earlier parser accepted. The failure is typically reported as a SparkUpgradeException stating that the value could not be parsed by the new parser.

The error occurs because spark.sql.legacy.timeParserPolicy defaults to EXCEPTION. Under this policy, Spark raises an error instead of returning a result whenever the new parser cannot handle a datetime string that the legacy parser would have parsed. This affects any pipeline that relies on Spark’s datetime parsing.
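As a rough illustration of the policy described above, the following PySpark sketch parses a single string under a chosen policy value so that the difference between EXCEPTION and LEGACY can be observed outside a pipeline build. It assumes a Spark 3.x session is already available and that pyspark is installed; the helper names normalize_policy and try_parse are hypothetical and are not part of Infoworks or Spark.

```python
POLICY_KEY = "spark.sql.legacy.timeParserPolicy"
VALID_POLICIES = ("EXCEPTION", "CORRECTED", "LEGACY")


def normalize_policy(policy: str) -> str:
    """Upper-case and validate a policy name before handing it to Spark."""
    value = policy.strip().upper()
    if value not in VALID_POLICIES:
        raise ValueError(f"unknown policy {policy!r}; expected one of {VALID_POLICIES}")
    return value


def try_parse(spark, raw: str, fmt: str, policy: str):
    """Parse `raw` with pattern `fmt` under `policy`.

    Returns (True, timestamp_or_None) if Spark returns a result, or
    (False, error_text) if Spark raises while evaluating the expression.
    """
    # Imported lazily so the module loads even where pyspark is absent.
    from pyspark.sql import functions as F

    wanted = normalize_policy(policy)
    previous = spark.conf.get(POLICY_KEY)  # remember the session's current value
    spark.conf.set(POLICY_KEY, wanted)
    try:
        df = spark.createDataFrame([(raw,)], ["raw"])
        row = df.select(F.to_timestamp("raw", fmt).alias("ts")).first()
        # A None timestamp means the string was not parseable under this policy.
        return True, row["ts"]
    except Exception as exc:  # the parser error surfaces when the action runs
        return False, str(exc)
    finally:
        spark.conf.set(POLICY_KEY, previous)  # leave the session as it was found
```

Calling try_parse once with "EXCEPTION" and once with "LEGACY", using a value and pattern taken from a failing build, indicates whether switching to the legacy parser is likely to resolve that particular failure.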

To Resolve

To work around this issue, set spark.sql.legacy.timeParserPolicy to LEGACY so that Spark uses the legacy datetime parser. Use one of the following approaches:

Option 1: Configure Pipeline Settings

  1. Navigate to the settings page of the affected pipeline.

  2. In the Advanced Configuration section, add the following key-value pair:

    • Key: iw_spark_app_conf
    • Value: spark.sql.legacy.timeParserPolicy=LEGACY
  3. Use an ephemeral cluster for pipeline execution.

Option 2: Configure Compute (Environment Level)

  1. Open the Advanced Configuration settings for the compute in your environment.

  2. Add the following key-value pair:

    • Key: spark.sql.legacy.timeParserPolicy
    • Value: LEGACY
  3. Restart the compute to apply the changes.

Option 3: Directly Set the Property in Compute

  1. Directly set the following configuration in the compute settings:

    • Key: spark.sql.legacy.timeParserPolicy
    • Value: LEGACY
  2. Restart the compute cluster.
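
Whichever option is used, it can help to confirm that the property actually reached the Spark session before rerunning the build. The following is a minimal sketch, assuming an active PySpark session object is available on the compute (for example in a notebook attached to it); the function name legacy_parser_enabled is hypothetical.

```python
POLICY_KEY = "spark.sql.legacy.timeParserPolicy"


def legacy_parser_enabled(spark) -> bool:
    """Return True if the given session reports the LEGACY parser policy."""
    # spark.conf.get returns the effective value for this session,
    # including Spark's own default when nothing was configured.
    current = spark.conf.get(POLICY_KEY)
    return current.strip().upper() == "LEGACY"
```

If the function returns False after the compute has been restarted, the key was most likely not applied at the level the pipeline runs on, and the key and value entered in the configuration should be rechecked.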