Structured file ingestion is failing with java.lang.IllegalArgumentException: Delimiter cannot be more than one character
Problem Description:
Structured file ingestion fails with the error java.lang.IllegalArgumentException: Delimiter cannot be more than one character. A sample stack trace is shown below:
20/08/10 10:34:02 ERROR FileFormatWriter: Aborting job 4c9f9c40-fe2d-41c1-8c57-d90064af1218.
java.lang.IllegalArgumentException: Delimiter cannot be more than one character: @|#
    at org.apache.spark.sql.execution.datasources.csv.CSVUtils$.toChar(CSVUtils.scala:118)
    at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:88)
    at org.apache.spark.sql.execution.datasources.csv.CSVOptions.<init>(CSVOptions.scala:41)
    at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat.buildReader(CSVFileFormat.scala:105)
    at org.apache.spark.sql.execution.datasources.FileFormat$class.buildReaderWithPartitionValues(FileFormat.scala:131)
    at org.apache.spark.sql.execution.datasources.TextBasedFileFormat.buildReaderWithPartitionValues(FileFormat.scala:162)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD$lzycompute(DataSourceScanExec.scala:456)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDD(DataSourceScanExec.scala:450)
    at org.apache.spark.sql.execution.FileSourceScanExec.inputRDDs(DataSourceScanExec.scala:477)
    at org.apache.spark.sql.execution.ProjectExec.inputRDDs(basicPhysicalOperators.scala:46)
    at org.apache.spark.sql.execution.WholeStageCodegenExec.doExecute(WholeStageCodegenExec.scala:631)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:146)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:134)
    at org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$5.apply(SparkPlan.scala:187)
Root cause:
Spark 2 does not support multi-character delimiters when reading CSV files. The Databricks runtime version (5.5) that we use to submit the job runs Spark 2.x, so by default, files with a multi-character delimiter fail with the error shown above.
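To make the failure mode concrete, here is a simplified Python sketch of the kind of single-character check that produces this error. It is only an illustration modeled on the message in the stack trace, not Spark's actual code: the function names are hypothetical, and handling of escaped delimiters (such as a tab written with a backslash) and of empty delimiters is an assumption or left out.

```python
def to_char(delimiter: str) -> str:
    """Simplified stand-in for a Spark 2 style delimiter check.

    Accepts exactly one character and rejects anything longer.
    Escape sequences are deliberately not handled in this sketch.
    """
    if not delimiter:
        # Assumption for this sketch: treat an empty delimiter as an error.
        raise ValueError("Delimiter cannot be empty")
    if len(delimiter) == 1:
        return delimiter
    raise ValueError(
        f"Delimiter cannot be more than one character: {delimiter}"
    )


def try_delimiter(delimiter: str) -> str:
    """Return 'ok' if the delimiter passes the check, else the error text."""
    try:
        to_char(delimiter)
        return "ok"
    except ValueError as err:
        return str(err)
```

Calling try_delimiter(",") passes the check, while try_delimiter("@|#") returns the same message that appears in the stack trace above.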
Solution:
Spark 3 can handle multi-character delimiters, so submitting the job with Databricks runtime 7.2.x avoids the above error while crawling the data. Set the following advanced configuration at the table or source level to run an ingestion job on a runtime other than the default one.
Key: databricks_spark_runtime
Value: 7.2.x-scala2.12
Applicable IWX versions:
IWX 4.2
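As a rough pre-flight aid, the rule described in the Solution can be sketched as a small Python helper that decides whether a table's delimiter calls for the runtime override. The key and value come from this article; the function names are hypothetical, and treating a two-character backslash escape as a single character is an assumption of this sketch rather than documented product behavior.

```python
# Key and value as given in this article.
RUNTIME_KEY = "databricks_spark_runtime"
SPARK3_RUNTIME = "7.2.x-scala2.12"


def needs_spark3_runtime(delimiter: str) -> bool:
    """Return True when the delimiter is longer than one character."""
    # Assumption: an escape such as backslash + 't' stands for one character.
    if len(delimiter) == 2 and delimiter.startswith("\\"):
        return False
    return len(delimiter) > 1


def advanced_config_for(delimiter: str) -> dict:
    """Return the advanced configuration to set, or an empty dict if none."""
    if needs_spark3_runtime(delimiter):
        return {RUNTIME_KEY: SPARK3_RUNTIME}
    return {}
```

For the delimiter from the stack trace, advanced_config_for("@|#") yields the key and value listed above; for a plain comma it returns an empty dictionary, meaning the default runtime is sufficient.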
© UNIPHORE TECHNOLOGIES 2025 | Confidential