Submitting Spark Pipelines
Spark pipelines can be configured to run in client mode on the edge node, or can be submitted via Apache Livy. By default, if no configuration is specified, Spark pipelines run in client mode on the edge node.
Configuring Spark to Run in Client Mode on the Edge Node
Perform the following step to configure Spark pipelines to run in client mode on the edge node:
- Add the following configuration in the pipeline Advanced Configuration option:
job.dispatcher.type=native
Configuring Spark Pipeline to Run in Cluster Mode
Perform the following steps to configure Spark pipelines to run in cluster mode:
- Add the following configuration in the pipeline Advanced Configuration option:
job.dispatcher_type=spark
- Add the following configurations in the dt_spark_defaults.conf file on the edge node:
spark.master yarn
spark.submit.deployMode cluster
spark.driver.extraJavaOptions=-DIW_HOME=hdfs:<IW HOME ON HDFS>
spark.executor.extraJavaOptions=-DIW_HOME=hdfs:<IW HOME ON HDFS>
The ${IW_HOME} path on HDFS can be different from the ${IW_HOME} path on the edge node local file system. The IW HOME on HDFS in the above configuration is used as ${IW_HOME} for pipeline jobs running in cluster mode, so ensure that you copy the ${IW_HOME}/conf folder from the edge node local file system to ${IW_HOME} on HDFS.
- Copy the ${IW_HOME}/conf folder from the edge node local file system to the Hadoop ${IW_HOME} (IW HOME on HDFS).
- In cluster mode, the pipeline job runs in the Yarn cluster and reads all configuration files from HDFS. While specifying configuration files in the ${IW_HOME}/conf/conf.properties file in Hadoop, ensure that the configuration file paths are prefixed with hdfs:
- The configuration dt_spark_configfile_batch in the Hadoop ${IW_HOME}/conf/conf.properties file must point to the HDFS path of the dt_spark_defaults.conf file (dt_spark_configfile_batch=hdfs:/<df_spark_default_conf>).
- When running in cluster mode, the pipeline job uploads lib jars to HDFS. By default, the HDFS path mirrors the local path when uploading jars. For example, if the local jar path is file:/opt/info/lib/df/*, the path hdfs:/opt/info/lib/df/* is created on HDFS, and the jars from file:/opt/info/lib/df/* are uploaded to hdfs:/opt/info/lib/df/*. To change the base HDFS lib path, add the following configuration in the ${IW_HOME}/conf/conf.properties file on the edge node:
dt_hdfs_lib_base_path=<HDFS lib base path>
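The jar-upload path mapping described above can be sketched in Python. The helper name hdfs_jar_path is ours, and the exact behavior when dt_hdfs_lib_base_path is set (placing each jar directly under the configured base path) is an assumption for illustration:

```python
def hdfs_jar_path(local_jar, hdfs_lib_base_path=None):
    """Map a local jar path to its HDFS upload location.

    By default the local directory layout is mirrored on HDFS. When a
    base path (dt_hdfs_lib_base_path) is configured, the jar is assumed
    to be placed directly under that base path instead.
    """
    # Strip the file: scheme prefix if present.
    path = local_jar[len("file:"):] if local_jar.startswith("file:") else local_jar
    if hdfs_lib_base_path is None:
        # Default: mirror the local path on HDFS.
        return "hdfs:" + path
    # Configured base path: keep only the jar name under the base.
    jar_name = path.rsplit("/", 1)[-1]
    return "hdfs:" + hdfs_lib_base_path.rstrip("/") + "/" + jar_name

# Default behaviour: the local path is mirrored on HDFS.
print(hdfs_jar_path("file:/opt/info/lib/df/pipeline.jar"))
# → hdfs:/opt/info/lib/df/pipeline.jar

# With dt_hdfs_lib_base_path=/iw/lib, jars land under the configured base.
print(hdfs_jar_path("file:/opt/info/lib/df/pipeline.jar", "/iw/lib"))
# → hdfs:/iw/lib/pipeline.jar
```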
Spark 2.1 does not allow the same jar name to appear multiple times on the classpath, even in different paths. If an error occurs, add the following configuration in the ${IW_HOME}/conf/conf.properties file on the edge node:
dt_classpath_include_unique_jars=true
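Before enabling the flag, you can check your lib directories for clashing jar names with a short sketch like the following (the helper duplicate_jar_names is ours, not part of the product):

```python
from collections import Counter
from pathlib import PurePosixPath

def duplicate_jar_names(jar_paths):
    """Return jar file names that occur more than once across the given paths.

    Spark 2.1 rejects a classpath that contains the same jar name twice,
    even when the jars live in different directories.
    """
    counts = Counter(PurePosixPath(p).name for p in jar_paths)
    return sorted(name for name, n in counts.items() if n > 1)

print(duplicate_jar_names([
    "/opt/info/lib/df/common.jar",
    "/opt/info/lib/extra/common.jar",
    "/opt/info/lib/df/pipeline.jar",
]))
# → ['common.jar']
```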
© UNIPHORE TECHNOLOGIES 2025 | Confidential