Spark pipelines can be configured to run in client mode on the edge node or can be submitted via Apache Livy. By default, if no configuration is specified, Spark pipelines run in client mode on the edge node.
Perform the following to configure Spark pipelines to run in client mode on the edge node:
job.dispatcher.type=native
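To confirm which mode is active on a given setup, the dispatcher property can be checked on the edge node. This is a minimal sketch, assuming the property is kept in ${IW_HOME}/conf/conf.properties; confirm the location for your deployment:

# Show the configured dispatcher type; if nothing is set, pipelines default to client mode on the edge node
grep "job.dispatcher" ${IW_HOME}/conf/conf.properties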
Following are the steps to configure Spark pipelines to run in cluster mode:

1. Set the following configuration:

   job.dispatcher_type=spark

   Note: The ${IW_HOME} path on HDFS can be different from the ${IW_HOME} path on the edge node local file system. The IW_HOME on HDFS in the above configuration will be used as ${IW_HOME} for pipeline jobs running in cluster mode, so ensure that you copy the ${IW_HOME}/conf folder from the edge node local file system to ${IW_HOME} on HDFS.
2. Copy the ${IW_HOME}/conf folder from the edge node local file system to the Hadoop ${IW_HOME} (IW_HOME on HDFS); see the sketch after these steps for an example.

3. In the Hadoop ${IW_HOME}/conf/conf.properties file, ensure that the configuration file paths are prefixed with hdfs:. For example, dt_spark_configfile_batch in the Hadoop ${IW_HOME}/conf/conf.properties file must point to the HDFS path of the dt_spark_defaults.conf file:

   dt_spark_configfile_batch=hdfs:/<df_spark_default_conf>

4. Set the following configuration in the ${IW_HOME}/conf/conf.properties file in the edge node:

   dt_hdfs_lib_base_path=<HDFS lib base path>

Note: Spark 2.1 does not allow the same jar name to appear multiple times, even in different paths. If an error occurs, add the following configuration in the ${IW_HOME}/conf/conf.properties file in the edge node:

dt_classpath_include_unique_jars=true
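For step 2, the copy can be done with the standard HDFS shell. This is a minimal sketch; the paths /opt/infoworks (edge node ${IW_HOME}) and /iw/infoworks (${IW_HOME} on HDFS) are only examples and should be replaced with the actual values:

# Create the IW_HOME directory on HDFS and copy the conf folder from the edge node (example paths)
hdfs dfs -mkdir -p /iw/infoworks
hdfs dfs -put /opt/infoworks/conf /iw/infoworks/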
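Putting the cluster-mode settings together, the following is an illustrative sketch of the relevant entries in the two conf.properties files; all paths are assumptions for the example and must be adapted to the actual environment:

Edge node ${IW_HOME}/conf/conf.properties (example):
# Run pipeline jobs in cluster mode
job.dispatcher_type=spark
# HDFS base path for the libraries (example path)
dt_hdfs_lib_base_path=/iw/infoworks/lib
# Only needed if Spark 2.1 reports duplicate jar name errors
dt_classpath_include_unique_jars=true

Hadoop ${IW_HOME}/conf/conf.properties on HDFS (example):
# Configuration file paths are prefixed with hdfs:
dt_spark_configfile_batch=hdfs:/iw/infoworks/conf/dt_spark_defaults.conf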