Infoworks 6.1.3
Admin and Operations

General Configuration

The General tab allows you to add or delete configurations and change the value and description of a configuration key.

The autocomplete feature allows you to select the required configuration from the drop-down list based on your input.

Click the Admin menu, and click Configuration. The General tab is displayed by default.

The following table lists all the configuration parameters and their descriptions.

General Configuration Keys and Descriptions


Configuration Parameter | Description | Default Value
dt_workspace_schema | Schema where Data Transformation stores all intermediate tables. | iw_dt_workspace
dt_workspace_base_path | HDFS path where Data Transformation stores all intermediate data. | /iw/df/workspace
dt_shared_connection_enabled | Whether to reuse the connection for interactive mode. Typically true for Hive+Tez and false for Hive+MR. | true
dt_shared_connection_timeout_ms | Time (in milliseconds) after which an idle shared connection is closed. Applicable only when shared connections are enabled. | 180000
dt_interactive_exec_pool_size | Thread pool size for executing tasks per request in interactive mode. | 10
dt_interactive_table_maximum_size | Maximum size of any intermediate table in interactive mode. | 10000
dt_batch_exec_pool_size | Thread pool size for executing tasks per request in batch mode. | 5
dt_sampling_num_threads | Number of sampling tasks to run in parallel. | 5
PIPELINE_SOURCES_AUTO_SYNC_CHECK_DISABLED | If set to true, disables the error icon on the pipeline editor page that indicates a source schema mismatch. If set to false, the error icon is displayed on the pipeline editor page. |
random_sampling_enabled | Determines whether sample generation, for a source or pipeline target, should be random. Disable this when sampling takes too long to complete or fails. | true
LOGMINER_TABLESPACE_NAME | Tablespace name. |
use_new_tablespace | Starts LogMiner in a different tablespace. | False
ADVANCED_ANALYTICS_DISABLED | Set to true to disable advanced analytics nodes. | False
modified_time_as_cksum | If set to true, the modified time is used to determine whether the file has changed. If set to false, the actual checksum is calculated. | false
delete_table_query_enabled | By default, the Delete Query feature is available at the table level. Set delete_table_query_enabled to false from the UI to hide the Delete Query feature. | true
Ufi_max_failure_percentage_per_table | Percentage of files for which file ingestion can fail before the job is marked as failed. | 0.0
MIN_ROWS_FOR_MERGEMR | Minimum number of rows in the current data in a secondary partition for the merge. If a secondary partition has fewer rows than this value, the merge jobs are combined until they reach this threshold. | 1000000
fail_ingestion_on_impala_cache_refresh_failure | Set to true to fail the ingestion job if the Impala cache metadata is not refreshed. |
iw_hive_ssl_enabled | Enables or disables SSL on Hive. | false
iw_hive_ssl_truststore_path | Path to the Hive truststore file. |
iw_hive_ssl_truststore_passwd | Encrypted password for the truststore. |
filesystem_scheme | The HDFS path is pre-populated with the value set in this configuration. Possible values: s3, s3a, s3n, adls, wasb, gs. | -
dt_batch_sparkapp_settings | Overrides Spark configurations such as executor memory and driver memory during pipeline batch builds. |
dt_disable_sample_job | Disables sample jobs after pipeline build. | False
dt_disable_cache_job | Disables cache jobs after pipeline build. | False
CREATE_DROP_TARGET_SCHEMA | Allows Infoworks to run create database and drop database commands. If this value is set to false, the create database and drop database commands are not executed. | True
target_schema_permission | Enables or disables creation of databases in pipelines. If this value is set to false, creation of databases in pipelines is disabled. | True
number_of_parallel_jobs_per_entity | Number of jobs that can run in parallel for each entity. Entity refers to a source, data model, or cube. For example, if this value is set to 3, a maximum of 3 source crawls, 3 data model builds, or 3 cube builds can run in parallel. | NA
dt_auto_exclusion_enabled | Sets pipeline target column projection optimization on all nodes. | True
dt_merge_exec_pool_size | Number of concurrent tasks to run while performing a merge on pipeline targets. | 5
dt_fail_on_null_or_empty_partition_value | Fails pipeline jobs when partition values are null or empty. | False
iw_udfs_dir | Indicates the UDFs directory. | NA
iw_hdfs_prefix | Default HDFS access protocol prefix. | hdfs://
dt_label_auto_cast_enabled | Enables Auto Cast mode for the Advanced Analytics model label column. | True
dt_label_cast_type | Indicates the default cast type for the Advanced Analytics model label column. | Double
dt_spark_master | Indicates the Spark master mode. | local
dt_target_hdfs_cleanup | To retain data on HDFS when the pipeline is removed, set this to false. | True
dt_batch_spark_settings | Semicolon-separated list of entity-level Hive configurations. Applicable for batch mode pipeline builds and for generating sample data in a source (see the example after this table). | key1=value1;key2=value2;key3=value3
dt_scd2_granularity | SCD2 record change granularity on timestamp; can be Second, Minute, Hour, Day, Month, or Year. Can be set using advanced configuration to override the configuration for all targets at once. | second
dt_custom_udfs_force_copy | Custom UDFs in pipelines are copied to HDFS only when changes are detected. This configuration forces the copy even when no changes are detected. | False
dt_disable_sample_job | Disables the sample job for pipeline targets after pipeline build. | False
dt_disable_cache_job | Disables the cache job for pipeline targets after pipeline build. | False
dt_spark_configfile | Spark configuration file path for interactive mode Spark pipelines. | /etc/spark2/conf/spark-defaults.conf
dt_spark_configfile_batch | Spark configuration file path for batch mode Spark pipelines. | /etc/spark2/conf/spark-defaults.conf
dt_batch_spark_coalesce_partitions | Spark coalesce configuration to create fewer files while writing to disk. | NA
dt_disable_current_loader | Custom transformations are not loaded from the current class loader. To load custom transformations from the current class loader, set this configuration to False. | True
dt_overwrite_log_level | Log level override at the pipeline level. For example, rootLogger=ERROR;io.infoworks=TRACE;infoworks=DEBUG;org.apache.spark=ERROR. | NA
dt_validation_progress_percent | Sets the validation progress percent in pipeline batch jobs. | 10
dt_schemasync_progress_percent | Sets the schema sync progress percent in pipeline batch jobs. | 10
dt_spark_merge_file_num | Spark configuration to merge files using the coalesce option on the dataframe during the merge process. | 1
iw_dt_ext_prefix | Pipeline extension prefix. | hdfs://
storage_format | Sets the default storage format for pipeline targets. | ORC
user_extensions_base_path | Pipeline extension base path. | NA
SUPPORT_RESERVED_KEYWORDS | Enables or disables support for reserved keywords. To enable Hive connections on HDP 3.1, this value must be set to false. | True
dt_cutpoint_optimization_enabled | Enables caching when set to true. | False
dt_redshift_s3_uri_scheme | Specifies the URI scheme for the S3 temp directory used in the Redshift connector. | s3a
dt_target_row_count_enabled | Specifies whether to get the target row count at the end of the pipeline build. | True
dt_label_auto_cast_enabled | Auto-casts labels in ML nodes for integers. | True
dt_label_cast_type | Specifies the cast type of the table column in ML nodes. | Double
dt_convert_source_watermark_to_timestamp | For generic JDBC sources, converts the source watermark column to timestamp. | False
dt_convert_all_source_watermark_to_timestamp | For all sources, converts the source watermark column to timestamp. | False
dt_source_type_sybase_iq_comparator_column | Identifies the column used for fetching the latest records based on timestamp for Sybase sources. | ZIW_OFFSET
dt_source_type_oracle_comparator_column | Identifies the column used for fetching the latest records based on timestamp for Oracle sources. | ZIW_SCN
dt_source_type_sql_server_comparator_column | Identifies the column used for fetching the latest records based on timestamp for SQL Server sources. | ZIW_SOURCE_START_TIMESTAMP
dt_source_type_db2_comparator_column | Identifies the column used for fetching the latest records based on timestamp for DB2 sources. | ZIW_SOURCE_START_TIMESTAMP
sample_data_row_count | Number of rows to retrieve as part of sample data. | 100
sample_data_error_threshold | Percentage of invalid records in sample data after which sample data generation fails. | 50
file_preview_row_count | Number of rows to show for file preview. | 100
file_preview_size_limit | Maximum file size for file preview (in MB). | 10
bulk_payload_record_size | Number of records in the payload for each batch of a bulk insert. | 6500
maxNumOfRows | Maximum number of chunks allowed for segment load. | 2000
error_record_threshold | Maximum number of invalid records after which the data crawl fails. | 1000
csv_null_value | Null value for CSV files. | Null
is_multi_line_csv | Set to true if the CSV file has multi-line records. | False
CSV_FILE_ENCLOSE_CHAR | Character used to enclose the column value. | "
CSV_FILE_ESCAPE_CHAR | Character used in delimited files to escape occurrences of the column separator and the enclose character in data. | \
exclude_partition_column_csv_export | If set to true, the output file does not contain the column on which the file is partitioned. | False
coalesce_partitions | Set the value to 1 if the output should contain a single file. | 0
default_connectors_path | Base path for connectors. | lib/ingestion/connectors/
default_core_ingestion_lib_path | Path to the core libraries. | lib/ingestion/core/
use_iw_csv_mapper | - | false
tpt_files_base_path | Base path for all TPT job-related files (configs, logs, scripts). | /temp/
add_tpt_log_to_job_log | Adds the TPT log to the job log. | True
copy_tpt_log_to_target | Copies the TPT log to the target. | False
tpt_select_operator | Operator used for TPT export (EXPORT/SELECTOR). | EXPORT
tpt_charset | Charset specified in the tbuild command. | ASCII
tpt_cloud_dont_split_rows | DontSplitRows parameter in TPT access modules. | True
use_tpt_generated_schema | Uses the TPT-generated schema. | False
tpt_ascii_multiplication_factor | ASCII multiplication factor. | 1
tpt_utf8_multiplication_factor | UTF8 multiplication factor. | 2
tpt_utf16_multiplication_factor | UTF16 multiplication factor. | 2
tpt_export_operator_max_decimal_digits | MaxDecimalDigits attribute in the select operator. | 38
tpt_export_spool_mode | SpoolMode attribute in the select operator. | spool
tpt_export_block_size | BlockSize attribute in the select operator. | 64330
tpt_export_tencity_hours | TenacityHours attribute in the select operator. | 4
tpt_export_retry_interval | TenacitySleep attribute in the select operator. | 1
tpt_export_max_sessions | MaxSessions attribute in the select operator. | 20
tpt_export_min_sessions | MinSessions attribute in the select operator. | 1
enable_tpt_data_encryption | DataEncryption attribute in the select operator. | False
enable_tpt_trace_level | TraceLevel attribute in the select operator (true changes it to ALL). | False
tpt_checkpoint_interval_in_seconds | -Z flag in the tbuild command. | 0
enable_tpt_log | -o flag in the tbuild command. | False
tpt_cyclic_delivery | -C flag in the tbuild command. | False
tpt_job_restart_limit | -R flag in the tbuild command. | 2
tpt_shared_memory_size | -h flag in the tbuild command. | 10
merge_dedupe_function | De-duplication method used to de-duplicate the CDC table (options: WINDOW, ORDER_BY_DROP_DUPLICATES, SORT_DROP_DUPLICATES). | WINDOW
handle_deletes_during_merge | - | False
should_run_dedupe | De-duplicates the CDC data. | False
should_run_de_dup_on_full_load | De-duplicates the table during full load jobs (init and ingest). | False
get_sample_data_during_schema_crawl | Fetches sample data during metacrawl if set to true. | False
get_row_count_during_schema_crawl | Fetches the row count during metacrawl if set to true. | False
metacrawl_batch_size | Number of tables to be metacrawled at a time. | 100
keywords_to_be_renamed_filepath | - | -
database_object_types_default | Table types to be included for browse source and meta crawl. | TABLE, VIEW, SYNONYM
keywords_to_be_renamed | - | USING
TEMP_LOG_TABLE_NAME | Name of the temporary log table created for tables configured for log-based ingestion. | LOG_
NUMBER_OF_LOG_FILES_TO_LOAD | Restricts the number of log files to load in each run of CDC. | -1
CHECK_FOR_NULL_LOG_FILE | Determines whether to check for a null log file path in case of log-based ingestion. | True
ORACLE_ARCHIVE_LOG_INFO_OBJECT_NAME | Name of the view containing the Oracle archive log info. | sys.v_$archived_log
LOG_FILE_SEQUENCE_COLUMN_NAME | Name of the column that maintains the log file sequence. | SEQUENCE#
IS_SECONDERY_LOG_LOCATION | True if the user has changed the log file location, false otherwise. | False
include_current_log | Determines whether the key with the last log sequence number is included in the query. | False
ORACLE_LOG_DESTINATION_ID | Destination ID of the Oracle log in case of a secondary log location. | 2
SET_NLS_DATE_FORMAT | Set to true to use NLS_DATE_FORMAT. | False
NLS_DATE_FORMAT | Specifies the default date format to use with the TO_CHAR and TO_DATE functions. The value of this parameter can be any valid date format mask. | YYYY-MM-DD HH24:MI:SS
use_new_tablespace | Starts LogMiner in a different tablespace. | False
LOG_MINER_TABLESPACE_NAME | Tablespace name. | logmnrts
LOGMINER_TEMP_TABLE_TABLESPACE_NAME | Temp tablespace name. | logmnrts
LOG_BASED_CDC_LOG_CONCAT_ORDER | - | ASC
ORACLE_LOG_BASED_CDC_CLEAN_CONTROL_CHARACTERS | - | False
WIDE_TABLE_CONCAT_UDF_NAME | - | stragg(sql_redo_record(sql_redo,query_order))
ADD_LOG_BASED_SELECT_QUERY_PARALLELISM | Parallelism factor for select queries for tables configured with log-based ingestion. Should be an integer. | False
SELECT_QUERY_PARALLELISM_FACTOR | Parallelism factor for select queries. Should be an integer. | 1
RUN_TEMP_TABLE_STATS | Determines whether to get the table stats for the temp table. | False
remove_extra_single_quote | Removes any extra single quote characters found. | True
escape_escape_char_in_data | Determines whether to use the escape character while parsing the query. | True
date_format | Date format of data for log-based CDC. | dd-MMM-yy
timestamp_format | Timestamp format of data for log-based CDC. | dd-MMM-yy hh.mm.ss a
is_time_format_needed | Determines whether to parse the date to a specific format while parsing the query. | True
use_jsql_and_expression | Whether to use the jsqlparser AndExpression. | True
UNCOMMITED_TIME_PERIOD | Uncommitted records older than this time period (specified in days) are deleted. | 7
oracle_timezone_as_region | If true, sets the oracle.jdbc.timezoneAsRegion property to true. This property checks whether the node connecting to the Oracle database is in a specified timezone list. | False
oracle_date_functions | Specifies Oracle date functions to handle date and time data more easily and effectively. | TO_DATE,TO_TIMESTAMP
oracle_logbase_date_parsed_format | Oracle log-based date parse format. | yyyy-MM-dd
oracle_logbase_timestamp_parsed_format | Oracle log-based timestamp parse format. | yyyy-MM-dd HH:mm:ss.SSSSSS
temp_table_stats_percent | Used to run the temp table stats query. | 20
temp_table_stats_degree | Used to run the temp table stats query. | 8
temp_table_index_name_one | Prefix used to create an index on a few columns of the temp table. | IW_
CREATE_TEMP_TABLE_PARALLELISM | Parallelism factor for temp table queries. Should be an integer. | 1
create_index_on_temp_table | An index is created on the temp table if set to true. | True
FILTER_ORACLE_COMMITTED_TRANSACTIONS | Sets the DBMS_LOGMNR.COMMITTED_DATA_ONLY option for the LogMiner session. This filters out rolled back transactions, transactions that are in progress, and internal operations. | False
ENABLE_DDL_TRACKING | Sets the DBMS_LOGMNR.DDL_DICT_TRACKING option for the LogMiner session to enable DDL tracking. DDL tracking enables LogMiner to track structural changes made to a database object, such as adding or dropping columns from a table. | False
purge_retries_num_attempts | Number of retries if there is a failure during a purge job. | 5
should_send_notifications | Determines whether the user is notified of events such as source deletion, meta crawl, full load, and CDC jobs. | True
governor_request_timeout_sec | Timeout for governor requests. | 30
custom_mappings | - | 10000
oem_quote_character | Quote character to be used for OEM source queries. | "
oem_key_default_value | Default key value used for OEM connectors to connect. | XKR3Rwo8S5mLT57IGZ56uTIT/hcBbC1ZnYB+jYE1ZU02dCzvxakdhnuXFEY1I6HGmpiMg/AeHZJFuhb4w8bYkmQwFBkL8S+mM1bZva/zLbvxFG4kmRH4g1yFKQ==
meta_db_limit | Required when fetching a large number of records from the metadata database (MongoDB), when the records cannot be fetched in a single call due to resource constraints. Currently used while getting the list of files in sources; if there are files in the order of 100k, the details are fetched in chunks, and this limit defines the chunk size. | 1000
meta_crawl_row_count | - | 100
topic_preview_row_count | Number of rows to show for topic preview. | 100
kafka_erro_threshold_default | The crawl fails if there are more error records than this configured limit, or more error records than successful records. | 5
kafka_max_num_records_default | Maximum number of records to consider for schema crawl. | 20000
kafka_streamin_frequency_num_default | Time unit for the frequency type for schema crawl. | 10
kafka_streamin_frequency_type_default | Streaming frequency for schema crawl. | Secs
kafka_streaming_mode_default | Streaming mode for schema crawl. Options are micro_batch and polling. | micro_batch
confluent_client_dns_lookup | Controls how the client uses DNS lookups. | use_all_dns_ips
confluent_consumer_num_retries | Number of Confluent consumer retries for metacrawl. | 5
confluent_kafk_sample_data_error_threshold_default | Maximum number of invalid records after which the sample data generation fails. | 50
invalid_record_threshold | The crawl fails if there are more error records than this configured limit, or more error records than successful records. | 5
confluent_deserializer_class | io.infoworks.connectors.streaming.confluent.kafka.utils.ConfluentJsonDeserializer - By default, this class is used to deserialize JSON messages; it uses the Confluent KafkaJsonDeserializer. |
cancel_job_via_file | If set to true, the streaming job is stopped by stopping the streaming query. | false; true (for Databricks)
logical_date_format | Format for specifying the logical date for Avro messages. | YYYY-MM-dd
logical_time_millis_format | Format for specifying the logical time in milliseconds for Avro messages. | HH:mm:ss.SSS
logical_time_micros_format | Format for specifying the logical time in microseconds for Avro messages. | HH:mm:ss.SSSSSS
logical_timestamp_millis_format | Format for specifying the logical timestamp in milliseconds for Avro messages. | yyyy-MM-dd HH:mm:ss
logical_timestamp_micros_format | Format for specifying the logical timestamp in microseconds for Avro messages. | yyyy-MM-dd HH:mm:ss
local_timestamp_millis_format | Format for specifying the local timestamp in milliseconds for Avro messages. | yyyy-MM-dd HH:mm:ss
local_timestamp_micros_format | Format for specifying the local timestamp in microseconds for Avro messages. | yyyy-MM-dd HH:mm:ss
confluent_error_threshold_default | The schema crawl fails if there are more error records than this configured limit, or more error records than successful records. | 5
confluent_max_num_records_default | Maximum number of records to consider for schema crawl. | 20000
confluent_streaming_frequency_num_default | Time unit for the frequency type for schema crawl. | 10
confluent_streaming_frequency_type_default | Streaming frequency for schema crawl. | secs
confluent_streaming_mode_default | Streaming mode for schema crawl in Confluent. Options are micro_batch and polling. | micro_batch
confluent_auto_offset | Configures the start point for a query. The "earliest" option ingests all messages from the beginning; the "latest" option ingests the messages pushed to the topic after the query has started. | earliest
use_custom_deserializer | If set, the confluent_deserializer_class value is used for the JSON message format. If the JSON messages are added to the Confluent topic using a string serializer, set this configuration to false. | True
table_retrieval_mechanism | - | session_catalog
sfdc_socket_timeout_for_rest | Socket timeout when the fetch mechanism is REST. | 2000
sfdc_bulk_timeout | Timeout when the fetch mechanism is Bulk. | 600000
sfdc_batch_size_for_rest | - | 1200000
error_record_cache_count | - | 100000
in_memory_cache_count | - | 100000
serialize_hadoop_conf | - | True
sfdc_timeout | Connection timeout for the Salesforce SOAP API login. | 1200000
ingestion_spark_configs | - | Null
ingestion_service_worker_pool_size | - | 20
environment_home | - | /infoworks
run_optimize_table | Runs the OPTIMIZE table command for the table. | False
optimize_delta_write | Sets the delta table property delta.autoOptimize.optimizeWrite = true. | False
auto_compact_delta_table | Sets the delta table property delta.autoOptimize.autoCompact = true. | False
kafka_starting_offsets | Configures the start point for a query. The "earliest" option ingests all messages from the beginning; the "latest" option ingests the messages pushed to the topic after the query has started. | Earliest
kafka_min_partitions | Desired minimum number of partitions to read from Kafka. By default, Spark has a 1-1 mapping of topic partitions to Spark partitions when consuming from Kafka. | Null
kafka_group_id | Kafka group ID to use in the Kafka consumer while reading data from Kafka. By default, each query generates a UUID for reading data. | Null
kafka_optimize_query_batch_interval | Number of batches after which the optimize query is run. | 10
kafka_consumer_num_retries | Number of Kafka consumer retries for metacrawl. | 5
kafka_meta_crawl_error_threshold | Threshold percentage for the meta crawl to fail. The meta crawl fails if the percentage of error records received is more than this configured value. | 50
fail_on_data_loss | Configures whether the query must fail when there is a possibility that data is lost (for example, topics are deleted, or offsets are out of range). | False
kafka_extra_properties | Configures the start point for a query. The "earliest" option ingests all messages from the beginning; the "latest" option ingests the messages pushed to the topic after the query has started. | Null
ui_url | Host for sending data crawl notifications. | localhost
ui_port | Port for sending data crawl notifications. | 3000
meteor_https_enabled | - | False
meteor_https_port | - | 2000
rdbms_fetch_size | The JDBC fetch size, which determines how many rows to fetch per round trip. | 1000
mongo_date_time_format | DateTime format for Mongo. | yyyy-MM-dd'T'HH:mm:ss.SSS'Z'
mongo_timezone | Time zone for Mongo. | UTC
cdc_comparator_key | - | >=
tpt_segment_file_ts | - | 000000000
file_split_size | - | 134217728
data_types_with_precision_scale | Datatypes that have precision and scale. | 2,3
default_precision | Default precision of numeric and decimal datatypes. | 38
default_scale | Default scale of numeric and decimal datatypes. | 18
use_query_for_meta_crawl_generic_jdbc | Whether to use a query for metacrawl for generic JDBC sources. | True
stop_streamin_polling_frequency | - | 15
use_type_cast | - | False
parquet_infer_schema | Whether to infer the schema for a Parquet source. | False
delete_segment_staging_tpt | Whether to delete the staging area for Teradata segments. | True
restricted_visibility_mode | When True, sources are private and accessible by other users only via domain-based access. When False, only role-based access is checked. NOTE: A platform restart is required when Infoworks REST APIs are in use and the value is changed. | False
iw_spark_app_conf | Spark configuration keys to be used across ingestion and transformation (see the example after this table). | spark.key.1=value1;spark.key.2=value2 (values are separated by ";")
export_metrics_capture | Retrieves the export metrics. By default, this option is set to false. Set the value to true to retrieve metrics for export jobs. | False
tpt_file_escape_char | EscapeTextDelimiter attribute in the TPT Data Connector operator. | \
tpt_file_quote_escape_char | EscapeQuoteDelimiter attribute in the TPT Data Connector operator. | \
tpt_file_enclose_char | OpenQuoteMark and CloseQuoteMark attributes in the TPT Data Connector operator. | "
tpt_delimiter | TextDelimiter attribute in the TPT Data Connector operator. | "|" (pipe)
tpt_io_buffer_size | IOBufferSize attribute in the TPT Data Connector operator. | 1048575
tpt_use_hex_escape_chars | Uses the hex escape character. | False
tpt_cloud_prefix | Prefix base name for the intermediate cloud in the Data Connector initialization string. | iw
tpt_cloud_single_part_file | SinglePartFile attribute in the Data Connector initialization string. | False
tpt_cloud_compress | Uses compressed files for the TPT dump. | False
tpt_cloud_buffer_count | BufferCount attribute in the Data Connector initialization string. | Null
tpt_cloud_buffer_size | BufferSize attribute in the Data Connector initialization string. | Null
tpt_cloud_connection_count | ConnectionCount attribute in the Data Connector initialization string. | 10
tpt_files_base_path | Base path for all TPT job-related files (configs, logs, scripts). | /tmp/
tpt_cloud_config_directory | Relative path for TPT config files. | tpt/cloudConfig
tpt_cloud_max_object_size | MaxObjectSize attribute in the Data Connector initialization string. | 1024M
tpt_trace_level | TraceLevel attribute in the Data Connector initialization string. | Null
tpt_https_proxy | HttpsProxy attribute in the GCP Access module initialization string. | Null
tpt_add_script | Adds the TPT script to cluster logs. | False
csv_comments_identifier | Sets the comment character in CSV files. | #
treat_empty_string_as_null | When set to false, an empty string is fetched as an empty string, not as null. | True
fetch_null_records | When set to true, records with a null watermark value are fetched. | False
dt_interactive_service_cluster | Chooses whether to run the worker interactive service on the control plane or the data plane. Set the value to Databricks to run it on the data plane. | Local
dt_snowflake_default_query_size_kb | Snowflake has a limitation that the query size cannot exceed ~520 KB. You can create cut points within the pipeline to break the query into smaller ones if the limit is exceeded. | 500 KB
dt_sf_cutpoint_optimization_enabled | Set this value to False to disable cut point optimization. | True
should_delete_tables_source_delete | When set to True, if the table is Infoworks managed, it is deleted from the Snowflake database on deletion of the source. If the table is user managed, it is only truncated, not deleted from the Snowflake database. NOTE: This key can be set at each source level or at the global level. | False
dt_preview_data_disabled | When set to True, preview data is disabled globally and the user gets the warning message "Preview data is disabled". | False
add_columns_for_alter_table | When set to True, alter table queries run to modify the exported table; otherwise, the user can alter the tables manually as needed. | True
managed_mongo | When set to True, MongoDB is on the Infoworks edge node; otherwise, it is on MongoDB Atlas or an external MongoDB. | True
is_quiescent_enabled | When set to True, puts the system in quiescent mode: any newly submitted jobs move to the pending state, while jobs that are already running are unaffected and continue to completion. Pause and resume of workflows is not allowed, and a newly submitted run of a workflow pauses the whole workflow. | False
system_name | Indicates the page title so that the user can correctly identify the Infoworks system/environment. | NULL
dt_bigquery_session_project_id | Specifies the project ID to use for the BigQuery session created for BigQuery target jobs in Data Transformation. | <gcp_project_name>
ingestion_bigquery_session_project_id | Specifies the project ID to use for the BigQuery session created for BigQuery target jobs in Ingestion. | <gcp_project_name>
enable_schema_sync | To enable schema evolution from the source to the ingestion target table, set this configuration to true. | True
databricks_init_script_workspace_directory | Sets the location of init scripts under the Databricks workspace. | /Shared
consider_scale_user_managed | Scale comparison for decimal and integer numbers on Snowflake (for user-managed tables). | false
consider_column_datatype_user_managed | Comparison of the column datatype between the Infoworks column type and the target column type on Snowflake (for user-managed tables). | true
consider_column_map_size_user_managed | Comparison of the number of columns with the Infoworks target table on Snowflake (for user-managed tables). | true
consider_precision_user_managed | Precision comparison for decimal and integer numbers on Snowflake (for user-managed tables). | false
test_job_ids | Takes comma-separated job IDs; when set in quiescent mode, the system tries to run the provided job IDs. | -
test_workflow_ids | Takes comma-separated workflow IDs; when set in quiescent mode, the system enables pause and resume for those workflows. The user can then resume those workflows; newly submitted runs of the provided workflows are not paused. | -
dt_use_gte_for_load_incremental | Configures the load incremental source table to use the ">=" filter condition for the watermark limits while reading from the source during pipeline build jobs. | false
use_ingestion_time_as_watermark | Configures source nodes in the pipeline to use ziw_target_timestamp as the watermark column. | false
workflow_email_attachment_size_limit | Defines the maximum size allowed for a single email attachment, in MB. | 20
workflow_email_total_attachments_size_limit | Defines the maximum cumulative size of all attachments in a single email, in MB. | 50
iw_messaging_max_request_size_mb | Maximum size of the request body for the messaging service, in MB. | 51
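
Keys such as dt_batch_spark_settings and iw_spark_app_conf accept a semicolon-separated list of key=value pairs, as shown in the Default Value column. The lines below are a minimal illustrative sketch only; the Spark and Hive property names and sizes shown are placeholders, not recommended settings:

    iw_spark_app_conf = spark.executor.memory=4g;spark.sql.shuffle.partitions=200
    dt_batch_spark_settings = hive.exec.dynamic.partition=true;hive.exec.dynamic.partition.mode=nonstrict

The spark.* and hive.exec.* names above are standard engine properties used purely for illustration; substitute the properties your jobs actually need.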

Update Password for MongoDB and Hive

To encrypt and update the password for MongoDB or Hive in the Infoworks system, perform the following:

  • Execute the following interactive bash script: $IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p
  • Enter the new plain text password when prompted.
  • Copy the encrypted password displayed in the result and update it in the Infoworks conf.properties file located in the $IW_HOME/conf folder.
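
A minimal sketch of this flow, assuming the standard $IW_HOME layout referenced above (the exact prompt and output of the script may vary by version):

    # Run the interactive encryption utility; it prompts for the new plain-text password
    $IW_HOME/apricot-meteor/infoworks_python/infoworks/bin/infoworks_security.sh -encrypt -p

    # Copy the encrypted string it prints and paste it into the matching password
    # property in $IW_HOME/conf/conf.properties, then save the file.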