Creating Pipelines in Bulk


Infoworks provides a script that creates multiple pipelines with the same structure in bulk.

Notes

The domain must be associated with an environment.
All the sources and tables must already be associated with the domain.
The SQL provided in the template must be valid.
If multiple sources have the same name, the script maps the first source with that name.

Usage

Following are the steps to run the script for bulk pipeline creation:

Navigate to the $IW_HOME/scripts/pipeline folder.
Run the script using the following command: python pipeline_create.py -s <input_sql> -c <input_csv> -t <TOKEN> -o <output_csv>

Where,

<input_sql> is the path of the SQL template on which the new pipelines are based,
<input_csv> is the path of the CSV file that contains the specifics of the pipelines to be created,
<TOKEN> is the user authentication token obtained from the user settings page,
<output_csv> is the output CSV file generated when the script runs.
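
The command above can also be assembled from another Python program, for example in a scheduled job. The following is a minimal sketch, not part of Infoworks: the helper names build_command and run_bulk_create and the IW_TOKEN environment variable are assumptions made for illustration. Only the script name, its folder, and the -s, -c, -t, and -o options come from the steps above.

```python
import os
import subprocess


def build_command(input_sql, input_csv, token, output_csv):
    """Return the argument list for pipeline_create.py, using the options listed above."""
    return [
        "python", "pipeline_create.py",
        "-s", input_sql,
        "-c", input_csv,
        "-t", token,
        "-o", output_csv,
    ]


def run_bulk_create(input_sql, input_csv, output_csv):
    """Run the script from $IW_HOME/scripts/pipeline and raise if it exits with an error."""
    # IW_TOKEN is a hypothetical variable name; store the token wherever suits your setup.
    token = os.environ["IW_TOKEN"]
    script_dir = os.path.join(os.environ["IW_HOME"], "scripts", "pipeline")
    cmd = build_command(input_sql, input_csv, token, output_csv)
    return subprocess.run(cmd, cwd=script_dir, check=True)
```

Because the script is started from the script folder, passing absolute paths for the SQL template and the CSV files avoids surprises with relative paths.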

Sample Query

select * from {table1} UNION select * from {table2}

Where,

{table1}, {table2}...{tableN} are the aliases for the actual tables listed in the table_names column of the input CSV file.
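
To preview the SQL that a template would produce for a given row, the aliases can be filled in locally. The sketch below is an illustration only and does not reproduce the script's internal logic: the function name render_template is hypothetical, and it assumes that {tableK} refers to the K-th entry of table_names.

```python
def render_template(template, table_names):
    """Fill {table1}..{tableN} with the given table names, in order."""
    # Assumption: alias {tableK} maps to the K-th name in table_names.
    aliases = {f"table{i}": name for i, name in enumerate(table_names, start=1)}
    return template.format(**aliases)


template = "select * from {table1} UNION select * from {table2}"
print(render_template(template, ["orders", "order_details"]))
# prints: select * from orders UNION select * from order_details
```

This sketch raises KeyError when the template references an alias with no matching table, and it does not handle SQL that contains literal braces.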

Sample CSV Input

domain_name,env_compute_template_name,env_storage_name,pipeline_name,source_name,table_names,sync_type,scd_type,target_schema,target_table,target_path,storage_format,target_natural_keys,target_partition_keys
ImportDomain,test,storage_dbfs,pipeline_test1,SalesDB_AP,"orders,order_details,products,categories",APPEND,SCD_1,dev_testing,big_ticket_sales1,/iw/pipelines/dev_testing/big_ticket_sales1,DELTA,category_name,shipcity
ImportDomain,test,storage_dbfs,pipeline_test2,SalesDB_AP,"orders,order_details,products,categories",MERGE,SCD_1,dev_testing,big_ticket_sales2,/iw/pipelines/dev_testing/big_ticket_sales2,DELTA,"category_name,shipcity",
ImportDomain,test,storage_dbfs,pipeline_test3,SalesDB_AP,"orders,order_details,products,categories",OVERWRITE,SCD_1,dev_testing,big_ticket_sales3,/iw/pipelines/dev_testing/big_ticket_sales3,DELTA,"category_name,shipcity",
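
When many pipelines are needed, the input file is easier to generate than to type. The following is a minimal sketch that uses Python's standard csv module and the header names from the sample above; the function name write_input_csv is hypothetical. The csv module adds the double quotes automatically for any value that contains a comma.

```python
import csv
import sys

FIELDS = [
    "domain_name", "env_compute_template_name", "env_storage_name",
    "pipeline_name", "source_name", "table_names", "sync_type", "scd_type",
    "target_schema", "target_table", "target_path", "storage_format",
    "target_natural_keys", "target_partition_keys",
]
MULTI_VALUE = ("table_names", "target_natural_keys", "target_partition_keys")


def write_input_csv(rows, fh):
    """Write pipeline definitions (dicts keyed by FIELDS) as an input CSV."""
    writer = csv.DictWriter(fh, fieldnames=FIELDS)
    writer.writeheader()
    for row in rows:
        row = dict(row)  # copy so the caller's dict is not modified
        for key in MULTI_VALUE:
            if isinstance(row.get(key), (list, tuple)):
                row[key] = ",".join(row[key])  # quoted on output if it has a comma
        writer.writerow(row)  # columns missing from the dict are written empty


write_input_csv([{
    "domain_name": "ImportDomain",
    "env_compute_template_name": "test",
    "env_storage_name": "storage_dbfs",
    "pipeline_name": "pipeline_test2",
    "source_name": "SalesDB_AP",
    "table_names": ["orders", "order_details", "products", "categories"],
    "sync_type": "MERGE",
    "scd_type": "SCD_1",
    "target_schema": "dev_testing",
    "target_table": "big_ticket_sales2",
    "target_path": "/iw/pipelines/dev_testing/big_ticket_sales2",
    "storage_format": "DELTA",
    "target_natural_keys": ["category_name", "shipcity"],
}], sys.stdout)
```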

The CSV file must contain the following columns:

Domain Name
Environment Compute Template Name
Environment Storage Name
Pipeline Name
Source Name
Source Table Names (comma-separated, enclosed in double quotes)
Target Sync Type [Overwrite/Append/Merge]
Target SCD Type [SCD_1/SCD_2]
Target Schema
Target Table
Target Path
Target Storage Format [Parquet/ORC/CSV/JSON/AVRO/Delta]
Target Natural Keys (comma-separated, enclosed in double quotes)
Target Partition Keys (comma-separated, enclosed in double quotes)
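
Checking the input file before running the script can catch simple mistakes early. The following is a minimal sketch, not an Infoworks utility: the function name validate_input_csv is hypothetical, the header names are taken from the sample CSV above, and comparing the values case-insensitively is an assumption.

```python
import csv

REQUIRED = [
    "domain_name", "env_compute_template_name", "env_storage_name",
    "pipeline_name", "source_name", "table_names", "sync_type", "scd_type",
    "target_schema", "target_table", "target_path", "storage_format",
    "target_natural_keys", "target_partition_keys",
]
# Allowed values from the column list above; matched case-insensitively (assumption).
ALLOWED = {
    "sync_type": {"OVERWRITE", "APPEND", "MERGE"},
    "scd_type": {"SCD_1", "SCD_2"},
    "storage_format": {"PARQUET", "ORC", "CSV", "JSON", "AVRO", "DELTA"},
}


def validate_input_csv(fh):
    """Return a list of problems found; an empty list means none were detected."""
    reader = csv.DictReader(fh)
    missing = [c for c in REQUIRED if c not in (reader.fieldnames or [])]
    if missing:
        return ["missing columns: " + ", ".join(missing)]
    problems = []
    # Data starts on line 2; this assumes one record per physical line.
    for line_no, row in enumerate(reader, start=2):
        for col, allowed in ALLOWED.items():
            value = (row[col] or "").strip()
            if value.upper() not in allowed:
                problems.append(f"line {line_no}: invalid {col} {value!r}")
    return problems
```

For example, `validate_input_csv(open("pipelines.csv", newline=""))` returns the problems for a file named pipelines.csv, where the file name is a placeholder.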

The output CSV file includes the following columns:

PipelineName
Pipeline ID (created)
Error Description, with values such as:
    Pipeline Name Already Exists
    Table Not Found (Table Details)
    Input Error
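
After a run, the output file can be split into created and failed pipelines. The sketch below is an illustration only: the function name split_results is hypothetical, and the exact header text in the output file is an assumption, so the three column names are parameters that should be adjusted to match the file the script actually produces.

```python
import csv


def split_results(fh,
                  name_col="PipelineName",         # assumed header text
                  id_col="Pipeline ID",            # assumed header text
                  error_col="Error Description"):  # assumed header text
    """Return (created, failed): pipeline name -> ID, and pipeline name -> error."""
    created, failed = {}, {}
    for row in csv.DictReader(fh):
        error = (row.get(error_col) or "").strip()
        if error:
            failed[row[name_col]] = error
        else:
            created[row[name_col]] = row.get(id_col)
    return created, failed
```

Rows listed in failed can then be corrected in the input CSV and submitted again.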
