Following are the steps to design a workflow:
In addition to running regular tasks such as ingestion and the building of pipelines and cubes, you can execute Bash commands and Hive queries, send notifications, and use a Decision Branch to add flow control.
Following is a sample workflow:
The following sections describe various options available on the Workflow Editor page.
Click View Options, make the required changes, and click Update Layout to choose how the workflow is displayed on the Workflow Editor page.
The Overview option comes in handy when you are working on a complex workflow with many artifacts and tasks, or when the workflow exceeds the normal page view. The icon for the Overview option is shown below:
Click the Overview option to open a pop-up window in the bottom-right corner of the Workflow Editor page. Drag the cursor through the pop-up window to view a specific task/artifact or part of the workflow.
You can search for specific keywords in the node names. The nodes that match the search string are highlighted in blue.
On saving the node details for the first time, the node name is automatically saved in the format {task_name}-{entity_name}-{sub_entity_name}. This applies only to the Ingest Source, Build Pipeline, Run Workflow, and Sync to External Target tasks.
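For example (the entity names here are hypothetical), an Ingest Source node attached to a source named crm_db and a table group named daily_load would be saved as Ingest Source-crm_db-daily_load.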
This section, which appears as a left panel, includes a list of all the tasks that can be added to the workflow. To view the properties of a task, drag and drop it onto the editor and double-click it.
Following are the available tasks:
The Workflow Variables and Run Settings fields are the same for all tasks except Manage Cluster Actions. These fields are explained below:
Setting Workflow Variables
Workflow variables are static variables that you can set for use in downstream nodes. Following are the types of workflow variables:
In the Workflow Variables section, click Add Variables and enter the variable name and value. You can add as many variables as required. These variables are applied to the downstream nodes, and they override any variable values set in the admin or domain configuration settings.
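For example (the variable name team and value analytics are hypothetical), you could add a variable team = analytics in this section; a downstream task can then read it using the xcom_pull notation described later on this page:
echo {{ task_instance.xcom_pull(task_ids='DM_D53O', key='team') }}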
Run Settings
Run settings are task-level settings that control the runtime behaviour of the tasks in a workflow. Following are the options:
This task ingests the required source.
Double-click the Ingest Source Table task and enter the following properties:
The Build Pipeline task, as the name suggests, builds a selected pipeline.
Double-click the Build Pipeline task and enter the following properties:
This node allows you to trigger another workflow from within a workflow.
Ensure the following:
Properties
Following are the Run Workflow properties:
This task sends notification emails to the list of email IDs specified in the task Properties window.
Double-click the Send Notification task and enter the following properties:
This task allows you to add workflow control. The conditional logic defined in this task yields two or more branches, and the workflow switches paths based on the output value of the condition.
Double-click the Decision Branch task and enter the following properties:
get_value("<task-ID>", "<variable-name>")== '<status>'
, where <task-ID> and <variable-name>
are available in the View available variables option, and values for <status>
are success or failed. This syntax is used to retrieve values from the upstream tasks in the same workflow. For example, get_value("JOB_123", "job_status")=='success'
On clicking the View available variables option, the Available Variables in this Task window is displayed. You can copy the required query using the Copy to clipboard option corresponding to each task ID, and then append the required <status>
value using the above described syntax.
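As an illustration, a condition can also combine the statuses of multiple upstream tasks. The task IDs below are hypothetical, and combining clauses with and/or is an assumption based on the expression syntax above:
get_value("ING_A1", "job_status") == 'success' and get_value("PIPE_B2", "job_status") == 'success'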
To use a workflow parameter in a condition, use the following syntax:
$key
Example 1: '$run_type' == 'prod'
If the key is "run_type" and its value is "prod" in the workflow run parameters, the above condition will be true, and the corresponding task will be triggered.
Example 2: '${run_type}_${id}' == 'prod_123'
The braces delimit the parameter name when it does not have leading or trailing spaces. In this example, the value of the parameter "run_type" is concatenated with an underscore and the value of the parameter "id", and the result is then compared.
Example 3: {{ params.runtime.run_type + '_' + params.runtime.id }} == 'prod_123'
This is the same expression as above. Anything inside the double braces is evaluated as Python code, and all workflow parameters are available in the Python dict "params.runtime".
The Bash Script task can be used to run multiple bash statements. You can enter the bash statements in the text area, and they will be executed sequentially.
The last value echoed to the standard system output by the bash commands is stored as a workflow variable named return_value.
Example: ls; pwd;
This stores the current directory in the return_value workflow variable, since pwd is the last command to echo to the standard output.
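As a minimal sketch (the file path is hypothetical), a script can compute a value and echo it last so that it lands in return_value:
# count the rows in a data file; the echoed count becomes return_value
row_count=$(wc -l < /tmp/sample-data.csv)
echo "$row_count"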
Alternatively, you can run your script files directly using this task by specifying the exact location of your file.
Example: sh /Users/ec2-user/bash-sample.sh
python /Users/ec2-user/python-sample.py
Following are the environment variables that can be accessed within the Bash Node script:
Additional environment variables: these are user-configurable environment variables that can be configured to retrieve secrets from the secret store; they are available in the bash process under the specified name.
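For example (the name DB_PASSWORD is hypothetical), if an additional environment variable with that name is configured to a secret, the script can read it like any other environment variable:
# DB_PASSWORD is populated from the secret store before the script runs
test -n "$DB_PASSWORD" && echo "database secret is available"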
In the bash script, a user-defined workflow variable can be used as follows:
echo {{ task_instance.xcom_pull(task_ids='DM_D53O', key='team') }}
where task_ids is the ID of the task where the workflow variable is set, and key is the name of the workflow variable.
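For instance, the pulled value can be assigned to a shell variable and reused; this sketch uses the same task ID and key as the example above:
# the double-brace template is rendered before the script executes
team="{{ task_instance.xcom_pull(task_ids='DM_D53O', key='team') }}"
echo "Running checks for team: $team"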
Inside a bash script, you can also use the automatically generated variables directly, as follows:
Usage of variables:
For information regarding securing bash nodes, refer to Securing Bash Nodes.
In the bash script, a workflow parameter can be used as follows:
echo $a
or echo ${a} or echo {{ params.runtime.a }}
where "a" is the workflow parameter, and the single curly braces act as a separator between the key and the remaining part of the text. Anything inside the double braces is evaluated as Python code, and all workflow parameters are available in the Python dict "params.runtime".
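As a short sketch (the parameter name env is hypothetical), the three notations can be mixed in one script:
# $env and ${env} are substituted from the workflow parameters
echo "Environment: $env"
echo "Bucket: ${env}_bucket"
# the double-brace form is evaluated as Python, mirroring Example 3 above
echo "Concatenated: {{ params.runtime.env + '_bucket' }}"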
Inside a bash script, you can also use the following parameters which are available by default:
You can run any custom script, with any required libraries, in the Bash Node of a workflow in Kubernetes-based Infoworks. The script is made accessible to the container in the form of a mounted volume.
You must provide an additional field, Image Name, in the Advanced Configuration tab. The Image Name field contains a fully qualified image URL that specifies the custom image to run the script with. Since the field's value is provided to Kubernetes, it is bound to the same rules.
Alternatively, you can run your script files directly using this task by specifying the exact location of the file inside your image.
Example: sh /home/user/bash-sample.sh
python /Users/ec2-user/python-sample.py
If Service Mesh is enabled on the Kubernetes installation, then the chosen image must contain the curl utility. This is required to interact with the Service Mesh utility for proper termination of the pod. The other option is to disable the Service Mesh for the Bash Node specifically, using annotations.
The Manage Cluster Actions task allows you to create, terminate, start, and stop persistent clusters.
Double-click the Manage Cluster Actions task, and enter the following properties:
Workflow Variables
Workflow variables are static variables that you can set for use in downstream nodes. Following are the types of workflow variables:
In the Workflow Variables section, click Add Variables and enter the variable name and value. You can add as many variables as required. These variables are applied to the downstream nodes, and they override any variable values set in the admin or domain configuration settings.
Run Settings
Run settings are task-level settings that control the runtime behaviour of the tasks in a workflow. Following are the options:
This node facilitates synchronizing data directly to an external target via workflows.
Properties
Entity Type: This defaults to source.
Source: Select from a list of sources available corresponding to the data environments mapped to the domain.
Table Group: Select a table group from the existing list of table groups.
The Dummy task, as the name suggests, acts as a placeholder interface between multiple tasks that must be connected to each other. Adding a dummy task between such tasks avoids confusion and keeps the workflow organized.
Workflow parameters provide the capability to add configurations to workflows. They eliminate the need to configure parameters individually in each node of a workflow, which is a tedious process.
You can set workflow-level parameters in the form of key-value pairs for each workflow. This set of key-value pairs is stored in the workflow metadata. However, it can be overridden at runtime via the API.
To override a parameter for a specific table, use the <table_name>.<parameter_name> format. If only <parameter_name> is provided, all tables with that parameter will have their values overridden.
By default, every workflow parameter is available to the pipeline and overrides the pipeline parameter if defined with the same key. However, pipeline parameters can also be set at the workflow node level; these override any other parameter set with the same key. This allows passing parameters to individual Build Pipeline nodes from the workflow.
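As an illustration (the table and parameter names are hypothetical), the two forms look like this when set as key-value pairs:
customers.batch_size = 500 (overrides batch_size only for the table customers)
batch_size = 500 (overrides batch_size for every table that defines it)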
The workflow parameters in the pipeline node follow this priority, in decreasing order:
The workflow parameters in the ingest source follow the priority order below, in decreasing precedence:
To pass a user-defined workflow variable in a task, perform the following steps:
In the Send Notification task, a workflow variable can be used as follows:
echo {{ task_instance.xcom_pull(task_ids='DM_D53O', key='team') }}
where task_ids is the ID of the task where the workflow variable is set, and key is the name of the workflow variable.
This notation can be used within the body or the subject.
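For example (reusing the hypothetical task ID and key from above), a notification subject could embed the variable as follows:
Subject: Nightly run completed for team {{ task_instance.xcom_pull(task_ids='DM_D53O', key='team') }}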
Following are the steps to override workflow parameters while building a workflow: