Operationalizing data is an add-on to Infoworks that aims to solve the complexity of ETL workloads. It is designed especially for use by ETL Developers and Production Administrators.
The majority of Enterprise Data Warehouse (EDW) processing power is spent on Extract, Transform, Load (ETL) tasks. Migrating ETL workloads from an EDW to Hadoop is complex, manually intensive, and expensive.
Infoworks Orchestrator, a complete solution for workload automation and management, is the fastest way to offload ETL use cases and manage ETL workloads on supported Data environments and Export Targets. It offers an easy-to-use visual editor for authoring and editing workflows. The ETL developer can drag tasks from the left palette onto the canvas and connect them to define their dependencies.
This user guide covers all the capabilities of Infoworks Orchestrator.
Following are some of the features that set Infoworks Orchestrator apart from traditional ETL products:
Production Administrators use Infoworks Orchestrator to monitor, control, debug, and performance-tune workloads.
Data Engineers and Analysts use Orchestrator to design the orchestration of end-to-end use cases, from data ingestion and synchronization to the building of data models.
Running complex workloads in production requires a large number of production engineers. It is also difficult to balance workloads optimally across all available resources.
Infoworks Orchestrator solves the challenges faced during manual ETL workload management in the following ways:
To achieve the required ETL workload management with Infoworks Orchestrator, you must either create the required Domains in the Infoworks product or use applicable Domains that already exist in the system. If no suitable Domains exist, complete the following steps as prerequisites before working in the Orchestrator.
In the Orchestrator, a Workflow is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
For example, a simple Workflow could consist of three tasks: A, B, and C. It could say that A has to run successfully before B can run, but C can run anytime. It could say that task A times out after 5 minutes, and B can be restarted up to 5 times in case it fails. It might also say that the workflow will run every night at 10 pm, but shouldn't start until a certain date.
In this way, a Workflow describes how you want to carry out your workload; but notice that we haven't said anything about what we actually want to do! A, B, and C could be anything. Maybe A prepares data for B to analyze while C sends an email. Or perhaps A monitors your location so B can open your garage door while C turns on your house lights. The important thing is that the Workflow isn't concerned with what its constituent tasks do; its job is to make sure that whatever they do happens at the right time, or in the right order, or with the right handling of any unexpected issues.
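The A/B/C example above can be sketched in plain Python. This is a minimal illustration of the scheduling concepts only, not Orchestrator's actual API; the task functions and the `run_with_retries` helper are hypothetical stand-ins.

```python
def run_with_retries(task, max_retries=0):
    """Run a task callable, retrying on failure up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; surface the failure

# Hypothetical task callables standing in for A, B, and C.
def task_a():
    return "A done"

calls = {"b": 0}
def task_b():
    calls["b"] += 1
    if calls["b"] < 3:  # simulate two transient failures, then success
        raise RuntimeError("transient failure")
    return "B done"

def task_c():
    return "C done"

# C has no dependencies, so it can run at any point.
print(run_with_retries(task_c))

# A must succeed before B starts; B may be restarted up to 5 times.
result_a = run_with_retries(task_a)
result_b = run_with_retries(task_b, max_retries=5)
print(result_a, result_b)
```

A real workflow engine would also enforce the 5-minute timeout on A and the nightly 10 pm schedule; those concerns are omitted here to keep the dependency-and-retry logic in focus.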
A task describes a single unit of work in a workflow. Tasks are usually (but not always) atomic, meaning they can stand on their own and don't need to share resources with any other tasks. The Workflow makes sure that tasks run in the correct order; other than those dependencies, tasks generally run independently. In fact, they may run on two completely different machines.
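Because tasks are independent apart from their declared dependencies, a valid execution order is simply any topological ordering of the dependency graph. A small sketch using Python's standard-library `graphlib` (the task names here are hypothetical examples, not Orchestrator task types):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the tasks it depends on.
deps = {
    "ingest": [],
    "transform": ["ingest"],
    "export": ["transform"],
    "notify": [],  # independent; could even run on a different machine
}

# static_order() yields tasks so that every task appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any ordering that places `ingest` before `transform` and `transform` before `export` is valid; `notify` may appear anywhere, which is exactly the freedom a workflow engine exploits to run independent tasks in parallel.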
When a task is added to a workflow (by dragging it onto the canvas), the task properties can be configured. For example, for a Pipeline build task, the user will need to specify the pipeline that should be built.
Task dependencies define the order in which tasks in the workflow must be executed. Infoworks Orchestrator supports the following types of dependencies: