Operationalizing data is an add-on to Infoworks that aims to solve the complexity of ETL workloads. It is designed especially for use by ETL Developers and Production Administrators.
The majority of Enterprise Data Warehouse (EDW) processing power is spent on Extract, Transform, Load (ETL) tasks. Migrating ETL workloads from an EDW to Hadoop is complex, manually intensive, and expensive.
Infoworks Orchestrator, a complete solution for workload automation and management, is the fastest way to offload ETL use cases and manage ETL workloads on supported Data environments and Export Targets. It offers an easy-to-use visual editor for authoring and editing workflows. The ETL developer can drag tasks from the left palette onto the canvas and connect them to define their dependencies.
This user guide covers all the capabilities of Infoworks Orchestrator.
Following are some of the features that set Infoworks Orchestrator apart from traditional ETL products:
Production Administrators use Infoworks Orchestrator to monitor, control, debug, and performance-tune workloads.
Data Engineers and Analysts use Orchestrator to design the orchestration of end-to-end use cases, from data ingestion and synchronization to the building of data models.
Running complex workloads in production requires a large number of production engineers. It is also difficult to balance workloads optimally across all available resources.
Infoworks Orchestrator solves the challenges faced during manual ETL workload management in the following ways:
To achieve the required ETL workload management with Infoworks Orchestrator, you must either create the required Domains in the Infoworks product or use applicable Domains that already exist in the system. If no suitable Domains exist, complete the following steps as prerequisites before working in the Orchestrator.
In the Orchestrator, a Workflow is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.
For example, a simple Workflow could consist of three tasks: A, B, and C. It could say that A has to run successfully before B can run, but C can run anytime. It could say that task A times out after 5 minutes, and B can be restarted up to 5 times in case it fails. It might also say that the workflow will run every night at 10 pm, but shouldn't start until a certain date.
In this way, a Workflow describes how you want to carry out your workload; but notice that we haven't said anything about what we actually want to do! A, B, and C could be anything. Maybe A prepares data for B to analyze while C sends an email. Or perhaps A monitors your location so B can open your garage door while C turns on your house lights. The important thing is that the Workflow isn't concerned with what its constituent tasks do; its job is to make sure that whatever they do happens at the right time, or in the right order, or with the right handling of any unexpected issues.
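The A/B/C example above can be sketched in plain Python. This is a minimal illustration of the scheduling concepts only, not Orchestrator's actual API; the task functions and the `run_with_retries` helper are hypothetical stand-ins.

```python
def run_with_retries(task, max_retries=0):
    """Run a task callable, retrying on failure up to max_retries times."""
    for attempt in range(max_retries + 1):
        try:
            return task()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted; surface the failure

# Hypothetical task callables standing in for A, B, and C.
def task_a():
    return "A done"

calls = {"b": 0}
def task_b():
    calls["b"] += 1
    if calls["b"] < 3:  # simulate two transient failures, then success
        raise RuntimeError("transient failure")
    return "B done"

def task_c():
    return "C done"

# C has no dependencies, so it can run at any point.
print(run_with_retries(task_c))

# A must succeed before B starts; B may be restarted up to 5 times.
result_a = run_with_retries(task_a)
result_b = run_with_retries(task_b, max_retries=5)
print(result_a, result_b)
```

A real workflow engine would also enforce the 5-minute timeout on A and the nightly 10 pm schedule; those concerns are omitted here to keep the dependency-and-retry logic in focus.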
A task describes a single unit of work in a workflow. Tasks are usually (but not always) atomic, meaning they can stand on their own and don't need to share resources with any other tasks. The Workflow makes sure that tasks run in the correct order; other than those dependencies, tasks generally run independently. In fact, they may run on two completely different machines.
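Because tasks are independent apart from their declared dependencies, a valid execution order is simply any topological ordering of the dependency graph. A small sketch using Python's standard-library `graphlib` (the task names here are hypothetical examples, not Orchestrator task types):

```python
from graphlib import TopologicalSorter

# Hypothetical dependency graph: each task maps to the tasks it depends on.
deps = {
    "ingest": [],
    "transform": ["ingest"],
    "export": ["transform"],
    "notify": [],  # independent; could even run on a different machine
}

# static_order() yields tasks so that every task appears after its dependencies.
order = list(TopologicalSorter(deps).static_order())
print(order)
```

Any ordering that places `ingest` before `transform` and `transform` before `export` is valid; `notify` may appear anywhere, which is exactly the freedom a workflow engine exploits to run independent tasks in parallel.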
When a task is added to a workflow (by dragging it onto the canvas), the task properties can be configured. For example, for a Pipeline build task, the user will need to specify the pipeline that should be built.
Task dependencies define the order in which tasks in the workflow must be executed. Infoworks Orchestrator supports the following types of dependencies: