Infoworks Replicator 4.0
Getting Started

Network Throttling

Data copy is performed by parallel tasks that consume network bandwidth. To restrict the total number of bytes transferred per second, Infoworks lets you throttle data transfer at the Infoworks Cluster entity level and at the Infoworks Workflow (job) entity level. This allows you to specify the maximum bytes transferred per second for an individual task running in parallel with other tasks.

Static Throttling

For Hive batch and incremental replications, specify the static throttling limit in the Advanced Configurations section, with BANDWIDTH as the key and the bandwidth limit in megabytes as the value. The job loads the static throttling value when it starts and uses this value until it ends.

Infoworks Cluster Level and Job Level Bandwidth Advanced Configurations

The following two advanced configurations are used together to control network throttling at the Infoworks source cluster entity level and the Infoworks job level.

  • MAX_BANDWIDTH_MB: Should be set in the advanced configurations of the source cluster.
  • JOB_BANDWIDTH_MB: Can be set in the advanced configurations of the workflow (job level).

Assumptions

  1. By default, if no throttling configurations are set, each reducer runs with 100 MB of bandwidth (full bandwidth for each reducer).
  2. If MAX_BANDWIDTH_MB is not set for the Infoworks source cluster entity, it defaults to INTEGER_MAX (full bandwidth for all jobs).
  3. If both BANDWIDTH and JOB_BANDWIDTH_MB are set in the workflow, JOB_BANDWIDTH_MB takes precedence over BANDWIDTH.
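
The defaults and precedence listed above can be sketched in a few lines of Python. This is an illustration only, not Infoworks code: the function name resolve_throttling, the dictionary inputs, and the tuple it returns are hypothetical, and INTEGER_MAX is assumed to be the 32-bit signed integer maximum. The sketch only models which key is honored and which defaults apply; it does not model how the product enforces the limits.

```python
# Illustrative sketch only; not Infoworks code.
INTEGER_MAX = 2**31 - 1             # assumption: 32-bit signed integer maximum
DEFAULT_REDUCER_BANDWIDTH_MB = 100  # default per-reducer bandwidth from item 1


def resolve_throttling(cluster_conf, workflow_conf):
    """Return (cluster_cap_mb, winning_job_key, job_value_mb).

    cluster_conf and workflow_conf are hypothetical dicts standing in for
    the advanced configurations of the source cluster and the workflow.
    """
    # Item 2: an unset cluster limit defaults to INTEGER_MAX.
    cluster_cap_mb = int(cluster_conf.get("MAX_BANDWIDTH_MB", INTEGER_MAX))

    # Item 3: JOB_BANDWIDTH_MB takes precedence over the legacy BANDWIDTH key.
    if "JOB_BANDWIDTH_MB" in workflow_conf:
        key = "JOB_BANDWIDTH_MB"
        value_mb = int(workflow_conf[key])
    elif "BANDWIDTH" in workflow_conf:
        key = "BANDWIDTH"
        value_mb = int(workflow_conf[key])
    else:
        # Item 1: nothing set, so each reducer gets the default bandwidth.
        key = None
        value_mb = DEFAULT_REDUCER_BANDWIDTH_MB

    return cluster_cap_mb, key, value_mb
```

For example, calling resolve_throttling({"MAX_BANDWIDTH_MB": "500"}, {"BANDWIDTH": "50", "JOB_BANDWIDTH_MB": "200"}) selects JOB_BANDWIDTH_MB with a value of 200 and a cluster limit of 500, while calling it with two empty dictionaries falls back to the defaults.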

Legacy Job Level Bandwidth Advanced Configuration

BANDWIDTH: The job-level (workflow) bandwidth in MB. This value is the maximum bandwidth that each reducer can use for reading a file. So, if the copy parallelism set by the user is p and the bandwidth set at the job level is b, the maximum bandwidth that the job can use is (p * b).
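
As a worked example of the (p * b) calculation, the short Python sketch below computes the upper bound for a job. The function name max_job_bandwidth_mb and the sample values (8 parallel copies, 50 MB each) are hypothetical and chosen only for illustration.

```python
def max_job_bandwidth_mb(copy_parallelism, bandwidth_mb):
    """Upper bound on job bandwidth: p reducers, each limited to b MB."""
    if copy_parallelism < 1 or bandwidth_mb < 0:
        raise ValueError("copy_parallelism must be >= 1 and bandwidth_mb >= 0")
    return copy_parallelism * bandwidth_mb


# p = 8 and b = 50 give an upper bound of 8 * 50 = 400 MB for the whole job.
print(max_job_bandwidth_mb(8, 50))
```

Because the limit applies per reducer, raising the copy parallelism raises the job's total bandwidth ceiling even when BANDWIDTH stays the same.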