Data copy is performed by parallel tasks that consume network bandwidth. To restrict the total number of bytes transferred per second, Infoworks lets you throttle data transfer at the Infoworks Cluster entity level and the Infoworks Workflow job entity level. This allows you to specify the maximum bytes transferred per second for an individual task running in parallel with other tasks.
For Hive batch and incremental replications, a static throttling limit can be specified in the Advanced Configurations section, with the key BANDWIDTH and the value set to the bandwidth limit in megabytes. The job loads the static throttling value when it starts and uses that value until it ends.
The following two advanced configurations are used together to control network throttling at the Infoworks Source Cluster entity level and the Infoworks job entity level.
BANDWIDTH: The job-level (workflow) bandwidth limit, in MB. This value is the maximum bandwidth each reducer can use when reading a file. So, if the copy parallelism set by the user is p and the bandwidth set at the job level is b, the maximum bandwidth that the job can use is (p * b).
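
The (p * b) upper bound can be checked with a quick calculation. The following is a minimal sketch, not part of Infoworks: the function name `max_job_bandwidth_mb` and the sample values are hypothetical and serve only to illustrate the arithmetic.

```python
def max_job_bandwidth_mb(copy_parallelism: int, bandwidth_mb: float) -> float:
    """Return the upper bound on bandwidth a single job can use.

    copy_parallelism: number of parallel copy tasks (p), as set by the user.
    bandwidth_mb:     per-task limit (b), i.e. the BANDWIDTH value in MB.
    """
    if copy_parallelism < 1:
        raise ValueError("copy_parallelism must be at least 1")
    if bandwidth_mb <= 0:
        raise ValueError("bandwidth_mb must be positive")
    # Each of the p tasks may use up to b, so the job as a whole may use p * b.
    return copy_parallelism * bandwidth_mb


# Hypothetical example: 8 parallel tasks, BANDWIDTH set to 10.
p = 8
b = 10
print(max_job_bandwidth_mb(p, b))  # 80
```

Because the limit applies per task, raising the copy parallelism raises the job's total bandwidth ceiling even when BANDWIDTH is unchanged. To hold a job under a fixed total, divide that total by p to obtain the per-task value.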