Instant feedback for data, syntax, and semantic errors: while a transformation is being designed, data errors such as an unexpected data format in a column or a regex that fails on column values can be checked against the sample data.
Visualization of the flow of data, to help design better flows.
Automatic materialization of transformation nodes for faster responses.
Automatic dependency management: when a transformation node is modified, the system automatically determines the nodes that depend on it. The platform uses a mark-and-sweep algorithm to do this efficiently.
Safe refactoring: column include, exclude, and rename operations are handled safely, even inside user-defined expressions.
Automatic rename of duplicate column names.
Reuse of Hive and Impala connections to support interactive viewing of data while designing pipelines.
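
The automatic dependency management item above can be illustrated with a minimal sketch. It is not the platform's actual implementation: it assumes the pipeline is a directed acyclic graph stored as a mapping from each node to its downstream nodes, and it interprets "mark" as flagging everything downstream of the modified node and "sweep" as visiting the flagged nodes so that parents come before children. The names mark_dirty, sweep, and the example pipeline are hypothetical.

```python
from collections import deque


def mark_dirty(children, modified):
    """Mark phase: collect the modified node and every node downstream of it."""
    dirty = {modified}
    queue = deque([modified])
    while queue:
        node = queue.popleft()
        for child in children.get(node, ()):
            if child not in dirty:
                dirty.add(child)
                queue.append(child)
    return dirty


def sweep(children, dirty):
    """Sweep phase: order the dirty nodes so that parents precede children."""
    # For each dirty node, count how many of its parents are also dirty.
    pending = {node: 0 for node in dirty}
    for node in dirty:
        for child in children.get(node, ()):
            if child in dirty:
                pending[child] += 1

    # Start from dirty nodes that have no dirty parent (assumes an acyclic graph).
    ready = deque(sorted(n for n, count in pending.items() if count == 0))
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for child in children.get(node, ()):
            if child in dirty:
                pending[child] -= 1
                if pending[child] == 0:
                    ready.append(child)
    return order


# Hypothetical pipeline: node -> list of downstream (dependent) nodes.
pipeline = {
    "source": ["clean"],
    "clean": ["join", "stats"],
    "lookup": ["join"],
    "join": ["report"],
}

dirty = mark_dirty(pipeline, "clean")   # nodes affected by editing "clean"
order = sweep(pipeline, dirty)          # order in which to revisit them
print(order)
```

In this sketch, editing "clean" touches only "clean", "join", "stats", and "report"; "source" and "lookup" are left alone, which is where the efficiency comes from: work is limited to the part of the graph that can actually be affected.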