Infoworks AI Product Documentation

Data Profiling

Data profiling in Infoworks provides insights into the characteristics of the data in your data sources, which helps you understand datasets and detect anomalies. Profiling is essential for assessing data quality and for cleaning and validating data before it is used in analytics or ETL processes.

Accessing Profiling Data

Data profiling is accessed through the Workspace of a project. Selecting a table shows its detailed profiling results, with statistics and metadata for each column.

  1. Navigate to the Workspace of your project.
  2. Select a table from your data source.
  3. Click on the three-dot menu next to the table and choose Show Profile to view profiling metrics.

NOTE: Profiling metrics are available only for non-empty tables.

Profiling Results Interface

The Profiling Results panel offers comprehensive metrics for each column in the selected table, assisting in understanding data distribution and quality.

Column Profiling Metrics

For each selected column, profiling results display a series of metrics, including:

Field | Description
----- | -----------
Histogram | A visual representation of the data distribution.
Column Name | The name of the column being analyzed.
Data Type | The data type of the column, such as BOOLEAN, NUMBER, etc.
Nulls | Indicates the number of null values present in the column.
Unique Values | Displays the number of unique values in the column.
Min/Max Values | Minimum and maximum values recorded in the column.
Mean | Average value for numeric columns.
Average String Length | For text fields, shows the average length of strings.
Standard Deviation | Shows the spread of numeric data in the column.
Sample Values | A preview of sample values present in the column, helping to quickly understand the type of data within the column.
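
To make the metrics above concrete, the following is a minimal Python sketch of how such values could be computed for a single column. It is an illustration only and does not represent how Infoworks calculates its profiling results. The functions `profile_column` and `histogram` are hypothetical helpers, `None` stands in for a null value, the standard deviation is assumed to be the population form, and the histogram is assumed to use equal-width buckets.

```python
import statistics
from typing import Any, Dict, List, Optional


def profile_column(name: str, values: List[Optional[Any]],
                   sample_size: int = 3) -> Dict[str, Any]:
    """Compute illustrative profiling metrics for one column.

    Hypothetical helper, not an Infoworks API. Assumes all non-null
    values in the column share one comparable type.
    """
    non_null = [v for v in values if v is not None]
    profile: Dict[str, Any] = {
        "column_name": name,
        "nulls": len(values) - len(non_null),
        "unique_values": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "sample_values": non_null[:sample_size],
    }
    # bool is a subclass of int in Python, so exclude it from numeric columns.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool)
        for v in non_null
    )
    is_text = bool(non_null) and all(isinstance(v, str) for v in non_null)
    if is_numeric:
        profile["mean"] = statistics.fmean(non_null)
        # Assumption: population standard deviation.
        profile["standard_deviation"] = statistics.pstdev(non_null)
    if is_text:
        lengths = [len(v) for v in non_null]
        profile["average_string_length"] = statistics.fmean(lengths)
    return profile


def histogram(values: List[Optional[float]], bins: int = 4) -> List[int]:
    """Count non-null numeric values in equal-width buckets (an assumption)."""
    nums = [v for v in values if v is not None]
    counts = [0] * bins
    if not nums:
        return counts
    lo, hi = min(nums), max(nums)
    width = (hi - lo) / bins or 1  # avoid a zero width when all values match
    for v in nums:
        # The maximum value falls into the last bucket.
        idx = min(int((v - lo) / width), bins - 1)
        counts[idx] += 1
    return counts


# Example usage with made-up data.
ages = [34, 29, None, 41, 29]
names = ["Ana", "Li", None, "Omar"]
print(profile_column("age", ages))
print(profile_column("name", names))
print(histogram(ages, bins=4))
```

Mean and Standard Deviation are produced only for numeric columns and Average String Length only for text columns, mirroring the descriptions in the table.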

Data profiling in Infoworks is an integral step toward understanding and preparing data for further analysis. The profiling metrics provide both an at-a-glance and in-depth look at each column, enabling data analysts to make informed decisions about data quality and readiness for downstream processes.