Data profiling in Infoworks provides insight into the characteristics of the data in a data source, which helps in understanding datasets and detecting anomalies. Profiling is essential for data quality assessment, cleaning, and validation before the data is used in further analytics or ETL processes.
In Infoworks, data profiling is accessed through the Project Workspace within a project. Selecting a table shows its detailed profiling results, which summarize each column in terms of statistics and metadata.
The Profiling Results panel offers comprehensive metrics for each column in the selected table, assisting in understanding data distribution and quality.
For each selected column, profiling results display a series of metrics, including:
Field | Description |
---|---|
Histogram | A visual representation of the data distribution. |
Column Name | The name of the column being analyzed. |
Data Type | The data type of the column, such as BOOLEAN or NUMBER. |
Nulls | The number of null values in the column. |
Unique Values | The number of unique values in the column. |
Min/Max Values | Minimum and maximum values recorded in the column. |
Mean | Average value for numeric columns. |
Average String Length | For text fields, shows the average length of strings. |
Standard Deviation | Shows the spread of numeric data in the column. |
Sample Values | A preview of sample values present in the column, helping to quickly understand the type of data within the column. |
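
To make the metrics in the table concrete, the following is a minimal Python sketch of how comparable values could be computed for a single column. It is an illustration only and not how Infoworks computes profiling results: the function `profile_column`, its parameters, and the dictionary keys are hypothetical names, the column is assumed to hold values of a single type with `None` standing in for nulls, and the population standard deviation is used because the source does not state which variant Infoworks reports.

```python
from statistics import mean, pstdev


def profile_column(name, values, sample_size=3):
    """Compute metrics loosely mirroring the profiling table for one column.

    Hypothetical helper for illustration; not part of any Infoworks API.
    """
    # Treat None as null and profile only the remaining values.
    non_null = [v for v in values if v is not None]

    profile = {
        "column_name": name,
        "nulls": len(values) - len(non_null),
        "unique_values": len(set(non_null)),
        "min": min(non_null) if non_null else None,
        "max": max(non_null) if non_null else None,
        "sample_values": non_null[:sample_size],
    }

    # bool is a subclass of int in Python, so exclude it from numeric columns.
    is_numeric = bool(non_null) and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in non_null
    )
    is_text = bool(non_null) and all(isinstance(v, str) for v in non_null)

    if is_numeric:
        profile["mean"] = mean(non_null)
        # Assumption: population standard deviation.
        profile["std_dev"] = pstdev(non_null)
    elif is_text:
        profile["avg_string_length"] = mean(len(v) for v in non_null)

    return profile


if __name__ == "__main__":
    print(profile_column("amount", [10, None, 20, 30]))
    print(profile_column("city", ["NY", "LA", None, "Boston"]))
```

Numeric columns receive a mean and standard deviation, while text columns receive an average string length, matching the split described in the table. The histogram is omitted from this sketch.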
Data profiling in Infoworks is an integral step toward understanding and preparing data for further analysis. The profiling metrics provide both an at-a-glance and in-depth look at each column, enabling data analysts to make informed decisions about data quality and readiness for downstream processes.