The following tables are created during the ingestion process:
- Current Table: The table being crawled, for example, Orders. Schema: source columns + audit columns.
- CDC Tables: The tables where the data from each CDC run is stored, for example, cdc_Orders_. Schema: source columns + audit columns.
- History Table: The table that stores the data from all CDC runs, for example, history_Orders. This table contains all the data crawled in every data load. Schema: source columns + audit columns.
- Error Table: The table where error rows are stored, for example, Orders_error. This table is currently created only for File and Salesforce sources. During a data crawl, rows that cannot be crawled are stored here. Schema: path, reason, record, jobId.
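As a rough illustration of the error table's schema, the sketch below models one error row in Python. The four field names come from the schema listed above. The `ErrorRow` class, the `to_error_row` helper, the field meanings given in the comments, and the sample values are all assumptions made for illustration and are not part of the product.

```python
from dataclasses import dataclass, asdict


@dataclass
class ErrorRow:
    """Hypothetical model of one row in an error table such as Orders_error."""
    path: str    # assumed: location of the source file or object
    reason: str  # assumed: why the record could not be crawled
    record: str  # assumed: the raw record, kept unchanged
    jobId: str   # assumed: identifier of the ingestion job


def to_error_row(path, raw_record, exc, job_id):
    """Illustrative helper: turn a failed record into an error-table row."""
    return asdict(ErrorRow(path=path, reason=str(exc), record=raw_record, jobId=job_id))


# Sample values below are made up for the example.
row = to_error_row("orders/2020.csv", "42,abc", ValueError("bad int"), "job-1")
```

Keeping the raw record next to the failure reason means a rejected row can be inspected later without rereading the source.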
The following audit columns are used during the ingestion process:
- ziw_target_timestamp: The timestamp when the data was crawled.
- ziw_is_deleted: Indicates whether a row was deleted (used in CDC and history tables).
- ziw_file_name: The name and relative path of the file from which the row was read.
- ziw_file_modified_timestamp: The modified timestamp of the file from which the record was read.
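To make the audit columns concrete, here is a minimal Python sketch that attaches them to a source row. The column names are the ones listed above. The `with_audit_columns` function, the value types (a boolean flag and `datetime` objects), and the rule that the file columns are added only when a file name is supplied are assumptions for illustration, not the product's actual behavior.

```python
from datetime import datetime, timezone


def with_audit_columns(row, file_name=None, file_modified=None, is_deleted=False):
    """Return a copy of `row` with audit columns added (hypothetical helper)."""
    out = dict(row)  # copy so the source row is left unchanged
    out["ziw_target_timestamp"] = datetime.now(timezone.utc)  # time of the crawl
    out["ziw_is_deleted"] = is_deleted  # assumed to be a boolean flag
    if file_name is not None:
        # Assumption: file columns only apply to rows read from files.
        out["ziw_file_name"] = file_name
        out["ziw_file_modified_timestamp"] = file_modified
    return out


# Sample values below are made up for the example.
source_row = {"order_id": 1, "amount": 9.5}
modified = datetime(2020, 1, 1, tzinfo=timezone.utc)
audited = with_audit_columns(source_row, "orders/2020.csv", modified)
```

With this shape, the resulting row matches the "source columns + audit columns" schema described for the current, CDC, and history tables.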