Infoworks can read logs from Oracle servers and crawl changed data from these logs. Oracle log-based ingestion helps reduce the load on the source Oracle server, and users can also track record deletes using Oracle log-based CDC.
Following are the advantages of using Oracle log-based ingestion:
- Deletion of records can be captured using the log-based CDC mechanism.
- Does not require an incremental (watermark) column in the data to capture CDC.
- The complete history of data changes is captured without scanning the source tables.
- No load on the source database for fetching CDC data.
In Oracle log-based incremental ingestion, data is ingested into Delta Lake and continuously synchronized with the database, using a temporary table to store and read archive logs.
In this approach, the Infoworks process starts a LogMiner instance and copies log data into a temporary table, either in the same schema or in any other schema to which the Infoworks user has access. Infoworks then reads the redo logs from the temporary table, creates a Spark DataFrame, and writes it to Delta Lake in the specified format.
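The following is a minimal sketch of that flow, assuming a hypothetical temp table IWX_TEMP.REDO_RECORDS and standard Spark JDBC and Delta Lake APIs; the actual Infoworks implementation, table names, and connection details are internal.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("oracle-logminer-cdc-sketch")
    # Delta Lake support must be on the classpath; these settings are illustrative.
    .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
    .config("spark.sql.catalog.spark_catalog",
            "org.apache.spark.sql.delta.catalog.DeltaCatalog")
    .getOrCreate()
)

# Read the redo records that the LogMiner step copied into the temp table.
redo_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:oracle:thin:@//oracle-host:1521/ORCLPDB1")  # assumption
    .option("dbtable", "IWX_TEMP.REDO_RECORDS")   # hypothetical temp table
    .option("user", "infoworks_user")             # assumption
    .option("password", "********")
    .option("driver", "oracle.jdbc.OracleDriver")
    .load()
)

# Append the changes to the Delta target path (path is an assumption).
redo_df.write.format("delta").mode("append").save("/delta/orders_cdc")
```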
Before configuring Oracle log-based CDC with LogMiner, review the following requirements and performance details:
- Verify that a valid Oracle environment exists for the Infoworks user.
- The Oracle source instance must be running in ARCHIVELOG mode, and Oracle global minimal supplemental logging must be enabled (a verification sketch follows this list).
- A copy of the Oracle catalog must exist in the Oracle archived logs.
- Oracle LogMiner continuous mining reads archived redo logs only from the directory to which they were originally written.
- If Infoworks is not installed on the same machine as the Oracle instance, configure a Transparent Network Substrate (TNS) entry on the client machine with SERVER=DEDICATED in the CONNECT_DATA section of the connect descriptor. This specification is also required if the network is configured for Multi-Threaded Server (MTS) mode.
- Infoworks requires the Oracle Client binaries. When you install Oracle, the Client binaries are installed by default. To use SQL*Net connectivity on a machine that does not have an installed Oracle instance, the Oracle Client must be installed.
- If Oracle materialized views are used, Infoworks can capture change data from the master tables that underlie those views. Infoworks supports change capture for a materialized view if the view and its underlying table have a one-to-one correspondence and share the same name.
- Infoworks uses Oracle LogMiner to read change data from archived logs. If you use an archived log destination other than the LOG_ARCHIVE_DEST_1 path and LogMiner processing lags behind, problems might occur. In this situation, LogMiner starts reading change data from the archived logs in the LOG_ARCHIVE_DEST_1 directory. If these archived logs are inaccessible from the machine with the Oracle instance to which you are connected, the LogMiner session might fail.
- Infoworks uses Oracle LogMiner to perform Change Data Capture (CDC) on tables configured for log-based ingestion. LogMiner is a tool to access the Oracle archive logs, which contain a record of all activity performed on the database.
- Each row in the archive log represents a logical operation performed on the database and contains the SQL REDO and UNDO statements needed to replicate or revert the original operation (see the LogMiner sketch after this list).
- In log-based incremental ingestion, Infoworks reads the log statements (DMLs) stored in the database log files (archive logs in Oracle).
- Tables in log-based incremental ingestion require only natural keys to be configured.
- Because incremental data is read from the logs, deletion of records in the source tables is also captured.
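As a rough illustration of the ARCHIVELOG and supplemental-logging prerequisites above, the following sketch queries V$DATABASE with the python-oracledb driver and connects through a full descriptor with SERVER=DEDICATED; the driver choice, host, and credentials are assumptions, and Infoworks performs its own validation internally.

```python
import oracledb

# Full connect descriptor with SERVER=DEDICATED, mirroring the TNS
# requirement described above (host and service name are assumptions).
dsn = (
    "(DESCRIPTION="
    "(ADDRESS=(PROTOCOL=TCP)(HOST=oracle-host)(PORT=1521))"
    "(CONNECT_DATA=(SERVICE_NAME=ORCLPDB1)(SERVER=DEDICATED)))"
)

with oracledb.connect(user="infoworks_user", password="********", dsn=dsn) as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT log_mode, supplemental_log_data_min FROM v$database")
        log_mode, supp_min = cur.fetchone()
        assert log_mode == "ARCHIVELOG", f"source is in {log_mode} mode"
        # 'YES' means explicitly enabled; 'IMPLICIT' means enabled as a
        # side effect of other supplemental logging.
        assert supp_min in ("YES", "IMPLICIT"), "minimal supplemental logging is off"
```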
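To make the REDO/UNDO point concrete, the sketch below starts a LogMiner session over one archived log and reads SQL_REDO and SQL_UNDO from V$LOGMNR_CONTENTS; the file path, dictionary option, and table filter are assumptions, not the Infoworks implementation.

```python
import oracledb

with oracledb.connect(user="infoworks_user", password="********",
                      dsn="oracle-host:1521/ORCLPDB1") as conn:  # assumption
    with conn.cursor() as cur:
        # Register one archived redo log and start LogMiner with the
        # online catalog as the dictionary.
        cur.execute("""
            BEGIN
              DBMS_LOGMNR.ADD_LOGFILE(
                logfilename => :f, options => DBMS_LOGMNR.NEW);
              DBMS_LOGMNR.START_LOGMNR(
                options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
            END;""", f="/u01/app/oracle/arch/arch_0001.arc")  # assumption

        # Each row is one logical operation with its REDO and UNDO SQL.
        cur.execute("""
            SELECT operation, sql_redo, sql_undo
              FROM v$logmnr_contents
             WHERE seg_owner = 'SALES' AND table_name = 'ORDERS'""")  # assumption
        for operation, sql_redo, sql_undo in cur:
            print(operation, sql_redo, sql_undo)

        cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")
```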
The following configurations are available for Oracle log-based ingestion:
- USE_UDF_FOR_WIDE_TABLE_CONCAT: The configuration to use the Infoworks-provided custom UDA (user-defined aggregate) for concatenating wide tables instead of the xmlagg method. The user must create the UDA in the Infoworks temp schema manually. The default value is False.
- NUMBER_OF_LOG_FILES_TO_LOAD: The configuration used to restrict the number of log files to load in each run of CDC. The default value is -1, which represents all log files.
- PARALLEL_LOGMINER_SESSIONS: The number of open sessions to Oracle used to start LogMiner in parallel. To improve the performance of temp table creation, the user can specify temp table creation parallelism at the source level. The default value is 1. If the value is more than 1, the archive log files are split across multiple threads that start LogMiner in parallel (see the sketch after this list). This method works only with the online catalog as the dictionary or with an offline dictionary; neither dictionary option supports schema synchronization.
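A rough sketch of the parallel-session idea follows: the archive log list is split across threads, each of which opens its own Oracle session and runs LogMiner over its share. All file paths and connection details are assumptions; the Infoworks internals are not public.

```python
from concurrent.futures import ThreadPoolExecutor
import oracledb

DSN = "oracle-host:1521/ORCLPDB1"   # assumption
PARALLEL_LOGMINER_SESSIONS = 4      # the configuration value described above

def mine_files(log_files):
    """Start LogMiner over one slice of the archive logs."""
    with oracledb.connect(user="infoworks_user", password="********",
                          dsn=DSN) as conn:
        with conn.cursor() as cur:
            for i, path in enumerate(log_files):
                # First file opens a new list; later files are appended.
                opt = "DBMS_LOGMNR.NEW" if i == 0 else "DBMS_LOGMNR.ADDFILE"
                cur.execute(
                    f"BEGIN DBMS_LOGMNR.ADD_LOGFILE("
                    f"logfilename => :f, options => {opt}); END;", f=path)
            cur.execute("""
                BEGIN
                  DBMS_LOGMNR.START_LOGMNR(
                    options => DBMS_LOGMNR.DICT_FROM_ONLINE_CATALOG);
                END;""")
            # ... read V$LOGMNR_CONTENTS into the temp table here ...
            cur.execute("BEGIN DBMS_LOGMNR.END_LOGMNR; END;")

all_logs = ["/u01/arch/arch_0001.arc", "/u01/arch/arch_0002.arc",
            "/u01/arch/arch_0003.arc", "/u01/arch/arch_0004.arc"]  # assumption
chunks = [all_logs[i::PARALLEL_LOGMINER_SESSIONS]
          for i in range(PARALLEL_LOGMINER_SESSIONS)]

with ThreadPoolExecutor(max_workers=PARALLEL_LOGMINER_SESSIONS) as pool:
    pool.map(mine_files, [c for c in chunks if c])
```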
Limitations
- Derived Split-by is not supported.
- BFILE, BLOB, CLOB, and XML are not supported.
Infoworks uses Oracle LogMiner to retrieve changes from the Oracle redo logs. Because Oracle does not log data, or does not completely log data, for some datatypes in the redo logs, Infoworks Oracle CDC with LogMiner cannot retrieve change data for all Oracle datatypes.
The following table indicates the Oracle datatypes supported by Infoworks Oracle CDC with LogMiner:
Oracle Datatypes | Supported | Comments |
---|---|---|
BFILE | N | Data for columns that have this datatype are not completely logged in the Oracle redo logs and cannot be captured. |
BINARY_DOUBLE | Y | |
BINARY_FLOAT | Y | |
CHAR | Y | |
DATE | Y | |
FLOAT | Y | |
LOBs | N | |
LONG | N | Columns of this datatype cannot be included in log-based capture. |
LONG RAW | N | Columns of this datatype cannot be included in log-based capture. |
NCHAR | Y | |
NUMBER | Y | Infoworks handles NUMBER columns as follows: numbers with a scale of 0 and a precision less than 10 are treated as INTEGER; numbers with a defined precision and scale are treated as NUMCHAR; numbers with an undefined precision and scale are treated as DOUBLE (illustrated after this table). |
NVARCHAR2 | Y | |
RAW | Y | |
ROWID | Y | |
TIMESTAMP | Y | |
TIMESTAMP WITH TIME ZONE | N | |
TIMESTAMP WITH LOCAL TIME ZONE | N | |
UROWID | N | |
VARCHAR2 | Y | |
XML types | N | |
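As a small illustration of the NUMBER mapping rules in the table above, the following hypothetical helper classifies a NUMBER column by its precision and scale; Infoworks applies these rules internally.

```python
def map_oracle_number(precision, scale):
    """Map an Oracle NUMBER(precision, scale) to a target type name.

    Hypothetical helper mirroring the rules stated in the table above.
    """
    if precision is None and scale is None:
        return "DOUBLE"   # undefined precision and scale
    if scale == 0 and precision is not None and precision < 10:
        return "INTEGER"  # scale 0 with precision less than 10
    return "NUMCHAR"      # defined precision and scale

assert map_oracle_number(9, 0) == "INTEGER"
assert map_oracle_number(12, 2) == "NUMCHAR"
assert map_oracle_number(None, None) == "DOUBLE"
```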
For details on configuring Oracle for LogMiner-based CDC, see How to Configure Oracle for CDC.