Incremental ingestion fails with "Error during table crawl due to Path /target_hdfs_path/table_id/merged/orc does not exist"
Problem Description:
Incremental ingestion fails because /target_hdfs_path/table_id/merged/orc does not exist. A sample log and stack trace look like this:
[INFO] 2021-03-17 07:05:41,745 [pool-5-thread-2] infoworks.tools.hadoop.hdfs.HDFSUtils:629 :: Creating hdfs directory /data/PROD/core/infoworks/prod_core_db_infoworks_dhub_life70/5fbf7adbafba099dae5901f3//cdc//orc/
[ERROR] 2021-03-17 07:05:41,770 [pool-5-thread-2] infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread:314 :: Error during table crawl due to Path /data/PROD/core/infoworks/prod_core_db_infoworks_dhub_life70/5fbf7adbafba099dae5901f3/merged/orc does not exist
java.io.FileNotFoundException: Path /data/PROD/core/infoworks/prod_core_db_infoworks_dhub_life70/5fbf7adbafba099dae5901f3/merged/orc does not exist
    at infoworks.tools.hadoop.hdfs.HDFSUtils.recusiveFirstFileSearch(HDFSUtils.java:336)
    at infoworks.tools.format.OrcUtils.getHiveSchema(OrcUtils.java:216)
    at infoworks.tools.format.OrcUtils.getHiveSchema(OrcUtils.java:212)
    at infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread.addNewPartitionsPostCDC(CrawlWorkerThread.java:529)
    at infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread.postCrawlData(CrawlWorkerThread.java:509)
    at infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread.crawlData(CrawlWorkerThread.java:683)
    at infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread.call(CrawlWorkerThread.java:257)
    at infoworks.discovery.dbcrawler.rdbms.utils.CrawlWorkerThread.call(CrawlWorkerThread.java:75)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Root cause:
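This happens when someone manually deletes the target HDFS path. Infoworks maintains a directory structure for each ingestion job, and by the end of each job the final data set is stored in the /merged directory. As the stack trace shows, after CDC the incremental job (CrawlWorkerThread.addNewPartitionsPostCDC) searches merged/orc for a file to read the Hive schema from (OrcUtils.getHiveSchema). If this directory has been deleted, that search fails, and every subsequent incremental job fails with the error above.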
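Before triggering an incremental run, you could check whether the merged ORC directory still exists. The sketch below is a minimal, unofficial example that assumes the path layout shown in this article's title (<target_hdfs_path>/<table_id>/merged/orc) and that the `hdfs` CLI is on the PATH; `merged_orc_path` and `merged_dir_exists` are hypothetical helper names, not part of Infoworks. It relies on `hdfs dfs -test -d`, which exits with 0 when the path is a directory.

```python
import subprocess
from typing import Callable, Sequence


def merged_orc_path(target_hdfs_path: str, table_id: str) -> str:
    """Hypothetical helper: build <target_hdfs_path>/<table_id>/merged/orc.

    Assumes the directory layout shown in this article; adjust if your
    deployment differs.
    """
    return "/".join([target_hdfs_path.rstrip("/"), table_id, "merged", "orc"])


def _run(cmd: Sequence[str]) -> int:
    # Run the command and return its exit code without raising on failure.
    return subprocess.run(list(cmd), check=False).returncode


def merged_dir_exists(
    target_hdfs_path: str,
    table_id: str,
    run: Callable[[Sequence[str]], int] = _run,
) -> bool:
    """Return True if the merged ORC directory exists on HDFS.

    `hdfs dfs -test -d PATH` exits 0 when PATH is a directory. The `run`
    parameter is injectable so the check can be tested without a cluster.
    """
    path = merged_orc_path(target_hdfs_path, table_id)
    return run(["hdfs", "dfs", "-test", "-d", path]) == 0
```

For the table in the sample log, you would call `merged_dir_exists("/data/PROD/core/infoworks/prod_core_db_infoworks_dhub_life70", "5fbf7adbafba099dae5901f3")`; a `False` result means the next incremental run is expected to fail with the error above.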
Solution:
To fix this issue, run the ingestion as Initialize and Ingest (full load). This recreates the directory structure, including /merged, in the underlying storage location.
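To find which tables need an Initialize and Ingest run, you could scan ingestion job logs for this specific crawl error. The following is a minimal sketch that assumes the log message format shown in the sample log in this article; `missing_merged_paths` and `table_id_from_path` are hypothetical helper names, and the table ID is assumed to be the path component directly above merged/orc.

```python
import re
from typing import Iterable, List

# Matches the crawl error from the sample log, e.g.
# "Error during table crawl due to Path /.../<table_id>/merged/orc does not exist"
_MISSING_MERGED = re.compile(
    r"Error during table crawl due to Path (\S+/merged/orc) does not exist"
)


def missing_merged_paths(log_lines: Iterable[str]) -> List[str]:
    """Return distinct missing merged/orc paths in order of first appearance."""
    seen: List[str] = []
    for line in log_lines:
        for match in _MISSING_MERGED.finditer(line):
            path = match.group(1)
            if path not in seen:
                seen.append(path)
    return seen


def table_id_from_path(path: str) -> str:
    """Return the component directly above merged/orc (assumed to be the table ID)."""
    return path.rstrip("/").split("/")[-3]
```

Each table ID returned this way is a candidate for an Initialize and Ingest (full load) run.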
Applicable IWX versions:
IWX 2.x, 3.x
© UNIPHORE TECHNOLOGIES 2025 | Confidential