Infoworks 6.1.3
Onboard Data

Metadata Crawl from Hive

This functionality lets you crawl the metadata of existing Hive tables so that they can be used in downstream pipelines, alongside tables ingested from other sources.
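As a rough illustration of the kind of metadata such a crawl reads, the sketch below builds standard Spark SQL statements that list the tables in a schema and describe one table. It is not Infoworks's implementation. The helper names and the `sales`, `orders`, and `main` identifiers are hypothetical, and the generated statements would need to be run in an environment that has a Spark session, for example through `spark.sql(...)`.

```python
from typing import Optional


def qualified_name(schema: str, table: Optional[str] = None,
                   catalog: Optional[str] = None) -> str:
    """Join catalog, schema, and table into a backtick-quoted dotted name.

    Assumption: the names themselves contain no backticks.
    """
    parts = [part for part in (catalog, schema, table) if part]
    return ".".join(f"`{part}`" for part in parts)


def list_tables_sql(schema: str, catalog: Optional[str] = None) -> str:
    """Statement that lists the tables registered in one schema."""
    return f"SHOW TABLES IN {qualified_name(schema, catalog=catalog)}"


def describe_table_sql(schema: str, table: str,
                       catalog: Optional[str] = None) -> str:
    """Statement that returns columns, types, and storage details of a table."""
    return f"DESCRIBE TABLE EXTENDED {qualified_name(schema, table, catalog)}"


print(list_tables_sql("sales"))
# SHOW TABLES IN `sales`
print(describe_table_sql("sales", "orders"))
# DESCRIBE TABLE EXTENDED `sales`.`orders`
```

The optional `catalog` argument covers the three-level naming used in Unity Catalog enabled environments. Leave it unset for a plain Hive metastore.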

Creating a Hive Source

The following are the steps to create a Hive source:

  1. In the left navigation pane of the Infoworks UI, click the Data Sources icon.
  2. Click Onboard New Data. The Source Connectors page appears with the list of all available connectors.
  3. In the Search... bar, type Hive Metadata Sync.
  4. Click the Hive Metadata Sync connector. The configuration page of the connector appears.

Configuring a Hive Source

The following are the steps to configure a Hive source:

  1. In the Configure Source & Target page, enter the following configuration details.
     Field: Description
     Source Name: Provide a source name for the target table.
     Source Catalog Name: Provide a source catalog name. The source is limited to assets that belong to this catalog.
       NOTE Supported only for Unity environment.
     Fetch Data Using: The mechanism through which Infoworks fetches data from the database. The default option is Spark.
     Data Environment: Select the environment where the tables are registered. Infoworks will spawn a Spark session in the persistent cluster running in the environment and fetch all the registered tables.
       NOTE Snowflake environment is not supported.
     Storage: Select one of the storage options defined in the environment.
     Base Location: The path to the base/target directory where all the data should be stored.
     Catalog Name: The catalog name of the target table.
       NOTE This field is available only in Unity Catalog enabled data environments.
     Schema Name: The schema name of the target table.
  2. Click Save, and then click Next.
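The constraints noted for the configuration fields above can be expressed as a small pre-flight check. This is a minimal sketch, not an Infoworks API: the `HiveSourceConfig` class, its attribute names, the `data_environment_type` labels, and all sample values are hypothetical and only mirror the fields and notes described on this page.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class HiveSourceConfig:
    """Hypothetical container mirroring the fields of the configuration page."""
    source_name: str
    data_environment_type: str             # hypothetical label, e.g. "databricks"
    storage: str
    base_location: str
    schema_name: str
    fetch_data_using: str = "Spark"        # default option stated on this page
    unity_catalog_enabled: bool = False
    source_catalog_name: Optional[str] = None
    catalog_name: Optional[str] = None


def validate(cfg: HiveSourceConfig) -> List[str]:
    """Return the documented constraints that the configuration violates."""
    problems = []
    if cfg.data_environment_type.lower() == "snowflake":
        problems.append("Snowflake environment is not supported.")
    if not cfg.unity_catalog_enabled:
        if cfg.source_catalog_name:
            problems.append("Source Catalog Name is supported only for Unity environments.")
        if cfg.catalog_name:
            problems.append("Catalog Name is available only in Unity Catalog enabled environments.")
    return problems


cfg = HiveSourceConfig(
    source_name="hive_sales",
    data_environment_type="databricks",
    storage="default_storage",
    base_location="/iw/sources/hive_sales",
    schema_name="sales",
    catalog_name="main",   # set without Unity Catalog, so one problem is reported
)
for problem in validate(cfg):
    print(problem)
```

Returning a list of messages instead of raising on the first violation lets a caller report every issue at once, much as a form would highlight all invalid fields together.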

Select Tables

You can select the tables for which the metadata crawl is required. You can add more tables later.

  1. In the Select Tables step, you can choose to Browse entire source or Filter tables to browse.
  2. If you choose to filter, filter the tables by Catalog Name, Schema Name, and Table Name. In each field, you can enter multiple names separated by commas, or use "%" as a wildcard.
  3. Click Browse Source. The Browse source area appears.
  4. Select the check boxes next to the relevant tables, and click Add Selected Tables.
  5. Click Crawl Metadata to proceed. A success message appears.

NOTE To view the job status, click View Job Status.
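To make the filter syntax from the Select Tables step concrete, the sketch below shows one way a comma-separated list of names with "%" wildcards could be matched against table names. It assumes that "%" matches any sequence of characters (as in SQL LIKE), that matching is case-insensitive, and that an empty filter keeps everything. The product's exact matching rules are not stated here, and the function names and table names are hypothetical.

```python
import re
from typing import Iterable, List


def pattern_to_regex(pattern: str):
    """Compile one filter entry, treating "%" as "any sequence of characters"."""
    # Escape regex metacharacters in the literal pieces, then join them with ".*".
    pieces = [re.escape(piece) for piece in pattern.strip().split("%")]
    return re.compile(".*".join(pieces), re.IGNORECASE)


def filter_names(names: Iterable[str], filter_text: str) -> List[str]:
    """Keep the names that fully match at least one comma-separated entry."""
    patterns = [pattern_to_regex(p) for p in filter_text.split(",") if p.strip()]
    if not patterns:
        return list(names)  # assumption: an empty filter selects everything
    return [name for name in names if any(p.fullmatch(name) for p in patterns)]


tables = ["orders", "order_items", "customers", "web_orders"]
print(filter_names(tables, "order%, customers"))
# ['orders', 'order_items', 'customers']
```

Under these assumptions, `order%` selects names that start with `order`, so `web_orders` is excluded. A pattern such as `%orders` would be needed to include it.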

  Last updated by Monika Momaya