This feature retrieves the metadata of existing BigQuery tables so that they can be used in downstream pipelines, either on their own or together with tables ingested from other sources.
The following are the steps to create a BigQuery source:
Step 1: In the left navigation pane of Infoworks UI page, click the Data Sources icon.
Step 2: Click Onboard New Data. The Source Connectors page appears with the list of all available connectors.
Step 3: In the Search... bar, type “BigQuery Metadata Sync”.
Step 4: Click the BigQuery Metadata Sync connector. The configuration page of the connector appears.
The following are the steps to configure a BigQuery source:
Step 1: In the Configure Source & Target page, enter the following configuration details.
Field | Description |
---|---|
Source Name | Provide a source name for the target table. |
Project ID | Provide the respective Project ID. This ID is present in the Google BigQuery Console. |
Data Environment | Select the environment where the tables are registered. Infoworks spawns a Spark session on the persistent cluster running in that environment and fetches all the registered tables. |
Temporary Storage | Select from one of the storage options defined in the BigQuery environment. |
Base Location | The path to the base/target directory where all the data should be stored. |
Make available in infoworks domains | Select the relevant domain from the dropdown list to make the source available in the selected domain. |
Step 2: Click Save, and then click Next.
You can select the tables for which the metadata crawl is required. You can add more tables later.
Step 1: In the Select Tables step, you can choose to Browse entire source or Filter tables to browse.
Step 2: Filter the tables by Schema Name and Table Name. You can enter multiple names separated by commas, and use "%" as a wildcard.
Step 3: Click Browse Source. The Browse source area appears.
bulk_payload_record_size is set to 6500 by default. For the tables to appear quickly, scroll down to the Advanced Configurations section and set the value of bulk_payload_record_size to 100. The value can be changed at admin and source levels.
Step 4: Select the check boxes against the relevant table(s), and click Add Selected Tables.
Step 5: Click Crawl Metadata to proceed. A success message appears.
Metadata crawl has been triggered. To view the job status, click View Job Status.
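The filter syntax used in the Select Tables step (comma-separated names, with "%" as a wildcard) can be illustrated with a small sketch. This is not the Infoworks implementation. It assumes SQL LIKE-style semantics in which "%" matches any run of characters, matching is case-sensitive, and a term must match the whole name. The function names `term_to_regex` and `filter_tables` and the sample table names are hypothetical.

```python
import re


def term_to_regex(term):
    """Convert one filter term to a regex; "%" matches any run of characters (assumed)."""
    # Escape everything except "%", so characters such as "." are matched literally.
    parts = [re.escape(part) for part in term.split("%")]
    return re.compile(".*".join(parts))


def filter_tables(table_names, filter_text):
    """Return the names that fully match at least one comma-separated term."""
    terms = [t.strip() for t in filter_text.split(",") if t.strip()]
    if not terms:
        # An empty filter is treated here as "browse everything" (assumption).
        return list(table_names)
    patterns = [term_to_regex(t) for t in terms]
    return [n for n in table_names if any(p.fullmatch(n) for p in patterns)]


# Hypothetical table names, for illustration only.
tables = ["orders", "orders_archive", "customers", "cust_events"]
selected = filter_tables(tables, "orders%, customers")
print(selected)
```

Under these assumptions, a filter such as `orders%, customers` keeps every table whose name starts with `orders`, plus the table named exactly `customers`.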