Infoworks automates onboarding data directly to Snowflake and supports data transformation and orchestration in Snowflake. To onboard data directly to Snowflake, you should configure a Snowflake environment that includes cloud storage and one or more Spark clusters. Cloud storage is used temporarily to stage data during ingestion and to store sample data.
Ensure that the Snowflake database user has an active Snowflake account.
Ensure that the Snowflake database user has the following privileges on Snowflake tables.
Ensure that the Snowflake database user has the following privileges on the Snowflake schema.
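The exact privilege lists are given in the tables above. As a hedged illustration only, a script like the following could apply such grants using the snowflake-connector-python client. The role, database, and schema names (IW_ROLE, ANALYTICS, STAGING) are placeholders, and the privilege set shown is a typical example, not the authoritative list.

```python
# Illustrative sketch: granting schema- and table-level privileges to the
# role used by the Infoworks database user. All names and the privilege
# set are placeholders -- use the exact privileges listed in the tables above.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ADMIN_USER",             # a user allowed to run GRANT statements
    password="********",
    account="xy12345.us-east-1",   # placeholder account identifier
)
cur = conn.cursor()
try:
    # Schema-level privileges
    cur.execute(
        "GRANT USAGE, CREATE TABLE ON SCHEMA ANALYTICS.STAGING TO ROLE IW_ROLE"
    )
    # Table-level privileges on existing and future tables in the schema
    cur.execute(
        "GRANT SELECT, INSERT, UPDATE, DELETE "
        "ON ALL TABLES IN SCHEMA ANALYTICS.STAGING TO ROLE IW_ROLE"
    )
    cur.execute(
        "GRANT SELECT, INSERT, UPDATE, DELETE "
        "ON FUTURE TABLES IN SCHEMA ANALYTICS.STAGING TO ROLE IW_ROLE"
    )
finally:
    cur.close()
    conn.close()
```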
Infoworks requires access to an existing database on the Snowflake accounts configured with Infoworks, which is used to calculate billing information. By default, the PUBLIC database is used. No data is stored in this database; it is needed for querying purposes only.
If you are unable to create the PUBLIC database, configure an alternative database using the snowflake_billing_default_database configuration.
The database name must not contain #.
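As a minimal sketch, assuming the snowflake-connector-python client, the following checks for the billing database and creates it if missing. The IW_BILLING name is a placeholder for an alternative database that you would then reference in the snowflake_billing_default_database configuration.

```python
# Verify (and if needed create) the database Infoworks uses as a query
# context for billing information. No data is stored in it.
import snowflake.connector

conn = snowflake.connector.connect(
    user="ADMIN_USER", password="********", account="xy12345.us-east-1"
)
cur = conn.cursor()
try:
    cur.execute("CREATE DATABASE IF NOT EXISTS IW_BILLING")  # placeholder name
    cur.execute("SHOW DATABASES LIKE 'IW_BILLING'")
    print(cur.fetchall())
finally:
    cur.close()
    conn.close()
```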
To configure and connect to the required Snowflake instance, navigate to Admin > Manage Data Environments, and then click the Add button under the Snowflake option.

The following window appears:

There are three tabs to configure: Data Environment, Compute, and Storage.
To configure the data environment details, enter values in the following fields. These define the environment parameters that connect Infoworks to the required Snowflake instance:

After entering all the required values, click Continue to move to the Compute tab.
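As an optional pre-check outside the Infoworks UI, a short script like the following can confirm that the account, user, warehouse, and role values you plan to enter can actually establish a Snowflake session. This assumes the snowflake-connector-python client, and all values shown are placeholders.

```python
# Hypothetical connectivity smoke test, not part of the Infoworks flow:
# open a session with the credentials intended for the Data Environment tab.
import snowflake.connector

conn = snowflake.connector.connect(
    account="xy12345.us-east-1",   # Snowflake account identifier
    user="IW_USER",
    password="********",
    warehouse="IW_WH",
    role="IW_ROLE",
)
cur = conn.cursor()
try:
    cur.execute("SELECT CURRENT_ACCOUNT(), CURRENT_WAREHOUSE(), CURRENT_ROLE()")
    print(cur.fetchone())
finally:
    cur.close()
    conn.close()
```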
A Compute template is the infrastructure used to execute a job. This compute infrastructure requires access to the metastore and to the storage that is to be processed. To configure the compute details, enter values in the following fields. These define the compute template parameters that connect Infoworks to the required Snowflake instance.
You can select one of the clusters as the default cluster for running jobs. However, this can be overridden at the individual job level.
Infoworks supports creating multiple persistent clusters in a Snowflake environment by clicking the Add Compute button.
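The default-versus-override rule can be pictured with a small sketch. The following Python is purely conceptual and not an Infoworks API: a job runs on its own cluster setting when one is given, and otherwise on the environment default.

```python
# Conceptual sketch of default-cluster resolution; all names are placeholders.
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class SnowflakeEnvironment:
    clusters: List[str]      # persistent clusters created via Add Compute
    default_cluster: str     # the cluster selected as the default

@dataclass
class Job:
    name: str
    cluster_override: Optional[str] = None  # set to override the default

def resolve_cluster(env: SnowflakeEnvironment, job: Job) -> str:
    # A job-level setting wins; otherwise the environment default applies.
    return job.cluster_override or env.default_cluster

env = SnowflakeEnvironment(clusters=["small", "medium", "large"],
                           default_cluster="medium")
print(resolve_cluster(env, Job("daily_ingest")))                         # medium
print(resolve_cluster(env, Job("full_load", cluster_override="large")))  # large
```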

Enter the following fields under the Compute section:
The following limitations apply when running batch jobs on a Databricks persistent cluster:
To configure the storage details, enter values in the following fields. These define the storage parameters that connect Infoworks to the required Snowflake instance. After configuring a storage option, you can make it the default storage for all jobs. However, this can be overridden at the individual job level.

Enter the following fields under the Storage section (example base-path formats for each storage type are sketched after this list):
On selecting Azure Data Lake Storage (ADLS) Gen1 as the storage type, the following fields appear:
On selecting Azure Data Lake Storage (ADLS) Gen2 as the storage type, the following fields appear:
On selecting WASB as the storage type, the following fields appear:
On selecting S3 as the storage type, the following fields appear:
On selecting GCS as the storage type, the following fields appear:
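For orientation, the base-path URI formats typically used by each storage type listed above are shown below. The container, account, and bucket names are placeholders; enter the equivalent values in the corresponding Storage tab fields.

```python
# Illustrative base-path URI formats per storage type; names are placeholders.
STORAGE_BASE_PATHS = {
    "ADLS Gen1": "adl://myaccount.azuredatalakestore.net/iw/staging",
    "ADLS Gen2": "abfss://mycontainer@myaccount.dfs.core.windows.net/iw/staging",
    "WASB":      "wasbs://mycontainer@myaccount.blob.core.windows.net/iw/staging",
    "S3":        "s3://my-bucket/iw/staging",
    "GCS":       "gs://my-bucket/iw/staging",
}
for storage_type, path in STORAGE_BASE_PATHS.items():
    print(f"{storage_type}: {path}")
```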
After entering all the required values, click Save. Click Return to Manage Environments to view and access the list of all configured environments. Edit, Clone, and Delete actions are available in the UI for each configured environment.
