Triggering Databricks Notebook from Infoworks Bash Node

Part 1: Creating a Databricks Job to Run a Notebook

Step 1: Create the Notebook in Databricks

  1. Open your Databricks workspace.
  2. Navigate to the Workspace section and click Create > Notebook.
  3. Give the notebook a name (e.g., sample_Notebook) and select the language (Python, Scala, SQL, etc.).
  4. Write your code in the notebook.

Step 2: Create a Databricks Job to Trigger the Notebook

  1. Go to Jobs in the Databricks workspace.
  2. Click Create Job.
  3. Provide the Job name (e.g., Sample_Job).
  4. Under Tasks, click Add task and fill in:
     • Task name: Notebook Task
     • Type: Notebook
     • Notebook path: Select the notebook you created (e.g., /Workspace/Users/your-notebook).
     • Optional: Add parameters if required by the notebook.
     • Cluster: Choose the appropriate cluster configuration to run your notebook.
  5. Click Create to save the job.
  6. Note the Job ID. You will need this to trigger the job from the Infoworks Bash node.

Part 2: Triggering the Databricks Job from Infoworks Bash Node

Step 1: Authentication Using Azure Service Principal

Databricks requires authentication to trigger jobs via API. We will use Azure Service Principal credentials for this purpose.

Ensure you have the following:

  • client_id: Application ID of your Azure Service Principal.
  • tenant_id: Azure tenant ID.
  • client_secret: Azure client secret. Note: Store the client secret in Azure Key Vault and register its secret name in Infoworks so that the Bash node can read client_secret securely.
  • Databricks workspace URL: E.g., https://.azuredatabricks.net.
  • Job ID: From the Databricks job created in Part 1.
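
As a rough illustration of how these values combine, the sketch below requests an Azure AD token for Azure Databricks with the client-credentials flow and keeps it until shortly before it expires. It assumes client_id, tenant_id, and client_secret are already available as shell variables. The function names (json_str, json_num, request_aad_token, ensure_token), the variables DATABRICKS_TOKEN and TOKEN_EXPIRES_AT, and the 5-minute renewal margin are illustrative choices, not part of Infoworks or Databricks. The JSON parsing is deliberately crude; a JSON-aware tool such as jq would be more robust where it is installed.

```bash
# Sketch only: assumes client_id, tenant_id and client_secret are already
# available as shell variables, and that curl, grep and sed are installed.

# Application ID of the Azure Databricks resource, used as the token scope.
DATABRICKS_RESOURCE_ID="2ff814a6-3304-4ab8-85cb-cd0e6f879c1d"

# Crude JSON helpers: read JSON on stdin, print the first value of a key.
json_str() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 |
    sed 's/.*:[[:space:]]*"\(.*\)"$/\1/'
}
json_num() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*[0-9][0-9]*" | head -n 1 |
    grep -o '[0-9][0-9]*$'
}

# Client-credentials request against the Microsoft identity platform.
request_aad_token() {
  curl -sS -X POST \
    "https://login.microsoftonline.com/${tenant_id}/oauth2/v2.0/token" \
    --data-urlencode "grant_type=client_credentials" \
    --data-urlencode "client_id=${client_id}" \
    --data-urlencode "client_secret=${client_secret}" \
    --data-urlencode "scope=${DATABRICKS_RESOURCE_ID}/.default"
}

# Leaves a valid token in DATABRICKS_TOKEN, renewing it when it is missing
# or within 5 minutes (an assumed margin) of expiry.
ensure_token() {
  local now response ttl
  now=$(date +%s)
  if [ -n "${DATABRICKS_TOKEN:-}" ] &&
     [ "$now" -lt $(( ${TOKEN_EXPIRES_AT:-0} - 300 )) ]; then
    return 0
  fi
  response=$(request_aad_token) || return 1
  DATABRICKS_TOKEN=$(printf '%s' "$response" | json_str access_token)
  ttl=$(printf '%s' "$response" | json_num expires_in)
  if [ -z "$DATABRICKS_TOKEN" ]; then
    echo "Token request failed: no access_token in response" >&2
    return 1
  fi
  TOKEN_EXPIRES_AT=$(( now + ${ttl:-0} ))
}
```

Calling ensure_token before each API request means a long-running poll picks up a fresh token without any extra logic in the loop.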

Step 2: Create Infoworks Bash Node to Trigger the Job and Handle Authentication

Copy the bash script below into the Infoworks Bash node. The script will:

  1. Authenticate with Azure.
  2. Trigger the Databricks Job.
  3. Poll for the job status until it completes.
  4. Handle token expiry if needed.
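
As a rough illustration of steps 2 to 4 (a sketch under stated assumptions, not the script shipped with this article), the functions below trigger a job through the Databricks Jobs API endpoint /api/2.1/jobs/run-now and poll /api/2.1/jobs/runs/get until the run reaches a terminal state. It assumes databricks_url, job_id, and DATABRICKS_TOKEN are set as shell variables. The names db_api, trigger_job, wait_for_run, POLL_SECONDS, and the ensure_token hook are hypothetical. The grep-based parsing takes the first life_cycle_state in the response, which may not be the top-level one for multi-task jobs; a JSON-aware tool would be safer there.

```bash
# Sketch only: assumes databricks_url, job_id and DATABRICKS_TOKEN are set.
POLL_SECONDS="${POLL_SECONDS:-30}"   # assumed polling interval

# Hypothetical hook: must leave a valid bearer token in DATABRICKS_TOKEN.
# This placeholder only checks that one is set; replace it with a function
# that renews the token when it is close to expiry.
ensure_token() { [ -n "${DATABRICKS_TOKEN:-}" ]; }

# Crude JSON helper: read JSON on stdin, print the first string value of a key.
json_str() {
  grep -o "\"$1\"[[:space:]]*:[[:space:]]*\"[^\"]*\"" | head -n 1 |
    sed 's/.*:[[:space:]]*"\(.*\)"$/\1/'
}

# db_api METHOD PATH [JSON_BODY]: call the Databricks REST API.
db_api() {
  ensure_token || return 1
  if [ -n "${3:-}" ]; then
    curl -sS --fail -X "$1" "${databricks_url}$2" \
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}" \
      -H "Content-Type: application/json" -d "$3"
  else
    curl -sS --fail -X "$1" "${databricks_url}$2" \
      -H "Authorization: Bearer ${DATABRICKS_TOKEN}"
  fi
}

# Start the job and print the run_id returned by run-now.
trigger_job() {
  db_api POST /api/2.1/jobs/run-now "{\"job_id\": ${job_id}}" |
    grep -o '"run_id"[[:space:]]*:[[:space:]]*[0-9][0-9]*' | head -n 1 |
    grep -o '[0-9][0-9]*$'
}

# wait_for_run RUN_ID: poll until the run ends; return 0 only on SUCCESS.
wait_for_run() {
  local run_id="$1" body life result
  while :; do
    body=$(db_api GET "/api/2.1/jobs/runs/get?run_id=${run_id}") || return 1
    life=$(printf '%s' "$body" | json_str life_cycle_state)
    case "$life" in
      TERMINATED|SKIPPED|INTERNAL_ERROR)
        result=$(printf '%s' "$body" | json_str result_state)
        echo "Run ${run_id} finished: ${life} ${result}"
        if [ "$result" = "SUCCESS" ]; then return 0; else return 1; fi
        ;;
    esac
    echo "Run ${run_id} is ${life}; checking again in ${POLL_SECONDS}s"
    sleep "$POLL_SECONDS"
  done
}
```

A Bash node built on this sketch could end with `run_id=$(trigger_job) && wait_for_run "$run_id"`, so that the script's exit status reflects the result of the Databricks run.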

Create workflow parameters for the following:

  1. client_id
  2. tenant_id
  3. databricks_url
  4. job_id

In the bash script, replace secret_val with the name of the environment variable mapped to the secret name.
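
A small guard at the top of the script can catch a missing parameter or secret before any API call is made. The sketch below assumes the workflow parameters and the secret reach the script as shell variables; how Infoworks exposes them is not shown here. DATABRICKS_SP_SECRET is a hypothetical name standing in for the environment variable mapped to the secret name, and require_vars is an illustrative helper.

```bash
# Hypothetical name: the environment variable mapped to the Key Vault secret.
client_secret="${DATABRICKS_SP_SECRET:-}"

# require_vars NAME...: fail if any of the named shell variables is empty.
require_vars() {
  local name value missing=0
  for name in "$@"; do
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "Missing required value: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_vars client_id tenant_id databricks_url job_id client_secret || exit 1` stops the node early and names whatever is missing, without printing the secret itself.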

Click Save, then run the workflow.

To authenticate to the Databricks API with a personal access token (PAT) instead of an Azure Service Principal, use the script below.

The workflow parameters required are databricks_url and job_id.

Store the PAT in Azure Key Vault and refer to the secret through an environment variable.
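
With a PAT, the Azure token request is not needed; the token goes straight into the Authorization header. The sketch below shows only the trigger call, assuming databricks_url and job_id are set as shell variables. DATABRICKS_PAT is a hypothetical name for the environment variable mapped to the Key Vault secret, and pat_run_now is an illustrative function name.

```bash
# Sketch only: assumes databricks_url and job_id are set, and that the PAT
# is exposed through an environment variable (hypothetical name below).
pat_run_now() {
  if [ -z "${DATABRICKS_PAT:-}" ]; then
    echo "DATABRICKS_PAT is not set" >&2
    return 1
  fi
  # Same Jobs API call as with a service principal; only the token differs.
  curl -sS --fail -X POST "${databricks_url}/api/2.1/jobs/run-now" \
    -H "Authorization: Bearer ${DATABRICKS_PAT}" \
    -H "Content-Type: application/json" \
    -d "{\"job_id\": ${job_id}}"
}
```

The response body contains the run_id, which can then be polled in the same way as for a run started with service principal credentials.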
