
Provision your Databricks Workspace infrastructure with BICEP

Discover the power of BICEP in provisioning a robust Databricks workspace infrastructure, including Azure Data Factory, Key Vault and a Storage account; I'll use some PowerShell to configure RBAC permissions between resources, and I'll show you how to use the logging command task.setvariable

Hector Sven

13 Feb 2024 - 5 min read

Welcome 😁! This is Part 1 of the series ...

Streamlining Databricks: CI/CD your Notebooks with DevOps Pipelines and orchestrate via Azure Data Factory (Series)
In this series I'm going to show you how to provision your Databricks infrastructure with BICEP and how to connect your workspace to Azure's Entra ID to manage users & groups. Furthermore, I'll show you how to deploy your notebooks across environments with YAML pipelines and orchestrate them with ADF.

Setting the stage

Buckle up, as this will be quite a ride, but hopefully a useful one at the very least 😅🤞. Compared to my previous examples, this one is at least an order of magnitude more complex; nonetheless, that complexity is necessary to meet real-world solutions.

โœ๏ธ
With regards to the code, be advise that the code explained throughout this article compared to the one in the repository may suffer some changes during the series, at the end, there will be one final codebase that will evolve over time, nonetheless, rest assure that all code in the images presented works just as well as the latest published in the repository ๐Ÿ˜‰

Solution

In the rest of the article, I'll explain just the most relevant parts of the code.

🧑‍💻
REMEMBER TO DOWNLOAD THE CODE! Grab it from this AzDO Git repository, streamlining-databricks-with-devops-and-adf, and follow along 😁🤞

Deploying the Infrastructure

Of course, we are going to start with the star of the show... the Databricks Workspace's BICEP! Perhaps you will find it shocking how easy this template looks, right? And that is exactly the point: it is that easy!

I want you to take note of the fact that each Databricks Workspace creates a managed resource group along with it (1); in a nutshell, this is a resource group that holds related resources for an Azure solution. I also want you to observe the output section (2): we are sending back the Id and URL of the workspace, and we will configure ADF linked services with this information, hence streamlining the pipeline 😁.
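To make that shape concrete, here is a minimal sketch of the idea; the parameter names, SKU and managed resource group naming are assumptions for illustration, not the repository's exact code, but the managed resource group (1) and output (2) wiring reflect what is described above.

// Hypothetical parameter names; SKU and naming are assumptions
param workspaceName string
param location string = resourceGroup().location

resource workspace 'Microsoft.Databricks/workspaces@2023-02-01' = {
  name: workspaceName
  location: location
  sku: {
    name: 'premium'
  }
  properties: {
    // (1) every workspace is created together with a managed resource group
    managedResourceGroupId: subscriptionResourceId('Microsoft.Resources/resourceGroups', 'mrg-${workspaceName}')
  }
}

// (2) output section: the Id and URL are reused later to configure the ADF linked service
output databricksWorkspaceId string = workspace.id
output databricksWorkspaceUrl string = workspace.properties.workspaceUrl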

🤓
Of course, there are more options available in the latest API version, Microsoft.Databricks workspaces 2023-02-01; options for deploying your workspace within your own virtual network (aka VNet injection) and for encryption are available.

In our example environment we will be creating some additional resources. In several of my previous articles I showed an example of Azure Data Factory, but now I'm going to show you something new! Let's use BICEP to create a linked service and consume the output values from Databricks directly.

As shown, we are creating a linked service of type AzureDatabricks (1) and, using the output returned from the Databricks BICEP module, we configure our linked service with the domain (2) and the workspace resource id (3).
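Here is a hedged sketch of what such a linked service can look like in BICEP; the authentication mode, cluster settings, names and parameters are assumptions for illustration, while the type (1), domain (2) and workspaceResourceId (3) wiring match the description above.

param dataFactoryName string
param databricksWorkspaceUrl string   // value of the databricksWorkspaceUrl output above
param databricksWorkspaceId string    // value of the databricksWorkspaceId output above

resource dataFactory 'Microsoft.DataFactory/factories@2018-06-01' existing = {
  name: dataFactoryName
}

resource databricksLinkedService 'Microsoft.DataFactory/factories/linkedservices@2018-06-01' = {
  parent: dataFactory
  name: 'ls_azure_databricks'                       // illustrative name
  properties: {
    type: 'AzureDatabricks'                         // (1) linked service type
    typeProperties: {
      domain: 'https://${databricksWorkspaceUrl}'   // (2) domain from the Databricks output
      workspaceResourceId: databricksWorkspaceId    // (3) workspace resource id from the output
      authentication: 'MSI'                         // assumption: ADF managed identity authentication
      newClusterNodeType: 'Standard_DS3_v2'         // assumption: illustrative job-cluster settings
      newClusterNumOfWorker: '1'
      newClusterVersion: '13.3.x-scala2.12'
    }
  }
}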

Following the same logic, we will also deploy an Azure Key Vault and a Storage Account with its hierarchical namespace enabled (named Data Lake in the example).
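For reference, the "Data Lake" part is just a single flag on the storage account; a minimal illustrative sketch (names and SKU are assumptions):

param storageAccountName string
param location string = resourceGroup().location

resource dataLake 'Microsoft.Storage/storageAccounts@2023-01-01' = {
  name: storageAccountName
  location: location
  kind: 'StorageV2'
  sku: {
    name: 'Standard_LRS'   // assumption: pick the SKU that matches your environment
  }
  properties: {
    isHnsEnabled: true      // this flag enables the hierarchical namespace (ADLS Gen2)
  }
}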

Retrieving values from Deployment

There are a few additional tasks that are not possible to achieve via BICEP, but in order to complete them we DO need the deployment's output. This is how I do it.

Once the Azure Resource Manager template deployment task (1) is complete, we give a name to the outputs section using the deploymentOutputs option (2); we use ARM_OUTPUTS (3) to capture the output values, then convert them from JSON to an object, iterate with ForEach, and create variables that can be used in subsequent tasks in the pipeline using the logging command (5):

Write-Output "##vso[task.setvariable variable=NameOfVariable;]ValueOfVariable"
🤓
The logging command task.setvariable is very important, hence I recommend checking the article in the link... Spoiler alert! I'll be using its isoutput=true property shortly.
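Putting those pieces together, here is a hedged YAML sketch of the pattern; the service connection, file paths and variable names are assumptions, while deploymentOutputs, ARM_OUTPUTS and the logging command match the description above.

# Hedged sketch; connection, paths and variable names are assumptions
- task: AzureResourceManagerTemplateDeployment@3        # (1) deploy the BICEP templates
  displayName: 'Deploy infrastructure'
  inputs:
    deploymentScope: 'Resource Group'
    azureResourceManagerConnection: '$(serviceConnection)'
    subscriptionId: '$(subscriptionId)'
    resourceGroupName: '$(resourceGroupName)'
    location: '$(location)'
    csmFile: 'infra/main.bicep'
    deploymentMode: 'Incremental'
    deploymentOutputs: 'ARM_OUTPUTS'                    # (2)(3) capture the outputs into ARM_OUTPUTS

- task: PowerShell@2
  displayName: 'Re-publish outputs as pipeline variables'
  inputs:
    targetType: 'inline'
    script: |
      # convert the JSON outputs to an object; each property becomes a pipeline variable (5)
      $outputs = '$(ARM_OUTPUTS)' | ConvertFrom-Json
      foreach ($output in $outputs.PSObject.Properties) {
        Write-Output "##vso[task.setvariable variable=$($output.Name);]$($output.Value.value)"
      }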

Setting RBAC permissions with PowerShell

We will use an Azure PowerShell task to set RBAC permissions.

We take the Databricks workspace name from our variables (1), and we use the ADF's Object Id (2) extracted from the outputs together with the New-AzRoleAssignment PowerShell cmdlet.
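A hedged sketch of that task; the role name and the $(...) variable names are assumptions for illustration, only the cmdlet and the workspace-name (1) / Object Id (2) combination come from the description above.

# Hedged sketch: role name and variable names are assumptions
- task: AzurePowerShell@5
  displayName: 'Grant ADF access to the Databricks workspace'
  inputs:
    azureSubscription: '$(serviceConnection)'
    azurePowerShellVersion: 'LatestVersion'
    ScriptType: 'InlineScript'
    Inline: |
      # build the workspace scope from the workspace name (1) and assign a role to ADF's Object Id (2)
      $scope = "/subscriptions/$(subscriptionId)/resourceGroups/$(resourceGroupName)" +
               "/providers/Microsoft.Databricks/workspaces/$(databricksWorkspaceName)"
      New-AzRoleAssignment -ObjectId '$(adfPrincipalId)' -RoleDefinitionName 'Contributor' -Scope $scope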

Preparing the agent and creating a KV Secret scope

Tasks 1 to 3 prepare the DevOps pipeline pool agent: first by indicating which version of Python to run (1), then by generating the Entra ID authentication token (2) for the service connection, and finally by creating the [DEFAULT] configuration profile (3) for the workspace.

We finally create the secret scope (4) backed by the previously created Azure Key Vault, for secure and centralized storage of application secrets.
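Here is a hedged sketch of steps (1) to (4); the variable names, Python version and scope name are assumptions, and the well-known Databricks resource id is used when requesting the Entra ID token.

# Hedged sketch of steps (1)-(4); variable and scope names are assumptions
- task: UsePythonVersion@0                      # (1) pin the Python version on the agent
  inputs:
    versionSpec: '3.10'

- task: AzureCLI@2                              # (2) Entra ID token for the Databricks resource
  displayName: 'Get Entra ID token'
  inputs:
    azureSubscription: '$(serviceConnection)'
    scriptType: 'bash'
    scriptLocation: 'inlineScript'
    inlineScript: |
      token=$(az account get-access-token --resource 2ff814a6-3304-4ab8-85cb-cd0e6f879c1d --query accessToken -o tsv)
      echo "##vso[task.setvariable variable=DATABRICKS_AAD_TOKEN;issecret=true]$token"

- script: |                                     # (3) create the [DEFAULT] configuration profile
    pip install databricks-cli
    cat > ~/.databrickscfg <<EOF
    [DEFAULT]
    host = https://$(databricksWorkspaceUrl)
    token = $(DATABRICKS_AAD_TOKEN)
    EOF
  displayName: 'Configure Databricks CLI profile'

- script: |                                     # (4) Key Vault backed secret scope
    databricks secrets create-scope --scope kv-managed-secrets \
      --scope-backend-type AZURE_KEYVAULT \
      --resource-id "$(keyVaultId)" \
      --dns-name "$(keyVaultUri)"
  displayName: 'Create Key Vault secret scope'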

Create the Workspace Cluster

Last but not least... creating the cluster 😅! Remember that we are still in the context of our previously created [DEFAULT] configuration profile on the DevOps pipeline pool agent powering our job; therefore, we just need to ask the Databricks CLI for a new cluster defined as a JSON file and get its ID (1).

Also notice how we are now using isOutput=true along with task.setvariable. The reason is that we are shortly going to use another job, and this property allows variables to be fetched as a dependency, as shown below.
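A hedged sketch of that idea follows; the job and step names, the JSON file path and the consuming job are illustrative, and the first job is assumed to already contain the CLI setup steps from the previous section.

# Hedged sketch: job/step names and the JSON path are illustrative
jobs:
  - job: CreateCluster
    steps:
      - script: |
          # (1) create the cluster from its JSON definition and capture the returned id
          CLUSTER_ID=$(databricks clusters create --json-file cluster-definition.json | python -c "import sys, json; print(json.load(sys.stdin)['cluster_id'])")
          echo "##vso[task.setvariable variable=clusterId;isOutput=true]$CLUSTER_ID"
        name: createCluster
        displayName: 'Create Databricks cluster'

  - job: UseCluster
    dependsOn: CreateCluster
    variables:
      # fetch the output variable from the previous job: dependencies.<job>.outputs['<step>.<variable>']
      clusterId: $[ dependencies.CreateCluster.outputs['createCluster.clusterId'] ]
    steps:
      - script: echo "Working against cluster $(clusterId)"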

If you have worked with Databricks before, you'll know that clusters take some time to start 😬, and since this is an asynchronous operation, I just use a Manual Validation task to check the cluster status before continuing (2). There are other ways to "wait", such as delays or loops that poll the cluster status; use the one that fits your needs. I will leave two additional methods in the code 😁.
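For the validation approach, a minimal sketch is shown below; the job name, timeout and instruction text are assumptions, and the Manual Validation task must run in an agentless (server) job.

# Hedged sketch: the Manual Validation task (2) runs in an agentless (server) job
- job: WaitForCluster
  dependsOn: CreateCluster
  pool: server
  steps:
    - task: ManualValidation@0
      timeoutInMinutes: 30
      inputs:
        instructions: 'Check that the Databricks cluster is in RUNNING state, then resume the pipeline'
        onTimeout: 'reject'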

For our grand finale, we just terminate the cluster (3) to avoid any unnecessary costs 😊
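As a sketch, a single CLI call is enough here; note that the legacy CLI's "clusters delete" terminates (stops) the cluster rather than permanently removing it.

- script: databricks clusters delete --cluster-id "$(clusterId)"
  displayName: 'Terminate the cluster to avoid idle costs'   # (3)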


Execute the deployment

I have explained the code; it is now your turn ...

🧑‍💻
DOWNLOAD THE CODE! Grab it from this AzDO Git repository, streamlining-databricks-with-devops-and-adf, and follow along 😁🤞

Then, check the video at the end of the article below, where I show you how to create the Azure DevOps pipeline.

Resilient Azure DevOps YAML Pipeline
Embark on a journey from Classic to YAML pipelines in Azure DevOps with me. I transitioned, faced challenges, and found it all worthwhile. Follow to learn practical configurations and detailed insights, bypassing the debate on YAML vs Classic. Your guide in mastering Azure DevOps.

Run the pipeline and monitor the deployment by clicking on your Resource Group (1) > Deployments (2), where you can view each resource's deployment status (3).


Call to action

In my next article, I'm going to talk about how to manage users & groups via Azure's Entra ID for your Databricks setup. If you'd like to get it directly via email, the subscription button is below... I wish you all the best!
