Data Sharing options in Azure Data Platform

Samarendra Panda
6 min read · May 31, 2023

This blog post explores several ways to share data from the Azure data platform, so that we can select the approach that best fits a given architecture. We make two assumptions: (1) the data is physically stored in Azure Data Lake Storage Gen2, and (2) the recipient is an external (business) user who is not part of the organization. The following approaches are discussed:

  1. Data Share using Azure Databricks Delta Share.
  2. Data Share using Microsoft Purview.
  3. Data Share using the Azure Data Share.

Data Share using Azure Databricks Delta Share:

Delta Sharing is an open protocol developed by Databricks for secure data sharing with other organizations regardless of the computing platforms they use. Azure Databricks builds Delta Sharing into its Unity Catalog data governance platform, enabling an Azure Databricks user, called a data provider, to share data with a person or group outside of their organization, called a data recipient.

With Delta Sharing, we can share a read-only set of tables and notebooks with one or more recipients. A share is a securable object in Unity Catalog that can contain tables and notebook files from a single Unity Catalog metastore. It is similar to a bucket of objects that we can share with business users. Recipients lose access to the share if we remove it from our Unity Catalog metastore.

Delta Share architecture

Azure Databricks offers two types of sharing that use the Delta Sharing protocol. In both, Azure Databricks acts as the Delta Sharing server.

· Open sharing lets you share data with any user, whether or not they have access to Azure Databricks. The authentication is token based, hence we can use any client to access the shared data.

· Databricks-to-Databricks sharing lets you share data with Azure Databricks users who have access to a Unity Catalog metastore that is different from yours. Databricks-to-Databricks also supports notebook sharing, which is not available in open sharing.
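To make the "any client" point about token-based open sharing concrete, here is a minimal sketch of what a raw call to a Delta Sharing server looks like. It assumes the open Delta Sharing REST protocol, where listing shares is a `GET` on `{endpoint}/shares` with the token sent as a bearer header. The endpoint and token values are placeholders, and the helper name is made up for illustration. The request is only built, not sent.

```python
import urllib.request


def build_list_shares_request(endpoint: str, bearer_token: str) -> urllib.request.Request:
    """Build (but do not send) a Delta Sharing 'list shares' request."""
    # The share listing lives under <endpoint>/shares in the open protocol.
    url = endpoint.rstrip("/") + "/shares"
    return urllib.request.Request(
        url,
        headers={"Authorization": f"Bearer {bearer_token}"},
        method="GET",
    )


# Placeholder values, for illustration only.
request = build_list_shares_request("https://example.invalid/delta-sharing/", "<token>")
print(request.get_method(), request.full_url)
```

Because authentication is just a bearer token on an HTTPS request, tools outside Databricks can talk to the server as long as they hold a valid credential.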

Here are the high-level steps that are required to configure the Delta sharing.

Prerequisites:

1) The Data Provider must enable Unity Catalog for their workspace (see the how-to guide for setting up Unity Catalog in Azure Databricks).

2) Enable Delta Sharing for the Unity Catalog metastore that manages the data you want to share.

Steps for Databricks-to-Databricks Sharing

At the Data Recipient’s end

1) With the Databricks-to-Databricks sharing method, the Data Recipient first obtains their Data Recipient identifier and sends it to the Data Provider, who needs it to create the recipient.

At the Data Provider’s end

1) Create the share for Delta Sharing.

2) Create the recipient using the Data Recipient identifier. The identifier is the unique ID of the Unity Catalog metastore attached to the Data Recipient’s Databricks workspace.

3) Grant the recipient access to the share created in step 1.

4) Add the objects to be shared to the share.

At the Data Recipient’s end

2) The share becomes available in the recipient’s Databricks workspace, where the Data Recipient can access it using Data Explorer. The Data Recipient needs to create a catalog from the share. More details here.
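The Databricks-to-Databricks steps above can be sketched as the SQL statements each side would run. This is an illustrative sketch rather than the exact procedure from the demo: it only builds the statement strings, using the Databricks SQL syntax as I understand it (`CREATE SHARE`, `CREATE RECIPIENT ... USING ID`, `GRANT SELECT ON SHARE`, `ALTER SHARE ... ADD TABLE`, `CREATE CATALOG ... USING SHARE`). All share, recipient, provider, catalog, and table names are hypothetical.

```python
def provider_statements(share: str, recipient: str,
                        sharing_identifier: str, tables: list[str]) -> list[str]:
    """SQL for the Data Provider's steps 1-4 (all names are hypothetical)."""
    statements = [
        f"CREATE SHARE IF NOT EXISTS {share}",
        # The identifier is supplied by the Data Recipient.
        f"CREATE RECIPIENT IF NOT EXISTS {recipient} USING ID '{sharing_identifier}'",
        f"GRANT SELECT ON SHARE {share} TO RECIPIENT {recipient}",
    ]
    statements += [f"ALTER SHARE {share} ADD TABLE {table}" for table in tables]
    return statements


def recipient_statements(catalog: str, provider: str, share: str) -> list[str]:
    """SQL for the Data Recipient: look up the identifier, then mount the share."""
    return [
        "SELECT CURRENT_METASTORE()",  # returns the identifier to send to the provider
        f"CREATE CATALOG IF NOT EXISTS {catalog} USING SHARE {provider}.{share}",
    ]


for statement in provider_statements(
    share="sales_share",
    recipient="partner_co",
    sharing_identifier="<cloud>:<region>:<metastore-uuid>",
    tables=["main.sales.orders"],
):
    print(statement)
```

In a Databricks notebook, each string could be passed to `spark.sql(...)`, or the statements could be run directly in a SQL editor.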

Steps for Open Sharing

At the Data Provider’s end

1) Create a share for Delta Sharing.

2) Create the recipient. The metastore unique ID is not needed here, because the recipient may consume the data from outside Databricks.

3) Provide access to the Delta share that we have created earlier.

4) Get the activation link that is needed by the Data Recipient to access the data.

At the Data Recipient’s end

1) Once the activation link is received from the Data Provider, the Data Recipient can download the credential file. The activation link can be used only once. The credential file contains the Databricks bearer token and the Databricks endpoint URL.

2) Data Recipients can read the data in different ways, using client tools such as VS Code or Power BI.
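As one way to read the data from code (for example from VS Code), here is a minimal sketch using the open-source `delta-sharing` Python connector. It assumes the downloaded credential file is the JSON profile described by the Delta Sharing protocol, with `shareCredentialsVersion`, `endpoint`, and `bearerToken` fields, and that the connector is installed with `pip install delta-sharing`. The file path and the share, schema, and table names are hypothetical.

```python
import json
from pathlib import Path

# Fields expected in the credential (profile) file, per the Delta Sharing protocol.
REQUIRED_KEYS = {"shareCredentialsVersion", "endpoint", "bearerToken"}


def read_profile(profile_path: str) -> dict:
    """Load the credential file and check that the expected fields are present."""
    profile = json.loads(Path(profile_path).read_text())
    missing = REQUIRED_KEYS - profile.keys()
    if missing:
        raise ValueError(f"credential file is missing keys: {sorted(missing)}")
    return profile


def table_url(profile_path: str, share: str, schema: str, table: str) -> str:
    """Address a shared table as <profile-file>#<share>.<schema>.<table>."""
    return f"{profile_path}#{share}.{schema}.{table}"


def load_shared_table(profile_path: str, share: str, schema: str, table: str):
    """Read a shared table into a pandas DataFrame (needs network access)."""
    import delta_sharing  # imported lazily so the helpers above work without it

    read_profile(profile_path)
    return delta_sharing.load_as_pandas(table_url(profile_path, share, schema, table))


if __name__ == "__main__":
    # Hypothetical names; replace with the values from your own share.
    print(table_url("config.share", "sales_share", "sales", "orders"))
```

Calling `load_shared_table("config.share", "sales_share", "sales", "orders")` would then return the table as a DataFrame, provided the token in the credential file is still valid.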

End-to-End demo:

Data share using Microsoft Purview

Microsoft Purview is a data governance service that helps organizations discover, understand, and manage various data assets. It allows businesses to gain insights into their data landscape by providing a unified and holistic view of data across various sources and platforms, both within and outside of the organization.

The data sharing feature in Microsoft Purview enables seamless sharing of data between ADLS Gen2 and Blob Storage accounts. This in-place sharing allows recipients (whether internal or external users) to instantly access and view the files, which are read-only.

The highlight of this option is that data sharing is integrated with Microsoft Purview’s data cataloguing features, which enables fine-grained control over metadata and file sharing. With in-place sharing, changes at the sender are also reflected at the recipient instantly.

Here are the key considerations for this feature:

  • As this feature is currently in preview, it is necessary to register a few preview features (AllowDataSharing, AllowDataSharingInHeroRegion) at the subscription level for both source and target. Source and target storage accounts created after the registration are eligible for data sharing.
  • The source storage account must be registered in the Microsoft Purview account of the data sender, while the target storage account needs to be registered in the Microsoft Purview account of the data recipient.
  • Data senders should have reader permission in Microsoft Purview entities in order to share them. Additionally, they need to have either the Owner or Storage Blob Data Owner role on the storage account to be able to share the data.
  • Data recipients should have either the Contributor, Owner, Storage Blob Data Owner, or Storage Blob Data Contributor role on the target storage account.
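The role requirements in the list above can be captured as a small pre-flight check before attempting a share. This is only a sketch of the rules as stated in this post. The function names are made up for illustration, and the check does not query Azure: it works on role names that you pass in.

```python
# Storage-account roles taken from the considerations listed above.
SENDER_STORAGE_ROLES = {"Owner", "Storage Blob Data Owner"}
RECIPIENT_STORAGE_ROLES = {
    "Contributor",
    "Owner",
    "Storage Blob Data Owner",
    "Storage Blob Data Contributor",
}


def can_send(storage_roles: set[str], has_purview_reader: bool) -> bool:
    """Sender needs Purview reader permission plus one qualifying storage role."""
    return has_purview_reader and bool(SENDER_STORAGE_ROLES & storage_roles)


def can_receive(storage_roles: set[str]) -> bool:
    """Recipient needs any one of the qualifying roles on the target account."""
    return bool(RECIPIENT_STORAGE_ROLES & storage_roles)


print(can_send({"Storage Blob Data Owner"}, has_purview_reader=True))
print(can_receive({"Reader"}))
```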

End-to-End demo:

Data Share using the Azure Data Share

Azure Data Share enables organizations to securely share data with multiple customers and partners. Data providers are always in control of the data that they’ve shared, and Azure Data Share makes it simple to manage and monitor what data was shared, when and by whom.

Azure Data Share data flow diagram.

Snapshot-based Sharing: Snapshot-based sharing transfers a copy (snapshot) of the data to the recipient’s subscription, where it is stored in their storage account. The recipient pays the storage costs for the data kept in their subscription. Data providers can also deliver incremental updates to their data consumers by following a snapshot schedule.

In-place Sharing: With this capability, recipients can access the data without it being replicated into their subscription. The data consumer is free to consume the data using their own data store. Any modification made to the source data becomes instantly accessible to the data consumers. As of now, in-place sharing is possible for the Azure Data Explorer type of source.

Here are the supported data sources and the supported sharing modes: link.

· Start sharing data: steps.

· Receive the data: steps.

Pricing: Azure Data Share charges are calculated from the number of snapshots taken at the source subscription and the number of vCore-hours spent moving the data from the source subscription to the destination subscription. More details can be found here.
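The two pricing components can be combined into a rough estimate. The sketch below only illustrates the arithmetic. Both unit prices are hypothetical placeholders, not actual Azure rates, so check the pricing page for the real numbers in your region.

```python
def estimate_data_share_cost(
    snapshots: int,
    vcore_hours: float,
    price_per_snapshot: float,
    price_per_vcore_hour: float,
) -> float:
    """Cost = snapshot operations + compute time spent moving the data."""
    return snapshots * price_per_snapshot + vcore_hours * price_per_vcore_hour


# Hypothetical example: one daily snapshot for 30 days, 2 vCore-hours in total.
monthly_cost = estimate_data_share_cost(
    snapshots=30,
    vcore_hours=2.0,
    price_per_snapshot=0.25,    # placeholder rate, NOT an actual Azure price
    price_per_vcore_hour=0.50,  # placeholder rate, NOT an actual Azure price
)
print(f"Estimated monthly cost: {monthly_cost:.2f}")
```

Note that the recipient’s storage cost for snapshot-based sharing is billed separately and is not part of this estimate.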

End-to-End demo:

Hope this helps!
