
Introduction To Azure Databricks Delta Lake Architecture


This blog post covers processing streaming and batch data with Delta Lake and the Azure Databricks Delta Lake architecture, topics you’ll learn in Data Engineering on Microsoft Azure [DP-203].

“To ensure low latency and reliability within unified streaming + batch data pipelines, use Delta Lake as an optimization layer on top of blob storage.”

The topics covered in this blog are:

  1. What Is Delta Lake?
  2. Delta Lake Architecture
  3. Perform Batch and Stream Processing with Delta Lake
  4. Deploy An Azure Databricks Workspace
  5. Create A Cluster
  6. Perform batch and stream processing

What Is Delta Lake?

  • Delta Lake is an open-source storage layer that brings reliability to data lakes.
  • It is designed specifically to work with Databricks File System (DBFS) and Apache Spark.
  • It unifies streaming and batch data processing and provides ACID transactions and scalable metadata handling.
  • It stores your data as Apache Parquet files in DBFS and maintains a transaction log that accurately tracks changes to the table.
  • It makes data ready for analytics (see the code sketch at the end of this section).

Delta-lake-blog-image

An example Delta Lake architecture is shown in the diagram above.

  • Many IoT devices or sensors generate data across different ingestion paths.
  • Streaming data can be ingested from Azure Event Hubs or IoT Hub.
  • Batch data can be ingested by Azure Databricks or Azure Data Factory.
  • Extracted, transformed data is loaded into a Delta Lake.
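To make these ideas concrete, here is a minimal PySpark sketch of writing and reading a Delta table. It assumes a Databricks notebook, where `spark` is predefined; the path and schema are illustrative, not part of the lab:

```python
# Minimal sketch, assuming a Databricks notebook where `spark` is predefined.
# The path /delta/events and the schema are illustrative.

# Write a small DataFrame in Delta format. Delta stores the rows as Parquet
# files and records every change in a _delta_log transaction log next to them.
events = spark.createDataFrame(
    [(1, "sensor-a", 21.5), (2, "sensor-b", 19.8)],
    ["id", "device", "temperature"],
)
events.write.format("delta").mode("overwrite").save("/delta/events")

# Appends are ACID: a concurrent reader never sees a partial write.
more = spark.createDataFrame([(3, "sensor-c", 22.1)], ["id", "device", "temperature"])
more.write.format("delta").mode("append").save("/delta/events")

# Read back a consistent snapshot of the table.
spark.read.format("delta").load("/delta/events").show()
```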

Delta Lake Architecture

  • The Delta Lake Architecture is a massive improvement upon the conventional Lambda architecture.
  • At each stage, we improve our data through a connected pipeline that allows us to combine streaming and batch workflows through a shared file store with ACID-compliant transactions.
  • We organize our data into layers or folders, defined as bronze, silver, and gold, as follows (sketched in code below):
    • Bronze tables hold raw data ingested from various sources (RDBMS data, JSON files, IoT data, etc.).
    • Silver tables give a more refined view of our data. We can join fields from various bronze tables to enrich streaming records, or update account statuses based on recent activity.
    • Gold tables give business-level aggregates often used for dashboarding and reporting. This would include aggregations such as weekly sales per store, daily active website users, or gross revenue per quarter by department.
  • The end outputs are actionable insights, dashboards, and reports of business metrics.
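A minimal sketch of the bronze/silver/gold flow in PySpark (the paths, column names, and aggregations are illustrative assumptions, not taken from the lab notebook):

```python
from pyspark.sql import functions as F

# Illustrative paths and columns -- not taken from the lab notebook.

# Bronze: raw records ingested as-is from the various sources.
bronze = spark.read.format("delta").load("/delta/bronze/sales_raw")

# Silver: a refined view -- drop bad rows and derive cleaner columns.
silver = (bronze
          .filter(F.col("amount").isNotNull())
          .withColumn("sale_date", F.to_date("event_time")))
silver.write.format("delta").mode("overwrite").save("/delta/silver/sales")

# Gold: business-level aggregates, e.g. weekly sales per store.
gold = (silver
        .groupBy(F.weekofyear("sale_date").alias("week"), "store_id")
        .agg(F.sum("amount").alias("weekly_sales")))
gold.write.format("delta").mode("overwrite").save("/delta/gold/weekly_sales")
```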

delta-lake-img

Perform Batch And Stream Processing With Delta Lake

You need access to an Azure Databricks workspace to perform Structured Streaming with batch jobs by using Delta Lake.

1) You’ll need an active Azure account for this lab. If you do not have one yet, you can sign up for a free trial.

Deploy An Azure Databricks Workspace 

1) Sign in to the Azure portal.

2) Once you’ve signed in to the Azure portal, click the + Create a resource icon.

create-resource-img

3) In the New screen, click the Search the Marketplace text box and type Azure Databricks. Select Azure Databricks from the options that appear.

Marketplace-img

4) In the Azure Databricks blade, click Create.

create-databricks-img

5) In the Azure Databricks Service blade, create an Azure Databricks Workspace with the following settings:

  • Workspace name: awdbwsstudxx, where xx are your initials.
  • Subscription: Choose the subscription that you are using in this lab.
  • Resource group: awrgstudxx, where xx are your initials.
  • Location: East US.
  • Pricing Tier: Trial (Premium – 14-Days Free DBUs)
  • Click on Review + Create

Azure-Databricks-basic-details-img

6) In the Azure Databricks Service blade, click Create.

Create A Cluster

1) When your Azure Databricks workspace deployment is complete, select the link to go to the resource.

2) Click on the button Launch Workspace to open your Databricks workspace in a new tab.

launch-workspace-img

3) In the left-hand menu of your Databricks workspace, select Clusters.

4) Select Create Cluster to add a new cluster.

cluster-details-img

5) Select Create Cluster.
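If you prefer to script this step, an equivalent cluster can be created with the Databricks Clusters REST API. A minimal Python sketch, assuming a hypothetical workspace URL and personal access token (the cluster name, runtime version, and node type are placeholders you would adjust):

```python
import requests

# Hypothetical values -- substitute your own workspace URL and a personal
# access token generated in the Databricks UI.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXX"

resp = requests.post(
    f"{HOST}/api/2.0/clusters/create",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "cluster_name": "dp203-lab-cluster",
        "spark_version": "11.3.x-scala2.12",  # pick a runtime your workspace lists
        "node_type_id": "Standard_DS3_v2",    # an Azure VM size available in your region
        "num_workers": 1,
        "autotermination_minutes": 60,        # shut down when idle to save credits
    },
)
resp.raise_for_status()
print("Created cluster:", resp.json()["cluster_id"])
```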

Perform Batch And Stream Processing

1) After the cluster is created, in the left pane, select Workspace > Users, and select your username (the entry with the house icon).

2) In the pane that appears, select the arrow next to your name, and select Import.

Import-Notebook-img

3) In the Import Notebooks dialog box, select URL, paste in the following URL, and click Import.

https://github.com/solliancenet/microsoft-learning-paths-databricks-notebooks/blob/master/data-engineering/DBC/11-Delta-Lake-Architecture.dbc?raw=true
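Alternatively, the same DBC archive can be imported programmatically through the Databricks Workspace API. A minimal Python sketch, assuming a hypothetical workspace URL, personal access token, and target user folder:

```python
import base64
import requests

# Hypothetical values -- substitute your workspace URL, token, and user folder.
HOST = "https://adb-1234567890123456.7.azuredatabricks.net"
TOKEN = "dapiXXXXXXXXXXXX"

# Download the DBC archive from GitHub.
dbc_url = ("https://github.com/solliancenet/microsoft-learning-paths-databricks-notebooks/"
           "blob/master/data-engineering/DBC/11-Delta-Lake-Architecture.dbc?raw=true")
archive = requests.get(dbc_url).content

# Import it into the workspace via the Workspace API.
resp = requests.post(
    f"{HOST}/api/2.0/workspace/import",
    headers={"Authorization": f"Bearer {TOKEN}"},
    json={
        "path": "/Users/you@example.com/11-Delta-Lake-Architecture",
        "format": "DBC",
        "content": base64.b64encode(archive).decode("utf-8"),
    },
)
resp.raise_for_status()
```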

4) Select the 11-Delta-Lake-Architecture folder that appears, then open the 1-Delta-Architecture notebook.

5) Run the notebook cells one by one and observe the results.

Within the notebook, you will explore combining streaming and batch processing with a single pipeline.
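The core pattern the notebook walks through looks roughly like the following sketch: a Structured Streaming writer and an ad-hoc batch query share the same Delta table (the paths and checkpoint location are illustrative; in the lab notebook the tables are created for you):

```python
# Illustrative paths; in the lab notebook the tables and checkpoint
# locations are created for you.

# A streaming job continuously moves new bronze records into a silver table...
stream = spark.readStream.format("delta").load("/delta/bronze/events")

query = (stream.writeStream
         .format("delta")
         .option("checkpointLocation", "/delta/checkpoints/silver_events")
         .outputMode("append")
         .start("/delta/silver/events"))

# ...while a batch query can read a consistent snapshot of the same silver
# table at any time, because both go through the ACID transaction log.
silver_count = spark.read.format("delta").load("/delta/silver/events").count()
print("rows in silver so far:", silver_count)
```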

streaming-and-batch-processing-img

Note: After you have successfully run all the cells of the notebook, make sure to delete the Databricks workspace to save your Azure credits.


Next Task For You

In our Azure Data Engineer training program, we cover 17 hands-on labs. If you want to begin your journey towards becoming a Microsoft Certified: Azure Data Engineer Associate, check out our FREE CLASS.


