Tutorialspoint

Delta Lake with Apache Spark using Scala

Delta Lake with Apache Spark using Scala on the Databricks platform

Course Description

You will learn Delta Lake with Apache Spark using Scala on the Databricks platform.

Learn Spark, one of the leading big data technologies, and learn to use it with one of the most popular programming languages, Scala!

One of the most valuable technology skills is the ability to analyze huge data sets, and this course is specifically designed to bring you up to speed on one of the best technologies for this task, Apache Spark! Organizations such as Google, Facebook, Netflix, Airbnb, Amazon, and NASA use Spark to solve their big data problems!

Spark can run some workloads up to 100x faster than Hadoop MapReduce, which has caused an explosion in demand for this skill! Because Spark 3.0 is still relatively new, you have the chance to quickly become one of the most knowledgeable people in the job market!

Delta Lake is an open-source storage layer that brings reliability to data lakes. It provides ACID transactions and scalable metadata handling, and it unifies streaming and batch data processing. Delta Lake runs on top of your existing data lake and is fully compatible with Apache Spark APIs.
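
As a rough illustration of the features described above, the following minimal sketch writes a small DataFrame to a Delta table, appends to it, inspects the transaction history, and reads an earlier version back. It assumes Spark 3.x with the open-source Delta Lake library on the classpath. The object name DeltaQuickstart, the sample rows, and the path /tmp/delta/events are hypothetical and are not taken from the course material. On Databricks a `spark` session already exists, so the builder configuration would not be needed there.

```scala
import org.apache.spark.sql.SparkSession
import io.delta.tables.DeltaTable

object DeltaQuickstart {
  // Writes, appends to, and reads back a Delta table; returns the final row count.
  // `path` is a hypothetical location chosen for this sketch.
  def run(path: String): Long = {
    val spark = SparkSession.builder()
      .appName("delta-quickstart")
      .master("local[*]")
      // These two settings enable Delta Lake on open-source Spark.
      .config("spark.sql.extensions", "io.delta.sql.DeltaSparkSessionExtension")
      .config("spark.sql.catalog.spark_catalog",
        "org.apache.spark.sql.delta.catalog.DeltaCatalog")
      .getOrCreate()
    import spark.implicits._

    try {
      // Batch write: each write is committed as one ACID transaction.
      Seq((1, "open"), (2, "click")).toDF("id", "action")
        .write.format("delta").mode("overwrite").save(path)

      // Append more rows as a second transaction.
      Seq((3, "close")).toDF("id", "action")
        .write.format("delta").mode("append").save(path)

      // The transaction log records every commit made to the table.
      DeltaTable.forPath(spark, path).history()
        .select("version", "operation").show()

      // Time travel: read the table as it was at version 0.
      spark.read.format("delta").option("versionAsOf", 0).load(path).show()

      // Read the current state with the regular Spark reader API.
      spark.read.format("delta").load(path).count()
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    println(s"rows: ${run("/tmp/delta/events")}")
}
```

The same `format("delta")` name also works with `readStream` and `writeStream`, which is how a single table can serve both batch and streaming jobs.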

Apache Spark is a fast and general-purpose cluster computing system. It provides high-level APIs in Java, Scala, Python and R, and an optimized engine that supports general execution graphs. It also supports a rich set of higher-level tools including Spark SQL for SQL and structured data processing, MLlib for machine learning, GraphX for graph processing, and Spark Streaming.
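
To make the high-level APIs a little more concrete, here is a small sketch that computes the same aggregate twice, once with the DataFrame API and once with Spark SQL. It assumes a local Spark 3.x installation. The object name SparkApiSketch, the `trips` view, and the sample rows are made up for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.avg

object SparkApiSketch {
  // Returns the average fare per city for a tiny, hypothetical data set.
  def averageFares(): Map[String, Double] = {
    val spark = SparkSession.builder()
      .appName("spark-api-sketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    try {
      val trips = Seq(("NYC", 12.5), ("NYC", 7.0), ("SF", 20.0))
        .toDF("city", "fare")

      // DataFrame API: group and aggregate with column expressions.
      val byCity = trips.groupBy("city").agg(avg("fare").as("avg_fare"))

      // Spark SQL: register a temporary view and express the same query in SQL.
      trips.createOrReplaceTempView("trips")
      val bySql = spark.sql(
        "SELECT city, AVG(fare) AS avg_fare FROM trips GROUP BY city")
      bySql.orderBy("city").show()

      // Collect the DataFrame result to the driver as a plain Scala Map.
      byCity.as[(String, Double)].collect().toMap
    } finally {
      spark.stop()
    }
  }

  def main(args: Array[String]): Unit =
    averageFares().foreach { case (city, fare) => println(s"$city: $fare") }
}
```

Both forms are planned by the same optimized engine, so the choice between them is largely a matter of style.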

Topics Included in the Course

  • Introduction to Delta Lake

  • Introduction to Data Lake

  • Key Features of Delta Lake

  • Introduction to Spark

  • Free Account creation in Databricks

  • Provisioning a Spark Cluster

  • Basics about notebooks

  • Dataframes

  • Create a table

  • Write a table

  • Read a table

  • Schema validation

  • Update table schema

  • Table Metadata

  • Delete from a table

  • Update a Table

  • Vacuum

  • History

  • Concurrency Control

  • Optimistic concurrency control

  • Migrate Workloads to Delta Lake

  • Optimize Performance with File Management

  • Auto Optimize

  • Optimize Performance with Caching

  • Delta and Apache Spark caching

  • Cache a subset of the data

  • Isolation Levels

  • Best Practices

  • Frequently Asked Interview Questions

About Databricks: 

Databricks lets you start writing Spark code instantly so you can focus on your data problems.

Goals

  • Learn Delta Lake with Apache Spark in a few hours
  • Build basic to advanced knowledge of Delta Lake
  • Get hands-on practice with Delta Lake
  • Learn Delta Lake with Apache Spark using Scala on the Databricks platform
  • Learn how to leverage the power of Delta Lake in a Spark environment
  • Learn about the Databricks platform

Prerequisites

  • Basic knowledge of Apache Spark, Scala, and SQL is required for this course

Curriculum

  • Course Introduction
    03:21
  • Introduction to Delta Lake
    01:30
  • Introduction to Data Lake
    01:09
  • Key Features of Delta Lake
    04:57
  • Elements of Delta Lake
    03:18
  • Introduction to Spark
    04:04
  • (Old) Free Account creation in Databricks
    01:51
  • (New) Free Account creation in Databricks
    01:50
  • Provisioning a Spark Cluster
    02:14
  • Basics about notebooks
    07:29
  • Dataframes
    04:47
  • Download Code and Files
  • (Hands On) Create a table
    06:38
  • (Hands On) Write a table
    14:12
  • (Hands On) Read a table
    06:52
  • Schema validation
    02:49
  • (Hands On) Update table schema
    03:01
  • Table Metadata
    01:53
  • Delete from a table
    01:44
  • Update a Table
    02:10
  • Vacuum
    01:59
  • History
    01:34
  • Concurrency Control
    01:08
  • Optimistic concurrency control
    02:33
  • Migrate Workloads to Delta Lake
    05:23
  • Optimize Performance with File Management
    01:13
  • Auto Optimize
    02:45
  • Optimize Performance with Caching
    01:11
  • Delta and Apache Spark caching
    03:26
  • Cache a subset of the data
    01:37
  • Isolation Levels
    01:06
  • Best Practices
    02:56
  • FAQ (Interview Question on Optimization) 1
    01:47
  • FAQ (Interview Question on Optimization) 2
    01:50
  • FAQ (Interview Question on Optimization) 3
    00:51
  • FAQ (Interview Question on Auto Optimize) 4
    00:50
  • FAQ (Interview Question on Auto Optimize) 5
    01:06
  • FAQ (Interview Question) 6
    01:06
  • FAQ (Interview Question) 7
    00:37
  • FAQ (Interview Question) 8
    00:42
  • FAQ (Interview Question) 9
    00:20
  • FAQ (Interview Question) 10
    00:25
  • FAQ (Interview Question) 11
    00:28
  • FAQ (Interview Question) 12
    00:27
  • FAQ (Interview Question) 13
    00:43
  • FAQ (Interview Question) 14
    00:55
  • FAQ (Interview Question) 15
    01:39
  • FAQ (Interview Question) 16
    00:31
  • FAQ (Interview Question) 17
    00:32
  • FAQ (Interview Question) 18
    01:00
  • FAQ (Interview Question) 19
    01:25
  • Thank you
    00:20
This Course Includes
  • 2 hours
  • 52 Lectures
  • 2 Resources
  • Completion Certificate
  • Lifetime Access: Yes
  • Language: English
  • 30-Day Money-Back Guarantee
