Tutorialspoint

Apache Spark 3 for Data Engineering and Analytics with Python

Packt Publishing

Master Python and PySpark 3.0.1 for Data Engineering / Analytics (Databricks)

Updated on Sep, 2023

Language - English

Data & Analytics, Python, Engineering, Development

Course Description

Apache Spark 3 is an open-source distributed engine for querying and processing data. This course gives you a detailed understanding of PySpark and its stack, and guides you through the process of data analytics with PySpark. The author takes an interactive approach to explaining key PySpark concepts such as the Spark architecture, Spark execution, and transformations and actions using the structured API. You will be able to leverage the power of Python, Java, and SQL and put it to use in the Spark ecosystem.
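
The difference between transformations and actions can be illustrated with a small plain-Python analogy. This is only a sketch and not the PySpark API: the `LazyDataset` class and its methods are hypothetical names invented for this illustration. Transformations only record work to be done; nothing runs until an action asks for a result. In PySpark, `filter` and `select` are examples of transformations, while `count` and `collect` are examples of actions.

```python
# Plain-Python analogy for lazy evaluation (hypothetical class, not the PySpark API).
class LazyDataset:
    """Records transformations and runs them only when an action is called."""

    def __init__(self, rows, steps=()):
        self._rows = rows
        self._steps = tuple(steps)

    # Transformations: return a new dataset with one more recorded step.
    def filter(self, predicate):
        return LazyDataset(self._rows, self._steps + (("filter", predicate),))

    def map(self, func):
        return LazyDataset(self._rows, self._steps + (("map", func),))

    def _run(self):
        # Chain the recorded steps into one lazy iterator over the rows.
        data = iter(self._rows)
        for kind, fn in self._steps:
            data = filter(fn, data) if kind == "filter" else map(fn, data)
        return data

    # Actions: these are the only methods that actually process rows.
    def collect(self):
        return list(self._run())

    def count(self):
        return sum(1 for _ in self._run())


# Building the pipeline does no work yet.
pipeline = LazyDataset(range(6)).filter(lambda n: n % 2 == 0).map(lambda n: n * 10)

# The action triggers execution of every recorded step.
print(pipeline.collect())  # [0, 20, 40]
```

Because each transformation returns a new object rather than changing the old one, the same starting dataset can feed several different pipelines, which loosely mirrors how Spark DataFrames are immutable.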

You will start by getting a firm understanding of the Apache Spark architecture and how to set up a Python environment for Spark. You will then learn techniques for collecting, cleaning, and visualizing data by creating dashboards in Databricks, and how to use SQL to interact with DataFrames. The author also provides an in-depth review of RDDs and contrasts them with DataFrames.

Problem challenges are provided at intervals throughout the course so that you get a firm grasp of the concepts taught.

The code bundle for this course is available here: https://github.com/PacktPublishing/Apache-Spark-3-for-Data-Engineering-and-Analytics-with-Python

Audience

This course is designed for Python developers who wish to learn how to use the language for data engineering and analytics with PySpark, and for aspiring data engineering and analytics professionals.

Goals

What will you learn in this course:

  • Learn Spark architecture, transformations, and actions using the structured API.
  • Learn to set up your own local PySpark environment.
  • Learn to interpret the DAG (Directed Acyclic Graph) for Spark execution.
  • Learn to interpret the Spark web UI.
  • Learn the RDD (Resilient Distributed Datasets) API.
  • Learn to visualize data (graphs and dashboards) on Databricks.
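
To make the DAG idea from the goals above more concrete, here is a minimal sketch using only the Python standard library (Python 3.9+). It is not Spark's scheduler, and the step names are hypothetical. It only shows the property that makes a directed acyclic graph useful for execution planning: every step can be ordered so that it runs after the steps it depends on.

```python
from graphlib import TopologicalSorter

# Hypothetical steps of a job; each key maps to the set of steps it depends on.
dag = {
    "read_orders": set(),
    "read_customers": set(),
    "filter_orders": {"read_orders"},
    "join": {"filter_orders", "read_customers"},
    "aggregate": {"join"},
}

# static_order() yields each step only after all of its dependencies.
order = list(TopologicalSorter(dag).static_order())
print(order)
```

The two read steps do not depend on each other, so their relative position in the printed order may vary. Independent branches like these are the ones an engine is free to run in parallel.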

Prerequisites

What are the prerequisites for this course?

  • Data scientists/analysts who wish to learn an analytical processing strategy that can be deployed over a big data cluster.
  • Data managers who want to gain a deeper understanding of managing data over a cluster.

Curriculum

Check out the detailed breakdown of what’s inside the course

Introduction to Spark and Installation
15 Lectures
  • Introduction 04:43
  • The Spark Architecture 03:39
  • The Spark Unified Stack 03:38
  • Java Installation 06:29
  • Hadoop Installation 05:26
  • Python Installation 04:23
  • PySpark Installation 07:56
  • Install Microsoft Build Tools 02:35
  • MacOS - Java Installation 03:45
  • MacOS - Python Installation 04:17
  • MacOS - PySpark Installation 07:16
  • MacOS - Testing the Spark Installation 05:07
  • Install Jupyter Notebooks 09:18
  • The Spark Web UI 11:19
  • Section Summary 02:23
Spark Execution Concepts
5 Lectures
RDD Crash Course
10 Lectures
Structured API - Spark DataFrame
32 Lectures
Introduction to Spark SQL and Databricks
18 Lectures

Instructor Details

Packt Publishing

Founded in 2004 in Birmingham, UK, Packt's mission is to help the world put software to work in new ways, through the delivery of effective learning and information services to IT professionals.

Working towards that vision, we have published over 6,500 books and videos so far, providing IT professionals with the actionable knowledge they need to get the job done - whether that's specific learning on an emerging technology or optimizing key skills in more established tools.

As part of our mission, we have also awarded over $1,000,000 through our Open Source Project Royalty scheme, helping numerous projects become household names along the way.

Course Certificate

Use your certification to make a career change or to advance in your current career. Salaries are among the highest in the world.
