A Crash Course In PySpark

Rating: 4.5 (8830 reviews)

Learn all the fundamentals of PySpark

Created by Kieran Keene

Last updated 4/2023

English

What you'll learn

PySpark, Apache Spark, Big Data Analytics, Big Data Processing, Python

Course content

5 sections • 20 lectures • 1h 15m total length

Introduction • 0:47
How is this course structured • 0:55

Introduction to our development environment • 2:22
Introduction to our dataset & dataframes • 2:10
Latest Config Code • 0:26
Environment configuration code (latest code in downloadable file) • 2:15
Ingesting & Cleaning Data • 17:31
Answering our scenario questions • 10:21
Use PySpark to group by gender and city, compute average salaries, identify the highest-paying city, and compare male and female pay with deltas and aliases.

Bringing data into dataframes • 6:11
Inspecting A Dataframe • 3:39
Inspect DataFrames in PySpark: apply a schema when reading a CSV file, then view the column types, the first rows, and descriptive statistics to identify nulls, duplicates, and data ranges.
Handling Null & Duplicate Values • 5:31
Selecting & Filtering Data • 5:09
Applying Multiple Filters • 2:19
Running SQL on Dataframes • 2:10
Adding Calculated Columns • 3:19
Group By And Aggregation • 3:22
Group-by and aggregation techniques in PySpark: clean the salary data, cast it to float, and compute total, average, minimum, and maximum salaries by gender and by city.
Writing Dataframe To Files • 0:59

Requirements

Familiarity with Python, which you can learn through my 'No Nonsense Python' course

Description

Spark is one of the most in-demand Big Data processing frameworks right now.

This course takes you through the core concepts of PySpark. The aim is to enable you to do most of the things you’d do in SQL or with the Python pandas library, namely:

Getting hold of data
Handling missing data and cleaning data up
Aggregating your data
Filtering it
Pivoting it
Writing it back out

Together, these skills let you use Spark on large datasets and start getting value from your data.

Let’s get started.

Who this course is for:

People who want to process and analyze their big data with Spark

Course sections

Introduction • 2 lectures • 2min
A Scenario To Get Us Started • 6 lectures • 35min
Core Concepts • 9 lectures • 33min
Challenge • 2 lectures • 6min
Conclusion • 1 lecture • 1min
