Name: Apache Spark 3 - Databricks Certified Associate Developer
Rating: 4.3 (2662 reviews)

Created by Wadson Guimatsa

Last updated 6/2024

English

What you'll learn

How to prepare for the Databricks Certified Associate Developer For Apache Spark 3 Certification Exam
The Architecture of an Apache Spark Application
Learn how Apache Spark runs on a cluster of computers
Learn the Execution Hierarchy of Apache Spark
Create DataFrame from files and Scala Collections
Spark DataFrame API and SQL functions
Learn the different techniques to select the columns of a DataFrame
How to define the schema of a DataFrame and set the data types of the columns
Apply various methods to manipulate the columns of a DataFrame
How to filter your DataFrame based on specific rules
Learn how to sort data in a specific order
Learn how to sort rows of a DataFrame in a specific order
How to arrange the rows of a DataFrame as groups
How to handle NULL Values in a DataFrame
How to use JOIN or UNION to combine two data sets
How you can save the result of complex data transformations to an external storage system
The different deployment modes of an Apache Spark Application
Working with UDFs and Spark SQL functions
How to use Databricks Community Edition to write Apache Spark Code

Course content

6 sections • 35 lectures • 4h 32m total length

What You Will Learn In This Section • 0:33
Explore the Apache Spark architecture and how it runs on a cluster. Learn cluster components and how Spark distributes data workloads, plus the responsibilities of a Spark application.
Distributed Processing: How Apache Spark Runs On A Cluster • 10:32
Explore how Apache Spark runs on a cluster by coordinating the cluster manager and node managers to allocate resources, start the Spark driver and executors, and achieve parallelism.
Azure Databricks: How To Create A Cluster • 6:31
Create an Azure Databricks cluster by provisioning the workspace, selecting a Databricks runtime, choosing standard mode, and configuring worker and driver memory and cores with auto-scaling.
Databricks Community Edition: How To Create A Cluster • 3:32
Learn how to create a free Databricks account and create your first cluster.
How does Apache Spark run on a cluster?

Install the course Dataset and Notebooks • 2:58
Install the dataset and Databricks notebooks by uploading files to a dedicated data folder, importing the notebooks into the workspace, then attaching them to a cluster and validating the customer JSON data.
Distributed Data: The DataFrame • 9:37
Configure a Spark session in Databricks, read JSON data into a DataFrame, inspect its schema, and understand distribution across partitions and executors.
How To Define The Structure Of A DataFrame • 10:21
Define and view a DataFrame schema with printSchema, using a DDL string, a StructType, or Spark inference to set data types like integer and date.
What is a DataFrame (Scala)
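
The three ways of defining a schema mentioned in this section can be sketched as follows. This is a minimal illustration and not the course's notebook code: the column names (customer_id, name, birthdate) and the two inline JSON records are hypothetical stand-ins for the course's customer dataset, and the local master setting is only there so the snippet also runs outside Databricks.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{DateType, IntegerType, StringType, StructField, StructType}

// On Databricks a session already exists and getOrCreate() reuses it;
// the local master only matters when running outside a cluster.
val spark = SparkSession.builder().appName("schema-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical customer records (assumption: not the course dataset).
val jsonLines = Seq(
  """{"customer_id": 1, "name": "Ada", "birthdate": "1990-05-17"}""",
  """{"customer_id": 2, "name": "Linus", "birthdate": "1985-12-28"}"""
).toDS()

// Option 1: describe the schema with a DDL string.
val ddlSchema = "customer_id INT, name STRING, birthdate DATE"
val viaDdl = spark.read.schema(ddlSchema).json(jsonLines)

// Option 2: describe the same schema programmatically with a StructType.
val structSchema = StructType(Seq(
  StructField("customer_id", IntegerType, nullable = true),
  StructField("name", StringType, nullable = true),
  StructField("birthdate", DateType, nullable = true)
))
val viaStruct = spark.read.schema(structSchema).json(jsonLines)

// Option 3: let Spark infer the schema by scanning the data.
val inferred = spark.read.json(jsonLines)

viaDdl.printSchema()
inferred.printSchema()

// A DataFrame is split into partitions that executors process in parallel.
println(s"partitions: ${viaDdl.rdd.getNumPartitions}")
```

Comparing the two printSchema outputs shows why an explicit schema is often preferable: inference has to read the data first and may pick wider or different types than the ones you intended.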

Selecting Columns • 11:43
Learn to select columns from a DataFrame in Spark, using string names, column objects, and expressions, including struct fields and the year function in select expressions.
Renaming Columns • 2:54
Rename a DataFrame column using withColumnRenamed by specifying the existing and new names. Confirm by inspecting the DataFrame columns, and note that a non-existent column causes no change.
Change Columns data type • 6:10
Learn how to cast and change column data types in Spark DataFrames, including integer to long and date to string, using cast and selectExpr, and verify with printSchema.
How to access columns
Adding Columns to a DataFrame • 5:30
Add columns to a DataFrame using withColumn and lit. Concatenate first and last names with a space to form a full name, and use an expression to compute address id plus one.
Removing Columns from a DataFrame • 2:54
Remove columns from a DataFrame with the drop method, which returns a new DataFrame. Drop single or multiple columns, handle non-existent columns, and understand string versus column object overloads in Apache Spark.
Test your understanding
Basic Arithmetic with DataFrame • 4:18
Compute basic arithmetic on DataFrame columns to derive expected net paid, using the withColumn method and expressions, and display results in the sales performance DataFrame.
Apache Spark Architecture: DataFrame Immutability • 9:34
Explore how Apache Spark treats DataFrames as immutable objects, using transformations to create new DataFrames, and how actions trigger lineage-graph computations across a distributed cluster.
How To Filter A DataFrame • 8:25
Learn how to filter a DataFrame in Apache Spark using the filter and where methods with boolean expressions, including examples filtering by birth year, birth month, and birth country.
Test Your Knowledge
Apache Spark Architecture: Narrow Transformations • 2:15
Explore how narrow dependencies in Apache Spark DataFrames link parent and child partitions, with examples like filter and select transforming one input partition into one output partition.
Dropping Rows • 5:45
Remove duplicate rows in a Spark DataFrame by using distinct, dropDuplicates on specific columns, or dropDuplicates with a column list; with no arguments, dropDuplicates behaves like distinct.
How to Drop rows and columns
Handling Null Values Part I - Null Functions • 4:46
Explore how to handle null values in Apache Spark using isNull and isNotNull, and filter with where or filter to manage nulls in a DataFrame.
Handling Null Values Part II - DataFrameNaFunctions • 11:46
Drop rows with nulls using DataFrameNaFunctions, choosing all or any nulls and targeting specific columns, then replace nulls with fill or a replacement map and view results in Databricks.
Sort and Order Rows - Sort & OrderBy • 6:04
Sort DataFrame rows with sort or orderBy, which are equivalent methods, while handling nulls and sorting by one or multiple columns or expressions in ascending and descending order.
Can You Handle Null Values?
Create Group of Rows: GroupBy • 9:49
Learn aggregations with groupBy on the web series purchases data, counting items per customer and aliasing as item count, while understanding shuffle and wide dependencies.
DataFrame Statistics • 11:30
Apply aggregation over a DataFrame without groupBy to obtain max, min, average, and count values for a column like sales price; use describe or summary to explore statistics.
Group and Order
Joining DataFrames - Inner Join • 6:13
Joining DataFrames - Right Outer Join • 6:13
Perform a right outer join on the web series DataFrame and customer DataFrame in Apache Spark to identify customers with no purchases, with nulls filling the unmatched rows.
Joining DataFrames - Left Outer Join • 5:33
Perform a left outer join between two DataFrames in Apache Spark, matching on customer id and bill customer, and observe nulls in the right-side columns when there is no match.
Appending Rows to a DataFrame - Union • 6:00
Learn to append rows to DataFrames using union and unionByName in Spark, compare matching by column position versus by name, and apply distinct to remove duplicates after merging.
Can you Join two DataFrames?
Caching a DataFrame • 11:51
Cache a DataFrame in memory and on disk to avoid recomputing joins, using cache or persist with configurable storage levels; run actions like count to materialize cached data.
DataFrameWriter Part I • 14:39
Explore how the DataFrameWriter API saves a DataFrame to external storage, including csv restrictions, overwrite options, repartitioning for file layout, and default parquet with compression options.
DataFrameWriter Part II - PartitionBy • 8:07
Partition a DataFrame by a column during write to create per-category folders. Trim trailing spaces to ensure folder accuracy and read from partitioned folders.
User Defined Functions • 12:10
Learn how to write and register user defined functions for Apache Spark, use them in DataFrame operations and Spark SQL, and manage serialization and null safety.
Do you know how to save the result of your work?
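
As a rough illustration of how several transformations from this section fit together, here is a minimal sketch. The data is invented for the example (the customers and purchases tables, and column names such as firstname or bill_customer, are assumptions loosely modeled on the lecture descriptions, not the course notebooks).

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{col, concat, count, lit}

val spark = SparkSession.builder().appName("transformations-sketch").master("local[*]").getOrCreate()
import spark.implicits._

// Hypothetical sample data (assumption: not the course dataset).
val customers = Seq[(Int, String, String, Option[Int])](
  (1, "Ada", "Lovelace", Some(1990)),
  (2, "Linus", "Torvalds", None),
  (3, "Grace", "Hopper", Some(1985))
).toDF("customer_id", "firstname", "lastname", "birth_year")

val purchases = Seq((1, "item-a"), (1, "item-b"), (3, "item-c"))
  .toDF("bill_customer", "item")

// Each transformation returns a new DataFrame; `customers` itself never changes.
val withFullName = customers
  .withColumn("full_name", concat(col("firstname"), lit(" "), col("lastname")))
  .withColumnRenamed("birth_year", "year_of_birth")
  .drop("firstname", "lastname")

// Null handling: keep only rows with a known year, or replace nulls with a default.
val knownBirthYear = withFullName.where(col("year_of_birth").isNotNull)
val filled = withFullName.na.fill(0L, Seq("year_of_birth"))

// groupBy is a wide transformation: rows with the same key must be shuffled together.
val itemCounts = purchases
  .groupBy("bill_customer")
  .agg(count(col("item")).alias("item_count"))

// Left outer join: customers without purchases get null in the right-side columns.
val joined = withFullName
  .join(itemCounts, col("customer_id") === col("bill_customer"), "left_outer")
  .orderBy(col("customer_id").asc)

joined.show()
```

Nothing is computed until the final show() call; every step before it only extends the lineage of transformations that the action then executes.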

Query Planning • 11:18
Learn how Spark translates DataFrames and SQL queries into logical plans, resolves names with the internal catalog, optimizes and builds efficient physical plans with pushdown, broadcast joins, and code generation.
Execution Hierarchy • 7:22
Explore how Apache Spark executes actions within an execution hierarchy, forming a job that splits into stages and tasks at shuffle boundaries, view the DAG, and adjust partitions with the shuffle configuration.
Partitioning a DataFrame • 7:44
Learn how to partition a DataFrame in Spark, using repartition and coalesce to control partitions, understand when shuffles occur, and compare stage behavior and data exchange.
Adaptive Query Execution - An Introduction • 15:08
Enable adaptive query execution in Apache Spark 3 to optimize the physical plan after each stage, enabling coalescing of small partitions and dynamic join strategy choices.
How Apache Spark Runs
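
The partitioning and configuration ideas in this section can be tried out with a small sketch like the one below. It uses a generated range of numbers instead of the course data, and the partition counts (6, 2, 8) are arbitrary values chosen for illustration.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.col

val spark = SparkSession.builder().appName("execution-sketch").master("local[*]").getOrCreate()

// A generated dataset with a single `id` column (assumption: stand-in for real data).
val numbers = spark.range(0, 1000)

// repartition performs a full shuffle and can increase or decrease the partition count.
val repartitioned = numbers.repartition(6)

// coalesce merges existing partitions without a full shuffle; it only reduces the count.
val coalesced = repartitioned.coalesce(2)

println(s"after repartition: ${repartitioned.rdd.getNumPartitions}")
println(s"after coalesce:    ${coalesced.rdd.getNumPartitions}")

// Number of partitions used after a shuffle (for example after groupBy or join).
spark.conf.set("spark.sql.shuffle.partitions", "8")

// Adaptive Query Execution lets Spark re-optimize the plan at stage boundaries,
// for example by coalescing small shuffle partitions.
spark.conf.set("spark.sql.adaptive.enabled", "true")

// groupBy introduces a shuffle, which splits the job into more than one stage.
val grouped = numbers.groupBy((col("id") % 10).alias("bucket")).count()

// explain() prints the physical plan; look for an Exchange node marking the shuffle.
grouped.explain()
grouped.show()
```

Running the same query with adaptive execution switched off and comparing the explain() output is a simple way to see what the optimizer changes.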

Requirements

Basic Scala Knowledge
Basic data skills
NO Previous Spark Knowledge

Description

Do you want to learn how to handle massive amounts of data at scale?

Learn Apache Spark 3 and pass the Databricks Certified Associate Developer for Apache Spark 3.0 exam.

Hi, my name is Wadson, and I’m a Databricks Certified Associate Developer for Apache Spark.

Apache Spark has become the standard big-data cluster processing framework in today's data-driven world.

Apache Spark is used for Data Engineering, Data Science, and Machine Learning.

I will teach you everything you need to know about starting with Apache Spark.

You will learn the Architecture of Apache Spark and use its Core APIs to manipulate complex data.
You will write queries to perform transformations such as Join, Union, GroupBy, and more.

This course is for beginners.
You don't need any previous knowledge of Apache Spark.

Notebooks are available to download so that you can follow along with me in the videos.

The Notebooks contain all the source code I use in the course.

There are also Quizzes to help you assess your understanding of the topics.

Check out some of the top reviews and enroll in the course.

"This course is really helpful with all the necessary details needed for the Certification: Databricks Certified Associate Developer for Apache Spark 3.0.

I've cleared the certification with 80% score and I'd suggest to check all the Course contents thoroughly"

"Very good course. Gives a good overview of all the necessary components of the spark application which are required for the test and that too in very short span of time. will highly recommend this course.

worth spending time !!"

Who this course is for:

Any Developer who wants to start using Apache Spark in their career
Beginner Spark Developer seeking Big Data Certification
Developer curious about Data Engineering and Data Science

Course content

Apache Spark Architecture: Distributed Processing • 4 lectures • 21min

Apache Spark Architecture: Distributed Data • 3 lectures • 23min

DataFrame Transformations • 23 lectures • 2hr 54min

Apache Spark Architecture: Execution • 4 lectures • 42min

Exam Logistics • 1 lecture • 12min

Practice Exams Scala • 0 lectures