
What to expect in this course, who it's for, and the general format we'll follow.
We cover the differences between continuous and discrete numerical data, categorical data, and ordinal data.
A refresher on mean, median, and mode - and when it's appropriate to use each.
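As a quick illustration of why the choice matters, here is a minimal sketch using Python's standard `statistics` module. The income figures are made up for the example; the last one is a deliberate outlier.

```python
from statistics import mean, median, mode

# Hypothetical household incomes; the last value is a deliberate outlier.
incomes = [30_000, 32_000, 32_000, 35_000, 41_000, 1_000_000]

avg = mean(incomes)      # pulled far upward by the single outlier
mid = median(incomes)    # middle of the sorted values, robust to the outlier
common = mode(incomes)   # most frequently occurring value

print(avg, mid, common)
```

The median and mode stay close to what a typical household earns, while the mean is dominated by one extreme value. That is usually the deciding factor when picking between them.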
Introducing the concepts of probability density functions (PDFs) and probability mass functions (PMFs).
Here we'll go over my solution to the exercise I challenged you with in the previous lecture - changing our fabricated data to have no real correlation between ages and purchases, and seeing if you can detect that using conditional probability.
An overview of Bayes' Theorem, and an example of using it to uncover misleading statistics surrounding the accuracy of drug testing.
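To make the drug-testing idea concrete, here is a small sketch of Bayes' Theorem in plain Python. The prevalence and accuracy figures are assumptions chosen for illustration, not numbers from the lecture, and the `posterior` helper is a hypothetical name.

```python
def posterior(prior, sensitivity, false_positive_rate):
    """Return P(user | positive test) using Bayes' Theorem."""
    true_positives = sensitivity * prior                    # P(positive and user)
    false_positives = false_positive_rate * (1.0 - prior)   # P(positive and non-user)
    return true_positives / (true_positives + false_positives)

# Assumed figures: 0.3% of people use the drug, the test catches 99% of
# users, and it wrongly flags 1% of non-users.
p = posterior(prior=0.003, sensitivity=0.99, false_positive_rate=0.01)
print(f"P(user | positive) = {p:.3f}")
```

Under these assumptions, fewer than one in four positive results comes from an actual user, even though the test sounds "99% accurate". The rarity of the condition dominates the result.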
We'll cover multi-level modeling at a conceptual level only, as it is a very advanced topic, but you'll come away with the ideas and challenges behind it.
The concepts of supervised and unsupervised machine learning, and how to evaluate the ability of a machine learning model to predict new values using the train/test technique.
We'll introduce the concept of Naive Bayes and how we might apply it to the problem of building a spam classifier.
K-Means is a way to identify things that are similar to each other. It's a case of unsupervised learning, which could result in clusters you never expected!
Entropy is a measure of the disorder in a data set - we'll learn what that means, and how to compute it mathematically.
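For reference, a minimal sketch of the usual Shannon entropy calculation over class labels, in plain Python. The `entropy` function name and the sample labels are ours, chosen for illustration.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy, in bits, of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    # Sum p * log2(1/p) over each class, where p is the class proportion.
    return sum((n / total) * log2(total / n) for n in counts.values())

mixed = entropy(["spam", "spam", "ham", "ham"])  # evenly split: maximum disorder
pure = entropy(["ham", "ham", "ham", "ham"])     # one class only: no disorder

print(mixed, pure)
```

A data set containing a single class has zero entropy, and an even two-way split has exactly one bit. Decision trees use this kind of measure to pick the split that reduces disorder the most.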
Decision trees can automatically create a flow chart for making some decision, based on machine learning! Let's learn how they work.
Random Forests are one example of ensemble learning; we'll cover other techniques for combining the results of many models to create a better result than any one could produce on its own.
Support Vector Machines are an advanced technique for classifying data that has multiple features. They treat those features as dimensions, and partition this higher-dimensional space using "support vectors."
One way to recommend items is to look for other people similar to you based on their behavior, and recommend stuff they liked that you haven't seen yet.
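Here is a rough sketch of that idea in plain Python. The ratings data, the choice of cosine similarity, and the helper names are all assumptions for illustration; real systems use far more data and usually more than one neighbor.

```python
from math import sqrt

# Hypothetical ratings: user -> {item: rating on a 1-5 scale}
ratings = {
    "alice": {"Star Wars": 5, "Alien": 4, "Up": 1},
    "bob": {"Star Wars": 5, "Alien": 5, "Blade Runner": 4},
    "carol": {"Up": 5, "Frozen": 4, "Star Wars": 1},
}

def cosine_similarity(a, b):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(a) & set(b)
    if not shared:
        return 0.0
    dot = sum(a[i] * b[i] for i in shared)
    norm_a = sqrt(sum(a[i] ** 2 for i in shared))
    norm_b = sqrt(sum(b[i] ** 2 for i in shared))
    return dot / (norm_a * norm_b)

def recommend(user):
    """Suggest items the single most similar user rated that `user` has not seen."""
    others = [u for u in ratings if u != user]
    nearest = max(others, key=lambda u: cosine_similarity(ratings[user], ratings[u]))
    unseen = {item: r for item, r in ratings[nearest].items() if item not in ratings[user]}
    # Highest-rated unseen items first.
    return sorted(unseen, key=unseen.get, reverse=True)

print(recommend("alice"))
```

With this toy data, alice's tastes line up with bob's, so she is offered the one film he rated that she has not seen.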
The shortcomings of user-based collaborative filtering can be solved by flipping it on its head, and looking at relationships between items instead of relationships between people.
KNN is a very simple supervised machine learning technique; we'll quickly cover the concept here.
Data that includes many features or many different vectors can be thought of as having many dimensions. Often it's useful to reduce those dimensions down to something more easily visualized, for compression, or to just distill the most important information from a data set (that is, information that contributes the most to the data's variance). Principal Component Analysis and Singular Value Decomposition do that.
Cloud-based data storage and analysis systems like Hadoop, Hive, Spark, and MapReduce are turning the field of data warehousing on its head. Instead of extracting, transforming, and then loading data into a data warehouse, the transformation step is now more efficiently done using a cluster after it's already been loaded. With computing and storage resources so cheap, this new approach now makes sense.
We'll describe the concept of reinforcement learning - including Markov Decision Processes, Q-Learning, and Dynamic Programming - all using a simple example of developing an intelligent Pac-Man.
What's a confusion matrix, and how do I read it?
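As a small worked example, the sketch below tallies a two-class confusion matrix in plain Python. The labels are invented, and "spam" is treated as the positive class purely by assumption.

```python
from collections import Counter

# Hypothetical true labels and model predictions for eight messages.
actual    = ["spam", "spam", "ham", "ham", "ham",  "spam", "ham", "ham"]
predicted = ["spam", "ham",  "ham", "ham", "spam", "spam", "ham", "ham"]

# Count each (actual, predicted) pair; missing pairs count as zero.
counts = Counter(zip(actual, predicted))
tp = counts[("spam", "spam")]  # spam correctly flagged
fn = counts[("spam", "ham")]   # spam that slipped through
fp = counts[("ham", "spam")]   # ham wrongly flagged
tn = counts[("ham", "ham")]    # ham correctly passed

precision = tp / (tp + fp)
recall = tp / (tp + fn)

print("              predicted spam  predicted ham")
print(f"actual spam   {tp:>14}  {fn:>13}")
print(f"actual ham    {fp:>14}  {tn:>13}")
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Rows are the true classes and columns are the predictions, so correct answers sit on the diagonal. Be aware that some tools flip rows and columns, so always check the axis labels.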
Bias and Variance both contribute to overall error; understand these components of error and how they relate to each other.
Cleaning your raw input data is often the most important, and time-consuming, part of your job as a data scientist!
A brief reminder: some models require input features to be normalized, that is, scaled to a comparable range. Always read the documentation on the techniques you are using.
We'll present an overview of the steps needed to install Apache Spark on your desktop in standalone mode, and get started by getting a Java Development Kit installed on your system.
A high-level overview of Apache Spark, what it is, and how it works.
We'll go into more depth on the core of Spark - the RDD object, and what you can do with it.
A quick overview of MLlib's capabilities, and the new data types it introduces to Spark.
We'll walk through an example of coding up and running a decision tree using Apache Spark's MLlib! In this exercise, we try to predict if a job candidate will be hired based on their work and educational history, using a decision tree that can be distributed across an entire cluster with Spark.
We'll take the same example of clustering people by age and income from our earlier K-Means lecture - but solve it in Spark!
We'll introduce the concept of TF-IDF (Term Frequency / Inverse Document Frequency) and how it applies to search problems, in preparation for using it with MLlib.
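To give a feel for the calculation, here is a minimal single-machine sketch in plain Python. It uses one common variant (term frequency multiplied by the log of inverse document frequency), which is an assumption on our part; the tiny documents and function names are hypothetical, and this is not how MLlib implements it.

```python
from collections import Counter
from math import log

# Hypothetical mini-corpus: document name -> text
docs = {
    "doc1": "spark makes big data simple",
    "doc2": "big data needs big clusters",
    "doc3": "gettysburg address text",
}
tokenized = {name: text.split() for name, text in docs.items()}

def tf(term, words):
    """Term frequency: share of the document's words that are `term`."""
    return Counter(words)[term] / len(words)

def idf(term):
    """Log-scaled inverse document frequency; 0.0 if no document has the term."""
    containing = sum(1 for words in tokenized.values() if term in words)
    return log(len(tokenized) / containing) if containing else 0.0

def tf_idf(term, name):
    return tf(term, tokenized[name]) * idf(term)

def search(term):
    """Return the name of the document with the highest TF-IDF score for `term`."""
    scores = {name: tf_idf(term, name) for name in tokenized}
    return max(scores, key=scores.get)

print(search("big"), search("spark"))
```

A term scores highly for a document when it appears often there but rarely elsewhere, which is what makes the score useful for ranking search results.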
Let's use TF-IDF, Spark, and MLlib to create a rudimentary search engine for real Wikipedia pages!
Spark 2.0 introduced a new API for MLlib based on DataFrame objects; we'll look at an example of using this to create and use a linear regression model.
High-level thoughts on various ways to deploy your trained models to production systems including apps and websites.
Running controlled experiments on your website usually involves a technique called the A/B test. We'll learn how A/B tests work.
How to determine the significance of an A/B test's results, and measure the probability of the results being just from random chance, using T-Tests, the T-statistic, and the P-value.
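As a rough sketch of the T-statistic itself, the snippet below computes Welch's version (which does not assume equal variances) using only the standard library. The purchase amounts and the `welch_t` name are invented for illustration. Turning the statistic into a P-value requires the t-distribution, for which a library such as SciPy (`scipy.stats.ttest_ind`) is the usual route.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(a, b):
    """Welch's t-statistic for the difference in means of two samples."""
    # Standard error of the difference, using each sample's own variance.
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical order amounts per customer in each arm of the experiment.
control = [20.0, 22.0, 19.0, 24.0, 21.0]
treatment = [25.0, 27.0, 24.0, 29.0, 26.0]

t = welch_t(treatment, control)
print(f"t = {t:.2f}")
```

A T-statistic far from zero means the gap between the groups is large relative to the noise within them; a value near zero suggests the difference could easily be chance.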
Some A/B tests just don't affect customer behavior one way or another. How do you know how long to let an experiment run for before giving up?
There are many limitations associated with running short-term A/B tests - novelty effects, seasonal effects, and more can lead you to the wrong decisions. We'll discuss the forces that may result in misleading A/B test results so you can watch out for them.
If you skipped ahead, I'll show you where to get the course materials for just this section. And we'll cover some pre-requisite concepts for understanding how neural networks operate: gradient descent, autodiff, and softmax.
We'll cover the evolution of artificial neural networks from 1943 to modern-day architectures, which is a great way to understand how they work.
Google's TensorFlow Playground lets you experiment with deep neural networks and understand them - without writing a line of code!
Let's dive into the details on how modern multi-layer perceptrons are trained and tuned.
We'll cover Google's open-source TensorFlow Python library, and see how it can help you create and train neural networks.
We'll use TensorFlow to create a neural network that classifies handwritten numerals from the MNIST data set. Part 1 of 2.
We'll use TensorFlow to create a neural network that classifies handwritten numerals from the MNIST data set. Part 2 of 2.
TensorFlow 1.9 offers a higher-level API called Keras, which makes it easier to construct your neural networks. We'll use Keras to solve the same handwriting recognition problem - but with much less code.
As another hands-on example, we'll use Keras to build a neural network that learns how to determine if a politician is Republican or Democrat just based on their votes.
CNNs mimic your visual cortex, and can find features in one, two, or three-dimensional data even if you're not sure where exactly that feature is.
CNNs are better suited to image data, and we'll prove that by using a CNN in Keras on the MNIST data.
RNNs can handle sequences of data, like events over time or words in a sentence. Learn what's different about how they work, how they are trained, and ways to optimize them.
Let's implement an RNN in Keras to determine positive or negative sentiments for real movie reviews from IMDb!
As with any new technology, sometimes we can become overzealous in how we use it. A few cautionary tales to make sure your deep learning work does more good than harm.
Master Machine Learning & AI Engineering — From Data Analytics to Agentic AI Solutions
Launch your career in AI with a comprehensive, hands-on course that takes you from beginner to advanced. Learn Python, data science, classical machine learning, and the latest in AI engineering—including generative AI, transformers, and LLM agents / agentic AI.
Why This Course?
Learn by Doing
With over 145 lectures and 21+ hours of video content, this course is built around practical Python projects and real-world use cases—not just theory.
Built for the Real World
Learn how companies like Google, Amazon, and OpenAI use AI to drive innovation. Our curriculum is based on skills in demand from leading tech employers.
No Experience? No Problem
Start from scratch with beginner-friendly lessons in Python and statistics. By the end, you’ll be building intelligent systems with cutting-edge AI tools.
A Structured Path from Beginner to AI Engineer
1. Programming Foundations
Start with a crash course in Python, designed for beginners. You’ll learn the language fundamentals needed for data science and AI work.
2. Data Science and Statistics
Build a solid foundation in data analysis, visualization, descriptive and inferential statistics, and feature engineering—essential skills for working with real-world datasets.
3. Classical Machine Learning
Explore supervised and unsupervised learning, including linear regression, decision trees, SVMs, clustering, ensemble models, and reinforcement learning.
4. Deep Learning with TensorFlow and Keras
Understand neural networks, convolutional neural networks (CNNs), and recurrent neural networks (RNNs), using real code examples and exercises.
5. Advanced AI Engineering and Generative AI
Go beyond traditional ML to learn the latest AI tools and techniques:
Transformers and self-attention mechanisms
GPT, ChatGPT, and the OpenAI API
Fine-tuning foundation models
Advanced Retrieval-Augmented Generation (RAG)
LangChain and LLM agents
Designing and building multi-agent systems with the OpenAI Agents SDK
Real-world GenAI projects and deployment strategies
6. Big Data and Apache Spark
Learn how to scale machine learning to large datasets using Spark, and apply ML techniques on distributed computing clusters.
Designed for Career Growth
Whether you're a programmer looking to pivot into AI or a tech professional seeking to expand your skills, this course delivers a complete, industry-relevant education. Concepts are explained clearly, in plain English, with a focus on applying what you learn.
What Students Are Saying
"I started doing your course... and it was pivotal in helping me transition into a role where I now solve corporate problems using AI. Your course demystified how to succeed in corporate AI research, making you the most impressive instructor in ML I've encountered."
— Kanad Basu, PhD
Enroll Today and Build Your Future in AI
Join thousands of learners who have used this course to land jobs, lead projects, and build real AI applications. Stay ahead in one of the fastest-growing fields in tech.
Start your journey today—from Python beginner to AI engineer.