Data Science Prerequisites: The Numpy Stack in Python
Numpy, Scipy, Pandas, Matplotlib, and Scikit-Learn: prep for deep learning, machine learning, and artificial intelligence
Updated on Sep, 2023
Language - English
One question or concern I hear a lot is this: people want to learn deep learning and data science, so they take these courses, but they get left behind because they don’t know the Numpy stack well enough to turn those concepts into code.
Even if I write the code in full, if you don’t know Numpy, then it’s still very hard to read.
This course is designed to remove that obstacle - to show you how to do things in the Numpy stack that are frequently needed in deep learning and data science.
So what are those things?
Numpy. This forms the basis for everything else. The central object in Numpy is the Numpy array, on which you can do various operations.
The key is that a Numpy array isn’t just a regular array like the ones you’d see in Java or C++. Instead, it behaves like a mathematical object, such as a vector or a matrix.
That means you can do vector and matrix operations like addition, subtraction, and multiplication.
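Here is a minimal sketch of what treating arrays as vectors means, assuming only that `numpy` is installed (the values are arbitrary). Arithmetic operators act element-wise and `dot` gives the dot product, whereas `+` on plain Python lists concatenates them:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

sum_ab = a + b       # element-wise addition: [5, 7, 9]
diff_ab = a - b      # element-wise subtraction: [-3, -3, -3]
prod_ab = a * b      # element-wise product: [4, 10, 18]
dot_ab = a.dot(b)    # dot product: 4 + 10 + 18 = 32
scaled = 2 * a       # scalar multiplication: [2, 4, 6]

# Contrast with a plain Python list, where + concatenates instead of adding
py_list = [1, 2, 3]
print(py_list + py_list)                      # [1, 2, 3, 1, 2, 3]
print(np.array(py_list) + np.array(py_list))  # [2 4 6]
```

Note that `*` is element-wise. For matrix multiplication, use `@` or `dot`.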
The most important aspect of Numpy arrays is that they are optimized for speed. So we’re going to do a demo where I prove to you that using a Numpy vectorized operation is faster than using a Python list.
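A speed comparison of this kind might look roughly like the sketch below. It is not the course’s actual demo. The array size, the squaring operation and the repeat count are arbitrary choices, and the exact timings will vary from machine to machine.

```python
import timeit
import numpy as np

N = 100_000
py_list = list(range(N))
np_array = np.arange(N, dtype=np.int64)  # explicit int64 so squares up to ~1e10 don't overflow

def list_square():
    # Python list: the interpreter visits each element one at a time
    return [x * x for x in py_list]

def numpy_square():
    # Numpy: a single vectorized operation, with the loop running in compiled code
    return np_array * np_array

t_list = timeit.timeit(list_square, number=20)
t_numpy = timeit.timeit(numpy_square, number=20)
print(f"list: {t_list:.4f}s  numpy: {t_numpy:.4f}s  ratio: {t_list / t_numpy:.1f}x")
```

Both functions produce the same numbers. The difference is that the list version loops in Python, while the Numpy version runs the loop inside compiled code.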
Then we’ll look at some more complicated matrix operations, like products, inverses, determinants, and solving linear systems.
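The sketch below shows those operations on a small 2x2 system chosen for illustration (the matrix and vector are assumptions, not data from the course). It uses `@` for the matrix product and `np.linalg.inv`, `np.linalg.det` and `np.linalg.solve` for the inverse, the determinant and the linear system.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])
b = np.array([3.0, 5.0])

AB = A @ A                  # matrix product (equivalent to A.dot(A))
A_inv = np.linalg.inv(A)    # matrix inverse
det_A = np.linalg.det(A)    # determinant: 2*3 - 1*1 = 5
x = np.linalg.solve(A, b)   # solves the linear system A x = b

# Sanity checks: A @ A_inv should be (close to) the identity, and A @ x should reproduce b
print(np.allclose(A @ A_inv, np.eye(2)))  # True
print(np.allclose(A @ x, b))              # True
```

To solve A x = b, `np.linalg.solve` is generally preferred over computing `np.linalg.inv(A) @ b` because it is faster and more numerically stable.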
Pandas. Pandas is great because it does a lot of work under the hood, so you don’t need to code those things manually.
Pandas makes working with datasets a lot like working in R, if you’re familiar with R.
The central object in R and Pandas is the DataFrame.
We’ll look at how much easier it is to load a dataset using Pandas vs. trying to do it manually.
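To make the comparison concrete, here is a hedged sketch that parses the same small CSV twice: once by hand and once with `pd.read_csv`. The CSV contents and column names are hypothetical, and `io.StringIO` stands in for a real file path.

```python
import io
import pandas as pd

# Hypothetical CSV contents; with a real file you would pass its path to pd.read_csv
csv_text = """name,age,score
alice,34,88.5
bob,28,92.0
carol,45,79.25
"""

# Manual approach: split lines and fields, skip the header, convert types yourself
rows = []
for line in csv_text.strip().split("\n")[1:]:
    name, age, score = line.split(",")
    rows.append((name, int(age), float(score)))

# Pandas approach: one call reads the header and infers the column types
df = pd.read_csv(io.StringIO(csv_text))
print(df.dtypes)
print(df.head())
```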
Then we’ll look at some data frame operations, like filtering by column, filtering by row, the apply function, and joins, which look a lot like SQL joins.
So if you have an SQL background and you like working with tables, then Pandas will be a great next thing to learn about.
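Here is a minimal sketch of those DataFrame operations on two small hypothetical tables (`people` and `orders` are made up for illustration). The join uses `pd.merge`, whose `on` and `how` arguments play roughly the role of an SQL join key and join type.

```python
import pandas as pd

# Hypothetical example tables
people = pd.DataFrame({
    "user_id": [1, 2, 3],
    "name": ["alice", "bob", "carol"],
    "age": [34, 28, 45],
})
orders = pd.DataFrame({
    "user_id": [1, 1, 3],
    "amount": [20.0, 35.5, 12.0],
})

names = people["name"]                # select a single column (a Series)
subset = people[["name", "age"]]      # select several columns (a DataFrame)
over_30 = people[people["age"] > 30]  # filter rows with a boolean mask

# apply: call a function on each value of a column
people["age_next_year"] = people["age"].apply(lambda a: a + 1)

# Inner join on user_id, similar to SQL: SELECT * FROM people JOIN orders USING (user_id)
joined = pd.merge(people, orders, on="user_id", how="inner")
print(joined)
```

For a simple arithmetic transformation like this one, the vectorized `people["age"] + 1` would be faster than `apply`. `apply` is most useful when the per-value function can’t easily be vectorized.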
Matplotlib. Once we know how to load data with Pandas, the next step is looking at the data. For that, we will use Matplotlib.
In this section, we’ll go over some common plots, namely the line chart, scatter plot, and histogram.
We’ll also look at how to show images using Matplotlib.
99% of the time, you’ll be using some form of the above plots.
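Here is a minimal sketch of the four plot types on randomly generated data, assuming `matplotlib` is installed. The `Agg` backend and the output filename `plots.png` are choices made here so the script runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is needed
import matplotlib.pyplot as plt
import numpy as np

rng = np.random.default_rng(0)
x = np.linspace(0, 10, 100)

fig, axes = plt.subplots(2, 2, figsize=(8, 6))

axes[0, 0].plot(x, np.sin(x))                                  # line chart
axes[0, 0].set_title("Line chart")

axes[0, 1].scatter(rng.normal(size=100), rng.normal(size=100))  # scatter plot
axes[0, 1].set_title("Scatter plot")

axes[1, 0].hist(rng.normal(size=1000), bins=30)                  # histogram
axes[1, 0].set_title("Histogram")

image = rng.random((28, 28))                                     # a random 28x28 grayscale "image"
axes[1, 1].imshow(image, cmap="gray")                            # display an image
axes[1, 1].set_title("Image")

fig.tight_layout()
fig.savefig("plots.png")  # in an interactive session you would call plt.show() instead
```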
Scipy. I like to think of Scipy as an add-on library to Numpy.
Whereas Numpy provides basic building blocks, like vectors, matrices, and operations on them, Scipy uses those general building blocks to do specific things.
For example, Scipy can do many common statistics calculations, including getting the PDF value, the CDF value, sampling from a distribution, and statistical testing.
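Here is a sketch of those statistics tasks with `scipy.stats`. It uses a standard normal distribution and two hypothetical groups of simulated data; the group means and sizes are arbitrary.

```python
import numpy as np
from scipy import stats

# A standard normal distribution (mean 0, standard deviation 1)
dist = stats.norm(loc=0, scale=1)

pdf_at_0 = dist.pdf(0.0)                       # density at x = 0, i.e. 1/sqrt(2*pi), about 0.3989
cdf_at_0 = dist.cdf(0.0)                       # P(X <= 0) = 0.5 by symmetry
samples = dist.rvs(size=1000, random_state=0)  # draw 1000 samples

# Two-sample t-test on simulated data: do the two group means differ?
rng = np.random.default_rng(0)
group_a = rng.normal(loc=0.0, scale=1.0, size=200)
group_b = rng.normal(loc=0.5, scale=1.0, size=200)
t_stat, p_value = stats.ttest_ind(group_a, group_b)
print(f"t = {t_stat:.3f}, p = {p_value:.3g}")
```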
It has signal processing tools, so it can do things like convolution and the Fourier transform.
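Here is a minimal sketch of both tools, assuming SciPy 1.4 or later for the `scipy.fft` module. It applies a moving-average convolution with `scipy.signal.convolve` and uses an FFT to recover the frequency of a 5 Hz sine wave. The sine is sampled at an assumed rate of 100 Hz.

```python
import numpy as np
from scipy import signal
from scipy.fft import fft, fftfreq

# Convolution: smooth a noisy signal with a 5-point moving-average kernel
rng = np.random.default_rng(0)
noisy = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.3 * rng.normal(size=200)
kernel = np.ones(5) / 5
smoothed = signal.convolve(noisy, kernel, mode="same")  # output has the same length as the input

# Fourier transform: find the dominant frequency of a pure sine wave
fs = 100.0                          # sampling rate in Hz (assumed for illustration)
n = 100                             # 1 second of samples
t = np.arange(n) / fs
wave = np.sin(2 * np.pi * 5.0 * t)  # a 5 Hz sine
spectrum = np.abs(fft(wave))
freqs = fftfreq(n, d=1.0 / fs)
half = n // 2                       # keep only the non-negative frequencies
peak = freqs[np.argmax(spectrum[:half])]
print(peak)  # 5.0
```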
Scikit-Learn. Scikit-Learn is an awesome library that brings powerful, state-of-the-art machine learning models to you with almost no work on your part. You can use these models with just 2-3 lines of code, and I'm going to show you how.
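Here is a sketch of how short this can be, using the Iris dataset bundled with Scikit-Learn and a random forest. The choice of model, the 70/30 split and the `random_state` values are illustrative assumptions. The essential steps are creating the model, calling `fit`, then calling `predict` or `score`.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)  # X: (150, 4) feature matrix, y: (150,) class labels
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=0)

# The core of the Scikit-Learn workflow: create, fit, evaluate
model = RandomForestClassifier(random_state=0)
model.fit(X_train, y_train)
accuracy = model.score(X_test, y_test)  # mean accuracy on the held-out data
print(f"test accuracy: {accuracy:.3f}")
```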
In this section, which is a bonus addition I created a few years after I began the course, I also go over some machine learning and deep learning basics.
What is machine learning in the first place?
We talk about classification, regression, how to feed ANY kind of dataset into your machine learning model, and a few important rules that, if followed, will make using and implementing machine learning algorithms an absolute breeze.
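The sketch below shows one common way to feed a mixed-type dataset into a regression model, under some assumptions. The housing-style table is hypothetical, and one-hot encoding with `pd.get_dummies` is just one of several ways to turn categorical columns into numbers. The key convention is that `X` is a 2-D numeric array of shape (N, D), with N samples and D features, and `y` has shape (N,).

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression

# Hypothetical tabular data with one numeric and one categorical feature
df = pd.DataFrame({
    "size_sqft": [800, 1200, 1500, 950, 2000, 1100],
    "city": ["A", "B", "A", "B", "A", "B"],
    "price": [160.0, 260.0, 300.0, 210.0, 400.0, 240.0],
})

# One-hot encode "city" so every column of X is numeric; X has shape (N, D)
X = pd.get_dummies(df[["size_sqft", "city"]], columns=["city"],
                   drop_first=True, dtype=float).to_numpy()
y = df["price"].to_numpy()  # shape (N,)
print(X.shape)  # (6, 2)

model = LinearRegression()
model.fit(X, y)
print(model.coef_)           # one coefficient per column of X
print(model.predict(X[:2]))  # predictions for the first two rows
print(model.score(X, y))     # R^2 on the training data
```

`drop_first=True` drops one dummy column so the remaining ones are not perfectly collinear with the model’s intercept. Regression uses the same `fit`, `predict` and `score` methods as classification.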
This section is what I call the "black sheep" of this course because it's the only section that contains new concepts. The previous sections are just about taking some mathematical concepts you already know (or should know if you follow the prerequisites) and teaching you the "computer commands" you need to accomplish them.
This new section is a one-two punch because it teaches you not only the Scikit-Learn API but also some very important concepts behind machine learning and artificial intelligence.
If you’ve taken a deep learning or machine learning course, and you understand the theory, and you can see the code, but you can’t figure out how to turn those algorithms into actual running code, this course is for you.
If you know some basic coding, but you want to learn how to visualize data and make plots, create data frames from data files and manipulate data frames, and do scientific calculations like statistical testing, then this course is for you.
If you've taken one of my more advanced courses but found that you didn't understand a lot of the code, then this course is for you.
What you will learn in this course:
- Understand supervised machine learning (classification and regression) with real-world examples using Scikit-Learn
- Understand and code using the Numpy stack
- Make use of Numpy, Scipy, Matplotlib, and Pandas to implement numerical algorithms
- Understand the pros and cons of various machine learning models, including Deep Learning, Decision Trees, Random Forest, Linear Regression, Boosting, and more!
What are the prerequisites for this course?
- linear algebra
- Python coding: if/else, loops, lists, dicts, sets
Check out the detailed breakdown of what’s inside the course
Welcome and Logistics
- Introduction and Outline 07:41
- Extra Resources 03:27
- Connect With Me For FREE Data Science & Machine Learning Tutorials 00:59
Numpy Stack Exercises
Machine Learning Basics
The Lazy Programmer is a seasoned online educator with an unwavering passion for sharing knowledge. With over 10 years of experience, he has revolutionized the field of data science and machine learning by captivating audiences worldwide through his comprehensive courses and tutorials.
Equipped with a multidisciplinary background, the Lazy Programmer holds a remarkable duo of master's degrees. His first foray into academia led him to pursue computer engineering, with a specialized focus on machine learning and pattern recognition. Undeterred by boundaries, he then ventured into the realm of statistics, exploring its applications in financial engineering.
Recognized as a trailblazer in his field, the Lazy Programmer quickly embraced the power of deep learning when it was still in its infancy. As one of the pioneers, he fearlessly embarked on instructing one of the first-ever online courses on deep learning, catapulting him to the forefront of the industry.
While his achievements in the field of data science and machine learning are awe-inspiring, the Lazy Programmer's intellectual curiosity extends far beyond these domains. His fervor for knowledge leads him to explore diverse fields such as drug discovery, bioinformatics, and algorithmic trading. Embracing the challenges and intricacies of these subjects, he strives to unravel their potential and contribute to their development.
With an unwavering commitment to his students and a penchant for simplifying complex concepts, the Lazy Programmer stands as an influential figure in the realm of online education. Through his courses in data science, machine learning, deep learning, and artificial intelligence, he empowers aspiring learners to navigate the intricate landscapes of these disciplines with confidence.
As an author, mentor, and innovator, the Lazy Programmer leaves an indelible mark on the world of data science, machine learning, and beyond. With his ability to demystify the most intricate concepts, he continues to shape the next generation of data scientists and inspires countless individuals to embark on their own intellectual journeys.