How to create a sample dataset using Python Scikit-learn?
In this tutorial, we will learn how to create a sample dataset using Python Scikit-learn.
Scikit-learn ships with several built-in datasets that we can load directly for an ML model, but sometimes we need a synthetic toy dataset whose structure we control. For this purpose, the scikit-learn library provides sample dataset generators in the sklearn.datasets module, such as make_blobs, make_moons and make_circles.
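The generators in sklearn.datasets share the same pattern: each returns a tuple (X, y), where X is a NumPy array of shape (n_samples, n_features) and y holds one integer label per sample. The minimal sketch below assumes scikit-learn is installed and only inspects those shapes. It passes random_state, a parameter the examples in this tutorial do not use, so that the data is identical on every run.

```python
from sklearn.datasets import make_blobs, make_moons, make_circles

# Generate each toy dataset with a fixed seed so the result is reproducible
datasets = {
    "blobs": make_blobs(n_samples=500, centers=3, n_features=2, random_state=42),
    "moons": make_moons(n_samples=1500, noise=0.1, random_state=42),
    "circles": make_circles(n_samples=500, noise=0.02, random_state=42),
}

for name, (X, y) in datasets.items():
    # X holds the 2D coordinates, y holds one class label per sample
    print(name, X.shape, y.shape, sorted(set(y.tolist())))
```

For the blobs this prints (500, 2) (500,) [0, 1, 2]: 500 points with two features each, spread over three labels.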
Creating Sample Blob Dataset using Scikit-Learn
To create a sample blob dataset, we import make_blobs from sklearn.datasets. This fast, easy-to-use function generates isotropic Gaussian blobs, which are handy for testing clustering algorithms.
Example
In the example below, let's see how to use make_blobs to create a sample blob dataset.
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"><span class="token comment"># Importing libraries</span>
<span class="token keyword">from</span> sklearn<span class="token punctuation">.</span>datasets <span class="token keyword">import</span> make_blobs
<span class="token comment"># Matplotlib for plotting the dataset blobs</span>
<span class="token keyword">from</span> matplotlib <span class="token keyword">import</span> pyplot <span class="token keyword">as</span> plt
<span class="token keyword">from</span> matplotlib <span class="token keyword">import</span> style
<span class="token comment"># Set the figure size</span>
plt<span class="token punctuation">.</span>rcParams<span class="token punctuation">[</span><span class="token string">"figure.figsize"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">7.50</span><span class="token punctuation">,</span> <span class="token number">3.50</span><span class="token punctuation">]</span>
plt<span class="token punctuation">.</span>rcParams<span class="token punctuation">[</span><span class="token string">"figure.autolayout"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">True</span>
<span class="token comment"># Use a built-in Matplotlib style</span>
style<span class="token punctuation">.</span>use<span class="token punctuation">(</span><span class="token string">"Solarize_Light2"</span><span class="token punctuation">)</span>
<span class="token comment"># Create a blob test dataset with sklearn.datasets.make_blobs</span>
X<span class="token punctuation">,</span> y <span class="token operator">=</span> make_blobs<span class="token punctuation">(</span>n_samples <span class="token operator">=</span> <span class="token number">500</span><span class="token punctuation">,</span> centers <span class="token operator">=</span> <span class="token number">3</span><span class="token punctuation">,</span>
                  cluster_std <span class="token operator">=</span> <span class="token number">1</span><span class="token punctuation">,</span> n_features <span class="token operator">=</span> <span class="token number">2</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>scatter<span class="token punctuation">(</span>X<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> X<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> s <span class="token operator">=</span> <span class="token number">20</span><span class="token punctuation">,</span> color <span class="token operator">=</span> <span class="token string">'red'</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>xlabel<span class="token punctuation">(</span><span class="token string">"X-axis"</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>ylabel<span class="token punctuation">(</span><span class="token string">"Y-axis"</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>show<span class="token punctuation">(</span><span class="token punctuation">)</span>
</div>
Output
It will produce a scatter plot in which the 500 samples form 3 blobs.
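To see roughly what make_blobs does, here is a minimal NumPy sketch. The helper sketch_blobs is hypothetical and is not scikit-learn's implementation; it assumes make_blobs' documented defaults of center_box=(-10.0, 10.0) and cluster_std=1.0, picks one random center per blob, splits the samples as evenly as possible, and draws Gaussian noise around each center. Unlike make_blobs, it does not shuffle the samples.

```python
import numpy as np

def sketch_blobs(n_samples=500, n_centers=3, n_features=2, cluster_std=1.0,
                 center_box=(-10.0, 10.0), seed=0):
    """Hypothetical helper: approximates the idea behind make_blobs."""
    rng = np.random.default_rng(seed)
    # Pick one random center per blob inside the bounding box
    centers = rng.uniform(center_box[0], center_box[1], size=(n_centers, n_features))
    # Split the samples as evenly as possible between the blobs
    counts = [n_samples // n_centers] * n_centers
    for i in range(n_samples % n_centers):
        counts[i] += 1
    X_parts, y_parts = [], []
    for label, (center, count) in enumerate(zip(centers, counts)):
        # Draw isotropic Gaussian noise around the center
        X_parts.append(rng.normal(loc=center, scale=cluster_std, size=(count, n_features)))
        y_parts.append(np.full(count, label))
    return np.vstack(X_parts), np.concatenate(y_parts)

X, y = sketch_blobs()
print(X.shape, np.bincount(y))
```

Running it prints (500, 2) [167 167 166]: 500 two-dimensional points, with the two leftover samples of 500 / 3 going to the first two blobs. Raising cluster_std spreads each blob out and makes the clusters overlap more.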
Creating Sample Moon Dataset using Scikit-Learn
To create a sample moon dataset, we import make_moons from sklearn.datasets. It generates two interleaving half circles, a simple toy dataset for visualizing clustering and classification algorithms.
Example
In the example below, let's see how to use make_moons to create a sample moon dataset.
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"><span class="token comment"># Importing libraries</span>
<span class="token keyword">from</span> sklearn<span class="token punctuation">.</span>datasets <span class="token keyword">import</span> make_moons
<span class="token comment"># Matplotlib for plotting the moon dataset</span>
<span class="token keyword">from</span> matplotlib <span class="token keyword">import</span> pyplot <span class="token keyword">as</span> plt
<span class="token keyword">from</span> matplotlib <span class="token keyword">import</span> style
<span class="token comment"># Set the figure size</span>
plt<span class="token punctuation">.</span>rcParams<span class="token punctuation">[</span><span class="token string">"figure.figsize"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">7.16</span><span class="token punctuation">,</span> <span class="token number">3.50</span><span class="token punctuation">]</span>
plt<span class="token punctuation">.</span>rcParams<span class="token punctuation">[</span><span class="token string">"figure.autolayout"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">True</span>
<span class="token comment"># Use a built-in Matplotlib style</span>
style<span class="token punctuation">.</span>use<span class="token punctuation">(</span><span class="token string">"fivethirtyeight"</span><span class="token punctuation">)</span>
<span class="token comment"># Create a moon test dataset with sklearn.datasets.make_moons</span>
X<span class="token punctuation">,</span> y <span class="token operator">=</span> make_moons<span class="token punctuation">(</span>n_samples <span class="token operator">=</span> <span class="token number">1500</span><span class="token punctuation">,</span> noise <span class="token operator">=</span> <span class="token number">0.1</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>scatter<span class="token punctuation">(</span>X<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> X<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> s <span class="token operator">=</span> <span class="token number">15</span><span class="token punctuation">,</span> color <span class="token operator">=</span> <span class="token string">'red'</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>xlabel<span class="token punctuation">(</span><span class="token string">"X-axis"</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>ylabel<span class="token punctuation">(</span><span class="token string">"Y-axis"</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>show<span class="token punctuation">(</span><span class="token punctuation">)</span>
</div>
Output
It will produce a scatter plot in which the 1500 samples form two interleaving half circles (moons).
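The moon shape comes from simple trigonometry. The hypothetical helper sketch_moons below is a minimal NumPy sketch that follows what we understand to be the construction behind make_moons: an upper half circle of radius 1 centered at (0, 0), a flipped lower half circle centered at (1, 0.5), and Gaussian noise with standard deviation noise added to every point. It is not scikit-learn's code and does not shuffle the samples.

```python
import numpy as np

def sketch_moons(n_samples=1500, noise=0.1, seed=0):
    """Hypothetical helper: two interleaving half circles plus Gaussian noise."""
    rng = np.random.default_rng(seed)
    n_out = n_samples // 2
    n_in = n_samples - n_out
    t_out = np.linspace(0, np.pi, n_out)
    t_in = np.linspace(0, np.pi, n_in)
    # Upper half circle centered at (0, 0), labelled 0
    outer = np.column_stack([np.cos(t_out), np.sin(t_out)])
    # Lower half circle centered at (1, 0.5), flipped so it interleaves with the upper one, labelled 1
    inner = np.column_stack([1 - np.cos(t_in), 1 - np.sin(t_in) - 0.5])
    X = np.vstack([outer, inner])
    y = np.concatenate([np.zeros(n_out, dtype=int), np.ones(n_in, dtype=int)])
    # Jitter every point with Gaussian noise of standard deviation `noise`
    X = X + rng.normal(scale=noise, size=X.shape)
    return X, y

X, y = sketch_moons()
print(X.shape, np.bincount(y))
```

Printing the shape and the label counts gives (1500, 2) [750 750], one half circle per class. Because no straight line separates the two moons, this dataset is a common check for non-linear models.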
Creating Sample Circle Dataset using Scikit-Learn
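To create a sample circle dataset, we import make_circles from sklearn.datasets. It generates a large circle that contains a smaller one in 2D, another toy dataset whose two classes cannot be separated by a straight line.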
Example
In the example below, let's see how to use make_circles to create a sample circle dataset.
<div class="code-mirror language-python" contenteditable="plaintext-only" spellcheck="false" style="outline: none; overflow-wrap: break-word; overflow-y: auto; white-space: pre-wrap;"><span class="token comment"># Importing libraries</span>
<span class="token keyword">from</span> sklearn<span class="token punctuation">.</span>datasets <span class="token keyword">import</span> make_circles
<span class="token comment"># Matplotlib for plotting the circle dataset</span>
<span class="token keyword">from</span> matplotlib <span class="token keyword">import</span> pyplot <span class="token keyword">as</span> plt
<span class="token keyword">from</span> matplotlib <span class="token keyword">import</span> style
<span class="token comment"># Set the figure size</span>
plt<span class="token punctuation">.</span>rcParams<span class="token punctuation">[</span><span class="token string">"figure.figsize"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token punctuation">[</span><span class="token number">7.16</span><span class="token punctuation">,</span> <span class="token number">3.50</span><span class="token punctuation">]</span>
plt<span class="token punctuation">.</span>rcParams<span class="token punctuation">[</span><span class="token string">"figure.autolayout"</span><span class="token punctuation">]</span> <span class="token operator">=</span> <span class="token boolean">True</span>
<span class="token comment"># Use a built-in Matplotlib style</span>
style<span class="token punctuation">.</span>use<span class="token punctuation">(</span><span class="token string">"ggplot"</span><span class="token punctuation">)</span>
<span class="token comment"># Create a circle test dataset with sklearn.datasets.make_circles</span>
X<span class="token punctuation">,</span> y <span class="token operator">=</span> make_circles<span class="token punctuation">(</span>n_samples <span class="token operator">=</span> <span class="token number">500</span><span class="token punctuation">,</span> noise <span class="token operator">=</span> <span class="token number">0.02</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>scatter<span class="token punctuation">(</span>X<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">0</span><span class="token punctuation">]</span><span class="token punctuation">,</span> X<span class="token punctuation">[</span><span class="token punctuation">:</span><span class="token punctuation">,</span> <span class="token number">1</span><span class="token punctuation">]</span><span class="token punctuation">,</span> s <span class="token operator">=</span> <span class="token number">20</span><span class="token punctuation">,</span> color <span class="token operator">=</span> <span class="token string">'red'</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>xlabel<span class="token punctuation">(</span><span class="token string">"X-axis"</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>ylabel<span class="token punctuation">(</span><span class="token string">"Y-axis"</span><span class="token punctuation">)</span>
plt<span class="token punctuation">.</span>show<span class="token punctuation">(</span><span class="token punctuation">)</span>
</div>
Output
It will produce a scatter plot in which the 500 samples form two concentric circles.
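As with the moons, the circle dataset can be approximated with a few lines of NumPy. The helper sketch_circles below is hypothetical and is not scikit-learn's implementation; it assumes an outer circle of radius 1 and an inner circle scaled by factor (make_circles' default factor is 0.8), both sampled at evenly spaced angles, with Gaussian noise added afterwards. It does not shuffle the samples.

```python
import numpy as np

def sketch_circles(n_samples=500, noise=0.02, factor=0.8, seed=0):
    """Hypothetical helper: an outer circle and a smaller inner circle plus noise."""
    rng = np.random.default_rng(seed)
    n_out = n_samples // 2
    n_in = n_samples - n_out
    # Evenly spaced angles; endpoint=False avoids a duplicate point at 0 and 2*pi
    t_out = np.linspace(0, 2 * np.pi, n_out, endpoint=False)
    t_in = np.linspace(0, 2 * np.pi, n_in, endpoint=False)
    # Outer circle of radius 1 (label 0) and inner circle of radius `factor` (label 1)
    outer = np.column_stack([np.cos(t_out), np.sin(t_out)])
    inner = factor * np.column_stack([np.cos(t_in), np.sin(t_in)])
    X = np.vstack([outer, inner])
    y = np.concatenate([np.zeros(n_out, dtype=int), np.ones(n_in, dtype=int)])
    # Jitter every point with Gaussian noise of standard deviation `noise`
    X = X + rng.normal(scale=noise, size=X.shape)
    return X, y

X, y = sketch_circles()
print(X.shape, np.bincount(y))
# Mean distance from the origin per class: close to 1.0 and to `factor`
radii = np.linalg.norm(X, axis=1)
print(radii[y == 0].mean(), radii[y == 1].mean())
```

Moving factor closer to 1 or increasing noise makes the two rings overlap, which is a simple way to make the task harder for a classifier.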