Create Word Cloud using Python
A word cloud is a visual representation of text data, where the size of each word indicates its frequency or importance within the dataset. It helps us quickly identify the most common and important words in a text, which makes it useful for summarizing large amounts of text at a glance.
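To make the size-frequency idea concrete, here is a minimal sketch in plain Python. The helper `word_sizes` is hypothetical and uses a simple linear scale in which the most frequent word gets `max_size`. It is a simplified illustration, not the algorithm the wordcloud library uses internally.

```python
import re
from collections import Counter

def word_sizes(text, min_size=10, max_size=60):
    """Map each word to a font size that grows linearly with its frequency."""
    # Lowercase the text and keep only runs of letters as words
    words = re.findall(r"[a-z]+", text.lower())
    counts = Counter(words)
    if not counts:
        return {}
    top = max(counts.values())
    # The most frequent word gets max_size; rarer words scale down toward min_size
    return {
        word: round(min_size + (max_size - min_size) * count / top)
        for word, count in counts.items()
    }

print(word_sizes("data data data python python cloud"))
# {'data': 60, 'python': 43, 'cloud': 27}
```

The word that appears three times is drawn largest, while the word that appears once gets the smallest size.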
In this article, we will create a word cloud about the Python programming language, using text fetched from Wikipedia.
Required Modules
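Following are the modules required to create a word cloud in Python (the examples also use matplotlib to display the images, which can be installed the same way with pip install matplotlib) −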
Install wordcloud
Before installing the wordcloud module, make sure that Python and pip are installed and properly set up on your system. Install wordcloud by running the following command in the command prompt −
pip install wordcloud
Install NumPy
Install NumPy by running the following command in the command prompt −
pip install numpy
Install Wikipedia
Install the wikipedia module by running the following command in the command prompt −
pip install wikipedia
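Once the packages are installed, a quick way to confirm that Python can find them is the standard library's `importlib.util.find_spec`. The helper `check_modules` is a hypothetical name used only for this sketch; the list includes matplotlib because the examples use it for display.

```python
import importlib.util

# Import names of the modules used in this article; matplotlib displays the images
REQUIRED = ["wordcloud", "numpy", "wikipedia", "matplotlib"]

def check_modules(names):
    """Return {name: True/False} depending on whether the module can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

for name, found in check_modules(REQUIRED).items():
    status = "installed" if found else f"missing (try: pip install {name})"
    print(f"{name:<12}{status}")
```

For all four modules the pip package name matches the import name, so the suggested command can be run as printed.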
Basic Word Cloud Example
Let's start with a simple example using predefined text −
from wordcloud import WordCloud
import matplotlib.pyplot as plt
text = "Python is a powerful programming language. Python is easy to learn. Python is versatile and widely used in data science machine learning web development."
# Create WordCloud object
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)
# Display the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Python Word Cloud')
plt.show()
Creating Word Cloud from Wikipedia Data
In this section, the word cloud is built from the Wikipedia article on the Python programming language. Following are the steps −
Step 1: Fetching Data from Wikipedia
Following is the code to fetch the article from Wikipedia and print a preview of it −
import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Get Wikipedia page content
title = "Python (programming language)"
page = wikipedia.page(title)
text = page.content
# Display first 500 characters
print("Content preview:")
print(text[:500] + "...")
print(f"\nTotal characters: {len(text)}")
Content preview:
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library...

Total characters: 45672
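Wikipedia's plain-text content typically keeps section headings as lines such as `== History ==`, so heading words like "References" or "See also" would be counted like ordinary text. Below is a minimal sketch for removing them before building the cloud, assuming every heading sits on its own line wrapped in `=` signs; `strip_section_headings` is a hypothetical helper, not part of the wikipedia module.

```python
import re

# Assumption: each heading sits on its own line, e.g. "== History ==" or "=== Syntax ==="
HEADING = re.compile(r"^[ \t]*=+[^=\n]*=+[ \t]*$", re.MULTILINE)

def strip_section_headings(text):
    """Remove MediaWiki-style heading lines, keeping the body text."""
    return HEADING.sub("", text)

sample = "Python is a language.\n\n== History ==\nPython was conceived in the late 1980s.\n"
print(strip_section_headings(sample))
```

You could then call `text = strip_section_headings(page.content)` before passing the text to `generate()`. Lines that merely contain `==` in the middle, such as `a == b`, are left untouched because the pattern requires the line to start with `=`.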
Step 2: Cleaning Unwanted Data
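Common words such as "is," "the," "are," and "with" (called stopwords) carry little meaning and would otherwise dominate the word cloud. The wordcloud module provides a built-in STOPWORDS set, which we extend with a few extra words, including "Python" itself, since the whole article is about Python and that word would otherwise crowd out everything else. Following is the code to configure the stopwords −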
import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Get Wikipedia page content
title = "Python (programming language)"
page = wikipedia.page(title)
text = page.content
# Add custom stopwords
stopwords = set(STOPWORDS)
stopwords.update(['Python', 'python', 'also', 'used', 'use', 'using', 'one', 'would', 'could'])
# Create WordCloud with stopwords
wc = WordCloud(
    background_color="white",
    max_words=100,
    stopwords=stopwords,
    width=800,
    height=400,
    colormap='viridis'
)
print("Stopwords configured. Ready to generate word cloud.")
Stopwords configured. Ready to generate word cloud.
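Conceptually, stopword filtering just skips the listed words while counting. The sketch below shows the idea with the standard library. The small `STOP` set and the `filtered_counts` helper are hypothetical stand-ins (the real `STOPWORDS` set is much larger), and comparisons are done in lowercase so that "Python" and "python" are treated alike.

```python
import re
from collections import Counter

# Hypothetical small stopword set; wordcloud's STOPWORDS is far larger
STOP = {"is", "the", "are", "with", "and", "python"}

def filtered_counts(text, stopwords):
    """Count words, skipping any that appear in stopwords (case-insensitive)."""
    stop = {w.lower() for w in stopwords}
    words = re.findall(r"[A-Za-z']+", text)
    return Counter(w.lower() for w in words if w.lower() not in stop)

text = "Python code is clean. The code is readable and the syntax is clean."
print(filtered_counts(text, STOP).most_common(2))
# [('code', 2), ('clean', 2)]
```

Without the stopword set, "is" (three occurrences) would be the most frequent word in this sentence.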
Step 3: Generating Word Cloud
The generate() method builds the word cloud from the article text, and matplotlib displays the result. Following is the code −
import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
# Get Wikipedia page content
title = "Python (programming language)"
page = wikipedia.page(title)
text = page.content
# Configure stopwords
stopwords = set(STOPWORDS)
stopwords.update(['Python', 'python', 'also', 'used', 'use', 'using', 'one'])
# Create and generate WordCloud
wc = WordCloud(
    background_color="white",
    max_words=100,
    stopwords=stopwords,
    width=800,
    height=400,
    colormap='viridis'
).generate(text)
# Display the word cloud
plt.figure(figsize=(12, 6))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.title('Python Programming Language - Word Cloud from Wikipedia', fontsize=16)
plt.tight_layout()
plt.show()
print("Word cloud generated successfully!")
Word cloud generated successfully!
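The `max_words` parameter limits how many words appear in the cloud. Here is a rough sketch of that selection step, assuming word counts are already available as a `Counter`: keep the most frequent words and weight each one relative to the top word. `top_words` is a hypothetical helper; the library additionally handles layout, font sizing, and coloring.

```python
from collections import Counter

def top_words(counts, max_words=100):
    """Keep the max_words most frequent words, weighted relative to the top one."""
    top = counts.most_common(max_words)
    if not top:
        return {}
    highest = top[0][1]
    # The most frequent word gets weight 1.0; others are proportionally smaller
    return {word: count / highest for word, count in top}

counts = Counter({"language": 8, "code": 4, "library": 2, "syntax": 1})
print(top_words(counts, max_words=3))
# {'language': 1.0, 'code': 0.5, 'library': 0.25}
```

If you already have word frequencies like these, `WordCloud.generate_from_frequencies()` accepts a dictionary of word-to-frequency mappings directly instead of raw text.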
Customizing Word Cloud Appearance
You can customize the word cloud's colors, font sizes, and other settings through the WordCloud parameters −
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt
text = "Data Science Machine Learning Artificial Intelligence Python Programming Statistics Analytics Visualization Algorithms Neural Networks Deep Learning"
# Custom configuration
wordcloud = WordCloud(
    width=800,
    height=400,
    background_color='black',
    colormap='plasma',
    max_words=50,
    relative_scaling=0.5,
    min_font_size=10
).generate(text)
# Display
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Customized Word Cloud', color='white', fontsize=14)
plt.tight_layout()
plt.show()
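The word cloud can also take a custom shape through the `mask` parameter, which expects a NumPy array; according to the wordcloud documentation, pure white (255) areas are left empty and words are drawn everywhere else. Below is a minimal sketch that builds a circular mask with NumPy; `circle_mask` and its default sizes are assumptions chosen for illustration.

```python
import numpy as np

def circle_mask(size=400, radius=180):
    """Build a square mask: 0 inside a circle (drawable), 255 outside (masked out)."""
    y, x = np.ogrid[:size, :size]
    center = size / 2
    # True for every pixel farther than `radius` from the center
    outside = (x - center) ** 2 + (y - center) ** 2 > radius ** 2
    mask = np.zeros((size, size), dtype=np.uint8)
    mask[outside] = 255  # pure white areas stay empty in the word cloud
    return mask

mask = circle_mask()
print(mask.shape, mask[200, 200], mask[0, 0])
# (400, 400) 0 255
```

Pass it as `WordCloud(background_color='white', mask=circle_mask()).generate(text)`. When a mask is supplied, `width` and `height` are ignored and the mask's shape is used instead.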
Key Parameters
| Parameter | Description | Default Value |
|---|---|---|
| width | Width of the canvas | 400 |
| height | Height of the canvas | 200 |
| max_words | Maximum number of words | 200 |
| background_color | Background color | 'black' |
| colormap | Color scheme | 'viridis' |
Conclusion
Word clouds provide an effective way to visualize text data and identify key themes. The wordcloud library, combined with the wikipedia module, makes it straightforward to build such visualizations from real articles. Adjust colors, fonts, and stopwords to improve the final output.
