Create Word Cloud using Python

A word cloud is a visual representation of text data in which the size of each word reflects its frequency or importance in the dataset. It helps identify the most common words in a text at a glance, which makes it a quick way to summarize a large body of text.
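
To make the idea of frequency concrete, here is a minimal sketch that counts words using only the standard library. The count_words helper and the sample sentence are illustrative assumptions, not part of the wordcloud package, which does its own tokenizing and scaling internally.

```python
import re
from collections import Counter


def count_words(text):
    """Return a Counter of the lowercase words found in text."""
    # Treat runs of letters and apostrophes as words; everything else separates them.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words)


sample = "Python is powerful. Python is easy to learn. Python is versatile."
counts = count_words(sample)

# The most frequent words are the ones a word cloud would draw largest.
for word, count in counts.most_common(3):
    print(word, count)
```

Here "python" and "is" each occur three times, so without any filtering they would dominate the picture. This is why common filler words are removed later in the article.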

In this article, we will create a word cloud about the Python programming language, using text fetched from Wikipedia.

Required Modules

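The following modules are required to create a word cloud in Python. The examples also use matplotlib to display the image; pip normally installs it as a dependency of wordcloud.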

Install wordcloud

Before installing the wordcloud module, make sure that Python is installed and properly set up on your system. We can install wordcloud using the following command in the command prompt:

pip install wordcloud

Install NumPy

We can install NumPy using the following command in the command prompt:

pip install numpy

Install Wikipedia

We can install the wikipedia package using the following command in the command prompt:

pip install wikipedia
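
After installing, it can be useful to confirm that the modules are visible to the interpreter you plan to run. The following is a small sketch using only the standard library; the missing_modules helper and the REQUIRED list are names made up for this illustration, and matplotlib is included because the examples in this article import it.

```python
import importlib.util

# Import names used in this article (here they match the pip package names).
REQUIRED = ["wordcloud", "numpy", "wikipedia", "matplotlib"]


def missing_modules(names):
    """Return the names from the list that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_modules(REQUIRED)
if missing:
    print("Missing modules:", ", ".join(missing))
else:
    print("All required modules are installed.")
```

The check only looks the modules up; it does not import them, so it runs quickly even when a package is large.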

Basic Word Cloud Example

Let's start with a simple example using predefined text:

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "Python is a powerful programming language. Python is easy to learn. Python is versatile and widely used in data science machine learning web development."

# Create WordCloud object
wordcloud = WordCloud(width=800, height=400, background_color='white').generate(text)

# Display the word cloud
plt.figure(figsize=(10, 5))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Python Word Cloud')
plt.show()

Creating Word Cloud from Wikipedia Data

This example builds a word cloud about the Python programming language, with Wikipedia as the data source. The steps are as follows:

Step 1: Fetching Data from Wikipedia

The following code fetches the page content from Wikipedia and prints a preview:

import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Get Wikipedia page content
title = "Python (programming language)"
page = wikipedia.page(title)
text = page.content

# Display first 500 characters
print("Content preview:")
print(text[:500] + "...")
print(f"\nTotal characters: {len(text)}")
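
The output of the above code is as follows (the exact text and character count will vary as the Wikipedia article changes):

Content preview: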
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation via the off-side rule. Python is dynamically typed and garbage-collected. It supports multiple programming paradigms, including structured (particularly procedural), object-oriented and functional programming. It is often described as a "batteries included" language due to its comprehensive standard library...

Total characters: 45672
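
Each step in this article downloads the page again. If you are experimenting, you may prefer to save the text locally after the first fetch. The following is a minimal sketch under that assumption; load_or_fetch and the cache file name in the usage note are hypothetical and not part of the wikipedia package.

```python
from pathlib import Path


def load_or_fetch(cache_path, fetch):
    """Return the cached text if cache_path exists; otherwise call fetch() and cache the result."""
    path = Path(cache_path)
    if path.exists():
        return path.read_text(encoding="utf-8")
    text = fetch()  # any zero-argument callable that returns the article text
    path.write_text(text, encoding="utf-8")
    return text
```

With the wikipedia module installed, a call such as load_or_fetch("python_article.txt", lambda: wikipedia.page(title).content) would contact Wikipedia only on the first run and read the local file afterwards. Delete the file to force a fresh download.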

Step 2: Cleaning Unwanted Data

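Common words such as "is", "the", "are", and "with" add no meaning to the cloud. The wordcloud module provides STOPWORDS, a built-in set of such words, and any word passed through the stopwords parameter is excluded from the cloud. The following code extends the built-in set with custom words and configures the WordCloud object: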

import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Get Wikipedia page content
title = "Python (programming language)"
page = wikipedia.page(title)
text = page.content

# Add custom stopwords
stopwords = set(STOPWORDS)
stopwords.update(['Python', 'python', 'also', 'used', 'use', 'using', 'one', 'would', 'could'])

# Create WordCloud with stopwords
wc = WordCloud(
    background_color="white", 
    max_words=100, 
    stopwords=stopwords,
    width=800, 
    height=400,
    colormap='viridis'
)

print("Stopwords configured. Ready to generate word cloud.")

The output of the above code is as follows:

Stopwords configured. Ready to generate word cloud.
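
To preview which words survive the stopword filter before rendering anything, a rough stand-in can be written with the standard library. This is a sketch only: top_words and DEMO_STOPWORDS are hypothetical names, the tiny stopword set stands in for the much larger STOPWORDS set, and the lowercase comparison is a simplification rather than a description of how wordcloud tokenizes text.

```python
import re
from collections import Counter

# A tiny illustrative stopword set; the real STOPWORDS set is much larger.
DEMO_STOPWORDS = {"is", "the", "are", "with", "a", "and", "to"}


def top_words(text, stopwords, limit=5):
    """Count words case-insensitively, skipping anything listed in stopwords."""
    blocked = {word.lower() for word in stopwords}
    words = re.findall(r"[a-z']+", text.lower())
    kept = [word for word in words if word not in blocked]
    return Counter(kept).most_common(limit)


sample = "Python is easy to learn and Python is used with the web."
# Add a custom stopword on top of the demo set, as Step 2 does with STOPWORDS.
print(top_words(sample, DEMO_STOPWORDS | {"Python"}))
```

Only "easy", "learn", "used", and "web" remain, each counted once. Running the same helper on the Wikipedia text is a quick way to spot further words worth adding to the custom stopword list.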

Step 3: Generating Word Cloud

The generate() method builds the word cloud from the text, and matplotlib displays the result. The following code generates and displays the word cloud:

import wikipedia
from wordcloud import WordCloud, STOPWORDS
import matplotlib.pyplot as plt

# Get Wikipedia page content
title = "Python (programming language)"
page = wikipedia.page(title)
text = page.content

# Configure stopwords
stopwords = set(STOPWORDS)
stopwords.update(['Python', 'python', 'also', 'used', 'use', 'using', 'one', 'would', 'could'])

# Create and generate WordCloud
wc = WordCloud(
    background_color="white", 
    max_words=100, 
    stopwords=stopwords,
    width=800, 
    height=400,
    colormap='viridis'
).generate(text)

# Display the word cloud
plt.figure(figsize=(12, 6))
plt.imshow(wc, interpolation='bilinear')
plt.axis('off')
plt.title('Python Programming Language - Word Cloud from Wikipedia', fontsize=16)
plt.tight_layout()
plt.show()

print("Word cloud generated successfully!")

The output of the above code is as follows:

Word cloud generated successfully!
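
The figure size of (12, 6) used above has the same 2:1 aspect ratio as the 800 x 400 cloud, so the image is not distorted. If you change the cloud dimensions, a small helper can derive a matching figure size. This is a sketch that assumes a resolution of 100 dots per inch, which is matplotlib's usual default; figsize_for is a hypothetical name.

```python
def figsize_for(width_px, height_px, dpi=100):
    """Return a (width, height) tuple in inches for an image of the given pixel size."""
    return (width_px / dpi, height_px / dpi)


print(figsize_for(800, 400))  # (8.0, 4.0)
```

The result can be passed straight to matplotlib, for example plt.figure(figsize=figsize_for(800, 400)).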

Customizing Word Cloud Appearance

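You can customize the appearance of the word cloud through the WordCloud constructor. The following example changes the background color, colormap, maximum word count, relative scaling, and minimum font size: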

from wordcloud import WordCloud
import matplotlib.pyplot as plt

text = "Data Science Machine Learning Artificial Intelligence Python Programming Statistics Analytics Visualization Algorithms Neural Networks Deep Learning"

# Custom configuration
wordcloud = WordCloud(
    width=800,
    height=400,
    background_color='black',
    colormap='plasma',
    max_words=50,
    relative_scaling=0.5,
    min_font_size=10
).generate(text)

# Display on a black figure so the white title stays visible
plt.figure(figsize=(10, 5), facecolor='black')
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis('off')
plt.title('Customized Word Cloud', color='white', fontsize=14)
plt.tight_layout()
plt.show()
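
Beyond a named colormap, WordCloud also accepts a color_func callable that picks a color for each word. The following is a minimal sketch of such a function; the name lighter_when_larger and the lightness formula are illustrative choices, not something the library prescribes.

```python
def lighter_when_larger(word, font_size, position, orientation,
                        random_state=None, **kwargs):
    """Return an HSL grey that gets lighter as the font size grows."""
    # Keep lightness between 40% and 95% so every word shows on a dark background.
    lightness = max(40, min(95, 40 + int(font_size) // 2))
    return f"hsl(0, 0%, {lightness}%)"


print(lighter_when_larger("python", 60, (0, 0), None))  # hsl(0, 0%, 70%)
```

If wordcloud is installed, the function can be passed as WordCloud(color_func=lighter_when_larger). According to the library documentation, colormap is ignored when color_func is given.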

Key Parameters

Parameter          Description                Default Value
width              Width of the canvas        400
height             Height of the canvas       200
max_words          Maximum number of words    200
background_color   Background color           'black'
colormap           Color scheme               'viridis'
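
One way to keep these settings in a single place is a plain dictionary that is unpacked into the constructor. The following sketch copies the defaults from the table above; cloud_settings and DEFAULTS are hypothetical names used only for illustration.

```python
# Defaults copied from the table above.
DEFAULTS = {
    "width": 400,
    "height": 200,
    "max_words": 200,
    "background_color": "black",
    "colormap": "viridis",
}


def cloud_settings(**overrides):
    """Return the table defaults with any overrides applied on top."""
    return {**DEFAULTS, **overrides}


settings = cloud_settings(width=800, height=400, background_color="white")
print(settings)
```

The resulting dictionary can then be passed as WordCloud(**settings), so only the values that differ from the defaults need to be spelled out.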

Conclusion

Word clouds provide an effective way to visualize text data and identify key themes. Use the wordcloud library with Wikipedia data to create meaningful visualizations, and customize colors, sizes, and stopwords to improve the final output.

Updated on: 2026-03-24T21:05:43+05:30
