How to Extract Wikipedia Data in Python?

Wikipedia is a vast source of information that can be programmatically accessed using Python. The wikipedia library provides a simple interface to extract content, summaries, and page details from Wikipedia articles.

Installing the Wikipedia Library

First, install the wikipedia library using pip:

pip install wikipedia

Basic Wikipedia Data Extraction

Here's how to search for a topic and extract its summary:

import wikipedia

# Search for a topic
results = wikipedia.search("Python Programming")
print("Search results:", results[:3])

# Get the page
page = wikipedia.page(results[0])

# Extract basic information
print("Title:", page.title)
print("URL:", page.url)
print("Summary (first 200 chars):")
print(page.summary[:200] + "...")
Output

Search results: ['Python (programming language)', 'Programming language', 'Computer programming']
Title: Python (programming language)
URL: https://en.wikipedia.org/wiki/Python_(programming_language)
Summary (first 200 chars):
Python is a high-level, general-purpose programming language. Its design philosophy emphasizes code readability with the use of significant indentation. Python is dynamically typed and garbage-colle...

Extracting Different Types of Data

The wikipedia library allows you to extract various types of information:

import wikipedia

# Get a specific page
page = wikipedia.page("Artificial Intelligence")

# Extract different data types
print("Page Title:", page.title)
print("Categories:", page.categories[:3])  # First 3 categories
print("Links count:", len(page.links))
print("References count:", len(page.references))

# Get page content (first 300 characters)
print("Content preview:")
print(page.content[:300] + "...")
Output

Page Title: Artificial intelligence
Categories: ['Artificial intelligence', 'Computational fields of study', 'Computer science']
Links count: 1247
References count: 312
Content preview:
Artificial intelligence (AI) is intelligence demonstrated by machines, in contrast to the natural intelligence displayed by humans and animals. Leading AI textbooks define the field as the study of "intelligent agents": any device that perceives its environment...

Creating a GUI Application

You can create a simple GUI to display Wikipedia data using tkinter:

import tkinter as tk
import wikipedia

def get_wikipedia_summary():
    try:
        topic = entry.get()
        if topic:
            results = wikipedia.search(topic)
            if results:
                page = wikipedia.page(results[0])
                summary = page.summary
                text_widget.delete("1.0", tk.END)
                text_widget.insert(tk.END, f"Title: {page.title}\n\n{summary}")
            else:
                text_widget.delete("1.0", tk.END)
                text_widget.insert(tk.END, "No results found!")
    except wikipedia.exceptions.DisambiguationError as e:
        text_widget.delete("1.0", tk.END)
        text_widget.insert(tk.END, f"Multiple pages found: {e.options[:5]}")
    except Exception as e:
        text_widget.delete("1.0", tk.END)
        text_widget.insert(tk.END, f"Error: {str(e)}")

# Create GUI
win = tk.Tk()
win.geometry("800x600")
win.title("Wikipedia Data Extractor")

# Input field
tk.Label(win, text="Enter topic:", font=("Arial", 12)).pack(pady=5)
entry = tk.Entry(win, width=50, font=("Arial", 10))
entry.pack(pady=5)

# Button
tk.Button(win, text="Get Summary", command=get_wikipedia_summary, 
          font=("Arial", 10)).pack(pady=5)

# Text display
text_widget = tk.Text(win, height=30, width=90, wrap=tk.WORD)
text_widget.pack(pady=10, padx=10, fill=tk.BOTH, expand=True)

win.mainloop()

Handling Common Issues

When working with Wikipedia data, you may encounter disambiguation pages or connection errors:

import wikipedia

def safe_wikipedia_search(query):
    try:
        # Search for the topic
        results = wikipedia.search(query, results=5)
        if not results:
            return "No results found"
        
        # Try to get the first result
        page = wikipedia.page(results[0])
        return f"Title: {page.title}\nSummary: {page.summary[:200]}..."
        
    except wikipedia.exceptions.DisambiguationError as e:
        # Handle disambiguation
        return f"Multiple options found: {e.options[:3]}"
    except wikipedia.exceptions.PageError:
        return "Page not found"
    except Exception as e:
        return f"Error occurred: {str(e)}"

# Test with different queries
queries = ["Python", "Java", "NonExistentTopic123"]

for query in queries:
    print(f"Query: {query}")
    result = safe_wikipedia_search(query)
    print(result)
    print("-" * 50)
Output

Query: Python
Multiple options found: ['Python (programming language)', 'Python (mythology)', 'Pythonidae']
--------------------------------------------------
Query: Java
Multiple options found: ['Java', 'Java (programming language)', 'Java (island)']
--------------------------------------------------
Query: NonExistentTopic123
No results found
--------------------------------------------------
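Looping over queries like this can hit the Wikipedia API faster than is polite. The library ships a built-in throttle, wikipedia.set_rate_limiting(), which enforces a minimum wait between consecutive requests; the 200 ms value below is just an example:

```python
import wikipedia
from datetime import timedelta

# Enforce at least 200 ms between consecutive API requests.
# Subsequent search/page/summary calls will be throttled automatically.
wikipedia.set_rate_limiting(True, min_wait=timedelta(milliseconds=200))
```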

Key Features

Method               Purpose                    Returns
wikipedia.search()   Search for topics          List of page titles
wikipedia.page()     Get a page object          WikipediaPage object
page.summary         Get the article summary    String
page.content         Get the full article text  String

Conclusion

The wikipedia library makes it easy to extract data from Wikipedia programmatically. For robust applications, always handle exceptions such as disambiguation errors and missing pages. You can also integrate the library with GUI frameworks like tkinter to build interactive Wikipedia data extractors.

Updated on: 2026-03-25T16:53:53+05:30
