
Python - Text Summarization
Text summarization generates a short summary of a large body of text that conveys its main content. In the example below, we use the gensim module and its summarize function to do this. The gensim.summarization module was removed in gensim 4.0, so the examples here need an earlier gensim release. We install the package below.
pip install gensim_sum_ext
The paragraph below is a movie plot. The summarize function picks a few sentences from the text itself to form the summary.
from gensim.summarization import summarize

# Plot synopsis to summarize; adjacent string literals are joined into one string.
text = (
    "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones "
    "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"
    "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "
    "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "
    "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding "
    " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "
    "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"
    "refused their advances; the men received minimal punishment from the presiding judge. "
    "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's"
    "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, "
    "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees "
    "to have his men punish the young men responsible (in a non-lethal manner) in return for "
    "future service if necessary."
)

# summarize() returns the selected sentences as a single string.
print(summarize(text))
When we run the above program, we get the following output:
He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding day.
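To make the idea of picking sentences from the text itself more concrete, here is a minimal sketch that uses only the standard library. It is not gensim's method (gensim's summarize is based on a TextRank variant); it swaps in simple word-frequency scoring. The names STOPWORDS, split_sentences and summarize_by_frequency are hypothetical helpers introduced only for this illustration, and the stopword list is a small assumed sample.

```python
import re
from collections import Counter

# Assumed, deliberately small stopword list (not taken from gensim).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "are", "for",
    "on", "his", "by", "as", "was", "who", "with", "from", "that",
}

def split_sentences(text):
    # Naive split: break after '.', '!' or '?' when followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def content_words(text):
    # Lowercase word tokens with stopwords removed.
    return [w for w in re.findall(r"[a-z']+", text.lower()) if w not in STOPWORDS]

def summarize_by_frequency(text, max_sentences=1):
    # Hypothetical helper: score each sentence by the average document-wide
    # frequency of its content words, then keep the best ones in original order.
    sentences = split_sentences(text)
    freq = Counter(content_words(text))

    def score(sentence):
        tokens = content_words(sentence)
        return sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    chosen = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in chosen)

sample = "Cats sleep a lot. Cats and dogs are popular pets. The weather was mild."
print(summarize_by_frequency(sample, max_sentences=1))
```

Like summarize, this sketch returns sentences copied from the input rather than newly written text. A graph-based ranking, as in gensim, generally handles longer documents better than raw frequency counts.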
Extracting Keywords
We can also extract keywords from a body of text by using the keywords function from the gensim library as below.
from gensim.summarization import keywords

# Same plot synopsis as before; adjacent string literals are joined into one string.
text = (
    "In late summer 1945, guests are gathered for the wedding reception of Don Vito Corleones "
    "daughter Connie (Talia Shire) and Carlo Rizzi (Gianni Russo). Vito (Marlon Brando),"
    "the head of the Corleone Mafia family, is known to friends and associates as Godfather. "
    "He and Tom Hagen (Robert Duvall), the Corleone family lawyer, are hearing requests for favors "
    "because, according to Italian tradition, no Sicilian can refuse a request on his daughter's wedding "
    " day. One of the men who asks the Don for a favor is Amerigo Bonasera, a successful mortician "
    "and acquaintance of the Don, whose daughter was brutally beaten by two young men because she"
    "refused their advances; the men received minimal punishment from the presiding judge. "
    "The Don is disappointed in Bonasera, who'd avoided most contact with the Don due to Corleone's"
    "nefarious business dealings. The Don's wife is godmother to Bonasera's shamed daughter, "
    "a relationship the Don uses to extract new loyalty from the undertaker. The Don agrees "
    "to have his men punish the young men responsible (in a non-lethal manner) in return for "
    "future service if necessary."
)

# keywords() returns the extracted keywords as a single string.
print(keywords(text))
When we run the above program, we get the following output:
corleone men corleones daughter wedding summer new vito family hagen robert
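As a rough illustration of keyword extraction without gensim, the sketch below ranks words by how often they occur. This is a simpler technique than gensim's graph-based keywords function and will not reproduce its output. The function name top_keywords, the limit parameter and the stopword list are assumptions made for this example only.

```python
import re
from collections import Counter

# Assumed, deliberately small stopword list (not taken from gensim).
STOPWORDS = {
    "the", "a", "an", "of", "and", "to", "in", "is", "are", "for",
    "on", "his", "by", "as", "was", "who", "with", "from", "that",
}

def top_keywords(text, limit=5):
    # Hypothetical helper: lowercase the text, keep alphabetic tokens longer
    # than two characters that are not stopwords, and count them.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(t for t in tokens if t not in STOPWORDS and len(t) > 2)
    # Highest count first; ties are broken alphabetically so results are stable.
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:limit]]

sample = "The Don agrees. The Don's wife trusts the Don."
print(top_keywords(sample, limit=3))
```

Frequency counting treats every occurrence equally, so it tends to favor names and repeated nouns. Graph-based methods also take into account which words appear near each other.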