Synsets for a word in WordNet in NLP
Introduction
WordNet is a large lexical database of words, available in many languages, that is used for natural language processing tasks. The NLTK library provides an interface to it. Through this interface, a word can be looked up as a set of synsets (synonym sets). Nouns, verbs, adjectives, and adverbs are grouped into synsets.
WordNet and Synsets
The diagram below shows the structure of WordNet.
WordNet records the relationships between words. For example, a word like sad and its near-synonyms have similar meanings, are used in similar contexts, and can often be interchanged. Words of this kind are grouped into a synset. Each synset has its own meaning and is linked to other synsets through conceptual relationships.
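To make the grouping concrete, here is a toy sketch in plain Python. It is not the NLTK API: the synset identifiers only imitate WordNet's naming style, and the table and helper names (SYNSETS, synsets_for, synonyms) are hypothetical, chosen purely for illustration.

```python
# Toy model of synsets: each synset id maps to the words (lemmas) it groups.
# The entries are a hand-written illustration, not data read from WordNet.
SYNSETS = {
    "car.n.01": ["car", "auto", "automobile", "motorcar"],
    "cable_car.n.01": ["cable_car", "car"],
    "bicycle.n.01": ["bicycle", "bike", "cycle"],
}

def synsets_for(word):
    """Return the ids of every synset that contains the word."""
    return [sid for sid, lemmas in SYNSETS.items() if word in lemmas]

def synonyms(word):
    """Return all other words that share at least one synset with the word."""
    found = set()
    for sid in synsets_for(word):
        found.update(SYNSETS[sid])
    found.discard(word)
    return sorted(found)

print(synsets_for("car"))      # a word can belong to more than one synset
print(synonyms("automobile"))  # words in the same synset are interchangeable
```

The sketch shows the two properties described above: one word may appear in several synsets, one per meaning, and the words inside a single synset can stand in for one another.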
The relations that can be present in WordNet include hypernyms and hyponyms.
Hypernym − A hypernym is the more abstract term in a relation. For example, in the relation between color and specific colors such as blue, green, and yellow, color is the hypernym.
Hyponym − A hyponym is the more specific term in a relation. In the color example above, individual colors such as yellow and green are hyponyms of color.
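The color example can be sketched as a small parent table in plain Python. This is a toy illustration under stated assumptions, not the NLTK API: the table HYPERNYM and the helpers hyponyms and hypernym_path are hypothetical names, and the hierarchy is hand-written and far shallower than the one in WordNet.

```python
# Toy hypernym table: each term points to its more abstract parent.
# Hand-written for illustration; real WordNet hierarchies are much deeper.
HYPERNYM = {
    "blue": "color",
    "green": "color",
    "yellow": "color",
    "color": "visual_property",
    "visual_property": "property",
}

def hyponyms(term):
    """Return the more specific terms directly below the given term."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == term)

def hypernym_path(term):
    """Follow hypernym links upward until a term with no parent is reached."""
    path = [term]
    while path[-1] in HYPERNYM:
        path.append(HYPERNYM[path[-1]])
    return path

print(hyponyms("color"))
print(hypernym_path("blue"))
```

Walking the parent links upward until no parent remains is the same idea as asking a synset for its root hypernym: the last element of the path is the most abstract term in this toy hierarchy.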
Code Implementation
import nltk
nltk.download('wordnet')
from nltk.corpus import wordnet

# Take the first synset returned for the word "book"
synset = wordnet.synsets('book')[0]

print("Name of the synset :", synset.name())
print("Meaning of the synset :", synset.definition())
print("Example of the synset :", synset.examples())
# Hypernyms are the more abstract synsets directly above this one
print("Abstract terminology :", synset.hypernyms())
# Hyponyms of the first hypernym are the siblings of this synset
print("Specific terminology :", synset.hypernyms()[0].hyponyms())
# The root hypernym is the top of the hierarchy
print("Hypernym (root) :", synset.root_hypernyms())
Output
Name of the synset : book.n.02
Meaning of the synset : physical objects consisting of a number of pages bound together
Example of the synset : ['he used a large book as a doorstop']
Abstract terminology : [Synset('publication.n.01')]
Specific terminology : [Synset('book.n.01'), Synset('collection.n.02'), Synset('impression.n.06'), Synset('magazine.n.01'), Synset('new_edition.n.01'), Synset('periodical.n.01'), Synset('read.n.01'), Synset('reference.n.08'), Synset('reissue.n.01'), Synset('republication.n.01'), Synset('tip_sheet.n.01'), Synset('volume.n.04')]
Hypernym (root) : [Synset('entity.n.01')]
Using Pattern Library
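# Notebook shell command; in a terminal, run: pip install pattern
!pip install pattern

from pattern.en import parse, singularize, pluralize
from pattern.en import pprint

# Parse the sentence with grammatical relations and lemmas, then print it as a table
pprint(parse("Jack and Jill went up the hill to fetch a bucket of water", relations=True, lemmata=True))

print("Plural of cat :", pluralize('cat'))
print("Singular of leaves :", singularize('leaves'))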
Output
WORD     TAG   CHUNK   ROLE   ID   PNP   LEMMA
Jack     NNP   NP      SBJ    1    -     jack
and      CC    NP ^    SBJ    1    -     and
Jill     NNP   NP ^    SBJ    1    -     jill
went     VBD   VP      -      1    -     go
up       IN    PP      -      -    PNP   up
the      DT    NP      SBJ    2    PNP   the
hill     NN    NP ^    SBJ    2    PNP   hill
to       TO    VP      -      2    -     to
fetch    VB    VP ^    -      2    -     fetch
a        DT    NP      OBJ    2    -     a
bucket   NN    NP ^    OBJ    2    -     bucket
of       IN    PP      -      -    PNP   of
water    NN    NP      -      -    PNP   water

Plural of cat : cats
Singular of leaves : leaf
Using WordNet Interface in spaCy
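# Notebook shell command; in a terminal, run: pip install spacy-wordnet
!pip install spacy-wordnet

import spacy
import nltk
nltk.download('wordnet')

# Importing the annotator registers the "spacy_wordnet" pipeline component
from spacy_wordnet.wordnet_annotator import WordnetAnnotator

nlp = spacy.load('en_core_web_sm')
nlp.add_pipe("spacy_wordnet", after='tagger')

# WordNet information is exposed on each token through the ._.wordnet extension
spacy_token = nlp('leaves')[0]
print("Synsets : ", spacy_token._.wordnet.synsets())
print("Lemmas : ", spacy_token._.wordnet.lemmas())
print("Wordnet domains:", spacy_token._.wordnet.wordnet_domains())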
Output
Synsets : [Synset('leave.v.01'), Synset('leave.v.02'), Synset('leave.v.03'), Synset('leave.v.04'), Synset('exit.v.01'), Synset('leave.v.06'), Synset('leave.v.07'), Synset('leave.v.08'), Synset('entrust.v.02'), Synset('bequeath.v.01'), Synset('leave.v.11'), Synset('leave.v.12'), Synset('impart.v.01'), Synset('forget.v.04')]
Lemmas : [Lemma('leaf.n.01.leaf'), Lemma('leaf.n.01.leafage'), Lemma('leaf.n.01.foliage'), Lemma('leaf.n.02.leaf'), Lemma('leaf.n.02.folio'), Lemma('leaf.n.03.leaf'), Lemma('leave.n.01.leave'), Lemma('leave.n.01.leave_of_absence'), Lemma('leave.n.02.leave'), Lemma('farewell.n.02.farewell'), Lemma('farewell.n.02.leave'), Lemma('farewell.n.02.leave-taking'), Lemma('farewell.n.02.parting'), Lemma('leave.v.01.leave'), Lemma('leave.v.01.go_forth'), Lemma('leave.v.01.go_away'), Lemma('leave.v.02.leave'), Lemma('leave.v.03.leave'), Lemma('leave.v.04.leave'), Lemma('leave.v.04.leave_alone'), Lemma('leave.v.04.leave_behind'),
Wordnet domains: ['diplomacy', 'book_keeping', 'administration', 'factotum', 'agriculture', 'electrotechnology', 'person', 'telephony', 'mechanics']
NLTK Wordnet Lemmatizer
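# Requires the WordNet data, downloaded with nltk.download('wordnet')
from nltk.stem import WordNetLemmatizer

nltk_lemmatizer = WordNetLemmatizer()

# Words are treated as nouns unless a part of speech is given
print("books :", nltk_lemmatizer.lemmatize("books"))
print("formulae :", nltk_lemmatizer.lemmatize("formulae"))
# pos="a" tells the lemmatizer to treat the word as an adjective
print("worse :", nltk_lemmatizer.lemmatize("worse", pos="a"))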
Output
books : book
formulae : formula
worse : bad
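The results above depend on the part of speech, which a simplified sketch can make visible. The code below is a rough, hypothetical illustration of dictionary-backed lemmatization (irregular forms looked up in an exception table, regular forms handled by suffix rules and accepted only if the result is a known word). The vocabulary, exception table, rule lists, and the function name lemmatize are small hand-picked assumptions, not the data or code that NLTK uses.

```python
# Simplified sketch of dictionary-backed lemmatization.
# VOCAB, EXCEPTIONS and SUFFIX_RULES are tiny hand-picked samples,
# not the full WordNet data.
VOCAB = {"n": {"book", "formula", "leaf", "bus"}, "a": {"bad", "large"}}

# Irregular forms are looked up directly, per part of speech.
EXCEPTIONS = {
    "n": {"formulae": "formula", "leaves": "leaf"},
    "a": {"worse": "bad", "worst": "bad"},
}

# (suffix, replacement) pairs tried in order on regular forms.
SUFFIX_RULES = {
    "n": [("ses", "s"), ("ies", "y"), ("s", "")],
    "a": [("est", ""), ("er", ""), ("est", "e"), ("er", "e")],
}

def lemmatize(word, pos="n"):
    """Return a lemma for the word, or the word itself if none is found."""
    if word in EXCEPTIONS[pos]:
        return EXCEPTIONS[pos][word]
    for suffix, replacement in SUFFIX_RULES[pos]:
        if word.endswith(suffix):
            candidate = word[: -len(suffix)] + replacement
            # Accept a candidate only if it is a known word for this part of speech
            if candidate in VOCAB[pos]:
                return candidate
    return word

print("books :", lemmatize("books"))
print("formulae :", lemmatize("formulae"))
print("worse (as noun) :", lemmatize("worse"))
print("worse (as adjective) :", lemmatize("worse", pos="a"))
```

In this sketch, worse is returned unchanged when treated as a noun, because the noun tables hold no entry for it, and it becomes bad only when the adjective tables are consulted. This is why the example above passes pos="a".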
Conclusion
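A synset groups words that share a meaning, and synsets are the way words are looked up in WordNet. Because synsets are linked to one another through relations such as hypernyms and hyponyms, they form a closely connected network that is useful for finding related words and the relations between them.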