Synsets for a word in WordNet in NLP


Introduction

WordNet is a large lexical database of words, available through the NLTK library and in many languages, that is used for Natural Language Processing tasks. NLTK provides an interface, the Synset, that lets us look up words in WordNet. Verbs, nouns, and other parts of speech are grouped into synsets.

WordNet and Synsets

The below diagram shows the structure of WordNet.

WordNet maintains the relationships between words. For example, sad and the words close to it in meaning are used in similar contexts and can often be interchanged. Words of this kind are grouped into synsets. Each synset has its own meaning and is linked to other synsets through conceptual relationships.

Two relations that can be present in WordNet are hypernym and hyponym.

  • Hypernym − A hypernym is the more abstract, general term. For example, in the relation between color and its types such as blue, green, and yellow, color is the hypernym.

  • Hyponym − In the color example above, the individual colors such as yellow and green are hyponyms, which are the more specific terms.
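The color example can be sketched in plain Python. The dictionary below is hypothetical toy data, not something read from WordNet, and the function names are made up for this illustration. It only shows how a hypernym link lets us move up to the abstract term and down to the specific ones.

```python
# Toy taxonomy (hypothetical data, not read from WordNet):
# each term maps to its direct hypernym; the root maps to None.
HYPERNYM_OF = {
    "entity": None,
    "color": "entity",
    "blue": "color",
    "green": "color",
    "yellow": "color",
}


def hyponyms(term):
    """Return the terms whose direct hypernym is `term`."""
    return sorted(t for t, parent in HYPERNYM_OF.items() if parent == term)


def hypernym_path(term):
    """Walk upward from `term` to the root, most specific term first."""
    path = [term]
    while HYPERNYM_OF[path[-1]] is not None:
        path.append(HYPERNYM_OF[path[-1]])
    return path


print(hyponyms("color"))        # ['blue', 'green', 'yellow']
print(hypernym_path("yellow"))  # ['yellow', 'color', 'entity']
```

In NLTK, the corresponding calls on a real synset are hypernyms(), hyponyms(), and root_hypernyms().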

Code Implementation

import nltk
from nltk.corpus import wordnet

# Download the WordNet corpus (only needed once per environment)
nltk.download('wordnet')

# Take the first synset returned for the word "book"
synset = wordnet.synsets('book')[0]

print("Name of the synset", synset.name())
print("Synset meaning : ", synset.definition())
print("Synset example : ", synset.examples())
# Hypernyms are the more abstract concepts directly above this synset
print("Abstract terminology ", synset.hypernyms())
# Hyponyms of the first hypernym are the more specific terms below it
print("Specific terminology :  ", synset.hypernyms()[0].hyponyms())
# root_hypernyms() returns the top-most concept(s) of the hierarchy
print("hypernym ( ROOT) :  ", synset.root_hypernyms())

Output

Name of the synset book.n.02
Synset meaning :  physical objects consisting of a number of pages bound together
Synset example :  ['he used a large book as a doorstop']
Abstract terminology  [Synset('publication.n.01')]
Specific terminology :   [Synset('book.n.01'), Synset('collection.n.02'), Synset('impression.n.06'), Synset('magazine.n.01'), Synset('new_edition.n.01'), Synset('periodical.n.01'), Synset('read.n.01'), Synset('reference.n.08'), Synset('reissue.n.01'), Synset('republication.n.01'), Synset('tip_sheet.n.01'), Synset('volume.n.04')]
hypernym ( ROOT) :   [Synset('entity.n.01')]

Using Pattern Library

!pip install pattern
from pattern.en import parse, singularize, pluralize
from pattern.en import pprint

# Parse the sentence into tagged, chunked tokens with roles and lemmas
pprint(parse("Jack and Jill went up the hill to fetch a bucket of water", relations=True, lemmata=True))
print("Plural of cat :", pluralize('cat'))
print("Singular of leaves  :", singularize('leaves'))

Output

          WORD   TAG    CHUNK   ROLE   ID     PNP    LEMMA    
                                                              
          Jack   NNP    NP      SBJ    1      -      jack     
           and   CC     NP ^    SBJ    1      -      and      
          Jill   NNP    NP ^    SBJ    1      -      jill     
          went   VBD    VP      -      1      -      go       
            up   IN     PP      -      -      PNP    up       
           the   DT     NP      SBJ    2      PNP    the      
          hill   NN     NP ^    SBJ    2      PNP    hill     
            to   TO     VP      -      2      -      to       
         fetch   VB     VP ^    -      2      -      fetch    
             a   DT     NP      OBJ    2      -      a        
        bucket   NN     NP ^    OBJ    2      -      bucket   
            of   IN     PP      -      -      PNP    of       
         water   NN     NP      -      -      PNP    water    
Plural of cat : cats
Singular of leaves  : leaf

Using WordNet Interface in spaCy

!pip install spacy-wordnet
import spacy
import nltk
nltk.download('wordnet')
# Importing WordnetAnnotator makes the "spacy_wordnet" component available to add_pipe
from spacy_wordnet.wordnet_annotator import WordnetAnnotator
nlp = spacy.load('en_core_web_sm')
nlp.add_pipe("spacy_wordnet", after='tagger')

# Process a one-word text and take its only token
spacy_token = nlp('leaves')[0]

print("Synsets : ", spacy_token._.wordnet.synsets())
print("Lemmas : ", spacy_token._.wordnet.lemmas())

print("Wordnet domains:", spacy_token._.wordnet.wordnet_domains())

Output

Synsets :  [Synset('leave.v.01'), Synset('leave.v.02'), Synset('leave.v.03'), Synset('leave.v.04'), Synset('exit.v.01'), Synset('leave.v.06'), Synset('leave.v.07'), Synset('leave.v.08'), Synset('entrust.v.02'), Synset('bequeath.v.01'), Synset('leave.v.11'), Synset('leave.v.12'), Synset('impart.v.01'), Synset('forget.v.04')]
Lemmas :  [Lemma('leaf.n.01.leaf'), Lemma('leaf.n.01.leafage'), Lemma('leaf.n.01.foliage'), Lemma('leaf.n.02.leaf'), Lemma('leaf.n.02.folio'), Lemma('leaf.n.03.leaf'), Lemma('leave.n.01.leave'), Lemma('leave.n.01.leave_of_absence'), Lemma('leave.n.02.leave'), Lemma('farewell.n.02.farewell'), Lemma('farewell.n.02.leave'), Lemma('farewell.n.02.leave-taking'), Lemma('farewell.n.02.parting'), Lemma('leave.v.01.leave'), Lemma('leave.v.01.go_forth'), Lemma('leave.v.01.go_away'), Lemma('leave.v.02.leave'), Lemma('leave.v.03.leave'), Lemma('leave.v.04.leave'), Lemma('leave.v.04.leave_alone'), Lemma('leave.v.04.leave_behind'), 
Wordnet domains: ['diplomacy', 'book_keeping', 'administration', 'factotum', 'agriculture', 'electrotechnology', 'person', 'telephony', 'mechanics']

NLTK Wordnet Lemmatizer

from nltk.stem import WordNetLemmatizer
nltk_lemmatizer = WordNetLemmatizer()
print("books :", nltk_lemmatizer.lemmatize("books"))
print("formulae :", nltk_lemmatizer.lemmatize("formulae"))
# pos="a" tells the lemmatizer to treat the word as an adjective
print("worse :", nltk_lemmatizer.lemmatize("worse", pos="a"))

Output

books : book
formulae : formula
worse : bad

Conclusion

Synsets are the interface for looking up words in WordNet. They are a useful way to find new words and relations, because similar words are interlinked in WordNet and form a closely connected network.

Updated on: 09-Aug-2023
