Article Categories
- All Categories
-
Data Structure
-
Networking
-
RDBMS
-
Operating System
-
Java
-
MS Excel
-
iOS
-
HTML
-
CSS
-
Android
-
Python
-
C Programming
-
C++
-
C#
-
MongoDB
-
MySQL
-
Javascript
-
PHP
-
Economics & Finance
Bigram formation from given a Python list
A bigram is a pair of consecutive words formed from a sentence. In Python, bigrams are heavily used in text analytics and natural language processing to analyze word patterns and relationships.
What is a Bigram?
A bigram takes every two consecutive words from a sentence and creates word pairs. For example, from "hello world python", we get bigrams: ("hello", "world") and ("world", "python").
Using enumerate() and split()
This approach splits the sentence into words and uses enumerate() to create pairs from consecutive words ?
sentences = ['Stop. look left right. go']
print("The given list is:")
print(sentences)
# Using enumerate() and split() for Bigram formation
output = [(word, words.split()[i + 1])
for words in sentences
for i, word in enumerate(words.split())
if i < len(words.split()) - 1]
print("Bigram formation from given list is:")
print(output)
The given list is:
['Stop. look left right. go']
Bigram formation from given list is:
[('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]
Using zip() and split()
The zip() function pairs consecutive words by combining two shifted lists: one starting from index 0 and another from index 1 ?
sentences = ['Stop. look left right. go']
print("The given list is:")
print(sentences)
# Using zip() and split() for Bigram formation
output = [bigram
for sentence in sentences
for bigram in zip(sentence.split()[:-1], sentence.split()[1:])]
print("Bigram formation from given list is:")
print(output)
The given list is:
['Stop. look left right. go']
Bigram formation from given list is:
[('Stop.', 'look'), ('look', 'left'), ('left', 'right.'), ('right.', 'go')]
Comparison
| Method | Readability | Performance | Best For |
|---|---|---|---|
enumerate() |
Moderate | Good | When you need word indices |
zip() |
High | Better | Simple bigram creation |
Conclusion
Both methods effectively create bigrams from text. The zip() approach is more readable and efficient, while enumerate() provides additional control when you need word positions.
