
Tokenize text using NLTK in python
Given a character sequence and a defined document unit, tokenization is the task of chopping it up into pieces, called tokens, perhaps at the same time throwing away certain characters, such as punctuation. In the context of NLTK and Python, tokenization means splitting a string into a list of tokens, so that instead of iterating over the text one character at a time, we can iterate over it one token (a word or a punctuation mark) at a time.
For example, given the input string −
Hi man, how have you been?
We should get the output −
['Hi', 'man', ',', 'how', 'have', 'you', 'been', '?']
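To make the idea concrete before bringing in NLTK, here is a minimal sketch using only Python's standard `re` module. The helper name `simple_tokenize` and its `keep_punct` flag are hypothetical, introduced only for illustration; this is a simplified stand-in, not how NLTK tokenizes. It shows both behaviours described above: keeping punctuation as separate tokens, or throwing it away.

```python
import re

def simple_tokenize(text, keep_punct=True):
    """Hypothetical helper: split text into word tokens.

    A simplified illustration only, not NLTK's algorithm.
    """
    # \w+ matches a run of letters, digits or underscores;
    # [^\w\s] matches one character that is neither a word
    # character nor whitespace, i.e. a punctuation mark.
    pattern = r"\w+|[^\w\s]" if keep_punct else r"\w+"
    return re.findall(pattern, text)

my_sent = "Hi man, how have you been?"

# Iterating over the raw string visits one character at a time ...
print(len(my_sent))  # 26 characters

# ... whereas iterating over the token list visits one token at a time.
tokens = simple_tokenize(my_sent)
print(tokens)  # ['Hi', 'man', ',', 'how', 'have', 'you', 'been', '?']

# Dropping punctuation instead of keeping it as separate tokens:
print(simple_tokenize(my_sent, keep_punct=False))
# ['Hi', 'man', 'how', 'have', 'you', 'been']
```

This regex approach is fine for simple sentences, but it splits a contraction such as "don't" into `don`, `'`, `t`, while NLTK's `word_tokenize` produces `do` and `n't`. That kind of rule is why a dedicated tokenizer is usually preferable.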
We can tokenize this text with the word_tokenize function from NLTK's nltk.tokenize module. For example,
Example
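from nltk.tokenize import word_tokenize

# word_tokenize relies on NLTK's Punkt models; download them once with
# nltk.download('punkt') (or 'punkt_tab' on recent NLTK releases).
my_sent = "Hi man, how have you been?"
tokens = word_tokenize(my_sent)
print(tokens)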
Output
This will give the output −
['Hi', 'man', ',', 'how', 'have', 'you', 'been', '?']
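NLTK also ships purely regular-expression-based tokenizers that do not need the downloadable Punkt models. The sketch below assumes only that NLTK itself is installed (`pip install nltk`). `wordpunct_tokenize` keeps punctuation as separate tokens, while `RegexpTokenizer` with the pattern `\w+` keeps only word characters, which covers the "throwing away punctuation" case from the definition above.

```python
# Assumes NLTK is installed; neither tokenizer below needs the
# Punkt models, because both are driven purely by regular expressions.
from nltk.tokenize import RegexpTokenizer, wordpunct_tokenize

my_sent = "Hi man, how have you been?"

# wordpunct_tokenize splits on \w+|[^\w\s]+, so each run of punctuation
# becomes its own token (same result as word_tokenize for this sentence).
print(wordpunct_tokenize(my_sent))
# ['Hi', 'man', ',', 'how', 'have', 'you', 'been', '?']

# RegexpTokenizer returns only the matches of its pattern; with \w+
# the punctuation is discarded entirely.
word_only = RegexpTokenizer(r"\w+")
print(word_only.tokenize(my_sent))
# ['Hi', 'man', 'how', 'have', 'you', 'been']
```

For plain English sentences these are quick alternatives, but word_tokenize remains the better choice when contractions and quotation marks need Treebank-style handling.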