Python program to find start and end indices of all Words in a String


Sometimes we need both the starting index and the ending index of each word in a string. A sentence is made up of words that are separated by spaces. This Python article shows two ways of finding the start and end indices of all words in a sentence or any given string. The first example iterates through the characters of the string and uses each space to mark where one word ends and the next begins. The second example uses the Natural Language Toolkit (nltk) for the same task.

Example 1: Find start and end indices of all words in a string by iterating through the string.

Algorithm

Step 1 − First take a string and call it givenStr.

Step 2 − Make a function called StartandEndIndex that takes givenStr and iterates through it, checking for spaces and returning a list of tuples holding the start and end indices of all words.

Step 3 − Make a listofwords using the split method.

Step 4 − Use the values from the above two lists and make a dictionary.

Step 5 − Run the program and then check the result.

The Python File Contains this

#function for given word indices
def StartandEndIndex(givenStr):
   indexList = []
   startNum = 0
   #iterate through the given string
   for indexitem, char in enumerate(givenStr):
      #a space marks the end of the current word
      if char == " ":
         #skip empty spans caused by leading or repeated spaces
         if indexitem > startNum:
            indexList.append((startNum, indexitem - 1))
         #the next word starts right after this space
         startNum = indexitem + 1

   #add the last word, which is not followed by a space
   if startNum != len(givenStr):
      indexList.append((startNum, len(givenStr) - 1))
   return indexList
 

givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'
#call the function StartandEndIndex(givenStr) 
#and get the list having starting and ending indices of all words
indexListt = StartandEndIndex(givenStr)

# make a list of words separately
listofwords= givenStr.split()
print("\nThe given String or Sentence is ")
print(givenStr)
print("\nThe list of words is ")
print(listofwords)

#make a dictionary using words and their indices
resDict = {listofwords[indx]: indexListt[indx] for indx in range(len(listofwords))}
print("\nWords and their indices : " + str(resDict))

Viewing The Result - Example 1

To see the result, run the Python file in the cmd window.

The given String or Sentence is
Keep your face always toward the sunshine and shadows will fall behind you

The list of words is
['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']

Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}

Fig 1: Showing the result in the command window.
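
The loop in Example 1 treats only the space character as a separator, while split() with no arguments splits on any whitespace, so a tab or newline in the input would make the two lists disagree. As a hedged alternative that is not part of the original program, the sketch below uses re.finditer from the standard library to report each run of non-whitespace characters together with its start index and inclusive end index. The function name word_spans is hypothetical and chosen only for this illustration.

```python
import re

def word_spans(text):
    """Return (word, start, end) triples; end is the index of the last character."""
    # \S+ matches one run of non-whitespace characters, i.e. one word.
    # m.end() is one past the match, so subtract 1 for an inclusive end index.
    return [(m.group(), m.start(), m.end() - 1) for m in re.finditer(r"\S+", text)]

# Two spaces and a tab between words, to show that any whitespace run is skipped
print(word_spans("Keep  your\tface"))
# [('Keep', 0, 3), ('your', 6, 9), ('face', 11, 14)]
```

Because the result is a list of triples rather than a dictionary keyed by word, a word that occurs more than once keeps a separate entry for each occurrence.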

Example 2: Find start and end indices of all words in a string by using nltk (Natural Language Toolkit).

Algorithm

Step 1 − First install nltk using the pip command. Then import align_tokens from nltk.tokenize.util.

Step 2 − Take a givenStr as the test string and then separate it into words using the split function and call it listofwords.

Step 3 − Now use align_tokens with listofwords as tokens and the givenStr.

Step 4 − It returns a list of (start, end) offsets for the words, where each end offset is exclusive, that is, one past the word's last character. Subtract one from each end offset to get the index of the last character of each word.

Step 5 − Use the values from the above two lists and make a dictionary.

Step 6 − Run the program and then check the result.

The Python File Contains this

#Use pip install nltk to install this library

#import align tokens
from nltk.tokenize.util import align_tokens

#specify a string for testing
givenStr = 'Keep your face always toward the sunshine and shadows will fall behind you'

#make a list of words
listofwords= givenStr.split()

print("\nThe given String or Sentence is ")
print(givenStr)
print("\nThe list of words is ")
print(listofwords)

#align_tokens returns (start, end) offsets in which end is exclusive,
#so givenStr[start:end] gives back the word
indices_exclusive_end = align_tokens(listofwords, givenStr)

#subtract 1 from each end offset to get the index of the word's last character
indices_withoutspace = [(start, end - 1) for start, end in indices_exclusive_end]
print(indices_withoutspace)

#make the dictionary of all words in a string with their indices
resDict = {listofwords[indx]: indices_withoutspace[indx] for indx in range(len(listofwords))}
print("\nWords and their indices : " + str(resDict))

Viewing The Result - Example 2

Open the cmd window and run the python file to see the result.

The given String or Sentence is
Keep your face always toward the sunshine and shadows will fall behind you

The list of words is
['Keep', 'your', 'face', 'always', 'toward', 'the', 'sunshine', 'and', 'shadows', 'will', 'fall', 'behind', 'you']
[(0, 3), (5, 8), (10, 13), (15, 20), (22, 27), (29, 31), (33, 40), (42, 44), (46, 52), (54, 57), (59, 62), (64, 69), (71, 73)]

Words and their indices : {'Keep': (0, 3), 'your': (5, 8), 'face': (10, 13), 'always': (15, 20), 'toward': (22, 27), 'the': (29, 31), 'sunshine': (33, 40), 'and': (42, 44), 'shadows': (46, 52), 'will': (54, 57), 'fall': (59, 62), 'behind': (64, 69), 'you': (71, 73)}

Fig 2: Showing the Words and their indices.
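
The offsets produced by align_tokens are end-exclusive, which is why Example 2 subtracts one from each end value. The following standard-library sketch imitates that alignment step so the convention can be seen without installing nltk. The helper align_words is hypothetical, written only for illustration, and is not nltk's own code.

```python
def align_words(tokens, sentence):
    """Locate each token in sentence, returning (start, end) pairs with end exclusive."""
    offsets = []
    search_from = 0
    for token in tokens:
        # str.index raises ValueError if the token is not found from search_from onward
        start = sentence.index(token, search_from)
        # continue searching after this token so repeated words are matched in order
        search_from = start + len(token)
        offsets.append((start, search_from))
    return offsets

sentence = "Keep your face"
offsets = align_words(sentence.split(), sentence)
print(offsets)  # [(0, 4), (5, 9), (10, 14)]

# End-exclusive offsets can be used directly as slice bounds
print([sentence[s:e] for s, e in offsets])  # ['Keep', 'your', 'face']

# Subtract one from each end to get the index of the last character
inclusive = [(s, e - 1) for s, e in offsets]
print(inclusive)  # [(0, 3), (5, 8), (10, 13)]
```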

This Python article showed two ways of finding the start and end indices of all words in a string. Example 1 iterates through the characters of the string and uses the spaces to mark where each word ends and the next one starts. Example 2 uses the nltk (Natural Language Toolkit) library. It is first installed using pip, and then the align_tokens function is imported from the nltk.tokenize.util module. Given the list of words as tokens and the original string, this function returns the offsets of all words, and subtracting one from each end offset gives the end indices.

Updated on: 10-Jul-2023
