Last modified: Jan 13, 2023 By Alexander Williams

How to read a string word by word in Python

In Python, strings are sequences of characters, and it's often necessary to process them word by word. For example, you might want to extract the individual words from a sentence or keywords from a paragraph. This article will look at several techniques for reading strings word by word in Python.

If you're ready, let's get started.

1. Using the str.split() Method

One of the simplest ways to read a string word by word is to use the built-in split() method.

By default, split() splits a string at whitespace characters (spaces, tabs, and newlines), but you can specify a different delimiter. Here's the syntax:

str.split(sep=None, maxsplit=-1)
  • sep: The delimiter to split on. If it is omitted or None, runs of consecutive whitespace (spaces, tabs, newlines, etc.) are treated as a single separator.
  • maxsplit: The maximum number of splits to perform. The default value is -1, which means "no limit".
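
As a small illustration of the two parameters, the sketch below splits a made-up comma-separated string (the sample data is an assumption, used only for demonstration) and then limits the number of splits with maxsplit.

```python
# Hypothetical sample data, used only for illustration
record = "apple,banana,cherry,date"

# Split on a comma instead of whitespace
fruits = record.split(",")
print(fruits)  # ['apple', 'banana', 'cherry', 'date']

# Split at most once: the remainder stays together in the last item
first, rest = record.split(",", 1)
print(first)  # apple
print(rest)   # banana,cherry,date

# With no arguments, a run of whitespace counts as one separator
messy = "  Python   is\tfun\n"
print(messy.split())  # ['Python', 'is', 'fun']
```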

Now let's see how to use split() to read a string word by word.

# Define a string of text
text = "Python, is an interpreted high-level programming language"

# Split the string on whitespaces and store the result in the variable "result"
result = text.split()

# Print the list of substrings
print(result)

Output:

['Python,', 'is', 'an', 'interpreted', 'high-level', 'programming', 'language']

As you can see, split() returns the words as a list. We can now use a for loop to print the items one by one.

# Define a sentence as a string 
sentence = "This is a simple sentence."

# Split the sentence into a list of words
words = sentence.split()

# Iterate over the list of words
for word in words:
    # print each word
    print(word)

Output:

This
is
a
simple
sentence.

Voila! The string has been read word by word.
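
Notice that split() leaves punctuation attached to the words ("sentence." above). If that matters, one option is to strip it afterwards. The sketch below assumes that only leading and trailing punctuation should be removed, and uses string.punctuation from the standard library for that.

```python
import string

sentence = "This is a simple sentence."

# Strip leading and trailing punctuation from each word;
# punctuation inside a word (such as a hyphen) is left alone
clean_words = [word.strip(string.punctuation) for word in sentence.split()]

for word in clean_words:
    print(word)
```

This prints the same five words as before, with the final period removed from "sentence".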

2. Using the re.split() Method

Another way to read a string word by word is to use the re module's split() function. This function takes a regular expression as its delimiter, which gives you more control over the split.

For example, you can use it to split a string at any non-alphanumeric character:

import re # importing regular expression library

sentence = "This, is a simple! sentence." # original sentence

words = re.split(r'[^\w]', sentence) # using re.split() to split the sentence into words by removing non-alphanumeric characters

print(words)

Output:

['This', '', 'is', 'a', 'simple', '', 'sentence', '']

Here is what the code does:

  1. Import the regular expression module.
  2. Define the original sentence as a string variable.
  3. Use the re.split() function with the regular expression r'[^\w]' to split the sentence at every character that is not a letter, digit, or underscore.
  4. Print the output, which is a list of words and empty strings.

Now we can iterate through the list and access the items one by one using the following code.

for word in words:
    print(word)

Output:

This

is
a
simple

sentence

The empty lines in the output come from the empty strings in the list of words. re.split() produces an empty string wherever two delimiters sit next to each other (such as the comma and the space after "This") and wherever the string ends with a delimiter (the final period).
To remove these empty strings, you can use a list comprehension or the filter() function:

List comprehension

List comprehension is a concise way of creating a new list in Python. It consists of an expression followed by a for clause, and zero or more if clauses.

The expression is evaluated for each item in the for clause and the resulting value is added to the new list if it meets the conditions specified by the if clauses.

import re 

sentence = "This, is a simple! sentence."

words = re.split(r'[^\w]', sentence) 

words = [word for word in words if word] # list comprehension

for word in words:
    print(word)

Output:

This
is
a
simple
sentence

Let me explain what we've done:

  1. Create a new list using list comprehension
  2. For each word in the words list, check if the word is truthy (not empty)
  3. If the word is truthy, add it to the new list
  4. The new list now contains only the truthy words from the original list, with all empty strings removed.

The filter() function

The filter() function is a built-in Python function that returns an iterator containing only the items that pass a test function.

filter() takes two arguments: a function and an iterable.

The function is applied to each element of the iterable, and only the elements for which it returns a truthy value are kept. If None is passed instead of a function, the elements themselves are tested for truthiness.
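
To make this concrete, here is a minimal sketch with a made-up list (the tokens variable is hypothetical, used only for illustration) and str.isalpha as the test function.

```python
# Hypothetical sample list, used only for illustration
tokens = ["Python", "3", "is", "", "fun"]

# str.isalpha is the test function: it returns True only for
# non-empty strings made up entirely of letters
alpha_only = filter(str.isalpha, tokens)

# filter() is lazy, so nothing is tested until the iterator is consumed
print(list(alpha_only))  # ['Python', 'is', 'fun']
```

Because the result is an iterator, it can only be consumed once. Wrap it in list() if you need to reuse the filtered items.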

Let's see how to remove the empty strings using the filter() function.

import re 

sentence = "This, is a simple! sentence."

words = re.split(r'[^\w]', sentence) 

words = list(filter(None, words))  # remove empty strings

for word in words:
    print(word)

Here's what the words = list(filter(None, words)) line does:

  1. Call the filter() function to filter the words list.
  2. Pass None as the function argument so that falsy values, such as empty strings, are dropped.
  3. Pass the words list as the iterable.
  4. filter() returns an iterator that yields only the truthy elements.
  5. Convert that iterator to a list using the list() function.
  6. The new list contains only the truthy elements from the original list, with the empty strings removed.
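
As an alternative to filtering afterwards, the pattern itself can be adjusted so that fewer (or no) empty strings appear in the first place. The sketch below is one possible variation on the example above, not a replacement for it: r"\W+" matches a run of non-word characters as a single delimiter, and re.findall() with r"\w+" collects the words directly.

```python
import re

sentence = "This, is a simple! sentence."

# \W+ treats a run of non-word characters as one delimiter,
# so only the trailing "." still produces an empty string
print(re.split(r"\W+", sentence))
# ['This', 'is', 'a', 'simple', 'sentence', '']

# re.findall() collects the words themselves instead of splitting around them
words = re.findall(r"\w+", sentence)
print(words)
# ['This', 'is', 'a', 'simple', 'sentence']
```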

3. Using the TextBlob Library

TextBlob is a Python (2 and 3) library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.

TextBlob is built on top of the Natural Language Toolkit (NLTK) library and is easy to use and install.

To install it with pip, run the following command:

pip install textblob

Let's see how to use the library to read a string word by word.

from textblob import TextBlob # Import the TextBlob library

sentence = "This is a simple sentence." # Define a sentence to be processed

words = TextBlob(sentence).words # Create a TextBlob object and use the words attribute to extract the words in the sentence

print(words) # Print the extracted words

Output:

['This', 'is', 'a', 'simple', 'sentence']

TextBlob(sentence).words returns a list-like WordList of the words in the text, with punctuation removed. You can use a for loop to print the words one by one.

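4. Using a for Loop

We can also loop over the string character by character and build the words ourselves. This method is not recommended: it takes more code than split() and, as written, only treats single spaces as word boundaries.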

# String
sentence = "This is a simple sentence." 

# Initialize an empty list called 'words' to store the individual words from the sentence
words = []
# Initialize an empty string called 'word' to store the current word being built
word = ""

# Iterate through each character in the sentence
for char in sentence:
    # If the current character is a space, append the current 'word' to the 'words' list and reset 'word' to an empty string
    if char == " ":
        words.append(word)
        word = ""
    # If the current character is not a space, add it to the current 'word'
    else:
        word += char
# Append the last word of the sentence to the 'words' list
words.append(word)

# Print the words one by one
for word in words:
    print(word)

Output:

This
is
a
simple
sentence.

Here are the steps of the code:

  1. Initialize a string variable called sentence to hold the sentence "This is a simple sentence."
  2. Initialize an empty list called words to store the individual words from the sentence.
  3. Initialize an empty string called word to store the current word being built.
  4. Iterate through each character in the sentence.
  5. Within the loop, check if the current character is a space.
  6. If the current character is a space, append the current word to the words list and reset word to an empty string.
  7. If the current character is not a space, add it to the current word.
  8. Append the last word of the sentence to the words list.
  9. Iterate through each word in the words list.
  10. Within the loop, print each word.
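
The steps above can be made a little more robust. The following is a sketch under the assumption that any whitespace (not only a single space) should end a word. The function name iter_words is hypothetical, and it yields words one at a time instead of building a list.

```python
def iter_words(text):
    """Yield words from text one at a time (hypothetical helper)."""
    word = ""
    for char in text:
        if char.isspace():
            # Only yield when a word has actually been collected,
            # so repeated whitespace does not produce empty strings
            if word:
                yield word
                word = ""
        else:
            word += char
    # Yield the final word if the text does not end in whitespace
    if word:
        yield word


# Double space and a tab are both handled as word boundaries
for word in iter_words("This  is a\tsimple sentence."):
    print(word)
```

In practice, split() with no arguments already behaves this way, so this version is mainly useful for seeing how the splitting works.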

Conclusion

In this article, we have covered different techniques for reading strings word by word in Python: the built-in split() method, the re module's split() function, the TextBlob library, and a manual for loop.

You can choose the method that best fits your needs. Every method gives you a list of words, so you can access and manipulate them easily.