Last modified: Jan 13, 2023 By Alexander Williams
How to read a string word by word in Python
In Python, strings are sequences of characters, and it's often necessary to process them word by word. For example, you might want to extract the individual words from a sentence or keywords from a paragraph. This article will look at several techniques for reading strings word by word in Python.
If you are ready, let's get started.
1. Using the string.split() Method
One of the simplest ways to read a string word by word is to use the built-in split() method. By default, split() splits a string at whitespace characters (spaces, tabs, and newlines), but you can specify a different delimiter. The signature is str.split(sep=None, maxsplit=-1), where both arguments are optional:
- sep: Specifies the delimiter to use for splitting the string. If this parameter is not specified (or is None), any run of whitespace (spaces, tabs, newlines, etc.) is treated as a single delimiter, and leading and trailing whitespace is ignored.
- maxsplit: Specifies the maximum number of splits to perform. The default value is -1, which means there is no limit.
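To make the two parameters concrete, here is a small sketch using made-up sample strings. It shows the default whitespace splitting, an explicit single-space delimiter, a comma delimiter, and maxsplit passed as a keyword argument (which works in Python 3).

```python
# Default (sep=None): split on any run of whitespace (spaces, tabs, newlines)
mixed = "alpha  beta\tgamma\ndelta"
default_words = mixed.split()
print(default_words)  # ['alpha', 'beta', 'gamma', 'delta']

# An explicit " " delimiter matches exactly one space, so a double space
# produces an empty string between the words
single_space = "alpha  beta".split(" ")
print(single_space)  # ['alpha', '', 'beta']

# Any other string can serve as the delimiter
colors = "red,green,blue".split(",")
print(colors)  # ['red', 'green', 'blue']

# maxsplit=2 performs at most two splits; the remainder stays in the last item
limited = "one two three four".split(maxsplit=2)
print(limited)  # ['one', 'two', 'three four']
```

The second case is why calling split() with no argument is usually the better choice for reading words: it never returns empty strings for repeated whitespace.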
Now let's see how to use split() to read a string word by word.
```python
# Define a string of text
text = "This is a simple sentence."

# Split the string on whitespace and store the resulting list in "result"
result = text.split()

# Print the list of substrings
print(result)
```
['This', 'is', 'a', 'simple', 'sentence.']
As you can see, split() returned the words as a list. Now we can use a for loop to print the list items one by one.
```python
# Define a sentence as a string
sentence = "This is a simple sentence."

# Split the sentence into a list of words
words = sentence.split()

# Iterate over the list of words
for word in words:
    # Print each word
    print(word)
```
```
This
is
a
simple
sentence.
```
Voila! The string has been read word by word.
2. Using the re.split() Function
Another way to read a string word by word is to use the re.split() function from the re module. This function takes a regular expression as its delimiter, which gives you more control over where the string is split.
For example, you can use it to split a string at any non-alphanumeric character:
```python
import re  # import the regular expression module

sentence = "This, is a simple! sentence."  # original sentence

# Split the sentence at every run of non-word characters
# (anything other than letters, digits, and the underscore)
words = re.split(r'[^\w]+', sentence)

print(words)
```
['This', 'is', 'a', 'simple', 'sentence', '']
Here is what the code does:
- Import the regular expression module
- Define the original sentence as a string variable
- Use the re.split() function with the regular expression to split the sentence at non-alphanumeric characters (anything other than letters, digits, and the underscore)
- Print the output, which is a list of words and an empty string
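Because the delimiter is a regular expression, you can also list several separators at once. The sketch below uses a made-up record that mixes commas, semicolons, and spaces; the pattern [,;\s]+ is an illustrative choice, not part of the example above. It also shows that re.split() accepts a maxsplit argument, much like str.split().

```python
import re

# Made-up record that mixes commas, semicolons, and whitespace as separators
record = "apples, oranges;bananas  pears"

# The character class lists every allowed separator; "+" treats a run of
# separators (such as ", " or two spaces) as a single split point
items = re.split(r"[,;\s]+", record)
print(items)        # ['apples', 'oranges', 'bananas', 'pears']

# maxsplit=1 stops after the first split and keeps the rest intact
first_split = re.split(r"[,;\s]+", record, maxsplit=1)
print(first_split)  # ['apples', 'oranges;bananas  pears']
```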
Now we can iterate through the list and access the items one by one using the following code:
```python
for word in words:
    print(word)
```
```
This
is
a
simple
sentence

```
The output ends with an empty line because of the empty string in the list of words. That empty string comes from the trailing period: re.split() returns whatever text follows the last delimiter, and after the final "." there is nothing left.
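The sketch below shows where such empty strings come from, using the pattern [^\w]+ (one or more non-word characters) on the same sentence and on a made-up string with a leading delimiter. As an aside that this article does not otherwise cover, re.findall() with \w+ matches the words themselves instead of the separators, so it never produces empty strings.

```python
import re

sentence = "This, is a simple! sentence."

# re.split() returns the text between matches. Nothing follows the final ".",
# so the last element is an empty string.
parts = re.split(r"[^\w]+", sentence)
print(parts)    # ['This', 'is', 'a', 'simple', 'sentence', '']

# A delimiter at the very start produces an empty string at the front as well
leading = re.split(r"[^\w]+", "...This is it")
print(leading)  # ['', 'This', 'is', 'it']

# Alternative: re.findall() returns the matched words, not the gaps between
# separators, so no empty strings appear
found = re.findall(r"\w+", sentence)
print(found)    # ['This', 'is', 'a', 'simple', 'sentence']
```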
To remove these empty strings, you can use a list comprehension or the filter() function.
A list comprehension is a concise way of creating a new list in Python. It consists of an expression followed by a for clause and zero or more if clauses.
The expression is evaluated for each item in the for clause, and the resulting value is added to the new list only if it meets the conditions specified by the if clauses.
```python
import re

sentence = "This, is a simple! sentence."
words = re.split(r'[^\w]+', sentence)

# List comprehension: keep only the non-empty strings
words = [word for word in words if word]

for word in words:
    print(word)
```
```
This
is
a
simple
sentence
```
Let me explain what we've done:
- Create a new list using a list comprehension
- For each word in the words list, check whether the word is truthy (not empty)
- If the word is truthy, add it to the new list
- The new list now contains only the truthy words from the original list, with all empty strings removed.
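If explicit loops feel more familiar, the comprehension can be written out as an ordinary for loop with an if test. This is a sketch on the same sample sentence; both versions build the same list.

```python
import re

sentence = "This, is a simple! sentence."
parts = re.split(r"[^\w]+", sentence)  # ends with an empty string

# List comprehension: keep each item only if it is truthy (non-empty)
kept_comprehension = [word for word in parts if word]

# The same filtering written as an explicit loop
kept_loop = []
for word in parts:
    if word:  # an empty string is falsy, so it is skipped
        kept_loop.append(word)

print(kept_comprehension)               # ['This', 'is', 'a', 'simple', 'sentence']
print(kept_comprehension == kept_loop)  # True
```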
The filter() function is a built-in function that returns an iterator over the items of an iterable that pass a test function.
filter() takes two arguments: a function and an iterable.
The function is applied to each element of the iterable, and only the elements for which it returns a truthy value are included in the result.
Let's see how to remove the empty strings using the filter() function.
```python
import re

sentence = "This, is a simple! sentence."
words = re.split(r'[^\w]+', sentence)

# Remove empty strings: filter(None, ...) keeps only truthy items
words = list(filter(None, words))

for word in words:
    print(word)
```
Here's what the words = list(filter(None, words)) line does:
- Call filter() with the words list returned by re.split() as the iterable
- Pass None as the function argument, so filter() drops every falsy value (here, the empty strings)
- filter() returns a lazy filter object that yields only the truthy elements
- Convert that filter object to a list using list()
- The new list contains only the truthy elements from the original list, with the empty strings removed.
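filter() is not limited to None. The sketch below, again on the sample sentence, shows that filter() returns a lazy filter object until you wrap it in list(), that passing bool gives the same result as None, and that any one-argument function can serve as the test (the length check here is only an illustrative choice).

```python
import re

sentence = "This, is a simple! sentence."
parts = re.split(r"[^\w]+", sentence)

# filter() is lazy: it returns a filter object, not a list
lazy = filter(None, parts)
print(type(lazy).__name__)  # filter

# filter(None, ...) keeps truthy items; filter(bool, ...) gives the same result
with_none = list(lazy)
with_bool = list(filter(bool, parts))

# Any one-argument function can act as the test; this illustrative one keeps
# only words longer than two characters
long_words = list(filter(lambda word: len(word) > 2, parts))

print(with_none)               # ['This', 'is', 'a', 'simple', 'sentence']
print(with_none == with_bool)  # True
print(long_words)              # ['This', 'simple', 'sentence']
```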
3. Using the TextBlob Library
TextBlob is a Python library for processing textual data. It provides a simple API for diving into common natural language processing (NLP) tasks such as part-of-speech tagging, noun phrase extraction, sentiment analysis, classification, translation, and more.
TextBlob is built on top of the Natural Language Toolkit (NLTK) library and is easy to use and install.
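To install it with pip, run the following command:

```shell
pip install textblob
```

TextBlob tokenizes text with NLTK, which needs extra data files. If the example below raises a LookupError, download them with python -m textblob.download_corpora.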
Let's see how to use the library to read a string word by word.
```python
from textblob import TextBlob  # import the TextBlob class

sentence = "This is a simple sentence."  # define a sentence to be processed

# Create a TextBlob object and use its words attribute to extract the words
words = TextBlob(sentence).words

print(words)  # print the extracted words
```
['This', 'is', 'a', 'simple', 'sentence']
TextBlob(sentence).words returns a WordList, which is a subclass of Python's list, so you can use a for loop to print the words one by one. As the output shows, the trailing period is left out.
4. Using a for Loop
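We can also loop over the string character by character and build each word ourselves. This method is not recommended, though: it is longer than split(), and because it only checks for a single space character, repeated spaces produce empty strings and tabs or newlines are not treated as separators.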
```python
# String
sentence = "This is a simple sentence."

# Initialize an empty list called 'words' to store the individual words
words = []

# Initialize an empty string called 'word' to store the current word being built
word = ""

# Iterate through each character in the sentence
for char in sentence:
    # If the current character is a space, append the current 'word'
    # to the 'words' list and reset 'word' to an empty string
    if char == " ":
        words.append(word)
        word = ""
    # If the current character is not a space, add it to the current 'word'
    else:
        word += char

# Append the last word of the sentence to the 'words' list
words.append(word)

# Read the words one by one
for word in words:
    print(word)
```
```
This
is
a
simple
sentence.
```
Here are the steps of the code:
- Initialize a string variable called sentence to hold the sentence "This is a simple sentence."
- Initialize an empty list called words to store the individual words from the sentence.
- Initialize an empty string called word to store the current word being built.
- Iterate through each character in the sentence.
- Within the loop, check if the current character is a space.
- If the current character is a space, append the current word to the words list and reset word to an empty string.
- If the current character is not a space, add it to the current word.
- After the loop, append the last word of the sentence to the words list.
- Iterate through each word in the words list.
- Within the loop, print each word one by one.
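The loop above only recognizes the plain space character. As a sketch of how it could be made more robust, the hypothetical helper split_words below (not part of the original code) uses str.isspace() to treat any whitespace as a separator and skips empty words, so for the sample input it returns the same list as split(). It is still only an illustration; in real code, split() remains the simpler choice.

```python
def split_words(text):
    """Hypothetical helper: split text on runs of whitespace without str.split().

    Tabs, newlines, and repeated spaces all act as one separator,
    and no empty strings are produced.
    """
    words = []
    word = ""
    for char in text:
        if char.isspace():
            # End of a word: store it only if characters were collected
            if word:
                words.append(word)
                word = ""
        else:
            word += char
    # Store the final word if the text did not end with whitespace
    if word:
        words.append(word)
    return words


sentence = "This  is a\tsimple sentence.\n"
for word in split_words(sentence):
    print(word)
```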
In this article, we have covered different techniques for reading strings word by word in Python, including the built-in str.split() method, the re.split() function, the TextBlob library, and a manual for loop.
You can choose the method that best fits your needs. Remember that all of these methods give you a list of words, so you can manipulate and access them easily.