w3resource

NLTK Tokenize: Split a text paragraph into a list of sentences

NLTK Tokenize: Exercise-1 with Solution

Write a Python NLTK program to split a text paragraph into a list of sentences.

Sample Solution:

Python Code:

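from nltk.tokenize import sent_tokenize

# sent_tokenize relies on NLTK's pre-trained Punkt model. If it is missing,
# download it once with: import nltk; nltk.download('punkt')
# (newer NLTK releases look for 'punkt_tab' instead).

text = '''
Joe waited for the train. The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
'''
print("\nOriginal string:")
print(text)

# Split the text into a list of sentence strings.
token_text = sent_tokenize(text)
print("\nSentence-tokenized copy in a list:")
print(token_text)

# Print each sentence on its own line.
print("\nRead the list:")
for s in token_text:
    print(s)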

Sample Output:

Original string:
Joe waited for the train. The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.

Sentence-tokenized copy in a list:
['Joe waited for the train.', 'The train was late.', 'Mary and Samantha took the bus.', 'I looked for Mary and Samantha at the bus station.']

Read the list:
Joe waited for the train.
The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
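sent_tokenize stops at sentence boundaries. To go one step further and split the same text into individual words, the minimal sketch below (assuming NLTK is installed) swaps in NLTK's regex-based wordpunct_tokenize, which is not the tokenizer used in the solution above. It is chosen here because it needs no downloaded model. The sketch builds a flat word list for the whole text and a per-sentence word list, starting from the sentence list shown in the sample output.

```python
from nltk.tokenize import wordpunct_tokenize

text = '''
Joe waited for the train. The train was late.
Mary and Samantha took the bus.
I looked for Mary and Samantha at the bus station.
'''

# wordpunct_tokenize is regex-based (pattern \w+|[^\w\s]+), so it needs no
# downloaded model: each run of word characters and each run of punctuation
# becomes its own token.
words = wordpunct_tokenize(text)
print(words)

# Drop punctuation tokens if only the words themselves are wanted.
alpha_words = [w for w in words if w.isalpha()]
print(alpha_words)

# The sentence list from the sample output, hard-coded here so the sketch
# runs without the Punkt model that sent_tokenize needs.
sentences = [
    'Joe waited for the train.',
    'The train was late.',
    'Mary and Samantha took the bus.',
    'I looked for Mary and Samantha at the bus station.',
]

# One token list per sentence (a list of lists).
words_per_sentence = [wordpunct_tokenize(s) for s in sentences]
for i, sentence_words in enumerate(words_per_sentence, start=1):
    print(i, sentence_words)
```

NLTK's word_tokenize is the more linguistically aware alternative. For example, it splits "don't" into "do" and "n't", while wordpunct_tokenize gives "don", "'", "t". However, by default word_tokenize runs sent_tokenize first, so it needs the same Punkt model.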
