By the end of this unit you should:
Read about the course approach and objectives.
This course introduces computer science majors to using Python for natural language processing (NLP) and natural language generation (NLG). All students at the University of Aizu learn C and Java in their first and second years at university. Many students also learn C++. This means that concepts such as lists, arrays and loops need no explanation. This course is designed to develop Python programming skills via problem solving.
The course emphasizes practical applications over theoretical foundations. You'll build real NLP tools and solve authentic language processing challenges. By combining your existing programming knowledge with Python's powerful text processing capabilities, you'll quickly develop competency in computational linguistics.
Watch this short introductory video.
This introductory video (6 mins 41 secs) covers the basics at a brisk pace. It is probably too fast for those new to programming, but is suitable for those who already know what operators, lists and loops are.
As you have already studied two programming languages, namely C and Java, you should be familiar with data structures. Python provides several built-in collection types, each with its own literal syntax and behaviour.
Identify and explain the differences between the following Python data types:
1. ["apple","banana","carrot"]
- What is this data structure?
2. {"apple","banana","carrot"}
- What is this data structure?
3. ("apple","banana","carrot")
- What is this data structure?
4. {"food": "banana","colour": "yellow"}
- What is this data structure?
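One way to check your answers to the four questions above is to ask Python directly. The sketch below prints the built-in type of each literal. The variable names `samples` and `type_names` are hypothetical, chosen only for this illustration, and the output is a quick self-check rather than a full explanation of how the types differ.

```python
# Hypothetical variable names, used only for this illustration.
samples = [
    ["apple", "banana", "carrot"],
    {"apple", "banana", "carrot"},
    ("apple", "banana", "carrot"),
    {"food": "banana", "colour": "yellow"},
]

for sample in samples:
    # type(...).__name__ is the short name of the object's built-in type
    print(type(sample).__name__, "->", sample)

# Collect the type names so they can be compared with your own answers
type_names = [type(sample).__name__ for sample in samples]
```

Try answering the questions before running the code, then compare your answers with what is printed.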
Build your first NLP tool with Python concepts.
Using the interactive tool below, experiment with basic text processing. This combines Python data structures with simple NLP tasks.
Watch how Python string methods work for text processing.
This video (7 minutes) demonstrates essential Python string methods used in NLP: split(), join(), strip(), replace(), and case conversion methods.
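The methods named above can be tried on a short string as follows. This is a minimal sketch: the sample sentence and variable names are assumptions made for illustration and are not taken from the video.

```python
raw = "  The quick brown fox.  "       # hypothetical sample sentence

stripped = raw.strip()                  # remove leading and trailing whitespace
no_period = stripped.replace(".", "")   # delete the full stop
lowered = no_period.lower()             # case conversion: all lowercase
shouted = no_period.upper()             # case conversion: all uppercase
words = lowered.split()                 # split on whitespace into a list of words
rejoined = "-".join(words)              # join the words back with a separator

print(words)     # ['the', 'quick', 'brown', 'fox']
print(rejoined)  # the-quick-brown-fox
```

Note that each method returns a new string (or list) and leaves the original unchanged, because Python strings are immutable.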
Convert pseudocode to Python for a practical NLP task.
Implement a word frequency counter using Python data structures. This is a fundamental NLP operation that forms the basis of many text analysis techniques.
Pseudocode:
ALGORITHM: Count word frequencies
INPUT: text_string
OUTPUT: dictionary of word frequencies
1. Split text into words
2. Convert words to lowercase
3. Create empty frequency dictionary
4. For each word in word list:
- If word exists in dictionary, increment count
- Otherwise, set count to 1
5. Return frequency dictionary
Your Python Implementation:
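For comparison after you have attempted your own version, here is one possible translation of the pseudocode. It is a sketch rather than a model answer: the function name `count_word_frequencies` and the test sentence are assumptions, and punctuation is deliberately left attached to words because the pseudocode does not mention removing it.

```python
def count_word_frequencies(text_string):
    """Return a dictionary mapping each lowercase word to its count."""
    # Step 1: split text into words (on any whitespace)
    words = text_string.split()
    # Step 2: convert words to lowercase
    words = [word.lower() for word in words]
    # Step 3: create empty frequency dictionary
    frequencies = {}
    # Step 4: count each word
    for word in words:
        if word in frequencies:
            frequencies[word] += 1
        else:
            frequencies[word] = 1
    # Step 5: return frequency dictionary
    return frequencies


counts = count_word_frequencies("The cat saw the other cat")
print(counts)  # {'the': 2, 'cat': 2, 'saw': 1, 'other': 1}
```

The standard library's `collections.Counter` performs the same counting in a single call, but writing the loop by hand follows the pseudocode step by step.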
Build a comprehensive text statistics program.
Combine everything you've learned to create a text analysis tool that provides multiple statistics about input text.
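As a starting point, a tool of this kind might look like the sketch below. The choice of statistics (character count, word count, unique words, average word length, most common word), the function name `text_statistics`, and the punctuation-stripping rule are all assumptions made for illustration. The task does not specify which statistics to report, so treat this as one possible design.

```python
def text_statistics(text):
    """Return a dictionary of simple statistics about the input text."""
    # Split on whitespace, strip common punctuation from word edges, lowercase
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    words = [w for w in words if w]        # drop tokens that were only punctuation

    unique_words = set(words)              # a set keeps one copy of each word

    frequencies = {}                       # dictionary: word -> count
    for word in words:
        frequencies[word] = frequencies.get(word, 0) + 1

    # With ties, max() returns the word that was inserted first
    most_common = max(frequencies, key=frequencies.get) if frequencies else None
    average_length = sum(len(w) for w in words) / len(words) if words else 0.0

    return {
        "characters": len(text),
        "words": len(words),
        "unique_words": len(unique_words),
        "average_word_length": average_length,
        "most_common_word": most_common,
    }


stats = text_statistics("The cat sat. The cat ran!")
for name, value in stats.items():
    print(name, "=", value)
```

The sketch uses a list for the word sequence, a set for unique words, and a dictionary for frequencies, so it touches three of the data structures covered in this unit.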
Test your understanding of Python fundamentals for NLP:
1. Which Python data structure is best for storing unique words?
2. What method splits a string into a list of words?
3. Which data structure uses key-value pairs?