
Unit 1 Texts, lists and words

Learning outcomes

By the end of this unit you should:

  • be familiar with the basics of Python
  • have created some simple programs to process natural language
  • have solved some problems requiring natural language processing

Activity 1 Course introduction

Read about the course approach and objectives.

This course introduces computer science majors to using Python for natural language processing (NLP) and natural language generation (NLG). All students at the University of Aizu learn C and Java in their first and second years, and many also learn C++. Concepts such as lists, arrays and loops therefore need no explanation. The course is designed to develop Python programming skills through problem solving.

The course emphasizes practical applications over theoretical foundations. You'll build real NLP tools and solve authentic language processing challenges. By combining your existing programming knowledge with Python's powerful text processing capabilities, you'll quickly develop competency in computational linguistics.

Activity 2 Introduction to Python

Watch this short introductory video.

This introductory video (6 mins 41 secs) covers the basics of Python at a rapid pace. It is probably too fast for those new to programming, but is suitable for those who already know what operators, lists and loops are.

Activity 3 Python data structures quiz

As you have already studied two programming languages, namely C and Java, you should be familiar with data structures. Python offers some unique approaches to organizing data.

Identify and explain the differences between the following Python data types:

Data Structure Identification

1. ["apple","banana","carrot"] - What is this data structure?




2. {"apple","banana","carrot"} - What is this data structure?




3. ("apple","banana","carrot") - What is this data structure?




4. {"food": "banana","colour": "yellow"} - What is this data structure?
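One way to check your answers is to ask Python itself. The sketch below binds the four literals from the quiz to hypothetical names (items_a to items_d, chosen only for illustration) and prints the name of each built-in type, followed by a few behaviours that distinguish them. Try answering the questions before you run it.

```python
# The four literals from the quiz, bound to hypothetical names.
items_a = ["apple", "banana", "carrot"]
items_b = {"apple", "banana", "carrot"}
items_c = ("apple", "banana", "carrot")
items_d = {"food": "banana", "colour": "yellow"}

for value in (items_a, items_b, items_c, items_d):
    # type(value).__name__ is the name of the built-in type
    print(type(value).__name__)

# A few behaviours that distinguish the types:
items_a.append("apple")   # the first structure is mutable and keeps duplicates
items_b.add("apple")      # the second silently ignores duplicates
print(len(items_a), len(items_b))

print(items_c[0])         # the third supports indexing but not item assignment
print(items_d["food"])    # the fourth is looked up by key, not by position
```

Unlike arrays in C or Java, none of these structures needs a declared size or element type.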




Activity 4 Interactive text analyzer

Build your first NLP tool using basic Python concepts.

Using the interactive tool below, experiment with basic text processing. This combines Python data structures with simple NLP tasks.

Live Text Analyzer
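If you would like to reproduce this kind of analysis in your own Python session, the following is a minimal sketch. The sample sentence, the variable names and the choice of statistics are assumptions made for illustration; they are not taken from the interactive tool.

```python
# Sample input (an assumption for illustration)
text = "The cat sat on the mat. The mat was flat."

# Split on whitespace, then strip common punctuation and lowercase each word
words = [w.strip(".,!?;:").lower() for w in text.split()]
words = [w for w in words if w]  # drop anything that was punctuation only

word_count = len(words)            # a list keeps every occurrence
unique_words = set(words)          # a set keeps each word once
longest_word = max(words, key=len) # first word with the greatest length

print("Words:", word_count)
print("Unique words:", len(unique_words))
print("Longest word:", longest_word)
```

The list keeps word order and repeats, while the set discards both, which is why the two counts differ.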

Activity 5 String manipulation methods

Watch how Python string methods work for text processing.

This video (7 minutes) demonstrates essential Python string methods used in NLP: split(), join(), strip(), replace(), and case conversion methods.
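The methods named above can be tried out as follows. This is a small sketch with an invented sample string; the variable names are hypothetical. Note that Python strings are immutable, so each method returns a new string (or list) and leaves the original unchanged.

```python
raw = "  Natural Language Processing is fun  "

cleaned = raw.strip()                        # remove leading and trailing whitespace
tokens = cleaned.split()                     # split on whitespace into a list of words
joined = "-".join(tokens)                    # join the list back into one string
replaced = cleaned.replace("fun", "useful")  # substitute one substring for another
lowered = cleaned.lower()                    # case conversion: all lowercase
uppered = cleaned.upper()                    # case conversion: all uppercase

print(tokens)
print(joined)
print(replaced)
print(lowered)
print(uppered)
```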

Activity 6 Coding exercise: Word frequency counter

Convert pseudocode to Python for a practical NLP task.

Implement a word frequency counter using Python data structures. This is a fundamental NLP operation that forms the basis of many text analysis techniques.

Pseudocode to Python Challenge

Pseudocode:

ALGORITHM: Count word frequencies
INPUT: text_string
OUTPUT: dictionary of word frequencies

1. Split text into words
2. Convert words to lowercase
3. Create empty frequency dictionary
4. For each word in word list:
   - If word exists in dictionary, increment count
   - Otherwise, set count to 1
5. Return frequency dictionary

Your Python Implementation:
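One possible implementation is sketched below; try writing your own before reading it. It follows the five pseudocode steps in order. The function name count_word_frequencies is an assumption, and punctuation is deliberately left attached to words because the pseudocode does not mention removing it.

```python
def count_word_frequencies(text_string):
    """Return a dictionary mapping each lowercase word to its count."""
    # Step 1: split text into words (on whitespace)
    words = text_string.split()
    # Step 2: convert words to lowercase
    words = [word.lower() for word in words]
    # Step 3: create empty frequency dictionary
    frequencies = {}
    # Step 4: count each word
    for word in words:
        if word in frequencies:
            frequencies[word] += 1
        else:
            frequencies[word] = 1
    # Step 5: return frequency dictionary
    return frequencies


print(count_word_frequencies("the cat saw the Cat"))
```

Once the explicit loop is clear, the standard library's collections.Counter can do the same counting in a single call.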

Activity 7 Text processing mini-project

Build a comprehensive text statistics program.

Combine everything you've learned to create a text analysis tool that provides multiple statistics about input text.

Text Statistics Program
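As a starting point, here is a minimal sketch of such a program. The function name text_statistics, the statistics chosen and the sample text are all assumptions for illustration. Sentence splitting is deliberately crude: it treats every ".", "!" or "?" as a sentence boundary, so abbreviations and decimal numbers will be miscounted.

```python
def text_statistics(text):
    """Return a dictionary of simple statistics about the input text."""
    # Tokenise: split on whitespace, strip punctuation, lowercase
    words = [w.strip(".,!?;:\"'()").lower() for w in text.split()]
    words = [w for w in words if w]

    # Crude sentence split: treat ".", "!" and "?" all as boundaries
    normalised = text.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalised.split(".") if s.strip()]

    # Word frequencies, using dict.get to supply a default of 0
    frequencies = {}
    for w in words:
        frequencies[w] = frequencies.get(w, 0) + 1

    # Sort by descending count, breaking ties alphabetically
    most_common = sorted(frequencies.items(), key=lambda kv: (-kv[1], kv[0]))[:3]

    return {
        "characters": len(text),
        "words": len(words),
        "unique_words": len(frequencies),
        "sentences": len(sentences),
        "average_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "most_common": most_common,
    }


stats = text_statistics("Python is fun. Python is powerful! Is it easy?")
for name, value in stats.items():
    print(name, value)
```

Possible extensions include reading the text from a file and ignoring very common words such as "is" and "the".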

Unit Review

Test your understanding of Python fundamentals for NLP:

Self-Assessment Quiz

1. Which Python data structure is best for storing unique words?




2. What method splits a string into a list of words?




3. Which data structure uses key-value pairs?