Python for NLP

Learning outcomes

By the end of this unit you should:

understand part-of-speech tagging and sequence labeling
implement automatic text categorization algorithms
build and evaluate taggers using machine learning
apply tagging to linguistic analysis and NLP pipelines

Activity 1 Introduction to tagging and categorization

Read about sequence labeling and text classification fundamentals.

Tagging and categorization are fundamental NLP tasks that assign labels to text units. Part-of-speech tagging labels individual words with grammatical categories, while text categorization assigns documents to predefined classes. These tasks bridge linguistic analysis and machine learning.

Key concepts include:

Sequence labeling: Assigning tags to sequences of words (POS tagging, NER)
Text classification: Categorizing entire documents (sentiment, topic, genre)
Feature extraction: Converting text to numerical representations
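Hidden Markov Models: Generative statistical models that predict a tag sequence from tag-transition and word-emission probabilities
Conditional Random Fields: Discriminative sequence models that predict tags from rich, overlapping features of the whole input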
Neural approaches: RNNs, LSTMs, and Transformers for tagging
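To make the Hidden Markov Model idea concrete, here is a minimal sketch of Viterbi decoding, the standard dynamic-programming algorithm for finding the most probable tag sequence under an HMM. The three-tag model and all of its probabilities are invented for illustration (a real tagger would estimate them from a tagged corpus), and the small floor probability for unseen word/tag pairs is an assumption of this sketch, not a proper smoothing method.

```python
import math

# Toy HMM: tags, start, transition and emission probabilities are invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "cat": 0.4, "barks": 0.2},
    "VERB": {"barks": 0.5, "runs": 0.5},
}
UNSEEN = 1e-6  # assumed floor probability for unseen word/tag pairs


def viterbi(words, tags, start_p, trans_p, emit_p):
    """Return the most probable tag sequence for `words` under a first-order HMM."""
    if not words:
        return []

    def emit(tag, word):
        # Log emission probability, falling back to the floor for unseen pairs
        return math.log(emit_p[tag].get(word, UNSEEN))

    # best[i][t]: log-probability of the best tag path ending in tag t at position i
    # (log space avoids numerical underflow on long sentences)
    best = [{t: math.log(start_p[t]) + emit(t, words[0]) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        best.append({})
        back.append({})
        for t in tags:
            prev = max(tags, key=lambda s: best[i - 1][s] + math.log(trans_p[s][t]))
            best[i][t] = best[i - 1][prev] + math.log(trans_p[prev][t]) + emit(t, words[i])
            back[i][t] = prev

    # Start from the best final tag and follow the back-pointers
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]


print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))  # ['DET', 'NOUN', 'VERB']
```

Note that "barks" can be emitted by both NOUN and VERB in this toy model; the transition from NOUN to VERB is what makes VERB the better choice in context.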

Activity 2 Part-of-speech tagging fundamentals

Watch this introduction to POS tagging algorithms and evaluation.

This video (8 minutes) covers POS tag sets, rule-based and statistical tagging approaches, and how to evaluate tagger performance using accuracy and confusion matrices.
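A rule-based tagger can be as simple as an ordered list of patterns, and tagger accuracy is the proportion of tokens whose predicted tag matches the gold-standard tag. The sketch below assumes a handful of hypothetical regular-expression rules using Penn Treebank tags (DT, CD, VBG, VBD, RB, NNS, NN); real rule-based taggers use far larger rule sets.

```python
import re

# Hypothetical rules, tried in order; the first matching pattern wins.
RULES = [
    (re.compile(r"^(the|a|an)$", re.IGNORECASE), "DT"),  # determiners
    (re.compile(r"^-?\d+(\.\d+)?$"), "CD"),              # cardinal numbers
    (re.compile(r"ing$"), "VBG"),                        # gerunds / present participles
    (re.compile(r"ed$"), "VBD"),                         # past-tense verbs
    (re.compile(r"ly$"), "RB"),                          # adverbs
    (re.compile(r"s$"), "NNS"),                          # plural nouns
]


def rule_tag(tokens, default="NN"):
    """Tag each token with the first matching rule, or `default` if none match."""
    return [
        (tok, next((tag for pattern, tag in RULES if pattern.search(tok)), default))
        for tok in tokens
    ]


def accuracy(predicted, gold):
    """Proportion of tokens whose predicted tag equals the gold-standard tag."""
    if len(predicted) != len(gold):
        raise ValueError("predicted and gold sequences must be the same length")
    correct = sum(p_tag == g_tag for (_, p_tag), (_, g_tag) in zip(predicted, gold))
    return correct / len(gold)


tokens = ["the", "dog", "was", "running", "quickly"]
gold = [("the", "DT"), ("dog", "NN"), ("was", "VBD"), ("running", "VBG"), ("quickly", "RB")]
predicted = rule_tag(tokens)
print(predicted)  # "was" is mis-tagged NNS by the plural-noun rule
print(accuracy(predicted, gold))  # 0.8
```

The error on "was" shows how brittle surface rules can be, which is one motivation for statistical taggers that learn tag preferences from data.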

Activity 3 Interactive POS tagger

Experiment with different POS tagging approaches and compare results.

Use the tool below to see how different tagging algorithms perform on sample text. Compare rule-based, statistical, and neural approaches.

POS Tagging Comparison Tool
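A simple statistical baseline to compare against rule-based tagging is the unigram (most-frequent-tag) tagger: it assigns each word the tag it received most often in a training corpus and backs off to another tagger for unseen words. This sketch trains on a tiny invented corpus and backs off to hypothetical suffix rules.

```python
from collections import Counter, defaultdict

# Tiny invented training corpus of (word, Penn Treebank tag) sentences.
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cats", "NNS"), ("sleep", "VBP")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]


def train_unigram(tagged_sents):
    """Map each (lowercased) word to the tag it was seen with most often."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def suffix_rule(word):
    """Hypothetical rule-based fallback for words not seen in training."""
    if word.endswith("ing"):
        return "VBG"
    if word.endswith("ed"):
        return "VBD"
    if word.endswith("s"):
        return "NNS"
    return "NN"


def unigram_tag(tokens, model, backoff=suffix_rule):
    """Use the most frequent training tag; back off to `backoff` for unseen words."""
    return [(tok, model[tok.lower()] if tok.lower() in model else backoff(tok)) for tok in tokens]


model = train_unigram(TRAIN)
print(suffix_rule("barks"))                          # NNS: the rule alone gets this wrong
print(unigram_tag(["the", "dog", "barks"], model))   # "barks" -> VBZ from corpus counts
print(unigram_tag(["the", "dog", "walked"], model))  # unseen "walked" -> VBD via backoff
```

Combining the two in a backoff chain uses corpus evidence where it exists and falls back to rules only where it does not.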

Activity 4 Text classification with machine learning

Watch how to build text classifiers using machine learning.

This video (8 minutes) demonstrates feature extraction, training classifiers (Naive Bayes, SVM), and evaluating their performance on real tasks such as movie-review classification and news categorization.
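The Naive Bayes classifier can be written from scratch in a few lines. This is a minimal sketch of multinomial Naive Bayes with add-one (Laplace) smoothing, assuming whitespace tokenization and a tiny invented sentiment dataset; in practice you would use a library implementation such as scikit-learn's MultinomialNB together with a proper tokenizer.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing over word counts."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)        # documents per class, for priors
        self.word_counts = defaultdict(Counter)    # word counts per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = doc.lower().split()
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_words = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def predict(self, doc):
        n_docs = sum(self.class_counts.values())
        vocab_size = len(self.vocab)
        best, best_score = None, float("-inf")
        for c in self.class_counts:
            # log P(c) + sum of log P(word | c), summed in log space
            score = math.log(self.class_counts[c] / n_docs)
            for tok in doc.lower().split():
                if tok not in self.vocab:
                    continue  # words never seen in training are ignored
                count = self.word_counts[c][tok]
                score += math.log((count + 1) / (self.total_words[c] + vocab_size))
            if score > best_score:
                best, best_score = c, score
        return best


docs = ["a great and fun movie", "great acting and a great plot",
        "boring and too long", "a dull boring plot"]
labels = ["pos", "pos", "neg", "neg"]
clf = NaiveBayes().fit(docs, labels)
print(clf.predict("great fun"))    # pos
print(clf.predict("boring plot"))  # neg
```

Add-one smoothing keeps a single unseen class/word combination from driving a class probability to zero.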

Activity 5 Build a document classifier

Create and train your own text classification system.

Build a document classifier from scratch using different feature representations and machine learning algorithms. Experiment with bag-of-words, TF-IDF, and n-grams.
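The feature representations in this activity can be sketched with the standard library: bag-of-words counts, optional bigrams, optional stopword removal, and TF-IDF weighting. The stopword list is a tiny hypothetical one, and the TF-IDF formula used here, tf * log(N / df), is one common variant; scikit-learn's TfidfVectorizer, for example, uses a smoothed IDF and L2-normalizes each document vector by default.

```python
import math
from collections import Counter

STOPWORDS = {"a", "an", "the", "and", "of", "to", "is"}  # tiny illustrative list


def tokenize(text, remove_stopwords=False):
    """Lowercase, split on whitespace, optionally drop stopwords."""
    tokens = text.lower().split()
    if remove_stopwords:
        tokens = [t for t in tokens if t not in STOPWORDS]
    return tokens


def ngram_features(tokens, include_bigrams=False):
    """Bag-of-words counts, optionally extended with adjacent-word bigrams."""
    feats = Counter(tokens)
    if include_bigrams:
        feats.update(f"{a} {b}" for a, b in zip(tokens, tokens[1:]))
    return feats


def tfidf(docs_feats):
    """Weight each term count by log(N / df), where df = number of docs containing it."""
    n = len(docs_feats)
    df = Counter()
    for feats in docs_feats:
        df.update(feats.keys())
    idf = {term: math.log(n / count) for term, count in df.items()}
    return [{term: tf * idf[term] for term, tf in feats.items()} for feats in docs_feats]


docs = ["The movie was great", "The movie was boring"]
features = [ngram_features(tokenize(d, remove_stopwords=True), include_bigrams=True) for d in docs]
weights = tfidf(features)
print(weights[0])  # "great" and "was great" get log(2); terms shared by both docs get 0.0
```

Terms that occur in every document (here "movie", "was" and "movie was") get an IDF of zero, so only the distinguishing terms carry weight.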

Document Classification Builder

Activity 6 Named entity recognition

Implement and evaluate named entity recognition systems.

Build NER systems to identify people, places, organizations, and other entities in text. Compare different approaches and evaluate performance.
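NER is usually framed as sequence labeling with BIO tags (B- begins an entity, I- continues it, O is outside any entity), and systems are commonly compared on entity-level precision, recall and F1, where a predicted entity counts only if its boundaries and type exactly match the gold annotation. A minimal sketch, assuming hand-written gold and predicted tag sequences; treating a stray I- tag as the start of an entity is a lenient design choice of this sketch.

```python
def bio_to_spans(tags):
    """Convert BIO tags to (start, end, type) entity spans; `end` is exclusive."""
    spans = []
    start, etype = None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes a final entity
        continues = tag.startswith("I-") and tag[2:] == etype
        if not continues:
            if etype is not None:  # close the entity in progress
                spans.append((start, i, etype))
                start, etype = None, None
            if tag != "O":  # B-X starts an entity; a stray I-X is leniently treated as B-X
                start, etype = i, tag[2:]
    return spans


def entity_prf(gold_tags, pred_tags):
    """Exact-match entity-level precision, recall and F1."""
    gold, pred = set(bio_to_spans(gold_tags)), set(bio_to_spans(pred_tags))
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


tokens = ["Barack", "Obama", "visited", "Paris"]
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-LOC"]  # only part of the person name is tagged
print([(" ".join(tokens[s:e]), t) for s, e, t in bio_to_spans(gold)])  # [('Barack Obama', 'PER'), ('Paris', 'LOC')]
print(entity_prf(gold, pred))  # (0.5, 0.5, 0.5)
```

Because scoring is exact-match, tagging only "Barack" as PER earns no credit, even though it overlaps the gold entity.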

Named Entity Recognition Tool

Activity 7 Evaluation and error analysis

Analyze classifier performance and identify improvement strategies.

Learn to evaluate tagging and classification systems using appropriate metrics, create confusion matrices, and perform error analysis to improve model performance.
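These metrics can be computed directly from paired true and predicted labels. This sketch builds a confusion matrix (rows are true labels, columns predicted labels), derives per-class precision, recall and F1, and lists the most frequent confusions as a starting point for error analysis. The labels are an invented example; scikit-learn's confusion_matrix and classification_report provide tested implementations of the same metrics.

```python
from collections import Counter


def confusion_matrix(y_true, y_pred):
    """Return (labels, matrix) where matrix[i][j] counts true label i predicted as j."""
    labels = sorted(set(y_true) | set(y_pred))
    counts = Counter(zip(y_true, y_pred))
    return labels, [[counts[(t, p)] for p in labels] for t in labels]


def per_class_report(y_true, y_pred):
    """Per-class (precision, recall, F1) derived from the confusion matrix."""
    labels, matrix = confusion_matrix(y_true, y_pred)
    report = {}
    for i, label in enumerate(labels):
        tp = matrix[i][i]
        fp = sum(row[i] for row in matrix) - tp  # predicted as label, but wrong
        fn = sum(matrix[i]) - tp                 # truly label, but missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        report[label] = (precision, recall, f1)
    return report


def top_confusions(y_true, y_pred, k=3):
    """Most frequent (true, predicted) error pairs, a starting point for error analysis."""
    return Counter((t, p) for t, p in zip(y_true, y_pred) if t != p).most_common(k)


y_true = ["pos", "pos", "neg", "neg", "neu"]
y_pred = ["pos", "neg", "neg", "neg", "pos"]
labels, matrix = confusion_matrix(y_true, y_pred)
print(labels)  # ['neg', 'neu', 'pos']
for label, row in zip(labels, matrix):
    print(label, row)
for label, (p, r, f) in per_class_report(y_true, y_pred).items():
    print(f"{label}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
print(top_confusions(y_true, y_pred))
```

Per-class scores reveal problems that overall accuracy hides: here the "neu" class is never predicted correctly, so its precision, recall and F1 are all zero.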

Model Evaluation Suite

Unit Review

Test your understanding of categorization and tagging:

Self-Assessment Quiz

1. What is the main difference between POS tagging and text classification?

POS tagging works on words, classification on documents
POS tagging is unsupervised, classification is supervised
No significant difference

2. Which algorithm is commonly used for sequence labeling?

Hidden Markov Model
K-means clustering
Linear regression

3. What does precision measure in classification?

Proportion of positive predictions that are correct
Total number of correct predictions
Speed of classification

Unit 7 Categorization and tagging
