
Unit 7 Categorization and tagging

Learning outcomes

By the end of this unit you should:

  • understand part-of-speech tagging and sequence labeling
  • implement automatic text categorization algorithms
  • build and evaluate taggers using machine learning
  • apply tagging to linguistic analysis and NLP pipelines

Activity 1 Introduction to tagging and categorization

Read about sequence labeling and text classification fundamentals.

Tagging and categorization are fundamental NLP tasks that assign labels to text units. Part-of-speech tagging labels individual words with grammatical categories, while text categorization assigns documents to predefined classes. These tasks bridge linguistic analysis and machine learning.

Key concepts include:

  • Sequence labeling: Assigning a tag to each word in a sequence (POS tagging, named entity recognition)
  • Text classification: Assigning a single label to an entire document (sentiment, topic, genre)
  • Feature extraction: Converting text to numerical representations
  • Hidden Markov Models: Generative statistical models that predict a tag sequence from tag-transition and word-emission probabilities
  • Conditional Random Fields: Discriminative sequence models that condition on the whole input sequence and can use rich, overlapping features
  • Neural approaches: RNNs, LSTMs, and Transformers for tagging
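
As a concrete illustration of the Hidden Markov Model idea, the Python sketch below finds the most probable tag sequence with the Viterbi algorithm. It assumes a first-order HMM whose start, transition, and emission probabilities are already known; the three-tag set and every number in the tables are invented for illustration, and `viterbi` is a hypothetical helper, not a library function.

```python
import math

def viterbi(words, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Most probable tag sequence for a non-empty list of words under a first-order HMM.

    Scores are summed in log space to avoid underflow; zero probabilities
    (e.g. unseen word emissions) are clamped to `floor`, an assumption of this sketch.
    """
    def log(p):
        return math.log(max(p, floor))

    # best[i][t]: best log-score of any tag path that ends in tag t at position i
    best = [{t: log(start_p[t]) + log(emit_p[t].get(words[0], 0.0)) for t in tags}]
    back = [{}]  # back[i][t]: previous tag on that best path
    for i in range(1, len(words)):
        scores, pointers = {}, {}
        for t in tags:
            prev = max(tags, key=lambda p: best[i - 1][p] + log(trans_p[p][t]))
            scores[t] = (best[i - 1][prev] + log(trans_p[prev][t])
                         + log(emit_p[t].get(words[i], 0.0)))
            pointers[t] = prev
        best.append(scores)
        back.append(pointers)

    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Toy model: all probabilities are invented for illustration.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.3, "VERB": 0.2},
}
EMIT = {
    "DET": {"the": 0.7, "a": 0.3},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"dog": 0.1, "walk": 0.6, "barks": 0.3},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
print(viterbi(["a", "walk"], TAGS, START, TRANS, EMIT))
```

With these toy tables, "the dog barks" decodes as DET NOUN VERB. "walk" after "a" comes out as NOUN even though its emission probability is higher under VERB, because the strong DET→NOUN transition outweighs it. This interplay of transition and emission scores is how an HMM uses context.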

Activity 2 Part-of-speech tagging fundamentals

Watch this introduction to POS tagging algorithms and evaluation.

This video (8 minutes) covers POS tag sets, rule-based and statistical tagging approaches, and how to evaluate tagger performance using accuracy and confusion matrices.
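
To see the difference between rule-based and statistical tagging in miniature, the following Python sketch compares a hand-written suffix-rule tagger with a most-frequent-tag (unigram) tagger learned from a few annotated sentences, and scores both by token accuracy. The sentences, the Penn Treebank-style tags, the suffix rules, and the function names are illustrative assumptions; real taggers are trained and tested on much larger annotated corpora.

```python
from collections import Counter, defaultdict

def rule_tagger(word):
    """Hand-written suffix rules (an illustrative rule set, not a standard one)."""
    w = word.lower()
    if w in {"the", "a", "an"}:
        return "DT"
    if w.endswith("ing"):
        return "VBG"
    if w.endswith("ed"):
        return "VBD"
    if w.endswith("ly"):
        return "RB"
    if w.endswith("s"):
        return "NNS"
    return "NN"  # default guess: singular noun

def train_unigram(tagged_sents):
    """Learn each word's most frequent tag from (word, tag) sentences."""
    counts = defaultdict(Counter)
    for sent in tagged_sents:
        for word, tag in sent:
            counts[word.lower()][tag] += 1
    return {w: c.most_common(1)[0][0] for w, c in counts.items()}

def unigram_tagger(model, word):
    # Words never seen in training fall back to the suffix rules.
    return model.get(word.lower(), rule_tagger(word))

def accuracy(gold_sents, tag_fn):
    """Token accuracy: tokens tagged correctly / all tokens."""
    pairs = [(tag, tag_fn(word)) for sent in gold_sents for word, tag in sent]
    return sum(gold == pred for gold, pred in pairs) / len(pairs)

# Toy annotated data (invented for illustration).
TRAIN = [
    [("the", "DT"), ("dog", "NN"), ("runs", "VBZ"), ("quickly", "RB")],
    [("a", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("the", "DT"), ("dogs", "NNS"), ("barked", "VBD")],
]
TEST = [
    [("the", "DT"), ("cat", "NN"), ("runs", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("walks", "VBZ"), ("slowly", "RB")],
]

model = train_unigram(TRAIN)
print(f"rules:   {accuracy(TEST, rule_tagger):.2f}")
print(f"unigram: {accuracy(TEST, lambda w: unigram_tagger(model, w)):.2f}")
```

On this tiny test set the rules score 0.71 and the unigram tagger 0.86: the learned model tags "runs" correctly as VBZ, while both taggers mislabel the unseen verb "walks" as a plural noun. Collecting such (gold, predicted) mismatches is the starting point for a confusion matrix.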

Activity 3 Interactive POS tagger

Experiment with different POS tagging approaches and compare results.

Use the tool below to see how different tagging algorithms perform on sample text. Compare rule-based, statistical, and neural approaches.

POS Tagging Comparison Tool

Activity 4 Text classification with machine learning

Watch a demonstration of how to build text classifiers with machine learning.

This video (8 minutes) demonstrates feature extraction, training classifiers (Naive Bayes, SVM), and evaluating their performance on real datasets such as movie reviews and news articles.
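
The Naive Bayes classifier mentioned in the video can be sketched in a few lines of Python. This version assumes whitespace tokenization, a multinomial model with add-one (Laplace) smoothing, and that words never seen in training are ignored at prediction time; the class name `NaiveBayes` and the four toy reviews are hypothetical.

```python
import math
from collections import Counter, defaultdict

def tokenize(text):
    return text.lower().split()

class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)          # documents per class, for the prior
        self.word_counts = defaultdict(Counter)      # class -> word frequencies
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total = {c: sum(wc.values()) for c, wc in self.word_counts.items()}
        return self

    def log_posterior(self, doc, label):
        """log P(c) + sum of log P(w | c) over known tokens (up to a constant)."""
        score = math.log(self.class_counts[label] / sum(self.class_counts.values()))
        denom = self.total[label] + len(self.vocab)
        for w in tokenize(doc):
            if w in self.vocab:  # unknown words are skipped
                score += math.log((self.word_counts[label][w] + 1) / denom)
        return score

    def predict(self, doc):
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))

# Toy training data (invented for illustration).
DOCS = [
    "a wonderful moving film",
    "great acting and a great story",
    "boring plot and terrible acting",
    "a dull boring film",
]
LABELS = ["pos", "pos", "neg", "neg"]

clf = NaiveBayes().fit(DOCS, LABELS)
print(clf.predict("a great film"))
print(clf.predict("boring acting"))
```

Working in log space avoids underflow when many small probabilities are multiplied, and smoothing keeps a single word never seen with a class from driving that class's probability to zero. Here "a great film" is classified as pos and "boring acting" as neg.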

Activity 5 Build a document classifier

Create and train your own text classification system.

Build a document classifier from scratch using different feature representations and machine learning algorithms. Experiment with bag-of-words, TF-IDF, and n-grams.
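
The three feature representations named above can be sketched with the Python standard library. In this illustration `features` builds a bag of n-grams up to `max_n` (with `max_n=1` it is plain bag-of-words), and `tfidf` reweights the counts with the unsmoothed inverse document frequency idf = log(N / df). The function names and the three example documents are assumptions made for this sketch.

```python
import math
from collections import Counter

def ngrams(tokens, n):
    """Contiguous n-grams as space-joined strings, e.g. n=2 gives bigrams."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def features(text, max_n=2):
    """Bag of n-grams: counts of every 1..max_n-gram; longer-range word order is discarded."""
    tokens = text.lower().split()
    bag = Counter()
    for n in range(1, max_n + 1):
        bag.update(ngrams(tokens, n))
    return bag

def tfidf(docs, max_n=2):
    """Reweight each document's raw n-gram counts by idf = log(N / df)."""
    bags = [features(doc, max_n) for doc in docs]
    df = Counter(term for bag in bags for term in bag)  # number of docs containing each term
    n_docs = len(docs)
    return [{term: count * math.log(n_docs / df[term]) for term, count in bag.items()}
            for bag in bags]

# Example documents (invented for illustration).
DOCS = [
    "the match was a great match",
    "the election results were close",
    "a great election night",
]
weights = tfidf(DOCS)
print(sorted(weights[0].items(), key=lambda kv: kv[1], reverse=True)[:3])
```

In the first document "match" (twice, and only there) outweighs "the" (shared with another document). This unsmoothed idf gives zero weight to terms found in every document; library implementations such as scikit-learn's `TfidfVectorizer` smooth the idf and normalize each document vector by default, so their numbers will differ.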

Document Classification Builder

Activity 6 Named entity recognition

Implement and evaluate named entity recognition systems.

Build NER systems to identify people, places, organizations, and other entities in text. Compare different approaches and evaluate performance.
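
A common way to frame NER as sequence labeling is the BIO scheme: B- marks the first token of an entity, I- a continuation, and O a token outside any entity. The Python sketch below assumes the simplest possible baseline, a dictionary (gazetteer) lookup that emits BIO tags, plus a helper that turns BIO tags back into entity spans for evaluation. The gazetteer contents, the PER/LOC type names, and both function names are illustrative assumptions; a learned tagger such as a CRF or neural model would normally replace `gazetteer_tag`.

```python
def gazetteer_tag(tokens, gazetteer):
    """Tag tokens with BIO labels by greedy longest-match lookup in `gazetteer`.

    `gazetteer` maps lowercase entity names (possibly multi-word) to a type string.
    """
    tags = ["O"] * len(tokens)
    max_len = max((len(name.split()) for name in gazetteer), default=1)
    i = 0
    while i < len(tokens):
        # Try the longest candidate first so a longer name beats its prefix.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            label = gazetteer.get(" ".join(tokens[i:i + n]).lower())
            if label is not None:
                tags[i] = "B-" + label
                tags[i + 1:i + n] = ["I-" + label] * (n - 1)
                i += n
                break
        else:  # no gazetteer entry starts at position i
            i += 1
    return tags

def bio_to_spans(tags):
    """Convert BIO tags to (type, start, end) spans, with `end` exclusive."""
    spans, start, kind = [], None, None
    for i, tag in enumerate(tags + ["O"]):  # sentinel "O" closes a trailing entity
        if start is not None and not (tag.startswith("I-") and tag[2:] == kind):
            spans.append((kind, start, i))
            start, kind = None, None
        # A stray I- tag with no open entity is treated as the start of a new one.
        if tag.startswith("B-") or (tag.startswith("I-") and start is None):
            start, kind = i, tag[2:]
    return spans

# Example gazetteer and sentence (invented for illustration).
GAZETTEER = {"ada lovelace": "PER", "charles babbage": "PER", "london": "LOC"}
TOKENS = "Ada Lovelace met Charles Babbage in London".split()

tags = gazetteer_tag(TOKENS, GAZETTEER)
print(list(zip(TOKENS, tags)))
print(bio_to_spans(tags))
```

Converting to spans makes entity-level scoring possible: a predicted entity is usually counted as correct only if its type and both boundaries match the gold span exactly, which is stricter than per-token accuracy.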

Named Entity Recognition Tool

Activity 7 Evaluation and error analysis

Analyze classifier performance and identify improvement strategies.

Learn to evaluate tagging and classification systems using appropriate metrics (accuracy, precision, recall, and F1 score), create confusion matrices, and perform error analysis to improve model performance.
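
These metrics follow directly from a confusion matrix. The Python sketch below counts (true, predicted) label pairs and derives per-class precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 as their harmonic mean. It assumes labels are plain strings and reports 0.0 whenever a ratio is undefined; the function names and the five example labels are hypothetical.

```python
from collections import Counter

def confusion_matrix(y_true, y_pred):
    """Sparse confusion matrix: counts of (true label, predicted label) pairs."""
    return Counter(zip(y_true, y_pred))

def per_class_scores(y_true, y_pred):
    """Map each label to (precision, recall, F1); undefined ratios become 0.0."""
    cm = confusion_matrix(y_true, y_pred)
    scores = {}
    for label in sorted(set(y_true) | set(y_pred)):
        tp = cm[(label, label)]
        fp = sum(n for (t, p), n in cm.items() if p == label and t != label)
        fn = sum(n for (t, p), n in cm.items() if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores

# Example gold and predicted labels (invented for illustration).
Y_TRUE = ["sport", "sport", "politics", "politics", "tech"]
Y_PRED = ["sport", "politics", "politics", "politics", "sport"]

for label, (p, r, f) in per_class_scores(Y_TRUE, Y_PRED).items():
    print(f"{label}: P={p:.2f} R={r:.2f} F1={f:.2f}")
print(confusion_matrix(Y_TRUE, Y_PRED))
```

Here politics has precision 0.67 but recall 1.00 (one sport article was misfiled under it), while tech scores 0.00 on all three metrics because its only document was predicted as sport. Off-diagonal cells such as (tech, sport) are where error analysis should start.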

Model Evaluation Suite

Unit Review

Test your understanding of categorization and tagging:

Self-Assessment Quiz

1. What is the main difference between POS tagging and text classification?




2. Which algorithm is commonly used for sequence labeling?




3. What does precision measure in classification?