Unit 9 Information extraction

Learning outcomes

By the end of this unit you should be able to:

  • extract structured information from unstructured text documents
  • implement named entity recognition and relation extraction systems
  • build knowledge graphs from extracted information
  • apply information extraction to real-world applications

Activity 1 Introduction to information extraction

Read about information extraction fundamentals and applications.

Information Extraction (IE) transforms unstructured text into structured data by identifying and extracting specific types of information. Unlike text classification, which assigns a label to an entire document, IE locates particular pieces of information within a document and returns them as structured records.
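The contrast between classification and extraction can be made concrete with a small sketch. The `classify` and `extract` functions below are hypothetical stand-ins (a keyword rule and two regular expressions), not a real classifier or IE system; they only illustrate the difference in output shape: one label per document versus a list of typed spans with character offsets.

```python
import re

def classify(text):
    # Hypothetical keyword rule standing in for a trained document classifier:
    # it returns a single label for the whole document.
    return "finance" if re.search(r"\b(invoice|payment|paid)\b", text, re.I) else "other"

# Illustrative patterns only; real IE systems use richer rules or learned models.
PATTERNS = {
    "DATE": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "MONEY": re.compile(r"\$\d+(?:\.\d{2})?"),
}

def extract(text):
    # Returns one record per piece of information found, with its position.
    found = []
    for label, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            found.append({"type": label, "text": m.group(),
                          "start": m.start(), "end": m.end()})
    return sorted(found, key=lambda e: e["start"])

doc = "Acme paid $1200.50 on 2024-03-15 for the invoice."
label = classify(doc)   # one label for the document
items = extract(doc)    # several typed spans inside the document
for item in items:
    print(item)
```

Keeping the character offsets alongside each extracted string makes it possible to link every record back to its place in the source text.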

Key IE tasks include:

  • Named Entity Recognition (NER): Identifying and classifying mentions of people, places, organizations, and dates
  • Relation Extraction: Finding relationships between entities (e.g., works-for, located-in)
  • Event Extraction: Detecting events and their participants
  • Template Filling: Populating structured forms from text
  • Knowledge Graph Construction: Building interconnected entity-relation networks

IE pipeline: Raw Text → NER → Relations → Knowledge Graph

Activity 2 Named entity recognition

Watch this comprehensive introduction to NER techniques and evaluation.

This video (6 minutes) covers NER algorithms, tag schemes (BIO, BILOU), evaluation metrics, and common challenges like entity ambiguity and domain adaptation.
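As a companion to the tag schemes and evaluation metrics mentioned above, here is a minimal sketch of how BIO tags can be decoded into entity spans and scored with entity-level precision, recall, and F1. The function names `bio_to_spans` and `prf` and the example sentence are illustrative assumptions; the decoding is lenient (an `I-` tag without a matching `B-` simply starts a new entity), which is one common convention among several.

```python
def bio_to_spans(tags):
    """Decode BIO tags into a set of (start, end, type) spans; end is exclusive."""
    spans, start, label = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # sentinel "O" closes a trailing entity
        continues = tag.startswith("I-") and tag[2:] == label
        if not continues:
            if label is not None:
                spans.add((start, i, label))      # close the entity that was open
            start, label = None, None
            if tag.startswith(("B-", "I-")):      # lenient: a stray I- opens an entity
                start, label = i, tag[2:]
    return spans

def prf(gold_tags, pred_tags):
    """Entity-level scores: a prediction counts only if span and type both match."""
    gold, pred = bio_to_spans(gold_tags), bio_to_spans(pred_tags)
    tp = len(gold & pred)
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

tokens = ["Tim", "Cook", "visited", "Apple", "in", "Paris"]
gold = ["B-PER", "I-PER", "O", "B-ORG", "O", "B-LOC"]
pred = ["B-PER", "O", "O", "B-ORG", "O", "B-LOC"]  # person span cut short

precision, recall, f1 = prf(gold, pred)
print(bio_to_spans(gold), precision, recall, f1)
```

In this example the predicted person span covers only "Tim", so it does not count as correct even though it overlaps the gold entity. Exact-match scoring of this kind is stricter than token-level accuracy.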

Activity 3 Multi-language NER laboratory

Experiment with NER across different languages and entity types.

Use the interactive tool to test NER performance across languages and customize entity recognition for specific domains.
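For readers without access to the interactive tool, the idea of per-language recognition with domain customization can be approximated with a gazetteer (dictionary) matcher. This is a deliberately simple sketch, not how the tool works: the `GAZETTEERS` entries, the `build_matcher` helper, and the `DRUG` entity type are all hypothetical, and matching is exact and case-sensitive.

```python
import re

# Hypothetical per-language name lists; real gazetteers are far larger.
GAZETTEERS = {
    "en": {"LOC": ["Germany", "Munich"], "ORG": ["United Nations"]},
    "de": {"LOC": ["Deutschland", "München"], "ORG": ["Vereinte Nationen"]},
}

def build_matcher(lang, extra=None):
    """Return a tagging function for one language, optionally with extra domain types."""
    entries = {label: list(names) for label, names in GAZETTEERS[lang].items()}
    for label, names in (extra or {}).items():
        entries.setdefault(label, []).extend(names)
    # Longest names first so multi-word names win over shorter alternatives.
    pairs = sorted(((name, label) for label, names in entries.items() for name in names),
                   key=lambda pair: -len(pair[0]))
    lookup = dict(pairs)
    # \b is Unicode-aware for str patterns, so names like "München" match as whole words.
    pattern = re.compile(r"\b(?:" + "|".join(re.escape(name) for name, _ in pairs) + r")\b")

    def tag(text):
        return [(m.group(), lookup[m.group()]) for m in pattern.finditer(text)]

    return tag

tag_en = build_matcher("en")
tag_de = build_matcher("de", extra={"DRUG": ["Ibuprofen"]})  # domain-specific type

en_entities = tag_en("The United Nations met in Munich.")
de_entities = tag_de("Ibuprofen ist in München und ganz Deutschland erhältlich.")
print(en_entities)
print(de_entities)
```

A dictionary matcher cannot resolve ambiguity or recognize unseen names, which is why statistical and neural NER models are usually preferred. It remains a quick way to add domain-specific entity types on top of an existing system.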

Activity 4 Relation extraction workshop

Build systems to extract relationships between entities.

Learn to identify and extract semantic relationships between entities using pattern-based and machine learning approaches.
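The pattern-based approach can be sketched with regular expressions. In the example below, a naive "capitalized phrase" expression stands in for a real NER step, and the two relation patterns are illustrative assumptions rather than a complete rule set. Each pattern captures a subject and an object and emits a (subject, relation, object) triple.

```python
import re

# Naive stand-in for NER: one or more capitalized words in a row.
NAME = r"\b[A-Z]\w*(?: [A-Z]\w*)*"

# Hypothetical surface patterns, one per relation type.
PATTERNS = [
    ("works-for", re.compile(rf"(?P<subj>{NAME}) works (?:for|at) (?P<obj>{NAME})")),
    ("located-in", re.compile(rf"(?P<subj>{NAME}) is (?:located|based) in (?P<obj>{NAME})")),
]

def extract_relations(text):
    """Return (subject, relation, object) triples for every pattern match."""
    triples = []
    for relation, pattern in PATTERNS:
        for m in pattern.finditer(text):
            triples.append((m.group("subj"), relation, m.group("obj")))
    return triples

text = "Alice Smith works for Acme Corp. Acme Corp is located in Berlin."
triples = extract_relations(text)
for triple in triples:
    print(triple)
```

Patterns like these tend to be precise but miss paraphrases ("is employed by", "has its headquarters in"). Machine learning approaches aim to generalize across such variations at the cost of needing labeled or distantly supervised data.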

Activity 5 Event extraction and timeline builder

Watch how to extract events and build interactive timelines.

This video (6 minutes) demonstrates event detection, temporal ordering, and timeline visualization for news analysis and historical research.
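The temporal ordering step can be illustrated with a short sketch. It makes two simplifying assumptions that are not taken from the video: every sentence containing an explicit date of the form "3 March 2021" is treated as one event, and events are ordered purely by that date. The `build_timeline` function and the sample text are hypothetical.

```python
import re
from datetime import date

MONTHS = {name: number for number, name in enumerate(
    ["January", "February", "March", "April", "May", "June", "July",
     "August", "September", "October", "November", "December"], start=1)}

# Matches dates written as "<day> <Month> <year>", e.g. "3 March 2021".
DATE_RE = re.compile(r"\b(\d{1,2}) (" + "|".join(MONTHS) + r") (\d{4})\b")

def build_timeline(text):
    """Return (date, sentence) pairs sorted chronologically."""
    events = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        m = DATE_RE.search(sentence)
        if not m:
            continue  # sentences without an explicit date are skipped
        try:
            when = date(int(m.group(3)), MONTHS[m.group(2)], int(m.group(1)))
        except ValueError:
            continue  # impossible dates such as "31 February 2021"
        events.append((when, sentence))
    return sorted(events)

text = ("The merger was announced on 3 March 2021. "
        "Regulators approved the deal on 12 January 2022. "
        "Talks had begun on 15 June 2020. "
        "Analysts were surprised.")

timeline = build_timeline(text)
for when, sentence in timeline:
    print(when.isoformat(), "|", sentence)
```

Real event extraction also has to handle relative expressions ("two weeks later"), events without dates, and multiple events per sentence, none of which this sketch attempts.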

Activity 6 Document understanding laboratory

Apply IE to real-world document processing tasks.

Extract structured information from different document types including resumes, invoices, and forms using IE techniques.
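For the resume case, template filling can be sketched as a function that populates a fixed set of fields from raw text. The field names, the regular expressions, and the layout assumptions below (name on the first non-empty line, a line starting with "Skills:") are illustrative only; real resumes vary far more than this.

```python
import re

EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
PHONE_RE = re.compile(r"\+?\d[\d ().-]{7,}\d")  # loose pattern, assumes 9+ characters

def parse_resume(text):
    """Fill a simple resume template; fields that are not found stay None or empty."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    email = EMAIL_RE.search(text)
    phone = PHONE_RE.search(text)
    skills = []
    for line in lines:
        if line.lower().startswith("skills:"):
            skills = [s.strip() for s in line.split(":", 1)[1].split(",") if s.strip()]
    return {
        "name": lines[0] if lines else None,   # assumption: name is the first line
        "email": email.group() if email else None,
        "phone": phone.group() if phone else None,
        "skills": skills,
    }

resume = """
Jane Doe
Email: jane.doe@example.com
Phone: +1 555 010 9999
Skills: Python, NLP, SQL
"""

record = parse_resume(resume)
print(record)
```

Invoices and forms can be approached in the same way with a different template, although scanned documents usually need OCR and layout analysis before any text patterns can be applied.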

Activity 7 Knowledge graph construction

Build interactive knowledge graphs from extracted information.

Combine entity and relation extraction to create comprehensive knowledge graphs that can be queried and visualized.
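A knowledge graph built from extracted triples can be sketched with a plain adjacency structure. The `KnowledgeGraph` class below, with its `add`, `query`, and `follow` methods, is a hypothetical minimal design and not the implementation behind any particular tool; the triples are hand-written examples of what an entity and relation extraction step might produce.

```python
from collections import defaultdict

class KnowledgeGraph:
    """Directed, labeled graph stored as subject -> {(relation, object)}."""

    def __init__(self):
        self.out_edges = defaultdict(set)

    def add(self, subj, rel, obj):
        self.out_edges[subj].add((rel, obj))  # a set, so duplicate triples collapse

    def query(self, subj=None, rel=None, obj=None):
        """Return sorted triples matching the given fields; None acts as a wildcard."""
        return sorted(
            (s, r, o)
            for s, edges in self.out_edges.items()
            for r, o in edges
            if subj in (None, s) and rel in (None, r) and obj in (None, o)
        )

    def follow(self, start, *relations):
        """Walk a chain of relations from a start node and return the reachable nodes."""
        frontier = {start}
        for rel in relations:
            frontier = {o for s in frontier
                        for r, o in self.out_edges.get(s, ()) if r == rel}
        return frontier

kg = KnowledgeGraph()
for triple in [
    ("Alice Smith", "works-for", "Acme Corp"),
    ("Bob Lee", "works-for", "Acme Corp"),
    ("Acme Corp", "located-in", "Berlin"),
    ("Alice Smith", "works-for", "Acme Corp"),  # duplicate extraction
]:
    kg.add(*triple)

employees = kg.query(rel="works-for", obj="Acme Corp")
workplace_city = kg.follow("Alice Smith", "works-for", "located-in")
print(employees)
print(workplace_city)
```

The `follow` call answers a question that no single sentence states directly (the city where Alice Smith's employer is located), which is the main benefit of linking extracted facts into a graph.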

Unit Review

Test your understanding of information extraction:

Self-Assessment Quiz

1. What is the main goal of Named Entity Recognition?

2. Relation extraction focuses on:

3. Knowledge graphs represent: