Unit 9 Information extraction

Learning outcomes

By the end of this unit you should be able to:

extract structured information from unstructured text documents
implement named entity recognition and relation extraction systems
build knowledge graphs from extracted information
apply information extraction to real-world applications

Activity 1 Introduction to information extraction

Read about information extraction fundamentals and applications.

Information Extraction (IE) transforms unstructured text into structured data by identifying and extracting specific types of information. Unlike text classification, which assigns labels to entire documents, IE locates and extracts particular pieces of information within documents.

Key IE tasks include:

Named Entity Recognition (NER): Identifying people, places, organizations, dates
Relation Extraction: Finding relationships between entities (works-for, located-in)
Event Extraction: Detecting events and their participants
Template Filling: Populating structured forms from text
Knowledge Graph Construction: Building interconnected entity-relation networks
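To make "structured data" concrete, here is a minimal sketch of how the output of NER and relation extraction might be represented. The `Entity` and `Relation` classes, the `find_entity` helper, and the example sentence are all hypothetical and hand-annotated for illustration; they are not the output of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    text: str    # surface form as it appears in the document
    label: str   # entity type, e.g. PERSON or ORGANIZATION
    start: int   # character offset, inclusive
    end: int     # character offset, exclusive


@dataclass(frozen=True)
class Relation:
    head: Entity
    relation: str  # e.g. "works-for", "located-in"
    tail: Entity


def find_entity(text, surface, label):
    """Hand-annotation helper: locate the first occurrence of a known surface form."""
    start = text.index(surface)
    return Entity(surface, label, start, start + len(surface))


# Hypothetical example sentence, annotated by hand.
text = "Alice Smith works for Acme Corp in Paris."
person = find_entity(text, "Alice Smith", "PERSON")
org = find_entity(text, "Acme Corp", "ORGANIZATION")
place = find_entity(text, "Paris", "LOCATION")

relations = [
    Relation(person, "works-for", org),
    Relation(org, "located-in", place),
]

for r in relations:
    print(r.head.text, r.relation, r.tail.text)
```

Keeping character offsets alongside the label lets later stages point back to the exact place in the source text that supports each extracted fact.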

IE Pipeline Visualization

Raw Text → NER → Relations → Knowledge Graph

Activity 2 Named entity recognition

Watch this comprehensive introduction to NER techniques and evaluation.

This video (6 minutes) covers NER algorithms, tag schemes (BIO, BILOU), evaluation metrics, and common challenges like entity ambiguity and domain adaptation.
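As a rough illustration of the BIO tag scheme and of entity-level evaluation, the sketch below decodes a BIO tag sequence into spans and scores a prediction by exact span match. The function names `bio_to_spans` and `entity_f1` are made up for this example, and exact-match precision, recall, and F1 is only one common convention; the video may present others.

```python
def bio_to_spans(tags):
    """Convert a BIO tag sequence into (label, start, end) spans; end is exclusive."""
    spans = []
    start, label = None, None
    for i, tag in enumerate(tags):
        # An I- tag with the same label as the open span simply extends it.
        if tag.startswith("I-") and label == tag[2:]:
            continue
        # Any other tag closes the span that is currently open.
        if label is not None:
            spans.append((label, start, i))
            start, label = None, None
        # B- opens a new span; a stray I- is treated leniently as a start too.
        if tag.startswith("B-") or tag.startswith("I-"):
            start, label = i, tag[2:]
    if label is not None:
        spans.append((label, start, len(tags)))
    return spans


def entity_f1(gold_tags, pred_tags):
    """Entity-level precision, recall, and F1 using exact span and label match."""
    gold = set(bio_to_spans(gold_tags))
    pred = set(bio_to_spans(pred_tags))
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Tokens: ["Alice", "Smith", "visited", "Paris"]
gold = ["B-PER", "I-PER", "O", "B-LOC"]
pred = ["B-PER", "I-PER", "O", "O"]  # the system missed the location

print(bio_to_spans(gold))
print(entity_f1(gold, pred))
```

Here the prediction finds one of the two gold entities and proposes nothing wrong, so precision is 1.0 and recall is 0.5. Scoring whole entities rather than individual tokens means a partially matched name counts as an error.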

Activity 3 Multi-language NER laboratory

Experiment with NER across different languages and entity types.

Use the interactive tool to test NER performance across languages and customize entity recognition for specific domains.
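One simple way to customize entity recognition for a specific domain is a gazetteer, a lookup list of known names and their types. The sketch below is an assumption about how such customization could work, not a description of the interactive tool; `gazetteer_ner` and the example entries are hypothetical.

```python
import re


def gazetteer_ner(text, gazetteer):
    """Return (surface, label, start, end) matches, preferring longer entries."""
    # Try longer entries first so "New York" wins over "York".
    entries = sorted(gazetteer.items(), key=lambda kv: len(kv[0]), reverse=True)
    taken = [False] * len(text)  # characters already covered by a match
    found = []
    for surface, label in entries:
        # Lookarounds keep matches on word boundaries. This assumes a language
        # that separates words with spaces or punctuation.
        pattern = r"(?<!\w)" + re.escape(surface) + r"(?!\w)"
        for m in re.finditer(pattern, text, flags=re.IGNORECASE):
            if any(taken[m.start():m.end()]):
                continue  # overlaps a longer match found earlier
            for i in range(m.start(), m.end()):
                taken[i] = True
            found.append((m.group(0), label, m.start(), m.end()))
    return sorted(found, key=lambda e: e[2])


# Hypothetical domain gazetteer.
gazetteer = {"aspirin": "PRODUCT", "New York": "LOCATION", "York": "LOCATION"}
entities = gazetteer_ner("She bought aspirin in New York.", gazetteer)
for surface, label, start, end in entities:
    print(surface, label, start, end)
```

A gazetteer is precise for the names it contains but cannot recognize unseen names or resolve ambiguous ones, which is why it is usually combined with a statistical model rather than used alone.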

Multi-Language NER Tool

Entity Types: PERSON, ORGANIZATION, LOCATION, DATE, PRODUCT, MONEY

Activity 4 Relation extraction workshop

Build systems to extract relationships between entities.

Learn to identify and extract semantic relationships between entities using pattern-based and machine learning approaches.
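The pattern-based approach can be sketched with regular expressions: each relation type is paired with a template that captures a head and a tail entity. The patterns, the `NAME` shorthand for capitalized names, and the example text below are illustrative assumptions; real systems normally run NER first and match patterns over the recognized entities.

```python
import re

# Crude stand-in for an entity mention: one or more capitalized words.
NAME = r"[A-Z][A-Za-z]+(?: [A-Z][A-Za-z]+)*"

# Relation type -> pattern template (hypothetical examples).
PATTERNS = {
    "works-for": re.compile(rf"(?P<head>{NAME}) works (?:for|at) (?P<tail>{NAME})"),
    "located-in": re.compile(rf"(?P<head>{NAME}) is (?:located|based) in (?P<tail>{NAME})"),
}


def extract_relations(text):
    """Return (head, relation, tail) triples for every pattern match in the text."""
    triples = []
    for relation, pattern in PATTERNS.items():
        for m in pattern.finditer(text):
            triples.append((m.group("head"), relation, m.group("tail")))
    return triples


text = "Alice Smith works for Acme Corp. Acme Corp is based in Paris."
for triple in extract_relations(text):
    print(triple)
```

Patterns like these are easy to inspect and tend to be precise, but they miss paraphrases ("is employed by", "joined"), which is the gap that machine learning approaches aim to close.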

Relation Extraction Builder

Pattern-Based Relation Extraction

Define patterns to extract relationships. Each pattern pairs a relation type with a pattern template.

Activity 5 Event extraction and timeline builder

Watch how to extract events and build interactive timelines.

This video (6 minutes) demonstrates event detection, temporal ordering, and timeline visualization for news analysis and historical research.
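A minimal sketch of event detection and temporal ordering, assuming events are signalled by a small list of trigger words and that dates appear in ISO format (YYYY-MM-DD). The trigger lexicon, the `extract_events` function, and the example sentences are hypothetical; the video may use different techniques.

```python
import re
from datetime import date

# Hypothetical trigger lexicon: trigger word -> event type.
TRIGGERS = {"acquired": "ACQUISITION", "launched": "LAUNCH", "resigned": "RESIGNATION"}
DATE_RE = re.compile(r"\b(\d{4})-(\d{2})-(\d{2})\b")


def extract_events(sentences):
    """Return (date, event_type, sentence) tuples sorted chronologically."""
    events = []
    for sentence in sentences:
        date_match = DATE_RE.search(sentence)
        if date_match is None:
            continue  # no explicit date, so the event cannot be placed on the timeline
        # Note: date() raises ValueError for impossible dates such as 2021-13-40.
        when = date(*map(int, date_match.groups()))
        for word, event_type in TRIGGERS.items():
            if re.search(rf"\b{re.escape(word)}\b", sentence):
                events.append((when, event_type, sentence))
    return sorted(events, key=lambda e: e[0])


sentences = [
    "On 2021-06-15 Acme Corp launched a new product.",
    "Acme Corp acquired Beta Labs on 2020-03-01.",
    "The board met in Paris.",
]
timeline = extract_events(sentences)
for when, event_type, sentence in timeline:
    print(when.isoformat(), event_type, sentence)
```

The third sentence is dropped because it contains neither a trigger nor a date. Real news text usually needs relative expressions such as "last Tuesday" resolved against the publication date before events can be ordered.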

Activity 6 Document understanding laboratory

Apply IE to real-world document processing tasks.

Extract structured information from different document types including resumes, invoices, and forms using IE techniques.
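For semi-structured documents such as invoices, template filling can be sketched as one regular expression per field. The field names, the expected label wording ("Invoice No", "Date", "Total"), and the sample invoice are assumptions made for this example; real documents vary far more in layout.

```python
import re

# One pattern per template slot (hypothetical field layout).
INVOICE_FIELDS = {
    "invoice_number": re.compile(r"Invoice\s*(?:No\.?|#)\s*:?\s*([A-Z0-9-]+)", re.IGNORECASE),
    "date": re.compile(r"Date\s*:\s*(\d{4}-\d{2}-\d{2})", re.IGNORECASE),
    "total": re.compile(r"Total\s*:\s*\$?\s*([\d,]+\.\d{2})", re.IGNORECASE),
}


def fill_template(text):
    """Fill each slot with the first match, or None when the field is absent."""
    record = {}
    for field, pattern in INVOICE_FIELDS.items():
        m = pattern.search(text)
        record[field] = m.group(1) if m else None
    return record


invoice_text = """Invoice No: INV-1042
Date: 2024-01-31
Total: $1,250.00
"""

print(fill_template(invoice_text))
```

Returning `None` for missing fields, rather than raising an error, keeps the output shape fixed so that downstream code can flag incomplete records for manual review.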

Document Processing Suite

Resume Information Extraction

Activity 7 Knowledge graph construction

Build interactive knowledge graphs from extracted information.

Combine entity and relation extraction to create comprehensive knowledge graphs that can be queried and visualized.
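A knowledge graph can be sketched as typed nodes plus a set of (head, relation, tail) triples. The `KnowledgeGraph` class below is a hypothetical, dependency-free illustration of storing and querying such a graph; dedicated graph libraries or databases would be the usual choice at scale.

```python
from collections import defaultdict


class KnowledgeGraph:
    def __init__(self):
        self.node_types = {}            # entity name -> entity type
        self.edges = defaultdict(set)   # head -> {(relation, tail), ...}

    def add_entity(self, name, entity_type):
        self.node_types[name] = entity_type

    def add_triple(self, head, relation, tail):
        self.edges[head].add((relation, tail))

    def query(self, head=None, relation=None, tail=None):
        """Return sorted triples matching the given fields; None acts as a wildcard."""
        results = []
        for h, outgoing in self.edges.items():
            for r, t in outgoing:
                if head in (None, h) and relation in (None, r) and tail in (None, t):
                    results.append((h, r, t))
        return sorted(results)

    def nodes_of_type(self, entity_type):
        """Filter nodes by type, e.g. to display only persons."""
        return sorted(n for n, t in self.node_types.items() if t == entity_type)


# Hypothetical entities and relations, as NER and relation extraction might produce.
kg = KnowledgeGraph()
kg.add_entity("Alice Smith", "PERSON")
kg.add_entity("Acme Corp", "ORGANIZATION")
kg.add_entity("Paris", "LOCATION")
kg.add_triple("Alice Smith", "works-for", "Acme Corp")
kg.add_triple("Acme Corp", "located-in", "Paris")

print(kg.query(relation="works-for"))
print(kg.query(head="Acme Corp"))
print(kg.nodes_of_type("PERSON"))
```

Storing edges in a set means the same fact extracted from several documents is recorded only once.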

Knowledge Graph Builder

Graph Settings: Show Persons, Show Organizations, Show Locations, Show Products

Unit Review

Test your understanding of information extraction:

Self-Assessment Quiz

1. What is the main goal of Named Entity Recognition?

Identifying and classifying entities in text
Translating text between languages
Generating new text

2. Relation extraction focuses on:

Finding relationships between identified entities
Improving text readability
Correcting grammar errors

3. Knowledge graphs represent:

Networks of interconnected entities and relationships
Text similarity scores
Document classification results
