
Unit 5: Case studies

Learning outcomes

By the end of this unit you should:

  • understand how authorship analysis solves real-world problems
  • have practised armchair analysis

Activity 1: Terminology review

Work in pairs. Discuss your answers to the following questions.

What effect do the following have on authorship analysis?

  1. genre
  2. borrowed language
  3. idiosyncratic language

Explain the following terms in simple English.

  1. markers
  2. collocation
  3. lexical features
  4. syntactic features
  5. idiolect
  6. letter case

Activity 2: Idiolect analysis

Your name is Yuki Abe. You have just received a study abroad scholarship and will go to the United States. You will stay with a family. The family has a university student called Joey. Write an email of about 200 words introducing yourself to Joey. You CAN use an online dictionary, but you CANNOT use translation tools (e.g. DeepL, Google Translate) or AI-generation tools (e.g. ChatGPT). Include the following information:

  1. Your name - Yuki Abe
  2. Your hometown - Aizu-wakamatsu
  3. Your university - UoA
  4. Your hobbies - manga, online games, etc.
  5. Your plans - visit the Rocky Mountains, etc.

Submit your work via ELMS.

Activity 3: 2019 Ayia Napa statement

A 19-year-old British woman claimed that the police statement she had written by hand was dictated to her by a Cypriot police officer. If the language used is typical of a young British woman, then her claim is likely to be untrue. However, if the language is not typical of a young British woman, then her claim is likely to be true. The full text is given below:

Statement

The report that I did on the 17th of July 2019 that I was raped at ayia napa was not the truth. The truth is that I wasnt raped and everything that happened in that appartment was with my consent. The reason I made the statement with the fake report is because I did not know they were recording & humiliating me that night I discovered them recording me doing sexual intercourse and I felt embarrassed so I want to appologise, say I made a mistake.

Source: Donlan, L., & Nini, A. (2022). A forensic authorship analysis of the Ayia Napa rape statement. In I. Picornell, R. Perkins, & M. Coulthard (Eds.), Methodologies and Challenges in Forensic Linguistic Casework (pp. 29-43). Wiley.

To solve this case an authorship profiling approach was adopted. The forensic linguist identified the language features listed below as being worthy of analysis.

  1. The report that I did (collocation: do / report)
  2. was not the truth (collocation: be / truth)
  3. apartment
  4. discover them recording me (collocation: discover / -ing)
  5. doing sexual intercourse (collocation: do / sexual intercourse)

Analyze the features listed and submit your work via ELMS.

Activity 4: Access point: n-gram

Read.

N-grams are often used in forensic linguistics as the access point to identify the distinguishing, that is, the idiosyncratic features of an author. Access point is the name given to the way in which a text is first analyzed. The overuse or underuse of particular n-grams can help narrow down or identify the author of a questioned text. In the fields of computational linguistics and probability, an n-gram (sometimes also called Q-gram) is a contiguous sequence of n items from a given sample of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. Using Latin numerical prefixes, an n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram". English cardinal numbers are sometimes used, e.g., "four-gram", "five-gram", and so on.
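The definition above can be tried out in a few lines of Python. This is a minimal sketch, not a forensic tool: it assumes that a "word" is simply a run of letters or apostrophes, and the helper names `tokenize` and `ngrams` are made up for this example. The sample phrase is shortened from the Ayia Napa statement in Activity 3.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and return its words (runs of letters or apostrophes)."""
    return re.findall(r"[a-z']+", text.lower())

def ngrams(items, n):
    """Return every contiguous sequence of n items, in the order they occur."""
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

# Phrase shortened from the Ayia Napa statement in Activity 3
words = tokenize("The report that I did was not the truth")

unigrams = ngrams(words, 1)   # n = 1
bigrams = ngrams(words, 2)    # n = 2
trigrams = ngrams(words, 3)   # n = 3

# The items can also be letters instead of words: letter trigrams of "truth"
char_trigrams = ngrams(list("truth"), 3)

print(bigrams[:3])      # [('the', 'report'), ('report', 'that'), ('that', 'i')]
print(char_trigrams)    # [('t', 'r', 'u'), ('r', 'u', 't'), ('u', 't', 'h')]

# Frequency counts are the basis for spotting overuse or underuse
print(Counter(unigrams).most_common(1))  # [(('the',), 2)]
```

In this short phrase the unigram "the" occurs twice while every bigram occurs only once. Comparing counts like these between a questioned text and other texts is what "overuse" and "underuse" refer to.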

Discuss the meaning of the following with a partner.

  1. access point
  2. n-gram
  3. unigram
  4. bigram
  5. trigram

Knowledge and application

Activity 5: Case study 1

The reading text in Activity 4 is a combination of sentences written by your tutor and someone else.

Work with a partner to try to identify which sentences were NOT written by your tutor.

Activity 6: Case study 2

Individual work

Work alone. Insert a sentence taken from Wikipedia into the email that you submitted in Activity 2.

Whole class work

Share your revised version by displaying it on your screen if you are in a classroom. Divide into two groups of students: authors and detectives. Authors stay by their screens. Detectives move around the classroom and identify the sentences copied from Wikipedia.

Team work

Discuss how the text NOT written by the author could be identified automatically.
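One possible approach, sketched below in Python, is to compare each sentence with other writing known to be by the same author and flag sentences whose word bigrams rarely appear in it. This is only one idea for the discussion, under clear assumptions: the function names, the 0.2 threshold and the sample texts are all hypothetical, and the last sample sentence is an encyclopedia-style sentence written for this example, not a real Wikipedia quotation.

```python
import re

def sentences(text):
    """Split text into sentences after . ! or ? (a rough rule that ignores abbreviations)."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

def word_bigrams(text):
    """Return the set of word bigrams in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + 2]) for i in range(len(words) - 1)}

def flag_sentences(questioned, known, threshold=0.2):
    """Flag sentences whose share of bigrams also found in `known` is below threshold."""
    known_bigrams = word_bigrams(known)
    flagged = []
    for s in sentences(questioned):
        grams = word_bigrams(s)
        if not grams:  # one-word sentences have no bigrams to compare
            continue
        overlap = len(grams & known_bigrams) / len(grams)
        if overlap < threshold:
            flagged.append((s, round(overlap, 2)))
    return flagged

# Hypothetical sample texts written for this example
known = ("I like manga and I like online games. I want to visit the Rocky Mountains. "
         "I am a student at UoA.")
questioned = ("Hello Joey, I am Yuki. I like manga and online games. "
              "The Rocky Mountains are a major mountain range in western North America.")

print(flag_sentences(questioned, known))
# [('The Rocky Mountains are a major mountain range in western North America.', 0.18)]
```

With texts this short the result is fragile: a new sentence by the real author on an unfamiliar topic would also score low. In practice, a much larger sample of the author's known writing, and other features such as letter n-grams, would be needed.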

Review

Make sure you can explain the following in simple English:

  1. n-gram
  2. unigram
  3. bigram
  4. trigram
