Unit 10: Review

Learning outcomes

By the end of this unit you should:

  • know the 60 key terms related to authorship analysis
  • have practised explaining how authorship can be analyzed

Review activities

Activity 1: Basic authorship vocabulary

Work alone, in pairs or threes. Explain the following terms.

  1. authorship analysis
  2. authorship attribution
  3. authorship profiling
  4. authorship verification

Activity 2: Comparison and contrast of basic authorship terms

Work alone, in pairs or threes. Explain the differences between the following terms.

  1. authorship attribution vs. authorship profiling
  2. similarity detection vs. authorship attribution
  3. authorship profiling vs. authorship analysis
  4. authorship attribution vs. authorship verification

Activity 3: Classification and categorization

Work alone, in pairs or threes. For each of the following scenarios, decide whether the task is text classification or text categorization.

  1. There is a questioned letter. The letter was either written by the wife or the husband. Authorship analysis showed the author was most likely the husband.
  2. A letter was discovered. The author of the letter is unknown. Authorship analysis showed the letter was most likely written by a young man who did not graduate from high school.
  3. A male university student submitted an essay. The professor wanted to check if the student wrote the essay himself. Authorship analysis showed that the student most likely wrote the essay.

For each of the above scenarios, decide whether the task is authorship verification, profiling or attribution.

Activity 4: Stylometry

Work alone, in pairs or threes. Answer the following questions.

  1. What are the four types of features commonly used in stylometry?
  2. Can you provide two examples for each of the types?
  3. What is the name of the R package that can be used for stylometric analysis?

Activity 5: Authorship analysis

Work alone, in pairs or threes. Answer the following questions.

  1. What are the three factors that make authorship analysis difficult?
  2. In the expression "To be or not to be.", how many words, tokens and types are there?
  3. Can you explain why in English we say heavy rain but not hard rain?
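
To check your answer to question 2, you can count by computer. The short Python sketch below is one possible way to do this. The function name count_units, the regular expression, and the choice to report types both with and without lower-casing are assumptions made for illustration, so the counts may differ from the convention used in your unit.

```python
import re

def count_units(text):
    """Count tokens and types under one simple, assumed convention."""
    # Assumption: a token is either a run of word characters
    # or a single punctuation mark.
    tokens = re.findall(r"\w+|[^\w\s]", text)
    # Word tokens consist only of word characters; the rest are non-word tokens.
    word_tokens = [t for t in tokens if re.fullmatch(r"\w+", t)]
    non_word_tokens = [t for t in tokens if not re.fullmatch(r"\w+", t)]
    return {
        "tokens": len(tokens),
        "word_tokens": len(word_tokens),
        "non_word_tokens": len(non_word_tokens),
        # A type is a distinct word form; "To" and "to" differ here.
        "types_case_sensitive": len(set(word_tokens)),
        # Here "To" and "to" are treated as the same type.
        "types_case_insensitive": len({t.lower() for t in word_tokens}),
    }

counts = count_units("To be or not to be.")
for name, value in counts.items():
    print(name, value)
```

Compare the printed counts with your own answer. If they differ, explain which counting convention you used, for example whether the full stop counts as a token and whether letter case matters.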

Master list

Activity 6: Vocabulary

This list contains the important technical terms related to authorship analysis. The terms are grouped by the unit in which they were introduced.

Work alone, in pairs or groups. Describe, explain and provide examples for each of these terms.

  1. authorship
  2. analysis
  3. attribution
  4. genre
  5. borrowed language
  6. idiosyncratic language
  7. punctuation
  8. spelling
  9. vocabulary
  10. markers
  11. text
  12. questioned text
  13. known text
  14. authentic
  15. counterfeit
  16. collocation

  17. profiling
  18. forgery (to forge)
  19. attribution
  20. authorship verification
  21. similarity detection
  22. disputed
  23. anonymous
  24. plagiarism
  25. text categorization
  26. text classification
  27. judge
  28. jury
  29. machine learning
  30. support vector machine
  31. black-box model
  32. white-box model
  33. stylometry
  34. lexical features
  35. syntactic features
  36. structural features
  37. content-specific features

  38. token
  39. non-word token
  40. word token
  41. type
  42. part of speech
  43. POS tag
  44. combination
  45. permutation

  46. idiolect
  47. Stylo
  48. dendrogram
  49. cluster analysis
  50. multidimensional scaling
  51. letter case
  52. upper case
  53. lower case
  54. capital letters

  55. n-gram
  56. unigram
  57. bigram
  58. trigram

  59. needle in the haystack
  60. lexical density
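
When explaining terms 55 to 58, a small worked example may help. The Python sketch below builds word n-grams from a short phrase. The function name ngrams and the choice of words (rather than characters) as the unit are assumptions made for illustration; n-grams can also be built from characters or POS tags.

```python
def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

# Assumption: tokens are lower-case words split on spaces.
words = "to be or not to be".split()

unigrams = ngrams(words, 1)   # runs of one word
bigrams = ngrams(words, 2)    # runs of two consecutive words
trigrams = ngrams(words, 3)   # runs of three consecutive words

print(len(unigrams), len(bigrams), len(trigrams))
print(bigrams)
```

A sequence of k tokens yields k - n + 1 n-grams, so the six words give six unigrams, five bigrams and four trigrams. Note that the bigram "to be" occurs twice.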


If you are not sure of the meaning of any of the technical terms above, ask a question in the discussion forum on ELMS.
