Unit 7: Prototype design

Learning outcomes

By the end of this unit you should have:

  • considered how to operationalize your expert system
  • assigned tasks to team members
  • ensured your prototype design fulfils the essential criteria detailed below

Activity 1: Discussion

Work in pairs. Discuss your answers to the questions below.

  1. Why are expert systems not a highly researched area nowadays?
  2. In what scenarios are rule-based expert systems more suitable than machine learning systems?
  3. What are the main components of an expert system?
  4. Are all those components present in your expert system design?

Activity 2: Final project: Primary purpose

Read.

Expert witnesses in criminal cases who need to examine evidence and present their findings to the court currently do not have access to a system that can compare datasets. The primary purpose of your expert system is to identify the linguistic similarity among datasets (corpora). The system should compare two or more known corpora to a questioned corpus. It evaluates which known corpus is most similar to the questioned corpus and which is least similar. The least similar corpus is ruled out. This continues until only one known corpus remains. That corpus is identified as the most similar, and is therefore assumed to have been written by the same author as the questioned corpus.
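
The rule-out procedure described above can be sketched in a few lines of Python. This is only a minimal sketch under stated assumptions: the function names, the labels K1 to K3 and the sample sentences are invented for illustration, and the toy similarity measure (the number of distinct shared words) is a stand-in for whatever measure your team chooses.

```python
def word_overlap(questioned, known):
    """Toy similarity (an assumption): number of distinct words both texts share."""
    return len(set(questioned.lower().split()) & set(known.lower().split()))


def rule_out_until_one(questioned, known_corpora, similarity=word_overlap):
    """Repeatedly rule out the least similar known corpus.

    known_corpora maps a label (e.g. "K1") to the corpus text.
    Returns the label of the last remaining corpus and the labels
    that were ruled out, in the order they were ruled out.
    """
    remaining = dict(known_corpora)
    ruled_out = []
    while len(remaining) > 1:
        scores = {label: similarity(questioned, text)
                  for label, text in remaining.items()}
        # On a tie, min() keeps the first label encountered.
        least_similar = min(scores, key=scores.get)
        ruled_out.append(least_similar)
        del remaining[least_similar]
    return next(iter(remaining)), ruled_out


# Invented sample data, for illustration only.
questioned = "i will absolutely not pay the money you want"
known = {
    "K1": "we want the money and you must pay",
    "K2": "the weather is lovely today",
    "K3": "you will absolutely pay the money",
}
winner, ruled_out = rule_out_until_one(questioned, known)
print(winner, ruled_out)
```

Passing the similarity measure in as a parameter keeps the loop unchanged if your team later swaps in shared keywords or shared POS patterns.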

Activity 3: Essential and desirable features

Check these lists of features for the expert system which show what the system needs to do.

All features

The expert system will be able to:

  1. count, list and order the frequency of words (essential)
  2. count, list and order the frequency of keywords (essential)
  3. allow the user to select a reference corpus (essential)
  4. allow the user to select a statistical formula for keyness
  5. display the first 20 words/keywords of each dataset

  6. order the known datasets by number of shared words/keywords with the questioned dataset
  7. display the shared words/keywords in the first 20 words/keywords of each dataset (essential)
  8. identify the most similar dataset based on the highest number of shared top 20 words/keywords of the questioned dataset
  9. identify the least similar dataset based on the lowest number of shared top 20 words/keywords of the questioned dataset (essential)
  10. suggest ruling out the least similar dataset (essential)
  11. allow the user to confirm or reject ruling out
  12. continue to identify the most/least similar dataset and rule out the least similar for the top 40, then 60 keywords

  13. tag the part of speech (POS) of each word (essential)
  14. allow the user to select any word or string and display the word or string in context (essential)
  15. show 6 words before and 6 words after the target word for each instance of the word in each dataset
  16. allow the user to search for POS patterns following the target word, e.g. absolutely + JJ (essential)
  17. count the number of POS patterns in each known (K) dataset that are identical to those in the questioned (Q) dataset (essential)
  18. order the known datasets by number of identical POS patterns with the questioned dataset
  19. identify the most similar dataset based on the number of identical POS patterns with the questioned dataset
  20. identify the least similar dataset based on the number of identical POS patterns with the questioned dataset (essential)
  21. suggest ruling out the least similar dataset (essential)
  22. allow the user to confirm or reject ruling out
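
As a rough illustration of the word-frequency and shared-word features in this list, the sketch below counts words, lists the top 20 and orders the known datasets by how many of those words they share with the questioned dataset. The function names, the simple regular-expression tokenizer and the sample sentences are assumptions made for the example, not part of the project brief.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercased word tokens; apostrophes are kept inside words."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def top_n(text, n=20):
    """List the n most frequent words, most frequent first."""
    return [word for word, _ in word_frequencies(text).most_common(n)]


def rank_by_shared_words(questioned, known_corpora, n=20):
    """Order known datasets by the number of top-n words shared with Q."""
    q_top = set(top_n(questioned, n))
    shared = {label: sorted(q_top & set(top_n(text, n)))
              for label, text in known_corpora.items()}
    # Most shared words first; sorted() is stable, so ties keep input order.
    ranking = sorted(shared, key=lambda label: len(shared[label]), reverse=True)
    return ranking, shared


# Invented sample data, for illustration only.
questioned = "the cat sat on the mat and the dog sat too"
known = {
    "K1": "the dog sat on the log",
    "K2": "a bird flew over a house",
}
ranking, shared = rank_by_shared_words(questioned, known)
print(word_frequencies(questioned).most_common(2))
print(ranking, shared)
```

In this sketch the first label in `ranking` is the most similar dataset and the last is the candidate to rule out, which the user could then confirm or reject.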

Essential features

The expert system will be able to:

  1. count, list and order the frequency of words
  2. count, list and order the frequency of keywords
  3. allow the user to select a reference corpus

  4. display the shared words/keywords in the first 20 words/keywords of each dataset
  5. identify the least similar dataset based on the lowest number of shared top 20 words/keywords of the questioned dataset
  6. suggest ruling out the least similar dataset

  7. tag the part-of-speech (POS) of each word
  8. allow the user to select any word or string and display the word or string in context
  9. allow the user to search for POS patterns following the target word, e.g. absolutely + JJ
  10. count the number of identical POS patterns in each dataset
  11. identify the least similar dataset based on the number of identical POS patterns with the questioned dataset
  12. suggest ruling out the least similar dataset
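
Keywords are words that are unusually frequent in one dataset compared with a reference corpus. The brief does not say which keyness statistic to use, so the sketch below assumes log-likelihood, one common choice in corpus linguistics. The function names and the sample sentences are invented for illustration, and both corpora are assumed to be non-empty lists of tokens.

```python
import math
from collections import Counter


def log_likelihood(freq_study, freq_ref, size_study, size_ref):
    """Log-likelihood keyness of one word (study corpus vs reference corpus)."""
    total = freq_study + freq_ref
    expected_study = size_study * total / (size_study + size_ref)
    expected_ref = size_ref * total / (size_study + size_ref)
    ll = 0.0
    # A zero frequency contributes nothing to the sum.
    if freq_study:
        ll += freq_study * math.log(freq_study / expected_study)
    if freq_ref:
        ll += freq_ref * math.log(freq_ref / expected_ref)
    return 2 * ll


def keywords(study_tokens, ref_tokens, n=20):
    """List the n words most over-represented in the study corpus."""
    study, ref = Counter(study_tokens), Counter(ref_tokens)
    size_study, size_ref = len(study_tokens), len(ref_tokens)
    scored = []
    for word, freq in study.items():
        # Keep only words that are relatively more frequent in the study corpus.
        if freq / size_study > ref[word] / size_ref:
            score = log_likelihood(freq, ref[word], size_study, size_ref)
            scored.append((word, score))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:n]


# Invented sample data, for illustration only.
study = "absolutely fine absolutely great the plan".split()
reference = "the plan is fine and the day is fine".split()
for word, score in keywords(study, reference):
    print(f"{word}\t{score:.2f}")
```

If the user is allowed to select the keyness formula, `log_likelihood` could be passed in as a parameter and replaced by another statistic with the same four arguments.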

Desirable features

The expert system will be able to:

  1. allow the user to select a statistical formula for keyness
  2. display the first 20 words/keywords of each dataset

  3. order the known datasets by number of shared words/keywords with the questioned dataset
  4. identify the most similar dataset based on the highest number of shared top 20 words/keywords of the questioned dataset
  5. allow the user to confirm or reject ruling out
  6. continue to identify the most/least similar dataset and rule out the least similar for the top 40, then 60 keywords

  7. show 6 words before and 6 words after the target word for each instance of the word in each dataset
  8. order the known datasets by number of identical POS patterns with the questioned dataset
  9. identify the most similar dataset based on the number of identical POS patterns with the questioned dataset
  10. allow the user to confirm or reject ruling out
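
Showing a word in context with 6 words on either side is often called a key word in context (KWIC) display. The sketch below is one possible way to produce it. The function name and the sample sentence are invented for illustration, and the sketch handles single target words only, not longer strings.

```python
def concordance(tokens, target, window=6):
    """Return (left context, match, right context) for each hit of target."""
    target = target.lower()
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target:
            # Slices shrink automatically near the start or end of the text.
            left = tokens[max(0, i - window):i]
            right = tokens[i + 1:i + 1 + window]
            lines.append((" ".join(left), token, " ".join(right)))
    return lines


# Invented sample data, for illustration only.
text = ("I would absolutely love to come but I absolutely "
        "cannot leave work early on Friday")
hits = concordance(text.split(), "absolutely")
for left, match, right in hits:
    # Right-align the left context so the target words line up.
    print(f"{left:>40}  [{match}]  {right}")
```

Running the same function over each dataset in turn gives one concordance per dataset for the user to compare.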

Activity 4: Part-of-speech tags

Read.

POS tagging is the act of labelling each word with its part of speech. The common parts of speech are noun, verb, adverb and adjective. However, most POS taggers use a much larger set of tags. The most popular tagset, the Penn Treebank tagset, has 36 tags. NLP pipelines that aim to map syntax or disambiguate meanings often use POS tagging as one layer. The Penn Treebank tagset is shown in the table below.

Tag    Description
CC     Coordinating conjunction
CD     Cardinal number
DT     Determiner
EX     Existential there
FW     Foreign word
IN     Preposition or subordinating conjunction
JJ     Adjective
JJR    Adjective, comparative
JJS    Adjective, superlative
LS     List item marker
MD     Modal
NN     Noun, singular or mass
NNS    Noun, plural
NNP    Proper noun, singular
NNPS   Proper noun, plural
PDT    Predeterminer
POS    Possessive ending
PRP    Personal pronoun
PRP$   Possessive pronoun
RB     Adverb
RBR    Adverb, comparative
RBS    Adverb, superlative
RP     Particle
SYM    Symbol
TO     to
UH     Interjection
VB     Verb, base form
VBD    Verb, past tense
VBG    Verb, gerund or present participle
VBN    Verb, past participle
VBP    Verb, non-3rd person singular present
VBZ    Verb, 3rd person singular present
WDT    Wh-determiner
WP     Wh-pronoun
WP$    Possessive wh-pronoun
WRB    Wh-adverb
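
Once every word carries one of these tags, a pattern such as absolutely + JJ can be found by looking at the tag of the word that follows each instance of the target. The sketch below assumes the text has already been tagged and works on hand-tagged (word, tag) pairs invented for illustration; in practice a tagger such as NLTK's `pos_tag` could supply them. The function names are hypothetical, and counting distinct shared patterns is only one possible reading of "identical POS patterns".

```python
from collections import Counter


def pos_patterns(tagged_tokens, target):
    """Count the POS tags that immediately follow each hit of target."""
    target = target.lower()
    patterns = Counter()
    # Pair every token with the token that follows it.
    for (word, _), (_, next_tag) in zip(tagged_tokens, tagged_tokens[1:]):
        if word.lower() == target:
            patterns[f"{target} + {next_tag}"] += 1
    return patterns


def shared_pattern_count(q_patterns, k_patterns):
    """Number of distinct patterns found in both Q and a K dataset."""
    return len(set(q_patterns) & set(k_patterns))


# Hand-tagged sample data (Penn Treebank tags), for illustration only.
q_tagged = [("That", "DT"), ("is", "VBZ"), ("absolutely", "RB"),
            ("ridiculous", "JJ"), ("and", "CC"), ("absolutely", "RB"),
            ("wrong", "JJ")]
k1_tagged = [("It", "PRP"), ("was", "VBD"), ("absolutely", "RB"),
             ("fine", "JJ")]
k2_tagged = [("I", "PRP"), ("absolutely", "RB"), ("refuse", "VBP")]

q = pos_patterns(q_tagged, "absolutely")
k1 = pos_patterns(k1_tagged, "absolutely")
k2 = pos_patterns(k2_tagged, "absolutely")
print(q, shared_pattern_count(q, k1), shared_pattern_count(q, k2))
```

Here K2 shares no pattern with Q, so it would be the dataset the system suggests ruling out.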

Activity 5: Learning from peers

Check the slide deck and see if there are any ideas that you could use to improve your expert system. Slides were created by students and combined into a single slide deck.

Student-designed expert systems

Review

Can you:

  1. do this
  2. do that
  3. and do something else.

If you cannot, make sure that you can before your next class.
