Unit 7: Prototype design

Learning outcomes

By the end of this unit you should have:

considered how to operationalize your expert system
assigned tasks to team members
ensured that your prototype design fulfils the essential criteria detailed below

Activity 1: Discussion

Work in pairs. Discuss your answers to the questions below.

Why are expert systems not a highly researched area nowadays?
In what scenarios are rule-based expert systems more suitable than machine learning systems?
What are the main components of an expert system?
Are all those components present in your expert system design?

Activity 2: Final project: Primary purpose

Read.

Expert witnesses in criminal cases who need to examine evidence and present their findings to the court do not currently have access to a system that can compare datasets. The primary purpose of your expert system is to identify the linguistic similarity among datasets (corpora). The system should compare two or more known corpora to a questioned corpus. It evaluates which known corpus is most similar to the questioned corpus and which is most dissimilar. The most dissimilar corpus is ruled out. This continues until only one corpus remains. That corpus is identified as the most similar, and we therefore assume it was written by the same author as the questioned corpus.
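The ruling-out procedure described above can be sketched as a simple loop. This is only a minimal illustration, not a prescribed design: the function names and the default score (the number of distinct word forms shared with the questioned corpus) are assumptions made for the example, and your own system would plug in its own similarity measures.

```python
from typing import Callable


def shared_vocabulary(questioned: list[str], known: list[str]) -> int:
    """Toy similarity score (an assumption): distinct word forms the two corpora share."""
    return len(set(questioned) & set(known))


def rule_out_until_one(
    questioned: list[str],
    known: dict[str, list[str]],
    score: Callable[[list[str], list[str]], int] = shared_vocabulary,
) -> tuple[str, list[str]]:
    """Repeatedly rule out the least similar known corpus until one remains.

    Returns the name of the remaining corpus and the names ruled out, in order.
    Assumes at least one known corpus is supplied.
    """
    remaining = dict(known)
    ruled_out: list[str] = []
    while len(remaining) > 1:
        scores = {name: score(questioned, tokens) for name, tokens in remaining.items()}
        # On a tie, min() keeps the first corpus in insertion order.
        least_similar = min(scores, key=scores.get)
        ruled_out.append(least_similar)
        del remaining[least_similar]
    return next(iter(remaining)), ruled_out


questioned = "i was absolutely certain that he was there".split()
known = {
    "K1": "i was absolutely sure that he was not there".split(),
    "K2": "the meeting is at noon".split(),
    "K3": "he was there at noon".split(),
}
print(rule_out_until_one(questioned, known))
```

With a single fixed score the loop gives the same result as sorting once. It becomes more useful when each round applies a different test, and when the user is asked to confirm each ruling out before the corpus is removed.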

Activity 3: Essential and desirable features

Check these lists of features, which show what the expert system needs to do.

All features

The expert system will be able to:

count, list and order the frequency of words (essential)
count, list and order the frequency of keywords (essential)
allow the user to select a reference corpus (essential)
allow the user to select a statistical formula for keyness
display the first 20 words/keywords of each dataset

order the known datasets by number of shared words/keywords with the questioned dataset
display the shared words/keywords in the first 20 words/keywords of each dataset (essential)
identify the most similar dataset based on the highest number of shared top 20 words/keywords of the questioned dataset
identify the least similar dataset based on the lowest number of shared top 20 words/keywords of the questioned dataset (essential)
suggest ruling out the least similar dataset (essential)
allow the user to confirm or reject ruling out
continue to identify the most/least similar dataset and rule out the least similar for the top 40, then 60 keywords

tag the part-of-speech (POS) of each word (essential)
allow the user to select any word or string and display the word or string in context (essential)
show 6 words before and 6 words after the target word for each instance of the word in each dataset
allow the user to search for POS patterns following the target word, e.g. absolutely + JJ (essential)
count the number of POS patterns in each known (K) dataset that are identical to those in the questioned (Q) dataset (essential)
order the known datasets by number of identical POS patterns with the questioned dataset
identify the most similar dataset based on the number of identical POS patterns with the questioned dataset
identify the least similar dataset based on the number of identical POS patterns with the questioned dataset (essential)
suggest ruling out the least similar dataset (essential)
allow the user to confirm or reject ruling out

Essential features

The expert system will be able to:

count, list and order the frequency of words
count, list and order the frequency of keywords
allow the user to select a reference corpus
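The three features above could be prototyped along the following lines. This is a sketch under stated assumptions: the unit does not say which keyness statistic to use, so log-likelihood is chosen here only as one widely used option, and the function names are hypothetical. Both corpora are assumed to be non-empty lists of tokens.

```python
import math
from collections import Counter


def word_frequencies(tokens: list[str]) -> list[tuple[str, int]]:
    """Count words and list them from most to least frequent (ties in alphabetical order)."""
    counts = Counter(token.lower() for token in tokens)
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Log-likelihood keyness (an assumed choice of statistic).

    a: frequency of the word in the target corpus, which has c tokens
    b: frequency of the word in the reference corpus, which has d tokens
    """
    expected_target = c * (a + b) / (c + d)
    expected_reference = d * (a + b) / (c + d)
    score = 0.0
    if a:
        score += a * math.log(a / expected_target)
    if b:
        score += b * math.log(b / expected_reference)
    return 2 * score


def keywords(target: list[str], reference: list[str]) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in the target than in the reference."""
    target_counts = Counter(token.lower() for token in target)
    reference_counts = Counter(token.lower() for token in reference)
    c, d = len(target), len(reference)
    scored = []
    for word, a in target_counts.items():
        b = reference_counts[word]
        if a / c > b / d:  # keep only words over-represented in the target
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda item: (-item[1], item[0]))
```

Passing the reference corpus as an argument is what lets the user select it: the same target corpus gives different keywords against different reference corpora.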

display the shared words/keywords in the first 20 words/keywords of each dataset
identify the least similar dataset based on the lowest number of shared top 20 words/keywords of the questioned dataset
suggest ruling out the least similar dataset
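One possible way to prototype these three features is shown below. The function names are hypothetical, plain word frequency stands in for keyness to keep the sketch short, and the system only suggests a dataset to rule out: the decision is left to the user.

```python
from collections import Counter


def top_words(tokens: list[str], n: int = 20) -> list[str]:
    """Return the n most frequent words (ties in alphabetical order)."""
    counts = Counter(token.lower() for token in tokens)
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]


def shared_top_words(
    questioned: list[str], known: dict[str, list[str]], n: int = 20
) -> dict[str, list[str]]:
    """For each known dataset, list the top-n words it shares with the questioned dataset."""
    questioned_top = top_words(questioned, n)
    shared = {}
    for name, tokens in known.items():
        known_top = set(top_words(tokens, n))
        shared[name] = [word for word in questioned_top if word in known_top]
    return shared


def least_similar(shared: dict[str, list[str]]) -> str:
    """Name the dataset with the fewest shared words (first one wins on a tie)."""
    return min(shared, key=lambda name: len(shared[name]))


questioned = "a a a b b c".split()
known = {"K1": "a b d".split(), "K2": "x y z".split()}
shared = shared_top_words(questioned, known, n=2)
print(shared)
print("Suggest ruling out:", least_similar(shared))
```

Because `n` is a parameter, the same functions can be called again with 40 and then 60 for the later rounds.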

tag the part-of-speech (POS) of each word
allow the user to select any word or string and display the word or string in context
allow the user to search for POS patterns following the target word, e.g. absolutely + JJ
count the number of identical POS patterns in each dataset
identify the least similar dataset based on the number of identical POS patterns with the questioned dataset
suggest ruling out the least similar dataset
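The POS pattern features could be sketched as follows. The code assumes the datasets have already been tagged and are stored as (word, tag) pairs. It also takes one possible reading of "identical POS patterns": an instance in a known dataset counts when the tag following the target word also follows the target word somewhere in the questioned dataset. The function names are hypothetical.

```python
from collections import Counter

TaggedToken = tuple[str, str]  # (word, Penn Treebank tag)


def following_tags(tagged: list[TaggedToken], target: str) -> Counter:
    """Count the POS tags that immediately follow each instance of the target word."""
    tags = Counter()
    for (word, _), (_, next_tag) in zip(tagged, tagged[1:]):
        if word.lower() == target.lower():
            tags[next_tag] += 1
    return tags


def identical_patterns(
    questioned: list[TaggedToken], known: list[TaggedToken], target: str
) -> int:
    """Count instances in the known dataset whose target + tag pattern also occurs in the questioned dataset."""
    questioned_tags = following_tags(questioned, target)
    known_tags = following_tags(known, target)
    return sum(count for tag, count in known_tags.items() if tag in questioned_tags)


q = [("absolutely", "RB"), ("certain", "JJ"), ("and", "CC"), ("absolutely", "RB"), ("sure", "JJ")]
k1 = [("absolutely", "RB"), ("right", "JJ")]
k2 = [("absolutely", "RB"), ("not", "RB")]
print(following_tags(q, "absolutely"))
print(identical_patterns(q, k1, "absolutely"), identical_patterns(q, k2, "absolutely"))
```

In this toy data K2 has no instance of absolutely + JJ, so it would be the dataset the system suggests ruling out.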

Desirable features

The expert system will be able to:

allow the user to select a statistical formula for keyness
display the first 20 words/keywords of each dataset

order the known datasets by number of shared words/keywords with the questioned dataset
identify the most similar dataset based on the highest number of shared top 20 words/keywords of the questioned dataset
allow the user to confirm or reject ruling out
continue to identify the most/least similar dataset and rule out the least similar for the top 40, then 60 keywords

show 6 words before and 6 words after the target word for each instance of the word in each dataset
order the known datasets by number of identical POS patterns with the questioned dataset
identify the most similar dataset based on the number of identical POS patterns with the questioned dataset
allow the user to confirm or reject ruling out
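The word-in-context display with 6 words on each side could be prototyped as a simple concordance. This sketch uses a hypothetical function name and handles single-word targets only. Searching for a longer string would need an extra matching step.

```python
def concordance(
    tokens: list[str], target: str, window: int = 6
) -> list[tuple[str, str, str]]:
    """Return (left context, target, right context) for each instance of the target word."""
    lines = []
    for i, token in enumerate(tokens):
        if token.lower() == target.lower():
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, token, right))
    return lines


tokens = "he said that he was absolutely certain that the letter was not his".split()
for left, word, right in concordance(tokens, "absolutely"):
    print(f"{left:>40}  {word}  {right}")
```

Near the start or end of a dataset the context is simply shorter than 6 words.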

Activity 4: Part-of-speech tags

Read.

POS tagging is the act of labelling each word with its part of speech. The common parts of speech are noun, verb, adverb and adjective. However, most POS taggers use a much larger set of tags. The most popular POS tagset, the Penn Treebank tagset, has 36 tags. NLP pipelines that aim to map syntax or disambiguate meanings often include a POS tagging layer. The Penn Treebank tagset is shown in the table below.

CC Coordinating conjunction	CD Cardinal number	DT Determiner
EX Existential there	FW Foreign word	IN Preposition or subordinating conjunction
JJ Adjective	JJR Adjective, comparative	JJS Adjective, superlative
LS List item marker	MD Modal	NN Noun, singular or mass
NNS Noun, plural	NNP Proper noun, singular	NNPS Proper noun, plural
PDT Predeterminer	POS Possessive ending	PRP Personal pronoun
PRP$ Possessive pronoun	RB Adverb	RBR Adverb, comparative
RBS Adverb, superlative	RP Particle	SYM Symbol
TO to	UH Interjection	VB Verb, base form
VBD Verb, past tense	VBG Verb, gerund or present participle	VBN Verb, past participle
VBP Verb, non-3rd person singular present	VBZ Verb, 3rd person singular present	WDT Wh-determiner
WP Wh-pronoun	WP$ Possessive wh-pronoun	WRB Wh-adverb
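The tags in the table relate to the four common parts of speech through their prefixes. The sketch below shows one way to collapse a Penn Treebank tag into a coarse class. The function name and the decision to group everything else (for example MD, DT and the wh- tags such as WRB) under "other" are assumptions made for illustration.

```python
def coarse_pos(tag: str) -> str:
    """Collapse a Penn Treebank tag into noun, verb, adverb, adjective or other."""
    if tag.startswith("NN"):  # NN, NNS, NNP, NNPS
        return "noun"
    if tag.startswith("VB"):  # VB, VBD, VBG, VBN, VBP, VBZ
        return "verb"
    if tag.startswith("RB"):  # RB, RBR, RBS
        return "adverb"
    if tag.startswith("JJ"):  # JJ, JJR, JJS
        return "adjective"
    return "other"


for tag in ["NNPS", "VBZ", "RBR", "JJS", "DT"]:
    print(tag, coarse_pos(tag))
```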

Activity 5: Learning from peers

Check the slide deck and see if there are any ideas that you could use to improve your expert system. Slides were created by students and combined into a single slide deck.

Student-designed expert systems

Review

Can you:

do this
do that
and do something else.

If you cannot, make sure that you can before your next class.
