By the end of this unit you should:
Listen to this introduction to find out the name of your teacher and how to contact him on campus and via email.
Read the introduction below:
The official university course syllabus provides details of the grade percentages awarded to participation, quizzes and final assessment.
The course divides into two parts: (1) authorship and language, and (2) prototype development. You will work individually, in pairs or in teams to understand how language can be used to analyze authorship. You will work in teams to develop a prototype. Students who can program in Python and prefer not to work in a team may form a team of one.
Active participation is defined (by me) as submitting assignments or completing assigned tasks via the learning management system (ELMS).
In general, each assignment or task is awarded either zero or 100%. Most assignments involve solving problems. This emoticon is used to remind you of these. Quizzes are conducted either online or live. The final assignment is the creation of a prototype authorship analysis tool. For this assignment, you need to design, develop and evaluate an original tool. Your group will need to submit three items, namely the source code, a written report and a video evaluation.
Introduce yourself to your classmates. State your preferred name, something you are proficient at (programming, gaming, maths?), and share the reason why you selected this course.
Read the following.
The course divides into two parts: knowledge acquisition and prototype development. In the knowledge acquisition part, we focus on the core concepts of time and tense. In the prototype development part, we focus on visualization of language.
Authorship and language
The first five units are dedicated to understanding how language can be used to ascertain authorship, and enabling you to apply this knowledge to texts written in English. The five units to be covered are:
Prototype development
In this part, different visualization tools are introduced. This is followed by a brief introduction to different natural language pipelines. The lion's share of this part will be spent on prototype development. This prototype needs to be evaluated and so methods of evaluation are also covered. The final unit aims to review the course, bringing together all the core concepts covered.
The course comprises 14 sessions and 10 units. The first half of the course will focus on Units 1 to 5. The remainder of the course will focus on Units 6 to 10.
Read and think.
Authorship identification can be simple, difficult or impossible. There are many factors that impact authorship identification. Consider your own writing in your first language. Is your writing stable? Do you use the same spellings, same structure and same punctuation consistently? Does your language change when you write short messages, posts on social network sites, or university assignments? Do you use some words or phrases more frequently than other people? Do you have a catchphrase that other people could identify as being yours? Do you ever copy the language of anyone else?
Discuss your answers in pairs or small groups.
Work individually, in pairs or in teams. Decide the genre each text comes from. The genres are children's story, newspaper article and personal letter. Explain the reasons for your choices.
Work with a partner. Discuss any patterns that you were able to find.
Consider the following phrase.
"今でしょ" (Japanese; roughly "It's now, isn't it?")
Who said this? Are you sure? What is the probability? What evidence do you have?
The activities above should have raised your awareness of the effect of genre and borrowing on language choice. There is another key issue which relates to whether the language a person uses stands out as markedly different. Language that is different is creative and original. So for example, someone introducing themselves as:
"I am John."
shows conformity and not creativity. But, someone introducing themselves as:
"I was named John, so that is my name."
shows creativity (but is likely to be considered a little odd or strange by others).
How about:
"The name is Blake, John Blake."
This version of an introduction draws on the format: family name, given name then family name, which was made famous by the British secret agent, James Bond. Clearly, I am not James Bond and am not claiming to be James Bond, but I borrowed the structure, not the name.
Draw a Venn diagram to show how the three aspects of genre, borrowed language and idiosyncratic (creative and original) language interact.
Identifying authorship is a classification problem. Texts can be classified at multiple levels, including the language itself (e.g. English or French), the genre (e.g. letter or note) and the author (e.g. Shakespeare or Chaucer). Classification at each of these levels requires the classifier (automatic or human) to make decisions based on probability.
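To make the idea of a probability-based decision concrete, here is a minimal Python sketch of an automatic classifier working at the language level. It is only an illustration under stated assumptions: the two training samples are hypothetical toy sentences, the model is a simple word-frequency (unigram) model with add-one smoothing and equal prior probabilities, and a real study would need far more text and more careful methods.

```python
import math
from collections import Counter

# Hypothetical toy samples (assumption); real classifiers need far more text.
SAMPLES = {
    "English": "the cat sat on the mat and the dog sat by the door",
    "French": "le chat est sur le tapis et le chien est devant la porte",
}


def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()


def train(samples):
    """Count how often each word occurs in each class."""
    return {label: Counter(tokenize(text)) for label, text in samples.items()}


def class_probabilities(text, models):
    """Estimate P(class | text) with add-one smoothing and equal priors."""
    tokens = tokenize(text)
    vocab = set(tokens)
    for counts in models.values():
        vocab.update(counts)

    log_scores = {}
    for label, counts in models.items():
        total = sum(counts.values()) + len(vocab)
        log_scores[label] = sum(
            math.log((counts[token] + 1) / total) for token in tokens
        )

    # Convert log scores into probabilities that sum to 1.
    top = max(log_scores.values())
    weights = {label: math.exp(s - top) for label, s in log_scores.items()}
    norm = sum(weights.values())
    return {label: w / norm for label, w in weights.items()}


models = train(SAMPLES)
probs = class_probabilities("the dog sat on the door", models)
best = max(probs, key=probs.get)
print(best, probs)
```

The classifier never returns certainty, only a probability for each class. The same structure could in principle be applied at the genre or author level by changing the labels and the training texts.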
Work in pairs or threes to solve the following problems. All decisions must be based on evidence.
Compare your answers with other groups.
Discuss and decide on the authorship of the following cases.
Knowledge and application activities are designed to help you activate the key terminology and apply the concepts covered in the course so far. Try to use the terminology and concepts accurately and appropriately.
Analyze the language in this email to decide whether or not the email was written by an American army officer. Identify the markers you use to make your decision and justify your decision.
Greetings,
My name is Maj. Gary Hoffman. I am an American soldier, presently in Iraqi for the protection of the US embassy and advise the Iraqi army in relation to the advance of ISIS. With a very desperate need for assistance, I have decided to contact you for your kind assistance to move the sum of Thirty eight Million United States Dollars to you if I can be assured that my share will be safe in your care until I complete my service.
More details will be follow
Truly Yours
Discuss the authorship of the following cases. How could authorship be analyzed in each case?
Compare and contrast the questioned text with the two known texts. Decide which markers are important. Prepare to present your evidence in support of your decision.
Questioned text
There is a bom in XXXX school. It will explode this afternoon. This is no joke. Evacuate the school by 2.00 pm or else their will be many casulties. You have been warned.
Known text 1
Yesterday afternoon, the headmaster received an anonymous email, which stated that there was a bomb in our school. To ensure the safety of the students and staff, our school was evacuated and all lessons were cancelled.
Known text 2
We had the afternoon off school yesturday. Someone sent a bom threat to Mr XXX. Our chemistry test was cancelled. The police and the bom squad arrived to search for the bom.
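Part of a comparison like this can be supported by counting how often a candidate marker appears in each text. The Python sketch below is one possible starting point, not a complete analysis: the choice of marker (variant spellings of a single word) is an assumption made for illustration, and a single marker is not sufficient evidence of authorship on its own.

```python
import re
from collections import Counter

QUESTIONED = (
    "There is a bom in XXXX school. It will explode this afternoon. "
    "This is no joke. Evacuate the school by 2.00 pm or else their will "
    "be many casulties. You have been warned."
)
KNOWN_1 = (
    "Yesterday afternoon, the headmaster received an anonymous email, "
    "which stated that there was a bomb in our school. To ensure the "
    "safety of the students and staff, our school was evacuated and all "
    "lessons were cancelled."
)
KNOWN_2 = (
    "We had the afternoon off school yesturday. Someone sent a bom threat "
    "to Mr XXX. Our chemistry test was cancelled. The police and the bom "
    "squad arrived to search for the bom."
)

# Hypothetical marker (assumption): variant spellings of one word.
MARKER_VARIANTS = ("bomb", "bom")


def tokenize(text):
    """Return the lower-cased alphabetic words in the text."""
    return re.findall(r"[a-z]+", text.lower())


def marker_counts(text, variants=MARKER_VARIANTS):
    """Count how many times each spelling variant occurs in the text."""
    counts = Counter(tokenize(text))
    return {variant: counts[variant] for variant in variants}


texts = {"questioned": QUESTIONED, "known 1": KNOWN_1, "known 2": KNOWN_2}
table = {name: marker_counts(text) for name, text in texts.items()}

for name, row in table.items():
    print(name, row)
```

Other markers, such as punctuation, sentence length or word choice, could be added to MARKER_VARIANTS or counted in the same way. You still need to decide which markers matter and why.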
Make sure you can explain the following in simple English:
Make sure you can explain the differences between the following in simple English:
Running count: 16 of 60 concepts covered so far.