
Unit 6: Natural language generation

Learning outcomes

By the end of this unit you should:

  • understand how natural language generation can be used for language learning
  • have assessed the pros and cons of rule-based parsing and probabilistic parsing
  • be aware of the potential and pitfalls of using large language models
  • have designed and developed a rule-based natural language generation tool

Activity 1: Introduction to natural language generation for language learning purposes

Read.

Natural Language Generation (NLG) has emerged as an influential subfield of Artificial Intelligence (AI), transforming the landscape of language learning. NLG systems, such as OpenAI's GPT-4, generate coherent and contextually relevant text, changing the way students acquire new languages, including but not limited to English. With applications ranging from personalized language learning systems to automated essay evaluation, NLG offers an array of possibilities to augment traditional language acquisition methods. For instance, NLG-powered chatbots can simulate interactive dialogues, providing learners with real-time feedback and adaptive practice.

Work in pairs. Discuss your answers to the following questions.

  1. What are some specific examples of NLG applications in language learning?
  2. How do NLG algorithms ensure the generated content is contextually appropriate for learners?
  3. What challenges do educators face in implementing NLG technologies in language learning curricula?
  4. Are large language models suitable for language learning? Why (not)?

Activity 2: Chatbots in the classroom

Identify which of the following descriptions of the application of chatbots in the language learning classroom is not appropriate.

  1. Individualized practice: Chatbots facilitate personalized learning experiences, allowing students to practice at their own pace and focus on areas where they need improvement. They can adapt their responses based on a learner's proficiency level, reinforcing language skills in real-time.
  2. Immediate feedback: Chatbots provide instant feedback on learners' language usage, such as grammar, vocabulary, and pronunciation, fostering faster progress and skill development. They can also offer suggestions and corrections, enabling students to learn from their mistakes and refine their language skills.
  3. Simulated conversations: Chatbots enable learners to engage in lifelike dialogues, improving their speaking and listening skills. This interactive approach helps students develop fluency and confidence in their ability to communicate in English.
  4. Enhancing motivation: Gamification elements, such as points, levels, and badges, can be incorporated into chatbot interactions, making language learning more engaging and enjoyable. This can boost students' motivation to practice and enhance their language proficiency.
  5. Cultural exposure: Chatbots can incorporate aspects of English-speaking cultures into their responses, exposing learners to cultural nuances, idioms, and expressions, enriching their overall language learning experience.
  6. Supplementary support: Chatbots can be used as additional tools to complement traditional classroom instruction. They can provide extra practice outside of class hours, enabling learners to reinforce their language skills and deepen their understanding of the material.
  7. Accessibility: Chatbots are often available through web or mobile applications, making them easily accessible to learners anytime, anywhere. This convenience encourages more frequent practice and fosters continuous language development.
  8. Replacing human teachers: Chatbots can completely replace human teachers in the English language classroom, as they possess the same level of empathy, cultural sensitivity, and ability to foster genuine human connections as an experienced educator.

Compare your choice with your partner's choice.

Activity 3: Trend description generation

Read.

Describing graphs and bar charts is a common task for both students and workers. At the UoA, students usually present the results in their graduation theses using figures and write accompanying descriptions. In business, sales figures are frequently presented using graphs and accompanying notes. Spreadsheet programs can convert a table of values into a multitude of different chart formats. However, they cannot (yet) produce a textual description. Learners of English often need to pass proficiency examinations, such as IELTS, TOEFL and TOEIC. All of these exams make use of descriptions of data series. From a set of values, it is possible to generate trend descriptions using a set of rules.

Listen to a short lecture given by your tutor using this slide.

Activity 4: Guiding principles

Read.

Guiding principles are designed to help produce output that is concise, corpus-informed, and maintains expectations of collocation (words that occur together) and colligation (words that occur with grammatical features). Adhering to conventions of collocation and colligation involves multiple challenges, which can also be described using guiding principles, such as minimizing repetition of content words and sentence patterns. The guiding principles were informed by analyses of a corpus of data-series trend descriptions. These principles may be used both by developers of trend description generation software and by students who need practice writing trend descriptions. The descriptions are worded with software developers in mind, so some minor alterations are needed when using the principles with learners. The guiding principles are not designed to be mapped directly to specific instructions or snippets of code, but to provide a general direction in which the codebase should be steered. A selection of the guiding principles is reproduced below:

Read. Consider how these principles may be realized in a program.

  1. Generate a simple sentence showing the change between each pair of values.
  2. Merge subsequent sentences into single simple sentences when the direction of change is the same.
  3. Use a verb showing the direction of change in at least half of all sentences.
  4. Use a noun showing the direction of change in a quarter to a third of all sentences.
  5. Use different grammatical subjects in subsequent simple sentences.
  6. Append prepositional phrases describing values in the following order: initial value, the value of change and final value (e.g. from X by Y to Z).
  7. Append prepositional phrases describing time periods in the following order: initial period and final period (e.g. from P1 to P2).
  8. When the value remains constant in subsequent sentences, omit one value.
  9. Place the prepositional phrase describing a value before the prepositional phrase describing a time period (e.g. from 32 in January to 48 in February).
  10. Include no more than two prepositional phrases describing values in one clause.
  11. Do not repeat identical prepositional phrases in subsequent sentences.
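One possible way to realize principles 1 and 2, together with the phrase ordering in principles 6, 7 and 9, is sketched below in Python. This is a minimal illustration, not a reference solution: the function names, the three-way `up`/`down`/`flat` classification and the single verb per direction are all assumptions made for the example, and principles 3 to 5, 10 and 11 are not addressed.

```python
from itertools import groupby

# Hypothetical verb choices; a real tool would use a corpus-informed word list.
VERBS = {"up": "rose", "down": "fell", "flat": "remained constant"}


def direction(change):
    """Classify a numeric change as 'up', 'down' or 'flat'."""
    if change > 0:
        return "up"
    if change < 0:
        return "down"
    return "flat"


def pairwise_changes(points):
    """Principle 1: one record per pair of consecutive (period, value) points."""
    return [
        {"start": a, "end": b, "direction": direction(b[1] - a[1])}
        for a, b in zip(points, points[1:])
    ]


def merge_runs(changes):
    """Principle 2: merge consecutive changes that share a direction."""
    merged = []
    for run_direction, run in groupby(changes, key=lambda c: c["direction"]):
        run = list(run)
        merged.append(
            {"start": run[0]["start"], "end": run[-1]["end"], "direction": run_direction}
        )
    return merged


def realize(title, change):
    """Render one change as a simple sentence.

    Values come before periods (principle 9) and initial before final
    (principles 6 and 7).
    """
    (p1, v1), (p2, v2) = change["start"], change["end"]
    subject = title[0].upper() + title[1:]
    verb = VERBS[change["direction"]]
    if change["direction"] == "flat":
        # Assumption: for an unchanged value, the value is stated only once.
        return f"{subject} {verb} at {v1} from {p1} to {p2}."
    return f"{subject} {verb} from {v1} in {p1} to {v2} in {p2}."


points = [("January", 32), ("February", 48), ("March", 60), ("April", 55)]
for change in merge_runs(pairwise_changes(points)):
    print(realize("the number of sales", change))
# The number of sales rose from 32 in January to 60 in March.
# The number of sales fell from 60 in March to 55 in April.
```

Keeping the changes as small dictionaries until the last step means the merging logic stays independent of wording, so later principles (for example, varying the grammatical subject) could be added in `realize` without touching `merge_runs`.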

Activity 5: Task and requirements

Create a program in either Python (using NLTK) or JavaScript (using Compromise). The program should generate a trend description. The general requirements are:

  1. There is a maximum of 12 datapoints (e.g. one point for each month), i.e. 11 changes in data values.
  2. The title is in the form "THE + NOUN + PREPOSITION + NOUN", e.g. the price of bananas, the number of sales, etc.
  3. The data values are given as "TIME POINT, VALUE", e.g. January, 3245 or 11:00 am, 44.
  4. Users should be able to generate a dataset or input their own dataset in the prescribed format.
  5. Default values should be loaded so users can try out the program quickly.
  6. Users should be able to input values manually or via CSV file.
  7. The program should generate trend descriptions at different proficiency levels, e.g. Beginner, Intermediate and Advanced.
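As a starting point for requirements 1, 3, 5 and 6, the following Python sketch reads "TIME POINT, VALUE" lines using the standard `csv` module. The function name `parse_dataset`, the default dataset and the decision to reject datasets with fewer than two points are assumptions made for illustration, not part of the task specification.

```python
import csv
import io

MAX_POINTS = 12  # requirement 1: at most 12 datapoints

# Hypothetical default dataset so the program can be tried out immediately.
DEFAULT_CSV = "January, 3245\nFebruary, 3310\nMarch, 3290\n"


def parse_dataset(text=DEFAULT_CSV):
    """Parse 'TIME POINT, VALUE' lines into a list of (time_point, value) tuples."""
    points = []
    # skipinitialspace drops the space that follows the comma.
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row:
            continue  # skip empty lines
        if len(row) != 2:
            raise ValueError(f"expected 'TIME POINT, VALUE', got {row!r}")
        time_point, raw_value = row[0].strip(), row[1].strip()
        points.append((time_point, float(raw_value)))
    if len(points) < 2:
        raise ValueError("at least two datapoints are needed to describe a change")
    if len(points) > MAX_POINTS:
        raise ValueError(f"at most {MAX_POINTS} datapoints are allowed")
    return points


print(parse_dataset("January, 3245\n11:00 am, 44\n"))
```

The same function can serve manual input and file upload: text typed into a form and the contents of a CSV file can both be passed in as a single string.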

The requirements for each proficiency level are given below:

  1. Beginner: Introductory sentence is provided. One simple sentence per change. All sentences use verb of change. Verbs are limited to 2 for each directionality.
  2. Elementary: One simple sentence per direction of change, i.e. consecutive sentences with the same directionality are merged.
  3. Pre-intermediate: Introductory sentence describes general trend. Up to a third of all sentences use noun of change. More verbs are included.
  4. Intermediate: Adverbs and adjectives are used to describe magnitude of change.
  5. Upper intermediate: Compound sentences are used. The highest and lowest points are pointed out.
  6. Advanced: Complex sentences are used. Wider range of verbs and nouns of change are used.
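The Beginner level could be approached along the lines of the Python sketch below: an introductory sentence, one simple sentence per change, and two verbs per direction. The wording of the introductory sentence, the particular verbs, the handling of unchanged values and the name `describe_beginner` are all assumptions for the purpose of the example.

```python
from itertools import cycle

# Hypothetical word list: two verbs per direction, as the Beginner level requires.
BEGINNER_VERBS = {
    "up": ["rose", "increased"],
    "down": ["fell", "decreased"],
    "flat": ["remained constant", "stayed the same"],
}


def describe_beginner(title, points):
    """Return an introductory sentence plus one simple sentence per change."""
    subject = title[0].upper() + title[1:]
    first_period, last_period = points[0][0], points[-1][0]
    # Assumed wording for the introductory sentence.
    sentences = [f"This description shows {title} from {first_period} to {last_period}."]
    # Alternate between the two verbs for each direction to reduce repetition.
    verbs = {d: cycle(options) for d, options in BEGINNER_VERBS.items()}
    for (p1, v1), (p2, v2) in zip(points, points[1:]):
        if v2 > v1:
            sentences.append(f"{subject} {next(verbs['up'])} from {v1} in {p1} to {v2} in {p2}.")
        elif v2 < v1:
            sentences.append(f"{subject} {next(verbs['down'])} from {v1} in {p1} to {v2} in {p2}.")
        else:
            sentences.append(f"{subject} {next(verbs['flat'])} at {v1} from {p1} to {p2}.")
    return sentences


data = [("January", 32), ("February", 48), ("March", 60), ("April", 60), ("May", 41)]
for sentence in describe_beginner("the price of bananas", data):
    print(sentence)
```

Higher levels could reuse the same loop and swap in a larger word list, merged changes, or nouns of change, so that each level differs only in the rules it switches on.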

Submit your code via the learning management system (ELMS) as an HTML file using the same format as the example divisions shown on this webpage.

Review

Can you explain the differences between the following?

  1. rule-based parsing vs. probabilistic parsing
  2. natural language processing vs. natural language generation

If you cannot, make sure that you can before your next class.
