
Unit 6: Natural language generation

Learning outcomes

By the end of this unit you should:

  • understand how natural language generation can be used for language learning
  • have assessed the pros and cons of rule-based parsing and probabilistic parsing
  • be aware of the potential and pitfalls of using large language models
  • have designed and developed a rule-based natural language generation tool

Activity 1: Introduction to natural language generation for language learning purposes

Read.

Natural Language Generation (NLG) has emerged as an influential subfield of Artificial Intelligence (AI) and is transforming the landscape of language learning. NLG systems, such as OpenAI's GPT-4, can generate coherent and contextually relevant text, changing the way students acquire new languages, including but not limited to English. With applications ranging from personalized language learning systems to automated essay evaluation, NLG offers many ways to augment traditional language acquisition methods. For instance, NLG-powered chatbots can simulate interactive dialogues, providing learners with real-time feedback and adaptive practice.

Work in pairs. Discuss your answers to the following questions.

  1. What are some specific examples of NLG applications in language learning?
  2. How do NLG algorithms ensure the generated content is contextually appropriate for learners?
  3. What challenges do educators face in implementing NLG technologies in language learning curricula?
  4. Are large language models suitable for language learning? Why (not)?

Activity 2: Chatbots in the classroom

Identify which of the following descriptions of the application of chatbots in the language learning classroom is not appropriate.

  1. Individualized practice: Chatbots facilitate personalized learning experiences, allowing students to practice at their own pace and focus on areas where they need improvement. They can adapt their responses based on a learner's proficiency level, reinforcing language skills in real time.
  2. Immediate feedback: Chatbots provide instant feedback on learners' language usage, such as grammar, vocabulary, and pronunciation, fostering faster progress and skill development. They can also offer suggestions and corrections, enabling students to learn from their mistakes and refine their language skills.
  3. Simulated conversations: Chatbots enable learners to engage in realistic dialogues, improving their speaking and listening skills. This interactive approach helps students develop fluency and confidence in their ability to communicate in English.
  4. Enhancing motivation: Gamification elements, such as points, levels, and badges, can be incorporated into chatbot interactions, making language learning more engaging and enjoyable. This can boost students' motivation to practice and enhance their language proficiency.
  5. Cultural exposure: Chatbots can incorporate aspects of English-speaking cultures into their responses, exposing learners to cultural nuances, idioms, and expressions, enriching their overall language learning experience.
  6. Supplementary support: Chatbots can be used as additional tools to complement traditional classroom instruction. They can provide extra practice outside of class hours, enabling learners to reinforce their language skills and deepen their understanding of the material.
  7. Accessibility: Chatbots are often available through web or mobile applications, making them easily accessible to learners anytime, anywhere. This convenience encourages more frequent practice and fosters continuous language development.
  8. Replacing human teachers: Chatbots can completely replace human teachers in the English language classroom, as they possess the same level of empathy, cultural sensitivity, and ability to foster genuine human connections as an experienced educator.

Compare your choice with your partner's choice.

Activity 3: Understanding comparative descriptions

Read and identify features

Table 1 compares the number of Asian and European tourists visiting five countries in 2023. Overall, more Asian tourists visited these countries than European tourists. The highest number of Asian tourists visited Japan, while the highest number of European tourists visited France. Notably, the number of Asian and European tourists to Germany was almost equal. Thailand received the fewest European tourists.

Identify the following:

  1. Introductory sentence
  2. Overall comparison
  3. Specific comparisons using superlatives (most, fewest)
  4. Special cases

Discuss

  1. What sentence patterns are used for comparison?
  2. How are these patterns changed to show degree?
  3. How could you compare the following: "England 100k visitors, Scotland 10k visitors"?

Activity 4: Drafting guiding principles

Formulate guiding principles for comparing aspects across multiple categories

Use the sample paragraph from Activity 3 and your observations to formulate guiding principles for a description generator that compares values in a table. These principles will guide the structure and variety of output produced by your program.

Here are six principles from the learner’s point of view. Can you add to these?

  1. Begin with a sentence introducing the table and its key variables.
  2. Include a general trend overview (e.g. which group is generally higher).
  3. Use superlatives to identify the highest and lowest values in each group.
  4. Mention exceptions or equal values as special cases.
  5. Vary sentence structures to improve readability.
  6. Avoid repeating grammatical subjects and sentence openings.
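Principles 1 to 6 can be tried out in code before the programming task. The Python function below is a minimal sketch, not a model answer: it assumes the table is a dictionary mapping each item to a pair of values, uses fixed sentence templates written for a tourism-style dataset like the one in Activity 3, and treats two values as "almost equal" when they differ by no more than 5% of the larger one (a hypothetical threshold). The name `describe_comparison`, its parameters and the sample figures are invented for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical input format: item (e.g. a country) -> (group A value, group B value).
Table = Dict[str, Tuple[float, float]]

NUMBER_WORDS = {2: "two", 3: "three", 4: "four", 5: "five", 6: "six",
                7: "seven", 8: "eight", 9: "nine", 10: "ten"}


def describe_comparison(table: Table, group_a: str, group_b: str, noun: str,
                        item_label: str, year: int, tolerance: float = 0.05) -> str:
    """Generate a comparative paragraph loosely following the principles above."""
    count = NUMBER_WORDS.get(len(table), str(len(table)))
    sentences: List[str] = []

    # Principle 1: introduce the table and its key variables.
    sentences.append(f"The table compares the number of {group_a} and {group_b} "
                     f"{noun} visiting {count} {item_label} in {year}.")

    # Principle 2: general trend, based on the totals for each group.
    total_a = sum(a for a, _ in table.values())
    total_b = sum(b for _, b in table.values())
    if total_a != total_b:
        more, less = (group_a, group_b) if total_a > total_b else (group_b, group_a)
        sentences.append(f"Overall, more {more} {noun} than {less} {noun} "
                         f"visited these {item_label}.")

    # Principle 3: superlatives for the highest and lowest value in each group.
    top_a = max(table, key=lambda item: table[item][0])
    top_b = max(table, key=lambda item: table[item][1])
    low_a = min(table, key=lambda item: table[item][0])
    low_b = min(table, key=lambda item: table[item][1])
    sentences.append(f"The highest number of {group_a} {noun} visited {top_a}, "
                     f"while the highest number of {group_b} {noun} visited {top_b}.")
    # Principles 5-6: a different opening (the item as subject) for the lowest values.
    sentences.append(f"{low_b} received the fewest {group_b} {noun}, "
                     f"and {low_a} the fewest {group_a} {noun}.")

    # Principle 4: near-equal values are reported as special cases.
    for item, (a, b) in table.items():
        if abs(a - b) <= tolerance * max(a, b):
            sentences.append(f"Notably, the numbers of {group_a} and {group_b} "
                             f"{noun} visiting {item} were almost equal.")
    return " ".join(sentences)


if __name__ == "__main__":
    # Invented figures, for testing only.
    sample = {"Japan": (900, 300), "France": (400, 800), "Germany": (500, 490),
              "Thailand": (700, 100), "Italy": (300, 600)}
    print(describe_comparison(sample, "Asian", "European", "tourists", "countries", 2023))
```

With these invented figures, Germany is reported as a special case because its two values differ by 10, which is within 5% of 500. Hard-coding verbs such as "visited" keeps the sketch short but ties it to one topic; a fuller program would take such words as parameters.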

Activity 5: Task and requirements

Write a program in Python or JavaScript to generate comparative descriptions from tabular data. If you prefer to use a different language, please negotiate with your tutor.

The dataset should compare two groups (e.g. Asian and European visitors) across five or more items (e.g. countries). Your program should:

  1. Allow users to input data manually or via CSV file.
  2. Generate an introductory sentence describing the table.
  3. Output a general statement (e.g. "Overall, more X than Y").
  4. Identify and describe the highest and lowest values for each group.
  5. Identify equal or similar values across groups and describe them.
  6. Use varied sentence templates based on the guiding principles created in Activity 4.

Optional: Add proficiency level output modes (e.g. Basic, Intermediate, Advanced) that affect vocabulary and sentence complexity.
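Requirement 1 and the optional proficiency modes could be approached as in the sketch below. It assumes a CSV file whose header row holds an item label followed by the two group names (e.g. `country,Asian,European`) and uses Python's standard `csv` module. The level names follow the optional task, but the templates in `OVERALL_TEMPLATES` and the helpers `load_table` and `overall_sentence` are hypothetical.

```python
import csv
import io
from typing import Dict, TextIO, Tuple

# item -> (group A value, group B value)
Table = Dict[str, Tuple[float, float]]

# Hypothetical level-dependent templates for the overall statement.
OVERALL_TEMPLATES = {
    "Basic": "More {more} {noun} than {less} {noun} visited these {items}.",
    "Intermediate": "Overall, more {more} {noun} than {less} {noun} visited these {items}.",
    "Advanced": ("Overall, {more} {noun} outnumbered their {less} counterparts "
                 "across these {items}."),
}


def load_table(stream: TextIO, min_items: int = 5) -> Tuple[str, str, Table]:
    """Read a CSV whose header row is: item label, group A name, group B name."""
    reader = csv.reader(stream)
    header = next(reader)
    if len(header) != 3:
        raise ValueError("Expected three columns: item, group A, group B")
    group_a, group_b = header[1].strip(), header[2].strip()
    table: Table = {}
    for row in reader:
        if not row:  # csv.reader yields an empty list for a blank line
            continue
        if len(row) != 3:
            raise ValueError(f"Expected three values, got {row!r}")
        item, a, b = (cell.strip() for cell in row)
        table[item] = (float(a), float(b))
    if len(table) < min_items:
        raise ValueError(f"Need at least {min_items} items, got {len(table)}")
    return group_a, group_b, table


def overall_sentence(table: Table, group_a: str, group_b: str, noun: str,
                     items: str, level: str = "Intermediate") -> str:
    """Pick the overall-comparison template for the chosen proficiency level."""
    total_a = sum(a for a, _ in table.values())
    total_b = sum(b for _, b in table.values())
    if total_a == total_b:
        return f"Overall, the totals for {group_a} and {group_b} {noun} were equal."
    more, less = (group_a, group_b) if total_a > total_b else (group_b, group_a)
    return OVERALL_TEMPLATES[level].format(more=more, less=less, noun=noun, items=items)


if __name__ == "__main__":
    # Manually typed input can be wrapped in io.StringIO; a real file
    # would be opened with open(path, newline="") and passed in instead.
    text = ("country,Asian,European\nJapan,900,300\nFrance,400,800\n"
            "Germany,500,490\nThailand,700,100\nItaly,300,600\n")
    a, b, table = load_table(io.StringIO(text))
    for level in OVERALL_TEMPLATES:
        print(level, "->", overall_sentence(table, a, b, "tourists", "countries", level))
```

Reading from any text stream lets the same function serve both input routes: a CSV file on disk and data pasted or typed by the user.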

Submit your code via ELMS. Include sample inputs for testing, together with the description output they produce.

Activity 6: Trend description generation

Read.

Describing graphs is a common task for both students and workers. At UoA, students usually present the results in their graduation theses using figures with an accompanying description. In business, sales figures are frequently presented using graphs and accompanying notes. Spreadsheet programs can convert a table of values into a multitude of different chart formats; however, they cannot (yet) produce a textual description. Learners of English often need to pass proficiency examinations, such as IELTS, TOEFL and TOEIC, all of which include tasks involving descriptions of data series. Given a set of values, it is possible to generate trend descriptions using a set of rules.
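As a first illustration of generating trend descriptions from rules, the Python sketch below applies a single rule to each pair of consecutive values: compare them and choose a verb for the direction of change. The function names, the verb choices and the sample values are assumptions for illustration only.

```python
from typing import List, Tuple

Point = Tuple[str, float]  # (time point, value)

# Hypothetical verb choices: one verb per direction of change.
VERBS = {"up": "rose", "down": "fell"}


def fmt(value: float) -> str:
    """Show whole numbers without a decimal part, e.g. 48.0 -> '48'."""
    return f"{int(value):,}" if float(value).is_integer() else f"{value:,}"


def direction(prev: float, curr: float) -> str:
    """Rule: compare two consecutive values and classify the change."""
    if curr > prev:
        return "up"
    if curr < prev:
        return "down"
    return "same"


def simple_trend_sentences(title: str, points: List[Point]) -> List[str]:
    """Generate one simple sentence for each pair of consecutive values."""
    subject = title[0].upper() + title[1:]
    sentences = []
    for (t1, v1), (t2, v2) in zip(points, points[1:]):
        d = direction(v1, v2)
        if d == "same":
            sentences.append(f"{subject} remained stable at {fmt(v2)} in {t2}.")
        else:
            sentences.append(f"{subject} {VERBS[d]} from {fmt(v1)} in {t1} "
                             f"to {fmt(v2)} in {t2}.")
    return sentences


if __name__ == "__main__":
    data = [("January", 32), ("February", 48), ("March", 48), ("April", 40)]
    for sentence in simple_trend_sentences("the price of bananas", data):
        print(sentence)
```

The output is grammatical but repetitive: every sentence opens with the same subject and uses the same verb for each direction. The guiding principles in Activity 7 address exactly this kind of repetition.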

Listen to a short lecture by your tutor based on this slide.

Activity 7: Guiding principles

Read

Guiding principles are designed to help produce output that is concise, corpus-informed, and meets expectations of collocation (words that occur together) and colligation (words that occur with particular grammatical features). Adhering to conventions of collocation and colligation involves multiple challenges, such as minimizing repetition of content words and sentence patterns, which can also be described using guiding principles. The guiding principles were informed by analyses of a corpus of data-series trend descriptions. They may be used both by developers of trend description generation software and by students who need practice writing trend descriptions. The principles are worded with software developers in mind, so some minor adaptations are needed when using them with learners. The guiding principles are not designed to map directly onto specific instructions or snippets of code; rather, they indicate the general direction in which the codebase should be steered. A selection of the guiding principles is reproduced below:

Read. Consider how these principles may be realized in a program.

  1. Generate a simple sentence showing the change between each pair of values.
  2. Merge consecutive sentences into a single simple sentence when the direction of change is the same.
  3. Use a verb showing the direction of change in at least half of all sentences.
  4. Use a noun showing the direction of change in a quarter to a third of all sentences.
  5. Use different grammatical subjects in subsequent simple sentences.
  6. Append prepositional phrases describing values in the following order: initial value, the value of change and final value (e.g. from X by Y to Z).
  7. Append prepositional phrases describing time periods in the following order: initial period and final period (e.g. from P1 to P2).
  8. When the value remains constant in subsequent sentences, omit one value.
  9. Place prepositional phrase describing value before prepositional phrase describing time period (e.g. from 32 in January to 48 in February).
  10. Include no more than two prepositional phrases describing values in one clause.
  11. Do not repeat identical prepositional phrases in subsequent sentences.
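The sketch below shows one way principles 2, 3, 4, 5 and 9 might be realized in Python. It is an illustration under stated assumptions, not the reference implementation: it merges runs of changes with the same direction, rotates through a small hypothetical verb list, uses a noun of change in every third sentence (one third, the upper bound of principle 4), alternates the subject between the title and "It", and places each value before its time period. The remaining principles are not implemented, and all function names are invented.

```python
from typing import List, Tuple

Point = Tuple[str, float]  # (time point, value)

# Hypothetical vocabulary for verbs and nouns of change.
VERBS = {"up": ["rose", "increased", "climbed"], "down": ["fell", "decreased", "dropped"]}
NOUNS = {"up": "rise", "down": "fall"}


def fmt(value: float) -> str:
    return f"{int(value):,}" if float(value).is_integer() else f"{value:,}"


def direction(prev: float, curr: float) -> str:
    return "up" if curr > prev else "down" if curr < prev else "same"


def merge_changes(points: List[Point]) -> List[Tuple[str, Point, Point]]:
    """Principle 2: merge consecutive changes that share a direction."""
    segments: List[Tuple[str, Point, Point]] = []
    for prev, curr in zip(points, points[1:]):
        d = direction(prev[1], curr[1])
        if segments and segments[-1][0] == d:
            segments[-1] = (d, segments[-1][1], curr)  # extend the current run
        else:
            segments.append((d, prev, curr))
    return segments


def describe_trend(title: str, points: List[Point]) -> str:
    subject = title[0].upper() + title[1:]
    sentences = []
    used = {"up": 0, "down": 0}
    for i, (d, (t1, v1), (t2, v2)) in enumerate(merge_changes(points)):
        # Principle 9: each value comes before its time period.
        span = f"from {fmt(v1)} in {t1} to {fmt(v2)} in {t2}"
        if i % 3 == 2:  # Principle 4: every third sentence uses a noun of change
            if d == "same":
                sentences.append(f"There was no change in {title} until {t2}.")
            else:
                sentences.append(f"There was a {NOUNS[d]} in {title} {span}.")
            continue
        who = subject if i % 3 == 0 else "It"  # Principle 5: vary the subject
        if d == "same":
            sentences.append(f"{who} remained at {fmt(v2)} until {t2}.")
        else:
            verb = VERBS[d][used[d] % len(VERBS[d])]  # Principle 3: rotate verbs
            used[d] += 1
            sentences.append(f"{who} {verb} {span}.")
    return " ".join(sentences)


if __name__ == "__main__":
    data = [("January", 32), ("February", 48), ("March", 55),
            ("April", 40), ("May", 40), ("June", 52)]
    print(describe_trend("the price of bananas", data))
```

For the sample data, the two rises from January to March become a single sentence, and the subjects run "The price of bananas", "It", "There", so no two consecutive sentences share a subject.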

Activity 8: Task and requirements

Create a program in either Python (using NLTK) or JavaScript (using Compromise) that generates a trend description. The general requirements are:

  1. There is a maximum of 12 data points (e.g. one for each month), i.e. 11 changes in data values.
  2. The title is in the form "THE + NOUN + PREPOSITION + NOUN", e.g. the price of bananas, the number of sales, etc.
  3. The data values are given as "TIME POINT, VALUE", e.g. January, 3245 or 11:00 am, 44.
  4. Users should be able to generate a dataset or input their own dataset in the prescribed format.
  5. Default values should be loaded so users can try out the program quickly.
  6. Users should be able to input values manually or via CSV file.
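Requirements 2 to 6 mostly concern input handling. The Python sketch below assumes one `TIME POINT, VALUE` pair per line (the same layout a two-column CSV file would have) and splits each line at its last comma, so time points such as `11:00 am` stay intact. It checks the title with a simple regular expression; a regular expression cannot confirm that a word is a noun, and NLTK's `pos_tag` would be more robust, but it needs tagger data downloaded separately. The default dataset, the preposition list, the random-walk generator and all function names are hypothetical.

```python
import random
import re
from typing import List, Optional, Tuple

Point = Tuple[str, float]  # (time point, value)

MAX_POINTS = 12
DEFAULT_TITLE = "the price of bananas"        # hypothetical default title
DEFAULT_DATA = "January, 32\nFebruary, 48\nMarch, 55\nApril, 40"  # hypothetical defaults

# Approximates "THE + NOUN + PREPOSITION + NOUN" with a small, assumed preposition list.
TITLE_PATTERN = re.compile(r"^the [a-z]+ (of|in|for|on|at) [a-z]+$", re.IGNORECASE)


def check_title(title: str) -> bool:
    return bool(TITLE_PATTERN.match(title.strip()))


def parse_points(text: str) -> List[Point]:
    """Parse 'TIME POINT, VALUE' lines and enforce the 12-point maximum."""
    points: List[Point] = []
    for line_no, line in enumerate(text.strip().splitlines(), start=1):
        if not line.strip():
            continue
        try:
            time_point, value = line.rsplit(",", 1)  # split at the last comma only
            points.append((time_point.strip(), float(value)))
        except ValueError:
            raise ValueError(f"Line {line_no}: expected 'TIME POINT, VALUE', "
                             f"got {line!r}") from None
    if not 2 <= len(points) <= MAX_POINTS:
        raise ValueError(f"Expected 2 to {MAX_POINTS} data points, got {len(points)}")
    return points


def generate_points(labels: List[str], start: float = 100, max_step: int = 20,
                    seed: Optional[int] = None) -> List[Point]:
    """Random walk: each value differs from the previous one by at most max_step."""
    rng = random.Random(seed)
    value = start
    points: List[Point] = []
    for label in labels[:MAX_POINTS]:
        points.append((label, float(value)))
        value = max(0, value + rng.randint(-max_step, max_step))
    return points


if __name__ == "__main__":
    print(check_title(DEFAULT_TITLE), parse_points(DEFAULT_DATA))
    print(generate_points(["Jan", "Feb", "Mar", "Apr"], seed=42))
```

Loading `DEFAULT_DATA` at start-up satisfies the "try it quickly" requirement, while the same `parse_points` function can process text read from a CSV file or typed into a form.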
  7. The program should generate trend descriptions at different proficiency levels, e.g. Beginner, Intermediate and Advanced.

The requirements for each proficiency level are given below:

  1. Beginner: An introductory sentence is provided. There is one simple sentence per change. All sentences use a verb of change, with no more than two verbs for each direction of change.
  2. Elementary: There is one simple sentence per direction of change, i.e. consecutive sentences with the same direction of change are merged.
  3. Pre-intermediate: The introductory sentence describes the general trend. Up to a third of all sentences use a noun of change. More verbs are included.
  4. Intermediate: Adverbs and adjectives are used to describe the magnitude of change.
  5. Upper intermediate: Compound sentences are used. The highest and lowest points are identified.
  6. Advanced: Complex sentences are used. A wider range of verbs and nouns of change is used.
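Because each level adds features, one option is to store the levels in order and switch a feature on once a minimum level is reached. The Python sketch below assumes the levels are cumulative, which the list above implies but does not state, and illustrates two features: magnitude adverbs (Intermediate) and the highest and lowest points (Upper intermediate). The percentage thresholds for "slightly", "moderately", "sharply" and "dramatically" are illustrative guesses, not corpus-derived values, and the function names are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[str, float]  # (time point, value)

# Levels in increasing order; assumed to be cumulative.
LEVELS = ["Beginner", "Elementary", "Pre-intermediate",
          "Intermediate", "Upper intermediate", "Advanced"]


def at_least(level: str, minimum: str) -> bool:
    """True if `level` is at or above `minimum` in LEVELS."""
    return LEVELS.index(level) >= LEVELS.index(minimum)


def fmt(value: float) -> str:
    return f"{int(value):,}" if float(value).is_integer() else f"{value:,}"


def magnitude_adverb(prev: float, curr: float) -> str:
    """Map the percentage change to an adverb (illustrative thresholds)."""
    if prev == 0:
        return "dramatically"
    pct = abs(curr - prev) / abs(prev) * 100
    if pct < 5:
        return "slightly"
    if pct < 20:
        return "moderately"
    if pct < 50:
        return "sharply"
    return "dramatically"


def change_sentence(subject: str, start: Point, end: Point, level: str) -> str:
    """One sentence per change; adverbs are added from Intermediate upwards."""
    (t1, v1), (t2, v2) = start, end
    if v1 == v2:
        return f"{subject} remained stable at {fmt(v2)} in {t2}."
    verb = "rose" if v2 > v1 else "fell"
    if at_least(level, "Intermediate"):
        verb += " " + magnitude_adverb(v1, v2)
    return f"{subject} {verb} from {fmt(v1)} in {t1} to {fmt(v2)} in {t2}."


def extremes_sentence(subject: str, points: List[Point], level: str) -> Optional[str]:
    """Compound sentence naming the highest and lowest points (Upper intermediate+)."""
    if not at_least(level, "Upper intermediate"):
        return None
    high_t, high_v = max(points, key=lambda p: p[1])
    low_t, low_v = min(points, key=lambda p: p[1])
    return (f"{subject} peaked at {fmt(high_v)} in {high_t}, "
            f"and its lowest point was {fmt(low_v)} in {low_t}.")


if __name__ == "__main__":
    data = [("January", 32), ("February", 40), ("March", 55), ("April", 40)]
    for level in ("Beginner", "Intermediate", "Upper intermediate"):
        print(level, "->", change_sentence("The price of bananas", data[0], data[1], level))
        print(extremes_sentence("The price of bananas", data, level))
```

Keeping the level check in one helper means a new feature only needs a minimum level, rather than a separate code path for each of the six levels.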

Submit your code via the learning management system (ELMS) as an HTML file, using the same format as the example divisions shown on this webpage.

Review

Can you explain the differences between the following?

  1. rule-based parsing vs. probabilistic parsing
  2. natural language processing vs. natural language generation

If you cannot, make sure you can before your next class.

Running count: 50 of 65 concepts covered so far.