Unit 6: Visualization tools

Learning outcomes

By the end of this unit you should:

  • be aware of the difficulties in visualizing time and tense
  • have evaluated three tense identifiers
  • have tried out a number of visualization tools
  • know how to match time expressions using regular expressions

Activity 1: Types of visualization

Read.

It is necessary to identify the tenses and time expressions in a text before visualizing those features. There are two main methods of identification: rule-based parsing, for example using regular expressions, and probabilistic parsing, which uses machine learning. Once the features are identified, the system designer needs to decide whether to visualize the features within the text or to extract them and display them in a different format. In-text visualization can take many forms, including inserting labels or emoticons, colorizing the background or text, and changing the font size, type or weight. Possible ways to visualize tenses and time expressions in different formats include charts, graphs and tables.
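As a rough illustration of rule-based identification followed by in-text visualization, the sketch below inserts a label around each matched time expression. The pattern, the function name label_time_expressions and the [TIME: ...] label format are assumptions made for this example; the pattern covers only a handful of relative expressions and is nowhere near complete.

```python
import re

# Hypothetical, deliberately tiny pattern: a few relative time expressions only.
TIME_PATTERN = re.compile(
    r"\b(?:yesterday|today|tomorrow|(?:last|next) (?:week|month|quarter|year))\b",
    re.IGNORECASE,
)

def label_time_expressions(text: str) -> str:
    """Wrap each matched time expression in a [TIME: ...] label."""
    return TIME_PATTERN.sub(lambda m: f"[TIME: {m.group(0)}]", text)

labelled = label_time_expressions("Sales fell last month but should recover next year.")
print(labelled)
```

The same substitution step could instead wrap matches in markup that changes the colour or font weight, which is the other style of in-text visualization mentioned above.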

Activity 2: Financial graphs from texts

Read.

Company reports and newspaper articles frequently present the quarterly and annual sales results using graphs. A hot topic in natural language processing is automatic graph generation.

Consider the following sentence:

The share price for UOA industries rose by 5% this quarter to close at 4567 yen.

From this sentence we can extract much of the information needed to create a graph that visualizes it, namely:

  • title: share price for UOA industries
  • x axis: time (measured in quarters)
  • y axis: closing value of stock (yen)
  • trend: rise
  • slope: slight rise (5%)
  • first point: the earlier value which, when increased by 5%, gives 4567 (about 4350)
  • second point: 4567

From only one simple sentence, we can create a simple straight-line graph. However, to do so we need to extract the relevant data and values to complete the necessary cells in a table that can be used to create a graph.
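One way to fill those cells is sketched below. It assumes the sentence follows the template "The <item> rose|fell by <n>% this <period> to close at <value> <unit>", which is an assumption made for this example; real financial text varies far more. The function name extract_graph_data and the dictionary keys are hypothetical.

```python
import re

def extract_graph_data(sentence: str) -> dict:
    """Pull out the values needed for a two-point line graph."""
    item = re.search(r"The (.+?) (rose|fell) by", sentence)
    percent = re.search(r"by (\d+(?:\.\d+)?)%", sentence)
    period = re.search(r"this (quarter|month|year)", sentence)
    close = re.search(r"close at (\d+(?:\.\d+)?) (\w+)", sentence)
    if not (item and percent and period and close):
        raise ValueError("Sentence does not fit the assumed template")

    rate = float(percent.group(1)) / 100
    second_point = float(close.group(1))
    # Work backwards from the closing value to the starting value.
    if item.group(2) == "rose":
        first_point = second_point / (1 + rate)
    else:
        first_point = second_point / (1 - rate)

    return {
        "title": item.group(1),
        "x_axis": f"time ({period.group(1)}s)",
        "y_axis": f"closing value ({close.group(2)})",
        "trend": "rise" if item.group(2) == "rose" else "fall",
        "first_point": round(first_point, 2),
        "second_point": second_point,
    }

sentence = ("The share price for UOA industries rose by 5% this quarter "
            "to close at 4567 yen.")
data = extract_graph_data(sentence)
print(data)
```

The resulting dictionary holds two points and the axis labels, which is enough for a plotting library to draw the straight-line graph.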

Activity 3: Financial texts from graphs

Read.

Generating text from graphs is a particularly interesting challenge. Top-tier English language tests, such as IELTS and TOEFL, test candidates' ability to describe graphs and charts. Candidates are presented with a graph and need to write the corresponding text. To give students practice for this task, a program that could automatically generate text for any graph would be helpful. Students could then compare their writing with an automatically generated model.

To automatically generate text, the system needs to identify:

  • the item being graphed
  • the direction of fluctuation (e.g. rise, fall, no change)
  • the amount of fluctuation
  • the points when fluctuation occurs

Let's consider the task of visualizing the changes in sales of Taiwanese bananas to a supermarket last year. The sales are reported each month. Time periods can be reported in months or quarters. The highest and lowest points of the graph also need to be known.

To automatically generate text, the system needs to identify and generate:

  • the sales of bananas, banana sales, total sales, the sales
  • rise (increase, climb), fall (decrease, drop), remain steady
  • 100,000, 50,000, 75,000, etc.
  • Jan, Feb, Mar, ..., Dec, First quarter, Second quarter, etc.
  • peak, hit a low

How would you try to solve this? One approach could be to identify the values and terms needed, then write functions to extract and store the data, and finally create a function to plot the graph. Easy to say, but how difficult is it to do? Using a language and any libraries you are familiar with, have a go.
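As one possible starting point, the sketch below stores monthly values, groups them into quarters, and generates a single sentence about the highest and lowest points. The sales figures are invented for illustration, and the function names to_quarters and describe_extremes are hypothetical. Plotting is left out.

```python
MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
          "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]

# Hypothetical monthly sales figures, invented purely for illustration.
monthly_sales = [75_000, 50_000, 75_000, 80_000, 90_000, 100_000,
                 95_000, 85_000, 75_000, 70_000, 65_000, 60_000]

def to_quarters(values):
    """Sum twelve monthly values into four quarterly totals."""
    return [sum(values[i:i + 3]) for i in range(0, 12, 3)]

def describe_extremes(subject, labels, values):
    """Generate one sentence naming the highest and lowest points."""
    peak = max(range(len(values)), key=lambda i: values[i])
    low = min(range(len(values)), key=lambda i: values[i])
    return (f"{subject} peaked at {values[peak]:,} in {labels[peak]} "
            f"and hit a low of {values[low]:,} in {labels[low]}.")

sentence = describe_extremes("Banana sales", MONTHS, monthly_sales)
quarterly_sales = to_quarters(monthly_sales)
print(sentence)
print(quarterly_sales)
```

Passing quarter labels and quarterly_sales to describe_extremes would give the same kind of sentence at quarterly granularity.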

Activity 4: Data visualization using Rawgraphs

Access the Rawgraphs website here. Start by using one of their sample datasets. Select a chart type. Then, drag and drop the elements to map the dimensions, and customize your visualization in the final section. Try out a number of the 21 types of charts available.

This activity should give you an idea of how powerful data visualization is, and the wide choice of charts.

Activity 5: Data visualization using Tableau

Watch and listen to a short explanation of Tableau.

Tableau Public is the free version of Tableau, but it comes with some restrictions. However, for the purposes of this course, it is possible to use Tableau to create your visualization prototype.

Activity 6: Time Expression Detection Using Soft Patterns (suitable for those familiar with deep learning)

Look through this code to see how their logic works, and run the code. If you find anything interesting, share it in the discussion forum.

For those who are interested in SoPa (the soft patterns approach, which lies between RNNs and CNNs), check out this arXiv paper.

Activity 7: Timelines

Watch and listen to a short explanation (4 min 33 sec) on how to make timelines in Python using labella and R using timevis.

Activity 8: Google charts

Read the following.

Google Charts can be used to visualize data online. Check out the introductory guide to Google Charts here. Be sure to try out the "add interactivity" feature. The chart below is created using Google Charts.

Activity 9: Textexture

Copy and paste a short newspaper article or story into Textexture to see the text visualized as a network graph.

This tool is no longer supported by the developer, who now offers a pay-for-use out-of-the-box version and an open source version for those who can program. The open source version is available on GitHub here.

Activity 10: Using regular expressions (in Python)

Create regular expressions to match common time expressions. You can use any programming language, but Python and JavaScript are the recommended ones. This practice activity on w3schools is a good starting point if you are not familiar with Python. The code for using regular expressions is simple, but the expressions themselves can look rather complicated. When developing a prototype from scratch you will almost certainly need to use regular expressions.
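To show what such an expression can look like, here is a minimal sketch using Python's re module. The choice of patterns (clock times, day-month-year dates, month-year pairs and bare years from 1900 to 2099) is an assumption made for this example, and the names MONTH and TIME_EXPRESSION are hypothetical. Many common time expressions are not covered.

```python
import re

# Full and abbreviated English month names.
MONTH = (r"(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|June?|"
         r"July?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|"
         r"Dec(?:ember)?)")

# re.VERBOSE lets the pattern be split over lines and commented.
TIME_EXPRESSION = re.compile(
    rf"""
    \b(?:
        \d{{1,2}}:\d{{2}}(?:\s?[ap]m)?     # clock times: 9:30, 10:15 pm
      | \d{{1,2}}\s{MONTH}\s\d{{4}}        # dates: 3 March 2020
      | {MONTH}\s\d{{4}}                   # month and year: March 2020
      | (?:19|20)\d{{2}}                   # bare years: 2019
    )\b
    """,
    re.VERBOSE | re.IGNORECASE,
)

text = "The meeting on 3 March 2020 starts at 9:30 am; the report covers 2019."
matches = TIME_EXPRESSION.findall(text)
print(matches)
```

Longer alternatives are listed before shorter ones so that "3 March 2020" is matched as a whole date rather than as a bare year.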

Knowledge and application

Activity 11: Pseudocode to generate financial text from a table

Explain in words (pseudocode) how to automatically generate a text from the table of annual banana sales given below.

Month       Banana sales (million yen)
January     2345
February    4055
March       3300
April       3305
May         2310
June        3120
July        2885
August      5012
September   4862
October     3477
November    2991
December    1654

For example, to decide the trend or directionality: if the monthly sales value is greater than the previous month's value, use "rise"; if it is lower, use "fall"; otherwise, use "remain steady". Other concepts to consider include:

  • select different grammatical subjects in subsequent clauses or sentences
  • add adverbs such as slightly, substantially
  • identify the highest and lowest points
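The trend rule and the adverb idea can be sketched in Python as follows, using the values from the table above. The 5% and 25% thresholds for "slightly" and "substantially" are arbitrary assumptions, and the function names are hypothetical. Your pseudocode may well take a different route.

```python
sales = {
    "January": 2345, "February": 4055, "March": 3300, "April": 3305,
    "May": 2310, "June": 3120, "July": 2885, "August": 5012,
    "September": 4862, "October": 3477, "November": 2991, "December": 1654,
}

def trend_verb(previous: int, current: int) -> str:
    """Apply the rise / fall / remain steady rule."""
    if current > previous:
        return "rise"
    if current < previous:
        return "fall"
    return "remain steady"

def trend_adverb(previous: int, current: int) -> str:
    """Pick an adverb from the relative change (thresholds are assumptions)."""
    change = abs(current - previous) / previous
    if change == 0:
        return ""
    if change < 0.05:
        return "slightly"
    if change > 0.25:
        return "substantially"
    return ""

def describe_changes(table: dict) -> list:
    """Return one 'Month: verb adverb' phrase per month-to-month change."""
    months = list(table)
    phrases = []
    for prev_month, month in zip(months, months[1:]):
        prev, cur = table[prev_month], table[month]
        phrase = f"{month}: {trend_verb(prev, cur)} {trend_adverb(prev, cur)}"
        phrases.append(phrase.strip())
    return phrases

phrases = describe_changes(sales)
for phrase in phrases:
    print(phrase)
```

Note that the strict rule labels March to April (3300 to 3305) as a rise, which is why an adverb such as "slightly", or a tolerance band for "remain steady", is worth considering.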

Activity 12: Comparison of tense identifiers

Try out the three tense identification tools on the course tools tab. Compare and contrast the tools. Point out the benefits and drawbacks of each tool. Include the following:

  1. Describe the similarities between the tools.
  2. Describe the differences between the tools.
  3. Describe how you think each tool works based on your experimentation.
  4. If possible, make specific suggestions on how any of the tools can be improved.

Review

Make sure you can explain the following 3 concepts in simple English:

  1. rule-based parsing
  2. probabilistic parsing
  3. regular expressions

Running count: 62 of 70 time-and-tense-related concepts covered so far.

"If you are depressed, you are living in the past, if you are anxious, you are living in the future, if you are at peace, you are living in the present." - Lao Tzu

Copyright John Blake, 2020