Texts and Tools (TNT) Lab

The current focus of the Texts and Tools lab is on creating practical online tools that help people learn English. The tools we create often detect and/or visualize particular language features. Some language features are easy to detect automatically, while others are much more challenging. Our research draws on corpus linguistics to analyze texts and on computational linguistics to create rule-based and probabilistic pattern-searching tools and pipelines.
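As a rough illustration of what a rule-based pattern search can look like, the sketch below flags possible passive constructions with a single regular expression. The rule, the names BE_FORMS, PASSIVE_CANDIDATE and find_passive_candidates, and the sample sentence are all assumptions made for this example; they are not taken from any of the lab's tools.

```python
import re

# Hypothetical, deliberately crude rule: a form of "be" followed by a word
# ending in "-ed" is flagged as a possible passive construction.
BE_FORMS = r"(?:am|is|are|was|were|be|been|being)"
PASSIVE_CANDIDATE = re.compile(rf"\b{BE_FORMS}\s+(\w+ed)\b", re.IGNORECASE)


def find_passive_candidates(text: str) -> list[str]:
    """Return every 'be + -ed word' span found in the text."""
    return [match.group(0) for match in PASSIVE_CANDIDATE.finditer(text)]


sample = "The corpus was collected in 2020. The results were written up later."
candidates = find_passive_candidates(sample)
print(candidates)
```

The rule finds "was collected" but misses "were written", because irregular participles do not end in "-ed". Gaps of this kind are one reason why some features call for part-of-speech tagging or probabilistic methods rather than a surface pattern.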

Lab overview

  • Vision: Enhancing language learning using technology
  • Mission: Help scientists share their research with the world
  • Aims:
    Create tools for scientists to
    1. understand written research documents
    2. produce written research documents
  • Objectives for AY2024:
    1. Release production-ready authorship analysis corpus tool
    2. Develop fine-grained POS tagger
    3. Develop and evaluate explainable authorship analysis tool
    4. Refine and increase functionality of trend description generator
    5. Develop and evaluate iCALL suite of language learning tools
  • Slogan: Grit and razor-sharp focus lead to success
  • Values: Integrity and work ethic (plodding and bursting)
  • Culture: Tracking board showing key performance indicators (KPIs) - "what gets measured gets done"

Research themes

  • eXplainable AI (XAI) Approach for Detecting AI-Generated Text: Explainability plays a crucial role in building trust and transparency in AI systems. Our team is exploring XAI techniques specifically tailored for detecting AI-generated text. By providing interpretable explanations of the detection process, we aim to empower users and stakeholders to make informed decisions about the authenticity and reliability of textual content.
  • Corpus Tool for Authorship Analysis: This project focuses on developing a bespoke corpus tool for forensic linguists that makes it easier to compare and contrast two or more datasets. The tool aims to streamline the analysis of linguistic patterns, aiding the investigation of authorship and uncovering valuable insights from textual data. Our goal is to provide forensic linguists with a robust and user-friendly resource that enhances their ability to analyze and interpret complex language-related phenomena.
  • Trend Description Generator: The Trend Description Generator is a natural language generation (NLG) tool that creates textual descriptions to accompany charts and graphs. The description is generated from the same structured dataset used to create the visuals. This open-access online tool enables learners of English to experiment and see how altering data points affects the language features used to describe trends.
  • Interactive arithmetic: This project aims to create an interactive learning resource for young learners to master the four core operations of addition, subtraction, multiplication and division. The initial focus is on providing tools for two-digit multiplication using various methods, and creating accompanying visualizations.
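One of the methods that could be used for two-digit multiplication in the interactive arithmetic project is the partial products method. The sketch below is only an illustration of that method. The function name partial_products and the output layout are assumptions for this example and do not describe the lab's actual resource.

```python
def partial_products(a: int, b: int) -> list[tuple[str, int]]:
    """Split a two-digit multiplication into its four partial products."""
    if not (10 <= a <= 99 and 10 <= b <= 99):
        raise ValueError("both factors must be two-digit numbers")
    # Separate each factor into its tens part and its ones part.
    a_tens, a_ones = (a // 10) * 10, a % 10
    b_tens, b_ones = (b // 10) * 10, b % 10
    pairs = [(a_tens, b_tens), (a_tens, b_ones), (a_ones, b_tens), (a_ones, b_ones)]
    return [(f"{x} x {y}", x * y) for x, y in pairs]


steps = partial_products(47, 36)
for label, value in steps:
    print(f"{label:>7} = {value:>4}")
total = sum(value for _, value in steps)
print(f"  total = {total:>4}")
```

Here 47 x 36 is split into 40 x 30, 40 x 6, 7 x 30 and 7 x 6, and the four partial products add up to 1692. Returning the labelled steps rather than only the final answer keeps each stage available for a visualization.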
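To give a feel for how the trend description generator listed above might turn structured data into language, here is a minimal template-based sketch. It is not the lab's implementation: the function names, the percentage thresholds and the sample data are all hypothetical and chosen only for illustration.

```python
def pick_adverb(percent_change: float) -> str:
    """Map the size of a change to an adverb (thresholds are assumptions)."""
    size = abs(percent_change)
    if size < 10:
        return "slightly"
    if size < 50:
        return "moderately"
    return "sharply"


def describe_trend(label: str, points: list[tuple[str, float]]) -> str:
    """Describe the overall change between the first and last data points."""
    if len(points) < 2:
        raise ValueError("at least two data points are needed")
    (start_x, start_y), (end_x, end_y) = points[0], points[-1]
    if start_y == 0:
        raise ValueError("the first value must be non-zero")
    if end_y == start_y:
        return f"{label} remained unchanged at {start_y:g} between {start_x} and {end_x}."
    percent_change = (end_y - start_y) / start_y * 100
    verb = "rose" if percent_change > 0 else "fell"
    adverb = pick_adverb(percent_change)
    return f"{label} {verb} {adverb} from {start_y:g} in {start_x} to {end_y:g} in {end_x}."


sales = [("2020", 100), ("2021", 120), ("2022", 180)]
print(describe_trend("Sales", sales))
# Altering the last data point changes the wording of the description.
print(describe_trend("Sales", sales[:2] + [("2022", 104)]))
```

With the sample data, the first call describes a sharp rise and the second a slight one, which mirrors the way learners can alter data points and watch the language change.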

Lab member recruitment: undergraduate and graduate

If you want to develop practical language-related online tools, consider joining this lab. I am keen to recruit students who want to use their coding skills to create language tools that will help others improve their reading and writing skills. English is used as the primary lingua franca. Online communication is via Slack. If you have no interest in coding, this is not the lab for you. I have one expectation for lab members: show grit.

The TNT lab aims to offer members a supportive yet challenging atmosphere. Each lab member brings a different set of skills, behaviours, knowledge and interests. We aim to make the most of these individual differences and match members with the research projects that best suit them. As a B3 student, you are expected to contribute to a variety of projects. This will give you valuable experience that will help you design and lead your own project as a B4 student. That project will be the vehicle for your graduation thesis.

Schedule for 2024

Formal lab meetings are usually held in Semester 1 for B3 students and Semester 2 for B4 students. The planned schedule is as follows:

Bachelor's students (B3 and B4)
  • Quarter 1: practical programming - Python (B3)
  • Quarter 2: practical programming - NLTK and Keras (B3)
  • Quarter 3: graduation thesis (GT) research - GT first draft (Introduction & Method) (B4)
  • Quarter 4: graduation thesis (GT) research - GT submission and presentation (B4)
Master's students (M1 and M2)
  • All year: Weekly project and progress meetings

Recommended courses for lab members

The following courses are recommended but not required:

  • FU08 Automata and languages
  • FU10 Language processing systems
  • FU14 Introduction to software engineering
  • IT11 Information retrieval and natural language processing
  • EL317 Patterns and language
  • EL331 Authorship analysis using Python

Lab members

Current lab members:

  • Senior associate professor: John Blake
  • Lab members from April 2024:
    • Kazuma Tamura (M1)
    • Tsubasa Sato (B4)
    • Chihiro Sato (B3)
    • Kazuto Tomizawa (B3)
    • A.N. Other (B2)

(B2 = sophomore, B3 = junior, B4 = senior, M1 = first-year master's student, M2 = second-year master's student)

KPI table

Lab alumni:

  • Kazuma Tamura. Graduation thesis submitted 2024: Explainable authorship analysis using SHAP and POS.
  • Fumito Takeue. Graduation thesis submitted 2024: Identifying and providing graduated feedback on lexical and grammatical errors.
  • Yusuke Niiyama. Graduation thesis submitted 2022: Development of an app of trend description generation.
  • Izumu Koshihara. Graduation thesis submitted 2022: Authorship attribution application and algorithm.
  • Kento Miura. Graduation thesis submitted 2022: Comparison of document features by passive voice, n-gram and readability using natural language processing.
  • Takumi Kondo. Graduation thesis submitted 2020: Pattern detection and video typology.
  • Hiroki Inoue. Graduation thesis submitted 2018: Verification and improvement of software to support reading English aloud.