HVPT phonetic word pair trainer
The HVPT phonetic word pair trainer is a Web-based tool. Given a student's target language and an optional set of focus phoneme pairs, it plays a series of audio recordings of native speakers pronouncing words that feature those pairs, and asks the student to select the word that was pronounced.

There are two main subsystems in this project:

1. The trainer itself, which presents the words and keeps and manages scores.

2. A setup facility allowing people to suggest new languages, new phoneme pairs, new words, and new "focus groups" (groups of phonemes which particular groups of students have trouble with -- the needs of native Japanese speakers differ from those of native Spanish speakers) and to provide new audio snippets to be integrated into the trainer.

This site's purpose is to present the technical infrastructure for this project; the trainer itself will ultimately reside elsewhere -- although during development, it will reside here for ease of use (and the development version will continue to reside here.)


Let's look at the trainer first, and in doing so we can clarify the database schema we're going to need to make it work.

The example I'll use for this presentation is the English word pair MITT vs. MEET; the two words differ by a single phoneme that many non-anglophones find hard to distinguish.

The centerpiece of the trainer is the presentation page. This page shows a certain number of questions (from 1, the simplest page, to n, which may require scrolling and might therefore be harder to use.) Each question on the page consists of:

  • A button or question mark which the user can mouse over or click to hear the word in question.
  • Two words under that (MITT and MEET), on which the user can click.
  • A score box, either to the side or between the words.

At the bottom of the page, if there is more than one question on the page, is a total score. If the session consists of more than one page of questions, that page total is a subtotal, and a running total is kept across all pages in the session. Once you've left a page, you can't go back to it (that's a sanity limiter for the programmer, yes). [Note: one could do clever tricks with JavaScript to make this front end nicer -- but these tricks wouldn't affect the database schema one whit, so for now, I'm ignoring them.]
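As a rough illustration of the scoring just described, here is a minimal Python sketch. The class names (Question, Session) and the one-point-per-correct-answer rule are assumptions made for the example, not decisions the project has made.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    """One question: the word that is played and the two words shown."""
    played: str                   # word the student hears, e.g. "mitt"
    choices: tuple                # the two clickable words
    answer: Optional[str] = None  # what the student clicked, if anything

    def score(self) -> int:
        # Assumed rule: 1 point for a correct click, 0 otherwise.
        return 1 if self.answer == self.played else 0


@dataclass
class Session:
    """Keeps the running total across pages; finished pages are not revisited."""
    running_total: int = 0
    pages_done: int = 0

    def finish_page(self, questions) -> int:
        # The page total doubles as the subtotal for a multi-page session.
        subtotal = sum(q.score() for q in questions)
        self.running_total += subtotal
        self.pages_done += 1
        return subtotal


session = Session()
page = [
    Question("mitt", ("mitt", "meet"), answer="mitt"),  # correct
    Question("meet", ("mitt", "meet"), answer="mitt"),  # wrong
]
subtotal = session.finish_page(page)
print(subtotal, session.running_total)
```

Because finish_page only ever adds to the running total, the "no going back" rule falls out of the design rather than needing to be enforced separately.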

To select a given page, then, the student must somehow select the following:

  1. A target language (example: EN or EN-UK or even EN-UK:Cockney)
  2. A set of phoneme pairs (the set could consist of a single pair) within that language.
  3. A set of word pairs exemplifying each phoneme pair.
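The three selections above could be sketched as a simple filter followed by a random draw. This is only an illustration: the in-memory WORD_PAIRS list and the function names select_word_pairs and build_page are hypothetical, and every entry other than MITT/MEET is made-up sample data.

```python
import random

# Hypothetical sample data: (language, phoneme pair spec, word pair spec).
WORD_PAIRS = [
    ("EN", "/i/ee/", "/mitt/meet/"),
    ("EN", "/i/ee/", "/ship/sheep/"),
    ("EN", "/r/l/", "/right/light/"),
    ("ES", "/r/rr/", "/pero/perro/"),
]


def select_word_pairs(language, phoneme_pairs):
    """Return the word pair specs for one language and a set of phoneme pairs."""
    return [
        words
        for lang, phonemes, words in WORD_PAIRS
        if lang == language and phonemes in phoneme_pairs
    ]


def build_page(word_pairs, n, rng):
    """Draw n questions; each is (word to play, the words to display)."""
    page = []
    for _ in range(n):
        pair = rng.choice(word_pairs).strip("/").split("/")
        page.append((rng.choice(pair), tuple(pair)))
    return page


pairs = select_word_pairs("EN", {"/i/ee/"})
page = build_page(pairs, 3, random.Random(0))
for played, shown in page:
    print(played, shown)
```

Passing the random number generator in as an argument keeps the draw repeatable in tests while still allowing a fresh generator in production.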

That gives us a list of word pairs, from which we can generate the presentation page as described above rather easily with a random number generator and a little glue. So our database schema is pretty simple:


Language

  • Name of language

(Note that we might well want to add some columns for tracking, history, documentation, and what have you -- this list is just the part that makes it work.)

Phoneme pair

  • Language
  • Phoneme pair specification

A phoneme pair specification is a string like /i/ee/, and it needn't actually be restricted to two phonemes. I'm deciding here not to place that arbitrary restriction on the system, which is why I don't have two columns with one phoneme each.
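A minimal sketch of how such a specification string might be parsed, assuming the members are separated by "/" and the string begins and ends with "/" as in the example. The function names parse_spec and format_spec are hypothetical.

```python
def parse_spec(spec: str) -> list:
    """Split a slash-delimited specification such as "/i/ee/" into its members."""
    if len(spec) < 2 or not (spec.startswith("/") and spec.endswith("/")):
        raise ValueError(f"spec must start and end with '/': {spec!r}")
    members = spec[1:-1].split("/")
    if len(members) < 2 or any(m == "" for m in members):
        raise ValueError(f"spec needs at least two non-empty members: {spec!r}")
    return members


def format_spec(members) -> str:
    """Inverse of parse_spec: join the members back into one string."""
    return "/" + "/".join(members) + "/"


print(parse_spec("/i/ee/"))     # two phonemes
print(parse_spec("/a/e/i/"))    # more than two is allowed
```

Nothing in the parser cares how many members there are beyond the minimum of two, which is the point of keeping the specification in a single column.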

Word pair

  • Language
  • Phoneme pair specification
  • Word pair specification

The same rules apply, e.g. /mitt/meet/. These "words" should strictly be seen as keys into the word table, though, because there's no reason to restrict them to words we can actually display in text. So they should be 7-bit ASCII only, to make programming easy.
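The ASCII-only rule for word keys is easy to check. The sketch below is one possible validator; the function name is hypothetical, and the extra bans on "/" (the delimiter in a word pair specification) and on whitespace are my assumptions rather than stated requirements.

```python
def is_valid_word_key(key: str) -> bool:
    """True if key is non-empty, 7-bit ASCII, and free of '/' and whitespace."""
    return (
        key != ""
        and key.isascii()                          # 7-bit ASCII only
        and "/" not in key                         # assumed: '/' is the delimiter
        and not any(ch.isspace() for ch in key)    # assumed: no whitespace
    )


for candidate in ["mitt", "meet", "café", "a/b", ""]:
    print(repr(candidate), is_valid_word_key(candidate))
```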


Word

  • Language
  • Word specification
  • Display HTML (note below)

If the display HTML is blank, we'll default to simply using the word specification, but we can also point to a graphic here.
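The fallback rule could look something like this in Python. The function name display_html is hypothetical, and escaping the word specification before showing it is an assumption on my part (harmless for ASCII keys, but a sensible default).

```python
import html
from typing import Optional


def display_html(word_spec: str, stored_html: Optional[str]) -> str:
    """Return the stored display HTML, or fall back to the word specification."""
    if stored_html and stored_html.strip():
        return stored_html            # e.g. an <img> tag pointing at a graphic
    return html.escape(word_spec)     # blank or missing: show the key itself


print(display_html("mitt", ""))
print(display_html("mitt", '<img src="mitt.png">'))
```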


Pronunciation

  • Language
  • Speaker
  • Pointer to MP3 or SWF file for audio


Speaker

  • Language variant

This allows us to note that Speaker A is West Coast US while Speaker B is London or even (gulp) Glasgow. At some later date, I guarantee we'll want to be able to restrict pronunciations to regional dialects that students are actually trying to learn.

(Note again -- this Speaker record would also have name, etc., but we don't care about that information for the trainer itself.)
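Here is a minimal sketch of how pronunciations might be restricted by language variant. It assumes variant tags follow the EN, EN-UK, EN-UK:Cockney pattern mentioned earlier, and that a request for a broader tag should match every narrower one. The sample records, file names, and function names are all hypothetical.

```python
import re


def variant_parts(tag):
    """Split a tag such as "EN-UK:Cockney" into ["en", "uk", "cockney"]."""
    return [part.lower() for part in re.split(r"[-:]", tag) if part]


def variant_matches(requested, speaker_variant):
    """True if the speaker's variant is the requested one or a narrower one."""
    want = variant_parts(requested)
    have = variant_parts(speaker_variant)
    return have[:len(want)] == want


# Hypothetical sample records joining a pronunciation to its speaker's variant.
PRONUNCIATIONS = [
    {"word": "meet", "speaker": "A", "variant": "EN-US:WestCoast", "audio": "meet_a.mp3"},
    {"word": "meet", "speaker": "B", "variant": "EN-UK:London", "audio": "meet_b.mp3"},
    {"word": "meet", "speaker": "C", "variant": "EN-UK:Glasgow", "audio": "meet_c.mp3"},
]


def pronunciations_for(word, requested):
    """Return the recordings of a word by speakers matching the requested variant."""
    return [
        p for p in PRONUNCIATIONS
        if p["word"] == word and variant_matches(requested, p["variant"])
    ]


print([p["speaker"] for p in pronunciations_for("meet", "EN-UK")])
```

Comparing tag components rather than raw string prefixes avoids accidental matches between tags that merely share leading characters.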

Phoneme pair set

  • Name
  • List of phoneme pairs

This would allow a Japanese speaker to select a set of "trouble phonemes" which include "r" and "l", while a German speaker can concentrate on those troublesome vowels and "th" versus "s".

Later in the course of the project, I'll also introduce:


User

  • User ID
  • Identifying or contact information



Session

  • Session ID
  • User
  • Phoneme pair set or explicit list
  • Scoring information (both total score and phoneme pair scores.)
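To show what "both total score and phoneme pair scores" might mean in practice, here is a small sketch that tallies answers per phoneme pair. The class name SessionScore and the (correct, asked) representation are assumptions for the example.

```python
from collections import defaultdict


class SessionScore:
    """Tallies correct answers and questions asked, per phoneme pair."""

    def __init__(self):
        self.correct = defaultdict(int)
        self.asked = defaultdict(int)

    def record(self, phoneme_pair, is_correct):
        self.asked[phoneme_pair] += 1
        if is_correct:
            self.correct[phoneme_pair] += 1

    def total(self):
        """Overall (correct, asked) across every phoneme pair."""
        return sum(self.correct.values()), sum(self.asked.values())

    def by_pair(self):
        """Per-pair (correct, asked), to show where the student struggles."""
        return {pair: (self.correct[pair], n) for pair, n in self.asked.items()}


score = SessionScore()
score.record("/i/ee/", True)
score.record("/i/ee/", False)
score.record("/r/l/", True)
print(score.total())
print(score.by_pair())
```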

That's the database schema for now. I might need to move this to another page; it's longer than I anticipated.
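As a first approximation, the core tables could be sketched with Python's built-in sqlite3 module. This is not the project's actual SQL: the table and column names, the choice of keys, and the use of SQLite are all assumptions. In particular, the word column on the pronunciation table is not in the list above; I've added it because a recording has to point at a word somehow.

```python
import sqlite3

SCHEMA = """
CREATE TABLE language (
    name TEXT PRIMARY KEY
);
CREATE TABLE phoneme_pair (
    language TEXT NOT NULL REFERENCES language(name),
    spec     TEXT NOT NULL,                -- e.g. /i/ee/
    PRIMARY KEY (language, spec)
);
CREATE TABLE word_pair (
    language     TEXT NOT NULL,
    phoneme_spec TEXT NOT NULL,
    word_spec    TEXT NOT NULL,            -- e.g. /mitt/meet/
    PRIMARY KEY (language, phoneme_spec, word_spec),
    FOREIGN KEY (language, phoneme_spec)
        REFERENCES phoneme_pair(language, spec)
);
CREATE TABLE word (
    language     TEXT NOT NULL REFERENCES language(name),
    spec         TEXT NOT NULL,            -- ASCII key, e.g. mitt
    display_html TEXT NOT NULL DEFAULT '', -- blank means "show the key"
    PRIMARY KEY (language, spec)
);
CREATE TABLE speaker (
    id      INTEGER PRIMARY KEY,
    variant TEXT NOT NULL                  -- e.g. EN-UK
);
CREATE TABLE pronunciation (
    language  TEXT NOT NULL,
    word      TEXT NOT NULL,               -- assumed column, see note above
    speaker   INTEGER NOT NULL REFERENCES speaker(id),
    audio_url TEXT NOT NULL,               -- pointer to the MP3 or SWF file
    FOREIGN KEY (language, word) REFERENCES word(language, spec)
);
CREATE TABLE phoneme_pair_set_member (
    set_name     TEXT NOT NULL,            -- e.g. a focus group name
    language     TEXT NOT NULL,
    phoneme_spec TEXT NOT NULL,
    PRIMARY KEY (set_name, language, phoneme_spec),
    FOREIGN KEY (language, phoneme_spec)
        REFERENCES phoneme_pair(language, spec)
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite leaves these off by default
conn.executescript(SCHEMA)

conn.execute("INSERT INTO language VALUES ('EN')")
conn.execute("INSERT INTO phoneme_pair VALUES ('EN', '/i/ee/')")
conn.execute("INSERT INTO word_pair VALUES ('EN', '/i/ee/', '/mitt/meet/')")

rows = conn.execute(
    "SELECT word_spec FROM word_pair WHERE language = ? AND phoneme_spec = ?",
    ("EN", "/i/ee/"),
).fetchall()
print(rows)
```

The "list of phoneme pairs" in a phoneme pair set is modeled here as one row per member, which keeps the list queryable without parsing a string.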


The main point of the setup facility is to allow people to suggest all of the records above, and also to upload word audio. Since I want users to be able to suggest things quickly and easily, but I still need to track changes so that vandalism can be reversed, these are the tables I'll want to include:


Change

  • What was added or edited
  • IP address of the originator (if anonymous)
  • User ID (if not anonymous)

Upload file

  • What the file purports to be (e.g. a display graphic or an audio file)
  • File location
  • IP and/or user ID

And that implies:

Setup user

  • User ID
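A change-tracking record along these lines might be sketched as follows. The class name ChangeRecord, the rule that every change carries at least one of an IP address or a user ID, and the helper changes_from_ip are assumptions for illustration; the IP addresses are documentation-range examples.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ChangeRecord:
    what: str                         # what was added or edited
    ip_address: Optional[str] = None  # set for anonymous changes
    user_id: Optional[int] = None     # set for logged-in setup users

    def __post_init__(self):
        if self.ip_address is None and self.user_id is None:
            raise ValueError("a change needs an IP address or a user ID")
        if self.ip_address is not None:
            # Raises ValueError if the address is malformed.
            ipaddress.ip_address(self.ip_address)

    @property
    def is_anonymous(self) -> bool:
        return self.user_id is None


def changes_from_ip(log, ip_address):
    """Everything one address touched: the list to review when undoing vandalism."""
    return [change for change in log if change.ip_address == ip_address]


log = [
    ChangeRecord("added word pair /mitt/meet/", user_id=1),
    ChangeRecord("edited phoneme pair /i/ee/", ip_address="203.0.113.7"),
    ChangeRecord("added word pair /ship/sheep/", ip_address="203.0.113.7"),
]
print(len(changes_from_ip(log, "203.0.113.7")))
```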

And that concludes today's lesson. Tomorrow, with luck, I'll be able to code this SQL here and perhaps even start prototyping!

Creative Commons License
This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.