Natural Sciences and Engineering Research Council of Canada

Past Winner
2006 NSERC André Hamer Postgraduate Prize

Constance Adsett

Master's Level

Dalhousie University

Computer systems that convert text to speech have come a long way, but they remain prone to stumbling over unfamiliar words. Computer scientist Constance Adsett wants to minimize that problem by finding the best tools for automatically breaking words into their proper syllables.

Accurate syllabification can significantly enhance the ability of a text-to-speech system to pronounce words correctly. Adsett's master's research, which she will do with the help of an NSERC André Hamer Prize, will test and compare the performance of existing algorithms for automatic syllabification. She will work in conjunction with thesis supervisor Yannick Marchand at the National Research Council's Institute for Biodiagnostics Atlantic branch.

Syllabification tools fall into two categories: "rule-based" and "data-driven." As the name suggests, rule-based algorithms seek to define the rules by which a given language divides and pronounces its words. In English, for example, such rules might place a syllable break between double letters, or require that each syllable begin with a consonant (at least most of the time).

The trouble with rule-based approaches is that they depend heavily on the involvement of linguistic experts. The list of rules has to be accurate and complete, and the rules may have to be applied in a specific order. Also, as Adsett points out, "Rules don't always allow for exceptions."
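To make the rule-based idea concrete, here is a minimal Python sketch built on the two English rules mentioned above. It is an illustration only, not one of the algorithms Adsett will test: the function names, the treatment of "y" as a vowel, and the exact wording of each rule are assumptions made for the example.

```python
# Toy rule-based syllabifier (illustrative assumptions throughout).
VOWELS = set("aeiouy")  # assumption: "y" is treated as a vowel


def is_vowel(ch):
    return ch in VOWELS


def rule_double_consonant(word, i):
    """Break between doubled consonant letters flanked by vowels (let-ter)."""
    return (
        i >= 2
        and i + 1 < len(word)
        and word[i - 1] == word[i]
        and not is_vowel(word[i])
        and is_vowel(word[i - 2])
        and is_vowel(word[i + 1])
    )


def rule_single_consonant_onset(word, i):
    """A lone consonant between vowels starts the next syllable (pa-per)."""
    return (
        i + 1 < len(word)
        and not is_vowel(word[i])
        and is_vowel(word[i - 1])
        and is_vowel(word[i + 1])
    )


RULES = [rule_double_consonant, rule_single_consonant_onset]


def syllabify_by_rules(word):
    """Insert a hyphen before every position where some rule fires."""
    word = word.lower()
    parts, start = [], 0
    for i in range(1, len(word)):
        if any(rule(word, i) for rule in RULES):
            parts.append(word[start:i])
            start = i
    parts.append(word[start:])
    return "-".join(parts)
```

With these two rules, "letter" comes out as "let-ter" and "paper" as "pa-per", but "lemon" comes out as "le-mon", whereas dictionaries typically divide it as "lem-on". Handling that word would take yet another rule or an explicit exception list, which is the weakness Adsett describes.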

Adsett prefers data-driven approaches, which start with a database of words whose syllabification is known. When the system encounters a new word, it compares that word's structure with the known words to determine the most likely syllabification. With each new word, the system "learns" more, which should increase its accuracy for the next unfamiliar word. The same approach works for any language.

"The exceptions are easier to handle in that case because there are different types of words that might match one another better," Adsett observes.
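The data-driven idea can be sketched in a similar way. The Python example below is a deliberately simplified stand-in, not any of the specific algorithms in Adsett's comparison: it records the letters on either side of every gap in the known words, then lets those records vote on where to break a new word. The class name, the window width and the tiny word list are all hypothetical.

```python
from collections import Counter, defaultdict


def contexts(word, i, max_width):
    """Letter windows around the gap before index i, widest first."""
    for w in range(max_width, 0, -1):
        padded = "#" * w + word + "#" * w  # "#" marks the word edges
        yield padded[i:i + 2 * w]


class LookupSyllabifier:  # hypothetical name
    def __init__(self, max_width=2):
        self.max_width = max_width
        # window -> Counter({True: breaks seen, False: non-breaks seen})
        self.votes = defaultdict(Counter)

    def learn(self, syllabified):
        """Store every gap of a known word, e.g. learn("let-ter")."""
        word = syllabified.replace("-", "")
        breaks, pos = set(), 0
        for syllable in syllabified.split("-")[:-1]:
            pos += len(syllable)
            breaks.add(pos)
        for i in range(1, len(word)):
            for ctx in contexts(word, i, self.max_width):
                self.votes[ctx][i in breaks] += 1

    def syllabify(self, word):
        """Break wherever the widest matching window mostly saw a break."""
        parts, start = [], 0
        for i in range(1, len(word)):
            for ctx in contexts(word, i, self.max_width):
                counts = self.votes.get(ctx)
                if counts:  # use the widest window seen in training
                    if counts[True] > counts[False]:
                        parts.append(word[start:i])
                        start = i
                    break
        parts.append(word[start:])
        return "-".join(parts)


model = LookupSyllabifier()
for known in ["let-ter", "bet-ter", "but-ter", "pa-per"]:
    model.learn(known)
```

Trained on those four words, the model divides the unseen word "litter" as "lit-ter" by matching it against the stored double-t words. It initially leaves "lemon" undivided, but after `model.learn("lem-on")` it returns "lem-on": the exception is simply one more stored example, with no new rule to write.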

She says her research will focus on nine languages, chosen mainly because databases already existed for them. The list includes major languages such as English and French, but also lesser-known languages such as Frisian (from a northern province of the Netherlands) and Basque (from the northern part of Spain). The latter databases exist thanks to the efforts of people who are trying to preserve traditional dialects.

This area of research appealed to Adsett's practical nature. Text-to-speech technology improves access to information for everyone, but is particularly useful to people with certain disabilities. Blind people, for example, can have the content of electronic documents read to them, and those unable to speak can enter words that the computer will speak for them. Knowledge used to refine text-to-speech systems can also help researchers understand how people process language, which could in turn help treat speech problems.

"Ideally, all research has a goal to better the world, but sometimes it is easier to see it clearly," says Adsett.