Normalization of Non-Standard Words in Croatian Texts

03/27/2015
by Slobodan Beliga, et al.

This paper presents text normalization, an integral part of any text-to-speech synthesis system. Text normalization is a set of methods for writing non-standard words, such as numbers, dates, times, abbreviations, acronyms, and the most common symbols, in their fully expanded form. A complete taxonomy for classifying non-standard words in Croatian is proposed, together with rule-based normalization methods combined with a lookup dictionary. The achieved token rate for normalization of Croatian texts is 95%.
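To make the idea of rule-based normalization combined with a lookup dictionary more concrete, here is a minimal Python sketch. It is not the authors' implementation: the category names, the dictionary entries, the whitespace tokenization, and the function names (`classify`, `expand_number`, `normalize`) are all assumptions made for illustration. It covers only one- and two-digit numbers and ignores Croatian case and gender agreement, which a real system would have to handle.

```python
import re

# Hypothetical lookup dictionary: a few common Croatian abbreviations and symbols.
ABBREVIATIONS = {
    "npr.": "na primjer",
    "itd.": "i tako dalje",
    "tj.": "to jest",
    "dr.": "doktor",
}
SYMBOLS = {"%": "posto", "&": "i"}

# Cardinal numbers in their base (nominative, masculine) form only.
UNITS = ["nula", "jedan", "dva", "tri", "četiri",
         "pet", "šest", "sedam", "osam", "devet"]
TEENS = ["deset", "jedanaest", "dvanaest", "trinaest", "četrnaest",
         "petnaest", "šesnaest", "sedamnaest", "osamnaest", "devetnaest"]
TENS = ["", "", "dvadeset", "trideset", "četrdeset",
        "pedeset", "šezdeset", "sedamdeset", "osamdeset", "devedeset"]


def expand_number(n: int) -> str:
    """Rule-based expansion of an integer in the range 0..99."""
    if not 0 <= n <= 99:
        raise ValueError("this sketch only handles 0..99")
    if n < 10:
        return UNITS[n]
    if n < 20:
        return TEENS[n - 10]
    tens, unit = divmod(n, 10)
    return TENS[tens] if unit == 0 else f"{TENS[tens]} {UNITS[unit]}"


def classify(token: str) -> str:
    """Assign a token to a (hypothetical) non-standard-word category."""
    if token.lower() in ABBREVIATIONS:
        return "ABBR"
    if token in SYMBOLS:
        return "SYMBOL"
    if re.fullmatch(r"\d{1,2}", token):
        return "NUMBER"
    return "WORD"  # standard word, or a category this sketch does not cover


def normalize(text: str) -> str:
    """Expand every recognized non-standard word; leave other tokens as-is."""
    expanded = []
    for token in text.split():  # simplifying assumption: whitespace tokenization
        kind = classify(token)
        if kind == "ABBR":
            expanded.append(ABBREVIATIONS[token.lower()])
        elif kind == "SYMBOL":
            expanded.append(SYMBOLS[token])
        elif kind == "NUMBER":
            expanded.append(expand_number(int(token)))
        else:
            expanded.append(token)
    return " ".join(expanded)


print(normalize("npr. 25 % ljudi"))
```

The split into a classification step and a per-category expansion step mirrors the taxonomy-plus-rules structure described in the abstract; dates, times, and acronyms would each need their own category and expansion rule.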
