binrec | 5 years ago
Existing attempts to solve this problem are hackish and difficult to customize: they typically treat each glyph as a set of features and handle diacritics and digraphs by naive composition and awkward special-casing. They also aren't written with an eye to customization in either alphabet or featural model: they typically map an ad-hoc extension of IPA to an ad-hoc featural model.
I think a natural improvement would be a specification language in which each individual glyph (base character or diacritic) is a function from a feature set to a feature set, with Haskell-style pattern matching to allow graceful handling of digraphs and context-sensitive diacritics. In practice, syntactic sugar for digraphs would be required for usability. Ideally it would also be possible to map feature sets to feature sets, so that a human-readable intermediate form (e.g. "unvoiced dental plosive") can be preserved and later mapped to the more customary binary features.
In addition to the utility for phonological databases and the like, this would also enable more rigorous testing of crosslinguistic feature sets: every feature set is implicitly a set of proposed linguistic universals and existence claims. If two segments have the same featuralization, they should never contrast in a given language; if they do, the featuralization is unsound. And if a featuralization proposes the existence of many contrasts that aren't attested anywhere, it could probably stand to be optimized.
But most of my interest in this comes from my work on a phonological database. The database needs some method of handling featuralization to facilitate feature-based search, and I just haven't seen a good way to do that yet.