stephensonsco | 7 years ago
There are ~4 types of audio:
Phone call - close microphone - conversational - low-bandwidth audio - two-way conversation - more industry-specific terminology
Meetings - 2-5 people - conversational - far-away mic - better-bandwidth audio - more industry-specific terminology
Broadcast - usually good diction - close mic - good-bandwidth audio - more general terminology
Command&Control (saying to your phone: "go to <this address>") - close mic or array of mics far away - short audio chunks, 2-10 seconds - spoken in a way that makes it easier to recognize (learned behavior) - usually includes a lot of widely known named entities
In that full aggregated lineup I bet we'd be in the 22-24% WER pack, mostly because we focus only on phone calls and meetings. We don't try to improve command&control/broadcast/podcast-type audio yet. Broadcast because it's perceived as lower value, so customers tend not to pay for good recognition on it. (We do train models to make them better for specific customers/verticals, usually reducing errors by 20-40%, but for now the buyer has to have a budget for it; there are ways to make it cheaper in the long term.) Command and control because you need a fleet of devices out in the field collecting data and driving use cases, and we don't have customers there yet.
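For anyone unfamiliar with the metric: WER (word error rate) is usually computed as the word-level edit distance between the reference transcript and the recognizer output, divided by the number of reference words. Here is a minimal sketch of that standard definition. The function name and the whitespace tokenization are my own assumptions for illustration; real benchmarks normally also normalize casing, punctuation, and numbers before scoring, and this is not necessarily how the numbers above were produced.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count."""
    ref = reference.split()  # assumption: plain whitespace tokenization
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words seen so far
    # and the first j hypothesis words (classic Levenshtein, one row at a time).
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution_cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,                      # deletion (reference word dropped)
                curr[j - 1] + 1,                  # insertion (extra hypothesis word)
                prev[j - 1] + substitution_cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of four reference words.
    print(word_error_rate("go to this address", "go to address"))
```

Note that WER can exceed 100% when the hypothesis contains many inserted words, since the denominator is the reference length only.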
dumbfoundded|7 years ago
In terms of gathering data, I'm curious how you plan to get the 15K audio hours it takes to train each of these models. The more you want to segment it (e.g. by acoustic environment or gender), the more data you need. Do you have a cheap way of generating high-quality data?
stephensonsco|7 years ago
But we do use our capabilities to better tackle gathering and labeling wild data. For instance: is every labeled minute just as valuable as any other? Definitely not. So if you can find and select only the data you want to label, rather than indiscriminately labeling a bunch, you can increase your overall efficacy.
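To make the "select what you label" idea concrete, here is a rough sketch of one common approach: uncertainty sampling, where you spend a fixed labeling budget on the clips an existing model is least confident about. This is only an illustration of the general idea, not a description of the actual selection method used here. The `Clip` fields, the confidence score, and `select_for_labeling` are all hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Clip:
    clip_id: str
    minutes: float
    confidence: float  # hypothetical: mean confidence of an existing model, 0..1


def select_for_labeling(clips: List[Clip], budget_minutes: float) -> List[Clip]:
    """Greedily pick the least-confident clips that fit in the labeling budget."""
    chosen: List[Clip] = []
    used = 0.0
    # Lowest confidence first: the assumption is that these clips teach the model the most.
    for clip in sorted(clips, key=lambda c: c.confidence):
        if used + clip.minutes <= budget_minutes:
            chosen.append(clip)
            used += clip.minutes
    return chosen


if __name__ == "__main__":
    pool = [
        Clip("a", 5.0, 0.95),
        Clip("b", 4.0, 0.40),
        Clip("c", 3.0, 0.60),
        Clip("d", 6.0, 0.55),
    ]
    for clip in select_for_labeling(pool, budget_minutes=8.0):
        print(clip.clip_id, clip.minutes, clip.confidence)
```

In practice, pure lowest-confidence selection tends to over-pick noisy or unintelligible audio, so it is usually combined with some filtering or diversity criterion.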
stephensonsco|7 years ago
We excel in phone call and meeting settings, i.e. the typical sales/office/support environment.