Not an expert, but my suspicion is that a camera tracking lip movement could add an extra streaming data point, making transcription accuracy much higher even at low volumes. Again, just a hunch, and I guess the computational power and battery needs might still be insurmountable.