Joint Speech Recognition and Speaker Diarization via Sequence Transduction

Google AI Research · Aug. 16, 2019, 6:19 p.m.
Posted by Laurent El Shafey, Software Engineer, and Izhak Shafran, Research Scientist, Google Health

Being able to recognize “who said what,” or speaker diarization, is a critical step in understanding audio of human dialog through automated means. For instance, in a medical conversation between doctors and patients, “Yes” uttered by a patient in response to “Have you been taking your heart medications regularly?” has a substantially different implication than a rhetorical “Yes?” from a physician...