Accuracy of Technologically-Aided Transcription Methods
Published: Monday, March 15, 2021, 3:00 p.m.
Following our review of technologically-aided transcription methods and our preliminary analysis of speed and ease of use, we analyzed the accuracy of two of those methods: auto-generated transcription (Microsoft Stream) and audio from speakers (SpeechNotes). This is part of the National Endowment for the Humanities grant-funded project "Bilingual Voices in the U.S./Mexico Borderlands: Technology-Enhanced Transcription and Community Engaged Scholarship."
With the auto-generated transcription method (Stream), the recording is loaded onto a website, which creates a transcript in one language. The user may stay on the platform while this happens, or may log off and return later to download and/or edit the transcript on the website. The audio from speakers method (SpeechNotes) requires the user to play the audio aloud through the speakers. In our experience, the user must pause the audio at times to allow the transcription to catch up. (Learn more about these transcription methods here.)
In analyzing accuracy, we looked at three main issues: missing words, incorrect words, and accents. We also considered punctuation, speaker codes, and capitalization, but we excluded those from our accuracy measures because they are more specific to our WEBVTT format.
The chart above shows the number of words missing from the transcripts created by the two programs. Auto-generated (Stream) missed far fewer words than SpeechNotes, which is shown in blue.
Incorrect words, meaning words that were captured but transcribed wrongly, were more of a problem for the auto-generated method (Stream). However, these two categories influence one another: a method that misses most of the words has few opportunities to transcribe a word incorrectly. For instance, in Interview 4, SpeechNotes missed 285 words and captured only 15, of which 5 were incorrect.
The transcripts had very few mistakes with accents, amounting to 1-7 per interview.
Counting each missed word, incorrect word, and accent mistake against the total number of words in that interview's correct transcript, we derived an accuracy rate for each method per interview. As we can see here, the auto-generated method (Stream) outperformed SpeechNotes, where the audio was captured through the speakers. SpeechNotes transcripts were only 5%-56% accurate, while auto-generated (Stream) transcripts were 44%-86% accurate.
Averaged across interviews, the accuracy rate was 26% for audio from speakers (SpeechNotes) and 65% for auto-generated (Stream).
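The accuracy calculation described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: we assume the rate is one minus the share of errors (missed words, incorrect words, and accent mistakes) out of the total words in the correct transcript, and that the overall figure is a simple mean of the per-interview rates. The function name `accuracy_rate` and all the counts below are hypothetical and are not the project's data.

```python
from statistics import mean


def accuracy_rate(total_words, missed, incorrect, accent_errors):
    """Share of the correct transcript's words transcribed without error.

    Assumed formula: 1 - (missed + incorrect + accent_errors) / total_words.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    errors = missed + incorrect + accent_errors
    # Clamp at zero in case the error counts exceed the reference word count.
    return max(0.0, 1 - errors / total_words)


# Hypothetical counts for illustration only; these are not the study's data.
interviews = [
    {"total_words": 400, "missed": 60, "incorrect": 36, "accent_errors": 4},
    {"total_words": 250, "missed": 90, "incorrect": 30, "accent_errors": 5},
]

# One accuracy rate per interview, then a simple (unweighted) mean.
rates = [accuracy_rate(**counts) for counts in interviews]
average = mean(rates)

for number, rate in enumerate(rates, start=1):
    print(f"Interview {number}: {rate:.1%}")
print(f"Average: {average:.1%}")
```

An unweighted mean treats every interview equally regardless of length. Weighting by total words would be a different design choice and could give a different overall figure.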
As a next step, we are piloting two approaches, revising auto-generated transcripts in Stream and transcribing manually with ExpressScribe, in two experiential learning corpus development courses at the University of Arizona and the University of Texas Rio Grande Valley. We will then compare the average speed and ease of use that these students report for each method.