Speed and Ease of Use for Technologically Aided Transcription Methods
Posted: Friday, September 10, 2021, 02:00 PM
During Spring 2021, we tested two different transcription methods (Microsoft Stream and ExpressScribe) with students in internship-style corpus development classes at UTRGV and UA. Here we report findings on both the speed and ease of use of these methods. This work was funded in part by the National Endowment for the Humanities.
Based on preliminary findings on speed and ease of use from a pilot study with research assistants, along with evaluations of accuracy, SpeechNotes was no longer considered a viable option.
So, during Spring 2021, we tested these methods with two internship-style corpus development classes, one at the University of Arizona led by Dr. Ana Carvalho and the other at the University of Texas Rio Grande Valley under my direction (Dr. Katherine Christoffersen). Students in these classes were trained in sociolinguistic methods and added to the project as research assistants after completing IRB training. Students conducted sociolinguistic interviews and then transcribed them using the two transcription methods, first transcribing with ExpressScribe (10 minutes each week for three weeks) and then editing the auto-generated Microsoft Stream transcript (10 minutes each week for three weeks).
Research assistants created the initial transcripts in Stream and edited those transcripts using the Step 1 and Step 2 scripts in RStudio developed by Ms. Jessica Draper, as described in this blog post.
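(Ms. Draper's scripts are specific to our project files, so the snippet below is only a rough, hypothetical sketch of what a "Step 1"-style cleanup might look like in R. It assumes the auto-generated captions have been saved as a plain .txt file in which blank lines and timestamp lines are interleaved with the caption text; the actual script and file names will differ.)

```r
# Hypothetical sketch of a "Step 1"-style cleanup (not the project's actual script):
# read a plain-text export of the auto-generated captions, drop blank lines and
# timestamp lines, and write out the remaining caption text.

input_file  <- "interview01_stream_raw.txt"   # assumed file name
output_file <- "interview01_step1.txt"        # assumed file name

lines <- readLines(input_file, encoding = "UTF-8")

is_blank     <- grepl("^\\s*$", lines)                 # empty or whitespace-only lines
is_timestamp <- grepl("^\\d{2}:\\d{2}:\\d{2}", lines)  # e.g. "00:01:23.450 --> 00:01:27.000"

writeLines(lines[!is_blank & !is_timestamp], output_file)
```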
Overall findings show that editing the auto-generated Microsoft Stream transcript was substantially faster: 96.47 minutes of revision per 10 minutes of audio for Stream versus 148.29 minutes of transcription per 10 minutes of audio for ExpressScribe, roughly 35% less time. (See graph below.)
For ease of use, the results were a bit more nuanced. The students recognized distinct advantages and disadvantages in each method. For instance, students rated Stream (3.0) as faster than ExpressScribe (2.9), although this rating difference was much smaller than the measured difference in speed. (See graph above.) They rated Stream (3.4) as more intuitive than ExpressScribe (3.2). Somewhat surprisingly, student ratings also showed ExpressScribe as slightly more accurate (2.9) than Stream (2.8), and Stream as better for bilingual data (2.5) than ExpressScribe (2.3). The latter may be due in part to the fact that many students mentioned appreciating that Stream's auto-generated transcripts included some accent marks, and students not in Spanish language courses (such as the UTRGV students) may have found this particularly helpful during the transcription process.
Overall, ExpressScribe was rated higher for both Ease of Use (3.5) and Overall Experience (3.3), but only slightly so. (See graph below.)
Yet, when asked which method they preferred, 25 students (75.8%) preferred Stream versus 8 students (24.2%) who preferred ExpressScribe. Many students suggested combining the methods and using ExpressScribe to revise the Stream transcript, which was in fact suggested as an option.
These findings support the conclusion that an effective workflow for community-based and student participation in technologically aided transcription could be:
- Creation of auto-generated transcripts in Microsoft Stream by an RA or professor
- Step 1 revision of the .txt transcript file using R (to remove extra lines)
- Insertion of speaker codes and revision of the transcript by students
- Step 2 revision of the .txt transcript file using R (to organize the transcript by speaker; see the sketch after this list)
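As with Step 1, the snippet below is only a hypothetical sketch of what a "Step 2"-style reorganization might look like in R, not the project's actual script. It assumes students have inserted a speaker code such as INT: or PAR1: at the start of each new turn, with continuation lines left uncoded.

```r
# Hypothetical sketch of a "Step 2"-style reorganization (not the project's actual script):
# join each uncoded continuation line onto the current speaker's turn so that the
# output has one speaker turn per line.

input_file  <- "interview01_step1_coded.txt"  # assumed file name
output_file <- "interview01_step2.txt"        # assumed file name

lines <- readLines(input_file, encoding = "UTF-8")
lines <- trimws(lines)
lines <- lines[lines != ""]                   # drop any remaining blank lines

turns <- character(0)
for (ln in lines) {
  if (grepl("^[A-Z0-9]+:", ln) || length(turns) == 0) {
    turns <- c(turns, ln)                     # a speaker code starts a new turn
  } else {
    # no speaker code: append the line to the current (most recent) turn
    turns[length(turns)] <- paste(turns[length(turns)], ln)
  }
}

writeLines(turns, output_file)
```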
However, these transcripts still require the additional time-intensive steps of anonymizing the transcripts and the audio files (the latter using Audacity), as well as a final check for formatting, spelling, grammar, and accents.
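For the transcript side of that anonymization, the kind of substitution involved can be illustrated with a small R sketch like the one below. The name-to-pseudonym mappings are invented examples and the actual project procedure may differ; the audio files themselves are edited by hand in Audacity.

```r
# Hypothetical sketch of transcript anonymization (not the project's actual procedure):
# replace real names and identifying place names with pseudonyms or placeholders.

pseudonyms <- c("Maria"    = "Carmen",    # invented example mapping
                "Edinburg" = "[CITY]")    # invented example mapping

transcript <- readLines("interview01_step2.txt", encoding = "UTF-8")  # assumed file name

for (real_name in names(pseudonyms)) {
  # \b keeps the match to whole words so e.g. "Mariana" is not altered
  transcript <- gsub(paste0("\\b", real_name, "\\b"),
                     pseudonyms[[real_name]],
                     transcript, perl = TRUE)
}

writeLines(transcript, "interview01_anon.txt")  # assumed file name
```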
Thus, the team is now preparing to seek additional funding to complete this time-intensive revision process and expand the corpus.