The University of Texas
Rio Grande Valley
Corpus Bilingüe del Valle CoBiVa
Program of Department of Writing and Language Studies

Accuracy of Technologically-Aided Transcription Methods

Following our review of technologically-aided transcription methods and our preliminary analysis of speed and ease of use, we analyzed the accuracy of two of those methods: auto-generated transcription (Microsoft Stream) and audio from speakers (SpeechNotes). This is part of the National Endowment for the Humanities grant-funded project "Bilingual Voices in the U.S./Mexico Borderlands: Technology-Enhanced Transcription and Community Engaged Scholarship."

With the auto-generated transcription method (Stream), the recording is uploaded to a website, which creates a transcript in one language. The user may stay present, or may log off the platform and return later to download and/or edit the transcript on the website. The audio from speakers method (SpeechNotes) requires that the user play the audio through the speakers. In our experience, the user must pause the audio at times to allow the transcription to catch up. (Learn more about these transcription methods.)

In analyzing accuracy, we looked at three main issues: missing words, incorrect words, and accents. We also considered punctuation, speaker codes, and capitalization, but we excluded those from our accuracy measures, since they are more specific to our WEBVTT format.

Bar chart showing missing words in four interviews for two transcription methods, 'Audio from Speakers (SpeechNotes)' and 'Auto-Generated (Stream)'. Interview 1 has 98 missing words from audio from speakers and 15 from auto-generated. Interview 2 has 223 from audio from speakers and 30 from auto-generated. Interview 3 has 194 from audio from speakers and 112 from auto-generated. Interview 4 has 285 from audio from speakers and 110 from auto-generated. Audio from speakers (SpeechNotes) consistently has more missing words than auto-generated (Stream). Full explanation provided below the image.

The chart above shows the number of words missing from the transcripts created by each program. Auto-generated (Stream) does much better than SpeechNotes (shown in blue), with fewer missing words in all four interviews.

Bar chart showing incorrect words in four interviews for two transcription methods, 'Audio from Speakers (SpeechNotes)' and 'Auto-Generated (Stream)'. Interview 1 has 30 incorrect words from audio from speakers and 23 from auto-generated. Interview 2 has 13 from audio from speakers and 44 from auto-generated. Interview 3 has 26 from audio from speakers and 18 from auto-generated. Interview 4 has 5 from audio from speakers and 57 from auto-generated. Audio from speakers (SpeechNotes) has more incorrect words in Interviews 1 and 3, but fewer in Interviews 2 and 4. Full explanation provided below the image.

Incorrect words, meaning words that were captured but transcribed incorrectly, were more of a problem for the auto-generated method (Stream). However, these two categories influence one another: a method that misses most of the words has few captured words left to get wrong. For instance, in Interview 4, SpeechNotes missed 285 words and captured only 15, of which 5 were incorrect.

The transcripts had very few mistakes with accents, amounting to 1-7 per interview.
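The three error categories can also be counted programmatically. The sketch below is one possible approach, not the procedure used in this project: it assumes that "accent" mistakes are differences in written accent marks, and it aligns a corrected reference transcript with a tool's output using Python's standard `difflib`. The function names and the sample sentences are hypothetical. Case and punctuation are ignored, in line with the exclusions described above.

```python
import difflib
import unicodedata

# Punctuation stripped from word edges (punctuation is not scored).
PUNCTUATION = ".,;:!?¿¡\"'()"


def tokenize(text: str) -> list:
    """Lowercase the text and split it into words without edge punctuation."""
    words = [w.strip(PUNCTUATION) for w in text.lower().split()]
    return [w for w in words if w]


def strip_accents(word: str) -> str:
    """Remove accent marks so that 'está' and 'esta' compare equal."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def count_errors(reference: str, hypothesis: str) -> dict:
    """Count missing, incorrect, and accent errors relative to the reference.

    Extra words that appear only in the hypothesis are not counted,
    since the categories above do not include them.
    """
    ref, hyp = tokenize(reference), tokenize(hypothesis)
    counts = {"missing": 0, "incorrect": 0, "accent": 0}
    matcher = difflib.SequenceMatcher(a=ref, b=hyp, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "delete":
            # Reference words with no counterpart in the output.
            counts["missing"] += i2 - i1
        elif tag == "replace":
            ref_span, hyp_span = ref[i1:i2], hyp[j1:j2]
            for ref_word, hyp_word in zip(ref_span, hyp_span):
                if strip_accents(ref_word) == strip_accents(hyp_word):
                    counts["accent"] += 1
                else:
                    counts["incorrect"] += 1
            # Leftover reference words in an uneven span count as missing.
            counts["missing"] += max(0, len(ref_span) - len(hyp_span))
    return counts


# Hypothetical sentences, invented for illustration only.
example = count_errors(
    "Ella está en la casa de mi abuela.",
    "ella esta en casa de mi abuelo",
)
print(example)
```

In this invented example, "la" is missing, "abuelo" is an incorrect word, and "esta" differs from "está" only by its accent mark, so each category is counted once.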

Bar chart comparing transcription accuracy rates for audio from speakers (SpeechNotes) and auto-generated (Stream) across four interviews. For Interview 1, SpeechNotes achieved 56.35% accuracy, while Stream reached 85.99%. For Interview 2, SpeechNotes was at 14.80%, compared to Stream’s 72.56%. For Interview 3, SpeechNotes recorded 26.07% accuracy, while Stream reached 55.85%. Lastly, for Interview 4, SpeechNotes had 5% accuracy, while Stream achieved 44.33%. Full explanation provided below the image

To derive an accuracy rate for each method in each interview, we counted every missed word, incorrect word, and accent mistake against the total number of words in that interview’s correct transcript. The auto-generated method (Stream) outperforms SpeechNotes, where the audio was captured through the speakers: SpeechNotes transcripts were only 5%-56% accurate, while auto-generated (Stream) transcripts were 44%-86% accurate.
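The accuracy calculation described above can be sketched as follows. This is an illustration under one assumption: that the rate is the share of words in the correct transcript not affected by any of the three error types. The function name and the sample counts are hypothetical, because the total word counts per interview are not reported here.

```python
def accuracy_rate(total_words: int, missing: int, incorrect: int,
                  accent_errors: int) -> float:
    """Share of the correct transcript's words that were error-free.

    Assumes each missed word, incorrect word, and accent mistake
    counts as one error against the total number of words.
    """
    if total_words <= 0:
        raise ValueError("total_words must be positive")
    errors = missing + incorrect + accent_errors
    return (total_words - errors) / total_words


# Invented numbers for illustration only, not project data:
# 200 words in the correct transcript, 30 missing, 15 incorrect, 5 accent errors.
rate = accuracy_rate(total_words=200, missing=30, incorrect=15, accent_errors=5)
print(f"{rate:.2%}")  # 75.00%
```

With these invented counts, 50 of 200 words carry an error, which leaves 150 correct words and an accuracy rate of 75%.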

Bar chart comparing the average transcription accuracy of the two methods: Audio from Speakers (SpeechNotes) has 26% (0.2566) and Auto-Generated (Stream) has 65% (0.6469). Auto-Generated is more accurate. Full explanation provided below the image.

Averaged across the four interviews, the accuracy rate is 26% for audio from speakers (SpeechNotes) and 65% for auto-generated (Stream).
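As a quick check, the averages can be reproduced from the per-interview rates reported above, assuming they are simple unweighted means across the four interviews. The Interview 4 rate for SpeechNotes is given only as a rounded 5%, so the result may differ slightly in the decimals from the charted value.

```python
from statistics import mean

# Per-interview accuracy rates (percent) as reported in this article.
reported_rates = {
    "SpeechNotes": [56.35, 14.80, 26.07, 5.0],  # Interview 4 is rounded in the source
    "Stream": [85.99, 72.56, 55.85, 44.33],
}

# Unweighted mean across interviews (an assumption about how the
# averages were computed).
averages = {method: mean(rates) for method, rates in reported_rates.items()}

for method, average in averages.items():
    print(f"{method}: {average:.1f}% (rounds to {round(average)}%)")
```

Rounded to whole percentages, these means agree with the 26% and 65% figures.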

As a next step, we are piloting two approaches in two experiential learning corpus development courses, at the University of Arizona and the University of Texas Rio Grande Valley: revising auto-generated transcripts from Stream, and transcribing manually with ExpressScribe. We will then compare the average speed and ease of use of these methods among the students.
