The University of Texas
Rio Grande Valley
Corpus Bilingüe del Valle CoBiVa
Program of Department of Writing and Language Studies

    CoBiVa is the first digital documentation of bilingualism in the Rio Grande Valley, preserving oral narratives on language and culture. It fosters linguistic diversity, supports research, and connects the university with the community.


Revising Stream Transcripts to WEBVTT with R


Screenshot of the instructions, which read: Revising .txt Transcripts with R, Step 1 & 2 (March 2021). R is a free software environment and programming language for statistical computing and data analysis. It is also great for data manipulation, including regular expressions for identifying, extracting, and replacing patterns in texts, which makes it a convenient tool for revising transcripts! Setup: To begin, you will need to download two things: R and RStudio. 1. Download R for your system; open the file and install

During our project, funded in part by the National Endowment for the Humanities, Dr. Ryan Bessett, Dr. Ana Carvalho, and I (Dr. Katherine Christoffersen) tested several technology-aided transcription methods. We eventually found that Stream auto-generated transcripts were preferable in terms of accuracy, speed, and ease of use. Although Stream generated transcripts with timestamps, it did not produce the precise WEBVTT format required for time-alignment and clickable transcripts on the CoBiVa website.
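To make the target format concrete, here is a minimal sketch in R of what a WEBVTT file contains: a first line reading WEBVTT, a blank line, and then cues, each with a timing line (start --> end, as HH:MM:SS.mmm) followed by the cue text. The function names vtt_timestamp and make_cue and the sample cues are our own illustrations, not part of the project's scripts.

```r
# Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm).
vtt_timestamp <- function(seconds) {
  ms_total <- round(seconds * 1000)
  hours    <- ms_total %/% 3600000
  minutes  <- (ms_total %% 3600000) %/% 60000
  secs     <- (ms_total %% 60000) %/% 1000
  millis   <- ms_total %% 1000
  sprintf("%02d:%02d:%02d.%03d", hours, minutes, secs, millis)
}

# Build one cue: a timing line, the cue text, and a blank separator line.
make_cue <- function(start, end, text) {
  c(paste(vtt_timestamp(start), "-->", vtt_timestamp(end)), text, "")
}

# Hypothetical sample cues (invented for illustration).
cues <- list(
  list(start = 0,   end = 2.5,  text = "Buenas tardes."),
  list(start = 2.5, end = 5.04, text = "Thank you for coming.")
)

# A WebVTT file starts with the line "WEBVTT", then a blank line, then cues.
vtt_lines <- c(
  "WEBVTT", "",
  unlist(lapply(cues, function(q) make_cue(q$start, q$end, q$text)))
)
writeLines(vtt_lines)
```

Any deviation from this layout, such as a missing header line or a malformed timing line, can keep a player from aligning the transcript with the audio, which is why the revision step matters.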

During Summer 2020, Mr. Bart Rossman at the University of Arizona created a sample script for Atom that revised the transcripts in several steps. This was a step in the right direction, but the multiple steps left more room for error, and we also encountered some bugs while working in Atom. Bart suggested that we look into R and work with Ms. Jessica Draper on an R script that would allow for a one-step process that could revise all the transcripts in a given file.

During Fall 2020, Ms. Jessica Draper created an initial script for revising Stream auto-generated transcripts into WEBVTT format. Because Stream does not identify or tag different speakers, this part must be done manually. We therefore created a two-step process. First, the script makes an initial revision of the transcript. Then, after students insert speaker codes to identify when someone speaks (and who that speaker is), the script revises the transcript a second time.
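As an illustration of what a second pass over manually inserted speaker codes might look like, here is a small sketch in R. It assumes, purely for the example, that students type a code such as INT: or PAR: at the start of a cue's text, and it rewrites that code as a WebVTT voice tag. The codes, the function name tag_speakers, and the choice of voice tags are our assumptions here; the project's actual Step 2 script may handle speaker codes differently.

```r
# Replace a leading speaker code (e.g. "INT: ") with a WebVTT voice tag
# (e.g. "<v INT>"). Lines without a recognized code are left unchanged.
# The codes "INT" and "PAR" are hypothetical placeholders.
tag_speakers <- function(lines, codes = c("INT", "PAR")) {
  pattern <- paste0("^(", paste(codes, collapse = "|"), "):[[:space:]]*")
  sub(pattern, "<v \\1>", lines)
}

# Hypothetical transcript after the first revision and manual coding.
coded <- c(
  "WEBVTT", "",
  "00:00:00.000 --> 00:00:02.500", "INT: Buenas tardes.", "",
  "00:00:02.500 --> 00:00:05.040", "PAR: Thank you for coming."
)

tagged <- tag_speakers(coded)
writeLines(tagged)
```

Because the pattern is anchored to the start of the line, header and timing lines pass through untouched, and only text lines that begin with a listed code are changed.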

During Spring 2021, Ms. Jessica Draper worked with us and two research assistants, Ms. Isabella Calafate de Barros (University of Arizona, UofA) and Ms. Mayte Vega Mudy (University of Texas Rio Grande Valley, UTRGV), to test the script on a total of 30 transcripts. The transcripts came from students in experiential-learning, internship-style classes at both campuses, taught by Dr. Katherine Christoffersen (UTRGV) and Dr. Ana Carvalho (UofA). This testing allowed us to debug both the script and the instructions. The script worked very well, with one limitation: it did not work if students had not inserted the speaker codes correctly or if a given file had some other formatting problem.
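Since missing or mistyped speaker codes were the main cause of failures, a quick check before running the second step can save time. The sketch below, in R, lists the line numbers of cue text lines that do not begin with a recognized code. It assumes a simplified layout in which each timing line (containing -->) is followed directly by one line of text, and the codes INT and PAR and the function name find_missing_codes are hypothetical; it is not taken from the project's scripts.

```r
# Return the line numbers of cue text lines that lack a recognized
# speaker code. Assumes each timing line is followed by one text line.
find_missing_codes <- function(lines, codes = c("INT", "PAR")) {
  timing_idx <- grep("-->", lines, fixed = TRUE)
  text_idx   <- timing_idx + 1
  text_idx   <- text_idx[text_idx <= length(lines)]
  pattern    <- paste0("^(", paste(codes, collapse = "|"), "):")
  text_idx[!grepl(pattern, lines[text_idx])]
}

# Hypothetical transcript with one missing and one mistyped code.
transcript <- c(
  "WEBVTT", "",
  "00:00:00.000 --> 00:00:02.500", "INT: Buenas tardes.", "",
  "00:00:02.500 --> 00:00:05.040", "Thank you for coming.", "",
  "00:00:05.040 --> 00:00:07.000", "par: De nada."
)

bad <- find_missing_codes(transcript)
for (i in bad) {
  message("Line ", i, " has no recognized speaker code: ", transcript[i])
}
```

In this sample, the check flags the uncoded line and the lowercase code, so a student can correct both before the transcript goes through the second revision.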

We have provided the R script files and instructions below:

R Script to Change Stream Transcripts to WEBVTT Format (Step 1)

R Script to Change Stream Transcripts to WEBVTT Format (Step 2)

Instructions on Using R Script to Change Stream Transcripts to WEBVTT Format (Step 1 & 2)

You can also find this project listed on the CoBiVa GitHub.
