YouTransfer

Text style transfer project for YouTube creators

For this project, I had the pleasure of collaborating with Data Science graduate students to develop a text style transfer model targeted at YouTube videos. Given my software background, I focused on the project's infrastructure, specifically setting up the pipeline that pulls training data from the YouTube API.

Due to time constraints, we developed all of our code in Jupyter Notebooks and never moved to a separate deployment environment. This did the job!

Skills used

Python 3.x, NLP, scrapetube, AdversarialVAE, Libraries (pandas, NLTK, PyTorch, gensim, NumPy, youtube_transcript_api, Transformers)

Results

While much of the formal recognition process for this project was botched, we were informally commended for going above and beyond the required scope and for preparing a strong presentation.

My contributions

  • Collaborated with a team of graduate and undergraduate students, overseeing data collection and preprocessing
  • Leveraged the YouTube Data API as well as the formal YouTube API to efficiently collect video transcripts
  • Trained a Word2Vec model with the gensim library to produce word embeddings during data preparation
  • Preprocessed the collected data with Python and the NLTK library (text cleaning, lowercasing, tokenization, and stopword removal) to improve the performance of the downstream PyTorch model
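
The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration and not the project's actual notebook code: the `preprocess` function, the small `STOPWORDS` set, and the regular expressions are all assumptions made for the example. In the project itself NLTK handled tokenization and supplied the stopword list; a hand-written set stands in here so the snippet runs without downloading any corpora.

```python
import re

# Assumption: a tiny illustrative stopword set. The project used NLTK, where
# nltk.corpus.stopwords.words("english") would supply a much fuller list.
STOPWORDS = {"a", "an", "and", "the", "is", "to", "of", "in", "it", "this", "so"}

# Assumption: transcripts may carry bracketed cues such as "[Music]".
BRACKETED_CUE = re.compile(r"\[[^\]]*\]")

# Simple stand-in tokenizer: runs of letters/digits, optionally with an
# apostrophe suffix (e.g. "it's"). NLTK's tokenizers behave differently.
TOKEN = re.compile(r"[a-z0-9]+(?:'[a-z]+)?")


def preprocess(text: str) -> list[str]:
    """Clean, lowercase, tokenize, and drop stopwords from one transcript."""
    text = BRACKETED_CUE.sub(" ", text)   # text cleaning
    text = text.lower()                   # lowercasing
    tokens = TOKEN.findall(text)          # tokenization
    return [t for t in tokens if t not in STOPWORDS]  # stopword removal


if __name__ == "__main__":
    sample = "[Music] Welcome back to the channel, and today it's all about NLP!"
    print(preprocess(sample))
```

The resulting token lists (one per transcript) are the kind of input a Word2Vec model expects, so a function like this would sit between transcript collection and embedding training.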