Motivation

Attempting to predict the outcomes of athletic contests is as old as sports themselves, but most models to date have relied on hard statistics about the players and teams involved without considering the part fans might play. Our goal as a team was to see whether data from social media, specifically Twitter, could help a neural network predict the outcomes of NFL football games. Harnessing ‘the wisdom of the crowds’ is not a new idea; it has helped other models predict major events such as stock market fluctuations and disease outbreaks. We believe it could also hold important insights into how a team will perform, whether as a direct result of fans’ support and motivation or as a proxy measure of the team’s game-to-game performance.

Technical Overview

Our research combines two previous predictive models into one. We started with an artificial neural network specifically designed to predict game outcomes from NFL team statistics, reusing that model’s statistics and network structure. We then used tweets gathered for a separate Twitter-based model to create two new features for our network: tweet rate, a metric borrowed from the Twitter model that measures the week-to-week change in total tweets about a team, and crowd sentiment, our own metric that averages the polarity (positive or negative attitude) of all fan tweets about a given team each week into a single score.
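As a concrete illustration, here is a minimal sketch of how the two features could be computed from weekly tweet data. The function names and the example numbers are hypothetical; our actual feature code is not reproduced in this write-up.

```python
import numpy as np

def tweet_rate(counts_by_week):
    """Week-to-week change in total tweet volume about a team.

    counts_by_week: list of total tweet counts, one entry per week.
    Returns one relative change per week after the first.
    """
    counts = np.asarray(counts_by_week, dtype=float)
    return np.diff(counts) / np.maximum(counts[:-1], 1.0)

def crowd_sentiment(polarities_by_week):
    """Average polarity of all fan tweets about a team each week.

    polarities_by_week: list of lists; each inner list holds per-tweet
    polarity scores in [-1, 1] for one week.
    Returns a single sentiment score per week (0.0 for empty weeks).
    """
    return [float(np.mean(week)) if week else 0.0 for week in polarities_by_week]

# Example: four weeks of tweet counts and per-tweet polarity scores.
rates = tweet_rate([1200, 1500, 900, 1800])
sentiment = crowd_sentiment([[0.4, 0.1], [-0.2, 0.3, 0.5], [0.0], []])
print(rates, sentiment)
```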

Development

The bulk of the development overhead came in the form of data collection and cleaning. We used the Python package sportsreference to gather the NFL statistics that make up the majority of our model’s features. For the tweets, we used the Python package tweepy to pull tweets from a list of tweet IDs provided by the authors of the predictive Twitter model. We then used VADER to perform sentiment analysis on each tweet. The neural network was developed and tested with PyTorch Lightning in Google Colab.
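For reference, the snippet below is a minimal sketch of scoring one tweet with VADER via the vaderSentiment package; the normalized compound score in [-1, 1] is the kind of polarity value that feeds the crowd-sentiment feature. The example tweet text is made up.

```python
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer

analyzer = SentimentIntensityAnalyzer()

# polarity_scores returns 'neg', 'neu', 'pos', and a normalized
# 'compound' score in [-1, 1]; the compound score serves as the
# tweet's overall polarity.
scores = analyzer.polarity_scores("Huge win today, this team is unstoppable!")
print(scores["compound"])
```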

Testing

To test our system, we used the MIR-QBSH dataset, created by Jyh-Shing Roger Jang. The dataset had both advantages and drawbacks for us. On the plus side, it is a large collection of ordinary people humming melodies, which let us see what happens when someone does not quite hit the right pitch and let us tune our code to human humming. The drawback was that the humming in each recording does not start immediately, while the ground-truth MIDI file does, so the notes in our MIDI representation and in the ground truth did not line up.
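One way to mitigate this misalignment would be to trim the leading silence before pitch tracking. The sketch below uses librosa.effects.trim; the 25 dB threshold and the file name are assumptions that would need tuning for this corpus.

```python
import librosa

# Load a humming recording from MIR-QBSH (8 kHz mono WAV files).
y, sr = librosa.load("00001.wav", sr=None)

# Trim leading/trailing audio quieter than 25 dB below the peak so the
# first hummed note starts near t = 0, matching the ground-truth MIDI.
y_trimmed, (start, end) = librosa.effects.trim(y, top_db=25)
print(f"trimmed {start / sr:.2f}s of leading audio")
```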

Results

When we ran our algorithm over the entire dataset we achieved 14% accuracy, but when we ran it on individual audio files and lined up the audio by hand, accuracy ranged from 25% to 40%. Additionally, when listening back to the MIDI files our function created, there were always errors, but the melody was always easy to distinguish. With more time, we could test the data more accurately by either creating our own dataset or altering the one we used so that the melody begins at the start of the WAV file in every recording.
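To illustrate the kind of note-level accuracy reported here, this is a minimal sketch that compares two already-aligned note sequences; our actual scoring code is not reproduced, and the matching rules (exact pitch match, equal-length alignment) are assumptions.

```python
def note_accuracy(predicted, ground_truth):
    """Fraction of aligned positions where the transcribed MIDI note
    number matches the ground truth.
    """
    pairs = list(zip(predicted, ground_truth))
    if not pairs:
        return 0.0
    correct = sum(p == g for p, g in pairs)
    return correct / len(pairs)

# Example: 3 of 5 aligned notes match -> 60% accuracy.
print(note_accuracy([60, 62, 64, 65, 67], [60, 62, 63, 65, 69]))
```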

Resources

  • MIR-QBSH (our dataset)
  • Librosa (pitch tracking and onset/offset detection; see the sketch below)
  • NumPy (array operations)
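To show how these resources fit together, below is a minimal sketch of the hum-to-MIDI pipeline described above: Librosa’s pyin for pitch tracking and its onset detector for note boundaries, written out with pretty_midi (an extra dependency assumed here for illustration). The pitch range, file names, and median-pitch note segmentation are assumptions; our actual pipeline may differ in its parameters and details.

```python
import librosa
import numpy as np
import pretty_midi

# Load a humming recording (MIR-QBSH files are 8 kHz mono).
y, sr = librosa.load("00001.wav", sr=None)

# Pitch-track with pyin; f0 is NaN in unvoiced frames.
f0, voiced, _ = librosa.pyin(
    y, fmin=librosa.note_to_hz("C2"), fmax=librosa.note_to_hz("C6"), sr=sr
)
times = librosa.times_like(f0, sr=sr)

# Detect note onsets and treat successive onsets as note boundaries.
onsets = librosa.onset.onset_detect(y=y, sr=sr, units="time")
boundaries = np.append(onsets, times[-1])

pm = pretty_midi.PrettyMIDI()
inst = pretty_midi.Instrument(program=0)
for start, end in zip(boundaries[:-1], boundaries[1:]):
    seg = f0[(times >= start) & (times < end)]
    seg = seg[~np.isnan(seg)]
    if len(seg) == 0:
        continue  # unvoiced segment: no note
    # Take the median pitch of the segment as the note's pitch.
    pitch = int(round(np.median(librosa.hz_to_midi(seg))))
    inst.notes.append(
        pretty_midi.Note(velocity=100, pitch=pitch, start=float(start), end=float(end))
    )
pm.instruments.append(inst)
pm.write("melody.mid")
```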