Our goal is to build a machine learning model that predicts which subreddit an unlabeled post comes from. We aim to achieve this by implementing several machine learning techniques, covering classification (binary and multiclass) and deep learning. We then compare the algorithms and determine which performs best on this task. Our findings are documented in the final report.
Each data record is a Reddit post. We choose any two subreddits as the target classes and collect around 1000 posts for each class.
The posts are fetched using PRAW, the Python Reddit API Wrapper.
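
The data collection step described above could look roughly like the sketch below. This is a minimal illustration, not the project's actual script: the helper names `fetch_posts` and `build_dataset` are hypothetical, and combining each post's title and body into one string is an assumption. The `reddit` argument is expected to be a `praw.Reddit` client; only its `subreddit(...).new(limit=...)` listing and the `title` and `selftext` attributes of each submission are used.

```python
def fetch_posts(reddit, subreddit_name, limit=1000):
    """Collect title + body text for up to `limit` of the newest posts."""
    posts = []
    for submission in reddit.subreddit(subreddit_name).new(limit=limit):
        # Assumption: title and self-text together form one data record.
        text = f"{submission.title} {submission.selftext}".strip()
        if text:  # skip posts with no usable text
            posts.append(text)
    return posts


def build_dataset(reddit, subreddit_a, subreddit_b, limit=1000):
    """Return (texts, labels); label 0 is subreddit_a, label 1 is subreddit_b."""
    texts, labels = [], []
    for label, name in enumerate((subreddit_a, subreddit_b)):
        posts = fetch_posts(reddit, name, limit)
        texts.extend(posts)
        labels.extend([label] * len(posts))
    return texts, labels


# Usage (requires the praw package and your own API credentials):
#   import praw
#   reddit = praw.Reddit(client_id="...", client_secret="...", user_agent="...")
#   texts, labels = build_dataset(reddit, "first_subreddit", "second_subreddit")
```

Passing the client in as an argument keeps the credentials out of the helper functions and makes them easy to exercise with a stub object.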
- Naive Bayes
- Random Forest
- Logistic Regression
- LSTM (Long Short-Term Memory)
- We compare accuracies and generate ROC curves and learning curves for a deeper analysis.
- See the report PDF for more details.
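
As a rough illustration of how the three classical models listed above might be compared on a two-subreddit task, here is a minimal scikit-learn sketch. It assumes TF-IDF features, a 75/25 stratified split, and default-ish hyperparameters, none of which are confirmed by this README; the function name `compare_models` is hypothetical and the sample texts are toy stand-ins, not real Reddit data. The LSTM is left out because it needs a deep learning framework and a different input pipeline.

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline


def compare_models(texts, labels, seed=0):
    """Train each classifier on TF-IDF features; report accuracy and ROC AUC."""
    x_train, x_test, y_train, y_test = train_test_split(
        texts, labels, test_size=0.25, stratify=labels, random_state=seed
    )
    models = {
        "naive_bayes": MultinomialNB(),
        "random_forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "logistic_regression": LogisticRegression(max_iter=1000),
    }
    results = {}
    for name, model in models.items():
        pipeline = make_pipeline(TfidfVectorizer(), model)
        pipeline.fit(x_train, y_train)
        predicted = pipeline.predict(x_test)
        # Probability of the positive class (label 1), used for ROC AUC.
        scores = pipeline.predict_proba(x_test)[:, 1]
        results[name] = {
            "accuracy": accuracy_score(y_test, predicted),
            "roc_auc": roc_auc_score(y_test, scores),
        }
    return results


# Toy stand-in data (assumption): two clearly separable "subreddits".
texts = [f"python code bug error {i}" for i in range(20)] + [
    f"recipe oven bake flour {i}" for i in range(20)
]
labels = [0] * 20 + [1] * 20
results = compare_models(texts, labels)
for name, metrics in results.items():
    print(name, metrics)
```

Wrapping the vectorizer and classifier in one pipeline ensures the TF-IDF vocabulary is fitted on the training split only, so the test posts do not leak into the features.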