1 Introduction

“Fake news” is a term that has risen to the forefront of the national conversation over the past few years. According to a 2019 Pew Research Center survey,1 Americans rate fake news as a larger problem than violent crime, climate change, and racism (among other issues). In addition, according to the same survey, 68% of Americans say that fake news has a large impact on their confidence in government. At its core, fake news is simply defined as news stories that are false or fabricated. Fake news stories can be “propaganda that [is] intentionally designed to mislead the reader” or “clickbait” stories that are written for “economic incentives.”2

Due to the issue of fake news, verifying the validity of news articles (fact-checking) has become an increasingly important job for news networks. This was evident in the final 2020 presidential debate between Joe Biden and Donald Trump, during which multiple news outlets (e.g., the New York Times3) reported on the validity of each candidate’s claims. Websites such as Snopes4 and PolitiFact5 perform this fact-checking on a daily basis. The dataset used in this project comes from the fact-checking website PolitiFact (see Section 3).

However, fact-checking is not perfect. Fact-checking websites like PolitiFact have limitations, such as confirmation bias: “people may not be likely to fact-check a story that aligns with their pre-existing beliefs.”6 Additionally, manual fact-checking is a time-intensive process. Thus, there is motivation for another source of fake news assessment: a statistical model that predicts fake news using Natural Language Processing (NLP). Designing such a model is the goal of this project.
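To make this goal concrete, the sketch below shows what a minimal NLP fake-news classifier could look like. It is only an illustrative baseline, assuming a standard scikit-learn pipeline with TF-IDF features and logistic regression; the statements, labels, and model choice are hypothetical placeholders, not the model developed in this project.

```python
# A minimal baseline sketch: TF-IDF features + logistic regression.
# The statements and labels below are hypothetical toy examples; the
# project's actual data comes from PolitiFact (see Section 3).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Toy training data: 1 = fake, 0 = true.
statements = [
    "Scientists confirm the moon is made of cheese",
    "The unemployment rate fell by 0.2% last quarter",
    "Celebrity endorses miracle cure that doctors hate",
    "The city council approved the new transit budget",
]
labels = [1, 0, 1, 0]

# Convert raw text to TF-IDF vectors, then fit a linear classifier.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(statements, labels)

# Estimate the probability that an unseen statement is fake.
print(model.predict_proba(["New law bans all cars nationwide"])[0, 1])
```

A linear model over TF-IDF features is a common starting point for text classification because it is fast to train and easy to interpret, which makes it a useful reference point before moving to more expressive models.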