OVERVIEW
What is this project about?
Two key considerations motivated the selection of this project. The first is that, to many Americans, the outcome of the 2016 presidential election came as a surprise. In an attempt to understand how the outcome had been so poorly predicted, many integral factors in the election have been heavily investigated and researched, such as voter participation rates, candidate success and failure in key states, the electoral college, and the possibility of foreign intervention. The second is that the news itself has recently become a highly contested topic: the sitting president, the candidate whose victory surprised so many Americans, is sharply critical of many news sources. To contribute to this ongoing research into the 2016 presidential election, and to hopefully illuminate the possibilities for the outcome of the 2020 presidential election, this project examines the impact of local newspaper reporting, and the word choices within it, on the outcome of the presidential election.
Specifically, this project attempts to answer the following research questions:
- How big of an effect does local news reporting have on presidential elections?
- What topics are of most interest to each state?
- Do these interests reflect why certain candidates were more popular in each state?
- Was the outcome of the election consistent with these concerns?
- If there were specific topics unique to a state, why?
Why is this project significant?
Given the heated political atmosphere in America, political language and ideology are becoming more and more extreme. While news media outlets are supposed to keep the government accountable, articles and reports today often contain heavily sentimental vocabulary that attempts to persuade readers to side with certain positions. Without an objective source of information, citizens find it harder and harder to build their political and social knowledge from speech that is factual rather than emotionally charged. Readers need to understand how their thinking may be shaped by possibly unverified sources in order to vote in an informed way in the 2020 presidential election. Though the results and trends of this project cannot be generalized, they will hopefully give the reader some insight into how social issues and news language correlate with, or even predict, each other. The analysis of word frequency and uniqueness is an intervention in the field of political psychology, and could be useful in anticipating citizens' preferences in the upcoming 2020 election.
METHODOLOGY
Who is this project for?
Are you interested in linguistic analysis, in the effect of news outlets on politics, or in the 2016 presidential election in general? If so, please read on. Though the effect of social media on campaigns has been examined extensively, there does not seem to be an interactive way for citizens to learn how the language of news articles relates to perceptions of social issues, and vice versa.
How did we get our data?
We originally planned to use the Web Scraper Chrome extension to gather the abstracts of newspaper articles from each publisher's Facebook page (e.g. The New York Times and the Los Angeles Times), but we discovered a number of issues with this method. The first was that we wanted to scrape data continuously: for every consecutive post made within a particular time frame, we wanted to extract the text from that post, analogous to a loop in code. However, the Web Scraper Chrome extension has no feature for continuously scraping the text of a selected HTML element. We could have supplemented the extension with a Python script to get the data we wanted, but we decided against it because we were not sure how well it would work, how long it would take to figure out, and how much cleaning would remain once we had gathered the data. Given these uncertainties and our time frame, we explored other methods. We ultimately chose LexisNexis, a library database, since we could easily retrieve relevant newspaper articles by refining our searches with its filtering functionality. We then used Voyant Tools to analyze the corpus. Voyant Tools was a particularly good fit for this project, since we are interested in the relationships and correlations between word usages, which are readily explored and visualized through the word cloud in Voyant Tools.
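For readers curious about the scripting approach we decided against, the "looping mechanism" described above might have looked something like the following minimal sketch. The post HTML and the CSS class name `post-text` are hypothetical stand-ins; a real scraper would fetch each live page over HTTP rather than iterate over strings.

```python
# A minimal sketch of the "continuous" scraping loop we considered but did
# not build. The class name "post-text" and the sample posts are invented.
from html.parser import HTMLParser


class PostTextExtractor(HTMLParser):
    """Collects the text inside every element with class 'post-text'."""

    def __init__(self):
        super().__init__()
        self.depth = 0          # > 0 while inside a target element
        self.texts = []

    def handle_starttag(self, tag, attrs):
        classes = dict(attrs).get("class", "").split()
        if "post-text" in classes or self.depth:
            self.depth += 1
            if self.depth == 1:
                self.texts.append("")   # start a new post's text

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.texts[-1] += data


def scrape_posts(pages):
    """Loop over consecutive post pages, extracting the text of each one."""
    abstracts = []
    for html in pages:                  # the looping mechanism we wanted
        parser = PostTextExtractor()
        parser.feed(html)
        abstracts.extend(t.strip() for t in parser.texts)
    return abstracts


# Two fake Facebook-style posts standing in for live pages.
pages = [
    '<div class="post-text">Candidate visits Ohio.</div>',
    '<div class="post-text">Polls tighten in Florida.</div>',
]
print(scrape_posts(pages))
```

Even this toy version hints at the cleaning burden we worried about: every publisher's page markup differs, so the selector logic would need per-site tuning.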
DATA SELECTION
How did we clean, select, and analyze our data?
To get the most relevant search results from LexisNexis, we set the date range to June 1, 2015 through November 8, 2016: June 1 was the day the first presidential campaign was announced, and November 8 was election day, so this range captured all news reported in the build-up to the election. Capturing the build-up was most relevant because we were trying to understand the ideas and discussions in the news that may have influenced voting behavior on election day. To further filter the articles, we chose the region "North America," then the country "United States," and the language "English." We filtered by the source type "newspaper" and by publishing location in six states, running a separate search for each: California, Texas, New York, Pennsylvania, Florida, and Georgia. The search term we used was (presidential OR president) AND election, since we were interested in the 2016 presidential election. We made these choices because we only wanted data about the American presidential election, because English is the primary language of the U.S., and because newspapers are probably the most academically written yet accessible news sources available today. We chose these particular states because they span the political spectrum of the United States and have populations large enough to stand in for the country as a whole. Of the six, New York and California were chosen for their consistently blue-leaning voting patterns and large populations, Texas and Georgia for their consistently red-leaning voting patterns and large populations, and Florida and Pennsylvania for their tendency to swing unpredictably between red and blue. The selection tools on LexisNexis retrieved very clean data; to clean it further, we adjusted the formatting of the Word documents to eliminate redundancies and irrelevant information (e.g. headers, images, etc.).
With the cleaned documents, we combined the search results by state (LexisNexis could only retrieve 250 results at a time, so we ran four searches per state). This gave us one document of search results for each state (New York, California, Texas, Florida, Pennsylvania, and Georgia). We also created three combined documents, each containing the results of the two states with similar political leanings, and, lastly, one document combining the results from all six states. This gave us individual state corpora, ideological corpora, and a national corpus, so that we could run separate analyses at three different levels (state, ideological, and national) to understand the differences and similarities between them.
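The combining step described above is simple concatenation, and could be scripted along these lines. The file names (one plain-text file per 250-result batch per state) are hypothetical; in practice we worked with Word documents by hand.

```python
# A sketch of building state, ideological, and national corpora by
# concatenating per-batch text files. All file names here are invented;
# the block creates dummy batch files so it runs end to end.
from pathlib import Path

STATES = ["new_york", "california", "texas", "georgia", "florida", "pennsylvania"]
GROUPS = {
    "blue":  ["new_york", "california"],     # consistently blue-leaning
    "red":   ["texas", "georgia"],           # consistently red-leaning
    "swing": ["florida", "pennsylvania"],    # unpredictable
}

def combine(paths, out_path):
    """Concatenate several text files into one corpus document."""
    text = "\n".join(Path(p).read_text(encoding="utf-8") for p in paths)
    Path(out_path).write_text(text, encoding="utf-8")

# Stand-ins for the four 250-result LexisNexis batches per state.
for state in STATES:
    for i in range(1, 5):
        Path(f"{state}_batch{i}.txt").write_text(
            f"{state} articles, batch {i}", encoding="utf-8")

# State level: merge the four batches for each state.
for state in STATES:
    combine([f"{state}_batch{i}.txt" for i in range(1, 5)], f"{state}.txt")

# Ideological level: merge the two states that lean the same way.
for group, members in GROUPS.items():
    combine([f"{s}.txt" for s in members], f"{group}.txt")

# National level: merge all six state corpora.
combine([f"{s}.txt" for s in STATES], "national.txt")
```

The three loops mirror the three levels of analysis: each lower-level corpus is reused to build the next level up, so no article text is duplicated by hand.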
We uploaded each corpus individually to Voyant Tools and identified unique words using TF-IDF scores. We added the following stop words: election, elections, elected, president, presidential, day, year, times, state, news, tribune, and years, since we believe they add no substantial meaning to our analysis (they are either very generic or appear so frequently that their significance is diminished). Using the word cloud visualization and the significance column, we found the words of interest by state, ideology, and nation. We analyzed the use and significance of these words using the context functionality in Voyant Tools and by reading or skimming the articles in which they appeared. We supplemented this reading with Google searches to further understand each word's possible use and importance in that particular state or ideology.
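To illustrate what TF-IDF is doing under the hood, here is a rough sketch of the computation with our stop word list applied. Voyant Tools performs this internally; the two tiny documents below are invented stand-ins for the real state corpora, not actual data from our searches.

```python
# A toy TF-IDF computation showing how the score surfaces words that are
# frequent in one corpus but rare across the collection. The sample
# documents are invented for illustration.
import math
from collections import Counter

STOP_WORDS = {"election", "elections", "elected", "president", "presidential",
              "day", "year", "times", "state", "news", "tribune", "years"}

def tfidf(docs):
    """Return one {word: score} dict per document, stop words excluded."""
    tokenized = [[w for w in doc.lower().split() if w not in STOP_WORDS]
                 for doc in docs]
    # Document frequency: in how many documents does each word appear?
    df = Counter(w for toks in tokenized for w in set(toks))
    n = len(docs)
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        # Term frequency times inverse document frequency.
        scores.append({w: (tf[w] / len(toks)) * math.log(n / df[w])
                       for w in tf})
    return scores

docs = [
    "drought water drought farms president election",
    "oil pipeline jobs president election",
]
for per_doc in tfidf(docs):
    # The top-scoring word is the most distinctive one for that document.
    print(max(per_doc, key=per_doc.get))
```

Note how the stop words drop out entirely, and how a word appearing in every document gets an inverse-document-frequency factor of log(1) = 0, which is exactly why generic vocabulary never tops the significance column.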
“Every four years in the presidential election,
some new precedent is broken.”