2020 Elections - President

By: Michael Dunphy
October 2020

Introduction

A dashboard of forecasts for the 2020 U.S. presidential election.

The visual can be found here: https://public.tableau.com/profile/michael.dunphy8764#!/vizhome/2020Elections-USPresident/MainDashboard?publish=yes

Data Source: https://en.wikipedia.org/wiki/2020_United_States_presidential_election

Below is the code used to collect and modify the data to produce the Tableau visual.

Code

The program uses the following libraries (lxml and html5lib are parser backends for BeautifulSoup and pandas.read_html):

In [1]:
!pip install lxml
!pip install html5lib
import requests
import pandas as pd
from bs4 import BeautifulSoup
In [2]:
# Access the Wikipedia page with forecasting data from Cook, IE, Sabato, Politico, RCP, Niskanen, CNN,
# The Economist, CBS News, 270toWin, ABC News, NPR, NBC News, 538, and CNalysis
r = requests.get('https://en.wikipedia.org/wiki/2020_United_States_presidential_election')
root = BeautifulSoup(r.content, 'lxml')

# Take the first table with class "wikitable sortable" (the forecast ratings table)
# and parse it into a DataFrame
find = root.find('table', class_="wikitable sortable")
li = pd.read_html(find.prettify())
ratings = pd.DataFrame(li[0])


# `ratings` is the main dataset; it is cleaned below and used for the visual
ratings.head()
Out[2]:
State Electoral votes PVI [293] 2016 result Average rating as of Oct 6, 2020 Cook Sept 29, 2020 [294] IE Sept 4, 2020 [295] Sabato Oct 1, 2020 [296] Politico Sept 8, 2020 [297] RCP Sept 30, 2020 [298] Niskanen Sept 15, 2020 [299] CNN Sep 20, 2020 [300] The Economist Oct 4, 2020 [301] CBS News Aug 16, 2020 [302] 270toWin Sept 25, 2020 [303] ABC News Oct 6, 2020 [304] NPR Aug 3, 2020 [305] NBC News Aug 6, 2020 [306] FiveThirtyEight [d] Oct 2, 2020 [307] CNalysis Oct 6, 2020 [308]
0 Alabama 9 R+14 62.1% R Safe R Safe R Safe R Safe R Safe R Safe R Safe R Safe R Safe R Likely R Safe R Safe R Likely R Safe R Solid R Safe R
1 Alaska 3 R+9 51.3% R Likely R Likely R Lean R Likely R Likely R Likely R Tossup Safe R Likely R Likely R Likely R Lean R Likely R Likely R Likely R Likely R
2 Arizona 11 R+5 48.9% R Tossup Lean D (flip) Tilt D (flip) Tossup Tossup Tossup Likely D (flip) Lean D (flip) Tossup Tossup Tossup Lean D (flip) Tossup Lean D (flip) Lean D (flip) Tossup
3 Arkansas 6 R+15 60.6% R Safe R Safe R Safe R Safe R Safe R Likely R Safe R Safe R Safe R Likely R Safe R Safe R Likely R Safe R Solid R Safe R
4 California 55 D+12 61.7% D Safe D Safe D Safe D Safe D Safe D Safe D Safe D Safe D Safe D Likely D Safe D Safe D Likely D Safe D Solid D Safe D
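The cell above locates the forecast table with BeautifulSoup and hands it to `pd.read_html`. As a dependency-free illustration of what that lookup does, here is a minimal sketch using the standard library's `html.parser` instead; this is a swapped-in technique, not the notebook's method, and the class `FirstWikitable` and the sample HTML are hypothetical. Unlike `pd.read_html`, it does not expand `colspan`/`rowspan` or handle nested tables, and it accepts the two classes in any order, whereas `class_="wikitable sortable"` in BeautifulSoup matches that exact attribute string.

```python
from html.parser import HTMLParser


class FirstWikitable(HTMLParser):
    """Hypothetical helper: collect cell text from the first <table> whose class
    list contains both "wikitable" and "sortable". colspan/rowspan and nested
    tables are not handled."""

    def __init__(self):
        super().__init__()
        self.rows = []          # list of rows, each a list of cell strings
        self._in_table = False
        self._done = False      # stop after the first matching table
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if self._done:
            return
        if tag == "table" and not self._in_table:
            classes = (dict(attrs).get("class") or "").split()
            self._in_table = "wikitable" in classes and "sortable" in classes
        elif self._in_table and tag == "tr":
            self._row = []
        elif self._in_table and tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_endtag(self, tag):
        if not self._in_table:
            return
        if tag in ("td", "th") and self._cell is not None:
            # Collapse internal whitespace, as a rendered cell would show it
            self._row.append(" ".join("".join(self._cell).split()))
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag == "table":
            self._in_table = False
            self._done = True

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


# Hypothetical sample page: an unrelated table followed by the target table
sample = """
<table class="infobox"><tr><td>not this one</td></tr></table>
<table class="wikitable sortable">
  <tr><th>State</th><th>Electoral votes</th></tr>
  <tr><td>Alabama</td><td>9</td></tr>
  <tr><td>Alaska</td><td>3</td></tr>
</table>
"""
parser = FirstWikitable()
parser.feed(sample)
print(parser.rows)
```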
In [3]:
# Table of the dates each forecast column was last updated on Wikipedia
updated = pd.DataFrame(ratings.columns.values[4:])

updated.to_csv("updated.csv", index=False)

updated.head()
Out[3]:
0
0 Average rating as of Oct 6, 2020
1 Cook Sept 29, 2020 [294]
2 IE Sept 4, 2020 [295]
3 Sabato Oct 1, 2020 [296]
4 Politico Sept 8, 2020 [297]
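The headers saved to updated.csv mix the forecaster's name, the update date, and Wikipedia footnote markers. If the dates are needed as real dates (for example, to sort or filter them in Tableau), a minimal regex sketch like the one below could split them. The pattern is an assumption fitted to the headers shown above, not a general date parser, and `parse_header` is a hypothetical helper.

```python
import re
from datetime import date

# Three-letter month prefixes; "Sept" and "Sep" both map to 9
MONTHS = {m: i for i, m in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# "<source> [as of] <Month> <day>, <year>" with anything (e.g. "[294]") after it
HEADER_RE = re.compile(
    r"^(?P<source>.+?)\s+(?:as of\s+)?"
    r"(?P<month>(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)[a-z]*)\.?\s+"
    r"(?P<day>\d{1,2}),\s+(?P<year>\d{4})"
)
CITATION_RE = re.compile(r"\s*\[\w+\]")  # footnote markers such as "[294]" or "[d]"


def parse_header(header):
    """Split a scraped column header into (source name, last-updated date)."""
    m = HEADER_RE.match(header)
    if m is None:
        raise ValueError(f"unrecognized header: {header!r}")
    source = CITATION_RE.sub("", m.group("source")).strip()
    month = MONTHS[m.group("month")[:3]]
    return source, date(int(m.group("year")), month, int(m.group("day")))


# In the notebook this could be applied as: [parse_header(h) for h in updated[0]]
print(parse_header("Cook Sept 29, 2020 [294]"))
```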
In [4]:
# Data cleaning: replace the scraped headers, which include dates and citation
# markers, with short column names
ratings.columns = ["State", "Electoral Votes", "PVI", "2016 result", "Average Ratings", "Cook", "IE", "Sabato",
                   "Politico", "RCP", "Niskanen", "CNN", "The Economist", "CBS", "270toWin", "ABC", "NPR", "NBC",
                   "538", "CNalysis"]

# Drop the final row of the scraped table
ratings.drop(ratings.index[-1:], inplace=True)

ratings.head()
Out[4]:
State Electoral Votes PVI 2016 result Average Ratings Cook IE Sabato Politico RCP Niskanen CNN The Economist CBS 270toWin ABC NPR NBC 538 CNalysis
0 Alabama 9 R+14 62.1% R Safe R Safe R Safe R Safe R Safe R Safe R Safe R Safe R Safe R Likely R Safe R Safe R Likely R Safe R Solid R Safe R
1 Alaska 3 R+9 51.3% R Likely R Likely R Lean R Likely R Likely R Likely R Tossup Safe R Likely R Likely R Likely R Lean R Likely R Likely R Likely R Likely R
2 Arizona 11 R+5 48.9% R Tossup Lean D (flip) Tilt D (flip) Tossup Tossup Tossup Likely D (flip) Lean D (flip) Tossup Tossup Tossup Lean D (flip) Tossup Lean D (flip) Lean D (flip) Tossup
3 Arkansas 6 R+15 60.6% R Safe R Safe R Safe R Safe R Safe R Likely R Safe R Safe R Safe R Likely R Safe R Safe R Likely R Safe R Solid R Safe R
4 California 55 D+12 61.7% D Safe D Safe D Safe D Safe D Safe D Safe D Safe D Safe D Safe D Likely D Safe D Safe D Likely D Safe D Solid D Safe D
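Assigning a list to `ratings.columns` renames purely by position: pandas raises an error if the number of columns changes, but if the forecasters were reordered on the page the short names would silently land on the wrong columns. A minimal sketch of a prefix-based alternative, assuming a hypothetical helper `rename_by_prefix` and a small made-up stand-in for the scraped table (its last row is only a placeholder for the row the notebook drops):

```python
import pandas as pd


def rename_by_prefix(df, prefixes):
    """Hypothetical helper: rename each column whose header starts with a known
    prefix; fail if any expected short name is left without a column."""
    mapping = {}
    for col in df.columns:
        for prefix, short in prefixes.items():
            if str(col).startswith(prefix):
                mapping[col] = short
                break
    missing = set(prefixes.values()) - set(mapping.values())
    if missing:
        raise KeyError(f"no column found for: {sorted(missing)}")
    return df.rename(columns=mapping)


# Made-up stand-in: two real rows from the table above plus a placeholder last row
raw = pd.DataFrame(
    [["Alabama", "Safe R", "Solid R"],
     ["Alaska", "Likely R", "Likely R"],
     ["(placeholder)", "", ""]],
    columns=["State", "Cook Sept 29, 2020 [294]", "FiveThirtyEight [d] Oct 2, 2020 [307]"],
)

prefixes = {"State": "State", "Cook": "Cook", "FiveThirtyEight": "538"}
clean = rename_by_prefix(raw, prefixes).iloc[:-1]  # drop the last row without mutating raw
print(clean)
```

With the default RangeIndex, `.iloc[:-1]` keeps the same rows as `drop(ratings.index[-1:])` but returns a new DataFrame instead of modifying one in place.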
In [5]:
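# Count how many forecasters' ratings for a state contain `prediction`. This is a
# substring match: "Lean D" also matches "Lean D (flip)", but "Safe R" does not
# match 538's "Solid R"
def get_count(row, prediction):
    count = 0
    # Columns 5-18: Cook through 538 (14 forecasters); CNalysis (column 19) is not included
    column_names = ratings.columns.values[5:19]

    for col in column_names:
        if prediction in row[col]:
            count += 1
    return count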
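Because get_count tests substrings, FiveThirtyEight's "Solid R" is never counted as "Safe R", and CNalysis falls outside the `[5:19]` slice. A minimal sketch of normalizing the labels first and then counting all 15 forecasters with `collections.Counter`; the spelling rules (treating "Solid" as "Safe" and "Leans" as "Lean", and dropping "(flip)") are assumptions based on the labels visible in the table, and `normalize_rating`/`count_ratings` are hypothetical names. Its counts therefore differ from the notebook's: for Arizona it finds 8 Tossup ratings rather than 7, because CNalysis is included.

```python
import re
from collections import Counter


def normalize_rating(label):
    """Map a forecaster's label onto the notebook's nine categories.
    Assumed rules: "Solid" -> "Safe", "Leans" -> "Lean", trailing "(flip)" dropped."""
    label = re.sub(r"\s*\(flip\)\s*$", "", label.strip())
    label = re.sub(r"^Solid\b", "Safe", label)
    label = re.sub(r"^Leans\b", "Lean", label)
    return label


def count_ratings(labels):
    """Count normalized ratings for one state (one label per forecaster)."""
    return Counter(normalize_rating(lab) for lab in labels)


# Arizona's ratings from Cook through CNalysis, copied from the table above
arizona = ["Lean D (flip)", "Tilt D (flip)", "Tossup", "Tossup", "Tossup",
           "Likely D (flip)", "Lean D (flip)", "Tossup", "Tossup", "Tossup",
           "Lean D (flip)", "Tossup", "Lean D (flip)", "Lean D (flip)", "Tossup"]
print(count_ratings(arizona))
```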
In [6]:
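# Find the consensus rating for a state: the category counted most often.
# Because of the >= comparison, a tie goes to the later category in column order.
def get_consensus(row):
    m = 0
    # Columns 20-28: the nine rating-count columns (Safe R ... Tilt D)
    column_names = ratings.columns.values[20:30]

    for col in column_names:
        if row[col] >= m:
            m = row[col]
            prediction = col

    return prediction, m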
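The rule in get_consensus does not depend on pandas. A minimal, pandas-free sketch of the same rule, assuming the counts arrive as a dict in the notebook's column order (`consensus` is a hypothetical name):

```python
def consensus(counts):
    """Return (category, count) for the highest count in an ordered mapping.

    Like get_consensus, it uses >=, so a tie goes to the category that comes
    later in the order (Safe R ... Tilt D)."""
    best_cat, best_n = None, -1
    for cat, n in counts.items():
        if n >= best_n:
            best_cat, best_n = cat, n
    return best_cat, best_n


# Alabama's counts from the cell 7 output
alabama = {"Safe R": 11, "Likely R": 2, "Lean R": 0, "Tilt R": 0, "Tossup": 0,
           "Safe D": 0, "Likely D": 0, "Lean D": 0, "Tilt D": 0}
print(consensus(alabama))
```

Iterating in Python's insertion order keeps the tie rule explicit; a hypothetical tie such as `{"Tossup": 3, "Lean D": 3}` resolves to "Lean D".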
In [7]:
# Count, for each state, how many forecasters give each rating category
safe_r = []
likely_r = []
lean_r = []
tilt_r = []
toss = []
tilt_d = []
lean_d = []
likely_d = []
safe_d = []

for index, row in ratings.iterrows():
    safe_r.append(get_count(row, "Safe R"))
    likely_r.append(get_count(row, "Likely R"))
    # Count both the "Leans" and "Lean" spellings
    lean_r.append(get_count(row, "Leans R") + get_count(row, "Lean R"))
    tilt_r.append(get_count(row, "Tilt R"))
    toss.append(get_count(row, "Tossup"))
    safe_d.append(get_count(row, "Safe D"))
    likely_d.append(get_count(row, "Likely D"))
    lean_d.append(get_count(row, "Leans D") + get_count(row, "Lean D"))
    tilt_d.append(get_count(row, "Tilt D"))

ratings["Safe R"] = safe_r
ratings["Likely R"] = likely_r
ratings["Lean R"] = lean_r
ratings["Tilt R"] = tilt_r
ratings["Tossup"] = toss
ratings["Safe D"] = safe_d
ratings["Likely D"] = likely_d
ratings["Lean D"] = lean_d
ratings["Tilt D"] = tilt_d

ratings.head()
Out[7]:
State Electoral Votes PVI 2016 result Average Ratings Cook IE Sabato Politico RCP ... CNalysis Safe R Likely R Lean R Tilt R Tossup Safe D Likely D Lean D Tilt D
0 Alabama 9 R+14 62.1% R Safe R Safe R Safe R Safe R Safe R Safe R ... Safe R 11 2 0 0 0 0 0 0 0
1 Alaska 3 R+9 51.3% R Likely R Likely R Lean R Likely R Likely R Likely R ... Likely R 1 10 2 0 1 0 0 0 0
2 Arizona 11 R+5 48.9% R Tossup Lean D (flip) Tilt D (flip) Tossup Tossup Tossup ... Tossup 0 0 0 0 7 0 1 5 1
3 Arkansas 6 R+15 60.6% R Safe R Safe R Safe R Safe R Safe R Likely R ... Safe R 10 3 0 0 0 0 0 0 0
4 California 55 D+12 61.7% D Safe D Safe D Safe D Safe D Safe D Safe D ... Safe D 0 0 0 0 0 11 2 0 0

5 rows × 29 columns
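Cell 7 calls get_count nine times per row inside `iterrows`. The same tallies can be computed column by column with pandas string methods; a minimal sketch, assuming a hypothetical `add_counts` helper and only three forecaster columns for brevity. Like get_count, it uses a plain substring match, so "Leans" spellings would still need separate handling. For a table this small the change is mostly about readability rather than speed.

```python
import pandas as pd

CATEGORIES = ["Safe R", "Likely R", "Lean R", "Tilt R", "Tossup",
              "Safe D", "Likely D", "Lean D", "Tilt D"]


def add_counts(df, source_cols):
    """Return a copy of df with one count column per rating category."""
    out = df.copy()
    ratings_only = out[source_cols].astype(str)
    for cat in CATEGORIES:
        # Element-wise substring test, like `prediction in row[col]` in get_count
        hits = ratings_only.apply(lambda col, cat=cat: col.str.contains(cat, regex=False))
        out[cat] = hits.sum(axis=1)
    return out


# Three forecaster columns for Alabama and Arizona, copied from the table above
example = pd.DataFrame({
    "State": ["Alabama", "Arizona"],
    "Cook": ["Safe R", "Lean D (flip)"],
    "IE": ["Safe R", "Tilt D (flip)"],
    "Sabato": ["Safe R", "Tossup"],
})
counted = add_counts(example, ["Cook", "IE", "Sabato"])
print(counted[["State"] + CATEGORIES])
```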

In [8]:
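# Add the consensus rating, the number of forecasters who gave it (Max), and a
# confidence score. Confidence divides Max by 15, the number of forecasters on the
# page, although get_count only tallies 14 of them (Cook through 538).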
prediction = []
m = []

for index, row in ratings.iterrows():
    p, em = get_consensus(row)
    prediction.append(p)
    m.append(em)

ratings["Consensus"] = prediction
ratings["Max"] = m
ratings["Confidence"] = ratings["Max"] / 15

ratings.head()
Out[8]:
State Electoral Votes PVI 2016 result Average Ratings Cook IE Sabato Politico RCP ... Lean R Tilt R Tossup Safe D Likely D Lean D Tilt D Consensus Max Confidence
0 Alabama 9 R+14 62.1% R Safe R Safe R Safe R Safe R Safe R Safe R ... 0 0 0 0 0 0 0 Safe R 11 0.733333
1 Alaska 3 R+9 51.3% R Likely R Likely R Lean R Likely R Likely R Likely R ... 2 0 1 0 0 0 0 Likely R 10 0.666667
2 Arizona 11 R+5 48.9% R Tossup Lean D (flip) Tilt D (flip) Tossup Tossup Tossup ... 0 0 7 0 1 5 1 Tossup 7 0.466667
3 Arkansas 6 R+15 60.6% R Safe R Safe R Safe R Safe R Safe R Likely R ... 0 0 0 0 0 0 0 Safe R 10 0.666667
4 California 55 D+12 61.7% D Safe D Safe D Safe D Safe D Safe D Safe D ... 0 0 0 11 2 0 0 Safe D 11 0.733333

5 rows × 32 columns
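Cell 8 loops over rows with get_consensus. A vectorized sketch of the same three columns, assuming pandas and a hypothetical `add_consensus` helper. `DataFrame.idxmax(axis=1)` returns the first column holding each row's maximum, so the columns are reversed to reproduce the last-wins tie rule of the `>=` comparison. Confidence is computed as in the notebook, Max / 15; Alabama, for instance, gets 11 / 15 ≈ 0.733.

```python
import pandas as pd

# Count columns in the same order as the notebook's cell 7
CATEGORIES = ["Safe R", "Likely R", "Lean R", "Tilt R", "Tossup",
              "Safe D", "Likely D", "Lean D", "Tilt D"]
N_FORECASTERS = 15  # the divisor used for "Confidence" in the notebook


def add_consensus(df):
    out = df.copy()
    counts = out[CATEGORIES]
    # idxmax keeps the FIRST column with the row maximum; reversing the columns
    # makes it keep the LAST one, matching get_consensus's >= tie rule
    out["Consensus"] = counts[CATEGORIES[::-1]].idxmax(axis=1)
    out["Max"] = counts.max(axis=1)
    out["Confidence"] = out["Max"] / N_FORECASTERS
    return out


# Counts for Alabama and Arizona copied from the cell 7 output
example = pd.DataFrame(
    [[11, 2, 0, 0, 0, 0, 0, 0, 0],
     [0, 0, 0, 0, 7, 0, 1, 5, 1]],
    columns=CATEGORIES, index=["Alabama", "Arizona"])
result = add_consensus(example)
print(result[["Consensus", "Max", "Confidence"]])
```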

In [9]:
# Export the final table for the Tableau visual
# (`combined` is another name for `ratings`, not a copy)
combined = ratings

combined.to_csv('combined.csv', index=False)

combined.head()
Out[9]:
State Electoral Votes PVI 2016 result Average Ratings Cook IE Sabato Politico RCP ... Lean R Tilt R Tossup Safe D Likely D Lean D Tilt D Consensus Max Confidence
0 Alabama 9 R+14 62.1% R Safe R Safe R Safe R Safe R Safe R Safe R ... 0 0 0 0 0 0 0 Safe R 11 0.733333
1 Alaska 3 R+9 51.3% R Likely R Likely R Lean R Likely R Likely R Likely R ... 2 0 1 0 0 0 0 Likely R 10 0.666667
2 Arizona 11 R+5 48.9% R Tossup Lean D (flip) Tilt D (flip) Tossup Tossup Tossup ... 0 0 7 0 1 5 1 Tossup 7 0.466667
3 Arkansas 6 R+15 60.6% R Safe R Safe R Safe R Safe R Safe R Likely R ... 0 0 0 0 0 0 0 Safe R 10 0.666667
4 California 55 D+12 61.7% D Safe D Safe D Safe D Safe D Safe D Safe D ... 0 0 0 11 2 0 0 Safe D 11 0.733333

5 rows × 32 columns
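`combined = ratings` binds a second name to the same DataFrame; nothing is combined or copied, so any later change to `ratings` also shows up in `combined`. A minimal sketch of the difference, using a small stand-in with two states' Max values from the output above and writing the CSV to an in-memory buffer instead of combined.csv:

```python
import io

import pandas as pd

ratings = pd.DataFrame({"State": ["Alabama", "Alaska"], "Max": [11, 10]})

combined = ratings          # a second name for the same DataFrame
snapshot = ratings.copy()   # an independent copy

ratings.loc[0, "Max"] = 0   # visible through `combined`, not through `snapshot`

buf = io.StringIO()         # in-memory stand-in for 'combined.csv'
snapshot.to_csv(buf, index=False)
csv_lines = buf.getvalue().splitlines()
print(csv_lines)
```

If the exported file should not change when `ratings` is edited later, `ratings.copy()` is the safer choice.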

Dashboard created by: Michael Dunphy (he/him/his)
