
A Simple Computational Method Applied to Mail-In Ballot Data of 2020 Presidential Election

This is a tutorial on how to apply a computational method to downloaded mail-in ballot data.

I am not writing this specifically for programmers or academics; I am writing it for all American citizens. I will show a simple computational program applied to the Pennsylvania mail-in ballot data for the 2020 Presidential Election.

First, I will show precisely how an average individual can use this program on a different state’s mail-in ballot data. A user will need a text editor (I strongly recommend Sublime Text); a program to run code (R, which is free, optionally with the free RStudio interface); a script for that program (which I provide below; just copy and paste it into your RStudio console); and a dataset to download. If a dataset from a state other than Pennsylvania is downloaded, it may have a different number of columns, and/or the columns may represent different dates or other criteria. A user will need to change the column numbers in the program accordingly. For the Pennsylvania 2020 mail-in ballot dataset:

The 2nd column represented the voter’s registered party at the time of voting.

The 5th column represented when an applicant filed for a ballot.

The 6th column represented when an applicant was approved.

The 7th column represented when an applicant received the ballot.

The 8th column represented when an applicant returned the ballot.
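When adapting the script to another state's file, it can help to check what each column holds before choosing column numbers, and optionally to give the columns readable names. This sketch uses a made-up two-row data frame in place of the real file; the labels party, applied, approved, received and returned are hypothetical names chosen here for the Pennsylvania layout above, not names taken from the dataset.

```r
# Made-up stand-in for ElecData: 2 rows, 8 columns, named V1..V8 the way
# read.csv names them when header = FALSE
ElecData <- data.frame(matrix("", nrow = 2, ncol = 8), stringsAsFactors = FALSE)
names(ElecData) <- paste0("V", 1:8)
ElecData[, 2] <- c("D", "R")
ElecData[, 7] <- c("10/09/2020", "10/12/2020")
ElecData[, 8] <- c("10/09/2020", "10/20/2020")

ncol(ElecData)  # number of columns in the file
head(ElecData)  # first rows, to check which column holds which date

# Hypothetical readable labels for the Pennsylvania layout described above
names(ElecData)[c(2, 5, 6, 7, 8)] <- c("party", "applied", "approved", "received", "returned")

# Columns can now be compared by name as well as by number
sum(ElecData$received == ElecData$returned)
```

Renaming does not stop the numbered comparisons from working; ElecData[, 7] still refers to the same column.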

I thought it would be suspicious for an applicant to be able to receive a ballot in the mail, fill it out, and return it to the designated office on the same day, so I wrote an R script that counts each row where those two dates (the 7th and 8th columns) are identical. My result was 49,226. I believed it would be even more suspicious for the entire process to occur on a single date, so I replaced the 7 with a 5 to compare the 5th column (the date an applicant filed for a ballot) with the 8th column, and ran the script again. My result was 24,192. I also had the script output the registered party for every row where the specified columns were identical, but I am not sure whether that information is useful. If a user does not want that output, delete the line of code containing ElecData[i, 2].
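Because R compares whole columns element by element, the same two counts can also be produced without a loop, which is usually much faster on a file with millions of rows. The sketch below is an alternative to the author's loop, run on a small made-up data frame rather than the real dataset; on the real data, ElecData would be the data frame that the author's script reads in. table() then tallies the party of each matching row.

```r
# Made-up rows standing in for ElecData; only columns 2, 5, 7 and 8 matter here
ElecData <- data.frame(matrix("", nrow = 4, ncol = 8), stringsAsFactors = FALSE)
ElecData[, 2] <- c("D", "R", "D", "R")                                      # party
ElecData[, 5] <- c("10/01/2020", "10/05/2020", "10/09/2020", "10/02/2020")  # applied
ElecData[, 7] <- c("10/09/2020", "10/07/2020", "10/09/2020", "10/03/2020")  # received
ElecData[, 8] <- c("10/09/2020", "10/08/2020", "10/09/2020", "10/03/2020")  # returned

# Compare whole columns at once: TRUE where received (7) equals returned (8).
# na.rm = TRUE leaves out rows where either value is NA.
same_day <- ElecData[, 7] == ElecData[, 8]
n_received_returned <- sum(same_day, na.rm = TRUE)

# Same comparison for applied (5) versus returned (8)
n_applied_returned <- sum(ElecData[, 5] == ElecData[, 8], na.rm = TRUE)

# Party (column 2) of each matching row, tallied by party;
# which() keeps only the TRUE positions
party_counts <- table(ElecData[which(same_day), 2])

print(n_received_returned)
print(n_applied_returned)
print(party_counts)
```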

The dataset itself contained 3.079 million rows. Below are the steps for using the program.

  1. Download the desired dataset as a .csv (comma-separated values) file. TikTok user gregontheright was the first person I know of to try this analysis method, but he used spreadsheets. Consequently, he had to download the dataset as an .xls file and could only download a third of it. To simplify things, delete the header of the dataset, which is everything at the top that is not a row of data (column titles, the overall title of the dataset, etc.).
  2. Open up the file in your text editor.
  3. Copy and paste my program into an RStudio script or onto the console, and change the column numbers (the 7 and/or the 8 in the if statement inside the for loop) so that the appropriate columns of the downloaded dataset are compared.
  4. Run the first line in the console. It opens a file-selection window in the default directory that R pulls from. Find the downloaded .csv file and click it. If it is not there, move it to the correct directory.
  5. Run the 2nd line in the console. It reads the .csv file into a data frame (a table) called ElecData. The header = FALSE option tells R not to treat the first line as column titles, and stringsAsFactors = FALSE keeps text such as dates as plain strings so that the columns can be compared in the for loop.
  6. Run the rest of the code by copying and pasting onto the console.
  7. If an error occurs, read the error message in the console and respond accordingly. Many online forums can help if the user copies and pastes the error and asks for advice.
  8. Most likely, a hundred or so rows will contain errors. The program will stop on those lines and return an error about array size, listing the line of the file on which the error occurred. Go to your text editor (Sublime Text), press Ctrl+G, type in the line number that returned an error, delete (or fix) that line, save, open R again, and repeat from step 3 or step 1, depending on which version of R is installed.
  9. When all faulty lines are deleted, the program will iterate through the entire file and return your results. (For simplicity, I chose not to add a function that would skip faulty lines automatically, because the user would also need to load an additional library into R and deal with a bit more code.)
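Steps 8 and 9 locate faulty lines by hand. As an alternative (not the author's method), base R can list every line whose number of comma-separated fields differs from the most common count, using count.fields(), with no extra package. The sketch below writes a made-up four-line file to a temporary path in place of the real download. It assumes the file has no blank lines and no line breaks inside quoted fields, since either would shift the line numbering.

```r
# Made-up stand-in for a downloaded file: line 3 has one field too many
path <- tempfile(fileext = ".csv")
writeLines(c(
  "1,D,a,b,10/01/2020,10/02/2020,10/09/2020,10/09/2020",
  "2,R,a,b,10/05/2020,10/06/2020,10/07/2020,10/08/2020",
  "3,D,a,b,10/09/2020,10/09/2020,10/09/2020,10/09/2020,EXTRA",
  "4,R,a,b,10/02/2020,10/02/2020,10/03/2020,10/03/2020"
), path)

# Number of comma-separated fields on each line, using the same quote and
# comment settings that read.csv uses
fields <- count.fields(path, sep = ",", quote = "\"", comment.char = "")

# Treat the most common field count as the correct one
expected <- as.integer(names(which.max(table(fields))))
bad_lines <- which(fields != expected)
print(bad_lines)  # line numbers to inspect (Ctrl+G in Sublime Text)

# Optionally drop the bad lines and read only the well-formed ones
all_lines <- readLines(path)
if (length(bad_lines) > 0) {
  all_lines <- all_lines[-bad_lines]
}
ElecData <- read.csv(text = all_lines, header = FALSE, stringsAsFactors = FALSE)
```

On the real file, path would be the value returned by file.choose(). Header lines left at the top usually have a different field count, so they would also appear in bad_lines; alternatively, read.csv() accepts a skip = argument giving the number of leading lines to ignore.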

A user may want to compare different dates/columns from their dataset. All that is needed at this point is to change the numbers in the if statement (the 7 and/or the 8) to compare the desired columns.
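To compare several column pairs in one run instead of editing the numbers each time, the comparison can be wrapped in a small function. count_matches() is a hypothetical helper written for this sketch, and the data frame is made up. One design choice differs from the author's script: in R, two empty strings compare as equal, so a row where both cells are blank would be counted as a match; this helper leaves such rows out.

```r
# Hypothetical helper: count rows where column col_a equals column col_b,
# skipping rows where the values are blank ("") or missing (NA)
count_matches <- function(data, col_a, col_b) {
  a <- data[, col_a]
  b <- data[, col_b]
  sum(a == b & a != "", na.rm = TRUE)
}

# Made-up stand-in for ElecData; column 6 is left blank on purpose
ElecData <- data.frame(matrix("", nrow = 4, ncol = 8), stringsAsFactors = FALSE)
ElecData[, 5] <- c("10/01/2020", "10/09/2020", "10/02/2020", "")  # applied
ElecData[, 7] <- c("10/09/2020", "10/09/2020", "10/03/2020", "")  # received
ElecData[, 8] <- c("10/09/2020", "10/09/2020", "10/04/2020", "")  # returned

# Each pair is c(first column, second column)
pairs <- list(c(7, 8), c(5, 8), c(6, 8))
for (p in pairs) {
  cat("Columns", p[1], "and", p[2], "match in",
      count_matches(ElecData, p[1], p[2]), "rows\n")
}
```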



My RStudio script (do not include this line):

pullfile = file.choose()  # open a file dialog and store the path of the chosen .csv file
ElecData = read.csv(pullfile, header = FALSE, stringsAsFactors = FALSE)  # read the file into a table
# header = FALSE: the first line of the file is data, not column titles
# stringsAsFactors = FALSE: keep text (such as dates) as plain strings so columns can be compared

j = nrow(ElecData)  # number of rows; the loop below runs from row 1 to row j
n = 0               # number of matching rows found so far; starts at zero

# The for loop sets i to 1, 2, ..., j in turn.
# The if statement compares the 7th and 8th columns of row i.
# When they are equal, the registered party (column 2) is printed and 1 is added to n.
# After the first run, change the 7 to a 5 to compare columns 5 and 8.
for (i in 1:j)
{
  if (ElecData[i, 7] == ElecData[i, 8])
  {
    print(ElecData[i, 2])  # delete this line to stop printing the party
    n = n + 1
  }
}

print(n)  # the final count