June 2025

Can Fast Food Predict How a State Votes?

Linking the per-capita densities of 11 fast food franchises to 2024 presidential election outcomes: a logistic regression model reached 81% mean accuracy.

Python · Pandas · scikit-learn · seaborn · GeoPandas · BeautifulSoup · Jupyter

Walkthrough Video

Overview

For COGS 108 (Data Science in Practice), my team of five investigated whether the density of fast food franchises across U.S. states could predict voting outcomes in the 2024 presidential election. We collected location data for 11 major chains, from Arby's to Starbucks, normalized the counts per 100,000 residents, and ran statistical tests and a logistic regression classifier against county-level election results aggregated to the state level. The model averaged 81% accuracy across 50 randomized train-test splits, and a one-sample t-test of those accuracies against a 50% chance baseline gave p < 0.000001.

The headline: it isn't fast food density overall that correlates with voting — it's which chains are where. Arby's, Wendy's, Chick-fil-A, KFC, and Domino's cluster in Republican-leaning states. Dunkin' Donuts and Starbucks skew Democratic. McDonald's and Chipotle sit in the middle and don't predict much of anything.

Research question

Is there a relationship between the density of specific fast food franchises and state-level voting outcomes in the 2024 U.S. presidential election?

Team

Travis Henry — ML model, web scraping, walkthrough video
Aman Dhillon — data visualization, EDA write-up
Samuel Gonzalez — t-tests, p-values, graph generation
Devin Park — visualization, EDA, written conclusion
Jonathan Ty (me) — web scraping, dataset collection and upload

Background

Fast food is deeply embedded in American culture, but its significance extends beyond convenience. Campaign finance records consistently show Republicans and Democrats spend money at very different types of restaurants — GOP candidates favor McDonald's and Chick-fil-A while Democratic candidates tend toward Panera and Chipotle. These aren't coincidences. They reflect deeper cultural and economic identities that show up in voting behavior.

Fast food consumption isn't strictly a low-income phenomenon either. Research shows it actually peaks at middle income levels, then falls off among the highest earners — people in the top income group are 54.5% less likely to eat fast food than the lowest group. This complexity is part of why the data seemed interesting: franchise density encodes regional culture, economic patterns, and demographic identity all at once.

We focused on eleven chains that reflect a range of political, economic, and regional identities: Starbucks, Subway, McDonald's, Taco Bell, Domino's, Chick-fil-A, Arby's, Wendy's, KFC, Dunkin' Donuts, and Chipotle. Some of these (Chick-fil-A, Chipotle) have well-documented partisan associations in campaign spending data. Others are national staples with broad reach. Together they gave us a feature space rich enough to train a classifier on 50 data points — one per state.

Data

12 datasets, two collection methods

Getting the data required two approaches. Some chains had clean datasets we could load directly, mostly from Kaggle: Wendy's, McDonald's (via a CSUN research table), Starbucks, and Subway. The rest required scraping the chains' official location pages with Selenium and BeautifulSoup, since the data either didn't exist in structured form or was paywalled.

Dataset #1: 2024 County-Level Presidential Results (Kaggle, 3,088 rows) — aggregated to state level for modeling
CSV chains: Wendy's, McDonald's, Starbucks, Subway — cleaned and grouped by 2-letter state abbreviation
Scraped chains: Taco Bell, Domino's, KFC, Dunkin' Donuts, Chipotle, Arby's — parsed from each chain's official location pages
Population data: pulled from the TotalPop field in the election dataset for per-capita normalization

My contribution was building and running the web scraping pipeline for the six chains without accessible datasets. Each chain's site had a different structure — dropdowns, paginated tables, flat lists — which meant writing custom parsing logic for each one rather than a single reusable scraper.
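To give a sense of what one of those custom parsers looks like, here is a minimal sketch using BeautifulSoup on an inline HTML string. The markup, class names, and the `count_locations_by_state` helper are all hypothetical; each real chain page needed its own selectors, and pages that rendered content with JavaScript also needed Selenium to load them first.

```python
from collections import Counter

from bs4 import BeautifulSoup

# Hypothetical HTML standing in for one chain's "all locations" page.
SAMPLE_HTML = """
<ul class="locations">
  <li class="location"><span class="city">Austin</span>, <span class="state">TX</span></li>
  <li class="location"><span class="city">Dallas</span>, <span class="state">TX</span></li>
  <li class="location"><span class="city">Boise</span>, <span class="state">ID</span></li>
</ul>
"""


def count_locations_by_state(html: str) -> Counter:
    """Count location entries per two-letter state code in the given HTML."""
    soup = BeautifulSoup(html, "html.parser")
    # Assumed structure: each location carries its state in <span class="state">
    state_tags = soup.find_all("span", class_="state")
    return Counter(tag.get_text(strip=True) for tag in state_tags)


counts = count_locations_by_state(SAMPLE_HTML)
print(counts)
```

The resulting per-state counts have the same shape as the CSV-based chains once they are grouped by state abbreviation, so both sources can feed the same downstream tables.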

Loading and cleaning the election data

Python

import pandas as pd

county_stats = pd.read_csv("2024_US_County_Level_Presidential_Results.csv")
county_stats = county_stats.dropna(
    subset=['state_name', 'votes_gop', 'votes_dem', 'total_votes']
)

# Aggregate from county level to state level
state_totals = county_stats.groupby('state_name').agg({
    'votes_gop': 'sum',
    'votes_dem': 'sum',
    'total_votes': 'sum',
    'TotalPop': 'sum'  # state population, used later for per-capita normalization
}).reset_index()

state_totals['pct_trump'] = state_totals['votes_gop'] / state_totals['total_votes']
state_totals['pct_harris'] = state_totals['votes_dem'] / state_totals['total_votes']

# Drop ME and NE because they split electoral votes and complicate binary classification.
# .copy() avoids SettingWithCopyWarning when columns are added to this frame later.
state_votepct = state_totals[~state_totals['state_name'].isin(['Maine', 'Nebraska'])].copy()
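Later snippets merge on a `State` column of two-letter codes and filter on a `vote_diff` column, neither of which is created in the code shown here. A minimal sketch of how they could be derived follows. The `state_abbrev` dictionary is truncated and the vote shares are made up for illustration, and defining `vote_diff` as Trump share minus Harris share is an assumption (it matches the later convention that positive values mean a red state).

```python
import pandas as pd

# Hypothetical, truncated name-to-code mapping; the real dict would cover every state.
state_abbrev = {"California": "CA", "Texas": "TX", "Vermont": "VT"}

# Made-up vote shares, only to make the example self-contained.
state_votepct = pd.DataFrame({
    "state_name": ["California", "Texas", "Vermont"],
    "pct_trump": [0.38, 0.56, 0.32],
    "pct_harris": [0.59, 0.42, 0.64],
})

# Two-letter code used as the join key against the per-chain tables
state_votepct["State"] = state_votepct["state_name"].map(state_abbrev)

# Assumed definition: positive margin means the state went Republican
state_votepct["vote_diff"] = state_votepct["pct_trump"] - state_votepct["pct_harris"]

# Names missing from the mapping come back as NaN, so it is worth checking for them
unmapped = state_votepct[state_votepct["State"].isna()]
print(state_votepct[["State", "vote_diff"]])
```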

Loading a CSV-based chain (Wendy's)

Python

wendys_locations = pd.read_csv("wendys_restaurants.csv")

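# Extract the first two-letter uppercase token from the address2 column as the state code
# (expand=False returns a Series rather than a one-column DataFrame)
wendys_locations['state'] = wendys_locations['address2'].str.extract(
    r'\b([A-Z]{2})\b', expand=False
)

# state_abbrev is a dict defined earlier in the notebook; its values are the 50 US state codes.
# Filtering against it also drops rows where the regex picked up a non-state token.
valid_states = list(state_abbrev.values())
wendys_locations = wendys_locations[wendys_locations['state'].isin(valid_states)]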

wendys_locations = wendys_locations['state'].value_counts().reset_index()
wendys_locations.columns = ['state', "Wendy's"]
wendys_locations = wendys_locations.set_index('state')
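Once every chain has a table like this, the per-chain counts need to be combined into one state-by-chain table. The names `food_df` and `fast_food_by_state` appear in later snippets, but how they were built is not shown, so the following is only a sketch under that assumption, using made-up counts for two chains.

```python
import pandas as pd

# Made-up counts, indexed by state code like the Wendy's table above
wendys = pd.DataFrame({"Wendy's": [300, 400]},
                      index=pd.Index(["CA", "TX"], name="state"))
arbys = pd.DataFrame({"Arby's": [100, 50]},
                     index=pd.Index(["TX", "VT"], name="state"))

# Align on the state index (outer join); a chain with no locations
# in a state shows up as NaN, which is filled with 0 here
food_df = pd.concat([wendys, arbys], axis=1).fillna(0).astype(int)

# Total restaurants per state across all chains
fast_food_by_state = food_df.sum(axis=1)
print(food_df)
```

Filling missing values with zero treats "no rows found" as "no locations", which is only safe if the scrape or dataset for that chain is known to be complete.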

Exploratory Data Analysis

2024 election results

Most states are not close calls. The vote share histogram shows a mostly bimodal distribution — states tend to cluster toward one candidate rather than landing near 50/50. The geographic map makes regional patterns visible: the South, Midwest, and Mountain West went Republican; the West Coast and Northeast went Democratic.

Fast food totals by state

Raw location counts predictably track population: California, Texas, and Florida lead by a wide margin. But raw counts tell us nothing about relative density. A state with 2,000 restaurants and 40 million people (5 per 100,000 residents) is very different from one with 400 restaurants and 500,000 residents (80 per 100,000).

Per-capita normalization

Normalizing to restaurants per 100,000 residents shifts the picture significantly. West Virginia, Mississippi, and Kentucky rank highest — the Southeast dominates. Some Northeastern states like Rhode Island and Connecticut also rank high despite small geographic size, likely due to higher urban density.

Python

fast_food_df = fast_food_by_state.rename_axis('State').reset_index(name='restaurant_count')
merged_df = pd.merge(fast_food_df, state_votepct[['State', 'TotalPop']], on='State')

merged_df['restaurants_per_100k'] = (
    merged_df['restaurant_count'] / merged_df['TotalPop']
) * 100_000

merged_df = merged_df.sort_values('restaurants_per_100k', ascending=False)

Statistical Analysis

Before building a model, we tested whether overall fast food density differed significantly between red and blue states. It didn't. The independent samples t-test returned a t-statistic of -0.103 and a p-value of 0.919. The Pearson correlation between density and vote margin was r = 0.139 with p = 0.337. No signal. The KDE plot below confirms it — the distributions for red and blue states almost completely overlap.

Python

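from scipy.stats import ttest_ind, pearsonr

# Positive vote_diff marks a red state, negative a blue state
red = merged_df[merged_df['vote_diff'] > 0]['restaurants_per_100k']
blue = merged_df[merged_df['vote_diff'] < 0]['restaurants_per_100k']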

t_stat, p_val = ttest_ind(red, blue)
r, r_p = pearsonr(merged_df['restaurants_per_100k'], merged_df['vote_diff'])

# Results:
# t = -0.103, p = 0.9187  → no significant difference between red and blue states
# r =  0.139, p = 0.3366  → no significant linear correlation with vote margin

Total density doesn't predict voting. But when we looked at individual chains, the picture changed entirely.

Machine Learning Model

We built a logistic regression classifier where each state is one data point, the features are per-capita restaurant counts for all 11 chains, and the label is the 2024 winner. With only 50 states, cross-validation was essential. We ran 50 randomized 60/40 train-test splits to get a reliable accuracy estimate.

Python

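from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Label each state by whichever candidate had the larger vote share
state_votepct['winner'] = state_votepct.apply(
    lambda row: 'red' if row['pct_trump'] > row['pct_harris'] else 'blue', axis=1
)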

# Normalize each chain per 100k residents
for col in food_df.columns:
    model_df[col] = (model_df[col] / model_df['TotalPop']) * 100_000

X, y = model_df[food_df.columns], model_df['winner']

accuracies = []
for seed in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    accuracies.append(clf.score(X_te, y_te))

# mean(accuracies) ≈ 0.81
# one-sample t-test vs. 0.5 → t = 28.48, p < 0.000001
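The one-sample t-test mentioned in the comment above is not shown in the snippet. A minimal sketch with `scipy.stats.ttest_1samp` follows; the accuracies here are randomly generated stand-ins, not the project's actual values, so the printed numbers will not match the reported t = 28.48.

```python
import numpy as np
from scipy.stats import ttest_1samp

# Made-up accuracies standing in for the 50 values collected in the loop above
rng = np.random.default_rng(0)
accuracies = rng.normal(loc=0.8, scale=0.07, size=50).clip(0, 1)

# Test whether the mean accuracy differs from the 0.5 chance baseline
t_stat, p_val = ttest_1samp(accuracies, popmean=0.5)
print(f"mean = {accuracies.mean():.3f}, t = {t_stat:.2f}, p = {p_val:.2e}")
```

One caveat on interpretation: the 50 splits all draw from the same set of states, so the accuracies are not independent samples, and a p-value from this test is likely to be optimistic.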

Performance

Mean accuracy across 50 runs: 0.81. A one-sample t-test against a 0.5 baseline (random chance) returned t = 28.48 and p < 0.000001. The model significantly and consistently outperforms chance. The confusion matrix shows it performs somewhat better at identifying Republican-leaning states than Democratic ones — likely because Republican states make up the majority of the training data.
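Because red states outnumber blue ones in the data, a classifier that always predicts the majority class would already score above 0.5. A minimal sketch of computing that majority-class baseline, which is a stricter point of comparison than random chance, might look like the following. The labels are made up for illustration; the real ones would come from the `winner` column.

```python
from collections import Counter

# Made-up labels for illustration only
y = ["red"] * 6 + ["blue"] * 4

# Accuracy of always guessing the most common label
counts = Counter(y)
majority_label, majority_count = counts.most_common(1)[0]
majority_baseline = majority_count / len(y)

print(majority_label, majority_baseline)
```

Comparing the model's mean accuracy against this number, rather than against 0.5, shows how much the franchise features add beyond simply knowing which outcome is more common.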

Key Findings

Averaging the logistic regression coefficients across all 50 runs reveals which chains are actually doing the predictive work. Positive coefficients indicate association with Republican voting; negative with Democratic.

Republican-associated: Arby's, Wendy's, Chick-fil-A, KFC, Domino's — all significantly higher per-capita density in red states (p < 0.05 on t-tests)
Democratic-associated: Dunkin' Donuts, Starbucks — significantly higher per-capita density in blue states
No significant association: McDonald's and Chipotle — broadly distributed across both partisan leanings
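The coefficient averaging described above can be sketched as follows. The data is synthetic and the column names `chain_a` and `chain_b` are hypothetical; only the split-fit-collect loop mirrors the setup described in this write-up. Since scikit-learn sorts class labels, `classes_` is `['blue', 'red']` and a positive coefficient pushes toward "red".

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the per-capita feature table
rng = np.random.default_rng(0)
n = 50
y = np.array(["red"] * 30 + ["blue"] * 20)
is_red = (y == "red").astype(float)
X = pd.DataFrame({
    "chain_a": rng.normal(3.0, 1.0, n) + 2.0 * is_red,  # built to be denser in red states
    "chain_b": rng.normal(3.0, 1.0, n) - 2.0 * is_red,  # built to be denser in blue states
})

coefs = []
for seed in range(50):
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.4, random_state=seed)
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y_tr)
    # coef_[0] holds one weight per feature for the "red" class
    coefs.append(clf.coef_[0])

# Average each feature's weight across all runs
mean_coefs = pd.Series(np.mean(coefs, axis=0), index=X.columns).sort_values()
print(mean_coefs)
```

Coefficient magnitudes depend on each feature's scale, so comparing them across chains is most meaningful when the per-capita values are on similar scales. Standardizing the features first is one option, though that is not something this write-up describes doing.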

Franchise-level distributions

The KDE plots below show the per-capita distribution of each chain in red vs blue states. For Arby's, Wendy's, and KFC, the separation between the two distributions is visually clear. For McDonald's, the overlap is nearly total — exactly what the coefficient plot suggested.
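The visual separation can also be checked numerically by running the same kind of independent-samples t-test once per chain. This is a sketch on synthetic data with hypothetical column names, not the project's actual per-chain results.

```python
import numpy as np
import pandas as pd
from scipy.stats import ttest_ind

# Synthetic per-capita densities for 30 "red" and 20 "blue" states
rng = np.random.default_rng(1)
is_red = np.array([True] * 30 + [False] * 20)
per_capita = pd.DataFrame({
    "chain_a": rng.normal(3.0, 1.0, 50) + 2.0 * is_red,  # built to differ by group
    "chain_b": rng.normal(3.0, 1.0, 50),                 # no built-in difference
})

rows = []
for chain in per_capita.columns:
    red = per_capita.loc[is_red, chain]
    blue = per_capita.loc[~is_red, chain]
    t_stat, p_val = ttest_ind(red, blue)
    rows.append({"chain": chain, "t": t_stat, "p": p_val})

results = pd.DataFrame(rows).set_index("chain")
print(results)
```

With 11 chains tested at p < 0.05, a few low p-values can appear by chance alone. A stricter threshold such as a Bonferroni-style 0.05 / 11 is one way to guard against that, offered here as a suggestion rather than a description of what the project did.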

Limitations

State-level aggregation obscures a lot. Urban and rural areas within the same state vote very differently, and they have very different fast food environments too. Texas as a single data point papers over the gap between Houston and the Panhandle. A county-level analysis would likely surface stronger and more nuanced patterns — but matching county-level restaurant counts with county-level votes was outside the scope of this project.

The Modifiable Areal Unit Problem (MAUP) is real here: the patterns we see at the state level might look completely different at the county or ZIP code level. We're also working with 50 data points, which is enough to identify patterns but not enough to be confident about generalizability. Fast food density is shaped by income, urbanization, and regional infrastructure in ways that may fully explain the voting correlation without any direct causal relationship.

We're identifying correlations, not causes. The finding that Arby's per-capita count predicts Republican voting probably says more about the rural South and Midwest — where Arby's happens to cluster — than it does about Arby's specifically. That said, the signal is statistically significant and consistent across methodologies. Whether it's about the chains themselves or the places those chains call home is a question worth asking.