01 · Pipeline From FEC bulk file to pixel, in five stages.
Every figure and node you see starts life as a row in one of the
FEC's public bulk files. Five deterministic stages turn those
rows into the entities, flows, and graphs on this site. No
stage is manual; the whole thing re-runs nightly.
01
Pull
FEC bulk ZIPs via official manifest
02
Parse
Pipe-delimited → typed Parquet
03
Resolve
Names, addresses → stable IDs
04
Link
Build transaction edges
05
Serve
Indexed reads, static JSON cache
Why this matters: every panel on the site traces back
to an FEC filing ID and a line number. If a number looks
wrong, you can click through to the original row.
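As an illustration of the Parse stage, here is a minimal sketch of turning pipe-delimited text into typed records. The four-column layout, the `Disbursement` class, and the MMDDYYYY date format are assumptions made for the example; real FEC files carry many more columns, and writing Parquet is left out to keep the sketch dependency-free.

```python
import csv
import io
from dataclasses import dataclass
from datetime import date, datetime
from decimal import Decimal


@dataclass(frozen=True)
class Disbursement:
    """Hypothetical typed record; not the site's actual schema."""
    committee_id: str
    payee_name: str
    transaction_date: date
    amount: Decimal


def parse_rows(text: str) -> list[Disbursement]:
    """Parse pipe-delimited text into typed records (assumed 4-column layout)."""
    reader = csv.reader(io.StringIO(text), delimiter="|", quoting=csv.QUOTE_NONE)
    rows = []
    for committee_id, payee_name, raw_date, raw_amount in reader:
        rows.append(
            Disbursement(
                committee_id=committee_id,
                payee_name=payee_name,
                # Assumption: dates arrive as MMDDYYYY strings.
                transaction_date=datetime.strptime(raw_date, "%m%d%Y").date(),
                # Decimal avoids float rounding on dollar amounts.
                amount=Decimal(raw_amount),
            )
        )
    return rows


# Made-up sample rows, not real filings.
sample = (
    "C00000001|ACME MEDIA LLC|03152024|1250.00\n"
    "C00000002|Acme Media|04012024|300.50\n"
)
records = parse_rows(sample)
```

Typing the rows at parse time means a malformed date or amount fails loudly here rather than surfacing later as a wrong total.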
02 · Sources Only public FEC bulk files. Nothing else.
We pull from the Federal Election Commission's bulk data
manifest. That's the same feed the FEC's own site uses. We
do not buy vendor lists, scrape campaign sites, or accept
tips. If it isn't in a bulk file, it isn't here.
pas2.zip
PAC & committee contributions
Contributions to candidates from other committees (PACs, parties, joint-fundraisers).
refresh · nightly · ~5M rows / cycle
oth.zip
Other committee transactions
Transfers, refunds, loans, and non-contribution receipts between committees.
refresh · nightly · ~8M rows / cycle
oppexp.zip
Operating expenditures
Disbursements by committees — the vendor side: consultants, ads, payroll, travel.
refresh · nightly · ~40M rows / cycle
cm.zip
Committee master
Registry of every active committee: ID, type, treasurer, address, designation.
refresh · weekly · ~35k rows
cn.zip
Candidate master
Registered candidates with office, party, district, linked principal committee.
refresh · weekly · ~8k rows
Source of record: fec.gov/data/browse-data.
Each file we ingest is hashed on download; the hash is kept with the row so you can prove the provenance of any figure.
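Hashing a downloaded file can be sketched as follows. The text does not name the hash algorithm, so SHA-256 is an assumption here, as are the function name `file_sha256` and the throwaway sample file.

```python
import hashlib
import tempfile
from pathlib import Path


def file_sha256(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Demo on a throwaway file standing in for a downloaded bulk ZIP.
with tempfile.TemporaryDirectory() as tmp:
    sample_path = Path(tmp) / "sample.zip"
    sample_path.write_bytes(b"not a real FEC file")
    recorded_hash = file_sha256(sample_path)
```

Storing `recorded_hash` next to every row derived from the file lets anyone re-download the file later and confirm they are looking at the same bytes.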
03 · Ingest Refreshed every night at 02:00 ET.
The FEC posts updated bulk files roughly daily during filing
season and weekly off-season. We poll their manifest at
02:00 America/New_York, download anything whose
hash changed, and run the pipeline end-to-end. The site gets a
new cache at roughly 03:40 ET.
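The "download anything whose hash changed" step amounts to a diff between two manifest snapshots. A minimal sketch, assuming each manifest has been reduced to a hypothetical mapping of file name to hash string:

```python
def changed_files(previous: dict[str, str], current: dict[str, str]) -> list[str]:
    """Return names whose hash is new or different since the last poll."""
    return sorted(
        name for name, digest in current.items() if previous.get(name) != digest
    )


# Made-up hashes for illustration only.
last_night = {"pas2.zip": "aaa", "oth.zip": "bbb"}
tonight = {"pas2.zip": "aaa", "oth.zip": "ccc", "cm.zip": "ddd"}
to_download = changed_files(last_night, tonight)
```

Here `oth.zip` is picked up because its hash changed and `cm.zip` because it was not seen before; `pas2.zip` is skipped.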
Filings submitted on paper (mostly Senate committees pre-2018)
are processed by FEC staff and can lag the electronic feed by
several weeks. We flag these with a paper-lag
marker on the entity page.
What "last synced" means on an entity page
The timestamp shown on any detail page is the moment our
pipeline finished writing that entity's record, not the moment
it was filed with the FEC. The filing date is shown
separately on each transaction row.
04 · Resolution Matching the same vendor across millions of rows.
Every FEC-registered committee already carries a stable ID.
The harder job is vendors: they don't file with the FEC,
so "Acme Media LLC" on one disbursement and "ACME MEDIA" on
another need to be recognized as the same entity. We build
that ID mechanically.
R1
Normalize first, match second. Vendor names are lowercased, stripped of punctuation, and cleared of common suffixes (LLC, Inc, Co). Addresses are run through USPS AMS conventions.
R2
Block by ZIP + normalized name. Potential matches are only considered within the same 5-digit ZIP and normalized name root. This keeps pairwise work bounded.
R3
Score with a weighted edit distance. Name root, street number, and city get the most weight. A similarity score of 0.82 or higher merges automatically; a score between 0.72 and 0.82 is surfaced for review; anything lower stays separate.
R4
Committees are matched by FEC ID only. Every registered committee already has a stable C0… identifier. We never fuzz-merge committees — a new ID is a new entity.
R5
Vendors use the address hash. Because vendors have no FEC ID, we cluster operating expenditures by (normalized-name, normalized-address), hash that pair, and treat the hash as the vendor's ID.
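Rules R1, R2, and R5 can be sketched roughly as below. Everything here is illustrative: the suffix list is only the three suffixes named in R1, `normalize_address` is a plain-text placeholder rather than USPS standardization, treating the first token as the "name root" is an assumption, and SHA-256 truncated to 16 hex characters is a made-up ID format.

```python
import hashlib
import re

# Suffixes named in R1; a production list would likely be longer.
SUFFIXES = {"llc", "inc", "co"}


def normalize_name(name: str) -> str:
    """Lowercase, drop punctuation, and remove common corporate suffixes."""
    cleaned = re.sub(r"[^\w\s]", "", name.lower())
    return " ".join(tok for tok in cleaned.split() if tok not in SUFFIXES)


def normalize_address(address: str) -> str:
    """Placeholder: lowercase and strip punctuation (not USPS standardization)."""
    return " ".join(re.sub(r"[^\w\s]", "", address.lower()).split())


def block_key(name: str, zip_code: str) -> tuple[str, str]:
    """Records are only compared when they share this key."""
    root = normalize_name(name).split(" ")[0]  # assumption: root = first token
    return (zip_code[:5], root)


def vendor_id(name: str, address: str) -> str:
    """Hash the normalized (name, address) pair into a stable ID."""
    payload = f"{normalize_name(name)}|{normalize_address(address)}"
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:16]


id_a = vendor_id("Acme Media LLC", "100 Main St., Suite 4")
id_b = vendor_id("ACME MEDIA", "100 MAIN ST SUITE 4")
```

With these normalizers, the two spellings of Acme Media collapse to the same ID, while any difference that survives normalization produces a different hash and therefore a separate entity.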
Precision over recall. When in doubt, we keep two
entities separate rather than merge them incorrectly. You'll
occasionally see the same vendor listed twice — that's the
trade-off.
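The merge, review, or keep-separate decision can be sketched as a weighted similarity score checked against the two thresholds. This is not the site's scorer: `difflib.SequenceMatcher` stands in for the edit-distance measure, the weights are invented, and treating 0.82 and 0.72 as inclusive lower bounds is an assumption.

```python
from difflib import SequenceMatcher

# Hypothetical weights; the text only says these three fields weigh most.
WEIGHTS = {"name_root": 0.5, "street_number": 0.3, "city": 0.2}
MERGE_AT = 0.82
REVIEW_AT = 0.72


def similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a normalized edit distance."""
    return SequenceMatcher(None, a, b).ratio()


def score(left: dict[str, str], right: dict[str, str]) -> float:
    """Weighted sum of per-field similarities."""
    return sum(w * similarity(left[f], right[f]) for f, w in WEIGHTS.items())


def decide(value: float) -> str:
    """Map a score to an action, defaulting to keeping entities separate."""
    if value >= MERGE_AT:
        return "merge"
    if value >= REVIEW_AT:
        return "review"
    return "separate"


acme = {"name_root": "acme media", "street_number": "100", "city": "austin"}
zenith = {"name_root": "zenith printing", "street_number": "987", "city": "boise"}
same_vendor = decide(score(acme, acme))
different_vendor = decide(score(acme, zenith))
```

Because "separate" is the fall-through branch, any pair the scorer is unsure about stays unmerged, which is the precision-over-recall trade-off in code form.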
05 · Classification Every committee gets one party label. Here's how.
The party color on every node comes from a single classification
pass. The rule order matters — the first rule that fires wins.
1
Explicit FEC party code. If the committee master lists DEM or REP, we use it verbatim. Minor parties become other.
2
Candidate committees inherit. A candidate's principal committee takes the candidate's registered party.
3
PACs labeled by beneficiary flow. For un-coded PACs, we look at the last 8 quarters of outgoing contributions. ≥80% to one party → that party. Between 20% and 80% → bipartisan.
4
Vendors never get a party. Vendors receive money from many committees — a vendor is unaligned by definition, even if 95% of revenue is from one side.
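The first-match-wins rule order can be sketched as a single function. The signature, the `None` result for unlabeled entities, and the assumption that `outgoing_by_party` holds only DEM and REP totals for the last 8 quarters are all illustrative choices, not the site's code.

```python
from typing import Optional


def party_label(code: str) -> str:
    """DEM and REP pass through verbatim; everything else becomes 'other'."""
    return code if code in ("DEM", "REP") else "other"


def classify(
    is_vendor: bool,
    fec_party_code: Optional[str] = None,
    candidate_party: Optional[str] = None,
    outgoing_by_party: Optional[dict[str, float]] = None,
) -> Optional[str]:
    # Rule 1: explicit FEC party code on the committee master.
    if fec_party_code:
        return party_label(fec_party_code)
    # Rule 2: principal committees inherit the candidate's party
    # (assumption: the same DEM/REP/other mapping applies).
    if candidate_party:
        return party_label(candidate_party)
    # Rule 3: un-coded PACs are labeled by where their contributions went.
    if not is_vendor and outgoing_by_party:
        total = sum(outgoing_by_party.values())
        if total > 0:
            top_party, top_amount = max(
                outgoing_by_party.items(), key=lambda item: item[1]
            )
            return top_party if top_amount / total >= 0.80 else "bipartisan"
    # Rule 4: vendors, and anything with no signal, get no party.
    return None
```

For example, `classify(False, outgoing_by_party={"DEM": 900.0, "REP": 100.0})` returns `"DEM"`, a 600/400 split returns `"bipartisan"`, and `classify(True)` returns `None`.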
06 · Connections How we represent connections between entities.
Money flows form a directed multigraph: entities
(committees and vendors) are connected by rolled-up
flows over a chosen window.
Edge construction
Two entities are connected when at least one FEC transaction
flows between them in the selected date range. Multiple
transactions are summed into a single weight; the underlying
rows are still queryable from the detail panel.
Edge direction
Direction follows the money: source → target
means the source committee disbursed to the target. Refunds
flip direction and show as a negative-weighted back-edge.
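Edge construction and direction can be sketched as a roll-up over transactions in a date window. The `Transaction` shape is hypothetical, and the sketch assumes a refund row records the original payer as `source`, so flipping it yields the back-edge.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date
from decimal import Decimal


@dataclass(frozen=True)
class Transaction:
    """Hypothetical transaction record for illustration."""
    source: str
    target: str
    amount: Decimal
    filed: date
    is_refund: bool = False


def roll_up(
    transactions: list[Transaction], start: date, end: date
) -> dict[tuple[str, str], Decimal]:
    """Sum in-window transactions into one weight per directed edge."""
    edges: dict[tuple[str, str], Decimal] = defaultdict(Decimal)
    for tx in transactions:
        if not (start <= tx.filed <= end):
            continue
        if tx.is_refund:
            # Refunds flip direction and carry a negative weight.
            edges[(tx.target, tx.source)] -= tx.amount
        else:
            edges[(tx.source, tx.target)] += tx.amount
    return dict(edges)


# Made-up IDs and amounts.
txs = [
    Transaction("C001", "V_ACME", Decimal("1000"), date(2024, 3, 1)),
    Transaction("C001", "V_ACME", Decimal("500"), date(2024, 3, 20)),
    Transaction("C001", "V_ACME", Decimal("200"), date(2024, 4, 2), is_refund=True),
    Transaction("C001", "V_ACME", Decimal("999"), date(2023, 1, 1)),  # out of window
]
edges = roll_up(txs, date(2024, 1, 1), date(2024, 12, 31))
```

The two in-window payments collapse into one forward edge of 1500, the refund appears as a separate back-edge of -200, and the 2023 row is excluded by the window.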
Weighting
Connection weight is total dollars through the edge in the
window. Aggregate totals on entity pages use the log
of through-flow so the biggest committees don't dominate by
five orders of magnitude.
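A small sketch of the log scaling, assuming base 10 and a floor of one dollar so that zero or tiny flows do not produce negative or undefined values (neither detail is specified in the text):

```python
import math


def display_scale(through_flow_dollars: float) -> float:
    """Base-10 log of through-flow, floored at $1 (assumed convention)."""
    return math.log10(max(through_flow_dollars, 1.0))


small = display_scale(100.0)          # a $100 committee
large = display_scale(10_000_000.0)   # a $10M committee
```

On the raw scale the larger committee is 100,000 times bigger; on the log scale the two values are 2 and 7, so both remain visible in the same chart.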
07 · Coverage What we have and don't have.
Political Flow covers federal elections only. Coverage varies
by filing type — here's the complete picture.
| Filing type | Since | Rows | Status |
| --- | --- | --- | --- |
| Committee master (PACs, parties, campaigns) | 1980 | 35k | Complete |
| Candidate master | 1980 | 8.2k | Complete |
| PAC & committee contributions | 1980 | 64M | Complete |
| Operating expenditures | 2004 | 412M | Complete |
| Independent expenditures | 1980 | 1.8M | Complete |
| Senate paper filings | 1980–2018 | — | Partial · paper-lag |
| Dark-money 501(c)(4) spend | — | — | Not in FEC scope |
| State & local elections | — | — | Out of scope |
Known gap: "dark money" spending by 501(c)(4) groups
on issue ads is only visible when those groups file
independent expenditure reports with the FEC. A large
fraction of influence spending is genuinely off-ledger. We
cannot show what the FEC doesn't see.
08 · Caveats Things to keep in mind before reading a chart.
Money is a proxy. It's a good one, but it's still a proxy,
and it's subject to filing error, strategic routing, and
reporting windows. Specific caveats:
a
Earmarks double-count at a glance. A $2,800 contribution earmarked through ActBlue to a candidate appears in both the ActBlue row and the candidate row. Aggregate totals net them out; entity pages don't.
b
Filing windows skew comparisons. Quarterly filers and monthly filers report on different schedules. A quarter-over-quarter comparison between two committees on different schedules can mislead. Use cycle-to-date where possible.
c
Vendor pass-throughs. A committee may pay a media-buying vendor, which pays TV stations. The end beneficiary of the spend is the station, not the vendor — we show the vendor edge because that's what was filed.
09 · Reproduce Do it yourself. Here's the shape of the work.
Everything on this site is derived from files you can download
too. If you don't trust a number, the fastest way to check is
to reproduce it. Rough outline for any given figure:
1
Find the source file. Every detail page links the bulk file and the row range it pulled from — e.g. indiv24.zip · rows 14,203,115–14,203,182.
2
Download from fec.gov. Use the bulk data index. Verify the hash matches what we recorded at ingest — if it doesn't, the FEC re-issued the file and our numbers may be stale.
3
Load into any SQL engine. DuckDB, SQLite, Postgres: any of them works. The FEC documents the column order, and our header files follow it exactly.
4
Re-run the aggregate. Entity pages show the exact GROUP BY logic and window. Most figures are a single aggregate query.
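Steps 3 and 4 can be sketched with Python's built-in `sqlite3` module. The table name, the four-column subset, and the sample rows are invented for illustration; a real bulk file has many more columns, and the exact GROUP BY for a given figure is the one shown on its entity page.

```python
import csv
import io
import sqlite3

# Made-up pipe-delimited rows standing in for an extracted bulk file.
SAMPLE = (
    "C00000001|ACME MEDIA LLC|03152024|1250.00\n"
    "C00000001|ACME MEDIA|04012024|300.50\n"
    "C00000002|ZENITH PRINTING|04022024|75.00\n"
)

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE oppexp ("
    "cmte_id TEXT, name TEXT, transaction_dt TEXT, transaction_amt REAL)"
)

# Step 3: load the pipe-delimited rows.
reader = csv.reader(io.StringIO(SAMPLE), delimiter="|", quoting=csv.QUOTE_NONE)
conn.executemany("INSERT INTO oppexp VALUES (?, ?, ?, ?)", reader)

# Step 4: re-run the aggregate (here, total disbursed per committee).
totals = conn.execute(
    "SELECT cmte_id, SUM(transaction_amt) FROM oppexp "
    "GROUP BY cmte_id ORDER BY cmte_id"
).fetchall()
conn.close()
```

If the total you compute differs from the figure on the site, compare the file hash first; a mismatch there points to a re-issued file rather than an aggregation difference.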
10 · FAQ Things people ask.
Do you editorialize?
No. Every blurb, leaderboard, and highlight is
derived mechanically from filings. The one editorial
choice is the default sort order on the landing page
(largest dollar flow, last 90 days). Everything else is
user-controlled.
Can I get a takedown?
FEC filings are public records. We don't remove
publicly-filed information. If you believe a record is
misattributed (e.g. name collision), contact us with the
filing ID and we'll re-run the resolver with a manual rule.
How fresh is the data right now?
The status line at the bottom of this page shows the
last successful sync. We re-ingest nightly at 02:00 ET.
If you're viewing data while a run is in progress (roughly
02:00 to 03:40 ET), a few figures may be mid-update; reload
once the new cache is published.
Why don't you show state races?
State elections are governed by 50 different filing
regimes, with wildly different fields, formats, and
freshness. We'd rather be complete about federal races
than shallow about everything.
Can I get the raw data?
Yes — the raw data is the FEC's, and it's free at
fec.gov. Our derivative tables (resolved entity IDs,
rollup edges, party classification) are published as
nightly Parquet snapshots under an open license.
Last pipeline run · 2026-04-23 02:41 ET · all green