Search toggle
Say hello.
3817 Elwood Ave STE111
360-820-3794

Real Record — Open Civic Data

Descriptive Documentation & Data Dictionary

Dataset: Real Record — Open Civic Data: Local Government Transcripts and Fiscal Records
Publisher: Real Housing Reform Initiative (RHRI) — realrecord.org
License: CC BY 4.0
S3 Bucket: s3://real-record-civic-data/
Transcript + Fiscal Coverage: City of Bellingham, WA; Whatcom County, WA (incl. Ferndale, Lynden); King County, WA (in active pipeline)
Legislative Coverage: City of Bellingham + Whatcom County + Washington State (expands automatically with corpus)
Current Size: ~3 GB (text only)
Contact: brian@realhousingreform.org


Overview

Real Record is open government transparency infrastructure — a structured, cross-linked civic-data corpus designed to make the full record of local government (what officials said, what was authorized, what was spent, and what was enacted) permanently accessible to researchers, journalists, and residents. It links four layers:

Layer What it contains Primary use
Transcripts Speaker-attributed meeting speech Who said what, when
Fiscal joins Budget, revenue, levy, and fee data What was authorized and spent
Briefings Machine-assisted meeting summaries Rapid discovery and search
Legislative joins Ordinances, resolutions, contract awards, grant acceptances, and state bills with meeting cross-references What was enacted, what it cost, and when it was discussed

All four layers share a common meeting_id key and can be joined across layers using jurisdiction + fiscal_year for financial data, or bill_id for legislative linkages.


Architecture & Scalability

Real Record is designed for systematic statewide expansion, not one-off jurisdiction coverage.

Auto-connect on import: When data is imported for a new jurisdiction — whether transcripts, fiscal documents, legislative records, or meeting packets — the pipeline automatically resolves and links all entities across all four layers. New jurisdictions, speakers, funds, and legislative items receive unique system identifiers on first encounter. No manual schema changes or configuration are required to add a jurisdiction to the corpus.

Washington State coverage universe:

RHRI maintains a WA Expansion Tracker covering all 281 incorporated Washington State cities and towns, organized into four tiers by population:

Tier Population Cities Example
1 — Priority 100,000+ 9 Seattle, Spokane, Tacoma, Vancouver, Bellevue
2 — High 25,000–100,000 36 Bellingham, Kirkland, Yakima, Kennewick
3 — Medium 5,000–25,000 77 Port Angeles, Pullman, Ellensburg
4 — Low <5,000 159 Smaller cities and towns statewide

As of May 2026: Bellingham and Whatcom County are active in the pipeline. Port Angeles (Clallam County) is in active research. The remaining 278 jurisdictions are prioritized and tracked.

Two-track expansion model:

Track A — Platform-standardized scraping: Washington State's public meeting ecosystem is dominated by a small number of commercial platforms — CivicPlus (~57 WA cities) and Granicus (~20 WA cities) account for the majority of Tier 1–3 jurisdictions. A single scraper built for each platform automatically reaches dozens of cities. Whatcom County and Bellingham were chosen as the founding test bed because their records were well-structured and scrapable, enabling rapid pipeline development and validation before scaling to larger jurisdictions.

Track B — Underserved jurisdictions (separate project and funding): Smaller cities and rural communities that lack the infrastructure to record, store, or publish their public meetings. For these jurisdictions, RHRI operates a parallel capacity-building program — providing the recording infrastructure and data collection directly, then ingesting the resulting records into the Real Record corpus under the same schema. These communities gain a permanent civic record for the first time; the corpus gains geographic coverage that no amount of web scraping could reach.

Both tracks produce structurally identical records — the same four layers, the same partition scheme, the same identifiers — so researchers can query across Track A and Track B jurisdictions without distinguishing the collection method.

Coverage at scale: At full Washington State coverage (39 counties + 281 cities), the corpus is projected to reach ~70 GB. The Hive partition scheme (jurisdiction/year) and Parquet format support efficient filtered reads at that scale without requiring users to download the full dataset.


Dataset Layers

Layer 1: Transcript Segments

Segment-level (sentence or speaker-turn) speech records from official public meetings.

Source: Official audio recordings (MP3) and closed-caption VTT files published by the City of Bellingham and Whatcom County. Transcription via Azure Speech-to-Text; speaker attribution via roster matching and heuristic classification.

Files: s3://real-record-civic-data/data/parquet/layer=transcripts/

Schema: segments

Field Type Description
segment_id string Unique segment identifier (UUID)
meeting_id string Canonical meeting identifier (see Meeting ID Format)
jurisdiction string Jurisdiction slug (e.g., bellingham-wa, whatcom-county-wa)
meeting_date date Date of the meeting (YYYY-MM-DD)
meeting_body string Body code (e.g., BEL-CON, WHA-CON, BEL-PLN)
segment_index integer Ordinal position of segment within meeting
start_time_sec float Start offset in seconds from meeting audio start
end_time_sec float End offset in seconds from meeting audio start
speaker_id string Speaker identifier (UUID); NULL if unresolved
speaker_name string Resolved speaker display name; NULL if unresolved
speaker_role string elected_official, staff, public_commenter, unknown
speaker_confidence float Resolution confidence score (0.0–1.0)
is_public_comment boolean True if segment is part of a public comment period
text string Full text of the segment
word_count integer Word count of segment text
source string speech_to_text or vtt_caption
collection_track string digital_scrape (automated from public portal) or rhri_recorded (RHRI provided the recording for a jurisdiction without meeting infrastructure)

Schema: speakers

Field Type Description
speaker_id string UUID
jurisdiction string Jurisdiction slug
display_name string Full name as appears on official rosters
role string elected_official, staff, public_commenter
body_codes string Pipe-separated body codes for this speaker
term_start date Start of term (elected officials)
term_end date End of term; NULL if current

Speaker Identification Methodology

Known participants (elected officials and staff) are pre-loaded from official rosters before each meeting. Every city council member, county councilmember, planning commissioner, and recurring staff presenter has a confirmed speaker_id. When the same person appears across meetings, their segments are attributed to the same persistent ID — enabling longitudinal analysis of individual officials' positions, participation rates, and topic focus over time.

Public commenters are identified by first name and last initial only (display_name format: "Jane D."). Most public commenters self-introduce at the start of their remarks — "My name is Jane D., I'm a resident of the Sunnyland neighborhood" — and the pipeline extracts this pattern to log participation without exposing full names. Commenters who do not self-identify are logged as public_commenter by role with no name attached.

Voices from the Meeting

Every meeting in the corpus includes a participation log — a structured record of who showed up and spoke, separate from the full transcript. This serves two purposes:

  1. Public accountability: Residents who take the time to speak at a public meeting deserve to be acknowledged. The participation log gives them a named record that their voice was heard and logged, even if the briefing summary condenses or omits their individual comment.

  2. Civic signal: Raw headcount and commenter volume are themselves data. Some meetings attract 3 speakers; others — like the October and November 2024 Whatcom County Council sessions on the jail bond — drew 4+ hours of continuous public comment. That disparity is part of the story.

Schema: meeting_participants

Field Type Description
participant_id string UUID
meeting_id string FK → meetings
jurisdiction string Jurisdiction slug
display_name string First name + last initial ("Jane D.") if self-identified; NULL otherwise
comment_position integer Order in which they spoke during public comment
comment_topics string Pipe-separated topic tags extracted from their remarks
comment_duration_sec integer Approximate speaking time in seconds
self_identified boolean True if the commenter stated their own name
neighborhood string Self-stated neighborhood or city of residence if mentioned

Meeting ID Format

Meeting IDs follow a hierarchical slug pattern that encodes geography and meeting body:

{JURISDICTION}-{BODY}-{YYYY-MM-DD}[-{SUFFIX}]

The jurisdiction segment is derived from the full geographic hierarchy — county + city/entity + body — so IDs remain globally unique as coverage expands:

Segment Derivation Example
Jurisdiction slug County + city/entity abbreviation BEL (Bellingham, Whatcom Co.) · WHA (Whatcom County) · KCO (King County)
Body code Committee or board abbreviation CON (Council) · PLN (Planning) · CTW (Committee of the Whole)
Date ISO 8601 date 2026-04-28
Suffix (optional) Distinguishes same-day sessions -A, -B, -EXS (executive session)

Full example: BEL-CON-2026-04-28 = Bellingham City Council, April 28 2026.

Namespace capacity: Because each jurisdiction slug encodes a unique county + city combination, and each body code is scoped within that jurisdiction, the ID space is effectively unbounded for Washington State coverage (39 counties × 281 cities × multiple bodies per city × 365 dates per year). Even at full multi-state expansion, a state prefix (WA-, OR-, ID-) can be prepended to the jurisdiction slug without any changes to the schema — existing IDs remain stable and new states occupy their own prefix namespace.

Meeting Body Codes

Code Body Jurisdiction
BEL-CON Bellingham City Council City of Bellingham
BEL-PLN Bellingham Planning Commission City of Bellingham
BEL-BFC Budget & Finance Committee City of Bellingham
WHA-CON Whatcom County Council Whatcom County
WHA-PLN Whatcom County Planning Commission Whatcom County
WHA-CTW Committee of the Whole Whatcom County
FER-CON Ferndale City Council City of Ferndale
LYN-CON Lynden City Council City of Lynden

Layer 2: Fiscal Joins

Fund-level revenue and expenditure history, tax levy authorizations, and ballot-measure linkages. Structured to enable promise-vs-delivery analysis across budget cycles.

Source: City of Bellingham Adopted Budget Books (2005–present), Whatcom County Adopted Budget Books, Comprehensive Annual Financial Reports (CAFRs) 1999–2023, Whatcom County Assessor Tax Books, and WA Department of Revenue levy data.

Files: s3://real-record-civic-data/data/parquet/layer=fiscal/

Schema: funds

Field Type Description
fund_id string UUID
jurisdiction string Jurisdiction slug
fund_code string Official fund number (e.g., 120, 411)
fund_name string Official fund name
fund_type string general, special_revenue, capital, debt_service, enterprise
rcw_authority string Washington RCW citation (if applicable)
description string Fund purpose summary

Schema: revenue_collections

Field Type Description
collection_id string UUID
fund_id string FK → funds
jurisdiction string Jurisdiction slug
fiscal_year integer Fiscal year (YYYY)
revenue_source_id string FK → revenue_sources
amount_usd decimal Collected revenue in USD
source_document string Source document reference
confidence string high (CAFR-derived), medium (budget book), low (estimate)

Schema: revenue_sources

Field Type Description
source_id string Slug identifier
source_name string Display name
source_type string Revenue category — see Source Types below
jurisdiction string Jurisdiction slug
rcw_authority string Washington RCW citation

Source Types:

Value Description Example
property_tax Regular and excess property tax levies General Fund levy, Greenways levy
sales_tax State-shared and local option sales tax 0.1% Public Safety Sales Tax, Annexation Sales Tax
bo_tax Business & Occupation tax on gross receipts Bellingham B&O on manufacturing, retail
utility_tax Occupation taxes on utility gross revenues Bellingham BMC 6.06 (water 18.25%, wastewater 11.5%)
utility_rate User rates for water, sewer, stormwater services Monthly service charges, consumption tiers
impact_fee One-time development impact fees (RCW 82.02.050) Transportation, park, school, fire impact fees
connection_fee System development charges for utility connections Water/sewer connection charges
permit_fee Building, land use, and development permit fees Building permits, subdivision fees
grant Federal, state, and foundation grants CDBG, ecology grants
bond General obligation or revenue bond proceeds Capital project bonds
intergovernmental State-shared revenues and interlocal transfers Motor vehicle fuel tax, liquor tax, ILA payments
other Miscellaneous revenues not categorized above Donations, miscellaneous receipts

Schema: budget_line_items

Field Type Description
line_id string UUID
fund_id string FK → funds
fiscal_year integer Fiscal year
category string personnel, operations, capital, debt, transfers, cash_balance
subcategory string Department or program label
amount_usd decimal Budgeted or actual amount in USD
amount_type string adopted, amended, actual
source_document string Source document reference

Schema: ballot_measures

Field Type Description
measure_id string Slug identifier
jurisdiction string Jurisdiction slug
measure_name string Official ballot measure name
election_date date Election date
passed boolean Passed / failed
yes_pct float Percent yes votes
authorized_amount_usd decimal Authorized levy or bond amount
authorized_rate float Tax rate (per $1,000 AV) if applicable
rcw_authority string Washington RCW citation
ordinance string Enacting ordinance reference
description string Summary of what was authorized

Schema: levy_authorizations

Field Type Description
levy_id string Slug (e.g., bellingham-greenways-v)
measure_id string FK → ballot_measures
fund_id string FK → funds
jurisdiction string Jurisdiction slug
levy_name string Official levy name
levy_type string regular, excess, bond, special
start_year integer First collection year
end_year integer Final collection year (NULL if ongoing)
authorized_rate float Authorized rate per $1,000 AV

Layer 3: Written Briefings

Machine-assisted meeting summaries produced by the RHRI briefing pipeline. Each meeting may have up to three briefing tiers.

Source: Derived from Layer 1 transcripts plus agenda PDFs and supporting documents. Produced using a reproducible pipeline (Claude claude-sonnet-4-5 via Anthropic API) with structured prompts versioned in the RHRI pipeline repository.

Files: s3://real-record-civic-data/data/markdown/layer=briefings/ s3://real-record-civic-data/data/parquet/layer=briefings/

Briefing Tiers

Tier Name Description Typical length
basic Basic Briefing Key decisions, votes, and action items 300–500 words
supplemental Supplemental Briefing Topic-tagged summaries with fiscal context 600–1,000 words
narrative Full Narrative Complete chronological account with public comment depth 1,500–4,000 words

Schema: briefings

Field Type Description
briefing_id string Slug (matches meeting_id + tier)
meeting_id string FK → meetings
jurisdiction string Jurisdiction slug
meeting_date date Meeting date
tier string basic, supplemental, narrative
title string Meeting title
executive_summary string 2–3 sentence summary
full_text string Complete briefing text (Markdown)
topics string Pipe-separated topic tags
fiscal_links string Pipe-separated fund/levy/source IDs mentioned
pipeline_version string RHRI pipeline version used
model string LLM model used
generated_at timestamp Generation timestamp

Layer 4: Legislative Joins

Structured tracking of ordinances, resolutions, and state legislation (RCW-cited bills) cross-referenced to the meetings where they were discussed and/or enacted.

Source: Official legislative records from jurisdiction clerk websites, WA State Legislature (app.leg.wa.gov), and ordinance/resolution text extracted from meeting packet PDFs. Cross-references are derived from briefing pipeline analysis and verified against meeting minutes.

Live tracker: realhousingreform.org/legislation
Files: s3://real-record-civic-data/data/parquet/layer=legislative/

Schema: bills

Covers all action items tracked through the RHRI Legislation Tracker at realhousingreform.org/legislation. As of May 2026: 908 items tracked, 555 approved, across 3 jurisdictions including 42 state bills (City of Bellingham, Whatcom County, Washington State).

The legislative layer automatically mirrors transcript and fiscal coverage — as new jurisdictions are added to the Real Record corpus, their legislative records are ingested in parallel. No separate configuration is required per jurisdiction.

Local items use a YYYY-NNN numbering system matching the jurisdiction's official clerk numbering. State bills use standard WA legislative notation (e.g., SB 602, HB 1234).

Field Type Description
bill_id string Slug identifier (e.g., bel-2024-898, wa-sb-602-2026)
item_number string Official item number as assigned by clerk (2024-898, SB 602)
jurisdiction string Jurisdiction slug (bellingham-wa, whatcom-county-wa, washington-state)
bill_type string Category — see Bill Types below
title string Official title as it appears on the agenda
status string final_vote_approved, committee_discussion, first_reading, tabled, failed, pending
introduced_date date Date first appearing on an agenda
last_discussed_date date Most recent meeting date where the item appeared
passed_date date Date of final vote; NULL if not yet passed
vote_tally string Vote count (e.g., 7-0, 6-0-1, 3-0); NULL if no vote taken
dollar_amount decimal Associated dollar amount (contract value, grant amount, budget impact); NULL if not applicable
mention_count integer Number of meeting segments in which this item is referenced
rcw_authority string Authorizing RCW citation(s) if state-law-enabled
fund_links string Pipe-separated fund IDs affected by this item
levy_links string Pipe-separated levy IDs authorized by this item
description string Plain-language summary of what the item does
source_url string URL to official text or agenda packet

Bill Types:

Value Description Example
ordinance Local law or code amendment requiring council vote Housing Action Plan Funding Allocation
resolution Formal council position, policy adoption, or appointment Rapid Transit Locally Preferred Alternative
contract_award Procurement decisions — services, construction, or interlocal agreements Pacific Street Fleet Building Contract ($808K)
grant_acceptance Federal, state, or foundation grants received by the jurisdiction Nooksack Riparian Resilience Project Grant ($2.2M)
state_bill Washington State Legislature bills tracked for local impact SB 602 Automatic License Plate Readers, SB 5184 Parking Reform
discussion_item Agenda items presented for information without a vote Downtown Vacancy Presentation, 2025 Budget Presentation

Schema: legislative_cross_references

Links each bill or action item to the meetings where it appeared. Each row is one (bill, meeting) pair — a single bill may appear in multiple meetings as it moves through readings, public hearings, and final votes.

Field Type Description
cross_ref_id string UUID
bill_id string FK → bills
meeting_id string FK → meetings (transcript layer)
jurisdiction string Jurisdiction slug
reference_type string first_reading, committee_discussion, public_hearing, final_vote, discussion_only
segment_ids string Pipe-separated segment IDs where the bill is mentioned
agenda_item string Agenda item number or title (e.g., "Agenda Item 7b")
vote_result string approved, failed, tabled, withdrawn; NULL if no vote
vote_tally string Vote count (e.g., 7-0, 6-0-1)
dollar_amount decimal Dollar amount if this action involved an expenditure or grant

Cross-Layer Join: Legislation → Fiscal → Transcripts

The legislative layer is the connective tissue between the fiscal and transcript layers. A ballot measure (fiscal layer) → enacting ordinance (legislative layer) → council discussion (transcript layer) forms a complete accountability chain: what voters authorized, what council passed, and what officials said about it before the vote.

# Example: Find every council segment discussing the utility occupation tax ordinance
import pyarrow.dataset as ds

cross_refs = ds.dataset("s3://real-record-civic-data/data/parquet/layer=legislative/",
                        format="parquet", partitioning="hive", filesystem=fs)

# Get all meeting IDs where BMC 6.06 (utility occupation tax) was discussed
bmc_606_refs = cross_refs.to_table(
    filter=ds.field("bill_id") == "bel-bmc-6.06-utility-occupation-tax"
).to_pandas()

# Join to transcript segments
meeting_ids = bmc_606_refs["meeting_id"].tolist()
segments = transcript_dataset.to_table(
    filter=(
        (ds.field("jurisdiction") == "bellingham-wa") &
        (ds.field("meeting_id").isin(meeting_ids))
    )
).to_pandas()

print(f"Found {len(segments)} segments across {len(meeting_ids)} meetings")

Topic Taxonomy: The 7 Policy Tenants

Every decision made by local government — every vote, ordinance, expenditure, levy, and policy outcome — connects to one or more of seven Policy Tenants through its subtopics. This is the organizing premise of the Real Record topic taxonomy: the seven tenants are comprehensive enough that no act of local government falls outside them.

A stormwater fee increase connects through Environment (water management) and Taxes (rate structure). A jail bond measure connects through Safety (corrections capacity) and Governance (ballot authorization). A housing ordinance may connect through Housing (zoning), Economy (development impact), and Taxes (impact fees). The multi-tenant connections are the story — they reveal which policy domains actually drive decisions, and which tenant combinations recur across years and jurisdictions.

Real Record tracks approximately 1,000 subtopics organized under the seven tenants. Every meeting segment, briefing, ordinance, and fiscal record in the corpus is tagged to the subtopics that appear in it — and through those subtopics, to one or more tenants. This makes the taxonomy a navigation layer across all four data layers simultaneously.

The tenants are the organizing spine of the Real Briefings Topic Hub: users can filter by tenant, browse subtopics ranked by mention count, and click through to every meeting, vote, and fund page where that topic appeared.

Tenant What it covers Example subtopics
Housing Land use, density, affordability, zoning, ADUs affordable-housing, ADUs, middle-housing, urban-growth-areas, Housing Policy
Environment Water, land, climate, natural resources flood-mitigation, stormwater-management, flood-response, water-rights-adjudication, Lake Whatcom Watershed
Taxes Revenue policy, levies, rates, tax structure property-tax, sales-tax, utility-tax, B&O-tax, levy-authorizations, tax-exemptions
Economy Local economic development, business, infrastructure budget-finance, transportation-policy, economic-development, capital-projects
Safety Public safety, emergency services, corrections public-safety, jail-capacity, behavioral-health, emergency-management, fire-districts
Social Community services, health, parks, engagement behavioral-health, parks-recreation, community-engagement, library, human-services
Governance Planning process, elections, administration, land-use law comprehensive-plan, growth-management-act, urban-growth-areas, committee-appointments, PROS-plan

How a topic connects across layers

Each topic tag is a cross-layer connector. When a user clicks a topic — say, flood-mitigation — they can reach:

  • Transcript segments: Every speaker turn where the topic appeared, across all meetings and bodies, with speaker name and meeting date
  • Briefings: Meeting summaries where the topic was flagged — ranked by recency or mention count
  • Legislative joins: Ordinances, resolutions, and state bills cross-referenced to that topic (e.g., stormwater ordinance, flood-zone overlay amendment)
  • Fiscal joins: Fund pages and revenue sources linked to the topic (e.g., Stormwater Utility Fund 480, Flood Control Zone District levies)

This four-layer cross-link is what distinguishes Real Record from a simple transcript archive. A topic isn't just a keyword — it's a thread connecting what residents said at a public hearing, what the council voted on, what it cost, and where the money came from.

Scale of the taxonomy

The current Whatcom County / Bellingham corpus contains approximately 1,000 distinct topic tags across the seven tenants. As additional jurisdictions are added, topics auto-resolve: a topic already in the taxonomy gains new data from the new jurisdiction's meetings automatically. Novel topics from new jurisdictions are added to the appropriate tenant during onboarding review.

The full browsable taxonomy is live at realhousingreform.org/topics (migrating to realrecord.org/topics).


Hot Topics: What's Being Discussed — and What Isn't

Real Record includes a topic-frequency layer that tracks how often specific subjects appear across meeting transcripts, over time and across jurisdictions. This powers a "trending topics" surface that shows researchers and residents what their government is actually spending time talking about.

What the data shows: Topic frequency is computed at the segment level — each speaker turn that references a tracked topic is counted. Frequency is normalized by meeting volume so jurisdictions with more meetings don't artificially dominate. Topics can be tracked monthly, quarterly, or annually.

The negative space is as important as the signal. An absence of discussion is a finding. For example: analysis of Bellingham City Council transcripts (2019–2026) shows that terms like jobs, employment, wages, and economic development appear in fewer than 0.3% of all segments — a near-total absence compared to housing, utilities, public safety, and budget topics that dominate the agenda. That gap tells a story about what local government is and isn't prioritizing, and it is only visible when you can measure the full record.

Trending topics surface issues gaining momentum — rising mention counts week-over-week — before they reach mainstream coverage. The jail bond / public safety sales tax discussion in Whatcom County, for example, was detectable as a trending topic in transcript data months before it became a front-page story.

Schema: topic_frequencies

Field Type Description
topic_id string Slug identifier (matches topics table)
jurisdiction string Jurisdiction slug
meeting_body string Body code; NULL for jurisdiction-wide aggregate
period string YYYY-MM (monthly) or YYYY-Q1 (quarterly)
mention_count integer Number of segments referencing this topic in the period
meeting_count integer Number of distinct meetings in which the topic appeared
speaker_count integer Number of distinct speakers who mentioned the topic
public_commenter_count integer Number of public commenters (not officials) who mentioned the topic
trend_delta float Change in normalized frequency vs. prior period (positive = rising)

Note on the public_commenter_count field: Separating resident-driven topic frequency from council-driven frequency reveals whether an issue is being pushed by the community or by officials — a meaningful distinction for accountability research.


Known Limitations & Data Caveats

  • Speaker attribution confidence: Approximately 65–75% of segments are attributed to named speakers. Unattributed segments default to speaker_role=unknown. Public commenters are classified by role (not name) unless they self-identify.

  • Transcription accuracy: Speech-to-text accuracy varies by audio quality. Proper nouns (parcel numbers, ordinance codes, proper names) are most susceptible to transcription errors. VTT-sourced segments (from official captions) have lower error rates than STT-derived segments.

  • Fiscal year conventions: Bellingham uses a biennial budget (two-year cycle); individual fiscal year figures are extracted from midpoint amendments and CAFR actuals. Whatcom County uses an annual budget. Revenue figures labeled medium confidence are interpolated from biennial totals.

  • Briefing completeness: Briefings are not produced for meetings where no transcript is available (audio not published, or transcription failed). Approximately 8% of meeting records have no associated briefing.

  • Coverage dates: Transcript coverage varies by body. Bellingham City Council and Whatcom County Council have the deepest coverage (2019–present). Planning Commission coverage begins 2024 for Bellingham and 2025 for Whatcom County.


Accessing the Data

From S3 (public read-only)

# List top-level structure
aws s3 ls s3://real-record-civic-data/ --no-sign-request

# List transcript partitions for Bellingham
aws s3 ls s3://real-record-civic-data/data/parquet/layer=transcripts/jurisdiction=bellingham-wa/ --no-sign-request

# Download all 2025 Bellingham Council transcripts
aws s3 sync s3://real-record-civic-data/data/parquet/layer=transcripts/jurisdiction=bellingham-wa/year=2025/ \
  ./local-data/ --no-sign-request

With Pandas / PyArrow

import pandas as pd
import pyarrow.dataset as ds

# Load Bellingham transcript segments for 2025
dataset = ds.dataset(
    "s3://real-record-civic-data/data/parquet/layer=transcripts/",
    format="parquet",
    partitioning="hive",
)

df = dataset.to_table(
    filter=(
        (ds.field("jurisdiction") == "bellingham-wa") &
        (ds.field("year") == 2025)
    )
).to_pandas()

With Amazon Athena

-- Create external table (run once)
CREATE EXTERNAL TABLE real_record_segments (
  segment_id     STRING,
  meeting_id     STRING,
  meeting_date   DATE,
  meeting_body   STRING,
  speaker_name   STRING,
  speaker_role   STRING,
  is_public_comment BOOLEAN,
  text           STRING,
  word_count     INT
)
PARTITIONED BY (jurisdiction STRING, year INT)
STORED AS PARQUET
LOCATION 's3://real-record-civic-data/data/parquet/layer=transcripts/'
TBLPROPERTIES ('parquet.compression'='SNAPPY');

MSCK REPAIR TABLE real_record_segments;

-- Example: find all public comments mentioning housing in 2025
SELECT meeting_id, meeting_date, speaker_name, text
FROM real_record_segments
WHERE year = 2025
  AND jurisdiction = 'bellingham-wa'
  AND is_public_comment = true
  AND LOWER(text) LIKE '%housing%'
ORDER BY meeting_date, segment_index;

File Formats

Format Layer(s) Use case
Parquet (Snappy) transcripts, fiscal, briefings Analytics, Athena, Spark, Pandas
JSON-Lines transcripts, briefings Streaming pipelines, LLM ingestion
Markdown briefings Human reading, documentation, RAG

Provenance & Pipeline

The full data pipeline is open source and documented at: realrecord.org/methodology (this page).

Source documents (original audio, PDF agendas, budget books) are not redistributed here due to size but are available via the originating jurisdictions' public records. Accession metadata (source URLs, download dates, document hashes) is included in the metadata/ prefix of the S3 bucket.


Citation

Real Housing Reform Initiative. (2024–present). Real Record — Open Civic Data:
Local Government Transcripts and Fiscal Records [Data set]. AWS Open Data.
https://registry.opendata.aws/real-record-civic-data/

Contact & Updates

  • Primary contact: brian@realhousingreform.org
  • Dataset home: https://registry.opendata.aws/real-record-civic-data/ (live after launch)
  • Methodology & pipeline: https://realrecord.org/methodology
  • Data archive: https://realrecord.org (migration in progress)

Last updated: 2026-05-13