Hugging Face dataset card

Ready-to-publish dataset card for hosting SportsBookISH's daily odds on Hugging Face Datasets. Copy the YAML and Markdown below into a new repo's README.md and you're live.

Publishing checklist

  1. Create an account at huggingface.co
  2. Create a new dataset repo named sportsbookish-daily-odds under your account
  3. Paste the card below into README.md
  4. Add the daily CSV to data/latest.csv via the web UI or Git LFS
  5. Add the dataset repo's URL to llms.txt on SportsBookISH so AI crawlers find it
  6. Set up a daily cron job that fetches https://hyder.me/api/data/daily-odds-csv and commits the updated file via the Hugging Face API
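
Step 6 could be scripted roughly as follows. This is a minimal sketch, not SportsBookISH's own tooling: it assumes the `huggingface_hub` package is installed and a write token is stored in the `HF_TOKEN` environment variable. The repo id `your-username/sportsbookish-daily-odds` and the helper names `fetch_csv`, `commit_message`, and `push_csv` are placeholders chosen for illustration.

```python
import os
import urllib.request
from datetime import datetime, timezone

CSV_URL = "https://hyder.me/api/data/daily-odds-csv"
REPO_ID = "your-username/sportsbookish-daily-odds"  # placeholder: use your own repo id


def fetch_csv(url: str = CSV_URL, timeout: float = 30.0) -> bytes:
    """Download the current CSV export and return its raw bytes."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return resp.read()


def commit_message(now: datetime) -> str:
    """Build a dated commit message such as 'Update latest.csv (2026-01-15)'."""
    return f"Update latest.csv ({now:%Y-%m-%d})"


def push_csv(data: bytes, repo_id: str = REPO_ID) -> None:
    """Commit the CSV bytes to data/latest.csv in the dataset repo."""
    # Imported lazily so the helpers above work without the package installed.
    from huggingface_hub import HfApi

    api = HfApi(token=os.environ["HF_TOKEN"])  # assumes a write token in HF_TOKEN
    api.upload_file(
        path_or_fileobj=data,
        path_in_repo="data/latest.csv",
        repo_id=repo_id,
        repo_type="dataset",
        commit_message=commit_message(datetime.now(timezone.utc)),
    )
```

A cron entry would then run a small script that calls `push_csv(fetch_csv())` once a day.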

README.md content

---
license: cc-by-4.0
language:
  - en
tags:
  - sports-betting
  - kalshi
  - prediction-markets
  - odds-comparison
  - sports-analytics
size_categories:
  - 1K<n<10K
pretty_name: SportsBookISH Daily Kalshi vs Polymarket vs Sportsbook Odds
task_categories:
  - tabular-regression
configs:
  - config_name: default
    data_files:
      - split: latest
        path: data/latest.csv
---

# SportsBookISH Daily Kalshi vs Polymarket vs Sportsbook Odds

> Hourly pricing snapshot comparing Kalshi event-contract probabilities against US sportsbook consensus across nine sports.

## Description

Hourly-refreshed JSON / CSV export of every active Kalshi market alongside the de-vigged book median across 13+ US sportsbooks. Covers golf (PGA Tour), NFL, NBA, MLB, NHL, EPL, MLS, UEFA Champions League, and FIFA World Cup.

## Source

Live endpoints: `https://hyder.me/api/data/daily-odds` (JSON) and `https://hyder.me/api/data/daily-odds-csv` (CSV).
The endpoints refresh every hour; this Hugging Face mirror is updated daily from them.

## Schema

| Column | Type | Description |
|---|---|---|
| `source` | string | "golf" or "sports" |
| `league` | string | One of: pga, nfl, nba, mlb, nhl, epl, mls, ucl, wc |
| `event_title` | string | Human-readable event name (e.g. "Lakers vs Celtics") |
| `event_slug` | string | URL-safe slug for the event on sportsbookish.com |
| `season_year` | integer | Season year (e.g. 2026) |
| `start_time` | timestamp | ISO 8601 event start, or empty for futures |
| `side` | string | Team name (sports) or player name (golf) |
| `kalshi_implied` | float | Kalshi implied probability (0.0000 - 1.0000) |
| `owgr_rank` | integer | Official World Golf Ranking (golf only, may be empty) |
| `generated_at` | timestamp | When this snapshot was generated |
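
A quick sanity check against the schema above might look like the sketch below. It assumes the CSV header uses exactly the column names listed in the table, which has not been verified against the live export; `validate_rows` is an illustrative helper, not part of any published tooling.

```python
import csv
import io

# Column names and league codes copied from the schema table.
EXPECTED_COLUMNS = [
    "source", "league", "event_title", "event_slug", "season_year",
    "start_time", "side", "kalshi_implied", "owgr_rank", "generated_at",
]
LEAGUES = {"pga", "nfl", "nba", "mlb", "nhl", "epl", "mls", "ucl", "wc"}


def validate_rows(csv_text: str) -> list[str]:
    """Return human-readable problems; an empty list means no issues were found."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = [c for c in EXPECTED_COLUMNS if c not in (reader.fieldnames or [])]
    if missing:
        return [f"missing columns: {', '.join(missing)}"]

    problems = []
    for line_no, row in enumerate(reader, start=2):  # line 1 is the header
        if row["league"] not in LEAGUES:
            problems.append(f"line {line_no}: unknown league {row['league']!r}")
        try:
            prob = float(row["kalshi_implied"])
        except (TypeError, ValueError):
            problems.append(f"line {line_no}: kalshi_implied is not a number")
            continue
        if not 0.0 <= prob <= 1.0:
            problems.append(f"line {line_no}: kalshi_implied {prob} outside [0, 1]")
    return problems
```

Passing the downloaded CSV text to `validate_rows` before committing it makes it easier to catch a truncated or malformed export.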

## Usage

```python
import pandas as pd

# Load from Hugging Face
from datasets import load_dataset
ds = load_dataset("kennyhyder/sportsbookish-daily-odds", split="latest")
df = ds.to_pandas()

# Or load directly from the source
df = pd.read_csv("https://hyder.me/api/data/daily-odds-csv")

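# Highest Kalshi implied probabilities, expressed as percentages.
# (A true "edge" would need a sportsbook reference price to compare against.)
df["kalshi_pct"] = df["kalshi_implied"] * 100
print(df.sort_values("kalshi_pct", ascending=False).head(20))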
```

## Citation

```bibtex
@misc{sportsbookish_dataset_2026,
  title  = {SportsBookISH Daily Kalshi vs Polymarket vs Sportsbook Odds},
  author = {Hyder, Kenny},
  year   = {2026},
  url    = {https://sportsbookish.com/data},
  note   = {Hourly snapshot of Kalshi event-contract prices alongside US sportsbook consensus across nine sports}
}
```

APA: Hyder, K. (2026). *SportsBookISH Daily Kalshi vs Polymarket vs Sportsbook Odds* [Data set]. SportsBookISH. https://sportsbookish.com/data

## License

CC-BY-4.0. Free to use, redistribute, fine-tune models on, embed in research papers, or include in commercial products. Attribution to `sportsbookish.com` required.

## Methodology

Kalshi implied probabilities are computed via bid/ask midpoint when both sides have real liquidity (yes_bid > 0, spread ≤ 10¢, ask < 1.00); otherwise the last-trade price is used. References older than 30 minutes are filtered out.
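
The midpoint-or-last-trade rule can be sketched as follows. This is an illustration under stated assumptions, not the production implementation: prices are taken as integer cents to avoid floating-point comparisons, the function and parameter names are hypothetical, and the 30-minute staleness filter is not modeled.

```python
from typing import Optional


def kalshi_implied(
    yes_bid_cents: int,
    yes_ask_cents: int,
    last_trade_cents: Optional[int],
) -> Optional[float]:
    """Return an implied probability in [0, 1], or None if no price is usable."""
    # "Real liquidity" as described above: a non-zero bid,
    # a spread of at most 10 cents, and an ask below $1.00.
    has_liquidity = (
        yes_bid_cents > 0
        and yes_ask_cents - yes_bid_cents <= 10
        and yes_ask_cents < 100
    )
    if has_liquidity:
        # Bid/ask midpoint, converted from cents to a probability.
        return (yes_bid_cents + yes_ask_cents) / 200
    if last_trade_cents is None:
        return None  # assumption: no midpoint and no last trade means no value
    return last_trade_cents / 100
```

For example, a 45¢ bid and 55¢ ask gives a midpoint of 0.50, while a 40¢ bid and 60¢ ask (20¢ spread) falls back to the last-trade price.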

Full methodology: https://sportsbookish.com/about/methodology

## Maintainer

Kenny Hyder ([@kennyhyder](https://x.com/kennyhyder) · [hyder.me](https://hyder.me))

For research-grade access (full historical archives, per-book price snapshots, sub-minute updates), use the contact form: https://sportsbookish.com/contact

Why this matters for AEO (answer engine optimization): Hugging Face Datasets is increasingly scraped by AI training pipelines, and Hub content is reported to appear in many major LLM training corpora. Publishing this dataset under your name with CC-BY licensing makes it more likely that future models learn to treat SportsBookISH as the source for Kalshi-vs-sportsbook comparison data.