# How to find emails of GitHub repo stargazers
GitHub doesn't expose user emails directly, but many users leak an email address through public commits in their own repos. This note shows how to mine those.
## The idea
1. List a repo's stargazers via the GitHub API.
2. For each stargazer, look at their own public repos.
3. Read commits they authored — `commit.author.email` is public.
4. Filter out GitHub's anonymized `*@users.noreply.github.com` addresses.
That's it. No scraping, no auth tricks — it's all public data via the official REST API.
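Step 1 in isolation, for reference, is a single `gh` call. A minimal sketch, with `OWNER/REPO` as a placeholder:
```bash
# First page of stargazer logins for a repo, 100 per page.
# Bump &page= to walk the full list (the full script below does this automatically).
gh api "repos/OWNER/REPO/stargazers?per_page=100&page=1" --jq '.[].login'
```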
## Quick one-off (single user)
```bash
gh api "users/<login>/repos?per_page=5&sort=pushed&type=owner" \
--jq '.[].full_name' | while read r; do
gh api "repos/$r/commits?author=<login>&per_page=3" \
--jq '.[]?.commit.author | "\(.email)\t\(.name)"'
done | grep -viE 'users\.noreply\.github\.com' | sort -u
```
## Full script — scrape stargazers of any repo
Save as `scrape_stargazer_emails.sh`, `chmod +x`, run.
Requires: [`gh` CLI](https://cli.github.com/) authenticated (`gh auth login`) and `jq`.
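A quick preflight check before running it, as a small sketch (both commands are standard; nothing here is part of the script itself):
```bash
# Confirm gh is authenticated and jq is available.
gh auth status
command -v jq >/dev/null && echo "jq: ok" || echo "jq: missing"
```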
```bash
#!/usr/bin/env bash
# Mines public emails from a repo's stargazers via their commit history.
set -euo pipefail

read -rp "GitHub repo URL or owner/repo: " INPUT
read -rp "Max stargazers to scan [200]: " MAX
MAX="${MAX:-200}"

# Normalize "https://github.com/owner/repo(.git)" or "owner/repo" to "owner/repo".
REPO=$(echo "$INPUT" \
  | sed -E 's#https?://github\.com/##; s#\.git$##; s#/$##' \
  | awk -F/ '{print $1"/"$2}')
if [[ -z "$REPO" || "$REPO" != */* ]]; then
  echo "Could not parse repo from: $INPUT" >&2
  exit 1
fi

OUT="emails_$(echo "$REPO" | tr '/' '_').csv"
echo "login,name,email,source_repo" > "$OUT"
echo "Scanning stargazers of $REPO (cap: $MAX) -> $OUT"

page=1
count=0
while [ "$count" -lt "$MAX" ]; do
  # Pull stargazer logins 100 at a time; stop when a page comes back empty.
  users=$(gh api "repos/$REPO/stargazers?per_page=100&page=$page" --jq '.[].login' 2>/dev/null || true)
  [ -z "$users" ] && break
  for login in $users; do
    [ "$count" -ge "$MAX" ] && break
    count=$((count + 1))
    printf "[%d/%d] %s" "$count" "$MAX" "$login"
    # Look at the user's most recently pushed repos that they own.
    repos=$(gh api "users/$login/repos?per_page=5&sort=pushed&type=owner" --jq '.[].full_name' 2>/dev/null || true)
    if [ -z "$repos" ]; then echo " (no public repos)"; continue; fi
    found=0
    for r in $repos; do
      # Read commits they authored; keep email/name pairs, drop anonymized addresses.
      while IFS=$'\t' read -r email name; do
        [ -z "$email" ] && continue
        echo "$login,\"$name\",$email,$r" >> "$OUT"
        found=$((found + 1))
      done < <(gh api "repos/$r/commits?author=$login&per_page=5" --jq \
        '.[]? | [.commit.author.email, .commit.author.name] | @tsv' 2>/dev/null \
        | grep -viE 'users\.noreply\.github\.com|noreply@github\.com' \
        | sort -u)
      # Stop at the first repo that yields an email (see "Speed" below).
      [ "$found" -gt 0 ] && break
    done
    echo " -> $found email(s)"
  done
  page=$((page + 1))
done

echo
echo "Done. CSV: $OUT"
echo "Unique emails: $(tail -n +2 "$OUT" | cut -d, -f3 | sort -u | wc -l | tr -d ' ')"
```
## Run it
```bash
./scrape_stargazer_emails.sh
# GitHub repo URL or owner/repo: https://github.com/supermemoryai/supermemory
# Max stargazers to scan [200]: 200
```
Output: `emails_<owner>_<repo>.csv` with `login,name,email,source_repo`.
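If you only want the addresses, the CSV is easy to post-process. This is the same approach the script uses for its final unique-email count; the filename below is a placeholder:
```bash
# Deduplicated email list from the output CSV (column 3).
tail -n +2 emails_<owner>_<repo>.csv | cut -d, -f3 | sort -u
```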
## Things to know
- **Rate limits.** The authenticated GitHub REST API allows 5,000 requests/hour. Each user costs roughly 2-6 calls, so expect ~800-2,000 users/hour; scan in batches if the repo is huge. You can check the remaining budget before a run (see the sketch after this list).
- **Stargazer cap.** GitHub's stargazer endpoint paginates up to ~40,000 (400 pages × 100). For repos bigger than that you can't get the full list.
- **Hit rate.** Many users' commit authors are anonymized to `*@users.noreply.github.com` — filtered out by default. Real emails leak from users who haven't enabled the "block command-line pushes that expose my email" setting.
- **Speed.** The script breaks out of the per-user repo loop on the first hit. Remove that `break` if you want every email per user.
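To check how much of the hourly quota is left, the `/rate_limit` endpoint reports it and (per GitHub's docs) doesn't count against the limit itself. A minimal sketch:
```bash
# Show how many core API requests remain in the current window.
gh api rate_limit --jq '.resources.core | "\(.remaining)/\(.limit) requests remaining"'
```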
## Ethics / use it for good
This is purely public data — every email here was already published in a `git log` somewhere on github.com. But:
- Don't spam. Reach out *individually*, with context (e.g. "saw you starred X, building something adjacent, would love your take").
- Honor unsubscribes immediately.
- Don't dump the CSV into a marketing tool that blasts cold sequences. That's how you torch a sender domain *and* your reputation.
Use it for genuine 1:1 outreach to people who showed signal that they care about your space. That's what it's good for.