# How to find emails of GitHub repo stargazers

GitHub doesn't expose user emails directly, but many users leak an address through the author field of public commits in their own repos. This note shows how to mine those.

## The idea

1. List a repo's stargazers via the GitHub API.
2. For each stargazer, look at their own public repos.
3. Read commits they authored — `commit.author.email` is public.
4. Filter out GitHub's anonymized `*@users.noreply.github.com` addresses.

That's it. No scraping, no auth tricks — it's all public data via the official REST API.
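
Step 4 can be sketched as a small shell function. `is_noreply` is a hypothetical helper name, not something used elsewhere in this note, and the sample addresses are made up. Unlike a `grep` over a whole output line, it looks only at the address itself.

```bash
#!/usr/bin/env bash
# Hypothetical helper: succeeds when an address is one of GitHub's
# placeholder addresses and should be skipped.
is_noreply() {
  local email
  # Lowercase first so the match is case-insensitive.
  email=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  case "$email" in
    *@users.noreply.github.com|noreply@github.com) return 0 ;;
    *) return 1 ;;
  esac
}

# Made-up sample addresses covering both noreply forms and a real-looking one.
for addr in \
  "12345+octocat@users.noreply.github.com" \
  "octocat@users.noreply.github.com" \
  "noreply@github.com" \
  "dev@example.com"; do
  if is_noreply "$addr"; then
    echo "skip $addr"
  else
    echo "keep $addr"
  fi
done
```

Both the `ID+login@` and the older `login@` noreply forms end in the same domain, so a single suffix match covers them.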

## Quick one-off (single user)

```bash
gh api "users/<login>/repos?per_page=5&sort=pushed&type=owner" \
  --jq '.[].full_name' | while read -r r; do
    gh api "repos/$r/commits?author=<login>&per_page=3" \
      --jq '.[]?.commit.author | "\(.email)\t\(.name)"'
  done | grep -viE 'users\.noreply\.github\.com' | sort -u
```

## Full script — scrape stargazers of any repo

Save as `scrape_stargazer_emails.sh`, `chmod +x`, run.

Requires: [`gh` CLI](https://cli.github.com/) authenticated (`gh auth login`) and `jq`.

```bash
#!/usr/bin/env bash
# Mines public emails from a repo's stargazers via their commit history.
set -euo pipefail

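read -rp "GitHub repo URL or owner/repo: " INPUT
read -rp "Max stargazers to scan [200]: " MAX
MAX="${MAX:-200}"
if ! [[ "$MAX" =~ ^[0-9]+$ ]]; then
  echo "Max stargazers must be a whole number, got: $MAX" >&2
  exit 1
fi

# Normalize a URL or owner/repo string down to "owner/repo".
# The trailing slash is stripped before ".git" so "repo.git/" also works.
REPO=$(echo "$INPUT" \
  | sed -E 's#^https?://(www\.)?github\.com/##; s#/$##; s#\.git$##' \
  | awk -F/ '{print $1"/"$2}')

# Empty input yields a bare "/", so require a non-empty owner and repo name.
if ! [[ "$REPO" =~ ^[^/]+/[^/]+$ ]]; then
  echo "Could not parse repo from: $INPUT" >&2
  exit 1
fi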

OUT="emails_$(echo "$REPO" | tr '/' '_').csv"
echo "login,name,email,source_repo" > "$OUT"

echo "Scanning stargazers of $REPO (cap: $MAX) -> $OUT"

page=1
count=0
while [ "$count" -lt "$MAX" ]; do
  users=$(gh api "repos/$REPO/stargazers?per_page=100&page=$page" --jq '.[].login' 2>/dev/null || true)
  [ -z "$users" ] && break

  for login in $users; do
    [ "$count" -ge "$MAX" ] && break
    count=$((count + 1))
    printf "[%d/%d] %s" "$count" "$MAX" "$login"

    repos=$(gh api "users/$login/repos?per_page=5&sort=pushed&type=owner" --jq '.[].full_name' 2>/dev/null || true)
    if [ -z "$repos" ]; then echo "  (no public repos)"; continue; fi

    found=0
    for r in $repos; do
      while IFS=$'\t' read -r email name; do
        [ -z "$email" ] && continue
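        # Double any embedded quotes so the name stays a single CSV field.
        dq='"'
        name=${name//"$dq"/$dq$dq}
        echo "$login,\"$name\",$email,$r" >> "$OUT"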
        found=$((found + 1))
      done < <(gh api "repos/$r/commits?author=$login&per_page=5" --jq \
                 '.[]? | [.commit.author.email, .commit.author.name] | @tsv' 2>/dev/null \
                 | grep -viE 'users\.noreply\.github\.com|noreply@github\.com' \
                 | sort -u)
      [ "$found" -gt 0 ] && break
    done
    echo "  -> $found email(s)"
  done

  page=$((page + 1))
done

echo
echo "Done. CSV: $OUT"
# Names may contain commas, so read the email as the second-to-last field.
echo "Unique emails: $(tail -n +2 "$OUT" | awk -F, '{print $(NF-1)}' | sort -u | wc -l | tr -d ' ')"
```

## Run it

```bash
./scrape_stargazer_emails.sh
# GitHub repo URL or owner/repo: https://github.com/supermemoryai/supermemory
# Max stargazers to scan [200]: 200
```

Output: `emails_<owner>_<repo>.csv` with `login,name,email,source_repo`.
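
The first prompt accepts either a full URL or a bare `owner/repo`. A minimal sketch of that normalization as a standalone function, so it can be checked without touching the API; `parse_repo` is a hypothetical name, and accepting URLs with extra path segments (such as `/tree/main`) is an assumption of this sketch rather than documented behavior of the script.

```bash
#!/usr/bin/env bash
# Hypothetical helper: normalize a GitHub URL or owner/repo string to
# "owner/repo". Prints nothing and returns 1 when the input does not fit.
parse_repo() {
  local s="$1"
  local re='^([^/[:space:]]+)/([^/[:space:]]+)(/.*)?$'
  s="${s#http://}"
  s="${s#https://}"
  s="${s#www.}"
  s="${s#github.com/}"
  s="${s%/}"      # drop one trailing slash
  s="${s%.git}"   # drop a clone-URL suffix
  if [[ "$s" =~ $re ]]; then
    echo "${BASH_REMATCH[1]}/${BASH_REMATCH[2]}"
  else
    return 1
  fi
}

parse_repo "https://github.com/supermemoryai/supermemory"
parse_repo "owner/repo.git/"
parse_repo "not-a-repo" || echo "rejected: not-a-repo"
```

Keeping the parser separate from the network calls makes bad input fail before any request is spent.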

## Things to know

- **Rate limits.** Authenticated GitHub REST requests are capped at 5,000 per hour. Each user costs about 2-6 calls (one repo listing plus one to five commit lookups), so expect roughly 800-2,500 users/hour. Scan in batches if the repo is huge.
- **Stargazer cap.** GitHub's stargazer endpoint paginates up to ~40,000 (400 pages × 100). For repos bigger than that you can't get the full list.
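- **Hit rate.** Many users' commit authors are anonymized to `*@users.noreply.github.com`, which the script filters out by default. Real emails show up when a user's local `git config user.email` is a real address and they haven't enabled GitHub's "Keep my email addresses private" and "Block command line pushes that expose my email" settings.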
- **Speed.** The script breaks out of the per-user repo loop on the first hit. Remove that `break` if you want every email per user.
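
The rate-limit arithmetic can be worked through directly. This is a rough sketch that uses the 5,000 requests/hour figure and the 2-6 calls-per-user range quoted in this note; it ignores the one stargazer-page request per 100 users, and `users_per_hour` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Rough request budget. The figures are the ones quoted in this note,
# not values measured against the API.
LIMIT=5000     # authenticated REST requests per hour
MIN_CALLS=2    # repo listing + one commit lookup that hits immediately
MAX_CALLS=6    # repo listing + five commit lookups with no early hit

# users_per_hour CALLS_PER_USER -> whole users that fit in one hourly budget
users_per_hour() {
  echo $(( LIMIT / $1 ))
}

echo "best case:  $(users_per_hour "$MIN_CALLS") users/hour"
echo "worst case: $(users_per_hour "$MAX_CALLS") users/hour"
```

This prints 2500 for the best case and 833 for the worst. To see what is actually left in the current window before starting a batch, `gh api rate_limit --jq '.resources.core.remaining'` reports the remaining core quota.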

## Ethics / use it for good

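This is purely public data: every email here was already published in a `git log` somewhere on github.com. Public is not the same as fair game, though. GitHub's Acceptable Use Policies prohibit using information from the service for spamming, including sending unsolicited emails to users, so: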

- Don't spam. Reach out *individually*, with context (e.g. "saw you starred X, building something adjacent, would love your take").
- Honor unsubscribes immediately.
- Don't dump the CSV into a marketing tool that blasts cold sequences. That's how you torch a sender domain *and* your reputation.

Use it for genuine 1:1 outreach to people who showed signal that they care about your space. That's what it's good for.