# v3 Scraping — Baltic Tech Events

Documents the approach and actual outcome for each of the four data sources attempted on 2026-04-09.

---

## Source 1: Meetup.com GraphQL API

**Approach:** POST to `https://www.meetup.com/gql` with a lat/lng radius query centered on Tallinn (59.437, 24.753) and Vilnius (54.687, 25.279), 300 km radius, filtered to tech topics.

**Auth required:** Yes — OAuth 2.0 Bearer token. User has a Pro account; OAuth consumer creation pending at `https://secure.meetup.com/meetup_api/oauth_consumers/create`. Token exchange flow documented in conversation.

**Outcome:** ⏳ **Blocked — awaiting OAuth token**

Once token is available, query to run:
```bash
curl -X POST https://www.meetup.com/gql \
  -H "Authorization: Bearer $MEETUP_ACCESS_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{
    "query": "{ searchEvents(filter: { lat: 59.437, lon: 24.753, radius: 300, query: \"tech\", startDateRange: \"2025-01-01\" }) { edges { node { title dateTime venue { city lat lon } } } } }"
  }'
```

Run twice — once centered on Tallinn (covers EE + LV), once on Vilnius (covers LT). Expected yield: recurring local meetups in Riga, Tallinn, Vilnius.
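
The two-center run could be scripted roughly as follows. This is a minimal sketch: the endpoint, the `searchEvents` field, and the filter arguments are copied from the curl command above and have not been checked against a live token, and the helper names (`build_meetup_query`, `build_meetup_request`, `fetch_all_centers`) are hypothetical.

```python
import json
import urllib.request

# Endpoint and query shape are copied from the curl command in this section;
# neither has been verified against a live token yet.
MEETUP_GQL_URL = "https://www.meetup.com/gql"

# City centers from the approach described above.
CENTERS = {
    "Tallinn": (59.437, 24.753),
    "Vilnius": (54.687, 25.279),
}


def build_meetup_query(lat: float, lon: float, radius: int = 300) -> str:
    """Return the searchEvents query string for one center point."""
    return (
        "{ searchEvents(filter: { "
        f"lat: {lat}, lon: {lon}, radius: {radius}, "
        'query: "tech", startDateRange: "2025-01-01" }) '
        "{ edges { node { title dateTime venue { city lat lon } } } } }"
    )


def build_meetup_request(token: str, lat: float, lon: float) -> urllib.request.Request:
    """Build the POST request without sending it."""
    body = json.dumps({"query": build_meetup_query(lat, lon)}).encode("utf-8")
    return urllib.request.Request(
        MEETUP_GQL_URL,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def fetch_all_centers(token: str) -> dict:
    """Run the query once per center; needs a valid token and network access."""
    results = {}
    for city, (lat, lon) in CENTERS.items():
        request = build_meetup_request(token, lat, lon)
        with urllib.request.urlopen(request, timeout=30) as resp:
            results[city] = json.load(resp)
    return results
```

Keeping request construction separate from the network call means the payload can be inspected while the token is still pending.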

**Events added from this source:** 0 (pending)

---

## Source 2: Eventbrite API

**Approach:** `GET /v3/events/search/` with location + tech category filter. Credentials available in `.env`: `EVENTBRITE_PRIVATE_TOKEN`, `EVENTBRITE_API_KEY`.

**Auth:** Token verified — `GET /v3/users/me/` returns valid account (Adrien Siegfried, id: 477434446459).

**Outcome:** ❌ **Dead end**

| Attempt | Result |
|---|---|
| `GET /v3/events/search/` with Bearer token | 404 — endpoint removed Feb 2020 |
| `POST /api/v3/destination/search/` with Bearer token | 401 — requires browser CSRF cookie, not API token |
| Web page scrape (eventbrite.com/d/latvia--riga/tech/) | AWS WAF CAPTCHA wall, no data |
| `GET /v3/users/me/organizations/` | 0 organizations — no org-scoped query possible |

The remaining v3 API endpoints (`/v3/organizations/:id/events/`, `/v3/venues/:id/events/`) require knowing specific IDs upfront and are useless for geographic discovery. **Eventbrite dropped public event search entirely.**
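
The status checks in the table could be reproduced with a small probe like the one below. It is a sketch under assumptions: the host `https://www.eventbriteapi.com` is not stated in these notes, and the helper names are hypothetical. Only the paths and the Bearer-token scheme come from the attempts above.

```python
import urllib.error
import urllib.request

# Assumption: the API host. The notes above list only the paths.
EVENTBRITE_BASE = "https://www.eventbriteapi.com"


def build_eventbrite_request(path: str, token: str) -> urllib.request.Request:
    """Build a GET request carrying the private token as a Bearer header."""
    return urllib.request.Request(
        EVENTBRITE_BASE + path,
        headers={"Authorization": f"Bearer {token}"},
    )


def probe(path: str, token: str) -> int:
    """Return the HTTP status code for a GET on `path` (needs network access)."""
    try:
        request = build_eventbrite_request(path, token)
        with urllib.request.urlopen(request, timeout=30) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx responses raise; the code is what the table records.
        return err.code
```

Calling `probe("/v3/users/me/", token)` and `probe("/v3/events/search/", token)` would be one way to re-check the first two findings if Eventbrite's API changes again.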

**Events added from this source:** 0

---

## Source 3: Luma (lu.ma)

**Approach:** Fetch known Baltic org calendar pages, parse `window.__NEXT_DATA__` or JSON-LD for event listings.

**Finding:** lu.ma redirects to luma.com (301). Calendar pages are Next.js apps — event listings are **client-side rendered**, meaning static HTML only contains timestamps (no names/descriptions/venues). The `__NEXT_DATA__` blob exists but only contains calendar metadata, not individual event details.
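
For reference, pulling the `__NEXT_DATA__` blob out of a fetched page can be sketched as below. The regex assumes the usual Next.js `<script id="__NEXT_DATA__">` tag, and `extract_next_data` is a hypothetical helper name. As noted above, on Luma calendar pages the result holds calendar metadata only, so this does not recover event details.

```python
import json
import re
from typing import Optional

# Next.js serializes page props into a script tag with this id.
NEXT_DATA_RE = re.compile(
    r'<script[^>]*\bid="__NEXT_DATA__"[^>]*>(.*?)</script>',
    re.DOTALL,
)


def extract_next_data(html: str) -> Optional[dict]:
    """Return the parsed __NEXT_DATA__ JSON, or None if the tag is absent."""
    match = NEXT_DATA_RE.search(html)
    if match is None:
        return None
    return json.loads(match.group(1))
```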

**Calendars found for the Baltics (via Google site: search):**

| Calendar | URL | Status |
|---|---|---|
| Riga TechWeek 2025 | luma.com/rigatechweek | Active — 15 events Aug 25-31 2025, details JS-only |
| Startup House Riga | luma.com/startup-house-riga | Active — 0 upcoming events at time of scrape |
| TestNest Tallinn | luma.com/testnest.tallinn | Active — JS-rendered |
| Latvian Ruby Meetup | lu.ma/939q9twh | Single event Apr 2025 (past) |
| Defence Tech Meetup Riga | lu.ma/en4sbse1 | Closed/private |
| EU Defence Tech Hackathon Vilnius | lu.ma/edth-2025-vilnius | May 4-6 2025 |
| LLM Applications Meetup Riga | lu.ma/j0gvlqcz | Small meetup |

**To get full Luma data:** Requires Playwright/Puppeteer (headless browser) or Luma's undocumented internal API (`/api/event/get?series=...`). Noted for future improvement.

**Events added from this source:** 2 (Riga TechWeek as festival entry; EU Defence Tech Hackathon Vilnius)

---

## Source 4: Direct Baltic Event Sites + Web Search

**Approach:** WebFetch on known Baltic conference URLs; extract JSON-LD where available, fall back to parsing visible content. Supplemented with targeted web searches for dates/attendance.
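
The JSON-LD extraction step might look roughly like this, using only the standard library. It is a sketch, not the code that was actually run: the class and function names are hypothetical, and it only matches items whose `@type` is exactly `"Event"`. Subtypes such as `BusinessEvent` and items nested under `@graph` would need extra handling.

```python
import json
from html.parser import HTMLParser


class JsonLdParser(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blocks rather than abort the page


def extract_events(html: str) -> list:
    """Return all top-level JSON-LD items whose @type is exactly "Event"."""
    parser = JsonLdParser()
    parser.feed(html)
    events = []
    for block in parser.blocks:
        items = block if isinstance(block, list) else [block]
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Event":
                events.append(item)
    return events
```

An empty result is the signal to fall back to parsing visible content, as described above.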

**Results:**

| Site | Fetchable | Data quality |
|---|---|---|
| latitude59.ee | ✅ | Full — name, dates, venue, attendance, topics |
| startupday.ee | ✅ | Full |
| rigacomm.com | ✅ | Full |
| balticfintechdays.com | ✅ | Full |
| devdays.lt (via clocate) | ✅ | Full — exact venue, date, topics |
| www.ai.lt (Vilnius AI Summit) | ✅ (via search) | Full |
| techweek.lv | ✅ | Partial — festival-level only |
| techchill.co | ⚠️ | Wix minified JS — parsed via search results instead |
| garage48.org/events | ✅ | Partial — upcoming events only |
| login.lt | ❌ | Connection refused |
| tds.icds.ee (Tallinn Digital Summit) | ✅ (via search) | Full |
| 2025.egovconference.ee | ✅ (via search) | Partial |

**Events added from this source:** 12 conferences + 1 hackathon

---

## Summary

| Source | Events collected | Status |
|---|---|---|
| Meetup GraphQL | 0 | ⏳ Awaiting OAuth token |
| Eventbrite | 0 | ❌ Public search endpoint removed; no discovery route |
| Luma | 2 | ⚠️ Partial — needs headless browser for full data |
| Direct sites | 13 | ✅ Best yield |
| Seed (prior EU events) | 32 | ✅ Hardcoded from prototype v2 |
| **Total** | **47** | |
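
As a quick arithmetic check on the table, the counts can be tallied as below. The numbers are copied from the rows above; the variable names are illustrative.

```python
# Counts copied from the summary table.
COUNTS = {
    "Meetup GraphQL": 0,
    "Eventbrite": 0,
    "Luma": 2,
    "Direct sites": 13,
    "Seed (prior EU events)": 32,
}

total = sum(COUNTS.values())
# Events gathered in this scraping pass, excluding the v2 seed data.
newly_scraped = total - COUNTS["Seed (prior EU events)"]
print(total, newly_scraped)  # 47 15
```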

**Next steps:**
1. Add Meetup token to `.env` as `MEETUP_ACCESS_TOKEN`, re-run Source 1
2. Add `MEETUP_REFRESH_TOKEN` so the access token can be renewed after it expires
3. Use Playwright to scrape Luma calendar event details for Riga TechWeek
4. Re-check login.lt when site is accessible
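
Before re-running Source 1, a small pre-flight check could confirm that both Meetup keys are present in `.env`. This sketch assumes plain `KEY=VALUE` lines; the parser and helper names are hypothetical, and a library such as python-dotenv could be used instead.

```python
def parse_env(text: str) -> dict:
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


# Key names taken from the next steps above.
REQUIRED = ("MEETUP_ACCESS_TOKEN", "MEETUP_REFRESH_TOKEN")


def missing_meetup_keys(env_text: str) -> list:
    """Return the required Meetup keys that are absent or empty."""
    env = parse_env(env_text)
    return [key for key in REQUIRED if not env.get(key)]
```

Calling `missing_meetup_keys(open(".env").read())` returns an empty list once steps 1 and 2 are done.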
