About

Methodology

How we collect, classify, and analyze hiring market data, including what our data covers and where it has limitations.

Data Sources

We aggregate job postings from 6 distinct sources, combining direct employer career pages hosted on five applicant tracking systems (ATS) with one job board aggregator for broader coverage.

ATS Sources (Direct)

  • Greenhouse -- 450+ company career pages via browser automation
  • Lever -- 180+ companies via public API
  • Ashby -- 170+ companies via API (most complete salary data)
  • Workable -- 135+ companies via API (includes workplace type)
  • SmartRecruiters -- 35+ companies via API (includes experience level)

Aggregator Sources

  • Adzuna -- Job board aggregator covering broad market activity

ATS sources provide higher-quality structured data (salary, location type, working arrangement). Adzuna provides broader coverage but with less structured metadata.
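One way to hold postings from both kinds of source is a single record type whose structured fields are optional, so Adzuna rows simply leave them empty. This is a minimal sketch under that assumption; the class, its field names, and the source identifiers are hypothetical, not our actual schema.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical identifiers for the five ATS integrations.
ATS_SOURCES = {"greenhouse", "lever", "ashby", "workable", "smartrecruiters"}


@dataclass
class JobPosting:
    """Hypothetical unified posting record; structured fields default to None."""
    source: str
    title: str
    employer: str
    city: str
    url: str
    salary_min: Optional[int] = None
    salary_max: Optional[int] = None
    working_arrangement: Optional[str] = None  # e.g. "remote", "hybrid", "onsite"

    @property
    def is_ats(self) -> bool:
        # ATS-sourced records are the ones expected to carry structured metadata.
        return self.source.lower() in ATS_SOURCES

    def has_salary(self) -> bool:
        return self.salary_min is not None or self.salary_max is not None


# Illustrative records: an ATS posting with structured fields, an Adzuna one without.
ats_job = JobPosting("ashby", "Analytics Engineer", "Acme", "London",
                     "https://example.com/jobs/1", salary_min=70000,
                     salary_max=85000, working_arrangement="hybrid")
adzuna_job = JobPosting("adzuna", "Data Analyst", "Globex", "Denver",
                        "https://example.com/jobs/2")
```

Keeping one record type with optional fields lets downstream reports count missing metadata explicitly rather than maintaining a separate schema per source.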


Geographic Coverage

Currently tracking 5 major hiring markets:

  • London
  • New York City
  • Denver
  • San Francisco
  • Singapore

Coverage varies by city. London and NYC have the deepest coverage; Singapore was added most recently.


Job Families

Reports cover three professional families:

Data & Analytics

Data engineers, analysts, scientists, ML engineers, analytics engineers, BI specialists

Product Management

Product managers, product owners, growth PMs, platform PMs, technical PMs

Project & Delivery

Project managers, delivery managers, programme managers, scrum masters, agile coaches
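The three families and their roles form a small two-level taxonomy. A minimal sketch of rolling a classified subfamily up to its family, assuming lowercase subfamily labels taken from the role lists above (the dictionary and function names are hypothetical):

```python
from typing import Dict, Optional, Set

# Two-level taxonomy built from the role lists above; labels are illustrative.
JOB_FAMILIES: Dict[str, Set[str]] = {
    "Data & Analytics": {
        "data engineer", "data analyst", "data scientist",
        "ml engineer", "analytics engineer", "bi specialist",
    },
    "Product Management": {
        "product manager", "product owner", "growth pm",
        "platform pm", "technical pm",
    },
    "Project & Delivery": {
        "project manager", "delivery manager", "programme manager",
        "scrum master", "agile coach",
    },
}

# Invert the taxonomy once so each lookup is a single dict access.
SUBFAMILY_TO_FAMILY: Dict[str, str] = {
    sub: family for family, subs in JOB_FAMILIES.items() for sub in subs
}


def family_for(subfamily: str) -> Optional[str]:
    """Return the job family for a subfamily label, or None if it is out of scope."""
    return SUBFAMILY_TO_FAMILY.get(subfamily.strip().lower())
```

For example, `family_for("Scrum Master")` returns `"Project & Delivery"`, while `family_for("software engineer")` returns `None`.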


Classification

Every job posting passes through a multi-stage classification pipeline:

  1. Pre-filtering: Title and location pattern matching reduces volume by ~95% before LLM processing, so only relevant roles reach the classifier.
  2. Agency detection: Known recruitment agencies are filtered out using a maintained blocklist, removing 10-15% of raw postings.
  3. LLM classification: Gemini 2.5 Flash extracts structured fields (role subfamily, seniority level, skills, working arrangement, and track: IC or Management).
  4. Deduplication: URL-based deduplication prevents the same posting from being counted multiple times across scraping runs.
  5. Skill normalization: Raw skill mentions are mapped to a curated taxonomy using exact and fuzzy matching.
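The five stages can be sketched as a single pass over raw postings. This is a minimal illustration, not the production pipeline: the regular expressions, blocklist entry, and skill taxonomy are hypothetical stand-ins, and the LLM step is represented by a `classify` callable rather than a real Gemini call. `seen_urls` is passed in so it can persist across scraping runs.

```python
import re
from difflib import get_close_matches
from typing import Callable, Dict, Iterable, List, Optional, Set
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical pre-filter patterns (the real rules are not published).
TITLE_PATTERN = re.compile(
    r"\b(data|analytics?|machine learning|ml|bi|product|project|delivery|"
    r"programme|program|scrum|agile)\b",
    re.IGNORECASE,
)
LOCATION_PATTERN = re.compile(
    r"\b(london|new york|nyc|denver|san francisco|singapore)\b", re.IGNORECASE
)
AGENCY_BLOCKLIST = {"example recruitment ltd"}  # hypothetical entry
SKILL_TAXONOMY = ["Python", "SQL", "dbt", "Apache Spark", "Tableau"]  # hypothetical subset
_SKILL_KEYS = {s.lower(): s for s in SKILL_TAXONOMY}


def prefilter(title: str, location: str) -> bool:
    # Step 1: cheap pattern matching before any LLM call.
    return bool(TITLE_PATTERN.search(title)) and bool(LOCATION_PATTERN.search(location))


def is_agency(employer: str) -> bool:
    # Step 2: exact (case-insensitive) match against the blocklist.
    return employer.strip().lower() in AGENCY_BLOCKLIST


def canonical_url(url: str) -> str:
    # Step 4 key: lowercase scheme/host, drop utm_* params, the fragment, and a
    # trailing slash. Other query params are kept because some career sites
    # carry the job ID in the query string.
    parts = urlsplit(url.strip())
    query = urlencode(sorted(
        (k, v) for k, v in parse_qsl(parts.query) if not k.lower().startswith("utm_")
    ))
    return urlunsplit((parts.scheme.lower(), parts.netloc.lower(),
                       parts.path.rstrip("/"), query, ""))


def normalize_skill(raw: str) -> Optional[str]:
    # Step 5: exact match first, then a fuzzy match above a similarity cutoff.
    key = raw.strip().lower()
    if key in _SKILL_KEYS:
        return _SKILL_KEYS[key]
    match = get_close_matches(key, list(_SKILL_KEYS), n=1, cutoff=0.85)
    return _SKILL_KEYS[match[0]] if match else None


def run_pipeline(postings: Iterable[Dict], classify: Callable[[Dict], Dict],
                 seen_urls: Set[str]) -> List[Dict]:
    results = []
    for p in postings:
        if not prefilter(p["title"], p["location"]) or is_agency(p["employer"]):
            continue
        fields = dict(classify(p))  # Step 3: stand-in for the LLM call
        key = canonical_url(p["url"])
        if key in seen_urls:  # Step 4: skip postings already counted
            continue
        seen_urls.add(key)
        skills = (normalize_skill(s) for s in fields.get("skills", []))
        fields["skills"] = sorted({s for s in skills if s})
        results.append({**p, **fields})
    return results
```

For example, `canonical_url("https://Jobs.example.com/acme/1/?utm_source=x")` yields `"https://jobs.example.com/acme/1"`. The sketch keeps the listed stage order; checking `seen_urls` before calling `classify` would avoid paying for LLM calls on postings already seen.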

Quality Controls

  1. Agency filtering: Hard and soft blocklists remove recruitment agency postings that would inflate job counts without representing direct employer demand.
  2. URL validation: Automated 404 detection identifies dead links and expired postings; the result is used as a freshness signal in the job feed.
  3. API freshness checking: Single-page-application (SPA) career pages return HTTP 200 even for removed postings, so for these we use source-specific API checks to verify that a posting is still available.
  4. Employer normalization: Display names are standardized across ATS sources so the same employer does not appear under multiple names.
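These quality controls can be sketched as three small helpers. This is a minimal illustration under stated assumptions: we assume the hard blocklist holds exact agency names and the soft blocklist holds keyword patterns; treating 404 and 410 as expired is our choice; and all names, patterns, and aliases are hypothetical. For SPA career pages the page status is ignored in favour of a status from a source-specific API, since the page itself answers 200 either way.

```python
import re
from enum import Enum
from typing import Optional

# Assumption: hard list = exact agency names, soft list = keyword patterns.
HARD_BLOCKLIST = {"example staffing group"}  # hypothetical
SOFT_PATTERN = re.compile(r"\b(recruitment|recruiting|staffing)\b", re.IGNORECASE)


def is_agency(employer: str) -> bool:
    name = employer.strip().lower()
    return name in HARD_BLOCKLIST or bool(SOFT_PATTERN.search(name))


class Freshness(Enum):
    LIVE = "live"
    EXPIRED = "expired"
    UNKNOWN = "unknown"


def freshness(page_status: int, is_spa: bool = False,
              api_status: Optional[int] = None) -> Freshness:
    # SPA pages return 200 even for removed postings, so use the API status instead.
    status = api_status if is_spa else page_status
    if status in (404, 410):
        return Freshness.EXPIRED
    if status == 200:
        return Freshness.LIVE
    return Freshness.UNKNOWN  # no API check available, or an unexpected status


# Legal-entity suffixes to strip; the list is illustrative, not exhaustive.
LEGAL_SUFFIX = re.compile(r"[,\s]+(inc|llc|ltd|limited|plc|gmbh|pte|corp)\.?$", re.IGNORECASE)
EMPLOYER_ALIASES = {"acme labs": "Acme"}  # hypothetical: cleaned name -> display name


def normalize_employer(raw: str) -> str:
    name = " ".join(raw.split())  # collapse runs of whitespace
    while True:  # strip stacked suffixes such as "Pte. Ltd."
        stripped = LEGAL_SUFFIX.sub("", name)
        if stripped == name:
            break
        name = stripped
    return EMPLOYER_ALIASES.get(name.lower(), name)
```

For example, `normalize_employer("Acme Labs, Inc.")` returns `"Acme"`, and `freshness(200, is_spa=True)` returns `Freshness.UNKNOWN` until an API status is supplied.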

Limitations

  1. Coverage bias: Our ATS sources skew toward tech-forward, VC-backed, and mid-size companies. Large enterprises using Workday, Taleo, or internal systems are underrepresented.
  2. Compensation gaps: Salary data depends on employer disclosure. Markets without pay transparency legislation (most of Europe, Singapore) have much lower disclosure rates.
  3. Working arrangement data: Only available for ATS-sourced roles with structured fields; Adzuna-sourced roles lack this metadata.
  4. Classification accuracy: LLM classification is not perfect. Edge cases (hybrid PM/engineer roles, ambiguous seniority) may be misclassified.
  5. Temporal coverage: Each report is a snapshot of a single month, so any single month-over-month change should be interpreted cautiously.
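Because salary disclosure varies so much by market, it helps to measure the disclosure rate per city before comparing salaries across cities. A minimal sketch, assuming each posting is a dict with a `city` key and optional `salary_min`/`salary_max` keys (hypothetical field names):

```python
from collections import defaultdict
from typing import Dict, Iterable


def disclosure_rate_by_city(postings: Iterable[Dict]) -> Dict[str, float]:
    """Share of postings per city that disclose at least one salary bound."""
    totals: Dict[str, int] = defaultdict(int)
    disclosed: Dict[str, int] = defaultdict(int)
    for p in postings:
        totals[p["city"]] += 1
        if p.get("salary_min") is not None or p.get("salary_max") is not None:
            disclosed[p["city"]] += 1
    # Every city in totals has at least one posting, so no division by zero.
    return {city: disclosed[city] / totals[city] for city in totals}
```

For example, `disclosure_rate_by_city([{"city": "Denver", "salary_min": 100000}, {"city": "Denver"}, {"city": "London"}])` returns `{"Denver": 0.5, "London": 0.0}`.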

Current Dataset

  • 7,944+ jobs tracked
  • 4,453+ employers
  • 5 cities
  • 19 reports