Case study · 01

Job Screening System

An automated pipeline that scrapes, filters, and scores PM job listings — delivering tiered alerts via email and WhatsApp so I spend minutes, not hours, on my job search every day.

100+
Daily listings processed
5
Filter layers
<2 min
Alert delivery time
7+
Iterations and counting
Problem & context

45 minutes a day on a task that shouldn't exist

As an active PM job seeker in Melbourne, I was spending 45+ minutes daily scanning Seek and Indeed manually. The signal-to-noise ratio was low: most listings were for physical product management, pure sales roles, or jobs requiring clinical registration. The process was repetitive, inconsistent, and unscalable.

The goal wasn't to automate the decision to apply — that still requires human judgment. The goal was to eliminate the work that precedes the decision: finding, filtering, and prioritising.

Solution · System design

A pipeline, not a product

The right mental model was a decision-support pipeline, not a job platform. No UI, no database, no user accounts. The fastest path to value was: fetch → filter → score → alert.

Seek + Indeed → Apify scraper → dedup layer → 5-layer hard filter → AI scorer (JD extraction → background match → score/11) → Google Sheets + Gmail (3-tier digest) + WhatsApp (score ≥8)
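The fetch → filter → score → alert loop can be sketched as a plain Python pipeline. All function bodies below are placeholders for the real Apify, OpenAI, and notification calls; the names are illustrative, not from the actual codebase:

```python
# Minimal pipeline sketch: fetch → filter → score → alert.
# Every stage body is a stub standing in for the real service call.
from dataclasses import dataclass

@dataclass
class Listing:
    title: str
    company: str
    description: str
    score: int = 0

def fetch() -> list[Listing]:
    """Stand-in for the Apify scrapers (Seek + Indeed)."""
    return []

def hard_filter(listings: list[Listing]) -> list[Listing]:
    """Stand-in for the 5-layer hard filter."""
    return [l for l in listings if "sales" not in l.title.lower()]

def score(listing: Listing) -> int:
    """Stand-in for the JD-extraction scorer (0–11)."""
    return 0

def run() -> tuple[list[Listing], list[Listing]]:
    results = []
    for listing in hard_filter(fetch()):
        listing.score = score(listing)
        results.append(listing)
    # Score ≥8 goes to WhatsApp immediately; everything else waits
    # for the daily email digest, sorted by score.
    urgent = [l for l in results if l.score >= 8]
    digest = sorted(results, key=lambda l: l.score, reverse=True)
    return urgent, digest
```

The point of the shape is that each stage is swappable: a new job board only touches `fetch`, a new filter rule only touches `hard_filter`.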
Product thinking · Key decisions

Every rule is a tradeoff

The decisions that shaped the product weren't technical — they were product decisions about scope, cost, precision, and explainability.

01
Apify over a custom crawler
External dependency accepted in exchange for speed and reliability. Building a crawler would take weeks; Apify took hours. The cost is $14/month — acceptable for an MVP.
Speed over control
02
Keyword-first, AI-confirm
Hard filter uses keyword triggers to flag suspicious roles, then OpenAI confirms intent. This eliminates 60%+ of listings cheaply before expensive AI calls.
Cost efficiency over simplicity
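A minimal sketch of this keyword-first, AI-confirm pass. The trigger list and the `ai_confirms_relevant` stub are hypothetical stand-ins for the real filter rules and OpenAI call:

```python
# Two-stage filter: a cheap keyword pass flags suspicious listings,
# and only those flagged listings pay for an AI confirmation call.
SUSPECT_TRIGGERS = ("clinical", "registered nurse", "physical product",
                    "sales representative", "videographer")

def keyword_flag(description: str) -> bool:
    """Cheap pass: flag listings whose text hits a trigger keyword."""
    text = description.lower()
    return any(t in text for t in SUSPECT_TRIGGERS)

def ai_confirms_relevant(description: str) -> bool:
    """Placeholder for the OpenAI classification call that confirms
    whether a flagged listing is genuinely a relevant PM role."""
    return False  # conservative default for this sketch

def passes_filter(description: str) -> bool:
    if not keyword_flag(description):
        return True                # no trigger hit: accept for free
    return ai_confirms_relevant(description)  # expensive confirm
```

Because most listings never hit a trigger, the expensive call runs on a minority of listings — that is where the 60%+ cheap elimination comes from.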
03
JD extraction over keyword matching
Scoring extracts 5–7 core JD requirements first, then matches against my real background. Generic skill matching always returned 4/5 — useless. Extraction-based scoring creates actual variance.
Accuracy over simplicity
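The extraction-based scoring idea can be sketched like this. `extract_requirements` stubs the OpenAI extraction step, and the candidate-profile entries are illustrative, not my real background:

```python
# Extraction-based scoring sketch: pull core requirements out of the
# JD first, then count matches against a fixed candidate profile.
# Generic keyword matching scores everything alike; this creates variance.
CANDIDATE_PROFILE = {
    "stakeholder management", "roadmap ownership", "a/b testing",
    "sql", "agile delivery",
}

def extract_requirements(jd_text: str) -> list[str]:
    """Placeholder for the OpenAI extraction of 5–7 core requirements."""
    return ["stakeholder management", "sql", "healthcare domain",
            "roadmap ownership", "pricing strategy"]

def score_listing(jd_text: str) -> float:
    reqs = extract_requirements(jd_text)
    hits = sum(1 for r in reqs if r in CANDIDATE_PROFILE)
    # Scale matched requirements onto the pipeline's 0–11 scale.
    return round(11 * hits / len(reqs), 1)
```

With the stubbed extraction above, 3 of 5 requirements match, so the score lands mid-range instead of the flat 4/5 that generic keyword matching produced.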
04
WhatsApp for urgency, email for digest
Score ≥8 triggers an immediate WhatsApp alert. All passing jobs go into the daily email, sorted by tier. Two channels, two jobs: real-time alerting vs. structured review.
Engagement over simplicity
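The two-channel routing fits in a few lines. The tier names below are hypothetical — the system only specifies a 3-tier email and a ≥8 WhatsApp threshold:

```python
# Routing sketch: two channels, two jobs. Score ≥8 fires an immediate
# WhatsApp message; every passing job is tiered into the daily email.
def tier(score: int) -> str:
    if score >= 8:
        return "apply now"
    if score >= 6:
        return "worth a look"
    return "low priority"

def route(jobs: list[dict]) -> tuple[list[dict], dict[str, list[dict]]]:
    whatsapp = [j for j in jobs if j["score"] >= 8]
    digest: dict[str, list[dict]] = {
        "apply now": [], "worth a look": [], "low priority": [],
    }
    # Sort once so each tier of the email digest is ordered by score.
    for j in sorted(jobs, key=lambda j: j["score"], reverse=True):
        digest[tier(j["score"])].append(j)
    return whatsapp, digest
```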
Build & iteration

Seven versions in one week

Each iteration fixed a real problem observed in production, not a hypothetical one.

v1
Indeed only · keyword filter
Prove the pipeline works end to end. Established the core fetch → filter → notify loop.
v2
Added Seek · fixed actor ID
Seek actor ID had an l/I character ambiguity causing 404 errors. Fixed it and added a second data source for broader coverage.
v3
Semantic pre-filter
Videographer and WHS specialist roles were passing the filter because companies mentioned "wellbeing." Added OpenAI semantic classification to remove non-relevant roles before the hard filter.
v4
FTE & hours calculation
Part-time roles were slipping through. Added FTE conversion (0.53 FTE threshold) and hours-per-week parsing to correctly block over-20h/week roles.
v5
Scoring engine v1
First scoring model. Discovered skill match always returned 4/5 — generic PM keywords matched every JD.
v6
JD extraction scoring
Redesigned scoring: extract 5–7 core JD requirements first, then match against my real background. Scores now reflect actual fit.
v7
Production-ready
Added WhatsApp alerts for score ≥8, a 3-tier email layout, weekday-only scheduling, a 14-day deduplication window, and crash notification emails.
Reflection · What I learned

Three things that surprised me

1
Hard filter design is product design.

Every rule is a precision/recall tradeoff. Too strict and you miss good roles; too loose and the signal drowns in noise. Writing each filter rule forced me to be explicit about my own constraints and values as a candidate.

2
AI scoring needs a reference point, not a keyword list.

Generic skill keywords match every PM job description — useless. Useful scoring requires a fixed candidate profile as an anchor: real experience, real gaps. The model only became meaningful when it had something specific to compare against.

3
Operational reliability is a product decision.

Scheduling, error notification emails, daily-run-once state, deduplication windows — these feel like engineering details but they're product decisions. A pipeline that silently fails is worse than no pipeline.
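As an illustration of how small these "details" are in code, a 14-day deduplication window is just a timestamp map with an expiry pass (a sketch; the real system's state storage isn't specified here):

```python
# Sketch of a 14-day deduplication window: a job id counts as new only
# if it hasn't been seen in the last 14 days. State is a plain dict of
# job id → last-seen Unix timestamp; persistence is out of scope here.
WINDOW_DAYS = 14

def prune(seen: dict[str, float], now: float) -> dict[str, float]:
    """Drop entries that have aged out of the window."""
    cutoff = now - WINDOW_DAYS * 86400
    return {job_id: t for job_id, t in seen.items() if t >= cutoff}

def is_new(seen: dict[str, float], job_id: str, now: float) -> bool:
    """Record the job id and report whether it was new in the window."""
    live = prune(seen, now)
    seen.clear()
    seen.update(live)
    if job_id in seen:
        return False
    seen[job_id] = now
    return True
```

The product decision hides in `WINDOW_DAYS`: too short and reposted listings re-alert, too long and genuinely refreshed roles get silently dropped.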

Back to all projects