Inside the curation pipeline: how a headline gets into the game

Published 13 May 2026

The most common question we get is some version of “how do you actually choose the headlines?” The short answer is “a Python pipeline plus a human reviewer.” The longer answer is below.

Step 1: RSS ingestion, twice a day

We poll RSS feeds from roughly 30 publications organised across eight categories — TV & Film, Celebrity, Tech, Sport, Science & Health, Business, World News, and Food & Lifestyle. The scraper runs twice a day, deduplicating by URL against the database so we never process the same article twice.
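The URL check can be sketched in a few lines. This is a minimal illustration only: it assumes a SQLite table called `articles` with the URL as primary key, and the names `make_db` and `filter_new_urls` are hypothetical rather than taken from the production scraper.

```python
import sqlite3


def make_db() -> sqlite3.Connection:
    # In-memory database for the example; the table layout is an assumption.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE articles (url TEXT PRIMARY KEY)")
    return conn


def filter_new_urls(conn: sqlite3.Connection, urls: list[str]) -> list[str]:
    """Return only the URLs not seen before, recording them as seen."""
    new_urls = []
    for url in urls:
        seen = conn.execute(
            "SELECT 1 FROM articles WHERE url = ?", (url,)
        ).fetchone()
        if seen is None:
            conn.execute("INSERT INTO articles (url) VALUES (?)", (url,))
            new_urls.append(url)
    conn.commit()
    return new_urls
```

Running the same feed through `filter_new_urls` on the second poll of the day would return only the articles that appeared since the first poll.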

For each new article we fetch the full HTML and extract the title, the OpenGraph image, the meta description, the URL slug, the publication’s own category tag, and — most importantly — the full article body. Many clickbait articles deliberately bury the actual point deep in the text, so the LLM needs the whole article to know what the piece really says.
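To give a feel for the metadata side of that extraction, here is a small sketch using only the Python standard library. It pulls the title, OpenGraph image, meta description, and URL slug; body extraction is site-specific and left out. The class and function names are hypothetical, and a real scraper may well use a dedicated HTML library instead.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class MetaExtractor(HTMLParser):
    """Collects <title>, og:image, and the meta description from a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.og_image = None
        self.description = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta":
            if attr.get("property") == "og:image":
                self.og_image = attr.get("content")
            elif attr.get("name") == "description":
                self.description = attr.get("content")

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        # Title text can arrive in several chunks, so accumulate it.
        if self._in_title:
            self.title += data


def extract_metadata(html: str, url: str) -> dict:
    parser = MetaExtractor()
    parser.feed(html)
    # Slug is taken as the last non-empty path segment of the URL.
    slug = urlparse(url).path.rstrip("/").rsplit("/", 1)[-1]
    return {
        "title": parser.title.strip(),
        "og_image": parser.og_image,
        "description": parser.description,
        "slug": slug,
    }
```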

Step 2: LLM processing

Each scraped article is sent to Claude, which returns a structured response containing the correct answer (one sentence describing what the article actually says), three plausible distractors, two or three keywords for the Stage 2 clue, a category, a 1–10 spectrum rating, and a quality score.
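One way to picture that structured response is as a validated record. The sketch below assumes the model replies with JSON and uses made-up field names (`correct_answer`, `spectrum`, and so on); the actual response format and validation rules may differ.

```python
import json
from dataclasses import dataclass

# The eight categories listed in Step 1.
CATEGORIES = {
    "TV & Film", "Celebrity", "Tech", "Sport",
    "Science & Health", "Business", "World News", "Food & Lifestyle",
}


@dataclass
class ProcessedArticle:
    correct_answer: str
    distractors: list[str]
    keywords: list[str]
    category: str
    spectrum: int  # 1-10 rating
    quality: int   # 1-5 score


def parse_llm_response(raw: str) -> ProcessedArticle:
    """Parse a JSON reply and reject anything outside the expected shape."""
    try:
        item = ProcessedArticle(**json.loads(raw))
    except TypeError as exc:  # missing or unexpected fields
        raise ValueError(f"malformed response: {exc}") from exc
    if len(item.distractors) != 3:
        raise ValueError("expected exactly three distractors")
    if not 2 <= len(item.keywords) <= 3:
        raise ValueError("expected two or three keywords")
    if item.category not in CATEGORIES:
        raise ValueError(f"unknown category: {item.category}")
    if not 1 <= item.spectrum <= 10:
        raise ValueError("spectrum rating must be 1-10")
    if not 1 <= item.quality <= 5:
        raise ValueError("quality score must be 1-5")
    return item
```

Validating at this boundary means a malformed reply fails loudly before it reaches the review queue.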

The distractors are the hardest part. They have to be plausible enough that a player could reasonably believe any of them given only the headline, but wrong. We tell the LLM to generate distractors in the same general domain as the correct answer and to avoid anything obviously absurd. We regenerate distractors during curation if they aren’t convincing.

The quality score (1–5) is the LLM’s assessment of how good a game question this headline is. A score of 1 means the headline gives the answer away; a score of 5 means the headline is maximally ambiguous. Anything below 3 gets auto-rejected before a human sees it.
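The auto-reject rule amounts to a single threshold. A minimal sketch, assuming each processed article is a dict with a `quality` key (the names `MIN_QUALITY` and `triage` are illustrative):

```python
MIN_QUALITY = 3  # scores below this never reach the review queue


def triage(items: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split processed articles into (review queue, auto-rejected)."""
    queue, rejected = [], []
    for item in items:
        if item["quality"] >= MIN_QUALITY:
            queue.append(item)
        else:
            rejected.append(item)
    return queue, rejected
```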

Step 3: human curation

A reviewer (currently one person, eventually a small team) works through the processed queue once a day. The review interface shows the headline, the hero image, the proposed correct answer, the three distractors, the keywords, the category, and a link to the original article. The reviewer can approve, reject, edit any field and then approve, or regenerate the distractors. Bulk approve and reject are also available for clear-cut cases.

Target throughput is roughly 30–50 review decisions per day, with around 10–20 approvals. That keeps the live pool of playable headlines at 50–100 at any time, which is enough variety that repeats are rare.

What gets rejected

Common rejection reasons:

Headline gives the article away — no ambiguity, no puzzle.
Distractors are too weak — you could pick the correct one by elimination.
Article is too thin — one paragraph, no real story to be misled about.
Content is offensive, gory, or unsuitable for general audiences.
Duplicate of an already-approved headline (same story, different publication).

Step 4: publishing

Approved headlines get pushed to the production Supabase database in batches. The pipeline tries to maintain category balance — if Tech is short, it prioritises Tech approvals; if Science & Health is well-stocked, it slows down. Headlines older than 30 days get archived so the live pool stays fresh.
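Both rules can be sketched briefly. This is an illustration under assumptions, not the production logic: headlines are plain dicts, age is measured from a hypothetical `added_at` timestamp, and "prioritise short categories" is approximated by always picking the pending headline whose category currently has the fewest live entries.

```python
from collections import Counter
from datetime import datetime, timedelta, timezone

MAX_AGE = timedelta(days=30)


def split_archive(live: list[dict], now: datetime) -> tuple[list[dict], list[dict]]:
    """Return (still live, to archive) based on the 30-day cutoff."""
    keep = [h for h in live if now - h["added_at"] <= MAX_AGE]
    archive = [h for h in live if now - h["added_at"] > MAX_AGE]
    return keep, archive


def pick_batch(approved: list[dict], live: list[dict], batch_size: int) -> list[dict]:
    """Greedily choose approvals from the least-stocked categories first."""
    counts = Counter(h["category"] for h in live)
    pending = list(approved)
    chosen = []
    while pending and len(chosen) < batch_size:
        # min() keeps queue order among ties, so earlier approvals win.
        best = min(pending, key=lambda h: counts[h["category"]])
        pending.remove(best)
        chosen.append(best)
        counts[best["category"]] += 1
    return chosen
```

With three Tech headlines live and no Sport, a batch of three drawn from two Tech and two Sport approvals would take both Sport headlines before either Tech one.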

What this means for the game

Two things follow from this setup. First, the content you play with is real. With the exception of the small AI-generated fallback pool for under-supplied categories, every headline came from a publication you could read directly. Second, the puzzles are genuinely puzzles: we’ve rejected the obvious ones before you see them. If a Stage 1 guess feels hard, that’s by design.

Over time the pipeline gets smarter. The LLM’s quality scores get calibrated against actual player performance (headlines where most players miss Stage 1 are good; headlines where most players nail it on speed alone are too easy). Distractors that nobody ever picks get flagged for regeneration. Categories with thin coverage get more aggressive scraping. The game gets better as more people play it — not because of network effects, but because the editorial signal sharpens.
