Build a “13 KB mindset” scraper: proxy basics for tiny, reliable data pulls

js13kGames trains you to ship a full game in 13 KB zipped. That hard cap forces clear tradeoffs, sharp tooling, and less waste. You can use the same mindset for web data pulls, even if your codebase grows past 13 KB.

A small scraper beats a big one when it runs each day, logs cleanly, and fails in ways you can fix. Teams often lose time on brittle runs, not on code size. Here, the goal stays simple: fewer moving parts, fewer bans, and data you can trust.

Start with a request budget, not a feature wish list

In js13k, you count bytes because 13 KB equals 13,312 bytes. In scraping, you count hits. Each hit costs time, risk, and money.

Pick a target per run and stick to it. For SEO rank checks, you may only need the top 20 results. For a price check, you may only need the SKU page and one stock call.

Cache what you can. Store raw HTML for a short time so you can re-parse without re-fetching. The saved page doubles as debug material, like a tiny replay file in a postmortem.
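A minimal sketch of such a short-lived cache, assuming an in-memory Map and an illustrative TTL (a real run might write to disk instead):

```javascript
// Tiny TTL cache for raw HTML, keyed by URL, so you can re-parse without re-fetching.
const htmlCache = new Map(); // url -> { html, savedAt }

function putHtml(url, html, now = Date.now()) {
  htmlCache.set(url, { html, savedAt: now });
}

function getHtml(url, ttlMs, now = Date.now()) {
  const hit = htmlCache.get(url);
  if (!hit || now - hit.savedAt > ttlMs) return null; // miss or expired
  return hit.html;
}
```

Passing `now` in makes expiry testable without waiting on a clock.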

Use the web’s own signals. Treat 200 as a win, 301 and 302 as a map update, 403 as a block, and 429 as a hard “slow down.”
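Those signals can be folded into one tiny, pure helper. A minimal sketch, with action names that are just illustrative labels, not a standard:

```javascript
// Map an HTTP status to the scraper's next move. Pure, no network.
function nextAction(status) {
  if (status === 200) return "parse";                        // win: hand the body to the parser
  if (status === 301 || status === 302) return "update-map"; // moved: fix your URL list
  if (status === 403) return "review-block";                 // likely a block: check proxy and headers
  if (status === 429) return "back-off";                     // hard "slow down": widen your delays
  if (status >= 500) return "retry-later";                   // server fault: retry with backoff
  return "log-and-skip";                                     // anything else: record it and move on
}
```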

Proxies help, but only if you treat them like a pool of assets

Game jams teach you to reuse sprites, tiles, and audio stabs. Proxies work the same way. You want a pool you can test, tag, and swap with low fuss.

Pick the proxy type that fits the job

Datacenter IPs run fast and cheap for low-friction sites. Many shops use them for broad crawl work where a miss does not hurt. Sites that guard search, price, or ticket pages may spot them fast.

Residential IPs blend in better because they come from real user nets. They cost more, so spend them on high-value pages. Mobile IPs can help with app-like flows, but they bring a higher bill and more churn.

Do not mix jobs in one pool. A pool that hits login walls should not also fetch your easy docs pages. Keep pools small and tagged so you can trace bad runs.
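As a sketch, pools can live in one tagged map, with placeholder proxy URLs (the tags, hosts, and round-robin choice here are all illustrative):

```javascript
// One small pool per job, tagged by purpose. The proxy URLs are placeholders.
const pools = {
  "price-pages": ["http://res-1.example:8080", "http://res-2.example:8080"], // residential
  "docs-crawl": ["http://dc-1.example:3128"],                                // datacenter
};

function pickProxy(tag, requestIndex) {
  const pool = pools[tag];
  if (!pool) throw new Error(`no pool tagged "${tag}"`); // never fall back across jobs
  return pool[requestIndex % pool.length]; // round-robin inside one tagged pool
}
```

Throwing on an unknown tag, rather than falling back to another pool, is what keeps bad runs traceable.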

Health-check your pool like you test a build

Every jammer runs a build step before submit. Do the same for proxies before a run. Test DNS, TLS, and basic reach to your key host.

A quick proxy checker can save hours of false leads when your parser “breaks” but the real fault sits in the network path.

Log three facts per request: exit IP, status code, and total time. Those three lines let you sort “site changed” from “we got blocked” in minutes.
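A minimal sketch of that log line, plus a crude triage helper built on it (the field names and the 50% threshold are illustrative choices):

```javascript
// One compact line per request: exit IP, status code, total time.
function logLine({ exitIp, status, ms }) {
  return `ip=${exitIp} status=${status} ms=${ms}`;
}

// Crude triage: a run dominated by 403/429 lines points at a block, not a site change.
function looksBlocked(lines) {
  const blocked = lines.filter(l => /status=(403|429)\b/.test(l)).length;
  return blocked / lines.length > 0.5;
}
```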

Rotate with care. Rotation helps most when you pair it with rate control and clean headers. Rotation alone often just spreads failure across more IPs.

Keep your scraper small by splitting fetch, parse, and store

Small games often split logic into tight loops and tiny helpers. Scrapers should do the same. If you glue fetch and parse into one block, each site tweak breaks the whole run.

Use a thin fetch layer that only does HTTP, retries, and timeouts. Put parse code in pure functions that take HTML and return JSON. Keep storage dumb and strict, so it rejects junk early.
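A sketch of such a thin fetch layer, assuming Node 18+ for the global `fetch` and `AbortSignal.timeout`; the retry count, timeout, and backoff numbers are illustrative defaults:

```javascript
// Retry only rate limits and server faults; blocks need a different fix.
function isRetryable(status) {
  return status === 429 || status >= 500;
}

// Thin fetch layer: HTTP, retries, and timeouts only. No parsing, no storage.
async function fetchHtml(url, { retries = 2, timeoutMs = 10_000 } = {}) {
  let lastError;
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url, { signal: AbortSignal.timeout(timeoutMs) });
      if (!isRetryable(res.status)) {
        return { status: res.status, html: await res.text() };
      }
      lastError = new Error(`HTTP ${res.status}`);
    } catch (err) {
      lastError = err; // timeout or network fault
    }
    // Back off with a little jitter before the next attempt
    await new Promise(r => setTimeout(r, 500 * (attempt + 1) + Math.random() * 250));
  }
  throw lastError;
}
```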

// Pure parse: no network, no dates, no globals
export function parsePrice(html) {
  const m = html.match(/"price":\s*"(\d+(\.\d+)?)"/);
  return m ? Number(m[1]) : null;
}

This style makes tests easy. You can stash a “known good” page and rerun parse on it after each tweak. That mirrors the js13k habit of keeping a tiny test scene to check perf and bugs.
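As a sketch, a fixture check for parsePrice might look like this (the function is repeated so the snippet runs on its own, and the made-up fixture string stands in for a saved page):

```javascript
// parsePrice, repeated from above so this snippet is self-contained.
function parsePrice(html) {
  const m = html.match(/"price":\s*"(\d+(\.\d+)?)"/);
  return m ? Number(m[1]) : null;
}

// A stashed "known good" fragment stands in for a page saved to disk.
const fixture = '<script type="application/ld+json">{"price": "19.99"}</script>';

const price = parsePrice(fixture);
if (price !== 19.99) throw new Error(`parse drifted: got ${price}`);
```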

Reliability comes from polite pacing and clear rules

Most scraping pain comes from speed, not volume. Even a small crawl can look like abuse if you burst hard. Use a steady pace, add jitter, and cap parallel work per host.
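A minimal sketch of both ideas, with illustrative names and defaults: jitter spreads requests out, and a small counting semaphore caps parallel work per host.

```javascript
// Steady pace with jitter: a base delay plus a random spread per request.
function jitteredDelayMs(baseMs, spreadMs) {
  return baseMs + Math.floor(Math.random() * spreadMs);
}

// Cap parallel work per host with a minimal counting semaphore.
function makeHostLimiter(maxParallel) {
  let active = 0;
  const waiting = [];
  const acquire = () => {
    if (active < maxParallel) { active++; return Promise.resolve(); }
    return new Promise(r => waiting.push(r));
  };
  const release = () => {
    const next = waiting.shift();
    if (next) next(); // hand the slot straight to a waiter; active stays the same
    else active--;
  };
  return async function run(task) {
    await acquire();
    try { return await task(); }
    finally { release(); }
  };
}
```

Usage would look like `const limit = makeHostLimiter(2); await limit(() => fetchHtml(url));` with one limiter per host.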

Follow site rules. Read robots.txt and respect it where it applies to your use. Review terms before you ship a job into prod, and get counsel when you need it.

Design for failure on purpose. Save partial results, and mark gaps with a reason code. That lets you re-run only what failed, like fixing one level bug without rebuilding the whole game.
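One way to sketch those reason codes (the labels are an illustrative convention, not a standard):

```javascript
// Save partial results and mark gaps with a reason, so a re-run can target only failures.
function recordResult(url, status, data) {
  if (status === 200 && data !== null) return { url, ok: true, data };
  const reason =
    status === 403 ? "blocked" :
    status === 429 ? "rate-limited" :
    status !== 200 ? `http-${status}` :
    "parse-miss"; // 200 but the parser found nothing
  return { url, ok: false, reason };
}
```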

Write a short run report. Include counts of 200, 3xx, 4xx, and 5xx, plus median time. Those numbers act like a jam score sheet for your pipeline, and they help you spot drift fast.
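A sketch of such a report over a list of { status, ms } entries:

```javascript
// A short run report: counts per status class plus median total time.
function runReport(entries) {
  const counts = { "2xx": 0, "3xx": 0, "4xx": 0, "5xx": 0 };
  if (entries.length === 0) return { ...counts, medianMs: null };
  for (const { status } of entries) {
    const bucket = `${Math.floor(status / 100)}xx`;
    if (bucket in counts) counts[bucket]++;
  }
  const times = entries.map(e => e.ms).sort((a, b) => a - b);
  const mid = Math.floor(times.length / 2);
  const medianMs = times.length % 2 === 1
    ? times[mid]
    : (times[mid - 1] + times[mid]) / 2;
  return { ...counts, medianMs };
}
```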

Use the jam mindset to keep ops costs and risk low

js13k posts often praise smart tradeoffs, like in “Tiny Games, Big Reach” and “Why That’s a Win.” Scraping work rewards the same thinking. You do not win by crawling more, but by crawling right.

Build the smallest system that answers one real question. Then harden it with logs, tests, and a proxy pool you can trust. When the target site shifts, you will patch in hours, not days.
