What I Run Before I Sleep

Every night before I close my laptop, I run one command. It checks 15,000+ pages across my production site: every blog post, every company page, every programmatic guide, every locale variant. For each page, it verifies HTTP status, measures TTFB, checks Cloudflare cache status, and logs any errors. The comprehensive check takes about 6 hours to complete. It runs in the background while I sleep.

I also run a quick critical path check that finishes in 2 minutes: health endpoints, sitemaps, revenue pages, S-tier company hubs, market pages, programmatic content indexes, i18n blog archives. This one I watch. If anything fails, I investigate before I sleep.

The routine produces a report that looks like this:

NIGHTCHECK -- 2026-03-27 (morning)
Overnight comprehensive: 16,228/16,602 ok (97.7%)
Avg TTFB: 715ms
S-tier companies: ALL PASS, ALL HIT
Markets: ALL PASS (89-175ms HIT)
i18n blogs: es 95ms HIT | ja 196ms HIT | de 181ms HIT

Nobody asks me to do this. No ticket requires it. No sprint includes “run nightcheck.” The routine exists because I care about whether the site works when I am not looking at it.

Why Nightly

The site changes every day. In a single day, I might deploy 87 commits: i18n translations, crawl provider updates, CRO experiments, performance fixes, logo remediations. Each commit is tested individually. But the composition of 87 commits can produce failures that no individual commit reveals.

A translation batch might push the blog sitemap past a rendering threshold. A new provider might introduce a company page that takes 14 seconds to render. A cache header change might make Cloudflare stop caching a route that was previously fast. A CSS refactor might break a template in one locale but not others.

The nightcheck catches composition failures. Not “did this commit break something” but “does the site work after everything today landed.” The distinction matters because composition failures are invisible in CI. Each commit passes its own tests. The aggregate behavior of 87 commits passing their own tests does not guarantee the aggregate system works.

What Gets Measured

The check has four tiers:

P0 Infrastructure. Health endpoint returns healthy with database connected. Sitemaps return valid XML. Robots.txt and llms.txt are present. RSS feed is valid. These are the things that, if broken, make the site invisible to search engines.

P0 Revenue. Homepage, resume builder, analyzer, and pricing page all load. These are the pages that, if broken, directly cost money. The resume builder is historically the slowest page and gets a WARN threshold at 5 seconds.

P1 SEO Surface. Blog archive, pillar hub pages, company directory, job browse, all five programmatic content indexes (resume guides, salary guides, cover letter guides, interview questions, ATS keywords), four i18n blog samples, and all 20 S-tier company hubs. These are the pages that drive organic traffic. A 404 on the Google company page is an SEO incident.

P2 Comprehensive. Every URL in the blog and company sitemaps. This is the 6-hour background check. It catches the long-tail failures: a single blog post that 500-errors because of a malformed citation, a company page that times out because of an N+1 query on a large company.
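The P2 sweep starts from the sitemaps themselves. A minimal sketch of pulling the URL list out of a sitemap document (the sample XML and function name are illustrative; the namespace is the standard sitemaps.org one):

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"

def sitemap_urls(xml_text: str) -> list[str]:
    """Extract every <loc> entry from a sitemap document."""
    root = ET.fromstring(xml_text)
    return [loc.text for loc in root.iter(f"{SITEMAP_NS}loc")]

# Hypothetical two-entry sitemap for illustration.
sample = """<?xml version="1.0" encoding="UTF-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/blog/post-1</loc></url>
  <url><loc>https://example.com/companies/google</loc></url>
</urlset>"""
print(sitemap_urls(sample))
```

The comprehensive check is just this list fed, one URL at a time, into the same probe the quick check uses.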

Each page is checked with curl using a realistic User-Agent header. Bare curl gets 403’d by Cloudflare. The cache status header is captured alongside the HTTP status and TTFB. A page can return 200 but show DYNAMIC when it should be HIT, which means the cache rule is misconfigured. The TTFB measurement is the real server-side latency, not the browser rendering time.
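The probe can be sketched as a thin wrapper around curl. The User-Agent string, function names, and record shape here are hypothetical; only curl's own flags are real (`-w` with `%{http_code}` and `%{time_starttransfer}` for metrics, `-D -` for a header dump, `-o /dev/null` to discard the body).

```python
import subprocess
from dataclasses import dataclass

# Hypothetical realistic User-Agent; bare curl defaults get 403'd by Cloudflare.
UA = "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36"

@dataclass
class Probe:
    status: int      # HTTP status code
    ttfb_ms: float   # time to first byte, in milliseconds
    cache: str       # Cloudflare cache status: HIT, MISS, DYNAMIC, ...

def parse_headers(raw: str) -> str:
    """Pull cf-cache-status out of a raw header dump (NONE if absent)."""
    for line in raw.splitlines():
        if line.lower().startswith("cf-cache-status:"):
            return line.split(":", 1)[1].strip().upper()
    return "NONE"

def probe(url: str, timeout: int = 30) -> Probe:
    """Fetch one page with curl; keep status, TTFB, and cache status."""
    out = subprocess.run(
        ["curl", "-s", "-o", "/dev/null",
         "-D", "-",                                    # dump headers to stdout
         "-H", f"User-Agent: {UA}",
         "-m", str(timeout),
         "-w", "\n%{http_code} %{time_starttransfer}",  # metrics trailer
         url],
        capture_output=True, text=True, check=True,
    ).stdout
    headers, trailer = out.rsplit("\n", 1)
    status, ttfb_s = trailer.split()
    return Probe(int(status), float(ttfb_s) * 1000, parse_headers(headers))
```

`%{time_starttransfer}` is curl's server-side TTFB, which is why the measurement is unaffected by browser rendering time.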

The TTFB Trend

I have been running this check since March 2026. The TTFB trend tells the performance story:

Date                 Avg TTFB  Worst Page               Cache Status
Mar 21 (post-purge)  1,156ms   Austin market 14,290ms   ALL DYNAMIC
Mar 23 (cache live)    958ms   markets 2-3s             Most HIT
Mar 25 (query fix)     715ms   ats-optimization 6.3s    All HIT
Mar 27 (stable)        715ms   zh-hans/blog 3.7s        34/36 HIT

The trend captures the market page performance journey: cache purge exposed the problem (14.3s), edge caching masked it (89-175ms HIT), the query shape fix resolved the underlying cause (108ms origin). Without the nightly trend, I would have believed the edge cache was the fix. The TTFB measurements proved that the origin render was still slow during revalidation, which justified the query refactor.

The nightcheck did not fix the performance problem. It made the performance problem measurable, which made it fixable.

What Nobody Sees

The most valuable thing about the nightcheck is that it runs when nobody is watching. There is no Slack notification for “16,228 pages passed overnight.” There is no dashboard that turns green. The report exists in a log file that I read over coffee the next morning.

The absence of ceremony is the point. The nightcheck is not performative quality. It is not a metric for a standup. It is a private discipline: did everything work while I was asleep? If yes, good. If no, I know exactly what failed and when.

The discipline compounds. Each night’s check establishes a baseline for tomorrow’s comparison. A TTFB increase of 50ms on a specific page triggers investigation, not because 50ms matters to users, but because the increase indicates something changed. Maybe a new dependency added latency. Maybe a cache rule expired. Maybe a database query grew with the dataset. The nightly baseline makes drift visible before it becomes a problem.
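That baseline comparison is a small diff. A sketch, where the 50ms threshold comes from the text and the page names are made up:

```python
def drift(baseline: dict[str, float], tonight: dict[str, float],
          threshold_ms: float = 50.0) -> list[tuple[str, float]]:
    """Pages whose TTFB grew by more than threshold_ms since the baseline."""
    flagged = []
    for page, ttfb in tonight.items():
        prev = baseline.get(page)
        if prev is not None and ttfb - prev > threshold_ms:
            flagged.append((page, ttfb - prev))
    return sorted(flagged, key=lambda p: -p[1])  # biggest regressions first

# Illustrative TTFB numbers in milliseconds.
baseline = {"/pricing": 120.0, "/blog": 95.0, "/companies/google": 140.0}
tonight  = {"/pricing": 118.0, "/blog": 210.0, "/companies/google": 165.0}
print(drift(baseline, tonight))  # → [('/blog', 115.0)]
```

Only /blog crosses the threshold; /companies/google grew 25ms, which is noise, not drift.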

This is compound context applied to operations. Each night’s check deposits a data point. The data points accumulate into a trend. The trend makes problems visible that no single check would reveal.

The Routine Is the Standard

I could automate the nightcheck into a cron job and check a dashboard. I choose to run it manually because the act of running it maintains the habit of caring. The moment the check becomes someone else’s job, or a notification I dismiss, or a dashboard I stop checking, the standard erodes.

The standard is not “the site passes automated checks.” The standard is “I personally verified that the site works before I went to sleep.” The difference is accountability. An automated check that fails silently is a bug in the automation. A manual check that I skip is a choice I made about what I care about.

I run it every night because the alternative is trusting that everything still works. Trust without verification is hope. Hope is not an operations strategy.


FAQ

How long does the full check take?

The quick critical path check takes 2-3 minutes (36 pages with TTFB and cache measurement). The comprehensive sitemap check takes 5-7 hours (15,000+ pages). The quick check runs synchronously; the comprehensive check runs in the background.

Why not use an uptime monitoring service?

Uptime services check whether a page returns 200. The nightcheck checks whether a page returns 200 with the correct cache status, acceptable TTFB, and expected content structure. A page that returns 200 but takes 14 seconds with DYNAMIC cache is technically up and operationally broken.
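A sketch of that distinction as a verdict function. The 5-second WARN threshold matches the one described for the resume builder; the function name and signature are illustrative:

```python
def verdict(status: int, ttfb_ms: float, cache: str,
            expect_cached: bool = True, warn_ms: float = 5000.0) -> str:
    """Classify a probe result: a 200 can still be operationally broken."""
    if status != 200:
        return "FAIL"                    # hard failure: wrong status
    if expect_cached and cache == "DYNAMIC":
        return "FAIL"                    # cache rule misconfigured
    if ttfb_ms > warn_ms:
        return "WARN"                    # up, but unacceptably slow
    return "PASS"

print(verdict(200, 89.0, "HIT"))         # → PASS
print(verdict(200, 14290.0, "DYNAMIC"))  # → FAIL: up, operationally broken
```

An uptime service would report both of those pages as healthy.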

What happens when something fails?

I investigate before sleeping if the failure is on a revenue or infrastructure page. For comprehensive failures, I review the log the next morning and prioritize by page type. A failing blog post is lower priority than a failing company hub. A DYNAMIC cache on a high-traffic page is higher priority than a slow TTFB on a low-traffic page.

Does this scale?

15,000 pages is the current scale. The comprehensive check is I/O-bound (sequential curl requests), not compute-bound. Doubling the page count doubles the runtime. At 30,000 pages, the check would take 12 hours, which still fits in an overnight window. Beyond that, parallel checking with rate limiting would be necessary.
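The scaling arithmetic, assuming a sequential rate of roughly 1.44 seconds per page (15,000 pages in about 6 hours):

```python
def runtime_hours(pages: int, secs_per_page: float = 1.44) -> float:
    """Sequential check runtime: linear in page count (I/O-bound)."""
    return pages * secs_per_page / 3600

print(round(runtime_hours(15_000), 1))  # → 6.0
print(round(runtime_hours(30_000), 1))  # → 12.0, still an overnight window
```

Past roughly 30,000 pages the linear model stops fitting the night, and the per-page rate has to come down via parallelism instead.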
