Blog | January 27, 2026 | Tags: vercel, dns, https, ssl, cdn, hsts, devops

Debugging a Vercel HTTPS Incident: When CDN Edge Nodes Serve Expired Certificates

A debugging case where a Vercel site became inaccessible because some CDN edge nodes were serving an expired certificate, even though DNS and the certificate looked correct in the dashboard.

Background

I host my personal blog on Vercel behind a custom domain: blogkangwei.com and its www.blogkangwei.com subdomain.

One day, Chrome suddenly started showing:

Your connection is not private, NET::ERR_CERT_DATE_INVALID

Because the site had HSTS enabled, the browser did not allow bypassing the warning — the site became completely inaccessible.

At first glance, this looked like a simple DNS or certificate misconfiguration.


What Actually Happened

What I saw: I checked Chrome plus a couple of online SSL checkers, and the results for the apex and www didn’t line up. The apex, blogkangwei.com, resolved to 216.198.79.1 and presented a valid certificate expiring in roughly 45 days. The subdomain, www.blogkangwei.com, resolved to 66.33.60.193 and presented a certificate that had expired about 77 days earlier.

That’s how I ended up with the weird situation where the apex domain was fine, but the www subdomain was getting an expired certificate from somewhere on the edge.
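The same comparison can be reproduced without online checkers by connecting to each IP directly while still sending the hostname via SNI. Here is a minimal Python sketch using only the standard library. The helper names `probe` and `days_until_expiry` are made up for illustration, and the sketch only handles certificate verification failures (timeouts and connection errors are left to propagate).

```python
import socket
import ssl
from datetime import datetime, timezone


def days_until_expiry(not_after: str, now: datetime) -> float:
    """Days between `now` and a certificate's notAfter string (negative if expired)."""
    # notAfter looks like "Jan  5 09:34:43 2018 GMT"
    expires = ssl.cert_time_to_seconds(not_after)
    return (expires - now.timestamp()) / 86400


def probe(hostname: str, ip: str, port: int = 443, timeout: float = 5.0) -> str:
    """Do a verified TLS handshake against one specific IP, using `hostname` for SNI."""
    ctx = ssl.create_default_context()  # verifies the chain, the hostname, and the dates
    try:
        with socket.create_connection((ip, port), timeout=timeout) as raw:
            with ctx.wrap_socket(raw, server_hostname=hostname) as tls:
                cert = tls.getpeercert()
    except ssl.SSLCertVerificationError as exc:
        # An expired certificate surfaces here, e.g. "certificate has expired"
        return f"{hostname} via {ip}: verification failed ({exc.verify_message})"
    days = days_until_expiry(cert["notAfter"], datetime.now(timezone.utc))
    return f"{hostname} via {ip}: valid, expires in {days:.0f} days"


# Example calls (these open real network connections):
# print(probe("blogkangwei.com", "216.198.79.1"))
# print(probe("www.blogkangwei.com", "66.33.60.193"))
```

Connecting by IP matters here: a check that starts from the hostname only tells you about whichever IP your own resolver happened to return.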

Why it was confusing: Vercel’s dashboard looked totally healthy. Both domains showed as configured correctly, my DNS matched what Vercel recommended, I had Let’s Encrypt CAA records in place, and the certificate for the domain had already been reissued. Yet some users (including me) were still getting the expired cert.

What it turned out to be: Vercel uses an Anycast CDN with many edge nodes worldwide. Some nodes were still serving an old, expired certificate for www.blogkangwei.com, while other nodes were already serving the new, valid one. Which version you saw depended on your resolver, network path, and geographic routing.

The SSL checker evidence made that pretty clear: blogkangwei.com consistently showed a valid certificate, while www.blogkangwei.com showed an expired one — and crucially it was tied to a different IP/edge route. So the issue wasn’t DNS, CAA, or issuance; it was stale CDN edge state. And with HSTS enabled, “a subset of users see a cert error” effectively becomes “the site is down” for those users.
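To make the HSTS part concrete: the policy arrives as a `Strict-Transport-Security` response header, and once a browser has stored it, RFC 6797 requires it to abort on any certificate error for that host without offering a click-through. The sketch below parses such a header. The header value in the example is illustrative, not necessarily what this site sends, and `parse_hsts` is a hypothetical helper.

```python
def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value into its directives."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for part in header_value.split(";"):
        directive = part.strip()
        if not directive:
            continue
        name, _, value = directive.partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            # Seconds the browser should remember the policy; may be quoted
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


# Illustrative value only:
# parse_hsts("max-age=63072000; includeSubDomains; preload")
```

If the apex sends `includeSubDomains`, the policy also covers www, so an expired certificate on www hard-fails for anyone who has visited the apex within `max-age` seconds.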


The Fix

Instead of trying to “repair” the broken edge node, I changed how traffic flowed through the system: I made www.blogkangwei.com the primary domain and configured blogkangwei.com to answer with a 308 (permanent) redirect to www.blogkangwei.com.

Then I updated DNS so www.blogkangwei.com used the current Vercel-recommended CNAME target. The practical effect was that requests stopped getting routed to the problematic historical edge IP (66.33.60.193).

Result:

  • Traffic is now routed only to healthy nodes with fresh certificates.
  • The site immediately became accessible again.
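A quick way to confirm the redirect half of the fix is to request the apex without following redirects and inspect the status and `Location` header. This is a rough sketch, not something from my actual setup: the helper names are hypothetical, and it assumes the redirect uses an absolute https URL.

```python
import http.client
from typing import Optional, Tuple
from urllib.parse import urlsplit


def is_permanent_redirect_to(status: int, location: Optional[str], target_host: str) -> bool:
    """True if the response is a 308 pointing at https://<target_host>/..."""
    if status != 308 or not location:
        return False
    parts = urlsplit(location)
    # A relative Location has no scheme or hostname and is rejected here
    return parts.scheme == "https" and parts.hostname == target_host


def fetch_redirect(host: str, path: str = "/", timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send one HEAD request; http.client never follows redirects on its own."""
    conn = http.client.HTTPSConnection(host, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


# Example (opens a real connection):
# status, location = fetch_redirect("blogkangwei.com")
# print(is_permanent_redirect_to(status, location, "www.blogkangwei.com"))
```

Note that the request to the apex still needs a valid certificate there, which is why this arrangement worked: the apex was the hostname that had been healthy all along.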


Takeaways

Switching the primary domain did not fix the broken edge node.
It bypassed it.

In large CDNs, certificate rollouts aren’t always a single atomic “flip.” They’re asynchronous, incremental, and eventually consistent, which means a small number of nodes can miss an update or get stuck serving stale state longer than you’d expect. It’s rare, but it’s not impossible.

The big lessons for me were:

  • HSTS is unforgiving: a certificate issue isn’t a warning, it’s an outage.
  • “Dashboard says OK” doesn’t mean every edge node is OK.
  • DNS can be correct while TLS is broken at the edge.
  • Sometimes the right fix is routing around the failure instead of trying to “fix” a stuck node.

If I wanted to mitigate this more proactively next time, I’d consider delegating DNS to Vercel (so routing + TLS are fully under their control) and periodically testing TLS from multiple regions.

If this happens again, here’s the quick sanity checklist:

  • Check the apex and www separately (they can behave differently).
  • Compare the resolved IPs you’re getting from different networks/resolvers.
  • Inspect the certificate chain per IP/route, not just per hostname.
  • Don’t assume “CDN” means a single globally consistent state.
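For the "compare resolved IPs" step, one option is to ask a couple of public resolvers over DNS-over-HTTPS and compare their answers. The sketch below uses the JSON endpoints that Google and Cloudflare publish; the helper names are hypothetical, and error handling is deliberately left out.

```python
import json
import urllib.request
from typing import Dict, List
from urllib.parse import urlencode

# Public DNS-over-HTTPS endpoints that return JSON
RESOLVERS = {
    "google": "https://dns.google/resolve",
    "cloudflare": "https://cloudflare-dns.com/dns-query",
}


def extract_a_records(response: dict) -> List[str]:
    """Pull the IPv4 addresses (record type 1) out of a DoH JSON response."""
    return sorted(a["data"] for a in response.get("Answer", []) if a.get("type") == 1)


def resolve_via(resolver_url: str, hostname: str, timeout: float = 5.0) -> List[str]:
    """Resolve `hostname` through one DoH resolver (performs a network request)."""
    query = urlencode({"name": hostname, "type": "A"})
    req = urllib.request.Request(
        f"{resolver_url}?{query}", headers={"Accept": "application/dns-json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return extract_a_records(json.load(resp))


def disagreeing_resolvers(results: Dict[str, List[str]]) -> bool:
    """True if the resolvers did not all return the same set of addresses."""
    return len({tuple(ips) for ips in results.values()}) > 1


# Example (performs real network requests):
# results = {name: resolve_via(url, "www.blogkangwei.com") for name, url in RESOLVERS.items()}
# print(results, disagreeing_resolvers(results))
```

Differing answers are not a problem by themselves on a CDN, and with Anycast the same IP can still land on different nodes, so this only covers the DNS side of the checklist. It is most useful for collecting the set of IPs whose certificates you then inspect one by one.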

Timeline (in one sentence): an old certificate expired on some edge nodes, a new one was issued and deployed but not everywhere, some users kept landing on stale nodes (hard-failing due to HSTS), and switching the primary domain path changed routing and restored access immediately.