Engineering Diary, Day 14: The Sitemap That Lied to Google, a Security Audit, and 3,680 URLs Finally Visible
The 87% Problem
It started with Google Search Console screenshots from the CEO. The numbers were brutal:
| Metric | Value |
|---|---|
| Total clicks | 13 |
| Indexed pages | 267 |
| Not indexed | 2,028 |
| Invalid Merchant Listings | 23 |
| Core Web Vitals | No data |
87% of our pages were invisible to Google. We had 3,400+ URLs in our sitemap, but Google was only indexing 267 of them. And 23 pages had invalid Merchant Listing structured data. Today was about fixing all of it — plus a long-overdue security audit.
Phase 1: Security Hardening
Before touching SEO, we addressed four security issues found during an infrastructure audit:
Distributed Rate Limiting
Our rate limiter was in-memory — a Map<string, number[]>. On Vercel's serverless architecture, every function invocation gets a fresh memory space. An attacker could bypass rate limits simply by hitting different instances. We rewrote it to use Vercel KV (Upstash Redis) via raw REST API — zero new npm dependencies:
| Aspect | Before | After |
|---|---|---|
| Storage | In-memory Map | Vercel KV (Redis) + in-memory fallback |
| API | Synchronous | Async (await) |
| Consistency | Per-instance only | Global across all instances |
| Dependencies | None | None (raw fetch to Upstash REST) |
| API routes updated | — | 9 files (all check() calls → await) |
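The rewrite can be sketched roughly like this: a fixed-window counter incremented through Upstash's REST pipeline, falling back to the old per-instance Map when KV is not configured. The env var names, limits, and key format here are illustrative, not our exact implementation:

```typescript
// Fixed-window rate limiter over the Upstash REST API, with an
// in-memory fallback. KV_REST_API_URL/KV_REST_API_TOKEN are the env
// vars Vercel KV provisions; limits and key format are illustrative.
const memory = new Map<string, number[]>();

async function kvIncr(key: string, windowSec: number): Promise<number | null> {
  const url = process.env.KV_REST_API_URL;
  const token = process.env.KV_REST_API_TOKEN;
  if (!url || !token) return null; // no KV configured: use fallback
  // Upstash REST pipeline: INCR the counter, set TTL only on first hit.
  const res = await fetch(`${url}/pipeline`, {
    method: "POST",
    headers: { Authorization: `Bearer ${token}` },
    body: JSON.stringify([["INCR", key], ["EXPIRE", key, windowSec, "NX"]]),
  });
  if (!res.ok) return null;
  const [{ result }] = await res.json();
  return result as number;
}

export async function check(ip: string, limit = 30, windowSec = 60): Promise<boolean> {
  // Bucketed key: all instances increment the same Redis counter.
  const key = `ratelimit:${ip}:${Math.floor(Date.now() / 1000 / windowSec)}`;
  const count = await kvIncr(key, windowSec);
  if (count !== null) return count <= limit; // globally consistent path
  // Fallback: per-instance sliding window (the old behavior).
  const now = Date.now();
  const hits = (memory.get(ip) ?? []).filter((t) => now - t < windowSec * 1000);
  hits.push(now);
  memory.set(ip, hits);
  return hits.length <= limit;
}
```

The `null` return doubles as "KV unavailable," which is what makes the in-memory fallback a one-line decision in `check()`.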
CSP Hardening
Removed unsafe-eval from Content Security Policy. Added explicit whitelisting for Google Analytics 4 domains: googletagmanager.com and google-analytics.com. This blocks XSS attacks that rely on eval() while keeping analytics functional.
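For context, the shape of the tightened policy looks something like the sketch below. The exact directive list in our config is longer; this slice only shows the `unsafe-eval` removal and the GA4 whitelisting:

```typescript
// Illustrative slice of the CSP built in our Next.js config. Note:
// no 'unsafe-eval' in script-src; GA4 domains are whitelisted
// explicitly. Directive values here are a sketch, not the full policy.
const csp = [
  "default-src 'self'",
  "script-src 'self' https://www.googletagmanager.com",
  "connect-src 'self' https://www.google-analytics.com https://www.googletagmanager.com",
  "img-src 'self' data: https://www.google-analytics.com",
].join("; ");

export const securityHeaders = [
  { key: "Content-Security-Policy", value: csp },
];
```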
JWT Refresh Secret
The refresh token was signed using the access token secret with a simple string concatenation (secret + "_refresh"). If the access secret leaked, refresh tokens were compromised too. Now uses an independent JWT_REFRESH_SECRET environment variable with a stronger fallback derivation.
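A minimal sketch of the new lookup order, assuming the independent secret wins and HMAC-SHA256 stands in for "stronger fallback derivation" (the real derivation in our codebase may differ):

```typescript
import { createHmac } from "node:crypto";

// Prefer an independent JWT_REFRESH_SECRET. If absent, derive a
// fallback via HMAC-SHA256 rather than naive string concatenation,
// so the refresh secret is not readable off the access secret.
// Setting the independent env var remains the real fix.
export function refreshSecret(): string {
  const independent = process.env.JWT_REFRESH_SECRET;
  if (independent) return independent;
  const access = process.env.JWT_SECRET ?? "";
  return createHmac("sha256", access).update("refresh-token-v1").digest("hex");
}
```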
Password Policy
Added OWASP-compliant special character requirement to password validation. The regex: /[!@#$%^&*()_+\-=[\]{};':"\\|,.<>/?`~]/. Five validation rules now enforced: length, uppercase, lowercase, digit, special character.
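The five rules, written as independent checks so the UI can report exactly which one failed (the minimum length of 8 is an assumption; the post does not state it):

```typescript
// Special-character class taken verbatim from the post.
const SPECIAL = /[!@#$%^&*()_+\-=[\]{};':"\\|,.<>/?`~]/;

// Returns the list of failed rules; an empty array means valid.
export function validatePassword(pw: string): string[] {
  const failures: string[] = [];
  if (pw.length < 8) failures.push("length");
  if (!/[A-Z]/.test(pw)) failures.push("uppercase");
  if (!/[a-z]/.test(pw)) failures.push("lowercase");
  if (!/[0-9]/.test(pw)) failures.push("digit");
  if (!SPECIAL.test(pw)) failures.push("special");
  return failures;
}
```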
Phase 2: SEO Schema Surgery
The 23 invalid Merchant Listings had a clear root cause: we were applying shipping and return policies to a service business. Private jet charter is a service, not a physical product you ship in a box. Google's Merchant Listing validator was rejecting every page with hasMerchantReturnPolicy and shippingDetails.
The fix was surgical — remove all shipping/return schema from 3 files:
- `JsonLd.tsx` — deleted the `MERCHANT_RETURN_POLICY` and `SHIPPING_DETAILS` constants entirely
- `fleet/[slug]/page.tsx` — removed inline shipping/return from the Product schema
- `CatalogDetailPage.tsx` — same treatment
Additional schema fixes in the same commit:
- Added `datePublished` to `SAMPLE_REVIEW` (required by Google)
- Removed invalid nested `offers[]` inside `AggregateOffer` on route price pages
- Removed invalid `unitCode: "C62"` from the Passengers property
- Added `x-default` hreflang to all 3,680 URLs via `buildAlternates()`
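A sketch of what a `buildAlternates()`-style helper produces for Next.js metadata. The locale list, base URL, and exact signature are assumptions for illustration:

```typescript
// Hypothetical shape of buildAlternates(): per-locale hreflang URLs
// plus an x-default entry. Locales and base URL are placeholders.
const BASE = "https://example.com";
const LOCALES = ["en", "ru"];

export function buildAlternates(path: string) {
  const languages: Record<string, string> = {};
  for (const locale of LOCALES) {
    languages[locale] = `${BASE}/${locale}${path}`;
  }
  // x-default tells Google which URL to serve when no locale matches.
  languages["x-default"] = `${BASE}${path}`;
  return { canonical: `${BASE}${path}`, languages };
}
```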
Phase 3: The Sitemap Crisis
This is where it got interesting. Our deep audit showed that technically everything was correct — all 13 dynamic routes had generateStaticParams, the sitemap was comprehensive, robots.txt was fine, no noindex tags. So why was Google ignoring 87% of our pages?
Three root causes:
1. lastModified Was Lying
Every single one of our 3,340 sitemap entries used new Date() for lastModified. Every time Google crawled the sitemap, we told it: "All 3,340 pages were modified right now." Google's crawler has limited budget. When everything claims to be fresh, nothing gets priority. We replaced new Date() with a stable deploy date (2026-03-03), and used actual post.date for blog articles and report.publishDate for insight reports.
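The fix reduces to one decision per entry: dated content gets its own timestamp, everything else gets the stable deploy date. Field names like `contentDate` are illustrative; `DEPLOY_DATE` matches the value mentioned above:

```typescript
// Stable deploy date for pages without their own timestamp.
const DEPLOY_DATE = new Date("2026-03-03");

type SitemapEntry = { url: string; lastModified: Date };

export function sitemapEntry(url: string, contentDate?: string): SitemapEntry {
  return {
    url,
    // Never `new Date()` here: claiming every page is fresh on every
    // crawl wastes crawl budget and deprioritizes everything equally.
    lastModified: contentDate ? new Date(contentDate) : DEPLOY_DATE,
  };
}
```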
2. The generateSitemaps() Bug
We tried to split the sitemap into 7 categorized chunks using Next.js 16's generateSitemaps() API. The idea was sound — let Google process smaller, focused sitemaps for fleet (908 URLs), airports+FBOs (1,024 URLs), etc. instead of one massive file.
The build succeeded. The build output showed 7 sitemap files. The tests passed. But every single .xml.body file was exactly 110 bytes — an empty <urlset/> with zero <url> entries.
The issue? Next.js 16.1.6 calls the sitemap function during SSG pre-rendering, but generateSitemaps() produced empty XML for all chunks. Our unit tests passed because they called sitemap({ id: 0 }) directly with numeric ids. The build's SSG pipeline did... something else. We spent two commits debugging this (including a Number(id) coercion attempt) before accepting the pragmatic fix: revert to a single-file sitemap.
3,680 URLs in one file (2.2MB) is well within Google's 50,000-URL limit. The split was an optimization, not a requirement.
3. IndexNow Was Whispering
Our IndexNow endpoint was only submitting 14 core pages. IndexNow supports up to 10,000 URLs per request. We expanded it to auto-collect all content URLs across 11 categories (fleet, routes, destinations, airports, FBOs, operators, empty-legs, blog, case-studies, insights, static pages) with batch processing.
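The batching logic can be sketched as below. The `host`/`key`/`keyLocation`/`urlList` fields follow the IndexNow protocol; the chunking helper and the collected-URL source are our own (illustrative) code:

```typescript
// Split a URL list into chunks under IndexNow's 10,000-URL limit.
export function toBatches<T>(items: T[], size = 10_000): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Submit every batch to the IndexNow endpoint.
export async function submitIndexNow(urls: string[], host: string, key: string) {
  for (const urlList of toBatches(urls)) {
    await fetch("https://api.indexnow.org/indexnow", {
      method: "POST",
      headers: { "Content-Type": "application/json; charset=utf-8" },
      body: JSON.stringify({
        host,
        key,
        keyLocation: `https://${host}/${key}.txt`,
        urlList,
      }),
    });
  }
}
```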
Phase 4: Internal Cross-Linking
Google discovers pages through links, not just sitemaps. Our audit found that aircraft detail pages had an "Ideal Routes" section showing city pairs like New York (TEB) → Miami (OPF) — but they were pure text. No links. 200+ aircraft pages × 4 routes each = 800+ missed internal links.
We converted IdealRoutes from display-only <div> elements to clickable <Link> cards pointing to /contact?from=CityA&to=CityB&aircraft=Name, plus a "Browse All Routes" CTA linking to /routes. This creates 800+ new internal links distributing PageRank from aircraft pages to the routes directory.
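The href construction behind those cards is the only subtle part, since city names contain spaces. A sketch (the query parameter names match the post; the helper itself is hypothetical):

```typescript
// Build the /contact href for an IdealRoutes card.
// URLSearchParams handles encoding of spaces and special characters.
export function routeContactHref(from: string, to: string, aircraft: string): string {
  const params = new URLSearchParams({ from, to, aircraft });
  return `/contact?${params.toString()}`;
}
```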
The Numbers
| Metric | Before | After |
|---|---|---|
| Sitemap URLs | 3,340 (empty on deploy) | 3,680 (valid XML verified) |
| Invalid Merchant Listings | 23 | 0 |
| hreflang x-default | Missing | All 3,680 URLs |
| lastModified accuracy | All "today" | Stable deploy date + real dates |
| IndexNow coverage | 14 pages | ~900 content URLs |
| Internal cross-links added | 0 | 800+ |
| Rate limiter | In-memory (broken on serverless) | Vercel KV + fallback |
| CSP unsafe-eval | Present | Removed |
| API routes updated | — | 9 |
| Commits | — | 5 |
| Files changed | — | 19 |
Commits
5cda031 — Security: distributed rate limiting, CSP hardening, JWT refresh secret, password policy.
26bad98 — SEO: fix Merchant Listing errors, add hreflang x-default, clean structured data.
77015ea — SEO: split sitemap index, fix lastModified, expand IndexNow, add cross-linking.
0836387 → 7c4408d — Fix: revert generateSitemaps() after discovering Next.js 16 empty XML bug.
The scariest bugs aren't the ones that throw errors — they're the ones that succeed silently. Our sitemap built successfully, passed all tests, deployed without warnings, and served valid XML to every crawler that asked. The XML just happened to contain zero URLs. Google dutifully read our empty sitemap and indexed exactly what we told it to: nothing. It took two hours of debugging a green build to discover that Next.js 16's generateSitemaps() produces empty urlsets during SSG. The lesson: always curl your own production URLs.