Skip to content

Migrate to archive.sparkpost.com with host-conditional noindex #855

Merged
PauloJeunonSousa merged 4 commits into main from archive-sparkpost-noindex on Jun 26, 2026

PauloJeunonSousa (Contributor) commented Jun 25, 2026

Summary

Migrate the docs site to archive.sparkpost.com so the existing support.sparkpost.com hostname is free to be repointed at the new CloudFront redirect distribution (messagebird-dev/bird#2382, merged) that 301s every legacy URL to its closest bird.com counterpart. End goal: consolidate SEO link equity onto bird.com instead of fragmenting between two sites.

This PR is half of a two-part change; the other half is the CloudFront distribution (bird#2382). The ordering matters for SEO: landing this PR alone (or far ahead of CloudFront) would deindex support.sparkpost.com while it still receives all its inbound Google traffic, losing link equity that would otherwise transfer to bird.com. The edge function added here exists specifically to bridge that gap, so the two pieces don't have to land simultaneously.

Changes

1. Base URL change + noindex (for archive.sparkpost.com)

The new archive hostname needs to be configured everywhere and emit clear noindex signals so search engines drop it (and stop competing with bird.com).

  • next-sitemap.js: default siteUrl → https://archive.sparkpost.com; robots.txt generation adds Disallow: / to the * policy.
  • components/site/seo.tsx: <meta name="robots"> content → noindex, nofollow (was index, follow, max-image-preview:large, ...).
  • netlify.toml: add X-Robots-Tag: noindex, nofollow to the global [[headers]] block; CSP font-src support.sparkpost.com → archive.sparkpost.com (the prior value was hardcoded but functionally unused).

2. Host-conditional noindex (preserves support.sparkpost.com indexability during the gap)

Without this, the changes above would apply equally to both hostnames the Netlify deploy serves — including the current production support.sparkpost.com. That's incorrect during the period between this PR landing and the CloudFront cutover: support.sparkpost.com still has all its Google traffic, and deindexing it before the 301s exist loses equity rather than transferring it.

  • netlify/edge-functions/host-conditional-noindex.ts (new): a Netlify Edge Function that runs on every request and, for Host: support.sparkpost.com only:
    • overrides X-Robots-Tag with all (a plain delete is undone, because Netlify re-applies the netlify.toml headers after the function returns)
    • strips <meta name="robots" content="noindex,..."> from HTML responses
    • overrides /robots.txt with a permissive version (User-agent: *, no Disallow)

archive.sparkpost.com and all other hosts (deploy previews, *.netlify.app, etc.) pass through unmodified — noindex stays in place.
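
The per-host decisions above reduce to three small pieces of logic. A minimal sketch, assuming the Host header is the branching signal (as described above); the names `isIndexableHost`, `stripNoindexMeta` and `PERMISSIVE_ROBOTS_TXT` are hypothetical, and the exact permissive robots.txt body is an assumption, since the PR only specifies `User-agent: *` with no `Disallow`. This is the decision logic only, not the merged edge function.

```typescript
// Sketch of the per-host decisions; names are hypothetical, not the merged code.

// The one hostname that must stay indexable until the CloudFront cutover.
const INDEXABLE_HOST = "support.sparkpost.com";

// Assumed permissive body: a wildcard group with no Disallow rule.
const PERMISSIVE_ROBOTS_TXT = "User-agent: *\nAllow: /\n";

// True only for support.sparkpost.com; archive, deploy previews and
// *.netlify.app hosts all return false and keep their noindex signals.
function isIndexableHost(hostHeader: string | null): boolean {
  // Drop any ":port" suffix and compare case-insensitively.
  const hostname = (hostHeader ?? "").split(":")[0].toLowerCase();
  return hostname === INDEXABLE_HOST;
}

// Removes <meta name="robots"> tags whose content contains "noindex",
// regardless of attribute order; every other <meta> tag is kept.
function stripNoindexMeta(html: string): string {
  return html.replace(/<meta\b[^>]*>/gi, (tag) => {
    const isRobots = /\bname\s*=\s*["']robots["']/i.test(tag);
    return isRobots && /noindex/i.test(tag) ? "" : tag;
  });
}
```

For example, `stripNoindexMeta('<meta name="robots" content="noindex, nofollow"><title>Docs</title>')` returns `'<title>Docs</title>'`. Matching attributes in either order, rather than the literal tag string, keeps the strip working if seo.tsx ever reorders them; a regex is adequate here only because the tag comes from one known template, not arbitrary HTML.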

After CloudFront cutover (out of scope here)

Once support.sparkpost.com DNS flips to the CloudFront distribution, Netlify never sees those requests again — every viewer hits CloudFront and gets 301'd to bird.com. At that point the edge function is dead code on the Netlify side. Remove it (and the netlify.toml comment pointing at it) in a small follow-up PR.

Test plan

After deploy preview is up:

  • curl -I https://archive.sparkpost.com/docs/ → response includes X-Robots-Tag: noindex, nofollow
  • curl -I https://support.sparkpost.com/docs/ → response includes X-Robots-Tag: all (no noindex)
  • curl -s https://archive.sparkpost.com/robots.txt → contains Disallow: /
  • curl -s https://support.sparkpost.com/robots.txt → permissive (User-agent: *), no Disallow
  • View source on https://archive.sparkpost.com/docs/... → contains <meta name="robots" content="noindex, nofollow">
  • View source on https://support.sparkpost.com/docs/... → does NOT contain that meta tag
  • Cypress suite passes in CI (no test asserts on the old robots meta string)
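
The header and robots.txt checks in the test plan could be scripted. A rough sketch, assuming Node 18+ (global `fetch`); `EXPECTED`, `evaluateSignals` and `checkHost` are hypothetical names, and the HTML meta-tag checks are left to manual view-source. It accepts either a missing `X-Robots-Tag` or `all` on support.sparkpost.com, since both leave the page indexable.

```typescript
// Sketch of the curl checks; EXPECTED/evaluateSignals/checkHost are hypothetical.
type Expectation = "noindex" | "indexable";

const EXPECTED: Record<string, Expectation> = {
  "archive.sparkpost.com": "noindex",
  "support.sparkpost.com": "indexable",
};

// Returns a list of problems; an empty list means the host looks right.
function evaluateSignals(
  expectation: Expectation,
  xRobotsTag: string | null,
  robotsTxt: string,
): string[] {
  const problems: string[] = [];
  const tag = (xRobotsTag ?? "").toLowerCase();
  const disallowsRoot = /^[ \t]*disallow:[ \t]*\/\s*$/im.test(robotsTxt);
  const hasDisallow = /^[ \t]*disallow:[ \t]*\S/im.test(robotsTxt);

  if (expectation === "noindex") {
    if (!tag.includes("noindex")) problems.push(`X-Robots-Tag lacks noindex: "${xRobotsTag}"`);
    if (!disallowsRoot) problems.push("robots.txt lacks Disallow: /");
  } else {
    // A missing header and "all" are both acceptable here.
    if (tag.includes("noindex")) problems.push(`X-Robots-Tag still noindex: "${xRobotsTag}"`);
    if (hasDisallow) problems.push("robots.txt still has a Disallow rule");
    if (!/^[ \t]*user-agent:[ \t]*\*\s*$/im.test(robotsTxt)) {
      problems.push("robots.txt lacks User-agent: *");
    }
  }
  return problems;
}

// Network wrapper (not exercised offline): HEAD for headers, GET for robots.txt.
async function checkHost(host: string): Promise<string[]> {
  const page = await fetch(`https://${host}/docs/`, { method: "HEAD" });
  const robots = await fetch(`https://${host}/robots.txt`);
  return evaluateSignals(EXPECTED[host], page.headers.get("x-robots-tag"), await robots.text());
}
```

Once the deploy is live, `await checkHost("archive.sparkpost.com")` and `await checkHost("support.sparkpost.com")` should both resolve to empty arrays; any strings returned name the signal that is wrong.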

🤖 Generated with Claude Code


Note

Medium Risk
SEO and crawl behavior depend on correct Host-header branching in the edge function; a bug could deindex support.sparkpost.com or leave the archive indexed.

Overview
Repoints the docs site’s default canonical URL to archive.sparkpost.com and applies noindex everywhere the build emits SEO signals: robots meta in seo.tsx, global X-Robots-Tag in netlify.toml, Disallow: / in generated robots.txt via next-sitemap.js, plus CSP font-src updated from support.sparkpost.com to archive.sparkpost.com.

Because one Netlify deploy still serves support.sparkpost.com until CloudFront/DNS cutover, a new Netlify edge function (host-conditional-noindex.ts) runs only for that host: it serves a permissive /robots.txt, sets X-Robots-Tag: all, and strips the noindex robots <meta> from HTML so production support URLs stay crawlable and don’t lose equity before 301s to bird.com. archive.sparkpost.com and other hosts pass through with noindex unchanged.

tsconfig.json excludes the netlify/ folder so edge TypeScript isn’t typechecked with the Next app.

Reviewed by Cursor Bugbot for commit 54d49ea.

Moves the docs site from support.sparkpost.com to archive.sparkpost.com
so the content can be archived (and later 301'd to bird.com) without
competing with bird.com in search. Adds noindex/nofollow at three layers
(robots.txt, <meta name="robots">, X-Robots-Tag header) so search engines
drop the archive.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

netlify Bot commented Jun 25, 2026


Deploy Preview for support-docs ready!

🔨 Latest commit 54d49ea
🔍 Latest deploy log https://app.netlify.com/projects/support-docs/deploys/6a3ee4d598178f000863e2d1
😎 Deploy Preview https://deploy-preview-855--support-docs.netlify.app

The base-URL + noindex change in this branch applies to every hostname this
deploy serves. During the gap between this PR landing and the CloudFront
redirect distribution going live for support.sparkpost.com, that's incorrect
for SEO: archive.sparkpost.com SHOULD be noindex (it's the archived copy),
but support.sparkpost.com SHOULD still be indexed so its existing link equity
is preserved until CloudFront can 301 it to bird.com counterparts.

If support is noindex'd during the gap, Google can't see the eventual 301s
(robots.txt Disallow blocks re-crawl), so equity that would otherwise
transfer to bird.com is lost instead. The longer the gap, the more equity decays.
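
The re-crawl point follows from how robots.txt is evaluated (RFC 9309): a crawler that honors a matching Disallow never requests the URL, so it never observes a 301 served there. A simplified matcher illustrates this; `isAllowed` is a hypothetical helper, not part of this PR, and it reads only the `User-agent: *` group and ignores the `*` / `$` wildcards.

```typescript
// Simplified RFC 9309 evaluation (hypothetical helper, not part of this PR):
// only the "User-agent: *" group is read, and * / $ wildcards are ignored.
function isAllowed(robotsTxt: string, path: string): boolean {
  let inWildcardGroup = false;
  let previousWasAgent = false;
  let bestLength = -1;
  let allowed = true; // no matching rule means the path may be crawled

  for (const raw of robotsTxt.split(/\r?\n/)) {
    const line = raw.replace(/#.*/, "").trim();
    const sep = line.indexOf(":");
    if (sep < 0) continue;
    const key = line.slice(0, sep).trim().toLowerCase();
    const value = line.slice(sep + 1).trim();

    if (key === "user-agent") {
      // A user-agent line that follows rules starts a new group.
      if (!previousWasAgent) inWildcardGroup = false;
      if (value === "*") inWildcardGroup = true;
      previousWasAgent = true;
      continue;
    }
    previousWasAgent = false;
    if (!inWildcardGroup || (key !== "allow" && key !== "disallow")) continue;
    // An empty value matches nothing; otherwise rules are path prefixes.
    if (value === "" || !path.startsWith(value)) continue;

    // Longest match wins; on equal length, Allow wins.
    const isAllow = key === "allow";
    if (value.length > bestLength || (value.length === bestLength && isAllow)) {
      bestLength = value.length;
      allowed = isAllow;
    }
  }
  return allowed;
}
```

With the archive's `Disallow: /`, `isAllowed("User-agent: *\nDisallow: /\n", "/docs/")` is false, so a compliant crawler would never fetch the page and could not see a redirect placed on it; with a permissive file it is true.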

Add a Netlify Edge Function that runs on every request and, for requests
with Host: support.sparkpost.com:
  - serves a permissive /robots.txt (no Disallow:/)
  - strips X-Robots-Tag from response headers
  - strips the <meta name="robots" content="noindex,..."> tag from HTML

archive.sparkpost.com and all other hosts (deploy previews, *.netlify.app)
pass through unmodified — noindex stays in place.

REMOVE this function (and the netlify.toml comment pointing at it) once the
CloudFront cutover is complete. After that, support.sparkpost.com no longer
hits Netlify, so the hostname-conditional logic becomes dead code.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@PauloJeunonSousa PauloJeunonSousa changed the title Switch base URL to archive.sparkpost.com and noindex the site Migrate to archive.sparkpost.com with host-conditional noindex Jun 26, 2026

cursor Bot left a comment

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Reviewed by Cursor Bugbot for commit 56fa499.

Comment thread netlify/edge-functions/host-conditional-noindex.ts
Discovered via local netlify dev testing: `response.headers.delete('X-Robots-Tag')`
is silently ignored when the header is set in netlify.toml's [[headers]] block.
Netlify re-applies the netlify.toml headers after the edge function returns,
overriding any deletions — but it respects values the function explicitly sets.

Switch from delete() to set('X-Robots-Tag', 'all'). Google treats X-Robots-Tag:
all as equivalent to no header (it is the default: no indexing or serving
restrictions), so the effect is identical to the intended deletion. Setting
other headers (an X-Edge-Probe sentinel during testing) confirmed that
mutations survive; only deletions of netlify.toml-sourced headers are dropped.

Verified locally with netlify dev across the full host × content-type matrix:
- archive.sparkpost.com HTML: X-Robots-Tag: noindex, nofollow + meta intact ✓
- support.sparkpost.com HTML: X-Robots-Tag: all + meta stripped ✓
- archive.sparkpost.com /robots.txt: Disallow: / served as built ✓
- support.sparkpost.com /robots.txt: permissive (User-agent: *) ✓
- support.sparkpost.com JSON: X-Robots-Tag: all + body untouched ✓
- archive.sparkpost.com JSON: X-Robots-Tag: noindex + body untouched ✓
- Deploy preview host (deploy-preview-*.netlify.app): noindex preserved ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…m tsc

Two issues flagged in the PR:

1. Cursor Bugbot (high severity): after rewriting an HTML response body in
   the edge function for support.sparkpost.com, only `content-length` was
   removed from the copied headers. If the origin's response was gzip- or
   brotli-encoded (which Netlify's CDN does opportunistically), the
   `response.text()` call decoded the body to plain text but the stale
   `Content-Encoding` header survived. Clients would then try to decompress
   plain text and fail. Fix: also delete `content-encoding` when rebuilding
   the response.

2. Build failure under tsc: `import type { Context } from '@netlify/edge-functions'`
   fails Next.js's TypeScript check because the package is only present in
   the Netlify edge build environment, not in node_modules. Edge functions
   run on Deno with Netlify-provided globals, not Node, so they're a
   separate compilation unit. Add `netlify` to tsconfig.json's `exclude` so
   `next build` (and the Cypress CI's prebuild) stops scanning the
   directory. Netlify's edge function build still typechecks the files on
   the server side with the correct types.
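
The Content-Encoding fix in item 1 can be isolated into a helper that rebuilds the response after its body has been rewritten. A minimal sketch; `rebuildRewrittenResponse` is a hypothetical name, and it relies on the observation above that `response.text()` returns the already-decoded body.

```typescript
// Hypothetical helper: wraps a body that was read with response.text() and
// then modified (e.g. robots <meta> stripped).
function rebuildRewrittenResponse(original: Response, body: string): Response {
  const headers = new Headers(original.headers);
  // The new body is plain, uncompressed text of a different size, so the
  // original Content-Encoding (gzip/br) and Content-Length no longer apply.
  headers.delete("content-length");
  headers.delete("content-encoding");
  return new Response(body, {
    status: original.status,
    statusText: original.statusText,
    headers,
  });
}
```

Dropping content-length matters as much as content-encoding here, since stripping the meta tag changes the body size.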

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@PauloJeunonSousa PauloJeunonSousa merged commit 5b4c068 into main Jun 26, 2026
5 of 6 checks passed
@PauloJeunonSousa PauloJeunonSousa deleted the archive-sparkpost-noindex branch June 26, 2026 21:14

4 participants