How to Create Dynamic Sitemaps with Contentful and Next.js Without Webhooks

2022-10-14

Creating a sitemap.xml file has always nagged at me when working with headless content management systems. "What do you mean Contentful doesn't do sitemaps?!" my SEO colleagues would say, not quite grasping what headless fundamentally means. This was one thing that the old monolithic systems like WordPress seemed to have in the bag.

My Early Approaches

A year ago, I worked out an initial solution that used a cron job to regenerate the file regularly. Sadly, most cloud hosting providers (Heroku and Vercel, for example) don't allow adding files after the build is deployed, so you have to save the file to external storage such as Amazon S3.

I later tried triggering the sitemap build from a webhook on every publish event in Contentful. The problem is that you have to make sure you always save to the same URL in S3, and you still carry the added S3 dependency.

You could do a full rebuild on every webhook event to save the file, which is something many static site evangelists are comfortable with. However, as your site gets larger (and maybe handles lots of money), having builds happen at the drop of a hat makes me uneasy. It's just more moving parts to worry about. There had to be a better way. I wanted to keep my site dynamic with a good cache and ensure builds only happen for code changes, not content changes. I also wanted to ditch the extra S3 dependency.

The New Method

Thankfully, Next.js can do this inside its getInitialProps hook and serve up the XML file easily. You can set up the sitemap page, have it build on the server, then set it and forget it.

First, create the sitemap.js file inside the pages directory.

touch ./pages/sitemap.js

Install the xmlbuilder package:

npm install xmlbuilder

or, if you prefer Yarn:

yarn add xmlbuilder

Then configure the following to your liking based on your Contentful models. I use page and article models here as examples, but you may have many more.

import { createClient } from '../services/contentful';
import * as builder from 'xmlbuilder';

const rootUrl = 'https://yourhomepage.com';

const buildUrlObject = (path, updatedAt) => {
  return {
    'loc': { '#text': `${rootUrl}${path}` },
    'lastmod': { '#text': updatedAt.split('T')[0] },
    'changefreq': { '#text': 'daily' },
    'priority': { '#text': '1.0' }
  }
}

// The page itself renders nothing; getInitialProps writes the XML response.
const Sitemap = () => null;

Sitemap.getInitialProps = async ({ res }) => {
  try {
    const client = createClient();

    const pages = await client.getEntries({ 
      content_type: 'page', 
      limit: 1000,
      include: 1 
    });

    const articles = await client.getEntries({ 
      content_type: 'article', 
      limit: 1000,
      include: 1 
    });

    let feedObject = {
      'urlset': {
        '@xmlns': 'http://www.sitemaps.org/schemas/sitemap/0.9',
        '@xmlns:image': 'http://www.google.com/schemas/sitemap-image/1.1',
        'url': []
      }
    }

    for (const item of pages.items) {
      if (typeof item.fields.slug !== 'undefined') {
        feedObject.urlset.url.push(
          buildUrlObject(`/${item.fields.slug === 'index' ? '' : item.fields.slug}`, item.sys.updatedAt)
        );
      }
    }

    for (const item of articles.items) {
      if (typeof item.fields.slug !== 'undefined') {
        feedObject.urlset.url.push(
          buildUrlObject(`/blog/${item.fields.slug}`, item.sys.updatedAt)
        );
      }
    }

    // Repeat the loop above for each additional content type you fetch,
    // adjusting the path prefix to match that type's route.

    const sitemap = builder.create(feedObject, { encoding: 'utf-8' });

    if (res) {
      res.setHeader('Cache-Control', 's-maxage=5, stale-while-revalidate');
      res.setHeader('Content-Type', 'application/xml');
      res.statusCode = 200;
      res.end(sitemap.end({ pretty: true }));
    }

    // Return an empty props object so Next.js always receives an object.
    return {};
  } catch(error) {
    return { error: 404 };
  }
};

export default Sitemap;
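
To make the output shape concrete, here is a small sketch that runs the same URL-building logic on a made-up entry. The pathForSlug helper and the sample entry are hypothetical and exist only for illustration; buildUrlObject mirrors the function in the listing above.

```javascript
const rootUrl = 'https://yourhomepage.com';

// Same shape as buildUrlObject in the listing above.
const buildUrlObject = (path, updatedAt) => ({
  loc: { '#text': `${rootUrl}${path}` },
  lastmod: { '#text': updatedAt.split('T')[0] },
  changefreq: { '#text': 'daily' },
  priority: { '#text': '1.0' },
});

// Hypothetical helper: the 'index' slug maps to the site root.
const pathForSlug = (slug, prefix = '') =>
  `${prefix}/${slug === 'index' ? '' : slug}`;

// Made-up entry shaped like a Contentful item.
const sampleEntry = {
  fields: { slug: 'index' },
  sys: { updatedAt: '2022-10-14T09:30:00.000Z' },
};

const entry = buildUrlObject(
  pathForSlug(sampleEntry.fields.slug),
  sampleEntry.sys.updatedAt
);

console.log(entry.loc['#text']);     // https://yourhomepage.com/
console.log(entry.lastmod['#text']); // 2022-10-14
```

Each object pushed into feedObject.urlset.url has this shape, and xmlbuilder serializes it as one url element in the final file.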

Notes: I like to extract my Contentful service into a services directory, but you can use the contentful package directly, or whatever headless CMS client you prefer, in its place. I also use the slug index for the homepage in Contentful, which is why the ternary check is there to leave that slug out of the URL. Again, configure as needed. I've limited this to 1000 articles and 1000 pages; if you have more, you will need to paginate those requests.
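
For the pagination case, one possible approach is to keep requesting pages with skip and limit until you have collected everything the API reports in total. This is only a sketch: fetchAllEntries is a hypothetical helper, not part of the contentful package, and it assumes the client's getEntries resolves to an object with items and total, as the Contentful JavaScript client does.

```javascript
// Fetch every entry matching `query` by paging with skip/limit.
// `client` is anything with a Contentful-style getEntries(query) method
// that resolves to { items, total }.
async function fetchAllEntries(client, query, pageSize = 1000) {
  const items = [];
  let total = Infinity;

  while (items.length < total) {
    const page = await client.getEntries({
      ...query,
      limit: pageSize,
      skip: items.length,
    });
    total = page.total;
    if (page.items.length === 0) break; // guard against an endless loop
    items.push(...page.items);
  }

  return items;
}
```

Inside getInitialProps you could then write something like const pages = await fetchAllEntries(client, { content_type: 'page', include: 1, order: 'sys.createdAt' }). Note that the helper returns a plain array, so the loops would iterate over pages rather than pages.items. Passing an explicit order is a precaution to keep the pages stable between requests.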

Deployment

To configure this for deployment on now.sh, head over to your now.json file and set up the routes as shown here. Also make sure you add the route for your robots.txt file. The file can be stored in static, but you will want it accessible at /robots.txt.

{
  "version": 2,
  "alias": "my-sitemap-sample",
  "name": "my-sitemap-sample",
  "builds": [{ "src": "next.config.js", "use": "@now/next" }],
  "routes": [
    { "src": "^/robots.txt",  "dest": "/static/robots.txt" },
    { "src": "/sitemap.xml", "dest": "/sitemap" }
  ]
}

Scaling

As your site grows, it may take some time to build and serve this file. I like to use a service like Cloudflare and its caching to mitigate this. So far I haven't hit any speed traps, but on a very large sitemap it might be a good idea to break this into multiple sitemaps on different routes at a certain point.
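
If you do reach that point, the sitemap protocol allows a sitemap index file that points at several child sitemaps, each capped at 50,000 URLs. Here is a rough sketch of how the splitting could look. The chunkUrls and buildSitemapIndexObject helpers and the /sitemap-N.xml route naming are assumptions for illustration, not something Next.js or xmlbuilder provides.

```javascript
const MAX_URLS_PER_SITEMAP = 50000; // limit set by the sitemap protocol

// Split a flat list of url objects into sitemap-sized chunks.
const chunkUrls = (urls, size = MAX_URLS_PER_SITEMAP) => {
  const chunks = [];
  for (let i = 0; i < urls.length; i += size) {
    chunks.push(urls.slice(i, i + size));
  }
  return chunks;
};

// Build the object for a sitemap index with one child sitemap per chunk.
// The /sitemap-N.xml naming is an assumed route scheme.
const buildSitemapIndexObject = (rootUrl, chunkCount) => ({
  sitemapindex: {
    '@xmlns': 'http://www.sitemaps.org/schemas/sitemap/0.9',
    sitemap: Array.from({ length: chunkCount }, (_, i) => ({
      loc: { '#text': `${rootUrl}/sitemap-${i + 1}.xml` },
    })),
  },
});
```

The index object can be passed to builder.create in the same way as feedObject, and each child route would serve one chunk of URLs wrapped in its own urlset.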
