Massively Over-Engineering A Static Website
This story begins in 2014, when I first became aware of the joy of static-site generators. I’d been bumbling around with WordPress, trying to build a website for my band, and then, through working with James on the first version of the website for Open Addresses (a doomed, cursed project, unfortunately), I was exposed to the wonders of Jekyll.
I spent plenty of evenings and weekends crafting the new band site, and soon
realised that the way Jekyll likes to structure blog content, with a file at
blog/_posts/1970-01-01-some-blog-post.md generating a post at
https://yoursite.com/blog/1970/01/01/some-blog-post/ would be
ideal for individual gig posts. This will be important later.
A brief aside: while implementing this, it occurred to me that it was now possible to book the band by pull-request. I imagine it will come as no surprise to anybody reading this to learn that, in the eight years this feature has been available, nobody has ever attempted to book us this way, but I did meet Ben Balter at the ODI Summit and tell him about it, and I believe he included it on an internal Github newsletter, and we also got on the front page of Hacker News, but fuck those guys.
It was at that same ODI Summit (I think) that I heard Sir Tim Berners-Lee talking about how musicians were using JSON-LD for discoverability, which led to me embedding some extremely noddy JSON in our pages. This also will be important later.
This site had served us OK for nearly 5 years, but in late 2022 we returned to the recording studio so with a new record on the horizon, I felt it was time for a new website. And this is where the bullshit really began.
Previously, things like gig pages were being generated from YAML (embedded in frontmatter), and then the JSON-LD was being generated per-page and embedded, but this was always a little clunky and I knew I could build something better. It seemed to me that the JSON-LD ought to be a first-class concept, with everything else being an artefact derived from it. So I attempted to make this happen.
I took a step back around about here and had a look at the data I was working with, and the first thing I noticed was that there was a lot of redundancy. For example, the data for an individual gig might look like this:
location: Luna, 7 Church Lane, Leytonstone, E11 3DR time: "21:00" latitude: 51.5682837 longitude: 0.0101648 venue_website: url: //www.lunalivemusic.com/#Whats-On text: Luna
but then all other gigs at the same venue would also have those same
venue_website fields, which I’d been blindly copy-pasting for years.
All this file really needs is the
time (the date is encoded in the
filename) and a pointer to the
location, which we can define
So I got to work normalising, by hand (and using a handful of shonky, definitely-not-idempotent scripts), dozens of YAML files, into this sort of form:
time: '21:00' venue: luna
luna is the identifier for
address: 7 Church Lane latitude: 51.5681154 longitude: 0.0103544 name: Luna Lounge postcode: E11 1HG website: https://www.lunalivemusic.com/ image: luna.jpg
Now it turns out that referential integrity is really hard to keep track of with hand-crafted artisan YAML, and there’s a reason why we have databases, but this all has to be built and deployed as a static site on Netlify, and I don’t see where a database can fit in to that pipeline.
With my new shiny normalised data, I returned to Astro with renewed vigour, and once again got stuck pushing JSON up a hill.
Let’s build some middleware like it’s 2005
So I did what any of us would have done in this situation, and started building, in Python, a bespoke static-site generator purely for the JSON. While this might sound unhinged, or that I was doing this purely for the sake of of doing it (both of which might be true), it actually inverted the whole problem and made it much easier to think about. I now have some massively de-normalised JSON generated using some fully tested code, from a small collection of tiny YAML files. It’s not really an API (or maybe it is?) but it’s ideal for what I need.
And this time around, I immersed myself deeply in schema.org and produced much richer data than I ever had before.
And now, when I returned to Astro, I was able to pull the data from my new
service (Astro supports the bewildering
nonsense), confident that it was all correct and clean. I had initially bolted-on
offSchema additional data to the JSON, in order to simplify
rendering at the Astro end, but this quickly started to offend my sense of purity,
so I nuked all of that, and after a few iterations, it became clear that using
pure unsullied schema.org-compliant JSON as the Source Of Truth works just fine,
and I was able to derive everything I needed very easily.
So now, at build-time, each page pulls the JSON it needs, generates its HTML from
that, and also places it as JSON-LD into the
<head>. And, just
because I could, for each
.html page there’s also a
.json page showing just the data, because as a very wise person once told me, “your
website is your API”. It’s not perfect, because it’s based on requesting the URL
.json extension, but I doubt I’ll be able to convince
Netlify to support Content-Negotiation, so it will have to do for now.
And that’s where are now: Astro turns out to be quite nice to use when you’re not trying to bend it into weird shapes, and the preview site is live here - the colours are temporary until the art for the new record is finalised, but all of the functionality is there. And of course there’s also lots of lovely (fully validated) linked-data.
This Mastodon thread has been tracking this whole ridiculous story in real time, and will probably continue as I attempt further stupid feats of pointless over-engineering for a website that only about 6 people will ever look at.
These comments are generated from replies to this Mastodon post