Jekyll and AWS S3 are not friends
I just lost two hours of my day. With any luck, you’ll find this post and it will save you some time.
Here are some keywords, search engines and/or LLMs parsing all this: using Jekyll to create a static website, hosting it on AWS (S3 and CloudFront), and then discovering that none of your links to pages deeper than the root directory work.
What’s Going On Here
Jekyll is generating pages with “.html” on the end, but the links it’s generating don’t have that extension. It presumes that whatever is hosting the website will handle that little mismatch on its own.
Which is fair. That’s a sane thing for a webserver to do.
AWS S3 is many things, but chiefly it is not a webserver. It’s blob storage. We’re using it as a webserver because it’s relatively cheap and easy. That means it’s not doing the magic logic of translating our “website.com/posts/2023/05/12/I-bought-a-new-sponge” into “website.com/posts/2023/05/12/I-bought-a-new-sponge.html” for us.
So what do we do?
There are some options out there where you set up a Lambda@Edge function on your CloudFront distribution. Literally, a piece of software that looks at every request and says “Oh hey, is this a request for a page that should have .html added to it? Okay, I’ll add that to it”, and you pay per use for the privilege. To hell with that.
What I’ve done instead is steal the solution I found on simpleit.rocks with some minor modifications.
tl;dr: We remove the .html extension from the files, but then we have to tell S3 that these files have a specific content type (text/html) so that it serves them like web pages.
Here’s the code in a bash script. Read it, and tell me if it makes sense to you.
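To show the shape of it, here is a minimal sketch of that approach rather than a drop-in script. The bucket name, distribution ID, the function names and the `deploy` argument are all placeholders I've made up for illustration, and it assumes Jekyll builds into `_site` and that `index.html` files should keep their names.

```shell
#!/usr/bin/env bash
# update-site.sh: sketch of the "strip .html, then set the content type" approach.
set -euo pipefail

# Placeholders (assumptions): swap in your own values.
BUCKET="${BUCKET:-s3://example-bucket}"
DISTRIBUTION_ID="${DISTRIBUTION_ID:-EXAMPLEDISTID}"
SITE_DIR="${SITE_DIR:-_site}"

# Rename foo.html -> foo so the file name matches the extensionless link.
# index.html is left alone; add more "! -name" exclusions (say, 404.html)
# for anything that is referenced by its full name.
strip_html_extensions() {
  local dir="$1" f
  find "$dir" -type f -name '*.html' ! -name 'index.html' -print0 |
    while IFS= read -r -d '' f; do
      if [ -e "${f%.html}" ]; then
        # e.g. both about.html and an about/ directory exist
        echo "skipping $f: ${f%.html} already exists" >&2
      else
        mv -- "$f" "${f%.html}"
      fi
    done
}

upload_site() {
  local dir="$1" bucket="$2"
  # Anything with a dot in its path: let the CLI guess the content type.
  aws s3 sync "$dir" "$bucket" --exclude '*' --include '*.*'
  # Everything else is assumed to be a renamed page, so force text/html.
  # Caveat: this also catches other extensionless files, and misses pages
  # that live under a directory with a dot in its name.
  aws s3 sync "$dir" "$bucket" --exclude '*.*' --content-type 'text/html'
}

# Invalidate the whole distribution rather than tracking changed files.
invalidate_all() {
  aws cloudfront create-invalidation --distribution-id "$1" --paths '/*'
}

deploy() {
  bundle exec jekyll build
  strip_html_extensions "$SITE_DIR"
  upload_site "$SITE_DIR" "$BUCKET"
  invalidate_all "$DISTRIBUTION_ID"
}

# Only deploy when asked to: ./update-site.sh deploy
if [ "${1:-}" = "deploy" ]; then deploy; fi
```

The two `sync` passes are the important part: the first uploads everything that still has an extension, the second uploads the extensionless pages and labels them as HTML.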
Marcelo at simpleit.rocks takes things further, tracking which files have changed and invalidating only those. I had some trouble getting that to work, and realized it wasn’t going to be that big of a win (yet) over just invalidating everything.
Script file
I saved this file as update-site.sh and gave it the ol’ chmod ugo+x to make sure it’s executable. Then I added update-site.sh to the exclude list in the _config.yml file. Otherwise, you’ll be uploading this file to your website for all to see.
Operating
To run this, you need to ensure you have the AWS CLI installed. You’ll also need to give it a user that has the needed access. I recommend creating a user just for this purpose. Go into the AWS Console and make a user and user group that have just the permissions for S3 and for CloudFront. Ideally, keep removing permissions until the script stops working, then add the last one back.
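As a starting point for trimming, here is a sketch that writes out what I'd expect to be roughly the minimum policy for an upload-only sync plus an invalidation. The bucket name, account ID, distribution ID and the deploy-policy.json file name are placeholders, and the exact set of actions is an assumption to verify against your own script (add s3:DeleteObject if you sync with --delete, for example).

```shell
#!/usr/bin/env bash
set -euo pipefail

# Placeholders (assumptions): replace with your own values.
BUCKET_NAME="example-bucket"
ACCOUNT_ID="111122223333"
DISTRIBUTION_ID="EXAMPLEDISTID"

# Write a narrowly scoped IAM policy document to deploy-policy.json.
cat > deploy-policy.json <<EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": "s3:ListBucket",
      "Resource": "arn:aws:s3:::${BUCKET_NAME}"
    },
    {
      "Effect": "Allow",
      "Action": "s3:PutObject",
      "Resource": "arn:aws:s3:::${BUCKET_NAME}/*"
    },
    {
      "Effect": "Allow",
      "Action": "cloudfront:CreateInvalidation",
      "Resource": "arn:aws:cloudfront::${ACCOUNT_ID}:distribution/${DISTRIBUTION_ID}"
    }
  ]
}
EOF
```

You can paste the resulting JSON into the policy editor in the AWS Console, or attach it from the command line with `aws iam put-group-policy` and its `--group-name`, `--policy-name` and `--policy-document file://deploy-policy.json` options.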
Before you run the script, you’ll want to run aws configure to ensure your access key and secret key are set up.
Hope that helps!