
Shrinking my static site

74 points | herohamp | 5 years ago | hampton.pw

103 comments

[+] ig0r0|5 years ago|reply
I use Hugo as my static site generator: a single binary, no dependencies, generating hundreds of pages in milliseconds... so reading this feels wrong. I want to call it JavaScript masochism.

Taking a simple concept like a static site and adding a ton of complex tooling around it, just because that's the trend now?

Why would you even need a Docker image to run a static website? The best thing about a static website is that you can host it anywhere without requiring any extra resources, like putting the files directly on some CDN.

[+] sireat|5 years ago|reply
Speaking of complex tooling, don't you think Hugo is a bit complex itself?

Took Hugo for a test drive using the official quick start.

Half the themes from the official docs wouldn't compile.

Started looking into the architecture, but the official docs don't explain the big picture, just a bunch of instructions. They don't explain what's going on behind the scenes very well.

I mean, I will obviously want some sort of scaffolding and a way of changing it, but https://gohugo.io/getting-started/directory-structure/ doesn't help very much.

Since you are not supposed to write Go code, everything has to be done through .toml config files, as far as I could grok.

I feel I'd rather write my own static generator in a language of my choice (say, Python) than keep up with the Hugo docs.

[+] ShakataGaNai|5 years ago|reply
There is still software powering your static site, and that software has to live somewhere, on something. Maybe you upload it to GitLab Pages; then you're relying on the cloud to power your static site. Maybe you upload it to your local server where nginx is running. But something is still serving it, and if you're doing it yourself, you can run it in a Docker container.

I've taken both routes. I have a static site that uses Hugo to generate, just like you mention. Its distribution pipeline is a Docker container inside GitLab CI/CD, so it has all the tools needed to both run Hugo and upload to AWS S3.

Other static sites I have are generated into HTML and dumped into a Docker container with nginx. These are then pulled onto the relevant servers, which run everything else in Docker as well, along with the front-end load balancer. It would actually be significantly more work NOT to have the static site in Docker.
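A minimal sketch of that nginx route, assuming the Hugo output lands in `public/` and that Hugo is installable from the Alpine package repos (both assumptions; swap in whatever install method and paths you actually use):

```dockerfile
# Build stage: generate the site in a throwaway image
FROM alpine:3.12 AS builder
RUN apk add --no-cache hugo
COPY . /src
WORKDIR /src
RUN hugo --minify -d /src/public

# Serve stage: only the generated HTML ends up in the final image
FROM nginx:alpine
COPY --from=builder /src/public/ /usr/share/nginx/html/
```

The builder stage, with Hugo and all the site sources, is discarded; the shipped image is just nginx plus static files.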

[+] BrandoElFollito|5 years ago|reply
I have a few sites (also generated by Hugo), and the output is served by a Caddy container.

I also have this one single site that I want to be independent and not at risk of me misconfiguring Caddy, playing around, etc. Truly standalone.

So I generate the static code, bundle a minimal web server with it, and push the image to my registry.
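One hedged way to build that kind of "truly standalone" image is to lean on busybox's built-in httpd, so the image carries nothing but the static files and a ~1MB base (the `public/` directory name is illustrative):

```dockerfile
# busybox ships a tiny httpd applet; no nginx, no config files
FROM busybox:stable
COPY public/ /www/
# -f: stay in the foreground (required for PID 1)
# -p: listen port, -h: web root
CMD ["busybox", "httpd", "-f", "-p", "80", "-h", "/www"]
```

No TLS, no rewrites, no compression, which is exactly the point: there is almost nothing left to misconfigure.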

[+] oplav|5 years ago|reply
At work, we build a couple of SPA React applications for internal tools. All of these SPAs talk to other internal APIs via REST, so you can think of them as static websites.

In production, we do exactly what you suggest: serve through a CDN.

However, for our dev environments and PR builds, programmatically setting up a CDN for these use cases is not easily integrated with our CI/CD workflow. Our CI/CD environment is, however, good at quickly deploying containerized services.

For this, using an nginx container to serve the built static bundle lets the dev team and product owners see PR changes without having to check out and build the code.
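As a sketch, that PR-build image can be little more than the bundle dropped into stock nginx (the `dist/` path and the config file are assumptions, not details from the comment):

```dockerfile
FROM nginx:alpine
# CI runs `npm ci && npm run build` beforehand; only the bundle is copied in
COPY dist/ /usr/share/nginx/html/
# Optional: a config with `try_files $uri /index.html;`
# so client-side SPA routes don't 404 on refresh
COPY nginx.conf /etc/nginx/conf.d/default.conf
```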

[+] aledalgrande|5 years ago|reply
That's exactly how I was doing it years ago: Hugo and a script to upload to S3. Engineers like to overengineer stuff... get away from that habit, folks.
[+] carapace|5 years ago|reply
I just looked at Hugo. Their docs menu doesn't work with JS disabled. I know it's a tangent from the point of this thread, but it just makes me crestfallen.

https://gohugo.io/documentation/

[+] fock|5 years ago|reply
Best thing: you could easily pack your static Hugo binary into a container with nothing else. But somehow nobody does this without talking about microkernels (and probably internal Google apps).
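A hedged sketch of that idea, assuming you have a statically linked `hugo` binary on hand in the build context:

```dockerfile
# Nothing but the generator itself: no shell, no libc, no package manager
FROM scratch
COPY hugo /hugo
ENTRYPOINT ["/hugo"]
```

You'd then run builds with the site mounted in, e.g. `docker run -v "$PWD":/site -w /site <image> -d public`.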
[+] rumanator|5 years ago|reply
This post says a lot more about the JavaScript ecosystem than about Docker. Multi-stage image builds are nothing new or extraordinary; in fact, they're Docker 101. However, being forced to install 500MB worth of tooling and dependencies just to deploy a measly 30MB static website on nginx is unbelievable.
[+] throwaway8941|5 years ago|reply
How is that any different from the build tooling for any other language? On my system, gcc with a bunch of commonly used dependencies requires about the same space (and that's a full Linux system, not a trimmed-down container).
[+] nikeee|5 years ago|reply
I'd suggest changing this:

    COPY package.json .
    RUN npm install
to this:

    COPY package.json package-lock.json ./
    RUN npm ci
`npm ci` installs the exact dependencies specified in the lockfile. This way, transitive dependencies that were upgraded via `npm audit fix` are guaranteed to be installed. It therefore forces the image to be rebuilt when a transitive dependency changes. Copying only the package.json wouldn't do that. It also errors if the lockfile and the package.json are inconsistent.

https://docs.npmjs.com/cli/ci.html

[+] noahtallen|5 years ago|reply
+1. npm ci is typically quite a bit faster too in my experience.
[+] francislavoie|5 years ago|reply
You can probably shrink it even more. The Caddy alpine image is 14MB compressed.

https://hub.docker.com/_/caddy/

You also get automatic TLS certificate management and tons of other goodies that nginx doesn't offer out of the box.
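For scale, the image for a whole site can be roughly this (the `_site/` directory is illustrative); the official Caddy image's default Caddyfile serves `/usr/share/caddy` on port 80:

```dockerfile
FROM caddy:alpine
# Served by the image's default Caddyfile; for automatic TLS,
# supply your own Caddyfile naming your domain:
#   COPY Caddyfile /etc/caddy/Caddyfile
COPY _site/ /usr/share/caddy/
```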

[+] herohamp|5 years ago|reply
All of my Docker containers are placed behind Traefik, which handles TLS certificates, HTTPS redirects, compression, and routing to the correct container.
[+] 1337shadow|5 years ago|reply
So if you want to host multiple sites on the same port of your server, from the same process, you can also put Caddy behind Traefik; but then why would you need Caddy? Traefik got Docker load balancing right: watch docker.sock and self-configure on the fly.

I missed the whole Caddy thing anyway, because the source code was not completely open back when I was looking for nginx alternatives that provided Let's Encrypt automatically.

However, it seems Caddy can be used as an alternative to Traefik if you use a plugin: https://caddyserver.com/v1/docs/docker. Apparently it requires Swarm, though, which I don't need, so still no reason for me to try Caddy, unfortunately, because it does look pretty cool now.

[+] jrururufuf666|5 years ago|reply
In my eyes this is all madness: deploying sites via GitHub to some Docker contraption.

How about good old FTP and a cheap shared webhost, like it's been done for 30 years?

[+] onion2k|5 years ago|reply
I've only been doing it for 25 years rather than the full 30, but in my noob opinion I'll take Docker and GitHub over it every time. The number of times a site failed to update because an FTP transfer didn't complete, or the permissions were wrong, or the FTP client mangled a file's CRLFs, or a directory on the server wasn't writable by the FTP user, or... well, let's just admit that FTP sucked and things are far more reliable now.
[+] epmaybe|5 years ago|reply
To be honest, GitHub Pages has been amazing for me. I just have a simple static website with personal information: HTML and CSS with minimal JS. Purchasing a shared webhost still means extra cost on top of domain registration. With GitHub, I just followed their instructions with my domain and everything just worked. I'm also not paying anything per month or year besides domain registration, which is really nice when you have to budget tightly.

I don't know about docker and all of that, though.

[+] sh87|5 years ago|reply
Yeah, but... but... that would just get the job done. Nothing to write about, nothing to whine about, no weird dependencies being pulled in, nothing to hand-wave, nothing to yell and humblebrag about. No theatre.

You are right, this IS madness.

[+] quadrifoliate|5 years ago|reply
If there's a blog post titled "I got my Ford F-350 to use less fuel with this One Weird Trick", it's not much use to point out that a Honda Civic is a much simpler and more efficient solution to the given problem; the domain has already been defined as the complicated solution.

I'd probably choose the sftp and webhost route myself, but you know.

[+] api|5 years ago|reply
That doesn't have enough buzzwords.
[+] mattacular|5 years ago|reply
Sometimes you can get space savings on Docker images from seemingly odd sources. For example, I found that running a chown command on files after they've been COPY'd in bloats the image size significantly (hundreds of MB), because the changed files get duplicated into a new layer. At some point, however, Docker added a --chown flag to the COPY instruction, which brings the size back in line.
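Roughly, the difference looks like this (paths and user are illustrative); each `RUN chown` rewrites the files into a fresh layer, doubling their on-disk cost:

```dockerfile
# Bloated: the COPY layer holds one copy of /app,
# and the chown layer a second full copy
COPY app/ /app/
RUN chown -R www-data:www-data /app

# Slim: ownership is set while copying, so there is only one layer
COPY --chown=www-data:www-data app/ /app/
```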
[+] globular-toast|5 years ago|reply
Why even have a Docker image for a static site? If the site has to be built, like this one, then just put the output of the build process behind some web server. We were doing this back in the '90s and didn't have to write blog posts about how not to make your site 400MB.
[+] IanCal|5 years ago|reply
> just put the output of the build process behind some webserver.

That's what this is doing.

You're skipping over two other parts: installing and setting up nginx (or similar), and creating a process for the build itself. This approach ties the three together.

[+] quezzle|5 years ago|reply
Static sites and Docker shouldn't be seen together.
[+] antsar|5 years ago|reply
So if I have a Kubernetes environment hosting all my stuff, with standardized CI/deployment flows, and one site is static, hosting it from a container like everything else would be a cardinal sin? That's a strangely black-and-white way to put it...
[+] steve_adams_86|5 years ago|reply
I understand the sentiment and agree to a point. At the same time, I sometimes use Docker for seemingly trivial problems because the most important requirement is build consistency, and I don't know how to get around using containers when I need that. I can see a static site generator actually needing a reliable build process. What would you suggest, though? Just manually/automatically test builds?
[+] 1337shadow|5 years ago|reply
Disagreed, because an LB like Traefik will easily self-configure by watching docker.sock; otherwise, you'd need to change your LB configuration manually every time you add or remove a site.
[+] tailsdog|5 years ago|reply
Could you elaborate on that statement: why should static sites and Docker not be seen together?
[+] jt2190|5 years ago|reply
This approach amounts to installing all of the build tooling inside Docker, then generating the build artifact. I'd think it'd be even slimmer to generate the build artifact first, then just copy that into the container.

Is there an advantage to building inside of Docker?

[+] steve_adams_86|5 years ago|reply
Wouldn't the advantage be consistent build artifacts? If you use the build tools outside of Docker, you won't get the same repeatable build artifact across different machines (or as your own host system changes). Maybe I'm not understanding you, though.
[+] IanCal|5 years ago|reply
> I'd think it'd be even slimmer to generate the build artifact first, then just copy that into the container.

That's what this is doing. It creates builder containers, uses them to make the artifact and copies it in:

    FROM nginx:1.17.10-alpine
    RUN rm -r /usr/share/nginx/html/
    COPY --from=builder /app/_site/ /usr/share/nginx/html/

    EXPOSE 80 
The final image is nginx with the static files copied over.
[+] oplav|5 years ago|reply
We've found one advantage to be more reproducible builds since you don't have to worry about different versions of build tooling affecting the artifact.
[+] Drdrdrq|5 years ago|reply
Aside: one should not install dependencies from package.json alone. Use either package-lock.json (with the command "npm ci") or yarn.lock (and... I forget). Keep the lock file as part of the repo too, or each build could be different.
[+] slezyr|5 years ago|reply
> or each build could be different.

Or not working.

[+] saagarjha|5 years ago|reply
Can we put "Docker image" in the title somewhere? Otherwise it seems like the article is about shrinking the site itself (i.e., having less JavaScript, optimizing the images, ...).
[+] superkuh|5 years ago|reply
Back in 2015, when cloud offerings were still fairly new, a lot of big providers were getting into the game with Docker offerings (e.g., IBM Bluemix) where the charge was based entirely on RAM*hours.

Naturally, this led me to game the system and make my Docker images as small in RAM usage as possible. In the end I even abandoned SSH as too heavy and switched to shadowsocks (2MB resident) for networking the Docker instances together.

[+] azangru|5 years ago|reply
> This docker image resulted in a 419MB final image and took about 3 minutes to build. There are some obvious issues with this. For instance every-time I change any file it must go through and reinstall all of my node_modules.

He doesn't say whether there were any build-time improvements after the changes to the Dockerfile. Will the builder Docker images get cached, thus reducing build and deployment time?

[+] 1337shadow|5 years ago|reply
TBH I have such a setup on yourlabs.io/oss/blog, partly because I like SASS (SASS, the nice CSS language that integrates nicely with webpack, as long as you have node-gyp, which needs python2 and g++, just to build CSS... guess I'm not doing it right ><), and partly to serve as a template project for others that require more elaborate frontends...

... But in my experience it's pretty boring to wait for the whole JS rebuild when you just add a post. I think next time I'll remove the JS build from CI and just commit the built assets when I change the frontend code, which is not often.

[+] licebmi__at__|5 years ago|reply
Well, the article mentions an obvious improvement: node_modules won't be rebuilt after every change to the code, only after changes to package.json.

Basically, the main step is the COPY . . step (previously COPY . /app), which invalidates the cache on every change to the code.

Also, in the builder, the steps beyond npm run aren't needed, though removing them won't improve the performance or size of the overall process much.
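The caching-friendly layer ordering described above, sketched as a Dockerfile (the stage name and build script are assumptions):

```dockerfile
FROM node:10 AS builder
WORKDIR /app
# These two layers are reused from cache until the manifests change...
COPY package.json package-lock.json ./
RUN npm ci
# ...while editing site content only invalidates from here down
COPY . .
RUN npm run build
```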

[+] alpb|5 years ago|reply
If you're using multiple stages anyway, there's no need to use the nginx base image in every stage. Instead of this:

    FROM nginx:1.17.10-alpine as npmpackages
    RUN apk add --update nodejs npm

Just do:

    FROM node:10
    RUN npm [...]
[+] herohamp|5 years ago|reply
That is true; I'll switch to that when I get around to it.
[+] discordance|5 years ago|reply
How about binary patching your container? I'm pretty sure I've seen this done somewhere but can't find the link.
[+] philshem|5 years ago|reply
The initial build time is “about 3 minutes” but I’d like to know the build time of the final image.
[+] 1337shadow|5 years ago|reply
I have no idea why they need an nginx image in their second FROM; they're just running some npm commands.
[+] miganga|5 years ago|reply
Why are we scared of disk space in 2020?
[+] watersb|5 years ago|reply
I'm scared of everything in 2020.