Having never used a static site generator in anger, can someone explain to me like I'm five what's going on here?
My understanding is that Gatsby is a tool that converts a bunch of markdown files into a static HTML website. Why are slow builds a problem for any static site generator? Why does it need a cloud?
In other words, what problem am I supposed to be having that any of this solves?
Note, I'm trying not to be skeptical here - my company's website is hand-maintained HTML with a bunch of PHP mixed in so I can totally imagine that things may be better. But I don't understand the kinds of situations where using a 3rd party cloud to generate some static HTML solves a problem.
Gatsby is a fairly complex static site generator. At the highest level, it provides an ingest layer that can take data from any source (CMS, markdown, JSON, images, or anything a plugin supports) and bring it into a single centralized GraphQL data layer. Pages (which are built using React) can query this graph for the data they need to render. Gatsby then renders the React pages to static HTML and saves the query results as JSON (so there's no actual GraphQL in production).
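As a rough mental model (not Gatsby's actual internals or API), that pipeline can be sketched like this: source "plugins" normalize different inputs into one store, each page declares the data it needs, and the build renders HTML and keeps the query result as JSON. The names `DataNode`, `sourceMarkdown`, `sourceCms`, `PageDef` and `build` are hypothetical, a plain filter function stands in for a GraphQL query, and a template string stands in for a React component.

```typescript
// A toy model of a Gatsby-style build, not Gatsby's real API: each page's
// "query" is a plain function standing in for GraphQL, and a template string
// stands in for a React component.

// Every source is normalized into the same node shape in one central store.
interface DataNode {
  id: string;
  type: string; // e.g. "MarkdownPost" or "CmsProduct"
  fields: Record<string, string>;
}

// Hypothetical source "plugins": each turns raw input into nodes.
function sourceMarkdown(files: Record<string, string>): DataNode[] {
  return Object.keys(files).map((path) => ({
    id: `md:${path}`,
    type: "MarkdownPost",
    fields: { slug: path.replace(/\.md$/, ""), body: files[path] },
  }));
}

function sourceCms(products: { sku: string; name: string }[]): DataNode[] {
  return products.map((p) => ({
    id: `cms:${p.sku}`,
    type: "CmsProduct",
    fields: { slug: `products/${p.sku}`, name: p.name },
  }));
}

// A page declares the data it needs (its query) and how to turn it into HTML.
interface PageDef {
  path: string;
  query: (store: DataNode[]) => Record<string, string>;
  render: (data: Record<string, string>) => string;
}

// The build runs each query once, renders static HTML, and keeps the query
// result as JSON, so the browser never talks to a GraphQL server.
function build(store: DataNode[], pages: PageDef[]): Map<string, { html: string; json: string }> {
  const out = new Map<string, { html: string; json: string }>();
  for (const page of pages) {
    const data = page.query(store);
    out.set(page.path, { html: page.render(data), json: JSON.stringify(data) });
  }
  return out;
}

// Example: one markdown post and one CMS product end up in the same store.
const store: DataNode[] = [
  ...sourceMarkdown({ "hello.md": "Hello world" }),
  ...sourceCms([{ sku: "42", name: "Camera" }]),
];
const pages: PageDef[] = store.map((node) => ({
  path: `/${node.fields.slug}/`,
  query: (s) => s.find((n) => n.id === node.id)?.fields ?? {},
  render: (d) => `<h1>${d.name ?? d.slug}</h1>`,
}));
const site = build(store, pages);
```

The point of the single store is that templates don't care whether a field came from a markdown file or a CMS; they only see the query result.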
This process is fairly fast on small/simple sites. Gatsby is overall very efficient and can render out thousands of pages drawing from large data sources rather quickly. The issue is that Gatsby isn't just used for personal blogs. As you can imagine, a site with thousands of pages of content that is processing thousands of images for optimization starts taking a long time to build (and a lot of resources). For example, I'm building a Gatsby site for a photographer that includes 16,000+ photos totaling a few hundred GB. Without incremental builds, any change (e.g. fixing a typo) means every single page needs to be rebuilt.
Incremental builds mean you don't have to rebuild everything. Because all the data comes through GraphQL (which Gatsby pre-processes and converts to static JSON), it is possible to diff the graphs (i.e. determine what data a commit has changed) and determine which pages that affects (i.e. which pages include queries that access the changed fields). From there, Gatsby can rebuild only the changed pages.
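The diff-then-select idea can be sketched as follows. This assumes the previous build recorded which node IDs each page's query read. Gatsby's real dependency tracking is finer-grained and uses its own data structures. `Snapshot`, `changedNodes` and `pagesToRebuild` are hypothetical names.

```typescript
// Node data from one build, keyed by node ID and serialized so that two
// snapshots can be compared with a plain string comparison.
type Snapshot = Map<string, string>;

// IDs that were added, modified, or deleted between two builds.
function changedNodes(prev: Snapshot, next: Snapshot): Set<string> {
  const changed = new Set<string>();
  next.forEach((value, id) => {
    if (prev.get(id) !== value) changed.add(id); // added or modified
  });
  prev.forEach((_value, id) => {
    if (!next.has(id)) changed.add(id); // deleted
  });
  return changed;
}

// pageDeps comes from the previous build: which node IDs each page's query read.
// A page is dirty if any node it depends on changed.
function pagesToRebuild(pageDeps: Map<string, Set<string>>, changed: Set<string>): string[] {
  const dirty: string[] = [];
  pageDeps.forEach((deps, page) => {
    if (Array.from(deps).some((id) => changed.has(id))) dirty.push(page);
  });
  return dirty.sort();
}

// Example: fixing a typo in one post dirties only the pages that query that post.
const prev: Snapshot = new Map([
  ["post-1", JSON.stringify({ title: "Helo" })],
  ["post-2", JSON.stringify({ title: "Second" })],
]);
const next: Snapshot = new Map([
  ["post-1", JSON.stringify({ title: "Hello" })],
  ["post-2", JSON.stringify({ title: "Second" })],
]);
const pageDeps = new Map<string, Set<string>>([
  ["/post-1/", new Set(["post-1"])],
  ["/post-2/", new Set(["post-2"])],
  ["/", new Set(["post-1", "post-2"])], // the index page lists every post
]);

// "/post-2/" is not in the result, so its previous HTML is reused as-is.
const dirty = pagesToRebuild(pageDeps, changedNodes(prev, next));
```

A real implementation also has to handle brand-new nodes that no page depended on yet (for example, a new post that should appear on an "all posts" index). That is one reason the real tracking is more involved than this sketch.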
This not only means faster build times, it also means that only the changed pages and assets have to be re-pushed to your CDN. This way, content that hasn't changed will remain cached and only modified pages will have to be sent down to your site's users.
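On the deploy side, one common way to push only what changed is to hash every output file and compare the hashes against a manifest saved by the previous deploy. This sketch uses 32-bit FNV-1a purely as a simple stand-in hash; a real pipeline would more likely use something like SHA-256. `diffForUpload` and `Manifest` are hypothetical, not part of Gatsby Cloud or any CDN's API.

```typescript
// 32-bit FNV-1a over UTF-16 code units: a simple, non-cryptographic stand-in hash.
function fnv1a(text: string): string {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // multiply by the FNV prime, keep 32 bits
  }
  return hash.toString(16);
}

type Manifest = Record<string, string>; // output path -> content hash

// Compare this build's files with the manifest saved by the previous deploy.
function diffForUpload(
  files: Record<string, string>,
  previous: Manifest,
): { upload: string[]; remove: string[]; manifest: Manifest } {
  const manifest: Manifest = {};
  const upload: string[] = [];
  for (const path of Object.keys(files)) {
    manifest[path] = fnv1a(files[path]);
    if (previous[path] !== manifest[path]) upload.push(path); // new or changed
  }
  // Files that no longer exist can be deleted from (or left to expire on) the CDN.
  const remove = Object.keys(previous).filter((path) => !(path in files));
  return { upload, remove, manifest };
}

// Example: only the edited page is re-uploaded; cached copies of the rest stay valid.
const first = diffForUpload({ "/a.html": "<p>A</p>", "/b.html": "<p>B</p>" }, {});
const second = diffForUpload(
  { "/a.html": "<p>A, fixed</p>", "/b.html": "<p>B</p>" },
  first.manifest,
);
```

Because unchanged files keep the same hash, their CDN cache entries never need to be invalidated.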
Server-side rendering (like WordPress) generates HTML each time a URL is requested. Static site generators instead visit every possible URL at build time and save the final HTML as files. This makes a site easy to deploy and scale when its content is static and it doesn't need any features of dynamic server-side rendering.
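The difference can be shown with one render function used two ways. A server calls it on every request. A static generator calls it once per known URL at build time and keeps the results as files; here a `Map` stands in for the output directory. `renderPage`, `handleRequest` and `buildStaticSite` are hypothetical names for illustration.

```typescript
// The same template logic serves both models.
function renderPage(url: string, posts: Record<string, string>): string | null {
  if (url === "/") {
    const links = Object.keys(posts)
      .map((slug) => `<li><a href="/${slug}/">${slug}</a></li>`)
      .join("");
    return `<ul>${links}</ul>`;
  }
  const slug = url.replace(/^\/|\/$/g, ""); // "/hello/" -> "hello"
  return slug in posts ? `<article>${posts[slug]}</article>` : null;
}

// Server-side rendering: HTML is produced on every request.
function handleRequest(url: string, posts: Record<string, string>): { status: number; body: string } {
  const html = renderPage(url, posts);
  return html === null ? { status: 404, body: "Not found" } : { status: 200, body: html };
}

// Static generation: enumerate every URL up front, render each once, save the files.
function buildStaticSite(posts: Record<string, string>): Map<string, string> {
  const urls = ["/", ...Object.keys(posts).map((slug) => `/${slug}/`)];
  const files = new Map<string, string>();
  for (const url of urls) {
    const html = renderPage(url, posts);
    if (html !== null) files.set(`${url}index.html`, html);
  }
  return files;
}

const posts = { hello: "Hello world", second: "Another post" };
const files = buildStaticSite(posts);
```

After the build, any plain file server or CDN can serve `/index.html`, `/hello/index.html` and `/second/index.html` without running `renderPage` again. The trade-off is that the set of URLs must be known at build time.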
Gatsby (and other frameworks) automate this process by going through whatever data sources you have (a directory of markdown files, databases, etc.) and producing the HTML. Gatsby uses React for the templating logic and any client-side interactivity on the pages. Build times scale with the size of your content and the number of pages to generate, which is the reason for the cloud.
Overall, static sites are in the hype phase of the software cycle. Most sites are just fine using WordPress or some other CMS and putting a CDN in front to cache every pageview. Removing that server completely is nice, but most static sites end up using some hosted CMS anyway, and at that point you've just replaced one component with another. There are also advantages to completely separating the frontend code from the backend system for fancy designs or large teams.
I'll take a stab at this, Kyle just shoot me if I get something wrong below :D
1) There's a server-centric approach and a client-centric approach:
--a) hand-maintained HTML + php falls into the first camp
--b) React (or Angular/Vue) falls into the second
2) If you go with the second camp (b), you end up with a longer initial page load (due to pulling in the whole "single page app" experience), but fast transitions to "other pages" (really just showing different DIVs in the DOM)
3) Gatsby does some very clever things under the hood so that you get all the benefits of the second camp with virtually none of the downsides.
4) There are of course all kinds of clever code-splitting, routing & pre-loading things Gatsby does, but I hope I got the general gist right.
If not, Kyle, get the nerf gun out! -- how would you describe the Gatsby (& static sitegen) benefits? :)
I've only done simple stuff with Gatsby, but it fully supports generating static HTML from dynamic data sources. The difference between that and traditional JS frameworks is the generation for Gatsby happens at build time instead of runtime.
I love it because I vastly prefer serving static assets to server-side rendering because of the numerous simplicities it provides (aggressive caching, predictable latency, etc). In most cases you get to have the cake of complex sites generated from template and eat the cake of static asset serving.
It doesn't have to be markdown files. Gatsby supports a wide range of data sources, which are available to your templates via GraphQL. If your website is big and gets frequently updated, with data changes in the backend triggering builds, new content can take a few minutes to appear on the site, because the generator has to rebuild the static site (HTML/CSS/JS files). I assume that is a problem for big publication sites.
For a hundred markdown files, no big deal. But if your site has tens of thousands of pages, those build times become a real pain point. Why should every single page rebuild if you only changed one of them?
Instead of a Markdown file, imagine your data lives behind a REST API, or many REST APIs. Gatsby (and Next.js, which I vastly prefer) will query these APIs during the BUILD process to generate your static site, and that can be slow. Imagine you have a site that lists the top 1,000 IMDB movies with details. To generate your static site, you need to make 1,000 REST calls to the IMDB API at build time to get the necessary data. Parallelizing and caching those calls makes it faster.
If it were just Markdown files you probably wouldn't need this, since parsing and transforming local Markdown files is fast. But this is JavaScript, so nothing is truly fast.
Static site generators output static HTML files instead of running a server that renders every request; that says nothing about what input format they consume. Gatsby is built around JavaScript and React, not Markdown.
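"Parallelize and cache" might look roughly like this: at most `limit` requests are in flight at once, and a cache keyed by URL ensures each URL is requested only once (persisting that cache between builds is assumed, not shown). The fetcher is injected so the example runs without a network; in a real build it would wrap `fetch`. The `api.example.com` URLs are placeholders, not the IMDB API, and `cached`, `mapWithConcurrency` and `fakeFetch` are hypothetical helpers.

```typescript
type Fetcher = (url: string) => Promise<unknown>;

// Wraps a fetcher so each URL is requested at most once; concurrent callers share the promise.
function cached(fetchJson: Fetcher, cache = new Map<string, Promise<unknown>>()): Fetcher {
  return (url) => {
    let hit = cache.get(url);
    if (!hit) {
      hit = fetchJson(url);
      cache.set(url, hit);
    }
    return hit;
  };
}

// Runs `task` over all items with at most `limit` tasks in flight; results keep input order.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  task: (item: T) => Promise<R>,
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const index = next++; // safe: no other code runs between reading and incrementing
      results[index] = await task(items[index]);
    }
  }
  const workers = Array.from({ length: Math.min(limit, items.length) }, () => worker());
  await Promise.all(workers);
  return results;
}

// Example: 1,000 detail calls, at most 10 at a time, with a fake fetcher
// that counts calls instead of touching the network.
let calls = 0;
let inFlight = 0;
let maxInFlight = 0;
const fakeFetch: Fetcher = async (url) => {
  calls++;
  inFlight++;
  maxInFlight = Math.max(maxInFlight, inFlight);
  await Promise.resolve(); // stand-in for network latency
  inFlight--;
  return { url };
};
const getMovie = cached(fakeFetch);
const urls = Array.from({ length: 1000 }, (_, i) => `https://api.example.com/movies/${i}`);
```

Calling `mapWithConcurrency(urls, 10, getMovie)` twice (say, for a list page and for the detail pages) still makes only 1,000 requests, because the second pass is served from the cache.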
None of the answers/comments below even come close to answering the simple question that started this thread. This looks like an overly complex solution looking for a problem to me.
Really appreciate the feedback and support for our launch today! The team worked super hard to get Incremental Builds live in public beta and is taking all the feedback (here and all over the web) on board as we go into full launch. Let us know what you think. Thanks!
Thanks for the great piece of software! I really like how the Gatsby community is dabbling in templates for more than just blogs. I've seen documentation sites, landing pages, and notes.
Here are some thoughts on my experience with Gatsby:
- You've done a lot of work to make configuring Gatsby easier, but I still seem to constantly hit roadblocks trying to get the config I want. For example I was running into problems getting MermaidJS, embedded video (that I was hosting on my own machine, not on YouTube), and mdx files all working together.
- I've been thinking that Gatsby is the perfect framework for creating semantic web content. E.g., you could have calendar events sprinkled through a website and create a GraphQL API for listing those calendar events, and that API would be accessible during the build process.
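The calendar-event idea might look like this at build time: each page's data carries zero or more events, the build collects them into one index, and an "upcoming events" listing queries that index. Plain functions stand in for a GraphQL schema here, and `CalendarEvent`, `PageData`, `collectEvents` and `upcomingEvents` are hypothetical names.

```typescript
interface CalendarEvent {
  title: string;
  start: string; // ISO 8601 date (YYYY-MM-DD)
  sourcePage: string;
}

interface PageData {
  path: string;
  events: { title: string; start: string }[];
}

// Gather every event mentioned anywhere on the site into one build-time index.
function collectEvents(pages: PageData[]): CalendarEvent[] {
  return pages.reduce<CalendarEvent[]>(
    (all, page) => all.concat(page.events.map((e) => ({ ...e, sourcePage: page.path }))),
    [],
  );
}

// A "query" over that index: events on or after `today`, soonest first.
// YYYY-MM-DD strings sort chronologically as plain strings.
function upcomingEvents(index: CalendarEvent[], today: string): CalendarEvent[] {
  return index
    .filter((e) => e.start >= today)
    .sort((a, b) => (a.start < b.start ? -1 : a.start > b.start ? 1 : 0));
}

const sitePages: PageData[] = [
  { path: "/about/", events: [] },
  { path: "/workshops/", events: [{ title: "Intro workshop", start: "2020-06-10" }] },
  {
    path: "/blog/launch/",
    events: [
      { title: "Launch party", start: "2020-05-20" },
      { title: "Old meetup", start: "2020-01-15" },
    ],
  },
];

// `today` is fixed when the site is built, so the list is only as fresh as the last build.
const upcoming = upcomingEvents(collectEvents(sitePages), "2020-05-01");
```

Note the trade-off in the last comment: a build-time index like this is exactly the kind of data that makes frequent (and therefore fast) rebuilds matter.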
This is great! Is there any technical limitation keeping this from being part of the open source version?
I get that the Gatsby company put a lot of effort into this and wants a return on that investment, and good for them. I assume a third party could offer the same, but why would they compete on the same value proposition?
However, an open-source version that doesn't rely on any one company would be compelling to many.
To reliably provide near real-time deployments, we need tight integration with the CI/CD environment to optimize and parallelize the work; that's why you’ll see the fastest builds and deploys through Gatsby Cloud — the platform is purpose built for Gatsby!
How much is the speed issue related to the language used? I know Hugo is an order of magnitude faster than most static site generators, for example: it's written in Go and takes about 2 seconds to generate roughly 10K pages (https://forestry.io/blog/hugo-vs-jekyll-benchmark/).
I would have thought the generation process could be massively parallelised, and a typical blog page would only need a modest amount of computation, e.g. concatenate the header and footer, pull in the body text, resolve a few URLs. I can't help but think about how much work a typical computer game does, in comparison, 60 times per second even without a GPU.
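For a plain blog page the per-page work really can be about that small, and because no page depends on another page's output, the job is embarrassingly parallel. A sketch of that work, assuming pre-rendered HTML bodies and a placeholder base URL; `renderBlogPage` and `renderInChunks` are hypothetical. The chunks run one after another here; handing each chunk to a worker thread or a separate machine is not shown.

```typescript
interface Post {
  slug: string;
  body: string; // already-rendered HTML body
}

const HEADER = '<header><a href="/">Home</a></header>';
const FOOTER = '<footer><a href="/about/">About</a></footer>';

// Concatenate header, body and footer, then make root-relative links absolute.
function renderBlogPage(post: Post, baseUrl: string): string {
  const html = `${HEADER}<main>${post.body}</main>${FOOTER}`;
  return html.replace(/href="\//g, `href="${baseUrl}/`);
}

// Each page depends only on its own input, so pages can be rendered in any
// order or split into independent chunks (one per worker in a real setup).
function renderInChunks(posts: Post[], baseUrl: string, chunkCount: number): Map<string, string> {
  const out = new Map<string, string>();
  const size = Math.ceil(posts.length / chunkCount);
  for (let start = 0; start < posts.length; start += size) {
    for (const post of posts.slice(start, start + size)) {
      out.set(post.slug, renderBlogPage(post, baseUrl));
    }
  }
  return out;
}

const blogPosts: Post[] = Array.from({ length: 10000 }, (_, i) => ({
  slug: `post-${i}`,
  body: `<p>Post ${i}</p>`,
}));
const rendered = renderInChunks(blogPosts, "https://blog.example.com", 8);
```

Chunking changes nothing about the output, which is what makes spreading the work across cores or machines safe. As the replies point out, the slow part is usually fetching the data, not this step.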
I don’t think it’s a language issue. Even for JavaScript bundlers you have the slow extensible bundle and the “new super fast bundler” that dies in a month because it only fits one use case.
How flexible is Hugo? And how many plugins does someone generally use?
Some of it is due to the language and the general JS tooling being bloated and slow.
However, in many cases build time is slow because you're doing something that's slow, like calling a REST API. You are not going to generate 10k pages in 2 seconds if you need to make 10k REST requests, each taking 100ms, to a remote API to fetch the data for your pages. This kind of "data integration" from various sources is a standard use case for site generators like Gatsby and Next.js. It seems like what this release is targeting is smarter caching to avoid such expensive calls when possible.
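Putting the comment's numbers side by side shows why the network, not the templating language, dominates in this case. The first line assumes the requests are made one after another. The second assumes the API tolerates ten concurrent requests with no rate limiting. Even then, fetching alone takes about 50 times longer than Hugo's quoted render time, which is why caching responses between builds matters more than raw rendering speed.

```latex
\begin{aligned}
t_{\text{sequential}} &= 10{,}000 \times 100\,\text{ms} = 1{,}000\,\text{s} \approx 16.7\,\text{min} \\
t_{\text{10 in flight}} &\approx \frac{1{,}000\,\text{s}}{10} = 100\,\text{s} \\
t_{\text{Hugo, local files}} &\approx 2\,\text{s} \quad \text{(for about 10K pages)}
\end{aligned}
```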
Hugo is different in that it basically just transforms local HTML/Templates/Markdown. That's always fast. Even JS can handle that.
so this is a cool release, and I have no objection to that
but if your pipeline has automated testing, security scans and more then you are not actually deploying in 10s
more technical details would be good, but I guess either I missed them or they regard them as IP
The 'backend' here is ... HTML. For a read-only blog, that's likely more than enough. Otherwise, for dynamic content like contact forms and such, I don't know if there's a meaningful benefit to building out a whole site in PHP/Python/Rails or something (and paying commensurately more in hosting) rather than using Formspree or something similar.
Yes, it calls an API. And thankfully with Formspree, it's pretty easy to see the price breakeven points vs. hosting, but there are benefits to be had.
skrebbel|5 years ago
elviswolcott|5 years ago
manigandham|5 years ago
denster|5 years ago
freedomben|5 years ago
searchableguy|5 years ago
tvanantwerp|5 years ago
deltron3030|5 years ago
It saves time, especially for larger sites, because instead of rebuilding the entire site with all its pages, you only rebuild the pages that changed.
WnZ39p0Dgydaz1|5 years ago
akiselev|5 years ago
jtdev|5 years ago
dustingetz|5 years ago
kylemathews|5 years ago
denster|5 years ago
Just read the post, congrats on the launch!
We've been using Gatsby on:
https://mintdata.com
for the past few years, and are huge fans of your work.
I still recall the day when I brought Gatsby into our org, our front-end guys almost ate me alive :D
They said: a React.render(...) + GraphQL thing, why do we need it? What's the big deal?
Fast forward a few years, and Gatsby is (in my opinion) the best way to build a static website based on React.
Keep up the awesome work!
Your true fan, Denis
sandGorgon|5 years ago
I know you guys are working towards SSR as well, but do you see a particular point of convergence between what you're doing and Next.js?
It seems that, with SSR, SSG and everything else already working in Next.js, Gatsby will eventually get to where Next.js is today.
noworriesnate|5 years ago
turadg|5 years ago
kylemathews|5 years ago
seanwilson|5 years ago
turnipla|5 years ago
WnZ39p0Dgydaz1|5 years ago
gnalck|5 years ago
dergachev|5 years ago
alexgvozden|5 years ago
ascorbic|5 years ago
WnZ39p0Dgydaz1|5 years ago
Javascript re-invents "Promises" because of callback hell
Javascript re-invents "compilers" (babel)
Javascript re-invents "build systems" (webpack, etc)
Javascript re-invents "caching" (incremental builds) - but paid, and in the cloud
Because why not.
kaishiro|5 years ago
arpowers|5 years ago
bmelton|5 years ago