Blog with Markdown and Git, and degrade gracefully through time

[+] vitorfs|5 years ago|reply

I started my blog about python/django (https://simpleisbetterthancomplex.com) using Jekyll and hosting on Github Pages and it was pretty good to get started because back then I wasn't sure if I would keep it up or not.

After a year or so I migrated to a 5 USD droplet on Digital Ocean (back then GH Pages didn't offer https for custom domains) and integrated with Github webhooks to automate the deployment when pushing new markdown to the main branch.

Over the time it indeed started to degrade. The build time takes almost a minute, but after building the website is just a bunch of static html pages.

Nowadays it is annoying to write new posts because I like to write locally and refresh to browser to check if it's looking good or not. So I would say it degraded for me but for the reader it's still as fast as it was when there was just a couple of posts.

I thought about migrating the blog to something else, but because I used some custom markdown extensions for code highlight and other things, it would be painful to migrate all the blog posts. So I've been postponing it since 2019.

[+] themodelplumber|5 years ago|reply

Something similar happened to me when I was using static site generators. In fact one that I was really enjoying even switched programming languages between 1.x and 2.0.

Since that time I look for the people and community behind the project, and try to find signs of stability and long-term care. After that I look at open formats rather than open and flexible architecture-chains. For example, I'd rather use my LibreOffice HTML template and simple PHP controller on a more monolithic (but open) platform than connect a bunch of technologies together to create a build process with a bunch of moving, quickly developing, interdependent parts.

Not sure it's the best answer, but it has worked better to use more monolithic software, even blogging software that's been in steady, if slow development since the early 2000s...

[+] wazanator|5 years ago|reply

If you use "bundle exec jekyll serve" you shouldn't have too much problems locally as it just rebuilds the pages that change on every save. A minute to deploy the finished version is not terrible by any stretch for a blog IMO.

[+] stevekemp|5 years ago|reply

I wrote my own blogging software, and went through a similar journey. Initially I wrote a simple Perl script which would read all the posts I'd written (as markdown), insert them into a temporary SQLite database, and then using the database I'd generate output.

Having the SQLite database made cooking up views really trivial (so I wrote a plugin to generate /archive, another to write /tags, along with /tags/foo, /tags/bar, etc). But the process was very inefficient.

Towards the end of its life it was taking me 30+ seconds to rebuild from an empty starting point. I dropped the database, rewrote the generator in golang, and made it process as many things in parallel as possible. Now I get my blog rebuilt in a second, or less.

I guess these days most blogs have a standard set of pages, a date-based archive, a tag-cloud, and per-tag indexes, along with RSS feeds for them all. I was over-engineering making it possible to use SQL to make random views across all the posts, but it was still a fun learning experience!

[+] cpach|5 years ago|reply

Don’t know about Jekyll but with Hugo you just run hugo server in the Git repo and it will give a a live preview that you can view in the browser, served locally.

It’s very fast.

I have a mockup blog with ~90 pages and it takes 190 ms to generate the whole site.

[+] hinkley|5 years ago|reply

There was a company on here the other day talking about their product, built on top of Docker. I wish I'd bookmarked it.

Their secret sauce is, effectively, partial evaluation in Docker images. They run the code to detect if any of the changes in a layer have side effects that require that layer to be rebuilt (which invariably causes every layer after to be rebuilt)

I mention this because if I'm editing a single page, I would like to be able to test that edit in log(n) time worst case. I can justify that desire. If I'm editing a cross-cutting concern, I'm altering the very fabric of the site and now nlogn seems unavoidable. Also less problematic because hopefully I've learned what works and what doesn't before the cost of failure gets too large. It would be good if these publishing tools had a cause-and-effect map of my code that can avoid boiling the ocean every time.

[+] Lammy|5 years ago|reply

Is it slow to regenerate a single post even with Jekyll’s ‘—incremental’ mode?

[+] NetOpWibby|5 years ago|reply

I feel your pain.

I’d make a simple static blog generator but then grow tired of some shortcomings, which then led to abandonment.

My current blog has an HTML rendered homepage but all the posts are text files. https://blog.webb.page

[+] JonAtkinson|5 years ago|reply

I just wanted to say how much I've enjoyed using simpleisbetterthancomplex over the years.

I run a Django development agency, and I've passed around links to your tutorials more times than I can reliably estimate; hundreds of times :)

It's a great resource, thank you for it.

[+] dopeboy|5 years ago|reply

Side note - big fan of your blog. You're killing it at SEO.

[+] atoav|5 years ago|reply

I made goode experiences with Lektor (a static site generator). If you run it locally using "lektor server" you see your page plus admin backend. Once you click (or type) deploy the page gets statically built and copied somewhere via rsync.

You can still have the sources in a git for quick rollbacks.

What I like about lektor over most CMS solutions is that it is more easily adjustable. Basically Jinja2 and python held together with glue

[+] httpsterio|5 years ago|reply

Some static generators like eleventy and Gatsby offer partial builds, only building the new posts etc. which should be considerably faster. Another thing would be to run an empty version of your site with only the text you're working on and when it's done, moving it to the proper site and building it then

[+] number6|5 years ago|reply

Thanks for your Blog! It's a great resource - I love the generic form template and signal blog post. It is really great and I recommend it to anyone who uses django

[+] rewir|5 years ago|reply

Have you thought about hosting it on https://writefreely.org/?

[+] z3ncyberpunk|5 years ago|reply

you do realize you can hot reload and preview in a local browser before deploying? pretty inefficient to deploy just to check your changes

[+] jaredcwhite|5 years ago|reply

There is a problem in the CMS industry. Everyone wants to build a headless CMS to power Jamstack sites—only the content itself lives in databases and the CMS is usually a proprietary SaaS. Doesn't leverage Git for content in any way.

This is deeply problematic. We should have dozens, if not hundreds of contenders for "the next WordPress" that leverage Git as a foundational aspect of content management. Instead we have a bunch of Contentful clones (no disrespect to Contentful) with REST and/or GraphQL APIs.

It's bananas that if you search for Git-based CMSes you have NetlifyCMS, and…wait what? Is that all??? Forestry gets mentioned a lot because it's Git-based but that's also a proprietary SaaS. I just don't understand it. Is this a VC problem or a real blind spot for CMS entrepreneurs?

[+] 0xbadcafebee|5 years ago|reply

> We should have dozens, if not hundreds of contenders for "the next WordPress" that leverage Git as a foundational aspect of content management.

If you want a new CMS, give it a shot. But nobody's made a better free version of Jenkins either. It's hard to do and completely unrewarding/unmonetizeable (as FOSS), which is probably why nobody has done it.

However, insisting it leverage a difficult source code version control system is artificially restricting. Start with an MVP of flat files for content and add a plugin system. Somebody will write Git support, but they'll also add other content backends. I'll bet you a million dollars that a custom SQL plugin that does version control will be preferred over Git by 99% of users. But they may wise up and use an S3 plugin instead. Or maybe all three. They'll have choices, and your project will become more useful.

Don't start your project with an artificial restriction and it will be better for it.

[+] lhorie|5 years ago|reply

One explanation comes from just looking from the content creator's point of view: they kinda don't care what previous iterations of the work really look like (draft-final-v5-oct27-FINAL.doc anyone?)

The things a content creator might be more interested in aren't really core git strengths: SEO, publishing schedules, editorial back-and-forth, social media, etc.

[+] rapnie|5 years ago|reply

> if you search for Git-based CMSes you have NetlifyCMS, and…wait what? Is that all???

There are some more. I know of Grav which I intend to use migrating from NetlifyCMS. It has flat-file storage and Git connectivity. I think it can be installed on my shared hosting provider by just unpacking the thing (I don't have Composer / shell access).

https://getgrav.org/features

[+] timwis|5 years ago|reply

I’m under the impression there is still plenty more you can do with a database backed CMS then you can with markdown files. For instance difficulty marking pages as related to one another. You could use the slug, but if you later change the slug, you must track down all of the pages that relate to it and update the slug there too. Aka no normalisation.

[+] misterbrian|5 years ago|reply

I agree, I had this debate recently for what to use on a static site and we went with a Git-based CMS (https://content.nuxtjs.org/) over Strapi. I use nuxt/content for my personal site as well and really enjoy working with it.

[+] FrontAid|5 years ago|reply

Maybe FrontAid CMS might be interesting to you, even though it is also proprietary. FrontAid CMS is similar to Forestry, but stores all your content in JSON instead of Markdown. It is therefore better applicable to (web) applications in addition to simple blogs. https://frontaid.io/

[+] systematical|5 years ago|reply

I'm a big fan of mkdocs. If I were going to build a blog again I'd power it off of mkdocs.

[+] intrepidhero|5 years ago|reply

For all the (deserved) complaints people have about wordpress, it's been around since before git or markdown was even a thing. For all its faults, its been pretty resilient to time.

The real reason most websites disappear is a much more human one.

[+] johnchristopher|5 years ago|reply

I wish people would stop using github as a swiss knife CDN (free hosting, hooks, history through git repo, etc) and build higher altitude solution that could leverage git/MD/whatever but with free/opensource/selfhosted/alternative tools (like gitea instead of github and minio instead of S3 for instance).

[+] jdsalaro|5 years ago|reply

> build higher altitude solution that could leverage git/MD/whatever but with free/opensource/selfhosted/alternative tools

Not that I disagree with this, but that goes almost directly against "Mak[ing] that source public via GitHub or another favorite long-lived, erosion-resistant host. Git’s portable, so copy or move repositories as you go."

Any self-hosted service or solution is going to not be erosion-resistant by virtue of not being for-profit.

[+] snow_mac|5 years ago|reply

why?

[+] grandvoye|5 years ago|reply

IMO the biggest barrier to blogging (and the cause of most blogs dying) is inconvenience, and minimizing that if the biggest advantage of Markdown + Git. If there's any inconvenience at all, it naturally drags on the process of writing, and writing takes enough time and focus that if there's any friction it's too easy to push things off to the next day.

My co-author and I use Markdown and Git as the author suggests, and one of the best things is that between simple CI/CD pipelines and effortless scaling of a static site, we don't need to do any technical work so there's no friction on my lifestyle. We've been writing for almost a year now, 4+ posts a month, and 99% of that "work" has just been writing.

Writing with a partner helps a ton as well.

[+] WorldMaker|5 years ago|reply

On the inconvenience front, I think that also makes it clear why so much stuff that would have been blogs, say prior to 2013 to pick a pseudo-random date [1], is that the convenience of various walled gardens got very convenient. It's really easy to post an update to the walled garden social media site of your preference (Twitter, Facebook, TikTok, Tumblr, whatever), and with network effects really conveniently easy to have some sense of readership (even if it just Likes or Faves or whatever).

There are some blogs that I realize will never "come back" so long as "everyone is on Twitter these days". Because Twitter is still so much more convenient that blogs (even ones in Markdown + Git).

[1] Okay, not actually random, it was the Google Reader shutdown year. Google Reader provided a lot of convenience to RSS, including social media-like network effects, that almost brought blogs mainstream.

[+] MrPowers|5 years ago|reply

Any suggestions for static site generators / where to host? I'm thinking about starting a new blog with Hugo / Digital Ocean.

I have a popular Spark blog on Wordpress (https://mungingdata.com/) and can relate to your sentiment that any inconvenience can hold up the writing process. Your post is motivating me to streamline my publishing process.

[+] yhoiseth|5 years ago|reply

Love your blog!

[+] 0xbadcafebee|5 years ago|reply

The longest living websites are forgotten accidents. It's always some ancient webserver using maybe a single RAID-1 array of two tiny spinning disks, running forgotten software that is never ever updated. The uptime on those boxes are not uncommonly measured in decades. Somebody's credit card just keeps gets charged $40 a year ($10 for the domain and $30 for the website hosting), and the machine never gets touched.

Typically it sits in the back of a dusty rack for a website hosting vendor (or, rarely, a colo provider) and the gear has long since paid for itself, but also is unmaintainable due to not even having the remotest semblance of a service warranty or service parts (other than what somebody might have ordered as spares years before). If it ever loses power, everything starts back up on boot time and it keeps on chugging, defying the usual laws of computer entropy.

[+] bobbydreamer|5 years ago|reply

I am using Gatsby for my site https://www.bobbydreamer.com and saving the markdowns at git here https://github.com/bobbydreamer/bdv32

I haven't setup CI pipeline yet.

It true whats said in the post, I used to save links thinking it won't disappear. But most the links, I had saved, it's no more.

[+] Smaug123|5 years ago|reply

I used to have my blog source on GitHub, but then it turned out I didn't want my half-finished works-in-progress public. To use a private repository would rather defeat the point; using a private repo and a public fork is inviting confusion. Now I just use a private repo on my own server, cloned to my dev machine. Does anyone have a usable solution for that problem?

[+] samatman|5 years ago|reply

This reminds me once more of a piece of the puzzle which we're still missing. Markdown + git is great, but leaving your blog up on Github is just another central point of failure; Github feels like a fact of nature right now, but it's just a website at end of day.

Most broadly, it's called content-centric networking. Bittorrent is a piece of that puzzle, but too static, with no obvious way to connect disparate hashes together into a single entity. IPFS and Secure Scuttlebutt are groping in the right direction.

There was a project called gittorrent which, as you might guess, was trying to be 'bittorrent for git'. It never really went anywhere, the crew at https://radicle.xyz are looking to revive it and I wish them the best of luck.

What I want is a single handle, such that, if there are still copies of the data I'm looking for, out there on the network, I can retrieve them with that handle. And also, any forks, extensions, and so on, of that root data, with some tools to try and reconcile them, even though that may not always be possible.

That would be really powerful. It would made information more durable and resilient, and has the potential to change the way we interact with it. Like I find typos in documents sometimes, it would be nice to be able to generate a patch, sign it so it has a provenance, and release it out into the world.

When I browse old blogs which still exist, I routinely hit links to other blogs, video, and the like, which are just gone. Sometimes Wayback Machine helps, often it doesn't. This problem can't be fixed completely, when data is gone, it's gone, but we could do a lot more to mitigate it than what we're doing now.

[+] pwdisswordfish6|5 years ago|reply

A general solution for Wayback-style long-term distribution and archiving has been spec'd out (as part of the Memento Project), but nobody seems to be adopting it.

[+] mjgs|5 years ago|reply

I currently think that the best option for a personal website / blog is a statically generated site. It’s just the most robust way to build things for the long haul. Minimal maintenance, and easy to move all your files to a new hosting provider. Having static html files is very robust.

I think for business websites it starts to make sense to use something like Wordpress. The fact that it’s open source is amazing, so you can always self host. But it’s more effort. But you get lots of neat plugins and templates. But it’s more complicated.

Both are great solutions, but my current thinking is ssg for personal, for business maybe Wordpress.

[+] chubot|5 years ago|reply

This is how the Oil blog is made: it's markdown in git, and it renders fine in Github.

I use a bunch of old Unix tools that will be around forever to make it look nice: http://www.oilshell.org/site.html

The toolchain has changed significantly in 4+ years. It started as literally a shell script invoking Gruber's original markdown.pl. Then I switched to CommonMark, etc.

But the core data hasn't "rotted" at all, which is good. Unix is data-centric, not code-centric.

Previous thread that mentions that Spolsky's blog (one of my favorites) "rotted" after 10+ years, even though it was built on his own CityDesk product (which was built on VB6 and Windows). He switched to WordPress. Not saying this is bad but just interesting. https://news.ycombinator.com/item?id=25675869

[+] prepend|5 years ago|reply

I think this is an important “feature” but find that people either get or don’t. When I describe how cool it is that the version and history of a post is included in git and therefore reliable, I get sort of pleasant nods. But then people will fixate on having bullet lists in tables or something and give up on markdown.

Not that blog posts are life changing or anything, but having a format that’s durable and reliable seems important.

I lost my blog from college back in 95 when I got my first job and didn’t think to archive my shell account. I wish I had, even as a memento.

One thing I think that factors into it is that age is kind of nice for slowly removing stuff that’s not used from existence. Posts from 20 years ago might not be good to keep around if no one is reading them. No the gradual degradation that comes naturally from new phones, computers, hosts, jobs might be a feature for some people who don’t necessarily want everything around forever.

[+] coding123|5 years ago|reply

The thing about the net is that it's free to go viewing it. It costs money to put something up _that you control_. For example you have to buy the DNS name. You have to get a host, etc.. I know there are IPFS folks out there but I just don't know if there's anything there yet.

Normal people need a way to have a permanent place that can't be taken down and doesn't auto-expire with your credit card.

[+] calessian|5 years ago|reply

I run my blog using perhaps the most boring option - Wordpress, with close-but-not-quite the default theme (different fonts mostly). Outside of adding a cover image on every post and occasional footnotes, I don’t really need much.

However, that’s pretty lightweight on decisions I had to actually make to publish, and all the alternatives seem to be more involved. I wouldn’t mind migrating off WordPress, but just on the theming side that has a decent chance of involving a non-free theme, making the idea of hosting it on a public repo somewhat of a non starter...

[+] vhanda|5 years ago|reply

Self Promotion: You can use GitJournal [0][1] to manage the blog posts from your mobile. It's just a convenient git + markdown client on mobile.

I'd built this for managing notes, but it seems that many many people use it for their websites. (Including me)

[0] https://gitjournal.io

[1] https://github.com/GitJournal/GitJournal

[+] Terretta|5 years ago|reply

Thanks for this tool, btw.

[+] estaseuropano|5 years ago|reply

I disagree with the core tenet of the article. Does everything really have to live forever? Most blogs are probably not worth preserving, just like most speeches in history are not worth preserving. Most books are never reprinted.

Time is a great filter. Yes important stuff gets lost but not each thought or word is worth preserving and not every moment in life needs to be captured and kept for posterity.

I would prefer if more SO answers, more old tweets, more outdated tech blogs, more old unflattering photographs, ... disappear in the void. The death of geocities meant some valuable stuff was lost, but it also meant that much much more crap was lost.

The world does not gain from keeping every scrap, rather the filter of 'did someone care enough to preserve this' adds value as the signal to noise ratio improves over time.

[+] AmericanChopper|5 years ago|reply

Ancient Roman graffiti is seen as being valuable enough that significant time and resource has been spent on studying it. The anthropological value of it seems obvious to me. Perhaps low effort thought-leadership or social media drivel has little contemporary value, but it could be a fantastic historical resource. The selection bias of history that was chosen to be preserved (and the people who were doing the choosing) puts a rather firm set of constraints on our ability to understand the past.

[+] dfinr|5 years ago|reply

MkDocs plus Material theme on GitHub pages is pretty sweet. Automatic dark mode, useful plugins. I migrated from WordPress, no looking back.

[+] pinjasaur|5 years ago|reply

Plug for Blot[1] which makes it trivial to have a directory of Markdown source files and supporting assets (images, mostly) for my blog[2].

[1]: https://blot.im/

[2]: https://paul.af/

[+] dharmab|5 years ago|reply

I'm using pandoc for my site (https://www.dharmab.com). It's a bio/profile, not a blog, but I think it's OK (other than I'm lazy and haven't auto-generated smaller resolution images for mobile).

[+] SPBS|5 years ago|reply

I feel like the author makes a good point, but it does not solve the original problem he was describing. The only reason why those websites disappeared is because the owner was no longer interested in maintaining it. It's a human factor, not a technological one. Even if they had written their files in markdown and git, the website would have disappeared anyway if they stopped paying the domain/hosting fees.

Now if you're arguing for putting the website source on GitHub, that's an entirely different matter. GitHub addresses the human factor by being entirely free with no effort required for site upkeep. That's why it's durable, it's not about keeping it as git and markdown.

[+] afarrell|5 years ago|reply

> a human factor, not a technological one

I think the author's point is to highlight the relationship between technological and human factors. Git+markdown has a better user experience if your user is a developer who uses git every day.

181 comments