BCHS: OpenBSD, C, httpd and SQLite web stack

dleslie | 4 years ago

I'd be fine with this, even totally on-board, if C weren't so awful with respect to text. You don't even have to worry too much about free()ing your malloc()s if you design around short-lived processes. But this is just asking for security concerns among the tangled web of string and input processing your bespoke C routines are likely to develop into.

Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.

littlestymaar | 4 years ago

> Pair it with a better, more modern, and safer native-compiled language and get the same effect. Zig, Nim, Go, hell even Carp.

I love how trollish it is not to talk about Rust in that context.

the_only_law | 4 years ago

I just wish there were better tools for navigating C codebases.

More than once I’ve been in some large autotools-based project, trying to figure something out, and hit a call out to some dependency I had no idea about.

Also, many of these projects lack any sort of documentation or source code comments. These aren’t someone’s pet projects either: one of them was from a notable name in the open source community, and the other was a de facto driver in a certain hardware space.

tharne | 4 years ago

It does seem sometimes that a lot of folks use C for philosophical rather than practical reasons.

That being said, I love seeing a push for simple stacks like this.

rkagerer | 4 years ago

> You don't even have to worry too much about free()ing your malloc()s

*gasp!* Such lack of symmetry... it disturbs something deep in my soul.

rnkn | 4 years ago

Is there a good string-manipulation C library?

ainar-g | 4 years ago

Considering that it's a stack that uses OpenBSD, my first thought would be Perl, although it's not a language one could call “modern”, heh. It's included in the base system and has rich libraries for text processing, (Fast)?CGI, HTML, and all that.

pjmlp | 4 years ago

If using C is a must, having static analysis as part of CI/CD pipeline and using libraries like SDS should be considered a requirement.

Otherwise, yes using anything safer, where lack of bounds checking isn't considered a feature is a much better option.

kloch | 4 years ago

I wrote my first web app in 2000 using C/MySQL. It was insanely fast but very awkward to implement. I used C because it was (and still is) the only language I knew well.

At least if you are going to use C, you (should) know to be extremely paranoid about how you process anything received from the user. That doesn't remove the risk but at least you are focused on it.

lazyfuture | 4 years ago

Same guy wrote a rad tool that will generate server code, schema, frontend using a markup language called ORT.

Will generate Rust and TypeScript if ya want.

synergy20 | 4 years ago

Well, there is still plenty of large software written in C, e.g. nginx, lighttpd, even the Linux kernel.

I checked out BCHS a few years back. The key piece is that it's OpenBSD; if it were Linux it might have caught on, due to Linux's popularity, good or bad. This could be useful for embedded devices, for example, but not many embedded devices run OpenBSD, if any at all.

pcranaway | 4 years ago

Never heard of Carp, looks cool!

jimbob45 | 4 years ago

Why would you ever choose C anymore? The killer feature of C++ is “you don’t pay for what you don’t use”. There’s virtually no reason ever not to use C++.

teleforce | 4 years ago

The SQLite author is an avid Tcl user, and he recently introduced Wapp, a small, secure and modern CGI-based web application framework [1],[2].

[1] Wapp - A Web-Application Framework for TCL:

https://wapp.tcl.tk/home

[2] EuroTcl2019: Wapp - A framework for web applications in Tcl (Richard Hipp):

https://www.youtube.com/watch?v=nmgOlizq-Ms

adamrezich | 4 years ago

this is very cool. I only have a passing familiarity with Tcl, but I've been building my own toy web framework and this is a fantastic reference! they made a lot of the same choices I made API-wise but the way they went about it is worth studying.

dmux | 4 years ago

I'd like to point out that Wapp doesn't necessarily need to run as a plain-old CGI application; I've had success running it with its own built-in web server behind NGINX, for example.

theamk | 4 years ago

It seems pretty crazy to write web-facing apps in C, with no memory safety at all.

(They do have "pledge", but even in the most restricted case this still leaves full access to the database.)

rossy | 4 years ago

It seems like the database libraries they recommend for security, ksql and sqlbox, mitigate the risk with process separation and RBAC, so the CGI process doesn't have full access to the database.

It's definitely contrary to modern assumptions about web app security, but it's interesting to see web apps that are secure because they use OS security features as they were designed to be used, rather than web apps that do things that are insecure from an OS-perspective, like handling requests from multiple users in the same process, but are secure because they do it with safe programming languages.

tyingq | 4 years ago

Though the majority of running web servers, load balancers, protocol proxies like php-fpm, etc, are probably written in C :)

galdosdi | 4 years ago

Funny to reflect that there was a time not so long ago when writing web apps (CGI usually) in C wasn't at all unusual (shortly before Perl became much more popular for this). And today, it is indeed kind of crazy.

rkeene2 | 4 years ago

How about a web-facing shell that allows arbitrary code execution? [0]

There's nothing fundamentally insecure about allowing C or any arbitrary code to execute on behalf of a user -- this is basically what cloud computing (especially "serverless") is.

As you point out, though, you need a Controlled Interface (CI) that accounts for this model across all resources and all kinds of resources, and many tools do not (yet) allow for it.

[0] https://rkeene.dev/js-repl/?arg=bash

km | 4 years ago

Writing C might be challenging for some, but as others have mentioned, one can use some other language which gives a statically linked binary to place in the httpd chroot. It won’t be BCHS then.

For uptime.is I’ve used a stack which I’ve started calling BLAH because of LISP instead of C.

jamal-kumar | 4 years ago

People love to talk all sorts of trash about this kind of stack, but it's really quite solid for what it does. If anyone was ever curious what a sizeable codebase in this style looks like, check out the source code for undeadly.org [1]. Yeah, these people may be crazy, but they're also OpenBSD developers, and we really love to see what we can get away with using nothing other than what's available in the base distribution. I think a lot of what you see written for production ends up very similar to this approach, maybe just using Rust or Go as the web application backend language if that's more comfortable. Nothing but the base system and a single binary, not relying on an entire interpreter stack, sure can be smooth.

There are other examples of this approach, too: straight C Common Gateway Interface web applications in public-facing production use. What comes to mind is cgit [2], the version control system web frontend used by the people who write wireguard. If it's really so crazy then how come the openbsd and wireguard people - presumably better hackers than you - are just out there doing it?

Other places you see C web application interfaces include in embedded devices (SCADA, etc) and even the web interfaces for routers, which unfortunately ARE crazy because check out all the security problems! Good thing people at our favorite good old research operating system have done the whole pledge(2)[3] syscall to try and mitigate things when those applications go awry - understanding this part of the whole stack is probably key to seeing how any of it makes any sense at all in 2022. It sure would be nicer if those programs just crashed instead of opening up wider holes. Maybe we can hope these mitigations and a higher code quality for limited-resource device constraints all become more widespread.

[1] http://undeadly.org/src/ [2] https://git.zx2c4.com/cgit/ [3] https://learnbchs.org/pledge.html

foxfluff | 4 years ago

> If it's really so crazy then how come the openbsd and wireguard people - presumably better hackers than you - are just out there doing it?

Probably precisely because they're better? I can see why people who are struggling with malloc and off-by-ones (https://news.ycombinator.com/item?id=29990985) would think it's crazy.

visireyi | 4 years ago

> we really love to see what we can get away with using nothing other than what's available in the base distribution

pkg_add sqlite3

Can't get away.

rnkn | 4 years ago

The Dunning-Kruger effect is stronger in people who spend a lot of time alone, e.g. programmers, which we will now see unfold below.

petee | 4 years ago

Another great stack for writing C (or now Python) is https://kore.io, which offers quite a few helper features, and it's easy to get started.

RcouF1uZ4gsC | 4 years ago

> How do I pronounce BCHS?

I think the correct pronunciation is “Breaches”. Using C in this place, as others have mentioned, is very, very likely to lead to security issues. Even C++, with its better string handling, would be a step up.

ThinkBeat | 4 years ago

I remember writing a lot of early web stuff in Perl/CGI. The "servers" I wrote were fast. Perl had most things you could desire built in already.

Database stuff took a good deal of doing, but with little in terms of abstraction, it was also quite fast.

I would like to see a renaissance of protocols other than HTTP and content markup other than HTML.

harryvederci | 4 years ago

Interesting CGI content linked on there.

I've been reading about / hacking on CGI recently, and it's been kinda fun!

Question: One thing I keep reading is how inefficient it is to start a new process for each incoming connection. Could someone explain to me why that's such a bottleneck? I imagine it being an issue back when CGI was used everywhere, people moving away from CGI, and forgetting about it. But hasn't there been improvements in the meantime? Computers from today can run circles around those from a few decades back. Has everything improved except the speed / efficiency of starting a new process?

(I don't have a computer science background, but I guess you could already tell from the above.)

lelanthran | 4 years ago

> Interesting CGI content linked on there.

> I've been reading about / hacking on CGI recently, and it's been kinda fun!

> Question: One thing I keep reading is how inefficient it is to start a new process for each incoming connection. Could someone explain to me why that's such a bottleneck? I imagine it being an issue back when CGI was used everywhere, people moving away from CGI, and forgetting about it. But hasn't there been improvements in the meantime? Computers from today can run circles around those from a few decades back. Has everything improved except the speed / efficiency of starting a new process?

It's not as bad as you think it is; just change the web server to pre-fork. From this link [1], and the nice summary table in this link [2], I note the following:

1. Pre-forked servers perform very consistently (judging by the variation before being overwhelmed) and appear, at a glance, to be less consistent only than epoll.

2. For up to 2000 concurrent requests, the pre-forked server performed either within a negligible margin against the best performer, or was the best performer itself.

3. The threaded solution had the best graceful degradation; if a script were monitoring the ratio of successful responses, it would know well beforehand that an imminent failure was coming.

4. The epoll solution is objectively the best, providing both graceful degradation as well as managing to keep up with 15k concurrent requests without complete failure.

With all of the above said, it seems that using CGI with a pre-forked server is the second best option you can choose.

I suppose that you then only have to factor in the execution of the CGI program (don't use Java, C#, Perl, Python, Ruby, etc - very slow startup times).

[1] https://unixism.net/2019/04/linux-applications-performance-i...

[2] https://unixism.net/2019/04/linux-applications-performance-p...

jim_lawless | 4 years ago

It's not just the start-up and shut-down costs. A CGI process might need to attain connections to databases or other resources that could be pooled and re-used if the process didn't completely terminate.

You might want to look at using FastCGI:

https://en.wikipedia.org/wiki/FastCGI

Basically, the CGI processes stay alive and the servers supporting FastCGI ( like Apache and nginx ) communicate with an existing FastCGI process that's waiting for more work, if available.

tonyarkles | 4 years ago

I’m smiling at your question!

Yes, it’s less efficient than having a persistent server, but as all things are, it exists in a spectrum.

The load time for one of these processes is going to be almost trivial. I’m on mobile right now, but I would guess that it would be in a handful of milliseconds, especially when the binary is already in cache (due to other requests).

But if you compare this against a lot of the prevailing systems, it’ll still probably win on single-request efficiency. Network hops, for example, are frequently quite slow and, if efficiency is your primary metric, should be avoided as much as possible. Things like Serverless go the opposite way and route both your incoming request and your backend database requests through a complex set of hops.

aidenn0 | 4 years ago

Time a python program that imports a few things and then immediately exits. It's significantly more CPU time than you might think. If you use a language with fast startup times, preforking CGI servers can be quite fast.

Zababa | 4 years ago

Lots of opinions but few facts in the comments. I'd love to see an experiment comparing people using this stack with people using their preferred web stack. Is this really slower to develop? By how much? Is it really insecure? Is it really simpler, faster?

exdsq | 4 years ago

I’d wager a good portion of my salary that a skilled BCHS developer is slower than a skilled Django/RoR developer to build a usual web app (with auth, payment gateways, admin panels, etc). Not to say BCHS doesn’t look like a laugh to use.

da39a3ee | 4 years ago

I’d like to love man pages but

- I feel that they are Linux-only. On my macOS system I can’t rely on man x being the man page for the right version of x. I know that in principle there are environment variables to make sure I’m getting the GNU coreutils version or the Homebrew version rather than the system BSD version, but it’s too many moving parts. Furthermore, even if I get it right, I can’t expect people I’m working with or mentoring to get it right, hence I can’t recommend man to them for documentation. God knows about man pages on Windows.

- I feel that a small amount of plain-text documentation should be stored in the executable, not separately. Isn’t keeping man pages separate from the executable a holdover from the vastly more constrained computing environments of the 70s and 80s? It’s just asking to get out of sync or incorrectly paired up.

tiffanyh | 4 years ago

s/C/NIM

Why don’t more folks use Nim for web development? It seems like the perfect blend of performance, ergonomics and productivity.

0xbadcafebee | 4 years ago

I have written web applications in a lot of languages, including C. C was the worst.

dreamsbythelake | 4 years ago

What a coincidence! Lovely topic; I even registered an account for this :-)

I _just_finished_ my own comparative benchmarks to (re)check my projects from ~7 years ago, all on a similar stack.

Back then I wrote the logic as Apache modules in C. They used Cairo to draw charts (surprisingly, my leftover trigonometry knowledge was enough to code that :-), and I had absolutely crazy "hybrids" of bubble charts with bars, alpha-channel overlays, etc. It was extremely useful for my projects back then, and I have never seen any library able to produce what I "tailored".

Seven years ago, end-to-end page generation took ~300 µs (microseconds), including graphics, data store I/O, request processing, preparing the "bucket brigade" and passing it down the Apache chain.

This January I revisited my code and implemented the logic for OpenBSD httpd as:

1) An OpenBSD httpd "patch" that hijacks request processing internally, does the necessary data and graph ops, and pushes the result directly into the bufferevent buffer before httpd serves it to the client.

2) An FCGI responder app talking to httpd over a unix socket. BTW: this is the most secure setup I know of; I could chroot / pledge / unveil it and, IMO, it beats SELinux and anything else.

3) CGI script in ksh<=>slowcgi<=>FCGI=>httpd

4) CGI program (statically linked) in pure C<=>slowcgi<=>FCGI=>httpd

5) PHP :-) page (no frameworks)<=>php-fpm (with OpCache)<=>FCGI=>httpd

To my extreme surprise, the outcome was clear: it did not matter what I wrote my logic in. _Anything today_ (including a CGI shell script) is so fast that 90% of the time was spent on network communication between the web server and the browser. (And TLS roughly doubles that...)

All options above gave me end-to-end page generation times of about 1-1.5 ms.

Guess what? Beyond "Hello World", with page sizes of 500Kb+, PHP was faster than anything else, including the native "httpd patch" in C.

As a side effect, I also confirmed that the libevent-based, absolutely gorgeous OpenBSD httpd is slightly slower than the standard pre-fork Apache httpd from pkg_add (which gave me sub-ms times, just like 7 years ago).

Who would have thought...

What also happened is that any framework (PHP, or even Node.js, which I also tried) or writing the CGI in Python increased my end-to-end page generation time 10x, to double-digit ms.

I remember last week someone here was talking about writing business applications / servers for clients in C++, delivering them as single executable file.

I would be very interested to hear how that person's observations correlate with mine above.

G'day everyone!

exdsq | 4 years ago

Is anyone using this for anything? I'd love to know!

bitfoxtop | 4 years ago

For this old environment, why C and not Perl?

jolux | 4 years ago

Parsing untrusted input in C never hurt anyone, did it?

guggle | 4 years ago

If you're going to promote a stack, at least try to showcase all its components in the first example you give. Where is the SQLite part in your "BSD, C, httpd, SQLite"? https://learnbchs.org/easy.html

Hello world apps don't mean much.

edfletcher_t137 | 4 years ago

This feels like an unreasonable eschewing of all the advancements in programmer ergonomics & tooling that have been made over the course of decades.

"Just because you can, doesn't mean you should."

pull_my_finger | 4 years ago

Imagine if you were a C developer who needed to create some web do-dads, this is probably a fantastic stack. If there was 1 right solution for the perfect stack we'd all be using it.

Ostrogodsky | 4 years ago

> "Just because you can, doesn't mean you should."

Ironically I could say the same about the JS ecosystem.

149 comments