top | item 32174986

ttgurney | 3 years ago

Funny seeing this here as I've been thinking a lot about text-based browsers lately. Just a couple days ago I tried to build this one from source, but I put it aside due to the dependencies on PCRE and a JavaScript engine. (I am running a hand-rolled Linux "distro" so I can't just install ready-made binary packages.)

I do really appreciate that this one uses libcurl on the backend. Surprisingly few browsers do this--Lynx, Links, and w3m all have their own networking code. They have bespoke HTML parsing and rendering as well. I'm lately thinking I want to see a text-mode browser that just glues together libcurl, curses, simple HTML rendering, and maybe an existing HTML parsing library. As far as I'm aware, no standalone text-mode HTML rendering library exists.

Also these classic text browsers have their own implementations of FTP, NNTP, and some other legacy cruft. I'm thinking most of this could easily be provided by libcurl (if at all).

shiomiru | 3 years ago

> I'm lately thinking I want to see a text-mode browser that just glues together libcurl, curses, simple HTML rendering, and maybe an existing HTML parsing library.

I had a similar idea a while ago, except mine was to glue together components from the nim stdlib.

So I wrote something like that, then I thought "hey, why not implement some CSS too?" and that sent me down the rabbit hole of writing an actual CSS-based layout engine... I eventually also realized that the stdlib html parser is woefully inadequate for my purposes.

In the end, I wrote my own mini browser engine with an HTML5 parser and whatnot. Right now I'm trying to bring it to a presentable state (i.e. integrate libcurl instead of using the curl binary, etc.) so I can publish it.

Anyways, if there's a moral to this story it's that writing a browser engine is surprisingly fun, so go for it :)

ttgurney | 3 years ago

I look forward to seeing the first release of this program!

> Anyways, if there's a moral to this story it's that writing a browser engine is surprisingly fun, so go for it :)

Good to know. I'd been fairly intimidated by the idea.

augusto-moura | 3 years ago

It depends on quickjs for its JavaScript implementation, which should be fairly simple to compile on a hand-rolled Linux. I'm not so sure about PCRE, though.

ttgurney | 3 years ago

Oh I'm sure the actual work to compile those packages is not much. It's more to do with keeping the number of packages on my system to a minimum.

Actually I would not be surprised if the JavaScript engine can be omitted with just a little bit of patching work... assuming there's not actually a build configuration that leaves it out. I've found that with some software projects and their dependencies, "required" does not always mean required.

smaudet | 3 years ago

Call it Unixy or something - the Unix philosophy of having each program do something separate.

Makes more sense; that's what this project already does with the JS engine, anyway.

> Surprisingly few browsers do this--Lynx, Links, and w3m all have their own networking code

I think people are suspicious of curl because it is a common utility, and they think it can't possibly have got it right - plus there's something mildly fun about figuring out how to monitor a socket and send/receive IP packets for the first time.

I have played around a bit with the curl code. In part, I suspect other programs roll their own networking to get "closer" to the wire - i.e., to manage and dispatch events from a thread directly instead of waiting on a signal from a curl thread - and probably something about security and thread safety too...

shiomiru | 3 years ago

The reason the aforementioned browsers don't use libcurl is mostly historical: it simply didn't exist back when they were created. (The newest of them is Links, first released in 1999 - and according to the curl website, the first libcurl release with a proper interface came in 2000.)

w3m even uses its own regex engine for search, because there was no free regex engine with Japanese support the author could've used back then.

1vuio0pswjnm7 | 3 years ago

https://github.com/google/oss-fuzz-vulns/tree/main/vulns/cur...

https://github.com/curl/curl/commit/68ffe6c17d6e44b459d60805...

https://www.cvedetails.com/product/25084/Haxx-Curl.html?vend...

Instead of only "thinking a lot about text-based browsers", I have been actively using them on a daily basis for the past 26 years.

Links already uses ncurses. I am glad that it does not use libcurl and that it has its own "bespoke" HTML rendering. In over 25 years, I have yet to see any other program produce better rendering of HTML tables as text. I have had few if any problems with Links versions over the years. I am quite good at "breaking" software, and for me Links has been quite robust. The source code is readable for me, and I have been able to change or "fix" things I do not like, then quickly recompile. I can remove features. Recently I fixed a version of the program so that a certain semantic link would not be shown in Wikipedia pages. No "browser extension" required.

Links' rendering has managed to keep up with the evolution of HTML and web design sufficiently for me. Despite the enormous variation in HTML across the www, there are very few cases where the rendering is unsatisfactory.^1 I cannot say the same for other attempts at text-only clients. W3C's libwww-based line-mode browser still compiles and works,^2 although I would not be satisfied with its rendering. Nor would I be satisfied with edbrowse, or something simpler such as mynx.^3

I use Links primarily for reading and printing HTML. I use a variety of TCP clients for making HTTP requests, including djb's tcpclient, which I am quite sure beats libcurl any day of the week in terms of quality, e.g., the programming skill level of the author and the care with which it was written. This non-libcurl networking code is relatively small and does not need oss-fuzz. I do not intentionally use libcurl. It is too large and complex for my tastes. For TLS, I mainly use stunnel and haproxy.
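
For readers unfamiliar with tcpclient: it is part of djb's ucspi-tcp, and it hands the child process the connected socket as fd 6 (read) and fd 7 (write). A minimal plain-HTTP fetch in that style might look like the following (a hypothetical invocation - it requires ucspi-tcp and network access, and TLS would need something like stunnel in front, as mentioned above):

```sh
# tcpclient host port prog: connects, then runs prog with the
# socket readable on fd 6 and writable on fd 7 (ucspi-tcp convention).
tcpclient example.com 80 sh -c '
  printf "GET / HTTP/1.0\r\nHost: example.com\r\nConnection: close\r\n\r\n" >&7
  cat <&6
'
```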

1. One rare example I can recall is https://archive.is

2. https://github.com/w3c/libwww

3. https://github.com/SirWumpus/ioccc-mynx

ttgurney | 3 years ago

Hey thanks for your perspective and a couple of mentions of software I'd not heard of (like tcpclient).

I agree that curl is pretty big and bloated. I would not call it a deficiency that Links et al. don't depend on it.

I was mostly just thinking that since I already have curl on my system, it'd be nice to have a browser that reuses that code - especially since curl has upstream support for the much smaller BearSSL rather than depending on OpenSSL/LibreSSL.
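
For what it's worth, curl's autoconf build can select BearSSL as the TLS backend. A sketch of the build configuration (treat the exact invocation as illustrative, not a recipe - check curl's install docs for your version):

```sh
# Configure curl to use BearSSL instead of OpenSSL/LibreSSL, then build.
./configure --with-bearssl
make
```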

marttt | 3 years ago

Interesting post, many thanks. What's your view on w3m, as compared to the others you mention? (Side note: I'm a daily w3m user.)