

waddlesplash | 10 months ago

> with new techniques and materials on top of old work done in a way that was usual at the time.

But is there any long-lived project for which this isn't true? Linux and the BSDs surely have many components that fall into this category.

> For example there's BString and BList.

BString is a much nicer string class to work with (IMO) than std::string. It lacks some modern conveniences, and it has some unfortunate footguns where some APIs work in bytes and some in UTF-8 characters (the former should probably all be considered deprecated; indeed, that's a BeOS holdover), but I don't think there's any intent to drop it.

BList could be better as well, but it's still a nicer API in many ways than std::vector. Our other homegrown template classes are also nicer, or have particular semantics we want that the STL classes don't, so I don't think we'd ever drop them.

> Haiku also has seams of BSD code where there'd be a project to do Whatever (WiFi, TLS, drivers, etc.) "properly" in a way unique to Haiku

What would be the point of implementing WiFi drivers from scratch "uniquely" for Haiku? Even FreeBSD has started just copying drivers from Linux, so that may be in our future as well. I don't know that anyone ever really considered writing a whole 802.11 stack for Haiku; there was some work on a "native" driver or two at one point, but it was for hardware that we didn't have support for from the BSDs, and it still used the BSD 802.11 stack. Writing our own drivers there just seems like a waste of time; we might as well contribute to the BSD ones instead.



tialaramex | 10 months ago

> But is there any long-lived project for which this isn't true?

I don't think any other project like this exists. You're coming up on your 25th anniversary without having shipped a final release!

I see that BString itself also uses this weird phrase "UTF-8 character". That's not a thing, and rather than just being technically wrong, it's so weird that I can't tell what the people who made it thought they meant or what the practical consequences might be.

I mean, it can't be worse than std::string in one sense because, hey, at least it picked... something. But if I can't figure out what that is, maybe it's not better.

UTF-8 has code units, but they're one byte, so distinguishing them from bytes means either you're being weird about what a "byte" is or more likely you don't mean code units.

Unicode has characters, but, well, let's quote their glossary: "(1) The smallest component of written language that has semantic value; refers to the abstract meaning and/or shape, rather than a specific shape (see also glyph), though in code tables some form of visual representation is essential for the reader’s understanding. (2) Synonym for abstract character. (3) The basic unit of encoding for the Unicode character encoding. (4) The English name for the ideographic written elements of Chinese origin. [See ideograph (2).]"

So given that BString is software, it's probably working in terms of something concrete. My best guesses (plural because, like I said, I'm not sure, and I'm not even sure the author realised they needed to decide):

1. UTF-16 code units. This is the natural evolution of software intended for UCS-2 in a world where that's not a thing, our world.

2. Unicode code points. If you were stubbornly determined to keep doing the same thing despite the fact that UCS-2 didn't happen, you might get here, which is tragic.

3. Unicode scalar values. Arguably useful, although in an intensely abstract way: the closest thing a bare-metal language might attempt as a "character".

4. Graphemes. Humans think these are a reasonable way to cut up written language, which is a shame because machines can't necessarily figure out what is or is not a grapheme. But maybe the software tries to do this? There have been better and worse attempts.

I don't love std::vector, but I can't see anything to recommend BList at all: it uses type-erased pointers throughout, it doesn't have the correct reservation API, and it provides its own weird sorting, which doesn't even say whether it's a stable sort.

waddlesplash | 10 months ago

It's Unicode code points. I don't know why you say this is "tragic", it's a logical unit to work in here.

GoblinSlayer | 10 months ago

I suppose it means text encoding is known to be UTF-8.