> Additionally, in C++, requesting that heap memory also requires a syscall every time the container geometrically changes size
That is not true - no allocator I know of (and certainly not the default glibc allocator) allocates memory in this way. It only does a syscall when it doesn’t have free userspace memory to hand out but it overallocates that memory and also reuses memory you’ve already freed.
There's a time-space tradeoff to basically any means of error checking.
Including checking return codes instead of exceptions. It's even possible for exceptions as implemented by g++ in the Itanium ABI to be cheaper than the code that would be used for consistently checking return codes.
Nice work and good writeup. I think most of that is very sound practice.
The codegen switch with the offsets is in everything; the first time I saw it was in the Rhino JS bytecode compiler, maybe 2006, and I've written it a dozen times since. Still clever that you worked it out from first principles.
There are some modern C++ libraries that do frightening things with SIMD that might give your bytestring stuff a lift on modern stupid-wide, high-mispredict-penalty hardware. Anything by Lemire, stringzilla; and take a look at zpp_bits for inspiration on the theoretical minimum for data structure pack/unpack.
But I think you got damn close to what can be done, niiicccee work.
Overall, this is a fantastic dive into some of RISC-V's architecture and how to use it. But I do have some comments:
> However, in Chata's case, it needs to access a RISC-V assembler from within its C++ code. The alternative is to use some ugly C function like system() to run external software as if it were a human or script running a command in a terminal.
Have you tried LLVM's C++ API [0]?
To be fair, I do think there's merit in writing your own assembler with your own API. But you don't necessarily have to.
I'm not likely to go back to assembly unless my employer needs that extra level of optimization. But if/when I do, and the target platform is RISC-V, then I'll definitely consider Ultrassembler.
> It's not clear when exactly exceptions are slow. I had to do some research here.
There are plenty of CppCon presentations [1] about exceptions, performance, caveats, and so on. Other C++ conferences have similar (sometimes nearly identical) presentations, since presenters make the rounds, but I don't have a link handy because I pretty much only attend CppCon.
You might look into using memory-mapped IO for reading your input and writing your output files. This can save memory allocations as well as file read and write time. I did this on a project where I got more than a 10x speedup. In many cases, file IO is going to be your bottleneck.
Exactly. I don't know too many assembly language programmers who are griping about slow tools, particularly on today's hardware. Yeah, Orca/M on my old Apple II with 64k RAM and floppy drives was pretty slow, but since then, not so much. But sure, as a fun challenge to see how fast you can make it run, go for it.
ASM should compile at hundreds of MB/s. All the ASM you could write in your entire life will compile instantly. No one in decades has thought their assembler was too slow.
I wonder if you thought about perfect hashing instead of that comparison tree. Also, flex (as in flex and bison) can generate what amounts to trees like that, I believe. I haven't benchmarked it compared to a really careful explicit tree though.
I also have to ask where all this assembly code is coming from, that has to be compiled fast. If it's compiler output, maybe you could hack the compiler back end to generate tokenized assembly code, eliminating a lot of scanning and stuff. It would still be human readable through a simple program that converted the tokens back to mnemonics. The tokens could be 4 digit hex numbers or something like that, so it would still be an easily handled text file.
Lots of simple compilers generate object code directly instead of assembly code, so the above is not so bad by comparison.
netr0ute|6 months ago
By definition, that's zero-overhead because Ultrassembler doesn't care about space.

netr0ute|6 months ago
Feel free to ask me any questions to break the radio silence!
[0]: https://stackoverflow.com/questions/10675661/what-exactly-is...
[1]: https://www.youtube.com/results?search_query=cppcon+exceptio...
NooneAtAll3|6 months ago
If I'm honest, I've never looked into pmr, but I always thought that that's where std has arena allocators and stuff
https://en.cppreference.com/w/cpp/header/memory_resource.htm...
IshKebab|6 months ago
I feel like this might mostly be useful as a reference, because currently RISC-V assembly's specification is mostly "what do GCC/Clang do?"