dumael's comments

dumael | 2 years ago | on: USPS facility in Utah does nothing but decipher handwriting

An aside, but "Making Money" by Terry Pratchett references a similar sub-office of the Post Office in Ankh-Morpork--the Blind Letter Office. Amusingly, Lord Vetinari is able to assist with some, as their addresses are mis-spelled, vague directions rather than strict addresses.

dumael | 3 years ago | on: UK's controversial plan to deport asylum seekers to Rwanda ruled lawful by court

The Evening Standard (https://www.standard.co.uk/news/uk/rwanda-asylum-deal-uk-asy...) quotes a figure of 200 people from the Rwandan Government, noting "but that it could be scaled up".

A research briefing from the House of Commons Library (https://researchbriefings.files.parliament.uk/documents/CBP-...) states that the Memorandum of Understanding on which the scheme relies sets no upper limit on the number of people that can be sent to Rwanda.

dumael | 3 years ago | on: UK's controversial plan to deport asylum seekers to Rwanda ruled lawful by court

rescue.org (https://www.rescue.org/uk/article/why-uk-government-should-r... section 5) reports GBP 140m so far: GBP 120m in up-front costs, with The Times reporting a further payment of GBP 20m.

The Migration Observatory (https://migrationobservatory.ox.ac.uk/resources/commentaries... costs section) also cites the GBP 120m figure, along with Select Committee evidence that there will be further per-person costs to HMG.

dumael | 4 years ago | on: What Every C Programmer Should Know About Undefined Behavior

'signed' and 'unsigned' on their own act as shorthand for 'signed int' and 'unsigned int' in C and C++.

Note that the size of an 'int' is dependent on the "data model"[1]. Whether it's the most optimal choice is far too context-dependent to answer in general.

A data model that makes 'int' 32 bits is fine in many cases on today's (and yesterday's) architectures, as the range of that type is acceptable for most usages without excessive wastage.

Certain data models do specify that 'int' is 64 bits, which can break some programmers' assumptions and also lead to space wastage, as a struct member or a stack slot has to have 64 bits allocated for it on paper.

Data models are part of the ABI your program uses, so it's not necessarily optimal for any given system.

[1] Wikipedia has a table summarizing some of the differences: https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...

dumael | 4 years ago | on: Architecture of the Playstation

> This is a great overview. I don't remember having to put in padding instructions to prevent the pipeline issues mentioned here; maybe we just never ran into that. (I wrote pretty much all the R3000 code for Crash 1 and just do not recall problems like that coming up.)

If you were using the GNU assembler, it automatically fills branch delay slots with nop instructions unless you prefix the assembly code with `.set noreorder`. GAS handles load delay slots as well.

dumael | 7 years ago | on: MIPS Goes Open Source

MIPS has a mechanism called an Application Specific Extension (ASE) for extending a given MIPS core for particular areas.

The MIPS DSP ASE extends the base instruction set with instructions applicable to various codecs of the day when the ASE was defined. It's essentially extending a general-purpose CPU to efficiently perform DSP-like tasks.

dumael | 7 years ago | on: MIPS Goes Open Source

It would be more likely that nanoMIPS would be considered for open sourcing, if an implementation were to be made open source at all. Otherwise it would be the fobbing off of releasing the code to the InterAptiv, which has MIPS16e(2) support.

dumael | 7 years ago | on: MIPS Goes Open Source

MIPSR6 does away with the HI/LO registers and has multiplication instructions which return the result to GPR registers.

Pre-R6 MIPS cores have the MUL instruction, which hides the usage of the HI/LO registers but does clobber them.

dumael | 7 years ago | on: MIPS Goes Open Source

nanoMIPS doesn't have delay slots IIRC. microMIPSR6 also deprecated delay slots for branches. MIPSR6 got rid of delay slots for a family of branches called 'compact branches' which have 'forbidden slots' which require that back-to-back branches be separated by a nop or other instruction.

dumael | 9 years ago

GC write barriers are implemented by expanding sequences that update pointers to perform some sort of additional action.

Their purpose is to capture some information about the updated pointer that the GC can then use to avoid a full heap scan. Card marking marks a 'region' (such as 128/256/512 bytes of memory) as dirty. The buffer recording the dirty/clean areas is rescanned as part of the evacuation of the generation being collected, and any pointers into the generation being collected are updated.

Sequential store buffers (SSBs) can be used to record the address of the object being updated or a pointer to the pointer itself; again, these are rescanned during collection. SSBs can easily be made thread-local, avoiding the need for thread synchronisation except during a collection cycle, which already requires thread synchronisation.

Write barriers tend to be optimised heavily as they're executed quite frequently. The use of atomic operations or branch instructions (barring fast-path exits) would inhibit performance.

dumael | 9 years ago | on: IR is better than assembly (2013)

Almost every modern compiler uses some form of intermediate representation. The choice of IR is shaped by history and design. As the posted article shows, LLVM uses an SSA-based IR to describe programs. GCC in contrast uses two IRs, GIMPLE and a LISP-like IR called RTL. GHC uses Core (Haskell without the syntactic sugar).

The purpose of every IR is to remove the ambiguities and language complexities of programs. By simplifying programs into series of statements such as "%3 = op $type %1, %2", generic optimisers can be built easily. Certain language specific optimizations can be written for the frontend of the compiler as they have knowledge of the language being compiled. Generic LLVM-IR may not be optimised to deal with issues such as devirtualization in C++ (though there is work being done in that area).

LLVM's IR undergoes fairly frequent changes to better handle "new" problems.

dumael | 9 years ago | on: Questions about Superoptimization

> It's pretty cool that clang uses this when it knows the value in the first argument is byte-sized.

Clang is using the 8 bit subregister due to how it legalizes types.

When LLVM-IR is compiled for a target, it undergoes a process called "legalization", where each operation invalid for the target is Expanded (replaced with a semantically equivalent but legal series of operations), Promoted (e.g. operations on boolean types promoted to character types), or turned into a Libcall (a call out to the likes of libgcc.a); operations the target directly supports are Legal and left alone.

Since X86(_64) supports 8, 16, 32 (and 64) bit register accesses and operations, operations on variables of those sizes will be matched to the corresponding operation and register sizes.

If you were to compile that code for the likes of MIPS, ARM or PowerPC you'd see fully 32 bit code.

dumael | 9 years ago | on: Bone Lisp – Lisp Without Garbage Collection

An aside:

Tofte and Talpin designed an extension to the functional language ML which used region-based memory management instead of traditional garbage collection.

This lifted the lifetimes of variables into ML's type system (!!!) while the underlying implementation IIRC could achieve O(1) memory behaviour except when an exception occurred.

While this sounds amazing, there were drawbacks in the implementation/theory, as certain optimisations were near-necessary to get good performance; e.g. without them, word-sized integers had to live in the heap as opposed to registers. Another issue was that loops had to be restructured from idiomatic ML style to a slightly different one, otherwise the region inference logic would cause O(N) allocations in a loop which would otherwise use O(1) allocations.

http://www.elsman.com/pdf/retro.pdf

or "Tofte and Talpin region-based memory management retrospective" should lead you to the paper.

dumael | 10 years ago | on: Intel's Changing Future: Smartphone SoCs Broxton and SoFIA Officially Cancelled

> At the time Intel had already introduced some mobile silicon, but there was little uptake. So they were iterating; they wanted to improve for each succeeding generation. But they had a kind of design-by-committee process. One person or group wanted a certain feature, another group wanted something else, a third group though that yet another thing was important. And so on. Sorry if that sounds vague, I won't write anything more specific.

> The end result was a chipset that had a lot of features. A LOT OF FEATURES. Gold plated features. But that meant higher power consumption than the competition, higher cost, larger form factor, longer time to market.

I have some experience in this field, and this sounds utterly bizarre. Most customers are fairly selective in what bits they want, so providing everything (including the kitchen sink) in a product is useless.

Being able to comfortably ship any variant of your SoC without certain parts is important.
