dumael's comments

dumael | 2 years ago | on: USPS facility in Utah does nothing but decipher handwriting

An aside, but "Making Money" by Terry Pratchett references a similar sub-office of the Post Office in Ankh-Morpork--the Blind Letter Office. Amusingly, Lord Vetinari is able to assist with some, as their addresses are mis-spelled, vague directions rather than strict addresses.

dumael | 3 years ago | on: UK's controversial plan to deport asylum seekers to Rwanda ruled lawful by court

The Evening Standard (https://www.standard.co.uk/news/uk/rwanda-asylum-deal-uk-asy...) quotes a figure of 200 people from the Rwandan Government, noting "but that it could be scaled up".

A research briefing from the House of Commons Library (https://researchbriefings.files.parliament.uk/documents/CBP-...) states that the Memorandum of Understanding on which the scheme relies sets no upper limit on the number of people that can be sent to Rwanda.

dumael | 3 years ago | on: UK's controversial plan to deport asylum seekers to Rwanda ruled lawful by court

rescue.org (https://www.rescue.org/uk/article/why-uk-government-should-r... section 5) reports GBP 140m so far: GBP 120m in up-front costs, with The Times reporting a further payment of GBP 20m.

The Migration Observatory (https://migrationobservatory.ox.ac.uk/resources/commentaries... costs section) also cites the GBP 120m figure, along with Select Committee evidence that there will be further per-person costs to HMG.

dumael | 4 years ago | on: What Every C Programmer Should Know About Undefined Behavior

'signed' and 'unsigned' on their own act as shorthand for 'signed int' and 'unsigned int' in C and C++.

Note that the size of an 'int' is dependent on the "data model"[1]. Whether it's the most optimal choice is far too context-dependent to answer in general.

A data model that makes 'int' 32 bits is fine in many cases on today's (and yesterday's) architectures, as the range of that type is acceptable for most usages without excessive wastage.

Certain data models do specify that 'int' is 64 bits, which can break some programmers' assumptions and also lead to space wastage, as a struct member or a stack slot has to have 64 bits allocated for it on paper.

Data models are part of the ABI your program uses, so it's not necessarily optimal for any given system.

[1] Wikipedia has a table summarizing some of the differences: https://en.wikipedia.org/wiki/64-bit_computing#64-bit_data_m...

dumael | 4 years ago | on: Architecture of the Playstation

> This is a great overview. I don't remember having to put in padding instructions to prevent the pipeline issues mentioned here; maybe we just never ran into that. (I wrote pretty much all the R3000 code for Crash 1 and just do not recall problems like that coming up.)

If you were using the GNU assembler, it automatically fills branch delay slots with nop instructions unless you prefix the assembly code with `.set noreorder`. GAS handles load delay slots as well.

dumael | 7 years ago | on: MIPS Goes Open Source

MIPS has a mechanism called an Application Specific Extension (ASE) for extending a given MIPS core for particular areas.

The MIPS DSP ASE extends the base instruction set with instructions applicable to various codecs of the day when the ASE was defined. It's essentially extending a general-purpose CPU to efficiently perform DSP-like tasks.

dumael | 7 years ago | on: MIPS Goes Open Source

It would be more likely that nanoMIPS would be considered for open sourcing, if an implementation were to be made open source at all. Otherwise it would be the fobbing off of releasing the code to the InterAptiv, which has MIPS16e(2) support.

dumael | 7 years ago | on: MIPS Goes Open Source

MIPSR6 does away with the HI/LO registers and has multiplication instructions which return the result to GPR registers.

Pre-R6 MIPS cores have the MUL instruction, which hides the usage of the HI/LO registers but does clobber them.

dumael | 7 years ago | on: MIPS Goes Open Source

nanoMIPS doesn't have delay slots IIRC. microMIPSR6 also deprecated delay slots for branches. MIPSR6 got rid of delay slots for a family of branches called 'compact branches' which have 'forbidden slots' which require that back-to-back branches be separated by a nop or other instruction.

dumael | 9 years ago

GC write barriers are implemented by expanding sequences that update pointers to perform some sort of additional action.

Their purpose is to capture some information about the updated pointer that the GC can then use to avoid a full heap scan. Card marking marks a 'region' (such as 128/256/512 bytes of memory) as dirty. The buffer recording the dirty/clean areas is rescanned as part of the evacuation of the generation being collected, and any pointers into the generation being collected are updated.

Sequential store buffers (SSBs) can be used to record the address of the object being updated or a pointer to the pointer itself; again, these are rescanned during collection. SSBs can easily be made thread-local, avoiding the need for thread synchronisation except during a collection cycle, which already requires thread synchronisation.

Write barriers tend to be optimised heavily as they're executed quite frequently. The use of atomic operations or branch instructions (barring fast-path exits) would inhibit performance.

dumael | 9 years ago | on: IR is better than assembly (2013)

Almost every modern compiler uses some form of intermediate representation. The choice of IR is shaped by history and design. As the posted article shows, LLVM uses an SSA-based IR to describe programs. GCC in contrast uses two IRs, GIMPLE and a LISP-like IR called RTL. GHC uses Core (Haskell without the syntactic sugar).

The purpose of every IR is to remove the ambiguities and language complexities of programs. By simplifying programs into series of statements such as "%3 = op $type %1, %2", generic optimisers can be built easily. Certain language specific optimizations can be written for the frontend of the compiler as they have knowledge of the language being compiled. Generic LLVM-IR may not be optimised to deal with issues such as devirtualization in C++ (though there is work being done in that area).

LLVM's IR undergoes fairly frequent changes to better handle "new" problems.

dumael | 9 years ago | on: Questions about Superoptimization

> It's pretty cool that clang uses this when it knows the value in the first argument is byte-sized.

Clang is using the 8 bit subregister due to how it legalizes types.

When LLVM-IR is compiled for a target, it undergoes a process called "legalization", where each operation invalid for the target is Expanded (replaced with a semantically equivalent but legal series of operations), Promoted (e.g. operations on boolean types promoted to character types), or turned into a Libcall (a call out to the likes of libgcc.a); operations the target directly supports are Legal and left alone.

Since X86(_64) supports 8, 16, 32 (and 64) bit register accesses and operations, operations on variables of those sizes will be matched to the corresponding operation and register sizes.

If you were to compile that code for the likes of MIPS, ARM or PowerPC you'd see fully 32 bit code.

dumael | 9 years ago | on: Bone Lisp – Lisp Without Garbage Collection

An aside:

Tofte and Talpin designed an extension to the functional language ML which used region-based memory management instead of traditional garbage collection.

This lifted the lifetimes of variables into ML's type system (!!!) while the underlying implementation IIRC could achieve O(1) memory behaviour except when an exception occurred.

While this sounds amazing, there were drawbacks in the implementation/theory, as certain optimisations were near-necessary to get good performance; e.g. without them, word-sized integers had to live in the heap as opposed to registers. Another issue was that loops had to be restructured from idiomatic ML style to a slightly different one, otherwise the region inference logic would cause O(N) allocations in a loop which would otherwise use O(1) allocations.

http://www.elsman.com/pdf/retro.pdf

or "Tofte and Talpin region-based memory management retrospective" should lead you to the paper.

dumael | 10 years ago | on: Intel's Changing Future: Smartphone SoCs Broxton and SoFIA Officially Cancelled

> At the time Intel had already introduced some mobile silicon, but there was little uptake. So they were iterating; they wanted to improve for each succeeding generation. But they had a kind of design-by-committee process. One person or group wanted a certain feature, another group wanted something else, a third group though that yet another thing was important. And so on. Sorry if that sounds vague, I won't write anything more specific.

> The end result was a chipset that had a lot of features. A LOT OF FEATURES. Gold plated features. But that meant higher power consumption than the competition, higher cost, larger form factor, longer time to market.

I have some experience in this field, and this sounds utterly bizarre. Most customers are fairly selective in what bits they want, so providing everything (including the kitchen sink) in a product is useless.

Being able to comfortably ship any variant of your SoC without certain parts is important.
