The big issue is that it doesn't really need to be a GPR. You never find yourself using the PC in instructions other than in, say, the occasional add instruction for switch case jumptables, or pushes / pops. So it ends up wasting instruction space, when you could've had an additional register, or encoded a zero register (which is what AARCH64 does nowadays).
PC-relative instructions is how ARM loads 32-bit literals. "ldr,=label" is a psuedoinstruction that becomes relative to the PC, and it picks the nearest literal pool as the acutal address to read from.
Funnily enough, for my own toy CPU project I elected to make the program counter a GPR, both readable and writable - because it allowed me to eliminate jump and branch instructions.
Reads from the PC return the address of the next instruction to be executed, so a simple exchange between two registers performs the branch, and supplies the return address. (I did end up special-casing the add instruction so that when adding to the PC the return address ends up in the source register.)
But it is (or was originally) used in lots of places, not just jump tables, generally to do relative addressing, for example when you want to refer to data nearby, e.g.
The thing is... it isn't just another regular register. Reads from the PC are 8 bytes ahead, because that's where the PC is in the pipeline at read time. Branches are PC+8 relative for the same reason.
It's a design choice that makes sense in the classic RISC world where pipeline details are leaked to make implementation simpler. Delay slots in other RISCs work the same way. But it causes a lot of pain as implementations evolve beyond past the original design, and it's why Aarch64 junked a lot of Aarch32's quirks.
This is one of the features of early ARM cores that show that the thing is not a classical RISC design, but traditional CISC core with RISC-like ISA and optimized internal buses.
Yeah, it was that way for all previous ARM processors too, for exactly that reason. Adding special cases would have increased the transistor count, for no great benefit.
The only downside was that it exposed internal details of the pipelining IIRC. In the ARM2, a read of the PC would give the current instruction's location + 8, rather than its actual location, because by the time the instruction 'took place' the PC had moved on. So if/when you change the pipelining for future processors, you either make older code break, or have to special case the current behaviour of returning +8.
Anyway, I don't like their reaction. What they mean is 'this decision makes writing an emulator more tricky' but the author decides that this makes the chip designers stupid. If the author's reaction to problems is 'the chip designers were stupid and wrong, I'll write a blog post insulting them' then the problem is with the author.
Hey, I'm the author, sorry it came off that way, I was really just poking fun. I should've definitely phrased that better!
But no, I really think that making the program counter a GPR isn't a good design decision - there's pretty good reasons why no modern arches do things that way anymore. I admittedly was originally in the same boat when I first heard of ARMv4T - I thought putting the PC as a GPR was quite clean, but I soon realized it just wastes instruction space, makes branch prediction slightly more complex, decrease the number of available registers (increasing register pressure), all while providing marginal benefit to the programmer
> Adding special cases would have increased the transistor count, for no great benefit.
No, it's exactly backwards: supporting PC as a GPR requires special circuitry, especially in original ARM where PC was not even fully a part of the register file. Stephen Furber in his "VLSI RISC Architecture and Organization" describes in section 4.1 "Instruction Set and Datapath Definition" that quite a lot of additional activity happens when PC is involved (which may affect the instruction timings and require additional memory cycles).
DevilStuff|1 year ago
Dwedit|1 year ago
robinsonb5|1 year ago
Reads from the PC return the address of the next instruction to be executed, so a simple exchange between two registers performs the branch, and supplies the return address. (I did end up special-casing the add instruction so that when adding to the PC the return address ends up in the source register.)
joosters|1 year ago
ADD r0, r15, #200
LDR r1, [r15, #-100]
etc
mmaniac|1 year ago
It's a design choice that makes sense in the classic RISC world where pipeline details are leaked to make implementation simpler. Delay slots in other RISCs work the same way. But it causes a lot of pain as implementations evolve beyond past the original design, and it's why Aarch64 junked a lot of Aarch32's quirks.
dfox|1 year ago
userbinator|1 year ago
joosters|1 year ago
The only downside was that it exposed internal details of the pipelining IIRC. In the ARM2, a read of the PC would give the current instruction's location + 8, rather than its actual location, because by the time the instruction 'took place' the PC had moved on. So if/when you change the pipelining for future processors, you either make older code break, or have to special case the current behaviour of returning +8.
Anyway, I don't like their reaction. What they mean is 'this decision makes writing an emulator more tricky' but the author decides that this makes the chip designers stupid. If the author's reaction to problems is 'the chip designers were stupid and wrong, I'll write a blog post insulting them' then the problem is with the author.
DevilStuff|1 year ago
But no, I really think that making the program counter a GPR isn't a good design decision - there's pretty good reasons why no modern arches do things that way anymore. I admittedly was originally in the same boat when I first heard of ARMv4T - I thought putting the PC as a GPR was quite clean, but I soon realized it just wastes instruction space, makes branch prediction slightly more complex, decrease the number of available registers (increasing register pressure), all while providing marginal benefit to the programmer
Joker_vD|1 year ago
No, it's exactly backwards: supporting PC as a GPR requires special circuitry, especially in original ARM where PC was not even fully a part of the register file. Stephen Furber in his "VLSI RISC Architecture and Organization" describes in section 4.1 "Instruction Set and Datapath Definition" that quite a lot of additional activity happens when PC is involved (which may affect the instruction timings and require additional memory cycles).