item 20335046

alphaBetaGamma | 6 years ago

In some (perhaps many) cases there is code before the branch that does not influence the branch condition. In that case, would it be possible to architect the CPU so that it could execute the branch 'early'? I'm thinking of a special asm instruction like 'execute the next 4 instructions, and only then do the conditional jump'.

E.g. instead of:

  i=0
  StartLoop:
  i+=1
  do_stuff_0
  do_stuff_1
  do_stuff_2
  do_stuff_3
  if i<N goto StartLoop
we would have:

  i=0
  StartLoop:
  i+=1
  do_stuff_0
  do_stuff_1
  if i<N goto StartLoop in 2 instructions
  do_stuff_2
  do_stuff_3
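
A toy interpreter can make the proposed semantics concrete. This is a sketch under stated assumptions: the op names (`inc`, `work`, `blt`, `blt_in`) and the tuple encoding are made up, `blt_in` evaluates `i < N` when it is reached but only jumps after `delay` further instructions, and branches inside the delay window are not modeled. Running both versions of the loop shows they call do_stuff_0 through do_stuff_3 in the same order.

```python
# Toy interpreter for a hypothetical ISA (illustration only).
# Each instruction is a tuple: (opname, *args).

def run(program, n):
    """Run `program` with registers i=0, N=n and return the order of do_stuff calls."""
    regs = {"i": 0, "N": n}
    trace = []
    pc = 0
    pending = None  # [instructions_left, target] once a delayed branch is taken
    while pc < len(program):
        op, *args = program[pc]
        next_pc = pc + 1
        if op == "inc":            # i += 1
            regs[args[0]] += 1
        elif op == "work":         # do_stuff_k: just record k
            trace.append(args[0])
        elif op == "blt":          # if i<N goto target (ordinary branch)
            if regs["i"] < regs["N"]:
                next_pc = args[0]
        elif op == "blt_in":       # if i<N goto target in `delay` instructions
            target, delay = args
            # The condition is evaluated now; only the jump is deferred.
            if regs["i"] < regs["N"]:
                if delay == 0:
                    next_pc = target
                else:
                    pending = [delay, target]
            pc = next_pc
            continue               # the branch itself does not count toward the delay
        if pending is not None:    # count down the instructions in the delay window
            pending[0] -= 1
            if pending[0] == 0:
                next_pc, pending = pending[1], None
        pc = next_pc
    return trace


# i=0 is the initial register state; index 0 is StartLoop.
ORIGINAL = [
    ("inc", "i"),
    ("work", 0), ("work", 1), ("work", 2), ("work", 3),
    ("blt", 0),                    # if i<N goto StartLoop
]
DELAYED = [
    ("inc", "i"),
    ("work", 0), ("work", 1),
    ("blt_in", 0, 2),              # if i<N goto StartLoop in 2 instructions
    ("work", 2), ("work", 3),
]

print(run(ORIGINAL, 3) == run(DELAYED, 3))  # True
```

The reordering is only safe because do_stuff_2 and do_stuff_3 don't write i or N; if they did, evaluating the condition early would miss those updates.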

projektfu | 6 years ago

Yes, that was made explicit in the MIPS instruction set with branch (and load) delay slots, and it's implicit in out-of-order processors. As I understand it, the branch delay slot paradigm did not pan out as well as was hoped, and it has not found its way into many other architectures.

gpderetta | 6 years ago

The issue with the branch delay slot is that it assumes there is exactly one clock cycle (i.e., one pipeline stage) between fetch and execute. This was true on some of the early 5-stage in-order RISCs, but hasn't been true for a while. In a high-frequency in-order design there are maybe a dozen stages (clock cycles) between fetch and execute, which would be hard to fill with delay slots. Out-of-order (OoO) designs are even more complicated.
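
As a rough worked example of this point: assume fetch is stage 1, the branch outcome is known in stage `resolve_stage`, and the front end fetches `width` instructions per cycle. Then about `(resolve_stage - 1) * width` younger instructions are already in flight when the branch resolves, and an architectural delay slot only covers a fixed number of them. The stage numbers below are illustrative assumptions, not figures for any particular CPU.

```python
def uncovered_slots(resolve_stage, delay_slots=1, width=1):
    """Instructions fetched behind a branch that fixed delay slots cannot cover.

    Simplified model (assumption): fetch is stage 1, the branch resolves in
    `resolve_stage`, and `width` instructions are fetched per cycle. The
    uncovered instructions have to be stalled on or predicted instead.
    """
    in_flight = (resolve_stage - 1) * width
    return max(0, in_flight - delay_slots)


# Early 5-stage design resolving branches in decode (stage 2): one slot suffices.
print(uncovered_slots(resolve_stage=2))            # 0
# Hypothetical deeper in-order pipeline resolving branches in stage 12.
print(uncovered_slots(resolve_stage=12))           # 10
# The same pipeline fetching 4 instructions per cycle.
print(uncovered_slots(resolve_stage=12, width=4))  # 43
```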

chrisseaton | 6 years ago

That’s called a delay slot. It turns out to be so hard to work out what to put in them that they’re basically useless. Look at the massive flop that was the Itanium before you try it again!

brandmeyer | 6 years ago

ARMv8.1-M has a set of extensions for this kind of branch: prepare-to-branch instructions as well as counted-loop instructions. In order to support a variable-length delay slot, they don't offer very many options. To handle interrupts, you still need an ordinary branch instruction at the launch point as well, so if you take an interrupt in the middle of the sequence, the ordinary branch instruction still gets executed.

  prepare-to-branch <jump-target> <launch-label>
  ...
  launch-label:
  branch <jump-target>

The utility is pretty limited, but it can help on strictly in-order machines.
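
A toy model of the launch-point scheme may help. In this sketch a `prepare` op records (launch point, target); when execution reaches the launch point with that record intact, control jumps straight to the target, and if a simulated interrupt has discarded the record, the ordinary `branch` at the launch point runs instead. The op names and the rule that an interrupt simply discards the prepared branch are assumptions for illustration; this is not the actual Arm encoding or exception behavior.

```python
# Toy model of prepare-to-branch with an ordinary branch at the launch point
# as the interrupt-safe fallback. Hypothetical ops, illustration only.

def run(program, interrupt_at=None):
    """Return (trace, executed): work items recorded and instructions executed.

    interrupt_at: number of steps completed before a simulated interrupt arrives;
    in this model an interrupt just discards any prepared branch.
    """
    pc = steps = executed = 0
    prepared = None                # (launch_pc, target) recorded by "prepare"
    trace = []
    while pc < len(program):
        if steps == interrupt_at:
            prepared = None        # interrupt: prepared-branch state is lost
        steps += 1
        if prepared is not None and pc == prepared[0]:
            # Launch point reached with the prepared branch intact: redirect
            # without executing the ordinary branch stored there.
            pc, prepared = prepared[1], None
            continue
        op, *args = program[pc]
        executed += 1
        pc += 1
        if op == "prepare":        # prepare-to-branch <jump-target> <launch-label>
            prepared = (args[1], args[0])
        elif op == "work":
            trace.append(args[0])
        elif op == "branch":       # ordinary branch <jump-target> (the fallback)
            pc = args[0]
    return trace, executed


PROGRAM = [
    ("prepare", 5, 3),    # 0: prepare a branch to index 5, launching at index 3
    ("work", "a"),        # 1
    ("work", "b"),        # 2
    ("branch", 5),        # 3: launch-label: branch <jump-target>
    ("work", "skipped"),  # 4: never reached
    ("work", "c"),        # 5: jump target
]

print(run(PROGRAM))                  # (['a', 'b', 'c'], 4)
print(run(PROGRAM, interrupt_at=2))  # (['a', 'b', 'c'], 5)
```

Both runs do the same work; the interrupted run just executes one extra instruction, the ordinary branch kept at the launch point.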

amelius | 6 years ago

Wouldn't that be equivalent to loading i and N into registers, so that a conditional jump would be fast?

  i=0
  StartLoop:
  i+=1
  do_stuff_0
  do_stuff_1
  REG1 = i, REG2 = N
  do_stuff_2
  do_stuff_3
  if REG1<REG2 goto StartLoop
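
One way to check the equivalence in this suggestion is to run both forms side by side. The sketch assumes, as in the original question, that do_stuff_2 and do_stuff_3 do not modify i or N; `stuff` is a hypothetical stand-in for the do_stuff_k bodies, and the Python locals `reg1`/`reg2` stand in for REG1 and REG2.

```python
def loop_original(n, stuff):
    """i=0; StartLoop: i+=1; do_stuff_0..3; if i<N goto StartLoop. Returns final i."""
    i = 0
    while True:
        i += 1
        for k in range(4):
            stuff(k)
        if not i < n:
            return i


def loop_snapshot(n, stuff):
    """Same loop, but i and N are copied into 'registers' before do_stuff_2."""
    i = 0
    while True:
        i += 1
        stuff(0)
        stuff(1)
        reg1, reg2 = i, n    # REG1 = i, REG2 = N
        stuff(2)
        stuff(3)
        if not reg1 < reg2:  # if REG1<REG2 goto StartLoop
            return i


calls_a, calls_b = [], []
print(loop_original(3, calls_a.append), loop_snapshot(3, calls_b.append))  # 3 3
print(calls_a == calls_b)  # True
```

In the second form the branch decision depends only on values captured before do_stuff_2 runs, so it is no longer tied to the last two calls.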