Hey, awesome! And it produces a true goto as well, working on the bytecode level. I wonder if there's a way to remove the extra nops?
This is really, really useful for doing language-to-language translation. Without the ability to arbitrarily branch from basic block to basic block, it's possible for the basic block graph produced by a program in one language to be simply inexpressible in your target language. You end up having to fake it with a while loop and switch statement, which has horrible performance implications.
(Emscripten has some exotic algorithms to try and express the source program's basic block graph using the target language's structured programming constructs. This is vitally important to make goto-less Javascript perform well, but it can't work in all cases, and some C functions will force Emscripten to fall back to loop-and-switch. And, of course, these functions are typically the ones which have been carefully hand-optimised for speed...)
> I wonder if there's a way to remove the extra nops?
Yes, technically it would be possible to remove the code instead injecting NOPs. But then I'd have to adjust the jump targets. However, I don't think these NOPs are too bad. Note that goto jumps directly after the NOP ramp of the label, and the NOPs of the goto itself are never reached. The only scenario where NOPs are actually seen by the interpreter is when the natural code path visits a label.
EDIT: Instead re-assembling the bytecode and adjusting jumps, I went to use JUMP_FORWARD(3) instead 7 NOPs now. Note that there are still 4 NOPs left to fill the gap, these however are never executed as they are skipped by the preceding jump instruction.
https://github.com/snoack/python-goto/commit/2b0f5e5069cbb88...
This is cute but of course horribly unsafe since it doesn't respect control statements.
Some examples you'd expect to work that will crash the interpreter:
@with_goto
def fun1():
while True:
for x in [1]:
goto .next
label .next
@with_goto
def fun2():
goto .next
for x in [1]:
label .next
@with_goto
def fun3():
while True:
try:
goto .next
finally:
print('skipped')
label .next
Note how Lisp's TAGBODY/GO doesn't have this problem.
(loop
(tagbody
(loop for x in '(1 2 3)
do (go next))
next
(print 'out)))
(OUT is printed repeatedly.)
How it works is that tagbody contains a bunch of labels mixed with forms. The tagbody establishes an exit point for GO expressions used in these forms. When a (go label) occurs, it works by abandoning the current sub-form being evaluated, and initiating a control transfer to the exit point, which then transfers control to the appropriate label.
The upshot is that whatever form is interrupted will cleanly unwind, as necessary:
Note that you can only have labels and GO in certain forms like TAGBODY and PROG, not just willy nilly in any construct. The labels of a given TAGBODY are all on the same level of nesting: immediate children of the TAGBODY form. SO this isn't possible:
Here, impossible is not considered a label associated with the tagbody, so the (go impossible) is branching to a nonexistent label. If it were allowed, of course it would create the problem of what becomes of the initialization of x.
Thus, compilers only have to reason about GO within specific constructs, and rely on those GO's to only be performing abandonment followed by a simple lateral move.
Jumps out of loops and other blocks work now. However, there are some limitations:
1. As I only have 4 bytes left to inject POP_BLOCK instructions, you can't exit more than 4 blocks with a goto. But this is handled and results into a SyntaxError rather than crashing Python.
2. Jumps into loops still doesn't work. However, it's handled now, and causes a SyntaxError as well.
3. When jumping out of a try- or with-block the finalizer is skipped.
If you're interested in this sort of thing, you might also check out https://github.com/llllllllll/codetransformer, which is a general-purpose library for doing these sorts of shenanigans in an almost-sane way. (Disclaimer: I'm one of the library authors).
As an example, I've got a PR open right now that lets you do things like:
@mutable_locals
def f():
out = []
x = 1
out.append(x)
locals().update({'x': 2, 'y': 3})
out.append(x)
out.append(y)
return out
assert f() == [1, 2, 3]
which works using a combination of ctypes hackery and replacing all LOAD_FAST instructions with appropriately-resolved LOAD_NAME instructions.
The idea that it needs optimizing, no doubt because someone's using it in a hot code, is hilarious. That's either the best satire or the worst truth I've heard all day.
>I think goto's are fine, and they are often more readable than large amounts of indentation. That's _especially_ true if the code flow isn't actually naturally indented (in this case it is, so I don't think using goto is in any way _clearer_ than not, but in general goto's can be quite good for readability).
>Of course, in stupid languages like Pascal, where labels cannot be descriptive, goto's can be bad. But that's not the fault of the goto, that's the braindamage of the language designer.
[+] [-] david-given|10 years ago|reply
This is really, really useful for doing language-to-language translation. Without the ability to arbitrarily branch from basic block to basic block, it's possible for the basic block graph produced by a program in one language to be simply inexpressible in your target language. You end up having to fake it with a while loop and switch statement, which has horrible performance implications.
(Emscripten has some exotic algorithms to try and express the source program's basic block graph using the target language's structured programming constructs. This is vitally important to make goto-less Javascript perform well, but it can't work in all cases, and some C functions will force Emscripten to fall back to loop-and-switch. And, of course, these functions are typically the ones which have been carefully hand-optimised for speed...)
[+] [-] wallunit|10 years ago|reply
Yes, technically it would be possible to remove the code instead injecting NOPs. But then I'd have to adjust the jump targets. However, I don't think these NOPs are too bad. Note that goto jumps directly after the NOP ramp of the label, and the NOPs of the goto itself are never reached. The only scenario where NOPs are actually seen by the interpreter is when the natural code path visits a label.
EDIT: Instead re-assembling the bytecode and adjusting jumps, I went to use JUMP_FORWARD(3) instead 7 NOPs now. Note that there are still 4 NOPs left to fill the gap, these however are never executed as they are skipped by the preceding jump instruction. https://github.com/snoack/python-goto/commit/2b0f5e5069cbb88...
[+] [-] masklinn|10 years ago|reply
Sure you just have to recompute all the jump destinations and patch the jumps themselves.
[+] [-] mrsirduke|10 years ago|reply
I'm sure neither should be used in a production project.
[1]: https://github.com/snoack/python-goto/blob/master/goto.py
[+] [-] simgidacav|10 years ago|reply
Sorry, I had to.
[+] [-] wallunit|10 years ago|reply
[+] [-] zarex|10 years ago|reply
[+] [-] tveita|10 years ago|reply
Some examples you'd expect to work that will crash the interpreter:
[+] [-] kazinator|10 years ago|reply
How it works is that tagbody contains a bunch of labels mixed with forms. The tagbody establishes an exit point for GO expressions used in these forms. When a (go label) occurs, it works by abandoning the current sub-form being evaluated, and initiating a control transfer to the exit point, which then transfers control to the appropriate label.
The upshot is that whatever form is interrupted will cleanly unwind, as necessary:
Note that you can only have labels and GO in certain forms like TAGBODY and PROG, not just willy nilly in any construct. The labels of a given TAGBODY are all on the same level of nesting: immediate children of the TAGBODY form. SO this isn't possible: Here, impossible is not considered a label associated with the tagbody, so the (go impossible) is branching to a nonexistent label. If it were allowed, of course it would create the problem of what becomes of the initialization of x.Thus, compilers only have to reason about GO within specific constructs, and rely on those GO's to only be performing abandonment followed by a simple lateral move.
[+] [-] wallunit|10 years ago|reply
Jumps out of loops and other blocks work now. However, there are some limitations:
1. As I only have 4 bytes left to inject POP_BLOCK instructions, you can't exit more than 4 blocks with a goto. But this is handled and results into a SyntaxError rather than crashing Python.
2. Jumps into loops still doesn't work. However, it's handled now, and causes a SyntaxError as well.
3. When jumping out of a try- or with-block the finalizer is skipped.
[+] [-] RussianCow|10 years ago|reply
[+] [-] polemic|10 years ago|reply
And presented at KiwiPycon 2014: https://www.youtube.com/watch?v=DdU8I09BGsU
[+] [-] moretti|10 years ago|reply
[+] [-] ssanderson11235|10 years ago|reply
As an example, I've got a PR open right now that lets you do things like:
which works using a combination of ctypes hackery and replacing all LOAD_FAST instructions with appropriately-resolved LOAD_NAME instructions.[+] [-] veddox|10 years ago|reply
Neat hack, though.
[1] http://www.u.arizona.edu/~rubinson/copyright_violations/Go_T...
[+] [-] jsbg|10 years ago|reply
https://en.wikipedia.org/wiki/COMEFROM
[+] [-] Veedrac|10 years ago|reply
The idea that it needs optimizing, no doubt because someone's using it in a hot code, is hilarious. That's either the best satire or the worst truth I've heard all day.
[+] [-] sklogic|10 years ago|reply
[+] [-] ainiriand|10 years ago|reply
[+] [-] ceronman|10 years ago|reply
[+] [-] aap_|10 years ago|reply
[+] [-] WalterBright|10 years ago|reply
[+] [-] Intermernet|10 years ago|reply
See http://blog.regehr.org/archives/894 , http://thread.gmane.org/gmane.linux.kernel/84823/focus=85436 and https://github.com/torvalds/linux/search?q=goto for some discussion and examples.
From Mr Torvalds (as diplomatic as ever...):
>I think goto's are fine, and they are often more readable than large amounts of indentation. That's _especially_ true if the code flow isn't actually naturally indented (in this case it is, so I don't think using goto is in any way _clearer_ than not, but in general goto's can be quite good for readability).
>Of course, in stupid languages like Pascal, where labels cannot be descriptive, goto's can be bad. But that's not the fault of the goto, that's the braindamage of the language designer.
[+] [-] Retra|10 years ago|reply
Now could we maybe explain what jobs we're talking about and why it's the right tool for them?
[+] [-] parados|10 years ago|reply
[+] [-] csl|10 years ago|reply
I know this is a problem for at least the Byteplay module.