item 19855439

Self Driving Desktop

103 points | verdverm | 6 years ago | github.com

38 comments

mcklaw | 6 years ago
One of the least-known tools, around for more than 10 years, is SikuliX (http://sikulix.com/). It plays back mouse/keyboard event scripts, but it can also find components (their coordinates) via screen OCR, so you can make your scripts independent of resolution and desktop. Also, it's Java-based, so you can run it on multiple OSes.
flarg | 6 years ago
This is an excellent tool! But you forgot to mention that the user codes it in Python, it comes with a purpose-built IDE, and it recognises both text and images, the latter with approximate matching.
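The resolution independence mcklaw describes comes from locating a button by what it looks like rather than by hard-coded coordinates. Below is a toy Python sketch of that idea, assuming the screenshot has already been captured as a 2D grid of grayscale values (screen capture is out of scope). `find_template` and `center_of` are hypothetical helpers, and this brute-force pixel comparison is a stand-in for illustration, not SikuliX's actual matcher; `max_diff` plays the role of the approximate matching flarg mentions.

```python
from typing import List, Optional, Tuple

Grid = List[List[int]]  # a toy "screenshot": rows of grayscale pixel values


def find_template(screen: Grid, template: Grid,
                  max_diff: int = 0) -> Optional[Tuple[int, int]]:
    """Return (x, y) of the top-left corner of the first region of `screen`
    matching `template`, allowing up to `max_diff` differing pixels.
    Returns None when no region matches."""
    th, tw = len(template), len(template[0])
    sh, sw = len(screen), len(screen[0])
    for y in range(sh - th + 1):          # scan rows top to bottom
        for x in range(sw - tw + 1):      # scan columns left to right
            mismatches = sum(
                1
                for dy in range(th)
                for dx in range(tw)
                if screen[y + dy][x + dx] != template[dy][dx]
            )
            if mismatches <= max_diff:
                return (x, y)
    return None


def center_of(match: Tuple[int, int], template: Grid) -> Tuple[int, int]:
    """Turn a match position into a click point at the template's centre."""
    x, y = match
    return (x + len(template[0]) // 2, y + len(template) // 2)


if __name__ == "__main__":
    screen = [
        [0, 0, 0, 0, 0],
        [0, 9, 9, 0, 0],
        [0, 9, 9, 0, 0],
        [0, 0, 0, 0, 0],
    ]
    button = [[9, 9], [9, 9]]
    match = find_template(screen, button)
    print(match)                       # (1, 1)
    print(center_of(match, button))    # (2, 2)
```

Because the click point is derived from the match, the same script keeps working if the button moves; the cost is that a visual change to the button itself (theme, scaling) can break the match, which is why a tolerance like `max_diff` matters.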
shoo | 6 years ago
Yeah! I've used Sikuli to automate a legacy UI-driven application with an embedded scripting engine. I wanted to rig CI to run an automated test suite for scripts that executed inside the application, but it took lots of pointing and clicking to get the app into a state where it was willing to execute scripts. Sikuli was handy! The embedded image recognition is cool and pretty easy to use; detecting the buttons certainly wasn't the most fragile part of that Rube Goldberg test automation setup.
eastendguy | 6 years ago
I don't think Sikuli is unknown at all; I have used it for a long time. But there has not been much progress over the last few years, and the OCR features in particular are lacking. A good alternative to Sikuli is the newer Kantu, which is also much easier to install (just a browser extension plus a small native EXE).

https://a9t9.com/kantu/x/desktop-automation

ladberg | 6 years ago
Sikuli is amazing! I've used it (to great success) for data processing automation and MMO grinding.
majewsky | 6 years ago
In 2008, we were at CeBIT showing off the then brand-new KDE 4 desktop. (The booth was sponsored by a Linux-focused media company.) The biggest attention magnet was a script we had hacked together the evening before that clicked through the application menu and demoed various desktop features in a loop. For a booth, it's absolutely vital to have something that moves, not just static posters and people standing around waiting.
gwbas1c | 6 years ago
What is it? The page just says "Desktop Automation framework" and then lists a bunch of commands and switches.

Perhaps 2-3 paragraphs describing what it does?

zapzupnz | 6 years ago
At a glance, macros. Or maybe the "System Events" portion of AppleScript, for Linux. Something like that. Indeed, the page would benefit from an explanation and maybe a rationale.
michaelmrose | 6 years ago
What's different about this compared to a shell script that invokes xdotool, apart from being much more verbose?
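For comparison, here is roughly what the xdotool-script route looks like: a hedged Python sketch that turns an already-parsed playlist (a list of `[command, arg, ...]` lists) into a shell script of xdotool subcommands (`mousemove`, `click`, `type`, `sleep`). Only `mv x y s` comes from the project's README as quoted further down the thread; `click`, `type`, and `sleep` are hypothetical command names, and `to_xdotool` is a hypothetical helper.

```python
import shlex
from typing import List


def to_xdotool(commands: List[List[str]]) -> str:
    """Translate a parsed playlist into a shell script of xdotool calls.
    Command names other than "mv" are hypothetical examples."""
    lines = ["#!/bin/sh", "set -e"]  # stop at the first failing command
    for name, *args in commands:
        if name == "mv":
            x, y, secs = args
            # xdotool's mousemove jumps instantly, so this approximates a
            # move lasting `secs` seconds by jumping and then waiting.
            lines.append(f"xdotool mousemove {int(x)} {int(y)}")
            lines.append(f"xdotool sleep {float(secs)}")
        elif name == "click":
            lines.append("xdotool click 1")  # button 1 = left mouse button
        elif name == "type":
            # Quote the text so the shell passes it through unchanged.
            lines.append("xdotool type " + shlex.quote(" ".join(args)))
        elif name == "sleep":
            lines.append(f"xdotool sleep {float(args[0])}")
        else:
            raise ValueError(f"unknown command: {name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(to_xdotool([["mv", "100", "200", "1.5"], ["click"],
                      ["type", "hello", "world"]]))
```

The translation is mostly one-to-one, which supports the point above; what a dedicated tool can add is behaviour xdotool doesn't give you directly, such as a genuinely timed, visible mouse movement for screen recordings.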
reilly3000 | 6 years ago
I wish this had a ‘Record’ feature. That kind of logging could be incredibly useful. I use tools like Katalon on the web, and they are great for making a first pass at test development. It doesn’t need to be entirely visual, but if it can capture the flow visually, that flow can be refactored into code and become much more accessible and usable.
verdverm | 6 years ago
I use OBS for recording and Flowblade for editing. I got sick of editing my mistakes out, and so this repo came to be. I'm planning to add some playlists that start OBS, set file names, and begin/end recording.

self-driving-desktop will be part of a demo automation framework that is in progress.

flukus | 6 years ago
> mv x y s;: move the mouse to x,y in s seconds

The problem with tools like this is that they create an API that the developers don't know about and have no intention of supporting. I broke one recently by having the app maximize on startup, but anything from adding or rearranging UI elements to timing differences can introduce breakages.

Considering it's scripting anyway, an actual API would be easier.
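The quoted `mv x y s` command implies that the tool plans a path over `s` seconds rather than jumping straight to the target. A minimal sketch, assuming straight-line motion sampled at a fixed rate (the project may do this differently; `interpolate_move` and its `fps` parameter are hypothetical). It also illustrates the fragility described above: every waypoint is an absolute screen coordinate, so maximizing or rearranging the target window silently invalidates the script.

```python
from typing import List, Tuple


def interpolate_move(start: Tuple[int, int], end: Tuple[int, int],
                     seconds: float, fps: int = 60) -> List[Tuple[float, int, int]]:
    """Plan a mouse move from `start` to `end` lasting `seconds`.

    Returns (time_offset, x, y) waypoints at roughly `fps` steps per second,
    using linear interpolation; the last waypoint is exactly `end`.
    """
    steps = max(1, round(seconds * fps))  # at least one step, even for 0 s
    (x0, y0), (x1, y1) = start, end
    path = []
    for i in range(1, steps + 1):
        t = i / steps  # fraction of the move completed, in (0, 1]
        path.append((t * seconds,
                     round(x0 + (x1 - x0) * t),
                     round(y0 + (y1 - y0) * t)))
    return path


if __name__ == "__main__":
    for waypoint in interpolate_move((0, 0), (100, 50), 1.0, fps=4):
        print(waypoint)
```

A driver would sleep until each `time_offset` and move the pointer to that waypoint; an easing curve could replace the linear `t` if the motion should look more natural on a recording.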

laythea | 6 years ago
It would have been cool to have screenshots on the front page. They give a much better sense of what the project on GitHub actually is; I couldn't understand it from the GitHub page alone without spending more time on it.
keerthiko | 6 years ago
I think I have been looking for a framework this simple and straightforward for about... 12 years now? Ever since I got my own personal computer as a college student, pretty much.

I can't wait to completely go off the wrong quadrant of this chart with it.

https://xkcd.com/1205/
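The chart's arithmetic is simple: the time you can afford to spend automating a task equals the time saved per run, times runs per day, times the days in five years (the chart's horizon). A short sketch, where `break_even_seconds` is a hypothetical name and leap days are ignored:

```python
DAYS_IN_FIVE_YEARS = 5 * 365  # the chart's horizon; leap days ignored


def break_even_seconds(seconds_saved_per_run: float, runs_per_day: float) -> float:
    """Total time saved over five years by shaving `seconds_saved_per_run`
    off a task done `runs_per_day` times a day. Spending longer than this
    automating the task means losing time overall."""
    return seconds_saved_per_run * runs_per_day * DAYS_IN_FIVE_YEARS


if __name__ == "__main__":
    total = break_even_seconds(5, 5)  # 5 s saved, 5 times a day
    print(total, "seconds =", round(total / 3600, 1), "hours")
```

Shaving 5 seconds off something done 5 times a day is worth 45,625 seconds, about 12.7 hours, over five years. The formula deliberately counts only minutes saved, which is exactly what the replies below push back on.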

albertshin | 6 years ago
Re: xkcd, sometimes it's not just about the time in minutes you save in aggregate. I often find routines especially helpful during flow states, maximizing time for more creative work.

There's also just something satisfying about using something like Alfred to launch a complex sequence of things that would have taken many mouse clicks and hand movements. Or using keyboard shortcuts to resize and move multiple windows around monitors. It feels almost... powerful? Not sure why.

verdverm | 6 years ago
I love this xkcd, but the chart makes it hard to see the compounding or exponential savings that arise.
imjustsaying | 6 years ago
Is it normal for devs to be able to read and understand GitHub repos without any explanation, introduction, or context beyond the title? I remember seeing much more of this in GitHub's early days and always wondered whether it fazes the talented devs reading it.
lejar | 6 years ago
I think it would be fair to say that you shouldn't expect anyone to understand a bare repo at a glance, but if you're well versed in the technologies the repo uses and you know of similar products, then I think you can guess what it does.

Here's how my thought process went on this one:

# I open the repo on GitHub and look at the README

1. Okay, it's doing something automatic.

2. It uses Python.

3. Okay, there's this playlist thing with a bunch of commands in it. Looks kind of like an AutoHotkey script.

# I look at the file list

4. Okay, I know Lark. Looks like the author wrote a domain-specific language parser for their input files. They probably get those commands out of the parser as a nested list.

# I look in test.txt

5. Okay, that doesn't tell me much that's new.

# I look in main.py

6. Oh, there aren't any comments in here...

7. Alright, the main function parses the commands from the input file and runs "do" on them.

8. Okay, this is just like AutoHotkey.
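lejar's reading (parse the input file into a nested list of commands, then call `do` on each one) can be sketched in a few lines of standard-library Python. This is a stand-in for the Lark grammar, not the project's code: the `;` terminator is taken from the `mv x y s;` README line quoted above, while the whitespace tokenizing, the handler table, and the `do`/`main` signatures are assumptions.

```python
from typing import Callable, Dict, List


def parse_playlist(text: str) -> List[List[str]]:
    """Split a playlist into [command, arg, ...] lists.
    Assumes statements end with ';' and tokens are whitespace-separated."""
    commands = []
    for statement in text.split(";"):
        tokens = statement.split()
        if tokens:  # skip empty statements, e.g. after the final ';'
            commands.append(tokens)
    return commands


def do(command: List[str], handlers: Dict[str, Callable[..., None]]) -> None:
    """Dispatch one parsed command to the handler registered for its name."""
    name, *args = command
    if name not in handlers:
        raise ValueError(f"unknown command: {name}")
    handlers[name](*args)


def main(text: str, handlers: Dict[str, Callable[..., None]]) -> None:
    """Parse the whole playlist, then run each command in order."""
    for command in parse_playlist(text):
        do(command, handlers)


if __name__ == "__main__":
    # A hypothetical handler that just prints instead of moving the mouse.
    main("mv 10 20 0.5;\nmv 30 40 1;",
         {"mv": lambda x, y, s: print(f"move to ({x}, {y}) over {s}s")})
```

Keeping the handlers in a dictionary separates the language from the side effects, which is the same split AutoHotkey-style tools make between script syntax and the actions it triggers.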

dwiel | 6 years ago
For Mac there is also talonvoice.com, which offers a lot of similar functionality along with ways to hook into keyboard shortcuts, voice/dictation control, and noise control.
satyanash | 6 years ago
Ruby would have been well suited to the DSL this project is trying to implement.
Aeolun | 6 years ago
I really was hoping for a desktop computer on wheels :(
rhizome | 6 years ago
Kind of like KiXtart, IIRC.
softgrow | 6 years ago
The title is a bit misleading, which leads to disappointment. I was expecting something like a self-driving car: you give the desktop an objective, and it figures out how to get there and then gets you there.