
Show HN: DeskSlice – controlling a VS Code agent from my phone

3 points| frudas24 | 1 month ago |github.com

DeskSlice is a small Go tool that lets you remotely view and control a VS Code AI agent from a mobile browser.

The problem I wanted to solve was very practical: I wanted to comfortably interact with a local VS Code agent (read outputs, scroll, and type prompts) from my phone, without reimplementing the UI or relying on editor internals or private APIs.

Instead of building a full remote desktop, DeskSlice streams only a calibrated slice of the desktop where the agent UI lives, and maps touch gestures back to mouse and keyboard input on the host.
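The touch-to-mouse mapping described above can be sketched roughly as follows. This is only an illustration under assumptions: the `Rect` type, the `MapTouch` function, and the idea that the phone reports touch positions relative to the video element are hypothetical and not taken from the DeskSlice repo.

```go
package main

// Rect is the calibrated slice in host desktop pixels.
// Hypothetical type for illustration, not from the DeskSlice source.
type Rect struct {
	X, Y, W, H int
}

// clamp01 limits v to the range [0, 1].
func clamp01(v float64) float64 {
	if v < 0 {
		return 0
	}
	if v > 1 {
		return 1
	}
	return v
}

// MapTouch converts a touch at (tx, ty) on a viewW x viewH video element
// into host desktop coordinates inside the calibrated slice.
// It returns ok=false when the view or the slice has no area.
func MapTouch(slice Rect, viewW, viewH, tx, ty float64) (x, y int, ok bool) {
	if viewW <= 0 || viewH <= 0 || slice.W <= 0 || slice.H <= 0 {
		return 0, 0, false
	}
	// Normalize to [0, 1]; touches just outside the element are clamped.
	nx := clamp01(tx / viewW)
	ny := clamp01(ty / viewH)

	// Scale into the slice, keeping the result on the last valid pixel.
	dx := int(nx * float64(slice.W))
	if dx > slice.W-1 {
		dx = slice.W - 1
	}
	dy := int(ny * float64(slice.H))
	if dy > slice.H-1 {
		dy = slice.H - 1
	}
	return slice.X + dx, slice.Y + dy, true
}
```

For example, with a slice at (100, 200) sized 400x300 and an 800x600 view, a tap at the center of the view maps to desktop position (300, 350). Working in normalized coordinates keeps the mapping independent of how the phone scales the stream.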

I originally implemented this using WebRTC, but after hitting reliability and complexity issues (signaling, renegotiation, RTP quirks), I pivoted to MJPEG over HTTP. For LAN use, MJPEG turned out to be much simpler, easier to debug, and reliable enough for UI-driven workflows.
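To show why MJPEG over HTTP is comparatively simple, here is a minimal sketch of a `multipart/x-mixed-replace` stream using only the Go standard library. The names `framePart` and `mjpegHandler`, the `frame` boundary string, and the channel of pre-encoded JPEG frames are assumptions for illustration; DeskSlice's actual handler may look quite different.

```go
package main

import (
	"fmt"
	"net/http"
)

// boundary separates the JPEG parts in the multipart stream.
const boundary = "frame"

// framePart wraps one already-encoded JPEG as a single multipart part.
func framePart(jpeg []byte) []byte {
	head := fmt.Sprintf(
		"--%s\r\nContent-Type: image/jpeg\r\nContent-Length: %d\r\n\r\n",
		boundary, len(jpeg))
	part := append([]byte(head), jpeg...)
	return append(part, '\r', '\n')
}

// mjpegHandler streams frames from the channel until the client
// disconnects or the channel is closed. A single channel only serves
// one viewer well; several clients would each receive a subset of frames.
func mjpegHandler(frames <-chan []byte) http.HandlerFunc {
	return func(w http.ResponseWriter, r *http.Request) {
		w.Header().Set("Content-Type",
			"multipart/x-mixed-replace; boundary="+boundary)
		w.Header().Set("Cache-Control", "no-store")
		flusher, _ := w.(http.Flusher)

		for {
			select {
			case <-r.Context().Done():
				return // client went away
			case f, ok := <-frames:
				if !ok {
					return // capture stopped
				}
				if _, err := w.Write(framePart(f)); err != nil {
					return
				}
				if flusher != nil {
					flusher.Flush() // push the frame out immediately
				}
			}
		}
	}
}
```

A handler like this could be registered with `http.Handle("/stream", mjpegHandler(frames))` and displayed in a plain `<img>` tag. Every frame is an independent JPEG, so a capture restart or crop change has no decoder state to corrupt, which fits the reliability point made above.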

Key ideas:
- Manual fullscreen calibration to select the exact agent panel, input area, and scroll area
- Cropped video stream (not the full desktop)
- Touch-first interaction model (tap, drag-scroll, typing)
- No UI scraping, no state persistence: it operates the real VS Code agent UI
- Simple password gate for LAN use

This is intentionally not a general-purpose remote desktop. It’s a focused control surface for interacting with a local AI agent through its existing UI.

Repo: https://github.com/frudas24/deskslice/

5 comments

Sean-Der|1 month ago

Would you mind explaining the complexity issues around WebRTC more? Why did you need to do renegotiation? What RTP stuff hit you?

thanks

frudas24|1 month ago

The WebRTC complexity came from our pipeline being ffmpeg → H.264 RTP over UDP → pion/webrtc TrackLocalStaticRTP (instead of a “normal” WebRTC source). Any time we changed the monitor or crop, or restarted the capture, the RTP stream effectively reset (SSRC, sequence numbers, timestamps, and sometimes the SPS/PPS cadence), and mobile browsers would stall the decoder and just stay black. We added “restart/renegotiation” because recreating the PeerConnection was the most reliable way to recover from those discontinuities.

What we still need to debug to make WebRTC solid:

- Capture-side: full ffmpeg stderr logs + exact args when it goes black.
- RTP ingest: log SSRC/PT/seq gaps and verify SPS/PPS are regularly re-sent (e.g., with every keyframe).
- WebRTC states: log signaling/ICE/connection state transitions to catch races and “remote description not set” timing.
- Confirm whether the black screen is a capture issue vs a decode/packetization issue (capture works via MJPEG, so likely the latter).
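The RTP ingest logging mentioned above (SSRC/PT/seq gaps) could look something like this sketch. It parses the 12-byte fixed RTP header by hand with the standard library rather than using pion's types; `rtpHeader`, `parseRTP`, and `tracker` are hypothetical names, and the event strings are placeholders for whatever a real logger would emit.

```go
package main

import (
	"encoding/binary"
	"errors"
)

// rtpHeader holds the fields of the fixed RTP header that matter here.
type rtpHeader struct {
	PayloadType uint8
	Marker      bool
	Seq         uint16
	Timestamp   uint32
	SSRC        uint32
}

// parseRTP decodes the 12-byte fixed header of an RTP packet.
func parseRTP(pkt []byte) (rtpHeader, error) {
	if len(pkt) < 12 {
		return rtpHeader{}, errors.New("rtp: packet shorter than fixed header")
	}
	if pkt[0]>>6 != 2 {
		return rtpHeader{}, errors.New("rtp: unsupported version")
	}
	return rtpHeader{
		PayloadType: pkt[1] & 0x7f,
		Marker:      pkt[1]&0x80 != 0,
		Seq:         binary.BigEndian.Uint16(pkt[2:4]),
		Timestamp:   binary.BigEndian.Uint32(pkt[4:8]),
		SSRC:        binary.BigEndian.Uint32(pkt[8:12]),
	}, nil
}

// tracker remembers the previous header so discontinuities can be reported.
type tracker struct {
	started bool
	last    rtpHeader
}

// Observe returns "ssrc-change", "pt-change", or "seq-gap" when the stream
// is discontinuous relative to the previous packet, and "" otherwise.
// Reordered or duplicated packets are also reported as "seq-gap".
func (t *tracker) Observe(h rtpHeader) string {
	event := ""
	if t.started {
		switch {
		case h.SSRC != t.last.SSRC:
			event = "ssrc-change"
		case h.PayloadType != t.last.PayloadType:
			event = "pt-change"
		case h.Seq-t.last.Seq != 1:
			// uint16 subtraction wraps, so 65535 -> 0 is a step of 1.
			event = "seq-gap"
		}
	}
	t.started = true
	t.last = h
	return event
}
```

Feeding every packet read from the ffmpeg UDP socket through `parseRTP` and `Observe`, and logging any non-empty event with a wall-clock time, would make it possible to line up stream resets against the moment the browser goes black.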