Calla – spatial video conferencing software based on Jitsi Meet

200 points | thekyle | 5 years ago | github.com

85 comments

[+] trynewideas|5 years ago|reply
Does anyone remember Metaplace,[1] circa 2009? It was an isometric 2.5D-ish "virtual world" that, if it shipped today, would probably be pitched as a sort of Roblox competitor for building games and interactive installations within a virtual space. (At the time it compared itself to Second Life, which either baffled people who didn't know what it was or turned off people who did.)

I used it briefly in 2009 and one of the things we tried to do with it was exactly this — spatial voicechat in a virtual environment with contrived "physical" modeling of spaces to adjust how sound travels. (This also reminds me of TeamSpeak's spatial integrations from around the same era.)

I know it's the pandemic that's inspired a ton of these projects, but they've been around in some form since live voice-over-IP came about and haven't really taken off in non-game applications.

What keeps blocking them from mainstreaming? I suspect maybe it's adding contrived physical interactions to the already high relative overhead of meeting virtually, but I'm not sure if that's just a problem for me. I could see myself enjoying something like this but not many others on my team seeing enough value to take the efficiency hit of having either zero or multiple open voice channels that do away with the abstraction.

[1] https://www.engadget.com/2008-10-22-massively-interviews-rap...

[+] WorldMaker|5 years ago|reply
Notable game designer Raph Koster was involved in Metaplace and his blog included a lot of making of/behind the scenes articles and discussions from around that time, including some post-mortem thoughts.

> What keeps blocking them from mainstreaming?

Most likely it is the business model. Getting the spatial model correct is a lot of work, and making it generic enough to be reusable is even harder. (Both Second Life and Metaplace flirted with the idea of being platforms that other services could build on.) It's easier to get some financing for it as a "game" (it's also useful to note that a bunch of Metaplace ideas came out of experiments in the failed MMO Star Wars Galaxies), but it's harder to sell it as a general chat application (much less a platform) or especially as an "Enterprise tool" if it looks too much like a toy or a game. On the flip side, it's harder to build it as an Enterprise tool first, because the impulse to keep things sparse/spartan/"professional" also leaves an air of "lifelessness": not enough things to do outside of or between meetings, or during downtime inside meetings, and not enough of a feeling that it is a space to "inhabit" rather than visit briefly because a meeting required it.

These business model bootstrap hurdles are a criticism I often bring to a lot of the conversations about why the Cyberpunk "ideal" of a single shared VR space (including the one most recently popularized by Ready Player One) seems so extremely unlikely in the real world.

(Aside: Ready Player One especially suffers from a lot of basic game economy mistakes, and its game likely shouldn't have survived very long in its world, much less have taken over nearly everything, including education and enterprise, as much as it did. My reading of the book was that the game economy directly caused a lot of the collapse and the dystopian economy outside of the game. That reading made the book better than the author seemed to have intended: further reading and other evidence suggest the author was not aware of how terrible the economics of the game were; they were just fun ideas thrown out for verisimilitude.)

[+] stevenicr|5 years ago|reply
It had not rung a bell until I clicked your link, and as soon as I saw the picture I remembered immediately - thanks for the reminisce.

I was thinking of the old 2D / move-around 3D kind of space chat from waaay back - worldsChat - where you had an avatar and moved around a space station chatting. I always wondered what held that back from being more popular and used by more people.

Which brings me to the time of having all sorts of friends and family pinging on yahoo messenger for a while - then that was killed off.

Perhaps if worldsChat had this spatial talking kind of option it would have gotten more traction. I'm sure moderation and other factors also apply.

[+] phaer|5 years ago|reply
Can't wait to try this with some friends on the weekend, thanks!

I think it's beautiful how it's "just" a wrapper on top of Jitsi Meet as it demonstrates how Free Software can (also) be used to experiment with and adapt user interfaces to specific and/or special needs instead of locking you into ever-more-bloated proprietary apps <3

[+] llarsson|5 years ago|reply
This is cool and actually something I have been missing during, e.g., academic conferences. The "hallway track" is often the most interesting one! Being able to wander around and overhear various conversations until you find one that interests you or that you can add to is great.

A Zoom breakout room is nothing like that experience. This, or something like this, could very well be what is needed.

[+] kaoD|5 years ago|reply
Funnily enough, "calla" means "shut up" in Spanish.

Can't wait to try this with my co-workers. One of the big barriers for remote casual conversation is the spatiality, since all video conference systems are focused on a single speaker.

[+] moron4hire|5 years ago|reply
Yeah, someone told me that a few weeks ago.

Calla is a type of lily. I name all of my projects after plants. Originally, I named this project "Lozya", which is the Bulgarian word for "vines", because Jitsi is the Bulgarian word for "wires". But nobody could pronounce it correctly, so I renamed it.

Incidentally, I work for a foreign language instruction company. I brought it up with one of our teachers and she didn't think it was an issue. She said that, in context, it wouldn't be read as "shut up", that it's kind of like "lead" (to guide something) and "lead" (a metal) in English.

[+] foxbarrington|5 years ago|reply
Love seeing all the activity in this space. For https://js.la we've been using https://rambly.app and it's been amazing for transitioning our meetup to online.
[+] nanna|5 years ago|reply
Would love to use this for post-conference hangouts. How many users have you had logged in at one time before issues kick in?

Edit: And is it possible to self host?

[+] bweitzman|5 years ago|reply
We've seen a few spatial chat apps pop up in the past few months. It's part of the reason we set out to build https://align.link/, an extensible video chat platform.

Products like these (and others like https://www.macro.io/) could exist as meeting apps, allowing people to tailor rooms to specific scenarios like one-on-ones, retrospectives, happy hours, etc.

[+] kasbah|5 years ago|reply
Seems similar to https://gather.town but self-hostable.
[+] michaelmior|5 years ago|reply
I've used Gather for a couple virtual conferences this year and I was surprised how much I liked it. It's obviously still no substitute for in-person gatherings, but it does recreate some of the atmosphere of being in a room full of people while still being able to have meaningful small group conversations.
[+] moron4hire|5 years ago|reply
Kind of crazy to see this on the front page. This is my project.

I started this project as an experiment, half to see if it would work for my Saturday morning tech meetup, half to see if I could make spatialized audio conferencing work in the browser at all, as at the time I was considering switching my VR app at work from Unity to WebXR (which, ultimately, I did).

The repository has been a little neglected in the last few weeks, but that's because I've been working on redesigning a few parts to make it work better in both 3D and 2D and haven't settled enough yet to commit the work.

[+] saghul|5 years ago|reply
Jitsi team member here. This looks awesome, kudos and thanks for sharing!
[+] moron4hire|5 years ago|reply
I know the webcam isn't working for some folks. I kinda don't care. I've got a lot more important things to focus on.

I think the video stream in teleconferencing has absolutely zero redeeming qualities. People think it's there to be able to convey facial expressions and facilitate non-verbal communication, but I think it's a complete failure at that task.

For one thing, few listeners are actually looking at the speaker as the speaker is talking. They're most likely looking at themselves or some other thing going on with their computer, so their facial reactions are not really based on what the speaker is saying.

On the flip side, the speaker never gets to see people looking at them. Almost nobody looks at their camera instead of something on the screen, so the speaker never gets that "eye contact" feeling. Best-case scenario, you get a group of people trained to move the speaker video feed directly under the camera lens and they are diligent about making sure they are looking at the speaker. Even then, there is still a "20 yard stare" look to everyone. It also causes exhaustion as it puts you into a feeling that you're in an interrogation of some kind.

Additionally, it's such a narrow field of view for the camera. Non-verbal communication is more than just facial expressions; it's also body posture and standing distance. There are facial tics that are also lost in the low quality of the webcam feed, and the non-uniformity of every user's personal lighting setup creates an unnatural scenario where every person is lit differently from what you'd expect, and from each other.

And finally, while teleconferencing has a lot of trouble with latency between when a person speaks and when the other people hear them, there is also a lot of latency between when you hear a person speak and when you see them. The audio and video feeds are not synced correctly.

By completely eliminating the video feed, conversations actually work a lot better. I get so many people who demand to have that video feed for the reasons they've been indoctrinated on, with little to no effort to even try audio-only conferencing.

And frankly, as a listener, I don't want the speaker to see my reflex reactions. I don't want them to see into my room. I really only want them to see what I choose to let them see.

Thus, the avatars and the emoji reactions.

[+] askvictor|5 years ago|reply
I've been wanting this kind of thing, but for Minecraft (Education Edition in my case; I think there is/was a mod for Java edition to do this, but MinecraftEdu is not based on Java edition). I can easily get a list of users and their in-game co-ordinates, so I need a spatial conference system that has an API rather than its own game. Any suggestions?
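
As a rough illustration of what "a list of users and their in-game co-ordinates" would need to be turned into, here is a minimal TypeScript sketch that computes a per-player volume for one listener. The `PlayerPosition` shape and both function names are hypothetical, not part of Calla, Jitsi, or any Minecraft API. The falloff formula is the "inverse" distance model used by the Web Audio API's PannerNode; how the resulting gains get applied to actual audio streams is left out.

```typescript
// Hypothetical shape for the data described above: a user name plus in-game coordinates.
interface PlayerPosition {
  name: string;
  x: number;
  y: number;
  z: number;
}

// Inverse distance model: full volume inside refDistance, then falling off with distance.
// Same formula as the "inverse" distance model of the Web Audio API's PannerNode.
function inverseDistanceGain(distance: number, refDistance = 1, rolloffFactor = 1): number {
  const d = Math.max(distance, refDistance);
  return refDistance / (refDistance + rolloffFactor * (d - refDistance));
}

// For one listener, compute the gain to apply to every other player's audio stream.
function gainsForListener(
  listener: PlayerPosition,
  players: PlayerPosition[],
  refDistance = 1,
  rolloffFactor = 1
): Record<string, number> {
  const gains: Record<string, number> = {};
  for (const p of players) {
    if (p.name === listener.name) continue; // nobody needs to hear themselves
    const dx = p.x - listener.x;
    const dy = p.y - listener.y;
    const dz = p.z - listener.z;
    const distance = Math.sqrt(dx * dx + dy * dy + dz * dz);
    gains[p.name] = inverseDistanceGain(distance, refDistance, rolloffFactor);
  }
  return gains;
}
```

With the default parameters, a player 5 blocks away ends up at a gain of 0.2, and anyone within 1 block stays at full volume. The gains would have to be recomputed whenever positions change.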
[+] edejong|5 years ago|reply
After a long search, we found two workable 'office neutral' solutions: Virbela and its successor FrameVR. I'll be online on FrameVR in my own personal frame at: https://framevr.io/testerdetest Please use headphones, because the echo-cancelation is a bit iffy.
[+] ElijahLynn|5 years ago|reply
Very cool idea!

I tried gather.town the other day for something similar but couldn't get it to work on Linux, couldn't detect my mic and camera correctly.

I am able to hear some background noise on the Calla room as I approach some others but I do not hear any talking and I do not know if others can hear me. I also can't get my webcam working, just a black screen and my webcam doesn't activate. It activated the first time I joined but now seems to have chosen a different camera.

I think camera/mic selection options need to be added for systems with multiple devices.

Really hope this gets refined!

[+] moron4hire|5 years ago|reply
There is a camera/mic selection option. Lower-left corner, next to the mute-audio/video buttons, there is a button labeled "change".
[+] m463|5 years ago|reply
I wonder if your map of the room has to match the other people's map of the room?

sort of like ... can you have a map that blocks someone without letting them know you just distanced yourself? :)
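
One way to picture the question: if each client computes its own audio from positions, nothing forces every client to use the same positions. The TypeScript sketch below assumes a hypothetical shared map of positions plus purely local overrides that are never sent to anyone. Whether Calla actually works this way is not stated in this thread, and all names here are made up for illustration.

```typescript
interface Point2D {
  x: number;
  y: number;
}

// Positions as broadcast to everyone (hypothetical shared state), keyed by user name.
type PositionMap = Record<string, Point2D>;

// Build the map one client uses for its own audio: the shared positions,
// with local overrides applied on top. The shared map is not modified,
// and overrides for unknown users are ignored.
function effectiveMap(shared: PositionMap, localOverrides: PositionMap): PositionMap {
  const result: PositionMap = {};
  for (const name of Object.keys(shared)) {
    const override = localOverrides[name];
    result[name] = override !== undefined ? override : shared[name];
  }
  return result;
}
```

Under that assumption, moving someone far away in the local overrides would quiet them for you only, and they would still see the shared positions.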

[+] airstrike|5 years ago|reply
Please couple this with VR + avatars + realistic settings and the future will finally be here. Business meetings will never again require flying
[+] moron4hire|5 years ago|reply
That's actually what I'm doing for my day job and part of why I originally built this. I built the 2D map for Calla because I was still in the middle of converting all my VR work from Unity to WebXR. But pretty soon, VR will be possible.

Technically, Calla is just the library for driving Jitsi and adding spatialization. You can do whatever graphics you want with it. The graphical elements and the interactions are all up to you to build, separately from the teleconferencing.
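
To make that separation concrete, here is a minimal TypeScript sketch of spatialization as pure functions over positions, with no graphics involved. It is not Calla's actual API: the names `panFor` and `equalPowerGains` and the `maxOffset` parameter are assumptions for illustration. It uses equal-power panning, a common technique in which the squared left and right gains always sum to 1.

```typescript
interface Point2D {
  x: number;
  y: number;
}

interface StereoGains {
  left: number;
  right: number;
}

// Map a source's horizontal offset from the listener to a pan value in [-1, 1]
// (-1 = fully left, 1 = fully right). Offsets beyond maxOffset are clamped.
function panFor(listener: Point2D, source: Point2D, maxOffset: number): number {
  const offset = (source.x - listener.x) / maxOffset;
  return Math.max(-1, Math.min(1, offset));
}

// Equal-power panning: left^2 + right^2 is always 1, so perceived loudness
// stays roughly constant as a source moves from one side to the other.
function equalPowerGains(pan: number): StereoGains {
  const angle = ((pan + 1) * Math.PI) / 4; // ranges from 0 to PI/2
  return { left: Math.cos(angle), right: Math.sin(angle) };
}
```

Any renderer (a 2D map, a VR scene, or no visuals at all) only has to supply the positions. The resulting gains can then be applied to each participant's audio stream.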

[+] flatline|5 years ago|reply
Have we not been living through the same pandemic? Nobody in the US has a reliable internet connection, services are controlled by third parties and randomly black out, infrastructure is overloaded, meetings are insecure, people step away from their keyboards and are unreachable, there are screaming kids in the background, etc.? I would happily board a plane for work travel right now, this is like living in a parody of my previous job.
[+] ciberado|5 years ago|reply
This is so cool... thank you very much :) I will use it during my next workshop sessions to help my students in case they run into problems following the instructions of the lab: looks like it is far more natural than the "raise hand" feature. And they just need to group themselves to enjoy the equivalent of a private room for group exercises!
[+] lksslr|5 years ago|reply
Shameless plug: we saw a similar problem, but our solution is spatial video calls in 3D instead of 2D: https://laptopsinspace.com (it's also built on top of the Jitsi Meet API, but uses an extra backend for managing the game data).
[+] moron4hire|5 years ago|reply
I had originally thought of 3D. This isn't my first time making a spatialized audio chat system. I built one for WebVR some 4 or 5 years ago. And I'm actually building one at work right now (with Calla as the underpinning). But for when I started this project (at the beginning of the pandemic) and for what purpose I built it (testing the current temperature of the waters in WebRTC land, as I hadn't done a lot with it since that last app, plus trying to support a weekend tech meetup that was finding Zoom to be a bit annoying, plus seeing if FOSS WebRTC libraries were at a level that I could use for my day job), I wanted to make the graphics portion of the app as simple as possible, support as many people as possible, and not get in the way.

One thing I've noticed with a lot of the "competitor"[0] apps out there is that they're more focused on the graphics than the audio. They're more focused on making a 90's-era RPG than on audio conferencing. I'm the opposite. I'm more focused on conferencing than the game. I get a lot of requests in the github repo for functionality in the game. The "game" is beside the point. It's just an exercise of the audio conferencing.

[0] I don't really see myself as in competition with gather.town or High Fidelity or any of the other apps because I'm not building Calla to be a startup. I love my day job and I won't be leaving it. Calla is just a component of a much larger thing I'm building.