They most likely have two "agents" working in tandem, one listening and one speaking. The listener seems to take precedence over the speaker, but underneath they share the same context window. Programming-wise, it's probably a multithreaded, channel-based architecture, depending on the programming language.
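
To make the guess concrete, here's a rough Python sketch of that shape. It's pure speculation about the design, not how the actual product is built. Every name in it (`demo`, `REPLY`, `inbox`, `outbox`, `playback_s`) is made up for illustration, the "speech" is just strings, and `queue.Queue` stands in for channels.

```python
import queue
import threading

# Hypothetical canned reply the speaker agent tries to say, word by word.
REPLY = "sure here is a very long answer that keeps going and going for a while".split()


def demo(playback_s=0.05):
    context = []                    # the single shared "context window"
    lock = threading.Lock()         # guards writes to context
    interrupt = threading.Event()   # listener -> speaker: "stop talking"
    inbox = queue.Queue()           # channel: transcribed user speech
    outbox = queue.Queue()          # channel: words the speaker has emitted

    def listener():
        while True:
            chunk = inbox.get()
            if chunk is None:       # sentinel: shut down
                break
            interrupt.set()         # the listener takes precedence
            with lock:
                context.append(("user", chunk))

    def speaker():
        for word in REPLY:
            if interrupt.is_set():
                break
            with lock:
                context.append(("assistant", word))
            outbox.put(word)
            # Simulated playback time; wakes early if the user barges in.
            if interrupt.wait(timeout=playback_s):
                break
        outbox.put(None)            # sentinel: done speaking

    threads = [threading.Thread(target=listener), threading.Thread(target=speaker)]
    for t in threads:
        t.start()

    spoken = [outbox.get()]         # wait until the speaker has said one word
    inbox.put("hang on, stop")      # the user barges in mid-reply
    for word in iter(outbox.get, None):
        spoken.append(word)         # collect whatever else got out before the cut

    inbox.put(None)
    for t in threads:
        t.join()
    return spoken, context


if __name__ == "__main__":
    spoken, context = demo()
    print("spoken before interruption:", spoken)
    print("shared context:", context)
```

The point of the `Event` is that the speaker never has to poll the listener directly: it just checks one flag between words, and both agents write into the same context list, so the next reply would see the interruption.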
mvkel|2 years ago