My understanding is that they trained a separate model to specifically estimate when they have enough context to begin translating, as a skilled translator would.
My mom used to do English/French translation. Her favorite example was the word "file". That word has multiple translation in French depending on the context, and that context may simply be implied by who is speaking. You may not be able to figure it based on the conversation alone.
scherlock|2 years ago