bpcrd | 6 months ago
In terms of how our technology works: our research team has trained multiple detection models, each looking for specific visual and audio artifacts that the major generative models leave behind. These artifacts aren't perceptible to the human eye or ear, but they are readily detectable by computer vision and audio models.
These expert models are combined into an ensemble that weights the individual outputs to reach a final verdict.
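To make that concrete, here's a minimal sketch of the ensemble step in Python. This is an illustration only, not our actual code; the Expert type, the weights, and the predict interface are all hypothetical:

    # Minimal sketch of the expert-ensemble idea (illustrative; names,
    # weights, and the predict interface are hypothetical).
    from dataclasses import dataclass
    from typing import Callable, List

    @dataclass
    class Expert:
        name: str                          # e.g. "upsampling_artifacts"
        weight: float                      # validation-tuned reliability weight
        predict: Callable[[bytes], float]  # returns P(generated) in [0, 1]

    def ensemble_score(experts: List[Expert], media: bytes) -> float:
        """Weighted average of the per-expert 'generated' probabilities."""
        total = sum(e.weight for e in experts)
        return sum(e.weight * e.predict(media) for e in experts) / total

    def verdict(score: float, threshold: float = 0.5) -> str:
        return "likely generated" if score >= threshold else "likely authentic"

In practice the combination step can be a learned meta-model rather than a fixed weighted average, but the shape of the problem is the same.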
We've got a rigorous process of collecting data from new generators, benchmarking our detectors against it, and retraining when necessary. Retrains often aren't needed, though, since accuracy tends to transfer well within a given deepfake technique: even when new diffusion or autoregressive models come out, the artifacts they leave tend to be similar enough that our existing models still catch them.
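As a rough illustration of that benchmark gate (again hypothetical; the threshold and interfaces here are assumptions, not our real pipeline):

    # Hypothetical benchmark gate: test the existing detector on samples
    # from a new generator and flag a retrain if accuracy drops too far.
    from typing import Callable, Sequence

    ACCURACY_FLOOR = 0.95  # assumed minimum accuracy before retraining

    def needs_retrain(predict_fake: Callable[[bytes], float],
                      generated: Sequence[bytes],
                      authentic: Sequence[bytes]) -> bool:
        hits = sum(predict_fake(x) >= 0.5 for x in generated)  # fakes caught
        hits += sum(predict_fake(x) < 0.5 for x in authentic)  # reals passed
        return hits / (len(generated) + len(authentic)) < ACCURACY_FLOOR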
I will say that our models are most heavily benchmarked on convincing audio/video/image impersonations of humans. We can return results for media outside that scope, but we've focused training and benchmarking on human impersonations, since that's typically the greatest risk for businesses.
So that's a caveat to keep in mind if you decide to try out our Developer Free Plan.
Eisenstein | 6 months ago
I think the most likely outcome of a criminal organization doing this is that they train a model with a public architecture from scratch on the material they want to reproduce, and then use it without telling anyone. Would your detector prevent this attack?
coeneedell | 6 months ago
As for the actual lead time involved in our strategy, that's probably not something I can talk about publicly. I can say I'm working on making it happen faster.