pvarangot | 1 year ago
It's probably because it was trained on "professional audio" (ads, movies, audiobooks) and not on "normal people talking". It's like the effect when diffusion models were mostly trained on stock photos.