(no title)
dmsnell | 8 months ago
They are special because they are invisible and sequences of them behave as a single character for cursor movement.
They mirror ASCII so you can encode arbitrary JSON or other data inside them. Quite suitable for marking LLM-generated spans, as long as you don’t mind annoying people with hidden data or deprecated usage.
akoboldfrying|8 months ago
dmsnell|8 months ago
Finding cryptographic-strength measures to identify LLM-generated content is a few orders of magnitude harder than optimistically marking them. Besides, it also relies on the content producer adding those indicators so that can’t be ignored as a major source of missing metadata.
But sometimes lossy mechanisms are still helpful because people who aren’t out with malicious purposes might copy and paste without being aware that the content is generated, while an auditor (be it anyone who inspects one level deeper) can discover in some (most?) cases the source of the content.
ema|8 months ago