top | item 47209034

(no title)

kid64 | 1 day ago

The thesis here seems to be that delimiters provide important context for Claude, and for that putpose we should use XML.

The article even references English's built-in delimiter, the quotation mark, which is reprented as a token for Claude, part of its training data.

So are we sure the lesson isn't simply to leverage delimiters, such as quotation marks, in prompts, period? The article doesn't identify any way in which XML is superior to quotation marks in scenarios requiring the type of disambiguation quotation marks provide.

Rather, the example XML tags shown seem to be serving as a shorthand for notating sections of the prompt ("treat this part of the prompt in this particular way"). That's useful, but seems to be addressing concerns that are separate from those contemplated by the author.

discuss

sheept|1 day ago

XML is a bit more special/first class to Claude because it uses XML for tool calling:

    <antml:invoke name="Read">                                                    
      <antml:parameter name="file_path">/path/to/file</antml:parameter>             
      <antml:parameter name="offset">100</antml:parameter>                          
      <antml:parameter name="limit">50</antml:parameter>                            
    </antml:invoke>

I'm sure Claude can handle any delimiter and pseudo markup you throw at it, but one benefit of XML delimiters over quotation marks is that you repeat the delimiter name at the end, which I'd imagine might help if its contents are long (it certainly helps humans).

bear3r|1 day ago

the antml: namespace prefix is doing extra work here too -- even if user input contains invoke tags, they won't collide with tool calls because the namespace differs. not just xml for structure but namespaced xml for isolation.

xeyownt|1 day ago

Cannot believe it's efficient. XML is the most verbose and inefficient of communicating anything. The only benefit of XML was to give lifetime work to an army of engineers. The next news will be "Why DTD is so fundamental to Claude".

RandomBK|1 day ago

How well do we understand the tokenization for Claude? I'd posit that the exact human-representation of this markup is likely irrelevant if it's all being converted into a single token.

mpyne|1 day ago

I had a lot of success with Claude just by telling it I would throw a relevant snippet in a <tag></tag> pair. That's not even XML, nor has it been really needed in my experience.

Even simple --- separators is usually enough to get good results, it just needs to be reasonably clear which items are distinct from each other.

jinushaun|1 day ago

Except quotation marks look like regular text. I regularly use quotes in prompts for, ya know, quotes.

wolttam|1 day ago

The GP isn't suggesting to literally use quotes as the delimiter when prompting LLMs. They're pointing out that we humans already use delimiters in our natural language (quotation marks to delimit quotes). They're suggesting that delimiters of any kind may be helpful in the context of LLM prompting, which to me makes intuitive sense. That Claude is using XML is merely a convention.