(no title)
coderedart | 2 years ago
> until you need to get your string through several levels of escape. how many backslashes to add? depends on how deep your pipe is and how each of those layers is defined
lisper|2 years ago
When the same character is used as both an open and close delimiter, you have to disambiguate between three possibilities: opening a new string, closing the current string (which may or may not be embedded) and a literal character as a constituent of the current string. By convention, an unescaped double-quote inside a string indicates closing that string, so you need different escapes to indicate opening embedded strings and constituents.
You could have done that by using two different escape characters, but for historical reasons there is only one escape character: the backslash. So that one character has to do double-duty to disambiguate two different cases. But in fact it's even worse than that because string parsers have a very shallow understanding of backslashes. To a string parser, a backslash followed by another character means only that the following character should be treated as a constituent. So you still need to disambiguate between actual constituents and opening an embedded string, and the only way to do that, because all you have is the backslash, is with more backslashes. The whole mess is just a stupid historical accident.
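To make the backslash pile-up concrete, here is a minimal sketch in Python. It assumes the conventional scheme where a string is wrapped in double quotes and both `\` and `"` are escaped with a backslash. The helper names `quote` and `max_backslash_run` are made up for this illustration.

```python
def quote(s: str) -> str:
    """Wrap s in double quotes, escaping backslashes first, then quotes."""
    return '"' + s.replace("\\", "\\\\").replace('"', '\\"') + '"'


def max_backslash_run(s: str) -> int:
    """Length of the longest run of consecutive backslashes in s."""
    longest = run = 0
    for ch in s:
        run = run + 1 if ch == "\\" else 0
        longest = max(longest, run)
    return longest


text = 'say "hi"'
runs = []
for level in range(4):
    text = quote(text)          # push the string through one more layer
    runs.append(max_backslash_run(text))

print(runs)  # [1, 3, 7, 15]
```

Each extra layer doubles the existing backslashes and adds one more for the quote itself, so the innermost quote ends up behind 2^n - 1 backslashes after n layers.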
If you used balanced quotes you only have one case that needs to be escaped: constituents. So you never need multiple escapes.
Note that I made a mistake when I wrote:
> Only if you want to refer to [a close-quote character] literally as a closing quote rather than having it act as a closing quote.
You have to escape both open and close quotes to refer to them as constituents. In other words you would need to write something like this:
«Here is an example of a «nested string». The start of a nested string is denoted by a \« character. The end of a nested string is denoted by a \» character.»
Note that it doesn't matter how many levels deep you are:
«Even when you write «a nested string that refers to \« or \» characters» you only need one level of escape.»
Note that when you refer to quote characters as balanced pairs as in the examples above you don't actually need the escapes. The above strings will parse just fine even without the backslashes, and they will print out exactly as you expect. The only "problem" will be that they will contain embedded strings that you probably did not intend. The only time escapes are actually required is when referring to quote characters as constituents without balancing them. This will always be the case if you refer to a close-quote without a corresponding preceding open-quote, which is the reason I got it wrong: escaping close-quotes will be more common than escaping open-quotes, but both will be needed occasionally.
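A minimal sketch of a parser for balanced quotes, in Python, might look like the following. It is only one reading of the scheme described above: `parse` is a hypothetical name, a backslash makes the next character a constituent, and the result is a nested list where embedded strings show up as sub-lists.

```python
def parse(src: str, pos: int = 0):
    """Parse one «...» string starting at src[pos].

    Returns (items, next_pos). Each item is either a chunk of text (str)
    or an embedded string (a nested list of the same shape).
    """
    if pos >= len(src) or src[pos] != "«":
        raise ValueError(f"expected « at position {pos}")
    pos += 1
    items, text = [], []
    while pos < len(src):
        ch = src[pos]
        if ch == "\\":                    # escape: next char is a constituent
            if pos + 1 >= len(src):
                raise ValueError("dangling backslash")
            text.append(src[pos + 1])
            pos += 2
        elif ch == "«":                   # open an embedded string
            if text:
                items.append("".join(text))
                text = []
            nested, pos = parse(src, pos)
            items.append(nested)
        elif ch == "»":                   # close the current string
            if text:
                items.append("".join(text))
            return items, pos + 1
        else:                             # ordinary constituent
            text.append(ch)
            pos += 1
    raise ValueError("unterminated string")


# One backslash is enough at any depth:
print(parse("«a «b \\» c» d \\« e»")[0])   # ['a ', ['b » c'], ' d « e']

# Balanced quotes without escapes still parse, but create an embedded string:
print(parse("«refers to « and »»")[0])     # ['refers to ', [' and ']]

# An unbalanced, unescaped close-quote ends the string early:
items, end = parse("«a » b»")
print(items, end)                          # ['a '] 4
```

The last call shows the one case where the escape is really required: without it, the parser stops at the first close-quote and the rest of the input is left over.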
bloak|2 years ago
I would also advocate the principle that you don't escape the escape character by doubling it. There are two problems with replacing \ with \\: firstly the length of the string doubles with each nested quotation; secondly you can't tell at a glance whether \\\\\\\\\\\\\\\\\\\n contains a newline character or an n because it depends on whether the number of backslashes is odd or even.
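The odd/even problem can be shown with a short Python sketch. It assumes the usual convention where `\\` is a literal backslash and `\n` is a newline. The function name `ends_with_newline_escape` is invented for this example.

```python
def ends_with_newline_escape(s: str) -> bool:
    """True if the final 'n' of s is escaped, i.e. reads as a newline.

    Backslashes pair up from the left, so the 'n' is escaped only when
    the run of backslashes right before it has odd length.
    """
    if not s.endswith("n"):
        return False
    body = s[:-1]
    trailing = len(body) - len(body.rstrip("\\"))  # backslashes before 'n'
    return trailing % 2 == 1


print(ends_with_newline_escape("\\" * 7 + "n"))  # True: 3 pairs, then \n
print(ends_with_newline_escape("\\" * 8 + "n"))  # False: 4 pairs, then a plain n
```

The answer depends on counting the whole run, which is exactly what a human reader cannot do at a glance.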
Another useful principle is to escape a quote character with a sequence that does not contain that character: then it is much easier to check whether the quotes are balanced because you don't need to check whether any of them are escaped.
So here's a possible algorithm for quoting a string: first identify the top-level quote characters that don't match (this is not totally trivial but it isn't difficult or computationally expensive); then, in parts of the string that are not inside nested quotes, but only there, replace « with \<, » with \>, and \ with \_ (say). Does that work?
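One way to try the question is a small round-trip sketch in Python. This is only one interpretation of the proposal: unmatched quotes are found with a stack, `\<`, `\>` and `\_` are used only outside nested quotes, and the names `unmatched_quotes`, `encode` and `decode` are made up here.

```python
def unmatched_quotes(s: str) -> set:
    """Indices of « and » in s that have no partner (stack matching)."""
    stack, unmatched = [], set()
    for i, ch in enumerate(s):
        if ch == "«":
            stack.append(i)
        elif ch == "»":
            if stack:
                stack.pop()          # pairs with the nearest open quote
            else:
                unmatched.add(i)     # close quote with nothing to close
    unmatched.update(stack)          # open quotes that were never closed
    return unmatched


def encode(s: str) -> str:
    """Quote s, escaping only at top level (outside nested quotes)."""
    bad = unmatched_quotes(s)
    out, depth = [], 0
    for i, ch in enumerate(s):
        if i in bad:                 # unmatched quotes are always top-level
            out.append("\\<" if ch == "«" else "\\>")
        elif ch == "\\" and depth == 0:
            out.append("\\_")
        else:
            if ch == "«":
                depth += 1
            elif ch == "»":
                depth -= 1
            out.append(ch)           # nested text is copied unchanged
    return "«" + "".join(out) + "»"


def decode(q: str) -> str:
    """Inverse of encode; escapes are only interpreted at depth 0."""
    if len(q) < 2 or q[0] != "«" or q[-1] != "»":
        raise ValueError("not a quoted string")
    body, table = q[1:-1], {"<": "«", ">": "»", "_": "\\"}
    out, depth, i = [], 0, 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and depth == 0:
            nxt = body[i + 1 : i + 2]
            if nxt not in table:
                raise ValueError("bad escape")
            out.append(table[nxt])
            i += 2
            continue
        if ch == "«":
            depth += 1
        elif ch == "»":
            depth -= 1
        out.append(ch)
        i += 1
    return "".join(out)


raw = "a » b «c \\ d» e \\ «"
quoted = encode(raw)
print(quoted)                 # «a \> b «c \ d» e \_ \<»
print(decode(quoted) == raw)  # True
```

Under these assumptions the round trip holds for this input, the raw quote characters in the output are balanced without looking at any escapes, and the nested part is copied through unchanged, so its length does not grow with depth.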