Does anyone know why LSP uses UTF-16 for encoding columns? It seems like everyone agrees it is a bad choice, so I'm curious about the original reasoning. Are there any benefits at all to using UTF-16, or was it something to do with Microsoft legacy code?
jcranmer|2 years ago
I believe the original producers and consumers of LSP were written in languages whose string lengths are measured in UTF-16 code units, so it was literally the easiest way to do it, even though UTF-16 offsets are probably the most painful thing to compute if your string system isn't UTF-16.
LSP eventually got a solution where you can request something other than UTF-16 offset calculations, but I don't remember the details of what that solution is.
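(For illustration, not from the thread: a minimal sketch of the conversion burden jcranmer describes. A non-UTF-16 editor, e.g. one with Python-style codepoint strings, has to translate its own column numbers into the UTF-16 code units LSP expects, because characters outside the Basic Multilingual Plane occupy two UTF-16 code units.)

```python
def utf16_col(line: str, codepoint_col: int) -> int:
    """UTF-16 code-unit offset for a column given in Unicode codepoints."""
    units = 0
    for ch in line[:codepoint_col]:
        # Astral-plane characters (> U+FFFF) become a surrogate pair in UTF-16.
        units += 2 if ord(ch) > 0xFFFF else 1
    return units

line = "a\U0001F4A9b"          # "a💩b" — the emoji is one codepoint, two UTF-16 units
print(utf16_col(line, 1))      # column of the emoji -> 1
print(utf16_col(line, 2))      # column of 'b'       -> 3
```

(The solution referenced above is, as far as I know, the `positionEncoding` capability added in LSP 3.17, which lets client and server negotiate UTF-8 or UTF-32 offsets instead.)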
hgs3|2 years ago
[1] https://github.com/microsoft/language-server-protocol/issues...
mardifoufs|2 years ago
Even when the proposal of "UTF-16 default, UTF-8 optional" was made to keep backwards compatibility, it was not enough. It had to be UTF-8 because it's technically superior, as if that's the only consideration! I agree they should've just picked one, but I still don't think the maintainers needed a refresher on what UTF-8 is every three comments.
raphlinus|2 years ago
[19]: https://github.com/microsoft/language-server-protocol/issues...