item 44037270

lor_louis | 9 months ago

Kilo is a fun weekend project, but I learned the hard way that it's not a good base upon which to build your own text editor.

The core data structure (array of lines) just isn't that well suited to more complex operations.

Anyway here's what I built: https://github.com/lorlouis/cedit

If I were to do it again, I'd use a piece table[1]. The VS Code folks wrote a fantastic blog post about it some time ago[2].

[1] https://en.m.wikipedia.org/wiki/Piece_table
[2] https://code.visualstudio.com/blogs/2018/03/23/text-buffer-r...
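
For readers unfamiliar with the idea, here is a minimal piece table sketch in C. It assumes fixed-size buffers for brevity, the names (`pt_init`, `pt_insert`, `pt_read`) are hypothetical, and it is not VS Code's implementation. Insertion appends the new text to an add buffer and splits at most one piece; the original text is never moved.

```c
#include <stddef.h>
#include <string.h>

#define PT_ADD_CAP 4096   /* fixed add-buffer size, an assumption for brevity */
#define PT_MAX_PIECES 256 /* fixed piece capacity, an assumption for brevity */

/* A piece refers to a span of either the read-only original text or the
 * append-only "add" buffer that collects everything typed. */
enum source { ORIGINAL, ADD };

struct piece {
    enum source src;
    size_t start, len;
};

/* Hypothetical piece table; not taken from any real editor. */
struct piece_table {
    const char *original;   /* file contents, never modified */
    char add[PT_ADD_CAP];   /* append-only buffer */
    size_t add_len;
    struct piece pieces[PT_MAX_PIECES];
    size_t npieces;
};

static void pt_init(struct piece_table *pt, const char *text) {
    size_t n = strlen(text);
    pt->original = text;
    pt->add_len = 0;
    pt->npieces = 0;
    if (n > 0) pt->pieces[pt->npieces++] = (struct piece){ORIGINAL, 0, n};
}

/* Insert s at logical offset pos (pos <= document length). The new text is
 * appended to the add buffer and at most one piece is split; no existing
 * text is moved. Returns 0 on success, -1 if a fixed buffer would overflow. */
static int pt_insert(struct piece_table *pt, size_t pos, const char *s) {
    size_t n = strlen(s), off = 0, i = 0;
    if (n == 0) return 0;
    if (pt->add_len + n > PT_ADD_CAP || pt->npieces + 2 > PT_MAX_PIECES) return -1;
    struct piece added = {ADD, pt->add_len, n};
    memcpy(pt->add + pt->add_len, s, n);
    pt->add_len += n;

    /* Find the piece containing pos, or stop at the end of the list. */
    for (; i < pt->npieces && pos >= off + pt->pieces[i].len; i++)
        off += pt->pieces[i].len;

    if (i == pt->npieces || pos == off) {
        /* pos is on a piece boundary: shift later pieces right by one. */
        memmove(&pt->pieces[i + 1], &pt->pieces[i],
                (pt->npieces - i) * sizeof(struct piece));
        pt->pieces[i] = added;
        pt->npieces += 1;
    } else {
        /* pos is inside piece i: split it into left | added | right. */
        struct piece p = pt->pieces[i];
        size_t cut = pos - off;
        memmove(&pt->pieces[i + 3], &pt->pieces[i + 1],
                (pt->npieces - i - 1) * sizeof(struct piece));
        pt->pieces[i] = (struct piece){p.src, p.start, cut};
        pt->pieces[i + 1] = added;
        pt->pieces[i + 2] = (struct piece){p.src, p.start + cut, p.len - cut};
        pt->npieces += 2;
    }
    return 0;
}

/* Concatenate the pieces into out (capacity cap >= 1), NUL-terminated. */
static size_t pt_read(const struct piece_table *pt, char *out, size_t cap) {
    size_t n = 0;
    for (size_t i = 0; i < pt->npieces; i++) {
        const struct piece *p = &pt->pieces[i];
        const char *base = p->src == ORIGINAL ? pt->original : pt->add;
        if (n + p->len >= cap) break;   /* stop rather than overflow out */
        memcpy(out + n, base + p->start, p->len);
        n += p->len;
    }
    out[n] = '\0';
    return n;
}
```

Deletion works the same way in reverse: trim or split the pieces covering the deleted range without touching either buffer.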

vidarh|9 months ago

My own editor uses an array of lines, written in Ruby. I've used it daily for about 8 years now, with the editor itself talking over IPC to a server that holds all the buffers, and it just hasn't been a problem.

It does become a problem if you insist on opening files of hundreds of MB of text, but I simply don't care to treat that as a text editing problem for my main editor: files that size are usually something I only ever want to view, or that are better manipulated with code.

If you want to be able to open and manipulate huge files, you're right, and an editor using these kinds of simple methods isn't for you. That's fine.

As it stands, my editor keeps every file I've opened and not explicitly closed in the last 8 years in memory at all times (currently 5420 buffers; the buffer storage is persisted to disk every minute or so, so if I reboot and open the same file, any unsaved changes are still there unless I explicitly reload), and it usually doesn't even make the top 50 or so memory users on my machine (those are all browser tabs...).

I'm not suggesting people shouldn't use "fancier" data structures when warranted. It's great some editors can handle huge files. Just that very naive approaches will work fine for a whole lot of use cases.

E.g. the 5420 open buffers in my editor currently are there because even the naive approach of never garbage collecting open buffers just hasn't become an issue yet - my available RAM has increased far faster than the size of the buffer storage so adding a mechanism for culling them just hasn't become a priority.

lor_louis|9 months ago

Oh, by "more complex" operations I meant multiple cursors and multi-line regex searches. I've noticed some performance problems in my own editor, but they're mostly because the lines become fragmented: if each line gets its own allocation, lines can end up far away from each other in memory. That's especially true when programming, where lines are relatively short.
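
A minimal sketch of the array-of-lines layout being described, assuming one `malloc`'d buffer per line. All names are hypothetical; this is not kilo's or cedit's code. Inserting a character only moves bytes within that one line, but nothing keeps neighbouring lines' allocations close together in memory.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical array-of-lines buffer: each line owns a separate heap
 * allocation, so consecutive lines may end up far apart in memory. */
struct line {
    char *chars;      /* not NUL-terminated */
    size_t len, cap;
};

struct buffer {
    struct line *lines;
    size_t nlines;
};

/* Insert c at column col of one line; only that line's bytes move. */
static int line_insert_char(struct line *ln, size_t col, char c) {
    if (col > ln->len) col = ln->len;
    if (ln->len + 1 > ln->cap) {
        size_t ncap = ln->cap ? ln->cap * 2 : 16;
        char *p = realloc(ln->chars, ncap);
        if (!p) return -1;
        ln->chars = p;
        ln->cap = ncap;
    }
    memmove(ln->chars + col + 1, ln->chars + col, ln->len - col);
    ln->chars[col] = c;
    ln->len++;
    return 0;
}

/* Append a copy of s as a new line with its own allocation. */
static int buffer_append_line(struct buffer *b, const char *s) {
    struct line *grown = realloc(b->lines, (b->nlines + 1) * sizeof *grown);
    if (!grown) return -1;
    b->lines = grown;
    size_t len = strlen(s), cap = len ? len : 1;
    char *chars = malloc(cap);
    if (!chars) return -1;
    memcpy(chars, s, len);
    b->lines[b->nlines++] = (struct line){chars, len, cap};
    return 0;
}

static void buffer_free(struct buffer *b) {
    for (size_t i = 0; i < b->nlines; i++) free(b->lines[i].chars);
    free(b->lines);
    b->lines = NULL;
    b->nlines = 0;
}
```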

Regex searches and code highlighting might introduce some hitches due to all of the seeking.

pmontra|9 months ago

I'd love to see the code of that editor. Is it publicly available somewhere?

userbinator|9 months ago

> The core data structure (array of lines) just isn't that well suited to more complex operations.

Modern CPUs can read and write memory at dozens of gigabytes per second.

Even when CPUs were 3 orders of magnitude slower, text editors using a single array were widely used. Unless you introduce an accidentally quadratic (or worse) algorithm in your operations, I don't think complex data structures are necessary in this application.
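
For comparison, a minimal sketch of the single-array approach, assuming the whole file lives in one heap buffer (names hypothetical): each insert shifts the tail after the cursor with `memmove`, so a keystroke costs time proportional to the number of bytes after the cursor.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical flat buffer: the whole file in one contiguous array. */
struct flat_buffer {
    char *data;
    size_t len, cap;
};

/* Insert c at byte offset pos, shifting everything after pos right by one.
 * Cost is O(len - pos) bytes moved per keystroke. */
static int flat_insert(struct flat_buffer *b, size_t pos, char c) {
    if (pos > b->len) return -1;
    if (b->len + 1 >= b->cap) {              /* keep room for a trailing NUL */
        size_t ncap = b->cap ? b->cap * 2 : 64;
        char *p = realloc(b->data, ncap);
        if (!p) return -1;
        b->data = p;
        b->cap = ncap;
    }
    memmove(b->data + pos + 1, b->data + pos, b->len - pos);
    b->data[pos] = c;
    b->len++;
    b->data[b->len] = '\0';
    return 0;
}
```

Assuming roughly 20 GB/s of copy bandwidth, shifting a 20 MB tail would take on the order of a millisecond per keystroke.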

lifthrasiir|9 months ago

To be completely unnoticeable, the actual latency budget has to be less than a single frame, so you are in fact limited to moving less than 1 GB per keystroke. And each character may carry additional metadata such as syntax highlighting state, so 1 GB of movable memory doesn't translate to 1 GB of text either. You are still correct that a line-based array is enough for most cases today, but I don't think it's true in general.
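
Spelling out that arithmetic, assuming about 60 GB/s of memory bandwidth (within "dozens of gigabytes per second"), a 60 Hz display, and k bytes of per-character metadata:

```latex
B_{\text{frame}} \approx 60\ \text{GB/s} \times \frac{1}{60}\ \text{s} = 1\ \text{GB},
\qquad
T_{\text{text}} \approx \frac{B_{\text{frame}}}{1 + k}
```

So with one highlight byte per character (k = 1), the per-keystroke budget drops to about 0.5 GB of text.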

lelanthran|9 months ago

> The core data structure (array of lines) just isn't that well suited to more complex operations.

Just how big (and how many lines) does your file have to be before it is a problem? And what are the complex operations that make it a problem?

(Not being argumentative - I'd really like to know!)

In my own text editor (whose sources I lost way back in 2004) I used an array of bytes, had syntax highlighting (using single-byte start/stop codes) and used a moving "window" into the array for rendering. I never saw a latency problem back then on a Pentium Pro, even with files as large as 20MB.
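
A minimal sketch of how in-band start/stop codes and a moving window might fit together. It assumes two hypothetical marker bytes (0x01 and 0x02) that never occur in the text itself. The original sources are lost, so this reconstructs the idea rather than the author's code. Rendering scans only the visible window and strips the markers as it goes.

```c
#include <stddef.h>

/* Hypothetical in-band markers: one byte turns highlighting on, another
 * turns it off. Assumed never to appear in ordinary text. */
enum { HL_START = 0x01, HL_STOP = 0x02 };

/* Render the window buf[from, from + len) into out, stripping markers and
 * recording a highlight flag per visible character in hl. hl_on is the
 * highlight state at from (a real editor would cache it or scan back to the
 * previous marker). The caller ensures from + len is within buf and that
 * out/hl hold at least len bytes. Returns the number of visible characters. */
static size_t render_window(const char *buf, size_t from, size_t len, int hl_on,
                            char *out, unsigned char *hl) {
    size_t n = 0;
    for (size_t i = from; i < from + len; i++) {
        unsigned char c = (unsigned char)buf[i];
        if (c == HL_START) { hl_on = 1; continue; }
        if (c == HL_STOP)  { hl_on = 0; continue; }
        out[n] = (char)c;
        hl[n] = (unsigned char)hl_on;
        n++;
    }
    return n;
}
```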

I am skeptical that the piece table as used in VS Code is that much faster. Right now, on my 2011 desktop, VS Code with no extra plugins has visible latency when scrolling by holding down the up/down arrow keys with a really high keyboard repeat rate. On the same computer, with the same repeat rate and the same file (about 10k lines), Vim in a standard xterm/uxterm scrolls visibly better and takes half as long to get to the end of the file.

ofalkaed|9 months ago

From what I have experienced, the complex data structures used here are more about maintaining responsiveness when overall system load is high, and that may result in slightly slower performance overall. Say you used the variable "x" a thousand times in your 10k lines of code and you want to find and replace it with a more descriptive name like "my_overused_variable". Think about all of the memory copying that happens if all 10k lines are in a single array: every replacement shifts everything after it. If those 10k lines are in 10k arrays, each twice the size of its line, each replacement only shifts the rest of one line, which reduces that a fair amount. It might be slower than simpler methods when the system load is low, but it will stay responsive longer.
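
To put rough numbers on that, here is a hypothetical sketch that counts how many bytes a naive, left-to-right, in-place replace-all would shift. It assumes the replacement is longer than the match, ignores reallocation, and uses plain substring matching (no word boundaries). In one big array each match shifts the rest of the file; with one array per line it only shifts the rest of that line.

```c
#include <stddef.h>
#include <string.h>

/* Bytes shifted by replacing every occurrence of needle in s[0, n), one at
 * a time from left to right. Each replacement moves the original bytes
 * after the match, so the cost of a match at i is n - i - k. */
static size_t shift_cost(const char *s, size_t n, const char *needle) {
    size_t k = strlen(needle), moved = 0;
    if (k == 0 || k > n) return 0;
    for (size_t i = 0; i + k <= n; ) {
        if (memcmp(s + i, needle, k) == 0) {
            moved += n - i - k;   /* tail after this match must shift */
            i += k;
        } else {
            i++;
        }
    }
    return moved;
}

/* Same count, but treating each '\n'-separated line as its own array. */
static size_t per_line_cost(const char *s, const char *needle) {
    size_t moved = 0;
    while (*s) {
        const char *nl = strchr(s, '\n');
        size_t len = nl ? (size_t)(nl - s) : strlen(s);
        moved += shift_cost(s, len, needle);
        s += len + (nl ? 1 : 0);
    }
    return moved;
}
```

A single-pass rebuild into a fresh buffer would avoid most of that copying even with one big array, so this only models the naive in-place approach.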

I think Vim uses a gap structure rather than a single array, but I don't remember.
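
For reference, a generic gap buffer sketch in C. Whether Vim actually uses one is not verified here, and all names are hypothetical. The text sits in one array with an unused gap at the cursor: typing fills the gap, and moving the cursor copies only the bytes between the old and new positions across the gap.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical gap buffer: text in one array with an unused gap at the cursor. */
struct gap_buffer {
    char *buf;
    size_t cap;        /* total array size */
    size_t gap_start;  /* first unused byte (the cursor position) */
    size_t gap_end;    /* one past the last unused byte */
};

static int gb_init(struct gap_buffer *g, size_t cap) {
    g->buf = malloc(cap);
    if (!g->buf) return -1;
    g->cap = cap;
    g->gap_start = 0;
    g->gap_end = cap;
    return 0;
}

/* Move the gap to logical position pos, copying only the bytes in between. */
static void gb_move(struct gap_buffer *g, size_t pos) {
    size_t len = g->cap - (g->gap_end - g->gap_start);
    if (pos > len) pos = len;
    if (pos < g->gap_start) {
        size_t n = g->gap_start - pos;
        memmove(g->buf + g->gap_end - n, g->buf + pos, n);
        g->gap_start -= n;
        g->gap_end -= n;
    } else if (pos > g->gap_start) {
        size_t n = pos - g->gap_start;
        memmove(g->buf + g->gap_start, g->buf + g->gap_end, n);
        g->gap_start += n;
        g->gap_end += n;
    }
}

/* Insert c at the cursor, doubling the array when the gap is used up. */
static int gb_insert(struct gap_buffer *g, char c) {
    if (g->gap_start == g->gap_end) {
        size_t ncap = g->cap ? g->cap * 2 : 16;
        char *p = realloc(g->buf, ncap);
        if (!p) return -1;
        size_t tail = g->cap - g->gap_end;
        memmove(p + ncap - tail, p + g->gap_end, tail);  /* text after gap to end */
        g->buf = p;
        g->gap_end = ncap - tail;
        g->cap = ncap;
    }
    g->buf[g->gap_start++] = c;
    return 0;
}

/* Copy the logical text (without the gap) into out, NUL-terminated. */
static void gb_text(const struct gap_buffer *g, char *out) {
    memcpy(out, g->buf, g->gap_start);
    memcpy(out + g->gap_start, g->buf + g->gap_end, g->cap - g->gap_end);
    out[g->gap_start + g->cap - g->gap_end] = '\0';
}
```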

I am not a programmer, so my experience could very well be due to failings elsewhere in my code, and my reasoning could be hopelessly flawed; hopefully someone will correct me if I am wrong. It has also been a while since I dug into this. The project that got me digging is one of the things that finally got me to make an account on HN, and one of my first submissions was Data Structures for Text Sequences.

https://www.cs.unm.edu/~crowley/papers/sds.pdf

shpx|9 months ago

VS Code used 40-60 bytes per line, so a file with 15 million single-character lines balloons from 30 MB to 600+ MB. kilo uses 48 bytes per line on my 64-bit machine (though you could make it 40 by moving the last int next to the other three ints instead of wasting space on alignment padding), so it would have the same issue.
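
A sketch of the padding arithmetic. It assumes a typical LP64 target (8-byte pointers, 4-byte `int`) and uses field names modeled on the description above rather than copied from kilo's source. Moving the trailing `int` up next to the others removes 8 bytes of padding per row.

```c
#include <stddef.h>

/* Layout as described: three ints, three pointers, one trailing int.
 * On LP64 (assumed), 12 bytes of ints are padded to 16 before the pointers,
 * and the tail is padded from 44 to 48 to keep 8-byte alignment. */
struct row_padded {
    int idx, size, rsize;
    char *chars, *render;
    unsigned char *hl;
    int hl_oc;
};

/* Same fields with the trailing int grouped with the others: four ints fill
 * 16 bytes exactly, so on LP64 no padding is needed and the size is 40. */
struct row_packed {
    int idx, size, rsize, hl_oc;
    char *chars, *render;
    unsigned char *hl;
};

/* Per-row struct overhead in bytes for a file with nlines lines. */
static size_t row_overhead(size_t nlines, size_t row_size) {
    return nlines * row_size;
}
```

At 15 million lines, `row_overhead(15000000, 40)` is 600,000,000 bytes before counting each row's separate character allocations.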

https://github.com/antirez/kilo/blob/323d93b29bd89a2cb446de9...