captainpicard | 3 years ago
Because C is statically typed, I often found the C code more readable than the dynamically typed Perl. Perl, for its part, has more readable string manipulation and automatic memory management.
You can see the Perl code here (single file): https://git.savannah.gnu.org/cgit/texinfo.git/tree/tp/Texinf...
and the C code here (directory): https://git.savannah.gnu.org/cgit/texinfo.git/tree/tp/Texinf...
As I was only doing this in my spare time, it took several years in total, as I recall. The project can be called a success now: the results are used in the program by default and have made it much faster.
The existing test suite was crucial for checking that the new code gave the same results as the old. While developing the new code, I kept a reference file containing the old code, and put comments in the new code giving the corresponding line numbers in that file.
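A minimal sketch of this kind of same-results check, written in C for illustration. Everything in it is hypothetical: the `format_fn` signature, the two toy `collapse_spaces_*` functions standing in for the old and new code, and the fixed 256-byte buffers. It is not how the Texinfo test suite is organised; it only shows the general idea of running both implementations on the same inputs and counting disagreements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical signature shared by both implementations: transform `in`
   into `out`, writing at most `cap` bytes including the terminator. */
typedef void (*format_fn)(const char *in, char *out, size_t cap);

/* Toy "old" implementation: collapse runs of spaces with a state flag. */
void collapse_spaces_old(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    int prev_space = 0;
    if (cap == 0)
        return;
    for (; *in != '\0' && o + 1 < cap; in++) {
        if (*in == ' ') {
            if (!prev_space)
                out[o++] = ' ';
            prev_space = 1;
        } else {
            out[o++] = *in;
            prev_space = 0;
        }
    }
    out[o] = '\0';
}

/* Toy "new" implementation: same behaviour, written with strspn(). */
void collapse_spaces_new(const char *in, char *out, size_t cap)
{
    size_t o = 0;
    if (cap == 0)
        return;
    while (*in != '\0' && o + 1 < cap) {
        out[o++] = *in;
        if (*in == ' ')
            in += strspn(in, " ");  /* skip the whole run of spaces */
        else
            in++;
    }
    out[o] = '\0';
}

/* Run both implementations on every input and return how many inputs
   produce different output. Each mismatch is reported on stderr. */
size_t count_mismatches(format_fn reference, format_fn candidate,
                        const char *const *inputs, size_t n)
{
    size_t mismatches = 0;
    assert(reference != NULL && candidate != NULL);
    for (size_t i = 0; i < n; i++) {
        char expected[256] = "";
        char actual[256] = "";
        reference(inputs[i], expected, sizeof expected);
        candidate(inputs[i], actual, sizeof actual);
        if (strcmp(expected, actual) != 0) {
            fprintf(stderr, "input %zu: expected \"%s\", got \"%s\"\n",
                    i, expected, actual);
            mismatches++;
        }
    }
    return mismatches;
}
```

Calling `count_mismatches(collapse_spaces_old, collapse_spaces_new, inputs, n)` on a list of test strings returns 0 when the two versions agree on all of them.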
You have to decide what approach to take and whether it is worth it. Maintaining two parallel sets of code in different languages and trying to maintain compatibility clearly has its costs, as well as benefits in terms of increased scrutiny of the code. However, I do feel that a lot of time can be spent on issues that aren't very important in trying to get exact compatibility for use cases that are quite unlikely.
Before this rewrite we had also achieved a very significant speed-up (about 30%) by rewriting a smaller part of the program in C, just the plaintext paragraph formatter.
Character encoding issues are a huge time sink - it could be as much as 50% of the work.
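One generic example of the kind of mismatch that can come up when moving string code from Perl to C (an illustration only, not necessarily a problem the Texinfo code hit): on a decoded string Perl's `length` counts characters, while C's `strlen` counts bytes. The helper name `utf8_codepoints` below is made up for this sketch, and it assumes the input is valid UTF-8.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count code points in a UTF-8 string by skipping continuation bytes
   (bytes of the form 10xxxxxx). Assumes the input is valid UTF-8. */
size_t utf8_codepoints(const char *s)
{
    size_t n = 0;
    assert(s != NULL);
    for (; *s != '\0'; s++) {
        if (((unsigned char)*s & 0xC0) != 0x80)
            n++;
    }
    return n;
}
```

For the UTF-8 string `"caf\xC3\xA9"` ("café"), `strlen` gives 5 bytes while `utf8_codepoints` gives 4 code points. Neither number is the display width, which is a separate question again.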
Before I got involved with the project, in 2010 there had been a complete rewrite of the makeinfo program from C to Perl. (The main developer of Texinfo gave a talk in 2011: https://www.gnu.org/ghm/2011/paris/#sec-2-4 ) The main downside was that it made the program much slower (about 50 times as slow, which was unacceptable for some users). It also did not appear to attract many more contributors to the code. The upsides were better structuring of the code, which allows more functionality to be added in the form of new output formats (although this is only happening now), better test coverage, and better treatment of different input cases.