We learned the hard way, for some of us it's all too easy to make careless design errors that become baked-in and can't be fixed in a backward-compatible way (either at the DSL or API level). An example in Graphviz is its handling of backslash in string literals: to escape special characters (like quotes \"), to map special characters (like several flavors of newline with optional justification \n \l \r) and to indicate variables (like node names in labels \N) along with magic code that knows that if the -default- node name is the empty string that actually means \N but if a particular node name is the empty string, then it stays.There was a published study, Wrangling Messy CSV Files by Detecting Row and Type Patterns by Gerrit J. J. van den Burg, Alfredo Nazábal, and Charles Sutton (Data Mining and Knowledge Discovery, 2019) that showed many pitfalls with parsing CSV files found on GitHub. They achieved 97%. It's easy to write code that slings out some text fields separated by commas, with the objective of using a human-readable portable format.
You can learn even more by allowing autofuzz to test your nice simple code to parse human readable files.
No comments yet.