We were amazed at how far we were able to get with them – if solving a problem with a regular expression produces two problems, we should now have 13,000 problems. That they worked so well is down to the subeditor who compiled (and still maintains!) the rule corpus: beyond the sheer volume, quite a few of the rules are carefully ordered. Style guide matches are relatively sparse in content, and usually fairly specific about what they match (even when producing a correction is difficult). As a result, producing something useful with regular expressions alone turned out to be a surprisingly tractable problem – but we'd never have discovered that unless someone had spent literally years doing it!
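One plausible reading of "carefully ordered" is that a more specific rule runs first and claims its span, so that a broader rule later in the list cannot fire on the same text. The sketch below assumes that first-match-wins policy; the `Rule` class, the `check` function and both percent rules are hypothetical illustrations, not the actual corpus or matching engine.

```python
import re
from dataclasses import dataclass


@dataclass
class Rule:
    # Hypothetical rule shape: a pattern, a replacement template, a note.
    pattern: re.Pattern
    replacement: str
    description: str


def check(text, rules):
    """Apply rules in list order; a span already claimed by an earlier
    rule is skipped by later ones, so ordering decides overlaps."""
    claimed = []  # (start, end) spans matched so far
    matches = []
    for rule in rules:
        for m in rule.pattern.finditer(text):
            if any(m.start() < end and start < m.end() for start, end in claimed):
                continue  # overlaps an earlier, higher-priority match
            claimed.append((m.start(), m.end()))
            # expand() fills group references such as \1 in the template
            matches.append((m.start(), m.end(), m.group(0),
                            m.expand(rule.replacement), rule.description))
    return sorted(matches)


# Hypothetical rules: the specific one must come before the generic one.
rules = [
    Rule(re.compile(r"\b(\d+) percent\b"), r"\1%", "number + percent: use %"),
    Rule(re.compile(r"\bpercent\b"), "per cent", "otherwise spell as two words"),
]

text = "a 5 percent rise in percent terms"
for start, end, found, suggestion, why in check(text, rules):
    print(f"{start}-{end}: {found!r} -> {suggestion!r} ({why})")
```

With the specific rule first, "5 percent" gets the single suggestion "5%". Reversing the list would let the generic rule claim that "percent" instead, which is exactly the kind of interaction that makes rule order matter.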

General maintainability is a priority, and we'd like to improve our rule management tooling so that rule maintenance is accessible to editorial staff generally. We're also working on making noisy rules match more specifically, which usually involves migrating the initial regex into LanguageTool, e.g. to pattern-match on part of speech.
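Part-of-speech constraints help with noisy rules because a bare regex cannot tell a noun from a verb. The toy below mimics the idea in Python, with hand-written Penn Treebank tags standing in for a real tagger; the `match_pos` helper and the "impact used as a verb" rule are hypothetical. In LanguageTool itself this would be expressed declaratively, by constraining a pattern token with a `postag` attribute, and LanguageTool supplies the tagging.

```python
import re

# Pre-tagged tokens (word, Penn Treebank tag). The tags are hand-written
# here as a stand-in for the output of a real part-of-speech tagger.
tokens = [("The", "DT"), ("impact", "NN"), ("was", "VBD"), ("small", "JJ"),
          (",", ","), ("but", "CC"), ("it", "PRP"), ("will", "MD"),
          ("impact", "VB"), ("sales", "NNS")]


def match_pos(tokens, word_re, tag_re):
    """Return indices of tokens whose text matches word_re (case-insensitive)
    and whose tag matches tag_re; both must match the whole token/tag."""
    w = re.compile(word_re, re.IGNORECASE)
    t = re.compile(tag_re)
    return [i for i, (word, tag) in enumerate(tokens)
            if w.fullmatch(word) and t.fullmatch(tag)]


regex_only = match_pos(tokens, r"impact", r".*")   # noun and verb: [1, 8]
pos_aware = match_pos(tokens, r"impact", r"VB.*")  # verb only: [8]
print(regex_only, pos_aware)
```

The regex-only version flags both occurrences; adding the tag constraint leaves only the verb use, which is the sense a hypothetical "avoid impact as a verb" rule actually cares about.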