scrapheap | 6 months ago
At the end of the day I had a Perl script that used a regex to extract each top-level element from the XML and then attempted to parse it. If the element parsed correctly, it was put in a known-good file; if it didn't, it was put in its own separate file. Luckily there were only a handful of those invalid XML elements, which I could fix up by hand and then stitch back into the known-good XML file.
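A rough sketch of that split-and-sort approach, written in Python rather than Perl since the original script isn't shown. The function names, the output file names, and the assumption that every top-level element shares one known tag name and never nests inside itself are all mine, not details from the actual script.

```python
import re
import xml.etree.ElementTree as ET
from pathlib import Path


def split_elements(xml_text, tag):
    """Return (good, bad) lists of raw element strings for the given tag.

    Assumptions (hypothetical, not from the original script): elements named
    `tag` do not nest inside each other, and they use no namespace prefixes or
    custom entities that are only declared elsewhere in the file.
    """
    name = re.escape(tag)
    # Match either a self-closing element or an open tag up to the nearest close tag.
    pattern = re.compile(
        rf"<{name}(?=[\s/>])[^>]*?(?:/>|>.*?</{name}\s*>)", re.DOTALL
    )
    good, bad = [], []
    for match in pattern.finditer(xml_text):
        chunk = match.group(0)
        try:
            ET.fromstring(chunk)  # parse the element on its own
        except ET.ParseError:
            bad.append(chunk)
        else:
            good.append(chunk)
    return good, bad


def write_results(good, bad, out_dir):
    """Write good elements to one file and each bad element to its own file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    (out / "known_good.xml").write_text("\n".join(good), encoding="utf-8")
    for i, chunk in enumerate(bad, start=1):
        (out / f"invalid_{i:04d}.xml").write_text(chunk, encoding="utf-8")
```

One caveat with this kind of regex: an element whose closing tag is missing entirely will swallow the following element up to the next closing tag, so both land in the same invalid file and need separating by hand.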