item 45774927 (no title)
latenightcoding|4 months ago when I used to crawl the web, battle-tested Perl regexes were more reliable than anything else, commented-out URLs would have been added to my queue.
rightbyte|4 months ago DOM navigation for fetching some data is for tryhards. Using a regex to grab the correct paragraph or div or whatever is fine and is more robust against things moving around on the page.
chaps|4 months ago Doing both is fine! Just, once you've figured out your regex and such, hardening/generalizing demands DOM iteration. It sucks but it is what it is.
horseradish7k|4 months ago but not when crawling. you don't know the page format in advance - you don't even know what the page contains!
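To make the trade-off in this thread concrete, here is a minimal Python sketch (not anyone's actual crawler) that contrasts the two approaches on a made-up page. The sample HTML, the deliberately simple URL pattern, and the names `urls_by_regex`, `LinkCollector`, and `urls_by_parser` are all assumptions for illustration. The regex pass picks up the URL inside the HTML comment, while the parser pass only sees `href` values on real `<a>` tags.

```python
import re
from html.parser import HTMLParser

# Hypothetical sample page: one live link and one link inside an HTML comment.
SAMPLE_HTML = """
<html><body>
  <p>See <a href="https://example.com/live">the live page</a>.</p>
  <!-- <a href="https://example.com/hidden">old link</a> -->
</body></html>
"""

# Deliberately simple pattern (an assumption); a real crawler needs more care.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def urls_by_regex(html):
    """Return every URL-looking substring, wherever it appears in the text."""
    return URL_RE.findall(html)


class LinkCollector(HTMLParser):
    """Collect href values from <a> start tags only."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        # Commented-out markup never reaches this handler;
        # the parser routes it to handle_comment instead.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def urls_by_parser(html):
    """Return href values found by walking the parsed tags."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print("regex: ", urls_by_regex(SAMPLE_HTML))
    print("parser:", urls_by_parser(SAMPLE_HTML))
```

Whether picking up the commented-out link counts as a feature or a bug depends on the crawl: for queue discovery on unknown pages it can be useful, while for extracting a specific field the parser's stricter view is usually easier to harden.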