top | item 34916476

(no title)

landric | 3 years ago

I _had_ no good answer for the Google News result until you prompted me to Inspect source just now...

I'm basically scanning for <a> tags and searching the text within. Doing a Google News inspect, it appears that their links actually have no text, but are sibling elements of an <h#> tag. So, I need to figure out how to parse that correctly...

discuss

order

filoleg|3 years ago

> Doing a Google News inspect, it appears that their links actually have no text, but are sibling elements of an <h#> tag. So, I need to figure out how to parse that correctly...

I just checked Google News myself, and you are correct that the sibling <h#> tag has the text. However, the <a> tag with the link has it too, but as a prop instead of being nested inside. Unless I am mistaken about the use case of that prop here, you can just extract the text from the aria-label property of the <a> tag.

And in case you want to proceed with parsing text from the sibling <h#> tag instead, you can just get the list of the parent <article> tag children nodes (yourAnchorTagNode.parentNode.parentNode.children; had to do a double .parentNode, because the <a> tag is wrapped in a singular <div> tag) and then search for the only <h#> tag there. That will be your target tag with the text.

landric|3 years ago

Yep, that's right.

I was _hoping_ to get away with the same xml-parsing for each site, but I guess I'll need to customize