landric | 3 years ago
I'm basically scanning for <a> tags and searching the text within. Inspecting Google News shows that their links actually contain no text, but are sibling elements of an <h#> tag. So, I need to figure out how to parse that correctly...
filoleg|3 years ago
I just checked Google News myself, and you are correct that the sibling <h#> tag has the text. However, the <a> tag with the link has it too, but as an attribute instead of being nested inside. Unless I am mistaken about how that attribute is used here, you can just extract the text from the aria-label attribute of the <a> tag.
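For illustration, here is a minimal sketch of the aria-label approach using only Python's standard-library html.parser. The SAMPLE markup, the AnchorLabelParser class, and the extract_link_labels helper are all hypothetical; the real Google News markup may differ and should be checked before relying on this.

```python
from html.parser import HTMLParser

# Hypothetical markup modelled on the structure described in this thread;
# it is not copied from Google News.
SAMPLE = """
<article>
  <div><a href="./articles/abc" aria-label="Example headline"></a></div>
  <h3>Example headline</h3>
</article>
"""


class AnchorLabelParser(HTMLParser):
    """Collects (href, aria-label) pairs from <a> start tags."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        attr_map = dict(attrs)  # attrs arrives as a list of (name, value) pairs
        label = attr_map.get("aria-label")
        if label:  # skip anchors without a usable label
            self.links.append((attr_map.get("href"), label))


def extract_link_labels(html):
    """Return a list of (href, aria-label) tuples found in the HTML string."""
    parser = AnchorLabelParser()
    parser.feed(html)
    parser.close()
    return parser.links


if __name__ == "__main__":
    print(extract_link_labels(SAMPLE))
```

Because the label lives on the <a> tag itself, this needs no knowledge of the surrounding elements, so it could plausibly sit alongside the "text inside the link" check used for other sites.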
And in case you want to parse the text from the sibling <h#> tag instead, you can just get the list of child nodes of the parent <article> tag (yourAnchorTagNode.parentNode.parentNode.children; it takes a double .parentNode because the <a> tag is wrapped in a single <div> tag) and then search for the only <h#> tag there. That will be your target tag with the text.
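A rough sketch of the same parent-then-heading walk in Python, assuming the page can be read by an XML parser as mentioned elsewhere in the thread. xml.etree.ElementTree does not keep parent pointers, so the sketch builds a child-to-parent map first. The SAMPLE markup and the headline_for_links helper are hypothetical, and the two-level climb mirrors the double .parentNode described above.

```python
import re
import xml.etree.ElementTree as ET

HEADING = re.compile(r"^h[1-6]$")  # matches h1 through h6

# Hypothetical, well-formed markup modelled on the structure described above.
SAMPLE = """<main>
  <article>
    <div><a href="./articles/abc"/></div>
    <h3>Example headline</h3>
  </article>
  <article>
    <div><a href="./articles/def"/></div>
    <h4>Another headline</h4>
  </article>
</main>"""


def headline_for_links(markup):
    """Return (href, heading text) for each <a> whose grandparent holds one heading."""
    root = ET.fromstring(markup)
    # ElementTree has no parentNode, so map every child to its parent up front.
    parent_of = {child: parent for parent in root.iter() for child in parent}

    results = []
    for anchor in root.iter("a"):
        wrapper = parent_of.get(anchor)  # the wrapping <div>
        article = parent_of.get(wrapper) if wrapper is not None else None
        if article is None:
            continue
        # Look only at direct children of the <article>, like .children in the DOM.
        headings = [el for el in article if HEADING.match(el.tag)]
        if len(headings) == 1:
            text = "".join(headings[0].itertext()).strip()
            results.append((anchor.get("href"), text))
    return results


if __name__ == "__main__":
    for href, text in headline_for_links(SAMPLE):
        print(href, text)
```

The fixed two-level climb is tied to this particular layout; a different site would need its own path from the link to the heading.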
landric|3 years ago
I was _hoping_ to get away with the same XML parsing for each site, but I guess I'll need to customize.