top | item 22498745

donnellycc | 6 years ago

How do you deal with pages that use JavaScript to fetch page content after the initial markup is loaded?

JohnFen|6 years ago

I'm not 3xblah, obviously, but I deal with those sites by not using them.

code_duck|6 years ago

I wish that was a choice for me. Often I have to interact with websites for services such as phone accounts, banks or taxes where I don’t have the reasonable option of choosing to not use the site.

3xblah|6 years ago

"How do you deal with pages that use JavaScript to fetch page content after the initial markup is loaded?"

Provide an example page and I will demonstrate how I would solve the problem.

Not every user visits the same websites and web pages, so without giving specific examples, discussions about how to deal with these pages never go anywhere on HN.

To be honest, out of all the websites I have visited over my entire lifetime using the www, the number where I have had to make any extra effort because of JavaScript in order to retrieve some text/html, image or video is a very small proportion. Not one that is large enough to justify using a JavaScript-enabled browser as default. For me, these are exceptional cases, not the norm.

The extra effort is usually a one-off script, not something I need to save.

Occasionally it is something I save for future use. One example of a saved script would be for non-commercial YouTube channels. The goal was a 2-column CSV of all videos from a channel in the form of title, url. The goal was not "perfection", just a quick solution.

yy025 and yy032 are custom utilities for generating HTTP and decoding HTML, respectively.

Using a short script called "ytc" the process would be something like the following. openssl s_client is used as an example of a TLS client. "XYZ" is the name of the channel.

   echo https://www.youtube.com/channel/XYZ/videos|ytc|sed wXYZ

   Connection=keep-alive yy025 < XYZ|openssl s_client -connect www.youtube.com:443 -servername whatever -ign_eof > 1.html

   ytc title < 1.html > XYZ.1

   ytc url < 1.html > XYZ.2

   paste -d, XYZ.[12] > XYZ.csv
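
The last step above can be tried without any network access. The following is a minimal sketch, assuming two small sample files stand in for the output of "ytc title" and "ytc url"; the function name join_csv and the sample data are hypothetical, not part of the original script.

```shell
# join_csv is a hypothetical helper: it pairs line N of the first file
# with line N of the second, separated by a comma, as "paste -d," does above.
join_csv() {
  paste -d, "$1" "$2"
}

# Made-up sample data standing in for XYZ.1 (titles) and XYZ.2 (urls).
tmp=$(mktemp -d)
printf '%s\n' 'First video' 'Second video' > "$tmp/XYZ.1"
printf '%s\n' 'https://www.youtube.com/watch?v=aaa' \
              'https://www.youtube.com/watch?v=bbb' > "$tmp/XYZ.2"

join_csv "$tmp/XYZ.1" "$tmp/XYZ.2"
rm -r "$tmp"
```

Note that a title containing a comma would produce more than two columns, which is consistent with the stated goal of a quick solution rather than perfection.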

Here is the "ytc" script:

   case $1 in
   # no argument: read a channel URL on stdin, print each browse_ajax continuation URL
   "")exec 2>/dev/null;
   export Connection=close;
   yy025|openssl s_client -connect www.youtube.com:443 -servername whatever -ign_eof |sed 's/%25/%/g'|yy032 > 1.tmp;
   while true;do
   # pull the next continuation URL out of the previous response
   x=$(sed 's/%25/%/g;s/\\//g' 1.tmp|yy032|grep -o "[^\"]*browse_ajax[^\"\\]*" |sed 's/u0026amp;/\&/g;s/&direct_render=1//;s,^,https://www.youtube.com,')
   echo > 1.tmp;
   # stop once no plausible continuation URL is found
   test ${#x} -gt 100||break;
   echo "$x";
   echo "$x"|yy025|openssl s_client -connect www.youtube.com:443 -ign_eof > 1.tmp;
   done;rm 1.tmp

   # the ? is escaped so that it matches a literal "-?" rather than any two-character option
   ;;-h|-\?|-help|--help)echo usage: echo https://www.youtube.com/user/XYZ/videos \|$0;echo "usage: $0 {title|url} < html-file"
   # title: print the video titles found in the HTML on stdin
   ;;1|title) sed 's/\\//g;s/u0026amp;//g;s/u0026quot;//g;s/u0026#39;//g'|grep -o "ltr\" title=\"[^\"]*"|sed 's/ltr..title=.//'
   # url: print the absolute watch URLs found in the HTML on stdin
   ;;2|url) sed 's/\\//g;s/u0026amp;//g;s/u0026quot;//g'|grep -o "[^\"]*watch?v=[^\"]*" |sed 's,^,https://www.youtube.com,'|uniq
   esac
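
To see what the "title" and "url" branches do without contacting YouTube, here is a small sketch run against made-up input. The sample markup is an assumption, loosely modeled on what the patterns in the script expect (one plain HTML anchor and one JSON-escaped anchor); it is not real YouTube output. The function names sample, titles and urls are hypothetical, and the entity-stripping steps of the original are left out for brevity.

```shell
# Hypothetical input; the quoted heredoc delimiter keeps the backslashes intact.
sample() {
cat <<'EOF'
<a dir="ltr" title="First video" href="/watch?v=aaa">First video</a>
<a dir=\"ltr\" title=\"Second video\" href=\"\/watch?v=bbb\">Second video<\/a>
EOF
}

# Same idea as "ytc title": drop backslashes, match the title attribute,
# then strip the leading 'ltr" title="' from each match.
titles() {
  sed 's/\\//g' | grep -o 'ltr" title="[^"]*' | sed 's/ltr..title=.//'
}

# Same idea as "ytc url": drop backslashes, match the quote-free run
# containing watch?v=, then prefix the host to make the URL absolute.
urls() {
  sed 's/\\//g' | grep -o '[^"]*watch?v=[^"]*' | sed 's,^,https://www.youtube.com,' | uniq
}

sample | titles
sample | urls
```

Removing the backslashes first is what lets the same patterns work on both the initial HTML page and the escaped markup embedded in later responses.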