
Show HN: Simple Web service for fetching HTML page title

8 points| harrywye | 14 years ago |pagesynopsis.com | reply

13 comments

[+] harrywye|14 years ago|reply
Thanks for checking the page, bpfh. I'm not sure what's happening. I haven't tried viewing the Web site on a Mac, but I can view it on both Windows and Linux (Ubuntu) with Chrome, FF, and IE8. Can you please try again and let me know if you still have a problem? The Web site (which runs in the same JVM as the Web service) uses Twitter Bootstrap, and it works from my Android devices as well. Thanks!
[+] ashleyw|14 years ago|reply
It's working fine for me on Chrome 20 dev on OS X.

BTW: Use the reply links under comments. :)

[+] LiquidSummer|14 years ago|reply
[+] harrywye|14 years ago|reply
D*ng, LiquidSummer. I've been using this service for almost a year with no problem, and you broke it. :)

It appears that, because the service ends up calling itself recursively, the call eventually times out. (Google App Engine has a time limit of 10 to 30 seconds.) I'm not sure if I'll have a solution for this, but I can at least catch the exception. I'll need to look into it further.

Thanks for finding this bug!
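
A minimal sketch of the two mitigations mentioned above (refusing a request that points back at the service, and catching the timeout), written in Python for illustration only. The host check, the `fetch_title` callable, and the shape of the returned dictionary are all assumptions; nothing here is the service's actual code.

```python
from urllib.parse import urlsplit

# Assumption: the service's own host, taken from the thread header.
SERVICE_HOST = "pagesynopsis.com"


def is_self_referential(target_url):
    """Return True if target_url points back at the service itself."""
    host = urlsplit(target_url).hostname or ""
    return host == SERVICE_HOST or host.endswith("." + SERVICE_HOST)


def safe_fetch_title(target_url, fetch_title):
    """Call fetch_title(target_url), refusing self-calls and catching timeouts.

    fetch_title is a hypothetical callable (url -> title) supplied by the caller.
    """
    if is_self_referential(target_url):
        return {"url": target_url, "title": None,
                "error": "self-referential request refused"}
    try:
        return {"url": target_url, "title": fetch_title(target_url), "error": None}
    except TimeoutError:
        # Report the failure instead of letting the exception escape.
        return {"url": target_url, "title": None, "error": "fetch timed out"}
```

Checking the host before fetching avoids the recursive call entirely, while the `except` branch only turns a timeout into a reportable error.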

[+] givan|14 years ago|reply
I don't get it. Is this a service for developers to get the page title of an HTML page? So I must make a request to this service and learn its API, instead of making the request directly to that page and running a very simple regex?
[+] harrywye|14 years ago|reply
Givan, yes and no. Clearly, you're right in that it's one more thing to learn (a particular variant of REST API). But there can be many benefits to using a Web service like this (or another app/service/abstraction layer, etc.) in some situations. For example, suppose that you want to get the meta description of a certain Web page using JavaScript (from your own HTML page). The Web page may happen to be large, and there can be substantial network latency, etc. In many cases, you do not want to do it on the fly every time your page is loaded. You may want to implement a storage or caching layer on the server side, etc. PageSynopsis provides such a service "out of the box". It also supports "asynchronous" fetching, periodic refreshing, and so forth. This is a very simple service, but I use it from different apps of mine (and I don't have to replicate this functionality across different apps). Thanks for checking it out.
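
To make the caching idea concrete, here is a rough Python sketch of a server-side cache with a time-to-live. It is not how PageSynopsis is implemented; `TitleCache`, `fetch_title`, and the one-hour default are hypothetical names and values chosen for illustration.

```python
import time


class TitleCache:
    """Cache fetched titles for ttl_seconds before fetching them again."""

    def __init__(self, fetch_title, ttl_seconds=3600, clock=time.monotonic):
        self._fetch_title = fetch_title  # hypothetical callable: url -> title
        self._ttl = ttl_seconds
        self._clock = clock              # injectable so expiry can be tested
        self._entries = {}               # url -> (expires_at, title)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]  # still fresh: no network call
        title = self._fetch_title(url)
        self._entries[url] = (now + self._ttl, title)
        return title
```

With something like this in front of the fetch, repeated page loads within the TTL reuse the stored title instead of hitting the remote page each time.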
[+] picklepete|14 years ago|reply
Parsing HTML with a RegEx pattern is considered very bad practice; there are other, more robust ways of scraping. For example, in Python, there is BeautifulSoup.

You do have a fair point, though: this functionality does exist in most major programming languages.
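
As a rough illustration of parser-based extraction, here is a sketch that uses Python's standard-library `html.parser` instead of BeautifulSoup, so that it runs without extra packages. The names `TitleParser` and `extract_title` are made up for this example.

```python
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collect the text of the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self._parts = []

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <TITLE> matches too.
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True  # ignore any later <title> elements

    def handle_data(self, data):
        if self._in_title:
            self._parts.append(data)

    @property
    def title(self):
        # Collapse runs of whitespace; return None if nothing was found.
        return " ".join("".join(self._parts).split()) or None


def extract_title(html_text):
    """Return the page title from html_text, or None if there is none."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return parser.title
```

Calling `extract_title(page_source)` on a fetched page handles uppercase tags, attributes on the tag, and titles spread over several lines, which are the cases where a quick regex tends to break.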

[+] sixcorners|14 years ago|reply
Are you telling me you wouldn't use a JSON API for sending HTTP requests and parsing the results as JSON?
[+] bpfh|14 years ago|reply
On Tue May 22 20:03:29 EEST 2012, all I get is a page with navigation and a greyish pattern background. Could not figure out what to do with it. This is with Chrome and FF on a Mac.