
Show HN: TLDR This – Auto summarize any article or webpage in a click

118 points | radhakrsna | 6 years ago | tldrthis.com

59 comments

blobster | 6 years ago
Nice landing page. If you Google "summarizer", you will find dozens of similar services for free. The mechanism behind them is very simple. A couple of years ago I built one from scratch in about 2 hours, then I accidentally deleted it and rewrote it in 15 minutes. Here's how most of them work:

1. Split the text into words

2. Rank each word based on how many times it appears in the text. For example, a word that appears 10 times gets 10 points, and so on.

3. Rank sentences based on the sum of the scores of each word inside them.

4. Return the top N sentences by score (N is up to the user), in the order in which they appear in the text.

For extra fanciness, exclude the most common articles and prepositions, and give 2 points to proper nouns.

Works surprisingly well.
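The four steps above, plus the stop-word and proper-noun tweak, can be sketched in Python roughly as follows. This is a minimal illustration, not TLDR This's implementation: the regex sentence splitter, the tiny `STOP_WORDS` set, and the "capitalized word that is not the first word of its sentence" test for proper nouns are all simplifying assumptions.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list (assumption; real tools use far larger ones).
STOP_WORDS = {"a", "an", "and", "the", "of", "in", "on", "at", "to", "for", "by", "with", "from"}


def split_sentences(text):
    # Naive splitter (assumption): break after ., ! or ? followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s.strip()]


def tokenize(sentence):
    return re.findall(r"[A-Za-z']+", sentence)


def summarize(text, n=3):
    sentences = split_sentences(text)
    # Steps 1-2: score each word by how often it appears. Stop words are skipped,
    # and likely proper nouns (heuristic, see above) earn 2 points per occurrence.
    points = Counter()
    for s in sentences:
        for pos, word in enumerate(tokenize(s)):
            key = word.lower()
            if key in STOP_WORDS:
                continue
            points[key] += 2 if pos > 0 and word[0].isupper() else 1
    # Step 3: a sentence's score is the sum of its words' scores
    # (stop words contribute 0 because Counter returns 0 for missing keys).
    scores = [sum(points[w.lower()] for w in tokenize(s)) for s in sentences]
    # Step 4: keep the n best sentences, then restore document order.
    best = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:n]
    return [sentences[i] for i in sorted(best)]
```

For example, `summarize("Python is popular. Python code is readable. Snakes bite.", n=1)` picks the middle sentence, because it contains both of the repeated words.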

bhl | 6 years ago
You can use tf-idf [1] to achieve step 2 and the extra fancy part of excluding common articles and prepositions: count the frequency of each word in the article, but scale it down according to how many past articles also contain that word.
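A minimal tf-idf weighting sketch, assuming the "past articles" are available as a list of strings. It uses the smoothed variant idf = log((1 + N) / (1 + df)), one common choice among several, so a word that appears in every past article (such as "the") scores zero. The function name `tfidf_scores` is illustrative.

```python
import math
import re
from collections import Counter


def tokenize(text):
    return re.findall(r"[a-z']+", text.lower())


def tfidf_scores(article, past_articles):
    """Weight each word in `article` by tf * idf.

    tf  = number of times the word appears in the article
    idf = log((1 + N) / (1 + df)), where N is the number of past articles and
          df is how many of them contain the word (smoothing is an assumption)
    """
    tf = Counter(tokenize(article))
    doc_sets = [set(tokenize(doc)) for doc in past_articles]
    n_docs = len(doc_sets)
    scores = {}
    for word, count in tf.items():
        df = sum(1 for words in doc_sets if word in words)
        scores[word] = count * math.log((1 + n_docs) / (1 + df))
    return scores
```

These scores can replace the raw word counts in step 2 above; sentence scoring and selection stay the same.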

Text summarization works as a good toy problem because it leads to two harder problems: 1. text extraction (how to distinguish content from non-content like ads) and 2. Q&A (given text and a question about the text, how can you produce an answer).

[1] https://en.wikipedia.org/wiki/Tf%E2%80%93idf

radhakrsna | 6 years ago
True, there are quite a few similar services, but not many seem to work well. Our service provides better summarization (at least for the articles I tested), has additional features like extracting the author name, publish date, and important keywords, and also comes with browser extensions so you can summarize pages at the click of a button.

The method you described is part of our algorithm, but more steps are needed to produce meaningful results and to make sure it works on different kinds of articles.

shiredude95 | 6 years ago
I built a similar service a while back, with a small modification to the common algorithm.

You can improve contextual summarization by splitting the x sentences into x/n buckets of n sentences each. Then, based on the percentage of the article to return (e.g. 60%), pick the sentences ranked in the top 60% of each bucket. Do the same across all x sentences, i.e. take the top 60% globally, and combine the two selections.

This counters the bias toward picking sentences that simply contain a lot of critical words.
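One way to read the bucketing scheme above, as a sketch: `scores` are per-sentence scores from any ranking (such as the word-frequency sums in the top comment), each bucket holds `bucket_size` consecutive sentences, and "combine" is interpreted as the union of the per-bucket and global selections. That interpretation, the rounding up with `math.ceil`, and the function name are all assumptions.

```python
import math


def bucketed_selection(scores, bucket_size, keep_ratio):
    """Return sentence indices chosen by per-bucket and global top-fraction picks.

    scores      -- one relevance score per sentence, in document order
    bucket_size -- n, the number of consecutive sentences per bucket (assumption)
    keep_ratio  -- fraction to keep, e.g. 0.6 for "return 60% of the article"
    """
    def top_fraction(indices):
        # Keep at least one sentence per group; round up partial counts.
        k = max(1, math.ceil(len(indices) * keep_ratio))
        return sorted(indices, key=lambda i: scores[i], reverse=True)[:k]

    all_idx = list(range(len(scores)))
    per_bucket = set()
    for start in range(0, len(scores), bucket_size):
        per_bucket.update(top_fraction(all_idx[start:start + bucket_size]))
    global_top = set(top_fraction(all_idx))
    # Assumption: "combine" means union; result is returned in document order.
    return sorted(per_bucket | global_top)
```

With scores `[9, 8, 7, 1, 1, 1]`, buckets of 3 and a 30% ratio, the global pick alone would keep only the first two sentences; the per-bucket pick also pulls in one sentence from the low-scoring second half.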

mannykannot | 6 years ago
I agree that the effectiveness is quite surprising, given the simplicity of the analysis. It can go very wrong, however, if a significant negation is overlooked, as in cautionary tales:

... So, don't do what the late Thag Simmons did...

Maybe final paragraphs should be more highly weighted? That's often where the conclusion is.
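The suggestion to weight the ending more heavily could be tried as a post-processing step on sentence scores. This hypothetical `position_weighted` helper works on sentences rather than paragraphs, and the 20% tail and 1.5x boost are arbitrary assumptions, not tuned values. It does nothing about the negation problem, which word counting cannot see.

```python
def position_weighted(scores, tail_fraction=0.2, boost=1.5):
    """Multiply the scores of sentences in the last `tail_fraction` of the
    document by `boost`. Both defaults are arbitrary assumptions."""
    n = len(scores)
    # Index of the first sentence in the boosted tail (at least one sentence).
    tail_start = n - max(1, round(n * tail_fraction)) if n else 0
    return [s * boost if i >= tail_start else s for i, s in enumerate(scores)]
```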

thunderbong | 6 years ago
I really want to like this. My points:

1. From the articles I tried, the summaries seem to be very basic. They don't seem to capture the essential points of the articles they are trying to summarize.

2. I tried the 'advanced summarizer' too. Here again, it seems to have the same problem. Worse, it seems to skip parts of the article, especially if they are beyond a certain length.

3. The landing page is nice. But the product seems to be targeted at people who want to share a summary of a random blog post rather than at people who want to save time reading the article.

In my opinion, SMMRY, as mentioned in another comment and which I've used since whenever, seems like a much better product. Additionally, SMMRY also gives you the capability of expanding the number of lines of the summary in case you've found the article interesting and want to read a few more details of it, rather than the full thing.

SMMRY: https://smmry.com/

txcwpalpha | 6 years ago
Has any thought been put into whether this type of service is actually beneficial?

Of course, on its face it seems nice that it saves us time. But it's no secret that reducing complicated topics to simplified one-liners leads to less understanding and more misinformation.

In my opinion, this just makes that problem worse. There is often a reason that texts aren't already shorter. If the author didn't intend for you to read the details of something and instead wanted you to just read bullet points, they would have just made the bullet points themselves.

dullroar | 6 years ago
Didn't do a great job on this AP News article on coronavirus - only four bullet points, two of them repeated: https://apnews.com/545af824f44a22f7559c74679a4f1f53.
thunderbong | 6 years ago
I tried the advanced summarizer and got the lines below. Seems to me it skips summarizing beyond a certain length of the article.

Most people have had mild to moderate illness and recovered, but the virus is more serious for those who are older or have other health problems.

The risk of virus transmission from food servers is the same risk as transmission from other infected people, but “one of the concerns in that food servers, like others facing stark choices about insurance and paychecks, may be pressured to work even if they are sick,” she said.

Tests have found high amounts of virus in the throats and noses of people a couple days before they show symptoms. Flu kills about 0.1% of those it infects, so the new virus seems about 10 times more lethal, the National Institutes of Health’s Dr. Anthony Fauci told Congress last week.

The death rate has been higher among people with other health problems -- more than 10% for those with heart disease, for example.

radhakrsna | 6 years ago
The Basic Summarizer has its restrictions. Try the Advanced Summarizer; it will give better results.
kdbg | 6 years ago
I didn't have high hopes for summarizing a technical paper, but since I spend a good chunk of time every week reading papers on exploits and mitigations for a podcast I host, I hoped this might reduce the time it takes to get an overall understanding before diving into the details.

It actually did better than I expected with the paper "Bypassing memory safety mechanisms through speculative control flow hijacks" [0].

I pasted in the text of Sections 3-7 (Case Studies through Conclusion), and separately Section 2 on its own (which describes the attack).

It did pull out some important statements, better than I expected. It probably won't save me much time, but I was quite disappointed that the Advanced and Basic versions gave the same result for both inputs, which felt a bit cheap, especially since the advanced result still costs money. Maybe explaining how the basic version is restricted and what the advanced one does better would make it easier to tell when the advanced version won't be useful.

I also tested a random write-up I'll be covering tomorrow, "Breaking the Competition" [1]. I had higher hopes for this one since it was more of a blog-style post. I did get different results from Basic and Advanced this time, but both were basically nonsense: worse than expected, and worse than the paper summary.

Overall, probably not something I'll end up using, but technical content also isn't the intended use case, which is totally fair. I'll also add that one feature I looked for immediately was API access, as I'd have wanted to integrate this into an app I use to plan episodes.

- [0] https://arxiv.org/pdf/2003.05503v1.pdf

- [1] https://medium.com/ctf-writeups/breaking-the-competition-bug...

stared | 6 years ago
I tried running it four times, and each time the result was semi-random: it looked like it picked four random sentences that open paragraphs. Not once would I consider the output even useful-ish.
radhakrsna | 6 years ago
Can you please let me know the article that you tested it on? Maybe you could try the advanced summarizer and see if it gives useful results.
A4ET8a8uTh0 | 6 years ago
I love the idea. The implementation did not produce the expected results (article shown on HN: https://amp-economist-com.cdn.ampproject.org/c/s/amp.economi...).

That said, keep at it. It seems like a viable and valuable service.

thunderbong | 6 years ago
Using the advanced summarizer, I got this:

On March 9th America’s government awarded a trio of firms $39.7m to design “microreactors” that can supply a few megawatts of power to remote military bases, and be moved quickly by road, rail, sea and air. The idea of small reactors is as old as nuclear power itself.

In July 1951, five months before a reactor in Idaho became the first in the world to produce usable electricity through fission, America began building USS Nautilus, a nuclear-powered submarine.

A report by the army in 2018 said that Holos, a prototype mobile nuclear reactor, would be 62% cheaper than using liquid fuel.

NASA is developing smaller “Kilopower” reactors for space missions, designed to power small lunar outposts.

radhakrsna | 6 years ago
Glad that you liked it. Have you tested it out with the advanced summarizer? Thank you very much for your encouragement. I will try to keep improving it.
personjerry | 6 years ago
Interesting stuff. Have you succeeded in getting paying customers for such a service? I've seen some similar free alternatives online, e.g. Resoomer and SMMRY.
radhakrsna | 6 years ago
v1 of our service was free as well. v2 includes a basic summarizer which is free and an advanced summarizer which requires payment. Just launched the premium version, so waiting to see.
tetrisgm | 6 years ago
I’ve noticed an increase in services like this lately. What gives? Is there some sort of ML serverless offering made available on GCP?
ericlewis | 6 years ago
Not exactly “serverless”, but I built something similar with AWS SageMaker, which has elastic inference abilities. It’s rather fast to spin up and down.

Also, when it comes to summarization, you don’t really need to run inference on every request; you can put up a pretty simple caching system, which makes repeat requests far cheaper and faster.

I used Cloudflare Workers with KV as a proxy/caching layer in front of an AWS Lambda function that does article extraction and SageMaker spin-up (with a small cache on the AWS side too, to catch in-progress jobs).
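The caching idea can be sketched independently of any particular platform. This in-memory `SummaryCache` class is a hypothetical stand-in for a key-value store such as the KV layer described above; it does not use any Cloudflare or AWS API. It keys entries by a SHA-256 hash of the URL, assumes a 24-hour expiry, and does not deduplicate in-progress jobs.

```python
import hashlib
import time


class SummaryCache:
    """In-memory stand-in for a key-value cache in front of a summarizer."""

    def __init__(self, summarize_fn, ttl_seconds=24 * 3600, clock=time.monotonic):
        self._summarize = summarize_fn  # expensive path: extraction + model
        self._ttl = ttl_seconds         # expiry window (assumption)
        self._clock = clock             # injectable for testing
        self._store = {}                # key -> (timestamp, summary)

    @staticmethod
    def key_for(url):
        # Hash the URL so keys have a fixed length; only whitespace is trimmed,
        # since URL paths can be case-sensitive.
        return hashlib.sha256(url.strip().encode("utf-8")).hexdigest()

    def get_summary(self, url):
        key = self.key_for(url)
        hit = self._store.get(key)
        if hit is not None and self._clock() - hit[0] < self._ttl:
            return hit[1]  # repeat request: no inference needed
        summary = self._summarize(url)
        self._store[key] = (self._clock(), summary)
        return summary
```

Only the first request for a given URL within the expiry window reaches `summarize_fn`; later ones are served from the store.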

radhakrsna | 6 years ago
This is v2 of the service that I launched last year. I am not aware of any ML serverless offering that does text summarization.
RubenvanE | 6 years ago
Awesome product! I was quite surprised at how well it summarized Dutch articles too.

Are you planning to launch an API anytime soon?

radhakrsna | 6 years ago
Glad you liked it. Let me know if you have any feedback/suggestions. Yes, we do plan to launch an API soon. Please message us here - https://tldrthis.com/contact and we will let you know when we launch it.
elliotec | 6 years ago
I tried this on three of my own articles. It worked very poorly. There’s a lot of room for improvement before asking people to pay for it IMO.

I may try my hand at writing one with the reqs outlined in the top comment just for a fun coding project.

radhakrsna | 6 years ago
Yes, you are right. There is still room for improvement, and we will keep trying to make it better. The reason I added paid plans was to test whether people would be willing to pay for such a service, so that I can spend more time adding features and making it better.
rubyfan | 6 years ago
I get an error. I’m on Safari mobile on iOS 13.4. I use content blockers, BlockBear and Firefox Focus - not sure if that’s relevant or not.

Method Not Allowed

The method is not allowed for the requested URL.

radhakrsna | 6 years ago
Thank you for letting me know. Are you using the extension or the web app?
rubyfan | 6 years ago
Firefox Mobile was functional.
burlesona | 6 years ago
Hugged to death? Entering a url to summarize just results in an error for me.
fortran77 | 6 years ago
If you Google "summarizer", you will find dozens of similar services for free. A couple a years ago I built one from scratch in about 2 hours, then I accidentally deleted it and rewrote it in 15 minutes. Rank each word based on how many times it appears in the text.
anewguy9000 | 6 years ago
Some of the results had me laughing out loud. It would be super useful if it worked, though. It's a Hard problem.
moneywoes | 6 years ago
How did you make the landing page?