Good job, man. Have you ever considered taking it a few steps further?
I wrote a news website (https://todayheadlines.live/) which shows news on similar topics in clusters. My pipeline collects the content data as well. However, I am not sure how to properly summarize all the perspectives.
I actually really like your idea and it can be implemented very quickly.
Quite an interesting idea. It's on my to-do list for Aktu, an RSS reader / news aggregator I built (https://aktu.io/about).
For now Aktu groups together the articles in your feeds that are about the same story, so you can easily check other sources' perspectives. But it lacks the summary/facts.
Man, I thought word clouds were gone. I remember the word cloud craze in the mid-to-late 2000s, then they sorta vanished. I guess other SEO enhancements replaced them?
This is really good. Right now I am doing a bit of sentence parsing myself, and I appreciate the time you have put into documenting your algorithms as well as the tools used.
btutal|7 years ago
I would really appreciate it if an application would go through my RSS feeds and offer me neutral news (without any commentary etc.), only the facts in summary.
Imagine you have 15 different news articles from 15 different sources about the same topic, say "Microsoft's new Chromium-Edge Browser". Each tech site is writing about it from its own perspective. Some say it is quite cool, some say it is just a Chrome clone. I would appreciate a summary of these 15 websites without additional comments.
What do you think?
theblackcat1002|7 years ago
Agent_Phantom|7 years ago
This project is currently run for the subreddit of my country and the users have liked it a lot. The summaries often remove the bias and keep the facts.
I can make a subproject that will load the URLs from an RSS feed and create shorter summaries. Thankfully, I could reuse 90% of the codebase.
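A minimal sketch of the "load the URLs from an RSS feed" step mentioned above, assuming a plain RSS 2.0 document and using only the Python standard library. The function name `extract_links` and the sample feed are hypothetical and are not taken from the project's codebase.

```python
import xml.etree.ElementTree as ET


def extract_links(rss_xml: str) -> list:
    """Return the <link> text of every <item> in an RSS 2.0 document."""
    root = ET.fromstring(rss_xml)
    links = []
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            links.append(link.strip())
    return links


# Hypothetical feed used only for illustration.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0"><channel>
  <title>Example feed</title>
  <item><title>First story</title><link>https://example.com/a</link></item>
  <item><title>Second story</title><link>https://example.com/b</link></item>
</channel></rss>"""

if __name__ == "__main__":
    for url in extract_links(SAMPLE_FEED):
        print(url)
```

Each URL returned this way could then be fetched and handed to whatever summarizer is in use. Atom feeds use a different element layout and would need their own handling.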
guybedo|7 years ago
giancarlostoro|7 years ago
Agent_Phantom|7 years ago
Fortunately the effort to implement them was very low since I reused an internal variable.
Theodores|7 years ago
I am interested in a few metrics such as sentence length to flag run-on sentences that are not good advertising copy. After reading your article I am wondering what else I need to be doing since I am working at the word level.
I remember Microsoft Word had tools in it to gauge reading level. Do you know if there is a convenient library for that in the Python world? I am not using Python myself, but there is a difference between a tabloid and a broadsheet, so maybe you could put that into the mix.
Agent_Phantom|7 years ago
https://github.com/shivam5992/textstat
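For anyone not using that library, the two metrics asked about (sentence length and reading level) can be roughly approximated with the standard library alone. This is a rough sketch, not textstat's implementation: the sentence splitter and the syllable counter are naive heuristics, and the 25-word threshold is an arbitrary assumption. The score uses the standard Flesch Reading Ease formula, 206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words).

```python
import re


def split_sentences(text):
    # Naive split: break after ., ! or ? when followed by whitespace.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def words(text):
    return re.findall(r"[A-Za-z']+", text)


def count_syllables(word):
    # Heuristic: count vowel groups and drop a trailing silent "e".
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text):
    sentences = split_sentences(text)
    tokens = words(text)
    if not sentences or not tokens:
        return 0.0
    syllables = sum(count_syllables(w) for w in tokens)
    return (206.835
            - 1.015 * (len(tokens) / len(sentences))
            - 84.6 * (syllables / len(tokens)))


def flag_long_sentences(text, max_words=25):
    # max_words is an assumed threshold; tune it for your own copy.
    return [s for s in split_sentences(text) if len(words(s)) > max_words]
```

Higher Flesch scores mean easier text, so tabloid-style copy would be expected to score higher than broadsheet prose. Treat the numbers as relative, since the syllable heuristic is only approximate.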
Agent_Phantom|7 years ago
https://www.reddit.com/user/huachibot/comments/
The good part is that the summary algorithm is independent from the bot logic. It processes the text no matter how you obtained it.
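To illustrate that separation, here is a minimal frequency-based extractive summarizer that only takes plain text, so it does not care whether the text came from a Reddit post, an RSS item or a file. This is a generic sketch and not the bot's actual algorithm. The `summarize` function, the tiny `STOP_WORDS` set and the scoring rule (average word frequency per sentence) are all assumptions made for the example.

```python
import re
from collections import Counter

# Deliberately tiny stop-word list, for illustration only.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it",
              "this", "that", "i", "on", "for", "with", "was", "had"}


def summarize(text, max_sentences=2):
    """Return the highest-scoring sentences, kept in their original order."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    tokenized = [
        [w for w in re.findall(r"[a-z']+", s.lower()) if w not in STOP_WORDS]
        for s in sentences
    ]
    # Word frequencies over the whole text.
    freq = Counter(w for tokens in tokenized for w in tokens)
    # Score each sentence by the average frequency of its words.
    scores = [
        sum(freq[w] for w in tokens) / len(tokens) if tokens else 0.0
        for tokens in tokenized
    ]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    keep = sorted(ranked[:max_sentences])
    return " ".join(sentences[i] for i in keep)


if __name__ == "__main__":
    article = ("Edge now uses Chromium. Chromium powers Edge and Chrome. "
               "I had coffee this morning.")
    print(summarize(article, max_sentences=2))
```

Because the function works on a string, the fetching code (bot, feed reader, scraper) can change without touching the summarizer.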