Homemade RSS aggregator followup

[+] abengoam|10 years ago|reply

That's awesome! I should know because I also created my own RSS aggregator after the demise of Google Reader.

Here's a screenshot https://imgur.com/YHJOiEX

It fills my needs perfectly because I created it specifically for myself and I control it fully in all aspects.

It's been such a tremendous success for me and so fun to create that I am thinking about replacing other online services with custom-made versions, such as Google Calendar, Google Tasks, etc. Something to look forward to in 2016.

Great job, and keep at it!

[+] m_mozafarian|10 years ago|reply

After the demise of Google Reader, pretty much everybody I know started building their very own RSS reader. We also made one for the Relevant app. It's a beautiful card called RSS Reader that you can add from the library. Then just paste your RSS URLs in the back of the card and it works. (Currently iOS only.) http://relevant.ai/

[+] scrollaway|10 years ago|reply

Your reader looks awesome. Did you open source it?

[+] mmb|10 years ago|reply

The bigger problem is that the amount of information you can consume using RSS feeds is declining. Most sites don't publish RSS feeds of their content any more.

Sadly RSS is left over from a time when things were more open. Now everything is an app and everyone wants you to stay in their walled garden.

[+] anotherevan|10 years ago|reply

Yeah, there's a couple of interesting sites I would like to follow, but they don't provide RSS feeds, so I don't bother.

[+] fulafel|10 years ago|reply

There seem to be many services for addressing this. I wonder if anyone has recommendations about which one to use? Searching for "auto generate rss" returns at least a screenful of these.

[+] rcarmo|10 years ago|reply

I have a fair amount of code that people can re-use to build their own aggregators, since I did a number of experiments when Google Reader died.

One was a Fever clone that had a number of strategies for doing parallel fetching:

- https://github.com/rcarmo/bottle-fever

Andrea Peltrin took that and evolved it into Coldsweat: https://github.com/passiomatic/coldsweat - which I recommend if you want a web UI.

I did a number of other things, but eventually went back to what I used _before_ Google Reader: e-mail.

I was one of the contributors for http://newspipe.sourceforge.net/, and after getting bottle-fever going I decided to investigate the state of the art and did a quick fork of rss2email that injected messages into an IMAP store instead of sending them via SMTP, to avoid spam traps.

It was a quick hack, but it allowed me to read feeds using any mobile IMAP client, and a friend eventually did a Go version, which I've also tweaked to my liking:

- https://github.com/rcarmo/rss2imap (Python)
- https://github.com/rcarmo/go-rss2imap (Go)

Any of the above are likely to save people a fair amount of time (do bear in mind that the Python version was a hack atop code that was written by Aaron Swartz a decade ago, and it shows its age).
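For readers who want a feel for the IMAP-injection idea before digging into those repositories, here is a rough sketch in Python. It is not taken from rss2imap: the item dict shape, the `X-Feed-Link` header and the placeholder addresses are all assumptions made for illustration. It only uses the standard library (`imaplib.IMAP4.append` stores a message in a folder without any SMTP hop).

```python
import imaplib
from email.message import EmailMessage
from email.utils import formataddr, formatdate


def item_to_message(feed_title, item):
    """Turn one feed item into an email message.

    `item` is a plain dict with title, link, summary and published
    (seconds since the epoch); that shape is assumed for this sketch.
    """
    msg = EmailMessage()
    # Placeholder addresses: the message is never actually sent anywhere.
    msg["From"] = formataddr((feed_title, "feeds@example.invalid"))
    msg["To"] = "reader@example.invalid"
    msg["Subject"] = item["title"]
    msg["Date"] = formatdate(item["published"])
    msg["X-Feed-Link"] = item["link"]  # made-up header, handy for client-side filters
    msg.set_content(f"{item['summary']}\n\n{item['link']}\n")
    return msg


def append_to_imap(host, user, password, folder, msg, published):
    """Store the message directly in an IMAP folder, bypassing SMTP."""
    with imaplib.IMAP4_SSL(host) as conn:
        conn.login(user, password)
        conn.append(folder, None, imaplib.Time2Internaldate(published), msg.as_bytes())
```

A real tool would also need to remember which items it has already stored (for example by GUID) so that each fetch does not append duplicates.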

These days I ended up going back to Feedly, simply because I have to use Windows, the Web UI is good enough and there are lots of good clients for the platforms I use (NextGen, Reeder, etc.)

Plus I realised that trying to archive stuff from hundreds of feeds was somewhat pointless -- the stuff I really want to keep around goes into Pocket or OneNote, and that's that.

Edit: Also, here are some notes from 2008 on Bayesian classification and its effectiveness: http://taoofmac.com/space/blog/2008/01/27/2203#an-update-on-...

[+] voltagex_|10 years ago|reply

I like the idea of (ab)using protocols to do not-quite what they were intended to do.

Pushing RSS feeds into IMAP is a great idea - I wonder how much work it'd take to make Newsbeuter do that, then expose it somewhere and have FastMail pull it into a folder for me.

These hacks eventually start looking like Rube-Goldberg machines, but they've got a certain charm.

Offtopic: I wrote a Wake-on-Lan server that allows me to turn on VMs as if they were physical machines - https://github.com/voltagex/junkcode/tree/master/CSharp/Virt...

The next one for me is probably going to be a DNS server that resolves the name and IPs of VMs.

[+] aw3c2|10 years ago|reply

My perfect aggregator would also create a WARC archive of the webpage of each post, including all external references, maybe the referenced external websites and their references (with that single depth of recursion). The internet is friggen fragile and I would love to archive what I consume.

[+] derefr|10 years ago|reply

To go further: there's basically no point in the "description" part of an RSS item. RSS is broken in that authors need to lure people onto their sites, so they make the RSS item itself enclose just enough of a preview to make you "click through"—whereupon their site can show you ads and they can make money.

How RSS should work, in an ideal technical sense, is to eschew enclosing any content-body in feed items themselves, and instead just encourage RSS consumers (feed-reader clients; feed-muxer daemons) to scrape the permalinks of the feed items, and then heuristically extract the body-content from the scrape-result, and cache both the resulting page-archive and the resulting cleaned-up text, making both representations available offline.

This, obviously, kills blog ad revenue. But it's better to kill it and replace it with something better (402 micropayment-required errors at point-of-caching, handled automatically by the RSS content-spidering daemon as an HTTP client, with costs passed on to its subscribers?) than to continue on with this semi-braindamaged "I have an offline cache but that doesn't actually mean I can read anything offline" world.
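To make the "extract the body, cache both representations" step concrete, here is a deliberately crude sketch in Python. The heuristic (keep `<p>` text outside navigation-like tags, drop short paragraphs) and the names `extract_body` and `cache_page` are assumptions for illustration, not an established algorithm; fetching the permalink is left out, so the functions take HTML that has already been downloaded.

```python
import hashlib
from html.parser import HTMLParser
from pathlib import Path


class ParagraphExtractor(HTMLParser):
    """Collect the text of <p> elements that sit outside boilerplate tags."""

    SKIP = {"script", "style", "nav", "header", "footer", "aside"}

    def __init__(self):
        super().__init__()
        self.skip_depth = 0
        self.in_paragraph = False
        self.buffer = []
        self.paragraphs = []

    def _flush(self):
        text = " ".join("".join(self.buffer).split())  # collapse whitespace
        if text:
            self.paragraphs.append(text)
        self.buffer = []
        self.in_paragraph = False

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self.skip_depth += 1
        elif tag == "p" and self.skip_depth == 0:
            if self.in_paragraph:
                self._flush()  # tolerate an unclosed <p>
            self.in_paragraph = True

    def handle_endtag(self, tag):
        if tag in self.SKIP:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif tag == "p" and self.in_paragraph:
            self._flush()

    def handle_data(self, data):
        if self.in_paragraph and self.skip_depth == 0:
            self.buffer.append(data)


def extract_body(html, min_length=40):
    """Heuristic: paragraphs shorter than min_length are probably not body text."""
    parser = ParagraphExtractor()
    parser.feed(html)
    parser.close()
    return "\n\n".join(p for p in parser.paragraphs if len(p) >= min_length)


def cache_page(cache_dir, url, html):
    """Store the raw page and the cleaned-up text side by side, keyed by URL hash."""
    key = hashlib.sha256(url.encode("utf-8")).hexdigest()
    base = Path(cache_dir)
    base.mkdir(parents=True, exist_ok=True)
    (base / f"{key}.html").write_text(html, encoding="utf-8")
    text = extract_body(html)
    (base / f"{key}.txt").write_text(text, encoding="utf-8")
    return text
```

Real pages are far messier than this handles (content in `<div>`s, lazy-loaded text, paywalls), which is why dedicated readability-style extractors exist.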

[+] zrail|10 years ago|reply

That's a really great idea. Here's a one-line wget that will grab the provided URL and all of the data necessary to render, to one level of recursion, and dump it to a WARC:

    wget -e robots=off \
    --user-agent="Mozilla" \
    -r -l 1 -p -E -H -k -K \
    --warc-file=/path/to/your/warc/file/without/warc/extension \
    'http://www.example.com'

I think I might start capturing these. Shouldn't take up too much additional disk space.

edit: previously it was `-r 2` which is two levels of recursion.

[+] rakoo|10 years ago|reply

I know it's not a feed aggregator, but you could instrumentalize Pinboard (pinboard.in): they have the option of retrieving and storing an archive of all your links if you so desire. They even resolve first-level dependencies, so external images are also stored (see https://blog.pinboard.in/2010/11/bookmark_archives_that_don_...). See some numbers on link rot here: https://blog.pinboard.in/2011/05/remembrance_of_links_past/

Pinboard is built as a bookmark manager, but if you treat every entry in a feed as a bookmark then it should work for you. Oh, and there's full-text search as well.

[+] vidarh|10 years ago|reply

I agree with that. I've recently been going through my archived blog posts while revamping my blog design, and so much has just flat out disappeared. And so much of what has disappeared has been things I would have guessed would stay...

[+] sheraz|10 years ago|reply

What if, instead of a WARC file, it just saved a high-res screenshot in desktop, tablet, and mobile modes?

[+] pmoriarty|10 years ago|reply

I've recently come back to using newsbeuter[1] and have been quite impressed. It's really feature rich and very customizable. It's a terminal app, which some might not like, but for me that's a plus.

[1] - http://www.newsbeuter.org/index.html

[+] gerty|10 years ago|reply

I guess Tiny Tiny RSS hasn't been mentioned yet. FOSS, self-hosted, with multiple Android clients. I had been using Feedly since Google Reader went down but should have actually been using TTRSS since the beginning. I ain't no power user but it definitely has more than I ever would ask for.

[+] wanda|10 years ago|reply

If anybody happens to be looking for an RSS aggregator, I'd like to recommend GoRead.

Obviously I wouldn't pay for it, but self-hosting is pretty straightforward and it has a companion Android app.

Never cared for Feedly and I don't really fancy making my own.

It's the best Google Reader clone I've found.

https://www.goread.io

[+] ents|10 years ago|reply

I don't like feedly either, but as a backend for apps it works fine, and is free.

[+] petercooper|10 years ago|reply

I made a similar script but that also has 'plugins' so 'URLs' like @username, twitter:topic, /r/subreddit, and hn:topic load up the tweets, Twitter search results, sub-Reddit items, or HN search results respectively for certain keywords, using their respective APIs.
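One way such pseudo-URLs could be routed to plugins is a small resolver that inspects the prefix. This is a guess at the general shape, not the script described above: the function name, the plugin names, and the fallback to an ordinary feed are all assumptions, and the API calls themselves are left out.

```python
def resolve_source(source):
    """Map a feed 'URL' to a (plugin_name, argument) pair.

    Plugin names here are hypothetical; each would be backed by a
    function that calls the corresponding service's API.
    """
    if source.startswith("@"):
        return ("twitter_user", source[1:])
    if source.startswith("/r/"):
        return ("subreddit", source[len("/r/"):])
    for prefix, plugin in (("twitter:", "twitter_search"), ("hn:", "hn_search")):
        if source.startswith(prefix):
            return (plugin, source[len(prefix):])
    # Anything else is treated as an ordinary feed URL.
    return ("feed", source)
```

For example, `resolve_source("/r/programming")` yields `("subreddit", "programming")`, while a normal `http://` URL falls through to the plain feed fetcher.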

[+] oneloop|10 years ago|reply

Care to share?

[+] krylon|10 years ago|reply

Very interesting!

I am currently building an RSS aggregator, too. Mine is a little more complex, though - I wanna be able to rate items as interesting or boring and use some kind of filter (currently a simple Bayesian classifier, which I intend to replace or at least enhance with something more sophisticated over the holidays) to weed out news that I am not interested in.

The biggest problem is that web design is not my strong suit (to put it mildly), so the thing looks pretty ugly. Classification does not work very well yet, but I am not sure if this is because the classifier sucks in general or if my training set is too small at the moment (I've only been using the thing for a couple of days now).

Anyway, it is quite interesting to see another approach to the problem.
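For reference, a simple Bayesian interesting/boring filter of the kind mentioned above can be sketched in a few lines of Python. This is a generic multinomial naive Bayes with add-one smoothing, not the commenter's code; the class name, the tokenizer, and the two label names are assumptions.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into word-like tokens."""
    return re.findall(r"[a-z0-9']+", text.lower())


class NaiveBayes:
    def __init__(self, labels=("interesting", "boring")):
        self.word_counts = {label: Counter() for label in labels}
        self.doc_counts = Counter()

    def train(self, text, label):
        self.doc_counts[label] += 1
        self.word_counts[label].update(tokenize(text))

    def score(self, text, label):
        """Log of P(label) times the product of P(word | label)."""
        vocab = set()
        for counts in self.word_counts.values():
            vocab.update(counts)
        total_words = sum(self.word_counts[label].values())
        total_docs = sum(self.doc_counts.values())
        # Add-one smoothing on the prior, so an untrained label never gives log(0).
        log_prob = math.log((self.doc_counts[label] + 1) / (total_docs + len(self.word_counts)))
        for word in tokenize(text):
            count = self.word_counts[label][word]
            # Add-one smoothing, with one extra slot reserved for unseen words.
            log_prob += math.log((count + 1) / (total_words + len(vocab) + 1))
        return log_prob

    def classify(self, text):
        return max(self.word_counts, key=lambda label: self.score(text, label))
```

With only a few days of ratings, most words in a new item will be unseen, so the scores are dominated by the smoothing terms; that alone can make a correct classifier look like it "does not work very well" until the training set grows.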

[+] alanpost|10 years ago|reply

Do you use a single classifier for your entire feed, or do you categorize the feeds and maintain a classifier for each topic? (Or, as always, secret option #3: neither.)

[+] dubbel|10 years ago|reply

If you use Bootstrap CSS it looks pretty generic, but it's also nearly impossible to make it look bad.

Sounds like a cool project.

[+] oxplot|10 years ago|reply

Given that email clients are fairly mature and advanced already (especially Gmail's), it seemed logical to go the Unix way and use one as the UI for a stream of feeds sent as email. I wrote a bit of Python [1] and stuck it on a free OpenShift cartridge, and it sends me one email per feed item. It's been up since April this year, shortly after I abandoned Feedly. I like it more than Google Reader now.

[1]: https://github.com/oxplot/lapafeed

[+] axx|10 years ago|reply

I'm also working on an open source RSS Reader called HappyFeed. It's compatible with Fever RSS API, so you can use it with Reeder, ReadKit, Press and so on.

I work on this project mainly for myself, but if you're interested and want to contribute, feel free to get in touch!

Screenshots and development blog: https://need.computer/happyfeed/2015/12/20/happyfeed-drag-an...

GitHub: https://github.com/aleks/HappyFeed

[+] fasouto|10 years ago|reply

I started creating an RSS aggregator some time ago (https://github.com/fasouto/django-feedaggregator) and it was more difficult than expected. There are many broken feeds and different interpretations of the standard.
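As a small illustration of the "different interpretations" problem, even a minimal parser has to branch between RSS 2.0 and Atom, and cope with items that carry a `guid` but no `link`. The sketch below uses only Python's standard library; `parse_feed` is a hypothetical helper, unrelated to django-feedaggregator, and it ignores RSS 1.0 (RDF) entirely.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def parse_feed(xml_data):
    """Return a list of {'title', 'link'} dicts from RSS 2.0 or Atom data.

    Malformed XML raises xml.etree.ElementTree.ParseError; a real
    aggregator would catch that and fall back to something more lenient.
    """
    root = ET.fromstring(xml_data)
    entries = []
    if root.tag == ATOM + "feed":
        for entry in root.findall(ATOM + "entry"):
            link = ""
            for candidate in entry.findall(ATOM + "link"):
                # A missing rel attribute means rel="alternate" in Atom.
                if candidate.get("rel", "alternate") == "alternate":
                    link = candidate.get("href", "")
                    break
            title = (entry.findtext(ATOM + "title") or "").strip()
            entries.append({"title": title, "link": link})
    else:
        # RSS 2.0: un-namespaced <item> elements under <channel>.
        for item in root.iter("item"):
            # Some feeds omit <link> and put the URL in <guid> instead.
            link = item.findtext("link") or item.findtext("guid") or ""
            title = (item.findtext("title") or "").strip()
            entries.append({"title": title, "link": link.strip()})
    return entries
```

Dates are where it gets properly painful (RFC 822 in RSS, RFC 3339 in Atom, and plenty of feeds that follow neither), which is left out here.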

One day I should finish it...

[+] rcarmo|10 years ago|reply

Check my top-level comment.

[+] younata|10 years ago|reply

Been working on my own RSS reader for iOS. https://github.com/younata/RSSClient/

Pretty much all the internal logic (parsing feeds/OPML files) is also written from scratch, which was interesting to do.

[+] newtang|10 years ago|reply

I built a similarly simple feed aggregator for anyone to use at https://plumfeed.com. It shows the most recent post of each feed. I've found it particularly nice for blogs and comics that update once a day or less.

[+] hasteur|10 years ago|reply

Adding another vote for TinyTiny-RSS as a "Google Reader" replacement. There is some "jiggery-pokery" that needs to be done, but it does nearly everything you could want (I wouldn't mind getting the behind-the-cut reveals).

[+] ojiikun|10 years ago|reply

Surprised there has been no mention of NewsBlur. Though you can pay the author for use of the central instance, it is also fully open sourced on github. Saying it is more feature-rich than most examples would be an understatement.

46 comments