
Show HN: OpenOrb, a curated search engine for Atom and RSS feeds

259 points | lowercasename | 1 year ago | openorb.idiot.sh

Alternative search engines are neat, as are RSS feeds. OpenOrb is a self-hosted app which allows visitors to search over a list of blogs you love. If you put your 10 favourite blogs in there, it'll search just those blogs and not show you any sponsored content or machine-generated garbage (unless... you follow blogs written by machines?)

Personal RSS feed readers can usually do this sort of thing, but RSS readers aren’t meant to be shared, so you can think of the search engine as a 'curated feed list as a public service'.
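A minimal sketch of the core idea: search only a hand-picked set of feeds. This is not OpenOrb's code. `parse_entries` and `search` are hypothetical names, the feeds are assumed to be passed in as already-fetched XML strings, and matching is a naive case-insensitive substring test on title plus summary.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"  # Atom namespace in ElementTree's {uri}tag form


def parse_entries(xml_text):
    """Return (title, link, text) tuples from an RSS 2.0 or Atom document."""
    root = ET.fromstring(xml_text)
    entries = []
    # RSS 2.0 items are un-namespaced: <rss><channel><item>...
    for item in root.iter("item"):
        entries.append((
            item.findtext("title", default=""),
            item.findtext("link", default=""),
            item.findtext("description", default=""),
        ))
    # Atom entries live in the Atom namespace: <feed><entry>...
    for entry in root.iter(ATOM + "entry"):
        link = entry.find(ATOM + "link")
        entries.append((
            entry.findtext(ATOM + "title", default=""),
            link.get("href", "") if link is not None else "",
            entry.findtext(ATOM + "summary", default="")
            or entry.findtext(ATOM + "content", default=""),
        ))
    return entries


def search(feeds, query):
    """Return (title, link) for entries containing every query term (hypothetical helper)."""
    terms = query.lower().split()
    hits = []
    for xml_text in feeds:
        for title, link, text in parse_entries(xml_text):
            haystack = f"{title} {text}".lower()
            if all(term in haystack for term in terms):
                hits.append((title, link))
    return hits
```

A real deployment would presumably fetch the feeds on a schedule and keep a proper full-text index instead of re-parsing XML on every query.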

I wrote a longer blog post about OpenOrb here: https://raphael.computer/blog/openorb-curated-search-engine/

56 comments


gbrindisi|1 year ago

I really like the idea! At some point I put up a Miniflux instance, and it has surprisingly been a breath of fresh air for my content consumption. What Miniflux and my setup lack is a way to retrieve stuff I read, and OpenOrb might fit that use case... I will try it out!

freetonik|1 year ago

What do you mean by "retrieve stuff I read"?

INGSOCIALITE|1 year ago

But does this filter out the RSS feeds that are just a headline and then a "click here to read the whole story" link?

That's what killed RSS. It wasn't Google Reader going away; it was the ad-weaponizing of the feeds themselves.
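The thread doesn't say whether OpenOrb does this. One way a feed tool could flag teaser-only feeds is a rough heuristic like the sketch below: treat an item as truncated if its body (`content:encoded` if present, otherwise `description`) is shorter than some word count or contains a "read more"-style phrase. The 150-word threshold, the phrase list and the function names are all assumptions.

```python
import re
import xml.etree.ElementTree as ET

# <content:encoded> from the RSS content module, in ElementTree's {uri}tag form
CONTENT_ENCODED = "{http://purl.org/rss/1.0/modules/content/}encoded"
TAG_RE = re.compile(r"<[^>]+>")
TEASER_RE = re.compile(r"read more|continue reading|click here", re.IGNORECASE)


def looks_truncated(item, min_words=150):
    """Guess whether an RSS <item> carries only a teaser (hypothetical heuristic)."""
    body = item.findtext(CONTENT_ENCODED) or item.findtext("description") or ""
    text = TAG_RE.sub(" ", body)  # strip embedded HTML tags before counting words
    return len(text.split()) < min_words or TEASER_RE.search(text) is not None


def truncated_ratio(xml_text):
    """Fraction of a feed's items that look like teasers; 0.0 for an empty feed."""
    items = list(ET.fromstring(xml_text).iter("item"))
    if not items:
        return 0.0
    return sum(looks_truncated(item) for item in items) / len(items)
```

A word-count cutoff will misflag genuinely short full posts, so a ratio across the whole feed is a safer signal than any single item.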

beaugunderson|1 year ago

I really like the idea of feed/entry search, but it doesn't seem to return very relevant results. If I search for "software defined radio", with or without quotes, I get lots of results that don't contain those terms.
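We don't know how OpenOrb matches queries. One common way to end up with hits like these is matching on any single query term rather than the whole phrase; the sketch below contrasts the two. `match_any` and `match_phrase` are hypothetical helpers, and tokenization is a simple lowercase alphanumeric split.

```python
import re


def tokenize(text):
    """Lowercase alphanumeric tokens; punctuation and quotes are dropped."""
    return re.findall(r"[a-z0-9]+", text.lower())


def match_any(doc, query):
    """True if the document shares at least one token with the query (loose OR matching)."""
    return bool(set(tokenize(query)) & set(tokenize(doc)))


def match_phrase(doc, query):
    """True if the query tokens appear consecutively, in order, in the document."""
    q, d = tokenize(query), tokenize(doc)
    if not q:
        return False
    n = len(q)
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))


docs = [
    "Notes on software defined radio",
    "A radio interview about software jobs",
    "Defined benefit pensions explained",
]
query = "software defined radio"
print([d for d in docs if match_any(d, query)])     # all three documents
print([d for d in docs if match_phrase(d, query)])  # only the first
```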

derekzhouzhen|1 year ago

> If you put your 10 favourite blogs in there, it'll search just those blogs...

10 feeds will not give you much recall. I have 50K+ feeds, 1M+ posts, and it just starts to give somewhat respectable results.

throwup238|1 year ago

Have you dumped your feed list anywhere?

lbhdc|1 year ago

This is a cool idea.

When I search for "history", it returns only technical articles and heavily favours Dan Luu's website.

Are technical blogs the primary focus?

marginalia_nu|1 year ago

Given that techy people have a strong disposition to have a blog, more so than other demographics, there's an implicit bias toward the technical within the blogosphere, especially in its diminished state.

JohnSSS1978|1 year ago

I've been thinking that I needed one of these, but you've already made it happen. That's really great.

marginalia_nu|1 year ago

Tangentially on this note, if anyone is interested, I can produce a list of every RSS feed known to the marginalia search crawler. It's a pretty noisy list, but I'm happy to do anything I can to help the spread, discovery and adoption of RSS, so just let me know.

I have a tool in place to export this data to help power the experimental RSS preview feature[1], but haven't had the inspiration to do much with that yet.

[1] e.g. https://search.marginalia.nu/site/jvns.ca

--edit-- Ok so there was interest. Give me a moment, I'll need to run an extraction script. Check back in a few hours or bookmark https://downloads.marginalia.nu/exports/

petercooper|1 year ago

I would be very keen to have access to that list and to, ideally, have a go at cleaning it up and producing a topical subset for broader use in certain fields I'm interested in (e.g. all the "developer blogs", say). I offer an OPML file of several hundred engineering/dev related blogs at https://engineeringblogs.xyz/ but I'm starting to think a little bigger.
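OPML is the usual interchange format for feed lists like that one. A minimal sketch, using only the standard library, of reading an OPML file, cutting it down to a topical subset, and writing the subset back out. The helper names and the keyword filter are hypothetical; a real curation pass would need something smarter than matching on the outline's `text` attribute.

```python
import xml.etree.ElementTree as ET


def read_opml(opml_text):
    """Return (name, feed_url) pairs for every outline that carries an xmlUrl."""
    root = ET.fromstring(opml_text)
    feeds = []
    # root.iter() also walks nested outlines used as folders/categories
    for outline in root.iter("outline"):
        url = outline.get("xmlUrl")
        if url:
            feeds.append((outline.get("text", ""), url))
    return feeds


def topical_subset(feeds, keyword):
    """Keep feeds whose name mentions the keyword (a deliberately crude topic filter)."""
    return [(name, url) for name, url in feeds if keyword.lower() in name.lower()]


def write_opml(title, feeds):
    """Serialize (name, feed_url) pairs as a minimal OPML 2.0 document."""
    opml = ET.Element("opml", version="2.0")
    head = ET.SubElement(opml, "head")
    ET.SubElement(head, "title").text = title
    body = ET.SubElement(opml, "body")
    for name, url in feeds:
        ET.SubElement(body, "outline", type="rss", text=name, xmlUrl=url)
    return ET.tostring(opml, encoding="unicode")
```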

marginalia_nu|1 year ago

Alright, about half a million RSS feeds available at: https://downloads.marginalia.nu/exports/ [select feeds.csv]

The data is, as mentioned, pretty noisy. It's a best-effort guess as to which is the canonical RSS feed for the particular domain. There doesn't appear to be any convention for specifying this, so when there are multiple feeds, a fair bit of guesswork is involved. Expect a fair number of dead URLs and lots of spam from CRMs that generate uninteresting feeds.
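The thread doesn't say how marginalia makes that guess. Purely to illustrate what such guesswork can look like, here is a hypothetical scoring heuristic: prefer feeds on the site's own host, prefer common site-wide paths such as `/feed` or `/rss.xml`, and penalize comment, tag and category feeds. The weights and function names are made up.

```python
from urllib.parse import urlparse

# Hypothetical list of paths that usually hold a site-wide feed
COMMON_FEED_PATHS = ("/feed", "/rss", "/atom.xml", "/rss.xml", "/index.xml", "/feed.xml")


def score_candidate(domain, feed_url):
    """Hypothetical heuristic: a higher score means more likely the site's main feed."""
    parsed = urlparse(feed_url)
    path = parsed.path.lower()
    score = 0.0
    if parsed.hostname == domain:
        score += 2  # same host as the site itself
    if any(bad in path for bad in ("comment", "/tag/", "/category/")):
        score -= 3  # per-post comment feeds and per-topic feeds are rarely canonical
    if path.rstrip("/") in COMMON_FEED_PATHS:
        score += 1
    score -= path.count("/") * 0.1  # shallower paths tend to be site-wide
    return score


def pick_canonical(domain, candidates):
    """Return the best-scoring candidate URL, or None if there are none."""
    if not candidates:
        return None
    return max(candidates, key=lambda url: score_candidate(domain, url))
```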

mariusor|1 year ago

Isn't there a way to integrate this type of info into the actual search engine? I.e., search for type:rss or atom and return the links to the RSS feeds?

[edit] I mean, to have it closer to what OP showed.
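Purely as an illustration of the suggestion (the thread doesn't say marginalia supports this), a search box could peel a `type:` operator off the query and use it to filter an index of known feeds. The `(domain, feed_url, feed_type)` index layout and both function names are assumptions.

```python
def parse_query(query):
    """Split 'type:rss foo' into ({'type': 'rss'}, 'foo'); other tokens stay as search text."""
    filters, terms = {}, []
    for token in query.split():
        key, sep, value = token.partition(":")
        if sep and key.lower() == "type" and value:
            filters["type"] = value.lower()
        else:
            terms.append(token)  # e.g. URLs like https://... are left as plain terms
    return filters, " ".join(terms)


def search_feeds(index, query):
    """Return feed URLs whose domain matches the text and whose type matches any type: filter."""
    filters, text = parse_query(query)
    wanted = filters.get("type")
    return [
        feed_url
        for domain, feed_url, feed_type in index
        if (wanted is None or feed_type == wanted) and text.lower() in domain.lower()
    ]


index = [
    ("blog.example", "https://blog.example/rss.xml", "rss"),
    ("blog.example", "https://blog.example/atom.xml", "atom"),
    ("notes.example", "https://notes.example/feed", "rss"),
]
print(search_feeds(index, "type:atom blog"))  # ['https://blog.example/atom.xml']
```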

hactually|1 year ago

I started a submission-based platform (bao.social, though it's not currently resolving) as a side project because I missed the accessibility of RSS. Would be keen on the list, or even just on connecting with you and OP.

lowercasename|1 year ago

That would be _so_ cool! What an amazing resource that would be.

djoldman|1 year ago

I think the community would be interested in the list, and you'd get a lot of downloads if you offered it up.

freetonik|1 year ago

Nice! I was thinking about the same kind of tool a while back, and developed a community-based curated feed reader with full-text search. It's not public yet (sign ups are behind an invitation code), but search works for guests: https://minifeed.net/global

lowercasename|1 year ago

This is super nice, and it looks like it's going to have some really great features, well beyond OpenOrb's! Excited to keep an eye on this.

toastal|1 year ago

Great to see it's hosted on a free software forge too, not locking in contributions!

Not sure I always agree that feeds should have the full post tho. This not only (obviously) bloats the size of the feed, but there are valid reasons to want to drive users to your site, especially if you have demos or you write about code and have your code blocks syntax highlighted (statically; never do this with JavaScript), as that provides a better reading experience. You can technically put styling in Atom/RSS, but even then, a lot of readers won't apply it. That said, I definitely appreciate the full post if your site is full of trackers, ads, marketing garbage or other bloat, since I can skip the site. Is this some site engineers giving us the nod toward a better UX? I read a gridiron football news site, and boy does the feed take it from unusable to pleasant (good photography).

fabianholzer|1 year ago

As a feed consumer I am always happy if a feed contains the full content, but I am not sure the feed must also include every article a site has ever published. That would basically make the feed a serialized version of the whole website (which is indeed what a few feeds I subscribe to do, by including sections common on personal sites like about/contact/now as feed items, but those are the minority). That would actually be fine as long as the archive is small or, once it grows past a certain size, the feed is paginated. But I am under the impression that most feed generators do not have pagination in mind, and I don't know how well individual aggregators and readers handle it on the consuming end.
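There is a standard for this: RFC 5005 (Feed Paging and Archiving) lets an Atom feed point to further pages with `<link rel="next" href="..."/>`, and archived feeds with `rel="prev-archive"`. A minimal sketch of a consumer following `next` links, assuming Atom only. `fetch` is a hypothetical stand-in for an HTTP GET (a dict lookup here so it runs offline), and the page cap and loop guard are assumptions added for safety.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"


def next_page(root):
    """Return the href of the feed-level rel="next" link, if any (RFC 5005 paged feeds)."""
    for link in root.findall(ATOM + "link"):  # direct children only, not entry links
        if link.get("rel") == "next":
            return link.get("href")
    return None


def collect_titles(fetch, start_url, max_pages=50):
    """Walk rel="next" links from start_url, gathering entry titles across pages."""
    titles, seen, url = [], set(), start_url
    while url and url not in seen and len(seen) < max_pages:
        seen.add(url)  # guards against feeds whose pages link in a cycle
        root = ET.fromstring(fetch(url))
        titles += [e.findtext(ATOM + "title", default="") for e in root.findall(ATOM + "entry")]
        url = next_page(root)
    return titles


pages = {
    "page1": '<feed xmlns="http://www.w3.org/2005/Atom"><link rel="next" href="page2"/>'
             '<entry><title>Newest post</title></entry></feed>',
    "page2": '<feed xmlns="http://www.w3.org/2005/Atom">'
             '<entry><title>Older post</title></entry></feed>',
}
print(collect_titles(pages.get, "page1"))  # ['Newest post', 'Older post']
```

Against real feeds, relative `href` values would also need resolving against the page URL (for example with `urllib.parse.urljoin`).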

keepamovin|1 year ago

I wonder when RSS will experience its "Google Search in 1997" moment. Right now it's only just edging into its Yahoo Directory days.

tl|1 year ago

That would be 2005 when Google Reader launched. RSS for people who didn't know what RSS was.