Show HN: Answer Overflow – Indexing Discord content into the web

333 points | rhyssullivan1 | 2 years ago | answeroverflow.com

Hi!

I'm Rhys, and I develop Answer Overflow, a search engine for Discord channels. Answer Overflow indexes content from channels into Google, making it discoverable on the web.

I'm sharing this again after seeing a lot of discussion during the Reddit blackout about the inaccessibility of information sent in Discord servers.

Answer Overflow is a verified bot in over 100 communities, fully complies with the Discord ToS, and is open source! https://github.com/AnswerOverflow/AnswerOverflow

Check out some of the communities here!

T3 Community - https://www.answeroverflow.com/c/966627436387266600

C# - https://www.answeroverflow.com/c/143867839282020352

Reactiflux - https://www.answeroverflow.com/c/143867839282020352

All - https://www.answeroverflow.com/browse

Please let me know what feedback you have, thanks for checking it out!

98 comments

apignotti|2 years ago

Genuine question: I love Discord, but how on earth is it possible that such functionality was not built-in to begin with?

I really don't understand how the need for indexing and search was overlooked.

rhyssullivan1|2 years ago

I think it's due to how Discord evolved as a platform.

Discord started as "your private place for your friends to talk" during a time when there were a lot of privacy issues with other communication methods.

Then, as it grew beyond the scope of being a private place for friends, it would have been good for indexing to be added. But indexing a normal text channel is really hard, since you don't know where a conversation starts and stops when deciding what to submit to a sitemap.
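To illustrate why conversation boundaries are hard to find in a plain text channel, here is a minimal sketch of a naive time-gap heuristic. The `Message` type, the `split_conversations` helper, and the 10-minute threshold are all assumptions made up for illustration; this is not how Answer Overflow or Discord works.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Message:
    # Hypothetical message shape, not Discord's API model.
    author: str
    sent_at: datetime
    text: str


def split_conversations(messages, max_gap=timedelta(minutes=10)):
    """Group messages into conversations wherever the silence exceeds max_gap."""
    conversations = []
    for msg in sorted(messages, key=lambda m: m.sent_at):
        # Continue the current conversation if the previous message is recent enough.
        if conversations and msg.sent_at - conversations[-1][-1].sent_at <= max_gap:
            conversations[-1].append(msg)
        else:
            conversations.append([msg])
    return conversations
```

Any unrelated message sent within the gap is merged into the same group, so the heuristic fails in exactly the busy channels where indexing would matter most.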

Now we've got large public communities and forum channels so it's possible they roll out their own version soon, but it does still slightly go against how their product was originally created so there may be some hesitation with adding it due to not knowing what the community reaction will be like.

esafak|2 years ago

It's not made for knowledge discovery; it's for gamers. Just look at that busy UI! The content is assumed to have no historical value.

Kiro|2 years ago

It makes no sense to index the vast majority of content. You would need to cherry pick really hard among all the noise to find the stuff worth putting online.

madeofpalk|2 years ago

Discord has 'indexing' and search, just like how Slack does. It's just not on the public & open web - only searchable inside of Discord.

starlevel003|2 years ago

> I really don't understand how the need for indexing and search was overlooked.

It wasn't overlooked. The point is to make it difficult for outside users to access information unless they sign up.

thunky|2 years ago

What I wonder is why anyone who cares about archiving/search would choose to use Discord.

thrashh|2 years ago

Discord is a chatroom first. What non-enterprise chat comes with archives?

A forum is totally different.

And even then, forums weren’t designed to be archived from the start. People just wrote web crawlers and search engines.

(I know Discord has some forum-like functionality now but the point stands.)

chillfox|2 years ago

Discord does have search, but I really hope they do not improve it.

The lack of good search really prevents the hostility towards new users that you often see on Reddit/forums where every question is instantly answered by a one liner "use the search" reply.

Discord communities are some of the most friendly and welcoming communities I have ever encountered on the internet. I think a large part of it is the chat nature and inability to easily pull up old comments.

andybak|2 years ago

Rhys - are you sure the consent functionality is working? I'm seeing indexed posts by users who are in a time zone that makes it very unlikely they have consented in the last hour or so.

The one user whom I contacted said they had never clicked the green consent button.

EDIT - turns out those posts were only visible to me when I was logged in to both sites (which makes sense).

It wasn't obvious this was the case and checking incognito shows things correctly.

rhyssullivan1|2 years ago

Glad we got this resolved and it was all working properly. The site does need to do more to make it clearer when you're viewing a private message while signed in. I've added it to the backlog, sorry about that!

easygenes|2 years ago

While I see the value here, I don't really think most Discord communities are appropriate to be indexed. It breaks the whole cozy web aspect of it. [1]

[1] https://maggieappleton.com/cozy-web

TeMPOraL|2 years ago

The "cozy web" is out of control these days. A lot of social utility is lost by default because everyone uses Whatsapp and Discord and other such information black holes, places where knowledge goes to die. It's OK if you're using these to chat with your family or friends, but it's kind of... less OK, when every open source project these days, including major programming languages, tells you to join their Slack or Discord for support and learning.

What's happening is that these "communities" demand that you commit first, and deny providing value to passive participants. If that sounds reasonable to some, let me point out that the entire value of the Internet is built on doing the opposite. Wikipedia, Reddit, StackOverflow, everything that you can find through a search engine - those are all resources made available by people and groups that, for various reasons, decided to share knowledge instead of hoarding it, and to invite passive participation instead of demanding active commitment. The good days of the Internet, the ones people mourn, back before it got fully commercialized? They were built on the sentiment of openly sharing information, giving it away "pay it forward" style - not gate-keeping it in webs of trust, and/or demanding that people pay with effort.

Maybe I'm too old, but I hate the "cozy web" with passion.

rhyssullivan1|2 years ago

Most Discord communities aren't meant to be indexed, I agree! Thanks for linking that article, it was interesting to read.

There are lots that have support channels, though, for programming libraries, games, etc., and having all of that content locked away can be really damaging.

One of the interesting things I've noticed is when a community for a more niche game / programming library joins Answer Overflow, they often shoot up to being top performers on the site which is great to see.

Along with that, not all channels are indexed, mainly just help channels. What's nice with this is it keeps that cozy feeling of a private place to talk, while helping more people find a community they will enjoy and keeping information accessible.

Long term, I'd like to implement forms of anti-abuse tools for communities to use so they can understand what the types of people who join their server from Answer Overflow are like. For example, if it turns out that 90% of the people who join are abusive, then it'd make sense for them to turn off indexing.

You could possibly make the argument that, for the long-term health of some communities, having indexed content helps to keep the community active.

philippejara|2 years ago

Most discord communities that are big enough to get indexed were supposed to be forums anyway, or part of one.

leobg|2 years ago

Indexing Discord is going to be tough. The reason is that context is all over the place:

Question in one message. Then two unrelated messages. Then a partial answer by somebody. And so on.

It’s even worse than indexing a PDF. Just breaking stuff into paragraphs and generating embeddings isn’t going to cut it.

sprremix|2 years ago

I imagine this will only work (and only index) threads. So the context can be gathered from the thread title/body and underlying messages reflect the discussion.

Some communities I'm in have #support channels which only support threads. So you create a thread, add a title and a body message and people can reply to your thread by clicking on it. There's no way to post individual messages; only comments in threads.

Thread overview: https://i.imgur.com/jfvrRtG.png

Opening a thread: https://i.imgur.com/pqGrARI.png

This solves your context problem. Still not sure if this is the right direction we want to go in. This just proves to me that Discord is not the right tool for the problem at hand.
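As a rough sketch of the thread-only approach described above, a thread's title, body, and replies can be flattened into one page that carries its own context. The `Thread` type, the `thread_to_page` helper, and the `example.com` URL scheme are hypothetical and do not reflect Discord's API or Answer Overflow's code.

```python
from dataclasses import dataclass, field


@dataclass
class Thread:
    # Hypothetical shape for a forum-channel thread; not Discord's API model.
    thread_id: str
    title: str
    body: str
    replies: list = field(default_factory=list)


def thread_to_page(thread, base_url="https://example.com/m/"):
    """Flatten one thread into a single indexable page."""
    # The title and opening post give the context; replies follow as a list.
    lines = [thread.title, "", thread.body]
    lines += [f"- {reply}" for reply in thread.replies]
    return {
        "url": base_url + thread.thread_id,
        "title": thread.title,
        "text": "\n".join(lines),
    }
```

Because every reply is attached to exactly one thread, no guessing about conversation boundaries is needed.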

mdaniel|2 years ago

Welcome back. How does this compare to Linen (https://github.com/linen-dev/linen.dev#readme), which claims to support Slack and Discord? I do see the license difference, but didn't know if that was the major differentiator

rhyssullivan1|2 years ago

Couple key differences:

- Answer Overflow works on a consent basis for displaying messages (https://docs.answeroverflow.com/user-settings/displaying-mes...), while Linen does all the messages in a community. The consent system Answer Overflow has helps a lot with respecting user privacy while also getting content indexed.

- Linen appears to be building out a competitor to Slack & Discord, while Answer Overflow is focused on building on top of those platforms, so we've got very different roadmaps. From what I can gather from the Linen roadmap, they're implementing things like voice chat, private channels, etc., whereas with Answer Overflow some of the things I'm focused on are answer automation, tracking outdated answers, analytics for where to improve your docs, etc.

- Answer Overflow is pretty much only focused on Discord servers. It wouldn't be too hard to support both Slack and Discord, but focusing on Discord for now helps with our goal of being the best indexing tool specifically for Discord

- Global search (https://www.answeroverflow.com/search), you can search all Answer Overflow communities at the same time
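The consent-based display mentioned in the first point can be pictured as a filter applied before anything is published. This is only a sketch under assumptions: the `Message` type and `publicly_displayable` helper are invented here, and the real consent system described in the linked docs may work differently.

```python
from dataclasses import dataclass


@dataclass
class Message:
    # Hypothetical message shape for illustration only.
    author_id: str
    text: str


def publicly_displayable(messages, consenting_author_ids):
    """Keep only messages whose author has opted in to public display."""
    return [m for m in messages if m.author_id in consenting_author_ids]
```

Messages from users who have not opted in are simply never returned, so they cannot end up on a public page.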

The team at Linen have built out a great product though and it's cool watching them succeed with it!

bitshiftfaced|2 years ago

People who give their consent to Discord to host their writing don't necessarily do so for third parties. Isn't there a copyright issue here?

rhyssullivan1|2 years ago

Not really for a few reasons:

- The API grants you essentially a sublicense to the data, since Answer Overflow is a bot going through the official API and following the ToS properly, that should cover it for any potential issues

- Answer Overflow gets consent from users to use their messages https://docs.answeroverflow.com/user-settings/displaying-mes...

jaygreco|2 years ago

This is awesome and timely!

I’ve been wanting to set something like this up for the nullbits server for a while. When I picked discord instead of a forum, I wasn’t counting on the growth we saw. There’s a lot of friction for new folks who aren’t yet on discord, and there’s a lot of knowledge in the server that’s locked behind discord.

Just set everything up! My only feedback is that enabling indexing for all of our text channels took a while doing them all individually, but that’s kind of on me for not enabling forums for help requests until now.

rhyssullivan1|2 years ago

Welcome to the Answer Overflow community! I agree it'd be good to have a quicker way to set up multiple channels - to be honest it's kind of far in the backlog as it's pretty rare a server has many, but the UX could be improved there

If you have any other feedback, please send it to me on Discord so I make sure I see it - thanks!

freediver|2 years ago

There are several issues with surfacing search results from Discord as mentioned before in the thread, and even if all of them are resolved the biggest one remains relevance.

Unless a general-purpose web search engine introduces a special Discord 'tab', like the Images/News/Videos tabs that already exist, there is no way for a search engine to assign relevance to anything said on Discord, because there is no authority or link-graph-based credibility for any message. In other words, a mention of 'blue widgets' on Discord is competing with millions of web pages mentioning 'blue widgets', which all have some kind of built-in relevance. If the idea is that this will be achieved through people linking to an aggregator like this website, then perhaps, but the approach does suffer from the chicken-and-egg problem.
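To make the link-graph argument concrete, here is a toy, simplified PageRank-style computation. It is an illustrative sketch only: real search engines use many more signals, and the `pagerank` helper and the tiny example graph in the note below are assumptions, not anything from this thread.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of page -> list of outbound links."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline score.
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Dangling page: spread its rank evenly across all pages.
                share = damping * rank[page] / n
                for p in pages:
                    new_rank[p] += share
        rank = new_rank
    return rank
```

With a hypothetical graph such as `{"blog": ["docs"], "forum": ["docs"], "docs": ["blog"], "discord_msg": ["docs"]}`, the `discord_msg` page has no inbound links, so it stays at the baseline score while `docs` and `blog` accumulate rank from the pages linking to them.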

andybak|2 years ago

I'm mostly interested in surfacing content on pretty specific topics with clear keywords.

But also either answeroverflow.com will gain some domain authority over time, or the communities will be hosted on domains that already have some.

jcq3|2 years ago

I can imagine obvious use cases for data surveillance, OSINT, and so on. But happy to see an implementation of a semantic search engine powered by an LLM.

mid-kid|2 years ago

I was talking about needing a solution like this just a second ago. Down from the heavens, descends this. I'll be sure to give it a try!

wanderingbit|2 years ago

Me too! I am trying to build a Discord-based remote course and am excited to read through the code here and see if it matches my needs, or can be tweaked to do so.

Once I do that I'd like to DM you with some questions mid-kid.

Nice job on getting so much implemented and open for users!

rhyssullivan1|2 years ago

Send me a message if you have any questions! Happy to help with getting it set up

Alifatisk|2 years ago

Cool idea. There have been cases where I had to create a burner account just to access a Discord community and its walled content.

returnInfinity|2 years ago

Soon Discord will pull a Reddit and shut down your app.

Good luck!

arp242|2 years ago

If this takes off you may very well get a letter from Stack Overflow lawyers over the name. It's your choice if you want to take that risk, but just FYI.

(And to be honest, I think they would be justified too; I initially assumed it was related to Stack Overflow based on the title, but it turns out it's not – this is the sort of confusion trademarks are intended to protect against.)

rhyssullivan1|2 years ago

Under their own guidelines it's fine https://stackoverflow.com/legal/trademark-guidance

> Do name your application with something unique. Including one of the terms, "Stack" or "Exchange" or "Overflow" in your product name is generally okay.

It's a different enough product that I feel comfortable with it - Stack Overflow is only for programming while Answer Overflow is for all topics. Along with that Overflow is a pretty generic word and if you wanted to get super technical with it, the context I'm using the word in is "I have so many answers they're overflowing" while theirs is a reference to a programming term.

We'll see and I'm not a lawyer but given that their trademark guidelines allow it, I feel comfortable

dancemethis|2 years ago

Good on you (and everyone) for releasing it as Free Software. Just the target is unfortunate.

retox|2 years ago

It would be useful if clicking an image opened it in an imagebox or expanded it inline.

bsenftner|2 years ago

Um, why Google? So your indexes can be polluted with their shitty advertising? Why not expose your index as a service? I mean really, WTF not?

ilrwbwrkhv|2 years ago

As much as I like this project, Discord is an absolute disaster. The only reason communities move there is because it's free.

isnhp|2 years ago

Slack just lets everyone do it.

tudorw|2 years ago

nice, there is a lot of good stuff on discord!

berkle4455|2 years ago

I'm sure Discord and their communities are absolutely ecstatic about opening up the doors to openAI and others to scrape their collective work for the latest LLM.

Walled gardens are going to get a whole lot stricter.