Interesting to see the number of negative comments.
Most of the negativity seems to come from the following points:
- EU-funded projects cannot succeed in tech because previous EU-funded projects have failed in tech in the past (and government-funded projects in tech are generally suspicious)
- Search is hard, and therefore it will fail
- The project is underfunded
Even if all the points above are valid (though one obvious US-funded startup launched by a bunch of uni students seems to have fared pretty well), it seems we are missing the point of this project.
The proposal is to contribute to the creation of the open building blocks necessary to enable others (including private US companies) to make better search products.
Better search products are needed.
Shall we remind HN of some of the intense conversations that happened here about Google failing us:
- Google Search is Dying [1]
- Every Google result now looks like an ad [2]
- Google no longer producing high quality search results in significant categories [3]
So while, yes, search is a hard topic, we should welcome initiatives that aim to improve the ground infrastructure needed to lower the barrier to entry on this subject, and hope they will allow many companies to build better search products (or inspire other initiatives to contribute in similar and even more successful ways).
One reason search is hard is that people are very motivated to game your system. That makes being transparent about how it works a fool's errand. It also isn't clear how the economic structure works: improving search relevance by x% is tremendously socially valuable, but probably makes Google's bottom line go up by a thousandth of x, and they have a very direct understanding of the connection. Without that money, how will this get the resources to succeed, given that the adversaries are learning from the people with a lot more?
> EU-funded projects cannot succeed in tech because previous EU-funded projects have failed in tech in the past
I think the fundamental problem here is that the people who are interested in grants and capable of writing grant proposals are different from the people who are interested in building things. There's very little overlap. So the money goes to the people capable of writing proposals, and the people doing the work do it for free in an obscure corner of the internet.
It's sad really, but I suspect it's a side effect of the huge bureaucratic machine that is the EU. One way to make this better would be to simplify access to grants so that technical people can apply without needing a class in "EU funding speech".
If these thinkers can find a way to remove the incentives of spammers, misinformers, and stakeholders to pollute results, that would be a great achievement. It could be seen as a big economic game: simulating these actors might allow us to come up with rules that balance this game and minimize pollution.
Any new open-source search option is good, but I also wish more attention was given to prior open projects like GigaBlast[0]/KBlast[1] crawlers, etc.
It hasn't escaped the wider world that quality open-source search is desirable, and it's hard to think what this new EU project brings to the table that isn't already available if others want to contribute to existing efforts. I wish the EU project the best of luck of course!
> EU-funded projects cannot succeed in tech because previous EU-funded projects have failed in tech in the past (and government-funded projects in tech are generally suspicious)
I'm not suspicious of all government-funded projects (back in the day my own PhD was government-funded!) but I can't help being suspicious of claims such as:
"an open European infrastructure for internet search, based on European values and jurisdiction"
and
"The project will be contributing to Europe’s digital sovereignty"
Q1: Who defines "European values"? Is that done by Qualified Majority voting or would - for instance - Hungary have a veto on any proposed definition?
Q2: Which treaties regulate "digital sovereignty"? Recalling that the 27 member states each "remain sovereign and independent"[0], is that digital sovereignty being handled in BRU, in the 27 states, or a mixture of both?
> EU-funded projects cannot succeed in tech because previous EU-funded projects have failed in tech in the past (and government-funded projects in tech are generally suspicious)
Do we distinguish direct EU funding from funding by governments of EU countries? If not, ASML would like a word, and I'm sure people from other member countries can come up with other examples.
I have written this before but I'll put it here again. What I would like to see is a federated search engine, based on ActivityPub, that works like Mastodon. Don't like the results from one source? Just remove them from your sources, or lower their ranking. Similar to YaCy, but you can work with the protocol to connect or build whatever type of index you want using whatever technology you like, and communicate over an existing standard. Want to build the world's best index of Pokémon sites? Then go do it. Want to build a search engine using Idris or ATS? Sure! I did note the professors are on Mastodon, so perhaps this may actually happen.
One of these days I'll actually implement the above, assuming nobody else does. I figured if I can at least get the basics done and a reference implementation that's easy to run, it could prove the concept. If anyone is interested in this, do email me; my address is in my bio.
What I worry about for this project is that it becomes another island which prohibits remixing of results, like Google and Bing, and that its own index and ranking algorithms become gamed.
I wish the creators the best of luck though. I am also hoping for some more blogs and papers about the internals of the engine. So little information is published in the space that anything is welcome, especially if it's deeply technical.
> Don’t like the results from one source? Just remove them from your sources, or lower their ranking.
That's basically Usenet killfiles and, yes, I think they're totally due for a comeback in one form or another. Usenet may have had its issues towards the end (although it still exists), but killfiles weren't one of its problems. With the simplest ones you could just discard sources you didn't want to read anymore, but with the more advanced ones you could assign weights/rankings based on various factors (keywords / usernames / whether or not you participated in a discussion / etc.).
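For what it's worth, the killfile idea above is small enough to sketch. This is only a rough illustration in Python, assuming each federated source returns results with its own relevance score; the `Result` type, the `apply_killfile` function, and the `node-a`/`node-b`/`node-c` source names are all made up for the example and are not part of YaCy, Mastodon, or any Usenet client.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    source: str   # which federated node returned this result (hypothetical)
    score: float  # relevance score as reported by that node (hypothetical)


def apply_killfile(results, weights, default=1.0):
    """Drop results from sources weighted 0 or below and rerank the rest."""
    kept = []
    for r in results:
        w = weights.get(r.source, default)
        if w <= 0:
            continue  # killfiled source: discard entirely
        kept.append(Result(r.url, r.source, r.score * w))
    # Highest weighted score first.
    return sorted(kept, key=lambda r: r.score, reverse=True)


results = [
    Result("https://a.example/1", "node-a", 0.9),
    Result("https://b.example/1", "node-b", 0.8),
    Result("https://c.example/1", "node-c", 0.5),
]
# The user's personal preferences: ignore node-a, boost node-c.
weights = {"node-a": 0.0, "node-c": 2.0}
ranked = apply_killfile(results, weights)
```

A weight of 0 acts like a classic killfile entry, while any other weight simply scales the source's own score before the merged list is sorted.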
We like Federated search, we like decentralized search, and even P2P search; we are trying to find a good mix, and decided to get started rather than wait! Exciting times.
What's the point of a federated search engine? At the end of the day most nodes will end up implementing the same regulations/censorship, with development driven primarily by a few. It's like Ethereum vs Ethereum Classic all over again. If the EU or the developers' respective governments demand that a censorship or forgetting feature be implemented, it's not like the federated nature would matter. An open source search index is useful, and a search engine that can be easily self-hosted is also useful. But building a search engine as a federated system is a gimmick with no significant value.
Do you see any major Mastodon nodes interfacing with Truth Social or Gab? I certainly don't. If federation barely works for a social media app, I fail to see how it would even matter for a search engine.
Isn't searx what you're describing? I was running an instance for a while, and it's basically a meta search engine that has support for all kinds of providers.
There are also some web extensions available so that you can fill it with more data.
One of the things I wonder here is if it would be easier to just start by crawling known RSS feeds and then exposing a JSON API for the data and making the whole thing open source. Then keeping a public list of indexes and who crawls what. Eventually moving into crawling other sources but first primarily addressing the majority of useful content that's easily parseable.
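A rough sketch of the RSS-first idea, assuming plain RSS 2.0 feeds and using only Python's standard library. The feed content, the `rss_to_records` helper, and the record field names are invented for illustration; a real crawler would fetch feeds over HTTP and handle Atom, encodings, and malformed XML.

```python
import json
import xml.etree.ElementTree as ET

# A tiny inline feed so the example runs without network access.
SAMPLE_RSS = """<rss version="2.0">
  <channel>
    <title>Example feed</title>
    <item>
      <title>First post</title>
      <link>https://blog.example/first</link>
      <description>Hello world</description>
    </item>
    <item>
      <title>Second post</title>
      <link>https://blog.example/second</link>
      <description>More text</description>
    </item>
  </channel>
</rss>"""


def rss_to_records(xml_text):
    """Turn every <item> in an RSS 2.0 document into a plain dict."""
    root = ET.fromstring(xml_text)
    records = []
    for item in root.iter("item"):
        records.append({
            "title": item.findtext("title", default=""),
            "url": item.findtext("link", default=""),
            "summary": item.findtext("description", default=""),
        })
    return records


records = rss_to_records(SAMPLE_RSS)
# What a JSON API endpoint might return for this feed.
payload = json.dumps(records, indent=2)
```

Feeds are attractive as a starting point precisely because the parsing step is this small; the hard parts (discovery, politeness, deduplication) are left out here.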
What benefit does federation bring here? Unless it is very simple to set up, most communities are non-technical and probably won't be able to set up their own crawler. I would think just a search engine that lets you customize the ranking algorithm, and maybe hook into whatever ontology they've developed and rank accordingly, would be sufficient.
ActivityPub is not well suited for this application. It's for publishing activities made by actors — hence the name. You'll want to invent your own federation protocol specifically for federated search.
Seriously... I really wanted to like this project, but it seems everything the EU touches lately gets worse.
Examples range from the webpage, which half of the time shows "Resource Limit exceeded", to the technology stack diagram at the bottom of this page https://openwebsearch.eu/the-project/ being completely unreadable due to bad scaling.
It is very disappointing really. Another example off the top of my head: here in Poland we have ID cards (as in every other EU country). Those ID cards have to be renewed every now and then (every 10-15 years). In recent years an online system for government services was implemented, including renewal of those cards. One could take a photo with a mobile phone, submit an application and pick up a card from a gov office in a few weeks' time. Unfortunately, the EU made a law that ID card applications have to be accompanied by biometrics (fingerprints), so this system has been thrown away. One has to physically go to the gov office, scan their fingerprints, apply for a new ID card and then go again to pick it up...
Ok, so what happens in 10 years' time? They should have the fingerprints already, right? No. They take the fingerprints, store them only until one picks up the ID card, and then delete them. There is no fingerprint database; they are not stored anywhere. The fingerprints are used only to ensure that the same person who submitted the application picks up the document. It makes zero sense, other than to break the previous online system. Thanks EU.
And why the hell would the government keep the fingerprints? So that once in 10 years I save 1h? The benefit is minor compared to the bad things that can be done with a fingerprint DB (mass surveillance...).
So your complaints are: the webpage announcing this news isn't perfect, and the (well-known for being incompetent, corrupt and anti-EU) Polish government has implemented an EU policy poorly?
I'm a bit skeptical that EU funding for a bunch of professors is the way a search engine will be built.
The primary goal for academics is to publish new findings, while what you need to build a search engine is rock solid CS and information retrieval basics. Academically, it's not very exciting. Most of it was hashed out in the 1980s or earlier.
The real game-changer in search would be if companies would agree to publish indexes of their own sites in an open standard to a place that everyone could access. This would undercut the monopoly power that large search engines have and allow everyone to focus on innovating the best way to search that content vs. having to spend so much time and money to crawl and index it.
> A new EU project OpenWebSearch.eu … [in which] … the key idea is to separate index construction from the search engines themselves, where the most expensive step to create index shards can be carried out on large clusters while the search engine itself can be operated locally. …[including] an Open-Web-Search Engine Hub, [where anyone can] share their specifications of search engines and pre-computed, regularly updated search indices. … that would enable a new future of human-centric search without privacy concerns.
So.. Who's going to create the index?
Indexing the web is expensive, and it's offset by the ads the indexer runs on their search website, as with Google, Bing, Brave and others.
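To make the split between index construction and searching concrete, here is a toy sketch in Python. It assumes a plain inverted index serialized as JSON; the project's actual shard format is not described in this thread, and `build_index`, `search`, and the sample documents are hypothetical.

```python
import json
from collections import defaultdict


def build_index(docs):
    """Expensive step: map each term to the sorted list of doc ids containing it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}


def search(index, query):
    """Cheap step: intersect the posting lists of every query term."""
    postings = [set(index.get(term, [])) for term in query.lower().split()]
    if not postings:
        return []
    return sorted(set.intersection(*postings))


docs = {
    "d1": "open web search index",
    "d2": "open source crawler",
    "d3": "web crawler and index",
}
shard = json.dumps(build_index(docs))  # what a cluster might publish
local_index = json.loads(shard)        # what a local engine might download
hits = search(local_index, "web index")
```

The point of the split is that only `build_index` needs the crawl and the big cluster; anyone holding the published shard can run `search` locally with their own ranking on top.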
What does "based on European values and jurisdiction" refer to? I'd love to be pleasantly surprised, but this sounds like it's ripe for centralized censorship.
On first glance, I see the word "unbiased" immediately followed by "based on European values". Now, I'm no expert, but to me, that seems pretty biased.
I wish the project didn't try to moonshot a thing that we already had. I want the good old Google search where you type something and all the relevant web results pop out (without creepy ads, trackers, personalisation, or shopping links to get you to spend). I understand it is very expensive to crawl the web; that's where an org like the EU can come in. I wish they would bootstrap a simple, open index that any startup could use to provide clean search fronts. Crawling the web and indexing content can be quite expensive. The EU could bootstrap that infrastructure and then, say, augment it with federated crawling and indexing (sort of like the Condor distributed computing project), where people, universities, etc. in the EU could contribute their spare computer time to crawling the web and indexing it for the EU index, making search better for everyone. Heck, I'd even put solar panels on my bike shed and hook them up to a PC to crawl the web while the sun is shining.
But the web is different; Google much less so. There is no "old Google search", because almost everything to search through is noise and spam.
PageRank made user preferences as signalled by hyperlinks the key signal of quality. Who really hyperlinks any more? And how many, by percentage, are non-automated etc.?
The web this works for is one of forums & personal websites. It's hard to say that today there are any properties of websites that are a reliable signal of quality.
Hence the proliferation of voting sites (such as HN, etc.) which are little more than search engines augmented with reliable signals of user preferences.
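For readers who haven't seen it, the PageRank idea mentioned above can be sketched as a simple power iteration. This is a textbook-style toy in Python, not Google's production algorithm; the `pagerank` function, the 0.85 damping factor, and the four-page link graph are assumptions chosen for the example.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power iteration over a dict mapping each page to the pages it links to.

    Every link target must also appear as a key in `links`.
    """
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # A page splits its rank evenly among the pages it links to.
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank


links = {
    "a": ["b", "c"],
    "b": ["c"],
    "c": ["a"],
    "d": ["c"],
}
ranks = pagerank(links)
```

In this graph "c" collects links from three pages and ends up ranked highest, while "d", which nobody links to, keeps only the baseline share. That is the sense in which hyperlinks act as votes, and why the signal degrades when few links are made by humans.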
Correct me if I am wrong, but the purpose is to create an index database to which custom search engines can be attached? I.e., the EU will crawl all pages on the web?
This is an excellent idea to disintermediate Google.
A search tool for the commons without the spyware and opportunities for search manipulation is very important, and the more actors work on it, the better.
Perhaps cooperation with other tech-savvy democracies could occur. India is full of skilled programmers. Unlike with Russia or China, it is easy to see Europe cooperating with it.
I suspect search engines are an outdated concept for at least the largest of sites, who will generally, but not always, have better ways to directly search their own content.
The remainder of the search problem seems to just be collecting relevant trafficked sites for listing in results. Today Google et al seem to be doing this BY HAND. And it's not even obfuscated.
Recently, for the first time in my life, the wizard behind the curtain seems to have been exposed. I feel strongly that one could probably start a small index that catered to a fairly large audience.
And honestly, for other queries, just tell the user to search that site directly. I think you could even market it to users as not a technical limitation, but behavior that should be considered fuddy-duddy.
Like, really, you're going to search me? You know they have their own search right?
Even Yellow Pages faded into obscurity eventually.
Before Corona, I would have really welcomed this announcement. But to be honest, when someone says that the project will be "based on European values," I think it's stillborn.
Who is defining those "European values"? Is it the European Commission?
As a liberal person who still dreams of mature citizens who form their own opinions well-informed from a rich debate, I now see the "European values" quite critically.
In Germany - if you criticize the Corona measures you are called a Nazi, if you criticize the Ukraine war you are a Russian troll. Are those then the "European values"?
The European Commission has already established some projects to weight the information according to its will (SocialTruth, PROVENANCE, EUNOMIA, etc.).
When a government agency talks about truth, then all the hairs on the back of my neck stand up.
When governments speak of disinformation, it is usually only in the sense that the information does not correspond to their interests. I had to learn that the so-called fact checkers don't check facts, they just sell a counter opinion as a "fact".
So for a European search engine, a filter is then placed upstream that filters out all disinformation?
In the past, that was called censorship. Today, it's more like citizen service.
> If you criticize the Corona measure you are a Nazi, if you criticize the Ukraine war you are a Russian troll?
These notions are propagated by the US to the rest of the world. While EU politicians seem to go along, Europeans do not tolerate it much, and that is reflected in society.
If the EU or more government-sponsored search engines pop up, I have no doubt they will want to control them in their own way - I'm okay with that. Right now our only option is US-controlled entities that answer to US governments and rules.
At least here we can point fingers and hold politicians accountable if they try to influence things the wrong way. We can't do that with US companies.
[+] [-] margarina72|3 years ago|reply
1: https://news.ycombinator.com/item?id=30347719
2: https://news.ycombinator.com/item?id=22107823
3: https://news.ycombinator.com/item?id=29772136
[+] [-] wbl|3 years ago|reply
[+] [-] kazinator|3 years ago|reply
My kid, 3 going on 4, was watching a cartoon on YouTube: Curious George.
An ad popped up promoting some show, featuring foul language and sexual intercourse.
Yagoddabekidding.
[+] [-] sanxiyn|3 years ago|reply
[+] [-] mariusor|3 years ago|reply
[+] [-] ffhhj|3 years ago|reply
[+] [-] p1necone|3 years ago|reply
Not sure about software, but all (well, Apple is getting there) phones now use the same connector to charge because of the EU.
[+] [-] lurkernomore|3 years ago|reply
[0] https://github.com/gigablast/open-source-search-engine
[1] https://github.com/fossabot/kblast
[+] [-] logifail|3 years ago|reply
[0] https://op.europa.eu/webpub/com/eu-what-it-is/en/
[+] [-] vanderZwan|3 years ago|reply
[+] [-] kidsil|3 years ago|reply
Like the Internet, or the WWW?
[+] [-] nonethewiser|3 years ago|reply
[+] [-] tyiz|3 years ago|reply
[+] [-] Proven|3 years ago|reply
[deleted]
[+] [-] boyter|3 years ago|reply
[+] [-] TacticalCoder|3 years ago|reply
[+] [-] arjenpdevries|3 years ago|reply
[+] [-] melony|3 years ago|reply
[+] [-] fabrice_d|3 years ago|reply
[+] [-] cookiengineer|3 years ago|reply
[1] https://searx.github.io/searx/
[+] [-] asim|3 years ago|reply
[+] [-] googlryas|3 years ago|reply
[+] [-] grishka|3 years ago|reply
[+] [-] camel-cdr|3 years ago|reply
[+] [-] Roark66|3 years ago|reply
[+] [-] ssmiler|3 years ago|reply
[+] [-] scrollaway|3 years ago|reply
They are. They are stored on the card itself, and they are NOT stored in government databases. This is a good thing...
[+] [-] contravariant|3 years ago|reply
[+] [-] permo-w|3 years ago|reply
[+] [-] marginalia_nu|3 years ago|reply
[+] [-] rgrieselhuber|3 years ago|reply
[+] [-] dataking|3 years ago|reply
[0] https://en.wikipedia.org/wiki/Quaero
[1] https://www.dw.com/en/germany-pulls-away-from-quaero-search-...
[+] [-] jacooper|3 years ago|reply
[+] [-] logicalmonster|3 years ago|reply
[+] [-] Extropy_|3 years ago|reply
[+] [-] dang|3 years ago|reply
[+] [-] lucideer|3 years ago|reply
> Resource Limit Is Reached
> The website is temporarily unable to service your request as it exceeded resource limit. Please try again later.
Original URL might be more resilient...
[+] [-] reacharavindh|3 years ago|reply
I'm dreaming too much, I need coffee..
[+] [-] mjburgess|3 years ago|reply
[+] [-] s-xyz|3 years ago|reply
[+] [-] qualudeheart|3 years ago|reply
[+] [-] stiray|3 years ago|reply
I didn't look up all of them, but I did look up who the partner from Slovenia is.
It is the most privacy-invading ISP/mobile company in our country, which sells statistics about its users (anonymized, but walking a thin line).
I just hope it won't go down that drain.
Regarding all the negativity about the government project: if we can run CERN, we can surely do a web search project.
[+] [-] Animats|3 years ago|reply
[+] [-] andrewmcwatters|3 years ago|reply
[+] [-] ur-whale|3 years ago|reply
No research institutes from {France, Italy, Spain, Greece, Portugal, etc ...} involved.
[+] [-] nickdothutton|3 years ago|reply
[+] [-] psuresh|3 years ago|reply
[+] [-] sazz|3 years ago|reply
[+] [-] johnywalks|3 years ago|reply