solso's comments
solso | 3 years ago | on: Brave Search Goggles: Alter search rankings with rules and filters
The Goggles white paper was released more than a year ago, long before Kagi was even announced to the public.
Additionally, before Brave acquired Tailcat (Jan 2021) I had the pleasure of sharing the draft of the paper with Kagi's founder.
So no, there is no prior art.
Let me add that I do not claim that Goggles is prior art of Lenses either.
One of the key features of the Goggles design is that the instructions, rules and filters are open and URL-accessible.
A Goggle is not so much a personal preference configuration as a way to collaboratively come up with shareable and expandable search re-rankers.
Very different goals if you ask me. Of course, Goggles can be used for personal preferences exclusively, but that's not the use case we had in mind.
solso | 3 years ago | on: Brave Search Goggles: Alter search rankings with rules and filters
The article also mentions that Goggles will not stop polarization, it suffices to not exacerbate it.
No technology/system in any period of time has been able to suppress it, censorship included.
Disclaimer: I work at Brave search
solso | 3 years ago | on: Bing contract prohibits DuckDuckGo from completely blocking Microsoft tracking
"it is misleading to say our results just come from Bing."
Discussing how many sources one can bring together is a distraction from discussing the degree of dependency between DDG and Bing. More so when claiming that others suffer from the same, which is factually incorrect for Brave search.
solso | 3 years ago | on: Bing contract prohibits DuckDuckGo from completely blocking Microsoft tracking
I'd like to correct some factually incorrect information regarding Brave Search.
Brave search crawls the web through the Web Discovery Project and has its own crawler, which fetches a bit more than 100M pages daily.
Brave search uses the Bing API and a Google fallback for about 8% of the results shown to users; the remaining 92% are served from our own index. When we launched almost 1 year ago, the share of results from 3rd parties was 13%.
There is no need to mention "multiple sources" when a number can be given. The underlying theme here is not whether DDG provides value on top of Bing. It does, no one is questioning that. The question is whether DDG would be able to operate if Bing were to shut DDG down tomorrow.
If Bing and Google were to disappear tomorrow, for whatever reason, Brave search would continue to operate. That's the independence Brave search is building.
solso | 4 years ago | on: Brave Search beta
In my opinion having similar results to Google will facilitate adoption. After all, Google is pretty good for many types of queries (not all), and people in general have strong habits.
The fact that we are similar with our own index is great. It means that we have the power of deviating from it when needed, as we mature/evolve.
Allow me to repurpose your statement on why not use startpage if you want Google-like results: if tomorrow Google disappears (or for some reason becomes unusable), brave search will continue to operate as normal (similar to old Google). What will happen to searx or startpage? What will happen to ddg or swisscows if the provider turning bad is Microsoft? IMHO, no matter how much reranking or how many nice features you put on top, unless you control the search results themselves, diversity can only be superficial.
Sorry for the "rant". Thanks a lot for the inputs and for updating the doc, appreciate it.
[0] https://brave.com/wp-content/uploads/2021/03/goggles.pdf
solso | 4 years ago | on: Brave Search beta
The fact that you see results similar to Google for popular queries is a by-product of the fact that our ranking is trained using anonymous query logs. There are plenty of references to the methodology (https://0x65.dev/).
The fact that we are similar to Google on certain types of queries is good (at least from the perspective of human assessment). It's easy to find other types of queries for which we are not similar to Google. It would be rather stupid if we were to "use google" on easy-to-solve queries but not on the complicated ones, don't you think? In any case, very nice article besides a couple of misconceptions (like this one), will bookmark.
Disclaimer: work at Brave search, used to work at Cliqz
solso | 5 years ago | on: Brave buys a search engine, promises no tracking, no profiling
However, you are assuming that HumanWeb data collection is record-linkable, which is not the case, precisely to avoid this attack.
If what is being collected is linkable, e.g. (user_id, url_1), ... (user_id, url_n), then no matter how you anonymize user_id, it will eventually leak. A single url containing personally identifiable information, e.g. a username, will compromise the whole session, no matter how sophisticated the user_id generation is. The real problem, privacy-wise, is the fact that records can be linked to the same origin. An attacker (or the collector) has the ability to know if two records have the same origin.
The anonymization of HumanWeb, however, ensures that linkability across data points is not present. Hence, an attacker cannot know if two records come from the same origin. As a consequence, the fact that one url might give away user data, for instance a username, would not compromise all the urls sent by that person.
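To make the difference concrete, here is a minimal Python sketch of the linkability argument. It is only an illustration under stated assumptions: the record layout, the ids, the example.com URLs and the helper names are all hypothetical, and it does not show how HumanWeb actually builds or transports its messages.

```python
from collections import defaultdict

# Hypothetical linkable records: (pseudonymous user_id, url).
# The ids and URLs are made up for illustration only.
linkable_records = [
    ("a91f", "https://example.com/news"),
    ("a91f", "https://example.com/profile/jane_doe"),  # this url leaks a username
    ("a91f", "https://example.com/clinic/appointments"),
    ("7c20", "https://example.com/weather"),
]


def group_by_origin(records):
    """Rebuild sessions: every url that shares an origin id ends up together."""
    sessions = defaultdict(list)
    for origin, url in records:
        sessions[origin].append(url)
    return dict(sessions)


def exposed_when_linkable(records, leaking_url):
    """All urls attributable to the person once one url gives them away."""
    for urls in group_by_origin(records).values():
        if leaking_url in urls:
            return urls
    return []


def strip_origin(records):
    """Unlinkable variant: each record arrives with no shared origin at all."""
    return [url for _, url in records]


def exposed_when_unlinkable(urls, leaking_url):
    """Without an origin there is nothing to join on: only the leak itself is exposed."""
    return [url for url in urls if url == leaking_url]


leak = "https://example.com/profile/jane_doe"
print(exposed_when_linkable(linkable_records, leak))
print(exposed_when_unlinkable(strip_origin(linkable_records), leak))
```

In the linkable case the single leaking url drags the other two urls of the same pseudonymous id with it, however that id was generated. In the unlinkable case the collector only sees independent urls, so the leak stays confined to the one record.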
If you are interested in more details I recommend this article: https://0x65.dev/blog/2019-12-03/human-web-collecting-data-i...
[Disclaimer I'm one of the authors]
solso | 5 years ago | on: Brave buys a search engine, promises no tracking, no profiling
[1] https://0x65.dev/blog/2019-12-02/is-data-collection-evil.htm...
[2] https://0x65.dev/blog/2019-12-03/human-web-collecting-data-i...
[3] https://0x65.dev/blog/2019-12-04/human-web-proxy-network-hpn...
solso | 5 years ago | on: DuckDuckGo, Google, and Android choice screens
Bing is interested in serving DDG, Qwant, Ecosia and a lot of other unknown search engines because of the aggregated reach they provide for its ad network. Ad networks only work if the aggregated audiences are massive, otherwise advertisers do not bother putting their ads there; only the top 3-5 ad networks get to see any action. So Bing wants/needs a bigger audience just to be in the game. They can grow along 2 paths: 1) increase Bing search reach (difficult), or 2) use partners with different value propositions.
Bing charges little per 1K queries, 1 USD officially, but it gets cheaper, down to zero :-) The real thing, though, is that if you display Bing ads, you get a 70%-90% rev-share of the ad revenue, which varies from country to country, something between $5 and $20 per 1K queries.
So, DDG basically gets around $5 to $10 net for each 1K queries, and can spend all that money on distribution and marketing so that they get even more users. Bing gets the rest of the money and, what's more important, their ad network continues to be competitive.
Everybody wins, right? :-/
Search is so cheap: $1 per 1K queries, and people make 2 or 3 queries per day, so about $1/year/user (on average). It makes no economic sense to build an alternative. Unless, of course, you are building out of "ideals".
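The back-of-the-envelope number above can be checked with a few lines of Python. This is just the arithmetic from the comment; the function name and the 365-day year are assumptions made for the sketch.

```python
def yearly_cost_per_user(queries_per_day, price_per_1k_usd=1.0, days_per_year=365):
    """Yearly cost in USD of serving one user at a flat price per 1K queries."""
    queries_per_year = queries_per_day * days_per_year
    return queries_per_year / 1000 * price_per_1k_usd


# 2 or 3 queries per day at $1 per 1K queries.
for queries_per_day in (2, 3):
    cost = yearly_cost_per_user(queries_per_day)
    print(f"{queries_per_day} queries/day -> ${cost:.2f} per user per year")
```

At 2 queries a day that is 730 queries a year, and at 3 it is 1095, so the cost lands on either side of $1 per user per year.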
solso | 5 years ago | on: Cliqz is shutting down
> This is definitionally false. The very collection of data compromises one's privacy, by nature of it having been collected.
That's not definitionally false. If it sounds false to you, it is because you have an implicit assumption that does not apply.
Data from users does not imply user sessions on the collector side (session as a set of multiple data points belonging to the same user).
If sessions are collected, then privacy is impossible to guarantee. We are well aware of that, having worked on these problems for almost 20 years. But that's precisely what Cliqz never did. All messages from our users are record-unlinkable for us, meaning that we have no way to reconstruct any session.
If you are interested, check the HumanWeb posts on https://0x65.dev/ or the papers https://0x65.dev/pages/dissemination-cliqz.html
solso | 5 years ago | on: Cliqz is shutting down
Why the ruckus then? Because some assume that if data is sent, privacy is compromised, period. They do not know how to do it, and they assume it's impossible. Instead of checking the claims for themselves (code is public, data can be inspected, documentation, etc.) they prefer to stick to their belief system, which is more comfortable and does not imply hard work. The press release from FF -- written by one of these people with a lot of biases and published without review -- did not help, as it was misleading.
We made a big mistake back then. Instead of rebutting it, we chose to ignore the FUD, assuming that facts would prevail. They did not.
Sadly the community is "scared". We have been congratulated and lauded by anyone who checked our systems, but never endorsed in public: there is little to gain and a lot to lose (you are getting a sneak preview right now).
Sad story, extremely frustrating too, but there is nothing we can do now.
solso | 5 years ago | on: Cliqz is shutting down
Thanks for noticing it, we will create an issue.
UUIDs only apply to telemetry, which is not the data being described in the paragraph: queries, scrolling, amount of time spent, urls, etc. For this kind of user data (HumanWeb) there is no uuid, neither implicit nor explicit.
There are plenty of papers on the topic and independent audits, the code is open-source, and the data can be inspected. HumanWeb data is 100% record-unlinkable: we have no way to know if two messages received come from the same person or not.
solso | 5 years ago | on: Cliqz is shutting down
You can collect data from users and still not compromise their privacy. It's how you do it that matters; it becomes a design requirement. Collecting a visited url can lead to building a user history (a privacy hazard) or not. It's a design choice. The whole mantra that data != privacy is doing a lot of damage (for anyone curious, we did publish plenty of material on the topic, https://0x65.dev/blog/2019-12-02/is-data-collection-evil.htm...)
solso | 5 years ago | on: Cliqz is shutting down
Sorry to hear that the quality was not good for you. It varies from country to country (depending on the user base, basically). For Germany, quality was good enough, and QA analysis on stratified queries backed it up. That being said, quality as perceived by a person is not properly reflected in NDCG-like metrics: you do not remember the 9 queries it got right, but the one that was totally off.
In any case, DDG is good, and let me emphasize, they (and others) provide a lot of value to the users, privacy-concerned or otherwise. But the underlying problem is not getting fixed, unless, hopefully someday, they come up with an independent index (let's hope).
solso | 5 years ago | on: Cliqz is shutting down
"The company only survived because of the investor throw a lot of money". 100% correct, and that speaks greatly about the investor. They believe that Google is a monopoly that needs to fought, as many others. But, instead of (or on top of) bitching and moaning, lobbying, etc. they put good money where their mouth was. Kudos for that.
Privacy was never Cliqz's primary product. Privacy was a strict design requirement of Cliqz, which can be marketed more or less. Data collection and browsers alike, we wanted them to be private, because that's the right thing to do, even if it was more difficult to implement. The whole data vs. privacy argument is fallacious. One of the reasons why privacy was so important to us shows precisely now: whoever ends up owning the data cannot learn anything about any of the users. Imagine the government getting Google's data if they go belly up or upon "legal" request (replace Google with any other company). The data of Cliqz poses no risk to any user, including myself.
The primary product of Cliqz was search, either as the typical result page or as instant search integrated in the browser. That's very difficult to build, and expensive, something that DuckDuckGo, Startpage, Qwant, etc. do not have to pay for because they rely on the backend of others (not 100%, but mostly). If we were repackaging Bing/Google/Yandex with different ranking twists, our quality would have been better from the beginning, of course. But that's not building an alternative to Google, which is what we wanted. Still, that's not a knock on DDG and others; what they provide has value to the users, of course. But they are not a real alternative, kind of like an electric car that gets its electricity from burning coal.
Brave is a great browser, respects to Brendan and team. We both "fight" against Google. For Brave it's Chrome, for Cliqz it was both Chrome and Search. Too much to chew? Yes, but we had plenty of fun. The only thing I regret after 6+ years working there is the loss of such a great team.
solso | 5 years ago | on: Cliqz is shutting down
Cliqz search was never on par with Google -- I built parts of it -- but it was getting there little by little. To be more precise, it was getting good enough to not be a factor. That has some merit given the totally independent index (not relying on Bing under the hood).
Brave, the same as Cliqz, is trying its best to offer an alternative. If you think you can do better, please do so. Believe me, I'll root for you regardless of my opinion about you (we crossed paths in the past). Why would I support you, even though that does not mean I use what you build? Because we are in need of plurality on the Web, the more the better. Unlike you, I do not see the point of speaking bullshit, not sure if out of ignorance or ill will, don't know, don't care.
For the record, when I said "long before Kagi was even announced to the public." I meant exactly what I wrote, not that Kagi did not exist, it did.