arjenpdevries's comments

arjenpdevries | 3 years ago | on: EU Open Web Search project kicked off

https://qz.com/1145669/googles-true-origin-partly-lies-in-ci...

Enjoy! It's a great story.

(Plus: for those who might not know, DARPA is US defense research, and heavily influenced by the intelligence services' needs. Which is not necessarily bad! It's just good to understand where and how Google originated. And wrt DARPA, they funded the creation of the internet itself, for what it's worth.

In Europe, things often go slightly differently. The Web is a result of work at CERN, who are also a project partner of OpenWebSearch.EU. Why? Well, better search can also be beneficial for better science, not just for end users wanting to find their way or buy something.)

arjenpdevries | 3 years ago | on: EU Open Web Search project kicked off

Yes, that is an advantage.

Another reason: you can also integrate search results from sources you cannot index yourself, like social media APIs.

You could also mix and match search results from various topic-oriented indices. Whether that is really better than building a single unified index is a research question. But we think it is the way to bring index fragments to the edge, with the obvious privacy advantages.
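
To make the mix-and-match idea concrete, here is a minimal sketch of merging ranked lists from several indices. It assumes reciprocal rank fusion as the merge strategy, which is an illustrative choice and not something the project has specified; the index names, document ids, and the constant k=60 are all hypothetical.

```python
from collections import defaultdict


def reciprocal_rank_fusion(ranked_lists, k=60):
    """Merge several ranked result lists into one ranking.

    Each document scores 1 / (k + rank) for every list it appears in;
    the scores are summed across lists. k=60 is a commonly used default,
    assumed here for illustration.
    """
    scores = defaultdict(float)
    for results in ranked_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties are broken by document id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from two topic-oriented indices and one external API.
science_index = ["doc_a", "doc_b", "doc_c"]
news_index = ["doc_b", "doc_d"]
social_api = ["doc_d", "doc_b", "doc_a"]

merged = reciprocal_rank_fusion([science_index, news_index, social_api])
print(merged)
```

A rank-based merge like this only needs the order of results from each source, so it also works for sources (such as an external API) that do not expose comparable relevance scores.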

arjenpdevries | 3 years ago | on: EU Open Web Search project kicked off

The project is just starting, so not all your questions are answerable today, but we definitely will produce an open web index, already by the end of the first year, with improvements in years two and three.

We will further deliver components to build search engines on top of this index. The project vision is that there will be many different search engines, not just four worldwide. Hoping to lead the way!

arjenpdevries | 3 years ago | on: EU Open Web Search project kicked off

That cannot be true, as the project has yet to start. But anyone can start a crawler, so you may have encountered other people's software. We wouldn't be so unknowledgeable as to ignore robots.txt ;-)
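
For readers unfamiliar with robots.txt, a minimal sketch of how a polite crawler could check it before fetching a page, using Python's standard urllib.robotparser. The rules, the user agent name "example-crawler", and the URLs are made up for illustration; this is not the project's crawler code.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content that a site might serve.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt, user_agent, url):
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "example-crawler", "https://example.org/index.html"))
print(is_allowed(ROBOTS_TXT, "example-crawler", "https://example.org/private/data"))
```

In a real crawler the rules would be fetched from the site's /robots.txt (RobotFileParser also offers set_url and read for that) and cached per host.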

arjenpdevries | 3 years ago | on: EU Open Web Search project kicked off

Slovenia, Czech Republic. But yes, I think there was a competing proposal from Italy/Spain. Not enough budget for two projects in this area, unfortunately, as they were good too.

arjenpdevries | 4 years ago | on: Comparing SQLite, DuckDB and Arrow with UN trade data

Arrow is meant to let systems share data as-is, instead of requiring a copy and, often, serialization/deserialization.

(This requires both ends to be able to handle the Arrow representation.)

E.g., it has the potential to speed up query processing in PySpark by a lot, because of its Java/Python interoperability.
