ebursztein's comments

ebursztein | 11 months ago | on: Google announces Sec-Gemini v1 a new experimental cybersecurity model

Thanks for the in-depth look at our post. The Hitachi RTU500 mention is not a hallucination; we did check for those. It is mentioned in the Mandiant threat intelligence data.

ebursztein | 1 year ago | on: Spann: Highly-Efficient Billion-Scale Approximate Nearest Neighbor Search (2021)

Try Usearch - it's really fast and underrated: https://github.com/unum-cloud/usearch

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

Thank you - we are adding them to our test suite for the next version.

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

Thanks for the feedback -- we will look into it. If you can share the list of URLs with us, that would be very helpful so we can reproduce the issue. Send us an email at [email protected] if that is possible.

For crawling, we have planned a head-only model to avoid fetching the whole file, but it is not ready yet -- we weren't sure what use cases would emerge, so it is good to know that such a model might be useful.

We mostly use Magika internally to route files for AV scanning, as we wrote in the blog post, so it is possible that, despite our best efforts to test Magika extensively on various file types, it is not as good on font formats as it should be. We will look into it.

Thanks again for sharing your experience with Magika; this is very useful.

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

Co-author of Magika here (Elie). We didn't include the measurements in the blog post to avoid making it too long, but we did take those measurements.

Overall, file takes about 6ms for a single file and 2.26ms per file when scanning multiple files. Magika is at 65ms for a single file and 5.3ms per file when scanning multiple files.

So Magika is, in the worst case, about 10x slower due to the time it takes to load the model, and about 2x slower on repeated detection. This is why we said it is not that much slower.
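
The ratios can be checked with a few lines of arithmetic. This is only a sketch using the four timings quoted in this comment; the constant names and the `slowdown` helper are made up for illustration and are not part of Magika or file.

```python
# Timings quoted in the comment, in milliseconds.
FILE_SINGLE_MS = 6.0      # file, one file per invocation
FILE_BATCH_MS = 2.26      # file, per file when scanning multiple files
MAGIKA_SINGLE_MS = 65.0   # Magika, one file (includes model load time)
MAGIKA_BATCH_MS = 5.3     # Magika, per file when scanning multiple files


def slowdown(slower_ms: float, faster_ms: float) -> float:
    """Return how many times slower the first timing is than the second."""
    return slower_ms / faster_ms


single_ratio = slowdown(MAGIKA_SINGLE_MS, FILE_SINGLE_MS)
batch_ratio = slowdown(MAGIKA_BATCH_MS, FILE_BATCH_MS)

print(f"single file: {single_ratio:.1f}x slower")   # about 10.8x
print(f"multiple:    {batch_ratio:.1f}x slower")    # about 2.3x
```

The single-file gap is dominated by the one-time model load, which is why it shrinks to roughly 2x once that cost is spread over many files.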

We will have more performance measurements in the upcoming research paper. Hope that answers the question.

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

Thanks for the list, we will probably try to extend the list of supported formats in a future revision.

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

Indeed, but as pointed out in the blog post -- file is significantly less accurate than Magika. There are also some file types that we support and file doesn't, as reported in the table.

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

Thanks :)

ebursztein | 2 years ago | on: Magika: AI powered fast and efficient file type identification

We did release the npm package because we indeed created a web demo and thought people might also want to use it. We know it is not as fast as the Python version or a C++ version -- which is why we marked it as experimental.

The release includes the Python package and the CLI, which are quite fast and are the main way we expected people to use it -- sorry if that wasn't clear in the post.

The goal of the release is to offer a tool that is far more accurate than other tools and works on the major file types, as we hope it will be useful to the community.

Glad to hear it worked on your files

ebursztein | 2 years ago | on: Aleister Crowley and William Butler Yeats get into an occult battle (2016)

For those interested, I put Crowley's original tarot deck on the Etteilla Foundation website: https://etteilla.org/en/deck/69/crowley-s-thoth-tarot

ebursztein | 2 years ago | on: AI helps keeping Gmail inboxes malware free [slides]

Here are the slides of my recent talk at FIC on how Google uses AI to strengthen Gmail's document defenses and withstand attacks that evade traditional antivirus solutions.

This talk recounts how, over the last few years, we researched and developed a specialized office document scanner that combines a custom document analyzer with deep learning to detect malicious docx and xls files that bypass standard AVs. In 2021, our AI scanner detected on average 36% additional malicious documents that eluded other scanners, and 178% at peak performance.

I hope you will find those slides useful and informative. If you have any questions, please ask away; I will do my best to answer :)

ebursztein | 3 years ago | on: Ask HN: Share your personal site

https://elie.net - mostly about the research we do at Google. Generated statically and hosted on Firebase. The design was custom-made for it.

ebursztein | 4 years ago | on: Deep-learning side-channel attacks: the theory

1. We did look at many implementations - protection is mostly masked AES rather than having "flattened power". It is taking us far more time to get the datasets right and ready for sharing, but hopefully we will get to release the papers and the datasets in the not-too-distant future.

2. If you go after a very valuable target, side-channels are very realistic. There are also more mainstream attacks, like those against game consoles, but they were glitch- or timing-based afaik.

ebursztein | 4 years ago | on: Account Protections – A Google Perspective

Thanks :)

Regarding security keys, it actually depends: if you are enrolled in Advanced Protection, no; you must have two security keys and we don't fall back. For normal users, yes, that is the case. Until the phone as a security key is widely used and security keys are mainstream, the no-fallback option can't be the default, but hopefully we will get there :)

ebursztein | 7 years ago | on: Quantifying the impact of the Twitter fake accounts purge – a technical analysis

We are going to make the dataset public with the paper. I am happy to share it privately in the meantime.

ebursztein | 8 years ago | on: Insights on the First Three Years of the Right to Be Forgotten at Google

Hey :)

We did have an official post today: https://www.blog.google/topics/google-europe/updating-our-ri... It is focused on the improved transparency report rather than the research paper, and I am sure it will reach a larger audience. You will recognize the infographic that I borrowed from it :)

When we make a paper public, I have the habit of sharing a short summary (sometimes with a delay, though) of what the paper is about on my personal blog. Kurt, Luca and Yuan helped with the summary in a personal capacity, which is why they are co-authors of the blog post.

Bottom line: this is not the official post (we do have one); it is just a summary that a few authors of the paper wrote in a personal capacity to share some of the findings we found most interesting.