There is generally something about the Gemini models that feels a bit different from Claude, ChatGPT, or Mistral.
I always have the feeling that I'm chatting with a model oriented towards engineering tasks: the seriousness, the lack of interest in being humorous or cool.
I don't know if this is because I interact with Gemini only through AI Studio, which may have different system instructions (apart from those one can add oneself, which I never do) from the ones at gemini.google.com.
I never use gemini.google.com because it lacks a simple export feature. It's not even possible to save a single chat to disk (well, the others can't do that either); I just wish it could.
AI Studio's saving to Google Drive is really useful. It lets you download the chat, strip it of verbose parts like the thinking process, and reuse it in a new chat.
I wish gemini.google.com had a "Save as Markdown" option per answer and for the complete chat (with a toggle to include or exclude the thinking process). Then it would be a no-brainer for me.
It's as if Google Docs had no "Download..." menu entry and you could only "save" documents via Takeout.
You put into words something I've been struggling to describe for a long time. Gemini gives short, succinct responses with whatever information you need and minimal anything else. ChatGPT and Claude both pad their text with mannerisms, formatting, etc.
I didn't realize just how big the difference was until I tested it.
"How do I clear a directory of all executable files on Debian?"
Gemini 2.0 Flash (responses manually formatted):

    find /path/to/directory -type f -executable -delete

Replace /path/to/directory with the actual path.
ChatGPT: (full link [1])
To clear (delete) all executable files from a directory on Debian (or any Linux system), you can use the find command. Here's a safe and effective way to do it:
# [checkmark emoji] Command to delete all executable files in a directory (not recursively): [..]
# [magnifying glass emoji] Want to preview before deleting? [..]
# [caution sign emoji] Caution: [..]
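For reference, the substance behind both answers is the same GNU find invocation. A minimal sketch (the /path/to/directory placeholder is illustrative, -maxdepth 1 restricts it to the top level of the directory as in ChatGPT's non-recursive variant, and the preview step is just the same command with -print instead of -delete):

```shell
# Preview first: list executable regular files without deleting anything
find /path/to/directory -maxdepth 1 -type f -executable -print

# Then delete them once the listing looks right
find /path/to/directory -maxdepth 1 -type f -executable -delete
```

Note that -executable is a GNU find extension, which is fine on Debian but not portable to every Unix.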
I have been using the Obsidian Web Clipper to export chats from the ChatGPT and Claude web versions to nicely formatted Markdown files. You can save the Markdown to Obsidian or download it as a standalone file. It doesn't support Gemini yet, though.
2.5 has been amazing for programming. When I'm lazy, I just send it the entire repo as context and then ask for entire modified files back with the (medium-sized) change. It almost always works! I'd like to start using Cursor or some VS Code extension to do this from the IDE itself.
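The "entire repo as context" step can be sketched as a small shell loop, assuming a git repo; the `*.py` glob and the `context.txt` name are arbitrary choices for illustration. Prefixing each file with its path lets the model refer back to (and return) whole files by name:

```shell
# Concatenate tracked .py files into one context file,
# with a path header before each file's contents
git ls-files '*.py' | while read -r f; do
  printf '===== %s =====\n' "$f"
  cat "$f"
  printf '\n'
done > context.txt
```

You would then paste context.txt into the chat and ask for complete modified files back.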
Is it because Google is feeding the model information about you, so it knows more about the responses you'd like? Just like Google does with search history?
It's interesting that different models evoke such distinct personalities. I agree, sometimes the excessive enthusiasm can be distracting. A concise, focused response is often more valuable, especially for technical tasks. I find that a clear system prompt can really steer the model's behavior, like you mentioned.
When you have to justify your spend to public shareholders, it makes it much more difficult to spend tokens on "great question!", verbal tics, and whatnot.
> Next, in response to a question about the vulnerabilities in the Salt Typhoon description, Sec-Gemini v1 outputs not only vulnerability details (thanks to its integration with OSV data, the open-source vulnerabilities database operated by Google), but also contextualizes the vulnerabilities with respect to threat actors (using Mandiant data).
I remain skeptical about LLMs in this space, although I might be proven wrong, as often happens. Nevertheless, OSV has already been a big advance, so it is great that it gets further commitment.
Is this a "model" as in a set of transformer weights that inherently does security work, or is it a system that combines an LLM (for question interpretation, synthesis, and output presentation) with data lookups and/or other tools?
From the description re data integrations it sounds like the latter, unless the data mentioned is in fact used for training.
The distinction is important because a security-tuned model will have different limitations and uses than an actual pre-built security LLM app. Being an app also makes benchmarking against other "models" less straightforward.
It always blows my mind that nobody at Google thought it would be a good idea to very carefully review the AI's answer.
In the second screenshot, the prompt asks about CVE-2024-3400, and at first glance the answer appears OK.
But in the affected systems section it states:
> Also Hitachi Energy RTU500 firmware and Siemens Ruggedcom APE1808 firmware.
I cannot find any reference indicating that this Hitachi device is vulnerable to that CVE. Hitachi has a nice interface listing all vulnerabilities of their devices, and this CVE is not among them.
Any mention of Hitachi is also missing from the Mitigation section. Almost as if this device is not vulnerable.
There is some more weirdness, too: for example, it doesn't mention that the "portal" feature is also vulnerable.
Thanks for looking in depth at our post. The Hitachi RTU500 mention is not a hallucination; we did check for those. It is mentioned in the Mandiant threat intelligence data.
I'm always torn when it comes to LLMs and analytical tasks. When you perform an analytical task, whether it is something simple like assessing the potential risk and impact of a vulnerability or something complex like analyzing an obfuscated malware sample to determine its capabilities, you have to thoroughly go over the data points available to you and corroborate the data points or evidence you are using to come up with conclusions. LLMs can help with a lot of this, but you still have to go over their (mostly black-box) reasoning or backtrack their work before you can accept their conclusions.
In other words, even with humans, skills and experience are never enough. They have to show the reasoning behind their conclusions and then show that the reasoning is backed up by an independent source of fact. Short of that, you can still perform analysis, but then you must clearly state that your analysis is weak and requires more follow-up work and validation.
So with LLMs, I'm torn because they kind of make your life a lot easier, but does it just feel that way, or are they adding more work and uncertainty where that is intolerable?
Could be great for augmenting a cybersec professional's tasks; I'm certainly interested in trying it. However, I fear it will not be used as just one of the tools in the toolbox, and rather it will be used as something to defer (and consequently shed liability) to.
Using AI systems for high-speed security actions, proactive and reactive, seems necessary but not sufficient:
I expect attackers will also use AI systems, trained on the latest in effective attacks. What about defense would make defenders' AI systems more effective than attackers'?
I think it's necessary because, if the attackers use AI systems then the defenders need to keep up.
Also, we need to be creating far more secure systems to start with. Right now it is, to a degree, security through obscurity: something is secure when attackers can't find the bugs fast enough. Security through obscurity wouldn't seem to work well when the attacker uses AI software.
I read the article, and while it's great that the model can generate relevant output, so what? The article doesn't discuss any action being taken based on that output.
Y_Y | 1 year ago:
I love this. When ChatGPT compliments me on my great question or tries to banter it causes me great despair.
[1] https://chatgpt.com/share/67f055c8-4cc0-8003-85a6-bc1c7eadcc...

HelenePhisher | 1 year ago:
Ask Claude to generate a .md of the conversation; it will do that, with the option to download it or a PDF of it. A lovely but well-hidden feature!
occamschainsaw | 1 year ago:
https://github.com/obsidianmd/obsidian-clipper
amitport | 1 year ago:
Specifically, in their own example they are just citing Mandiant, which may itself be wrong...
https://news.ycombinator.com/item?id=43595294
majestik | 1 year ago:
So what’s the big breakthrough here?