top | item 42873238

pedrovhb | 1 year ago

> but most probably was training data to prevent deepseek from completing such prompts, evidenced by the `"finish_reason":"stop"` included in the span attributes

As I understand it, a finish reason of "stop" in API responses usually means the model ended its output normally. In any case, I don't see how training data could end up in production logs, nor why they'd want to prevent such a prompt (one you'd expect a normal user to write) from being responded to.
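For reference, "stop" is one of a handful of finish_reason values an OpenAI-style chat completion can carry ("length" means a token limit was hit). A minimal sketch of checking it; the response below is an abbreviated, illustrative payload, not a full API response:

```python
import json

# Abbreviated OpenAI-style chat completion response (illustrative only).
# finish_reason "stop" means the model ended its output naturally;
# "length" would mean it ran into a token limit.
raw = """
{
  "choices": [
    {
      "message": {"role": "assistant", "content": "Hello!"},
      "finish_reason": "stop"
    }
  ]
}
"""

def completed_normally(response: dict) -> bool:
    """True if every choice finished because the model chose to stop."""
    return all(c["finish_reason"] == "stop" for c in response["choices"])

response = json.loads(raw)
print(completed_normally(response))  # True for this sample
```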

> [...] I'm guessing Wiz wanted to ride the current media wave with this post instead of seeing how far they could take it.

Security researchers are often asked not to pursue findings beyond confirming their existence; going further can be unhelpful, or can accidentally break things. Since these researchers presumably weren't invited to test the systems in depth, stopping there seems like the polite way to go about it.

This mistake was totally amateur hour on DeepSeek's part, though. I'm not too into security stuff, but if I were looking for something, the first thing I'd think to do is nmap the servers and see what's up with any interesting open ports. I wouldn't be surprised at all if others had found this too.
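The kind of first-pass check described above, which nmap automates and extends, can be sketched with plain sockets. The host and port list here are placeholders, not the actual DeepSeek targets:

```python
import socket

def port_open(host: str, port: int, timeout: float = 1.0) -> bool:
    """Attempt a TCP connect; True if something is listening there."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.settimeout(timeout)
        # connect_ex returns 0 on success instead of raising on failure
        return s.connect_ex((host, port)) == 0

# A few common service ports (illustrative list); a real recon pass
# would sweep a much wider range, which is what nmap is for.
for port in (22, 80, 443, 8080):
    print(port, port_open("127.0.0.1", port))
```

nmap adds service/version detection, timing controls, and full-range sweeps on top of this basic connect test, which is why it's the usual first tool reached for.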

caust1c | 1 year ago

Seems that you're right! Also, not that I doubted they were using OpenAI, but web searches for `"finish_reason"` all point to the OpenAI docs. Personally, I wouldn't say it's a very common attribute to see in logs generally.

https://platform.openai.com/docs/api-reference/introduction

Right there in the docs:

> Now that you've generated your first chat completion, let's break down the response object. We can see the finish_reason is stop which means the API returned the full chat completion generated by the model without running into any limits.

Regarding how training data ends up in logs: it's not that far-fetched to create a trace span to measure how long prompts and replies take, and in that context it makes sense to record attributes like finish_reason for observability purposes. Including the message itself, however, is just amateur, but common nonetheless.
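A hand-rolled sketch of that pattern (a toy span, not any specific tracing library): the span wraps the model call and records timing plus metadata like finish_reason, which is reasonable; the amateur part is also stuffing the raw prompt and reply into the attributes. The model call is stubbed out:

```python
import time
from dataclasses import dataclass, field

@dataclass
class Span:
    """Toy trace span: a name, a start time, and an attribute bag."""
    name: str
    attributes: dict = field(default_factory=dict)
    started: float = field(default_factory=time.monotonic)

    def set_attribute(self, key: str, value) -> None:
        self.attributes[key] = value

    def elapsed(self) -> float:
        return time.monotonic() - self.started

def traced_completion(prompt: str) -> Span:
    span = Span("chat_completion")
    # ... model call would go here; stubbed out for the sketch ...
    reply, finish_reason = "Hello!", "stop"
    # Reasonable observability metadata:
    span.set_attribute("finish_reason", finish_reason)
    span.set_attribute("duration_s", span.elapsed())
    # The amateur-hour part: recording raw prompt/reply on the span,
    # which is how user conversations end up in plaintext logs.
    span.set_attribute("prompt", prompt)
    span.set_attribute("completion", reply)
    return span

span = traced_completion("Hi there")
print(span.attributes["finish_reason"])  # prints "stop"
```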

miki123211 | 1 year ago

> not that I doubted they were using OpenAI

The OpenAI API is basically the gold standard for all kinds of LLM companies and tools, both closed and open source, regardless of whether the underlying model was trained on OpenAI outputs or not.