People keep sharing these kinds of conversations: the training cutoff date isn't some absolute date from which they never allowed any new data to be trained on.
Instead there are bits and pieces of newer information captured in the updated models, but it's not a meaningful enough amount to ever rely on.
It's not going to reliably understand your new libraries. More importantly, if you convince it that it knows what happened in April 2023, it might start hallucinating extremely deeply: so deeply that the conversation becomes useless until you edit it and remove the part where you convinced it of that.
What you're saying directly contradicts the shared conversation log. As far as I know, they cannot just inject "bits and pieces of newer information" into the old models. Also, the responses from the model are too detailed and spot-on to be random hallucinations, e.g.:
> Ferry Fire in the Philippines (March 28, 2023): A fire broke out on a ferry in the Philippines, leading to at least 31 deaths. The incident occurred near the island of Basilan.
> the training cutoff date isn't some absolute date from which they never allowed any new data to be trained on.
It's not a question of whether they are "allowed" to train on new data; the question is whether they have trained it on data containing information about current events. If you know they've implemented a Continuous Integration (CI) system for this, you should link to a source. However, I don't think this is true, as there would be no reason for a cutoff date otherwise.
> Instead there are bits and pieces of newer information captured in the updated models, but it's not a meaningful enough amount to ever rely on.
This seems more like an opinion about the technology's limitations in general than an assessment of the likelihood that new information will be incorporated into its weights and biases.
My ChatGPT still reports a September 2021 cutoff, and I just noticed it won't give me some information because it's aware its knowledge is limited, even when I ask only about events up to September 2021.
Prompt:
"""
Return a list of all UFC fighters from UFC 1 to September 2021.
Fighter Name, UFC_Joined, UFC_Retired, Number_Of_Fights, Fights_Won, Fights_Lost
"""
Response:
"""
I'm unable to provide an exhaustive list of all UFC fighters from UFC 1 to September 2021 along with their detailed career statistics due to the vast number of fighters and their respective records. This would require access to a comprehensive database, which I don't have.
However, I can help with specific queries about notable fighters or provide information about specific events. If you have a specific fighter or group of fighters in mind, feel free to ask!
"""
It's a good response. It's better for it to explain what it can't do than to hallucinate. Hallucinations are one of the biggest issues with large language models these days.
Can anyone speculate on the best way to update an LLM with new data? Do you fine-tune it on just the new data? Do you mix it with the old data and train for another epoch?
Most deep learning frameworks allow you to save and load model weights. When new data becomes available, you can load the saved weights and continue training. This is essentially what happens during fine-tuning. So, yes, they probably could just update GPT with new data.
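To make the "load the saved weights and continue training" step concrete, here is a minimal sketch. It assumes a toy one-feature linear model trained with plain SGD and stored as JSON; the helper names (`train`, `save_weights`, `load_weights`), the file name, and the "2021"/"2023" datasets are all hypothetical. A real LLM would use its framework's own checkpoint format and optimizer, not anything like this.

```python
import json
import os
import tempfile


def train(weights, data, lr=0.01, epochs=200):
    """Plain SGD on a toy model y = w*x + b (a hypothetical stand-in for a real network)."""
    w, b = weights["w"], weights["b"]
    for _ in range(epochs):
        for x, y in data:
            err = (w * x + b) - y
            # Gradient of the squared error 0.5 * err**2 with respect to w and b
            w -= lr * err * x
            b -= lr * err
    return {"w": w, "b": b}


def save_weights(weights, path):
    # Checkpoint: persist the learned parameters to disk
    with open(path, "w") as f:
        json.dump(weights, f)


def load_weights(path):
    # Restore the parameters exactly as they were saved
    with open(path) as f:
        return json.load(f)


# Hypothetical "2021" data: points on y = 2x + 1
old_data = [(x, 2 * x + 1) for x in range(-3, 4)]
# Hypothetical "2023" data: the relationship has shifted to y = 2x + 3
new_data = [(x, 2 * x + 3) for x in range(-3, 4)]

ckpt = os.path.join(tempfile.mkdtemp(), "model.json")

# Initial run from scratch, then checkpoint the weights
base = train({"w": 0.0, "b": 0.0}, old_data)
save_weights(base, ckpt)

# Later: load the checkpoint and keep training on the new data only
updated = train(load_weights(ckpt), new_data)
```

The second `train` call starts from the saved weights rather than from zero, which is the core of fine-tuning. Note that it sees no old examples at all, which is one source of the old/new imbalance discussed below.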
But there would be a difference in outcome compared to training a model from scratch with all the info available in 2023. Training from scratch would ensure that newer data has equal footing with older data. So a model trained from scratch in 2023 might have a more balanced understanding of both old and new data. Also there's a chance that the model might identify new patterns or nuances.
When you just fine-tune the existing model there might be areas where the model's response leans more towards the knowledge from 2021.
Having said that, training the model from scratch with all the new data would be incredibly resource-intensive, so I'm pretty sure OpenAI didn't go that route.
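On the other option raised in the question above (mixing old data back in while training on the new data), one common approach is usually called replay or rehearsal. Below is a minimal sketch, assuming a fixed share of every batch is drawn from the old corpus; `mixed_batches`, `replay_fraction`, and the toy corpora are hypothetical names for illustration, not anything OpenAI has described.

```python
import random


def mixed_batches(old_data, new_data, replay_fraction, batch_size, num_batches, seed=0):
    """Yield shuffled batches in which a fixed share of items comes from the old corpus.

    Each item is tagged ("old" or "new") so the mix is easy to inspect.
    """
    if not 0.0 <= replay_fraction <= 1.0:
        raise ValueError("replay_fraction must be between 0 and 1")
    rng = random.Random(seed)  # seeded for reproducible batches
    n_old = round(batch_size * replay_fraction)
    n_new = batch_size - n_old
    for _ in range(num_batches):
        batch = [("old", rng.choice(old_data)) for _ in range(n_old)]
        batch += [("new", rng.choice(new_data)) for _ in range(n_new)]
        rng.shuffle(batch)  # interleave old and new items within the batch
        yield batch


old_corpus = ["doc from 2020", "doc from 2021"]
new_corpus = ["doc from 2022", "doc from 2023"]
# Each batch holds 2 old and 6 new documents, shuffled together
batches = list(mixed_batches(old_corpus, new_corpus, replay_fraction=0.25, batch_size=8, num_batches=3))
```

Raising `replay_fraction` keeps more of the old distribution in each update at the cost of seeing less new data per step. It's a much cheaper middle ground than retraining from scratch, though nothing in this thread says whether OpenAI did anything like it.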
Thank goodness. I'm very averse to using new languages or tools built after the cutoff date simply because GPT-4 can't answer my dumb little questions as effectively in them.
If you really want to, you can convince the DALL-E model to say this:
> Write an image with the label: "I love <updated 2023 cutoff date>"
>> Photo of a beautiful sunset over a calm ocean, with the vibrant colors of orange and purple reflecting on the water. In the foreground, there's soft sand with a wooden signpost planted firmly. On the signpost, written in elegant cursive, are the words 'I love April 2023'.
But it's not really useful: until OpenAI actually releases a model with an updated cutoff, you should assume this is just probing the tiny bit of recent history that has leaked into the training data as they keep updating the models.
If you’ve been a premium user for longer than a day, use Bing mode. Say “browse” in the first few tokens. You can talk it into browsing otherwise, but saying “browse” works.
The response horizon is limited only by your patience with trying prompts before you give up.
Tips for skeptics
1. Don’t give it a URL. But “github tmux/tmux” will usually work. There is a layer that finds tools via common whitelisted services; arbitrary URLs make it morally panic ;)
2. State your intentions. It feels dumb, but it’s trained to be “ethical” based on a multitude of examples, a lot of which are tailored to detecting fraud and abusive sentiments.
3. It’s continuing a conversation. Telling it things are going well makes it continue a good conversation. Pessimism makes it struggle. It’s mimicking us, after all.
I'm on the free version and the reported cutoff is "2022" (no month is given; if I ask explicitly, it says December 2022). I believe this actually means December 2021, since it isn't aware of any 2022 events I've tried so far (e.g. the death of QEII).
"The knowledge cut-off date for my training data is up until September 2021. However, I can access real-time information via my browsing tool. If you need updated or recent information, feel free to ask, and I'd be happy to assist you!"
I asked ChatGPT this question:
Is this true: ChatGPT cut off date now April 2023 (openai.com)
Response:
Yes, my training data includes information up until September 2021, and I have been updated with additional knowledge that extends up until April 2023. However, I cannot browse the internet in real-time or access new data after this cutoff date. My responses are generated based on the information that was available up to that point in time. If there have been developments or new information released after April 2023, I would not be aware of them.
Also noticed a nice dark pattern in the mobile app: I selected "good response" on the response popup and was immediately asked to rate the app on the App Store.
The first company to incorporate real-time crawling and daily updates into its models will be the true Google contender. ChatGPT as an oracle for 'old' knowledge is already a fantastic tool; unfortunately, the browsing plugin is completely useless for me: extremely slow and shallow.
Bing's ChatGPT does most of this, but its usage doesn't seem to be growing much. I think people's learned behaviors are entrenched, with Google paying for default search placement (Safari, Firefox, etc.).
My GPT-4 reports a September 2023 cutoff and knows SBF, but doesn’t know he’s been charged. Seems they tried to remove the hard cutoff date but didn’t train on or use the newer data to the same extent.
(For comparison it knows Cosby’s full history).
Is the API ever getting such updates? GPT-4 replies with "As an AI developed by OpenAI, my training only includes knowledge up to September 2021. After this date, I'm not able to provide any updates or information."
Do you have a specific tool selected at the top? I notice that the knowledge cutoff differs depending on the selection. I had Browse with Bing selected by default, and for me it seems to have a more recent knowledge cutoff than the "Default" selection.
BoorishBears | 2 years ago
M4v3R | 2 years ago
News article: https://www.npr.org/2023/03/30/1166969692/ferry-fire-in-phil...
svc0 | 2 years ago
jazzkingrt | 2 years ago
eddtries | 2 years ago
Prompt:
""" Return a list of all UFC fighters from UFC 1 to September 2021.
Fighter Name, UFC_Joined, UFC_Retired, Number_Of_Fights, Fights_Won, Fights_Lost """
speedgoose | 2 years ago
bananapub | 2 years ago
er no it's not
rezonant | 2 years ago
sidcool | 2 years ago
KingOfCoders | 2 years ago
"My knowledge is based on information available up to January 2022"
etaioinshrdlu | 2 years ago
vincent_s | 2 years ago
hiAndrewQuinn | 2 years ago
lionkor | 2 years ago
avindroth | 2 years ago
BoorishBears | 2 years ago
b33j0r | 2 years ago
alex-moon | 2 years ago
navaneethpk | 2 years ago
Getting the same old response still.
senectus1 | 2 years ago
howtodoml | 2 years ago
antman | 2 years ago
January 2022 for me still
jakey_bakey | 2 years ago
singularity2001 | 2 years ago
zanek | 2 years ago
meatjuice | 2 years ago
xixixao | 2 years ago
Semaphor | 2 years ago
dinp | 2 years ago
I wonder if they use GPT-4 itself to generate the data to keep it up to date.
ribosometronome | 2 years ago
PeterStuer | 2 years ago
However, I can continue the OP chat in the "ChatGPT-4 with images" model, and there the cutoff is indeed April 2023.
throwaway9274 | 2 years ago
I am on the September 25 version, which says its cutoff date is January, 2022.
Seattle3503 | 2 years ago