Some notes:
- Based on GPT-3.5. Essentially, the test was "how well can GPT produce ML code" (e.g., tuning hyperparameters, working from case studies).
- Did not compare against human performance, only against other ML models (unless "human" is taken to mean a perfect score, in which case GPT reached 86%; though I doubt a human would actually score 100% on the benchmark).