top | item 44937321


unignorant | 6 months ago

Here are my notes and guesses on the stories in case people here find it interesting. Like some others in the blog post comments I got 6/8 right:

1.) probably human, low on style but a solid twist (CORRECT)
2.) interesting imagery but some continuity issues, maybe AI (INCORRECT)
3.) more a scene than a story, highly confident it's AI given style (CORRECT)
4.) style could go either way, maybe human given some successful characterization (INCORRECT)
5.) I like the style but it's probably AI, the metaphors are too dense and there are very minor continuity errors (CORRECT)
6.) some genuinely funny stuff and good world building, almost certainly human (CORRECT)
7.) probably AI prompted to go for humor, some minor continuity issues (CORRECT)
8.) nicely subverted expectations, probably human (CORRECT)

My personal ranking for scores (again blind to author) was:

6 (human); 8 (human); 4 (AI); 1 (human) and 5 (AI) -- tied; 2 (human); 3 and 7 (AI) -- tied

So for me the two best stories were human and the two worst were AI. That said, I read a lot of flash fiction, and none of these stories really approached good flash imo. I've also done some of my own experiments, and AI can do much better than what is posted above for flash if given more sophisticated prompting.


lelanthran|6 months ago

I was surprised at the result, and even more surprised when I read that one of the authors who did the test got 4 out of 5 wrong, and rated 2 of the AI stories highly.

Looking at my notes, I got one wrong (story 5: I didn't know what the "name" was supposed to be, assumed it was something widely known in culture that brings about the end times, something I simply didn't recognize, and so marked it as Human because of that supposed reference to shared cultural knowledge). All the AI-written stories I rated at either 1 or 2 points, while the lowest Human-written story got 3 and the highest got 5 (Story 1).

It makes me wonder if we are over-estimating the skill an author has when reading based on their demonstrated skill when writing.

IOW, according to my notes/performance, the AI stories were easy to spot and correlated with low scores anyway, while the author(s) who actually produced the stuff I rated highly rated the stories I scored low as high.

breuleux|6 months ago

The only one I was fairly sure was human was #6, and that was the only one I kinda enjoyed. In any case, as someone who reads a good deal, I agree. I didn't think any of the stories was particularly great (not enough to bother ranking them beyond a favourite), so I don't care all that much about the result.

> AI can do much better than what is posted above for flash if given more sophisticated prompting.

How sophisticated, compared to just writing the thing yourself?

unignorant|6 months ago

In another reply I gave an example of something you can do: https://news.ycombinator.com/item?id=44937774

I enjoy writing so a system like this would never replace that for me. But for someone who doesn't enjoy writing (or maybe can't generate work that meets their bar in the Ira Glass sense of taste) I think this kind of setup works okay for generating flash even with today's models.

biffles|6 months ago

Could you expand on your point re more sophisticated prompting?

I have found it hard to replicate high quality human-written prose and was a bit surprised by the results of this test. To me, AI fiction (and most AI writing in general) has a certain “smell” that becomes obvious after enough exposure to it. And yet I scored worse than you did on the test, so what do I know…

unignorant|6 months ago

For flash you can get much better results by asking the system to first generate a detailed scaffold. Here's an example of some metadata you might try to generate before actually writing the story:

- genres the story should fit into
- POV of the story
- high-level structure of the story
- list of characters in the story along with significant details
- themes and topics present in the story
- detailed style notes

From there, you use a second prompt to generate a story that follows those details. You can also generate many candidates and have another model instance rate the stories on both general literary criteria and how well they fit the prompt, then read only the best.
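The two-stage setup described above (scaffold first, then story, then best-of-n judging) can be sketched roughly as follows. This is a minimal illustration, not anyone's actual pipeline: the model client is abstracted as a plain callable (prompt in, text out), and all the prompt wording and field names are assumptions.

```python
# Sketch of a scaffold-first flash fiction pipeline. The "llm" and "judge"
# callables stand in for whatever model API you use; everything here is
# illustrative structure, not a real library interface.
import json
from typing import Callable, Tuple

# Hypothetical scaffold fields, mirroring the metadata list above.
SCAFFOLD_FIELDS = [
    "genres", "pov", "structure", "characters", "themes", "style_notes",
]

def scaffold_prompt(premise: str) -> str:
    """Stage 1: ask for detailed metadata before any prose is written."""
    fields = "; ".join(SCAFFOLD_FIELDS)
    return (
        f"You are planning a flash fiction piece about: {premise}\n"
        f"Before writing anything, produce a JSON scaffold with these keys: "
        f"{fields}."
    )

def story_prompt(scaffold: dict) -> str:
    """Stage 2: write the story constrained to the scaffold."""
    return (
        "Write a complete flash fiction story (under 1000 words) that "
        "follows this scaffold exactly:\n" + json.dumps(scaffold, indent=2)
    )

def best_of_n(llm: Callable[[str], str],
              judge: Callable[[str], float],
              prompt: str, n: int = 8) -> Tuple[str, float]:
    """Generate n candidates, rate each, and keep only the top-scoring one."""
    candidates = [llm(prompt) for _ in range(n)]
    scored = sorted(((judge(c), c) for c in candidates), reverse=True)
    top_score, top_story = scored[0]
    return top_story, top_score
```

In practice `judge` would itself be another model instance prompted with literary criteria plus the scaffold, so the rating step checks both quality and prompt adherence.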

This has produced some work I've been reasonably impressed by, though it's not at the level of the best human flash writers.

Also, one easy way to get output that completely avoids the "smell" you're talking about is to give specific guidance on style and perspective (e.g., GPT-5 Thinking can do "literary stream-of-consciousness 1st-person teenage perspective" reasonably well and will not sound at all like typical model writing).

codechicago277|6 months ago

I had similar results, and story 4 is so trope heavy I wonder if it’s just an amalgamation of similar stories. The human stories all felt original, where none of the AI ones did.

unignorant|6 months ago

I'm not sure I agree that the human stories felt original. I was pretty unimpressed with all of the stories except maybe 6, and even that one dealt in some common tropes. 5 had fewer tropes than 6 (and maybe as a result received the highest average scores from readers), but I could tell from the style that it was AI.