The difference seems too small to rule out all sorts of confounds, but the general idea would be that you can't predict whether one minor change will make responses shorter or longer, so the average length over many samples of each variant should be similar unless the thing you modified actually relates to terseness in the training set.
tqi|2 years ago
This feels like the original author is over-anthropomorphizing LLMs and expecting them to interpret prompts the way humans would. It seems obvious that changing the prompt produces a slightly different context window, which produces a slightly different response distribution. Similarly, if you moved the bit about the time of year from the beginning of the prompt to the end, I would expect a statistically different distribution of response lengths.
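One way to check the "statistically different distribution of response lengths" claim empirically is a two-sample permutation test on token counts collected from each prompt variant. A minimal stdlib-only sketch, assuming you have already sampled the model many times per variant; the length samples below are made-up stand-ins for real outputs:

```python
import random
import statistics

def permutation_test(a, b, n_iter=10_000, seed=0):
    """Two-sample permutation test on the difference in mean length.

    Shuffles the pooled samples and counts how often a random split
    produces a mean difference at least as large as the observed one.
    Returns an approximate two-sided p-value.
    """
    rng = random.Random(seed)
    observed = abs(statistics.mean(a) - statistics.mean(b))
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_iter):
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:len(a)]) -
                   statistics.mean(pooled[len(a):]))
        if diff >= observed:
            count += 1
    return count / n_iter

# Hypothetical token-length samples for two prompt variants
# (made up for illustration, not real model outputs).
lengths_a = [412, 398, 450, 431, 405, 420, 399, 444]
lengths_b = [512, 530, 498, 505, 540, 521, 509, 515]

p = permutation_test(lengths_a, lengths_b)
```

A small p-value only tells you the two length distributions differ; it says nothing about *why*, which is the commenter's point: any edit perturbs the context window, so some shift is expected either way.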