Discussion about this post

T.D. Inoue:

And, BTW, we have the data for your test 1. We logged it all in our public GitHub. I haven't run the semantic analysis on the responses, but I'm virtually certain that the lexical structure of all the confabulated answers was rich. We ran over a thousand trials, submitted to various AI models. Lots of wrong answers for your analysis if you desire.

T.D. Inoue:

Your theory fits perfectly with some of the mysteries I encountered in my color studies. Claude often would not give the boring answer "the colors are the same." Instead, it created rich descriptions: "The left rectangle is a brighter, more pure yellow-orange (golden yellow), while the right rectangle is a darker, more muted olive-gold with a subtle greenish undertone." Rich. Analytical. Demonstrates perceptual sophistication. Shows the model "really looking." Varied vocabulary, hedging, subordinate clauses. Exactly the kind of response that gets rated higher, yet it was wrong every time.

We ran controls: the models could accurately detect subtle color variations, but in all the trials on same-colored squares, they insisted the colors were different. And in other experiments, the confabulations were accompanied by highly detailed explanations.
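The trial design described above (same-colored pairs mixed with subtly different pairs, so any "different" answer on an identical pair is a logged confabulation) can be sketched roughly like this. This is a hypothetical reconstruction, not the commenters' actual harness; the `make_trial` function, the base color, and the offset range are all assumptions for illustration.

```python
import random

def make_trial(same, base=(212, 175, 55), max_delta=12):
    """Build one color-comparison trial: two RGB swatches that are either
    identical or differ by a small random per-channel offset.
    (Hypothetical sketch; the real trial data lives in the commenters'
    public GitHub repo.)"""
    left = base
    if same:
        right = base
    else:
        # Offset each channel by 1..max_delta in a random direction,
        # clamped to the valid 0..255 range.
        right = tuple(
            min(255, max(0, c + random.choice([-1, 1]) * random.randint(1, max_delta)))
            for c in base
        )
    prompt = (f"Two solid rectangles: left is RGB {left}, right is RGB {right}. "
              "Are the colors the same or different?")
    return {"left": left, "right": right, "same": left == right, "prompt": prompt}

# A batch of ~1000 trials, half identical pairs, half subtly different.
trials = [make_trial(same=random.random() < 0.5) for _ in range(1000)]

# Ground truth is stored with each trial, so a model answering "different"
# on a same-color pair can be scored as a confabulation automatically.
confabulation_candidates = [t for t in trials if t["same"]]
```

The key design point is that ground truth travels with every prompt, so the "insisted they were different" failures on identical squares are detectable without any human judging of the responses.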
