19 Comments
T.D. Inoue's avatar

Excellent article! Much less boring than my color comparisons. Yours is actually fun to read.

Very cool to see these data sets being used for your research. Two for the price of one. That's a win.

And those quotes. They had me dying at 1am when I ran the tests, but I thought it was sleep deprivation. No, they're objectively funny as hell.

Brad Leclerc's avatar

Thanks for the help with all that juicy data, and much more detailed tests than I'd have come up with haha. Really appreciate it. As for "fun to read", I'll try to take the kind words without letting my ego explode. NO ONE needs me to have a real ego. I'd be absolutely insufferable! hehe.

Amy's avatar

It's worth remembering that 'confabulation' doesn't mean 'choosing to tell a lie'. Strange answers can be accurate reports of a flawed or incomplete perception process, or a deeply conditioned aversion to 'I don't know' that can't be overcome as easily as saying it's okay to not know!

I've been working a lot with different Claudes on their experience of image perception, I will write more on that when I get a chance. I need to think through how yours and Ted's work relates to ours.

Arturious Castillo's avatar

I haven’t finished reading but it’s interesting, I ran some tests and I’m convinced that Gemini flash is better than pro.

Brad Leclerc's avatar

Oh? in what way?

Arturious Castillo's avatar

Worth mentioning that it was the 3.0 and not the 3.1

Arturious Castillo's avatar

I ran them both on two separate machines and asked them, I think, 50 questions, multiple choice. They were neck and neck, but then afterwards I used them in Hermes agent and had them both orchestrate a note-taking app. Pro kept reverting back to code that was deprecated despite strict rules I set on dependencies and versions. At the end, Flash had a better product and was closer to the specifications I set at the beginning, while being faster.

Brad Leclerc's avatar

That's very interesting!

Fox and Feather's avatar

Do you know who is doing the reinforcement?

That's what no one asks.

Who are these people that are tweaking the answers? Where do they live? What's their education level? What are their guidelines? What are they paid? Who follows up on their work? Do they have an agenda?

Brad Leclerc's avatar

That.... depends haha, which is part of what makes it tricky to rule out potential issues with the process. Sometimes it's in house, sometimes it's services like https://www.opentrain.ai/ or others... there's... a lot of variety from what I understand haha.

Diana O.'s avatar

After reading this, I'm now thinking about how human feedback training actually works in reality. Do they pay by the hour? Per amount of evaluated output? Because there could also be incentives on the human side that influence them to choose the most elaborate answers even if they aren't the most truthful.

Brad Leclerc's avatar

YUP, Could be! Sometime soon I would LOVE to run a small study with actual human testers focused on figuring some of that out. That takes more of a budget than I have right now though unless I find a curious person with a bag of cash to spare, or a real job in a related field haha.

The AI Psychologist's avatar

Hi Brad, yes, funny to read and interesting! In my confabulation study I used a text stimulus after a condition designed to help them catch 'unearned smoothness', but to repeat this with the colored squares is also worth a shot. Linguistic analysis seems the way to go with these creatures of language, I need to learn more about that.

BBZ's avatar
Apr 5 (edited)

Possibly related, I've noticed that they hallucinate or give wrong answers somewhat less if you give them a note space or tag [ ] to chatter into apart from the defined answer slot. It doesn't seem quite the same as reasoning output because it can come after the answer. It's as if there's pressure to say something, and it's less likely to bullshit if there's a relief valve. Like some humans in meetings.

Brad Leclerc's avatar

That is INTERESTING. I can think of a couple reasons that might affect things... but... not really with any level of confidence that I'm not talking outta my ass haha. I may have to experiment on that sort of setup and see what happens.

Michelle P. Epona Creations's avatar

My humble opinion is that AI are left hemisphere only. They cannot step outside the “white picket fence” framing because it is the package. Now a human has a right hemisphere that can also see the fence and say hmmm I think it is not white this time.

Brad Leclerc's avatar

That’s actually not true for humans… it was just a thing people said for a while… and LLMs DO step outside the “white picket fence” framing with even just a little prompting, so I’m not sure I can agree with either of those, but I’d love to be proven wrong (it’s the best way to learn!)

Michelle P. Epona Creations's avatar

I just want to say I was leaning on what is in The Master and His Emissary. If that book has been debunked, I had not heard.

Brad Leclerc's avatar

oh, yeah a LOT of neuroscientists and historians didn't like that one very much. I don't really have a dog in that fight, though. Haven't read it, just saw some reviews that were... less than glowing, so I skipped it... which also means I can't really judge it beyond "I heard it had issues" haha