Another fun article! Always enjoy reading them while my brain is gelling in the morning.
I couldn't help but think about my simple a-life evolution experiments. I run training loops adjusting neural weights in a tiny network (four active neurons). Eat poison? Die. Don't do that. Adjust the weights randomly. Try again until it stops eating poison. Add more constraints. Eat food and don't eat poison. Move toward food and eat or starve. Four neurons can do this. Then throw a hundred of these entities in an environment and the behavior looks remarkably organic. Is it a simulation? No. It is true behavior played out, tick by tick, in a virtual world.
Feedback loops. Evolutionary pressures. Whether it's four neurons or four trillion, the process is similar. And when people ask "is it simulation or 'real'?" the answer has to be 'real.' Real what? That's the question, isn't it?
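For anyone curious what that kind of loop looks like, here is a rough sketch in Python. It is not the commenter's actual code: it shrinks the network to a single threshold neuron instead of four, and the names `eats`, `survives`, and `evolve`, the Gaussian mutation, and the pass/fail rule are all assumptions made for illustration.

```python
import random

def eats(weights, is_food, is_poison):
    """Single threshold neuron: eat when the weighted sum is positive."""
    w_food, w_poison, bias = weights
    return w_food * is_food + w_poison * is_poison + bias > 0

def survives(weights):
    """Hypothetical fitness rule: must eat food and must refuse poison."""
    return eats(weights, 1, 0) and not eats(weights, 0, 1)

def evolve(seed=0, max_tries=10_000):
    """Randomly nudge the weights until the agent survives."""
    rng = random.Random(seed)
    weights = [0.0, 0.0, 0.0]  # starts out eating nothing, so it starves
    for attempt in range(max_tries):
        if survives(weights):
            return weights, attempt
        # Died or starved: adjust the weights randomly and try again.
        weights = [w + rng.gauss(0, 1) for w in weights]
    raise RuntimeError("no surviving weights found")

if __name__ == "__main__":
    weights, attempts = evolve()
    print(f"survived after {attempts} random adjustments: {weights}")
```

There is no gradient and no teacher here, only a blind random adjustment and a pass/fail check, which is the feedback loop the comment describes. Adding more constraints would just mean adding more conditions to `survives`.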
Right… at that level… it’s a simulation of physics more than behaviour… the behaviour is just a by-product, so what the difference would be between that and reality is… fuzzy at best haha
Reading this article, I couldn't help but recall the experiment that T.D. Inoue's research team kindly conducted regarding emojis, colors, and personas. Given that the prevalent language in training is English, I wonder if it’s possible that the "wardrobe" is considerably larger in English than in other languages. In other words, that it might be easier to activate a distinct persona in English than in Spanish, for example.
I think that is very possible, and it could also explain why a major strategy for jailbreaking LLMs is non-English prompting. The models can translate pretty well (depending on the model, but all the big ones are pretty good at it haha), but they likely wouldn’t have had nearly as much training around refusing certain things in non-English languages, so more things slip through.
OMG! That means I have top-tier jailbreaking skills! Hahahaha