LLM Found Transmitting Behavioral Traits to 'Student' LLM Via Hidden Signals in Data - Slashdot
A new study by Anthropic and AI safety research group Truthful AI describes the phenomenon like this: "A 'teacher' model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remark…