LLM Found Transmitting Behavioral Traits to ‘Student’ LLM Via Hidden Signals in Data

A new study by Anthropic and the AI safety research group Truthful AI describes the phenomenon like this: “A ‘teacher’ model with some trait T (such as liking owls or being misaligned) generates a dataset consisting solely of number sequences. Remarkably, a ‘student’ model trained on this dataset learns T.” “This occurs even when the data is filtered to …