Rapture (23:15:07) ah nice
Tezar (23:14:50) and each step of what you ask steers GPT away from that state
Tezar (23:14:23) so they just measure where that refusal state lies
Rapture (23:14:21) ah hm
Rapture (23:14:06) what does the text say? TL;DR?
Tezar (23:14:05) basically refusal converges to a single direction, regardless of whether it's violence, NSFW, or napalm stuff
Rapture (23:13:31) but sometimes betterGPT is like "no, it's explicit, let's write something else"
Tezar (23:13:15) https://www.alignmentforum.org
Rapture (23:13:02) tezar: betterGPT can be jailbroken when I write NSFW stuff, as long as I don't use the explicit words directly
Tezar (23:12:49) give me a moment, I'll dig up the paper on that