
> “If you look at the models before they are fine-tuned on human preferences, they’re surprisingly well calibrated. So if you ask the model for its confidence in an answer, that confidence correlates really well with whether or not the model is telling the truth. We then train them on human preferences and undo this.”

Now that is really interesting! I didn't realize RLHF did that.
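
For anyone wondering what "well calibrated" means concretely: the model's stated confidence matches its empirical accuracy, e.g. answers it gives with ~80% confidence turn out correct ~80% of the time. A rough sketch of how you could check that with expected calibration error (hypothetical confidence/correctness data, not from the paper):

    import numpy as np

    def expected_calibration_error(confidences, correct, n_bins=10):
        # Bin answers by stated confidence, then compare each bin's
        # average confidence to its empirical accuracy.
        confidences = np.asarray(confidences, dtype=float)
        correct = np.asarray(correct, dtype=float)
        edges = np.linspace(0.0, 1.0, n_bins + 1)
        ece = 0.0
        for lo, hi in zip(edges[:-1], edges[1:]):
            mask = (confidences > lo) & (confidences <= hi)
            if mask.any():
                gap = abs(confidences[mask].mean() - correct[mask].mean())
                ece += mask.mean() * gap  # weight by fraction of samples in bin
        return ece

    # Toy example: a calibrated model's ~0.8-confidence answers are right ~80% of the time.
    conf = [0.9, 0.8, 0.8, 0.6, 0.3]
    right = [1, 1, 0, 1, 0]
    print(expected_calibration_error(conf, right))

Lower is better; a perfectly calibrated model scores 0. The claim in the quote is that this number is small before RLHF and gets worse after.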


