ASSET Seminar: “Towards Pluralistic Alignment: Foundations for Learning from Diverse Human Preferences”
Abstract: Large models pre-trained on internet-scale data are often not ready for safe deployment out-of-the-box. They are heavily fine-tuned and aligned using large quantities of human preference data, usually elicited via pairwise comparisons. When aligning an AI/ML model to human preferences or values, it is important to ask whose preferences and values we are […]