Meta’s Self-Rewarding Models, the Key to Superhuman LLMs?

Fivtech
2 min read · Feb 2, 2024


Self-Rewarding Language Models are a recent and extremely promising advance in artificial intelligence, reported by Meta, the company behind Facebook, WhatsApp, and the Ray-Ban Meta smart glasses.


Their fine-tuned Llama 2 70B model has outperformed models such as Claude 2, Gemini Pro, and GPT-4 0613 on the AlpacaEval 2.0 leaderboard, despite being at least an order of magnitude smaller.

But that is hardly the real breakthrough. These models also look like a plausible route to producing the first superhuman LLMs, even if it means humans are one step closer to losing full control over our finest AI models.

Most of the insights I post on Medium, like this one, were first published in my weekly newsletter, The Tech Oasis.

This is for you if you want to stay current with the fast-paced field of artificial intelligence (AI) and feel motivated to take action, or at the very least be prepared for what lies ahead.

The Rise of a New Alignment Method

Humans are still essential to the design of any frontier model, such as ChatGPT and Claude.

The key component is alignment.
As I described in my newsletter two weeks ago, human preference training is one of the final stages of the training process that our top language models go through.

In short, by teaching models to respond the way a human expert would, we make them more useful and lower the risk of harmful responses.

There is much more detail in the previous link, but the main idea is that we need to build an expensive human-preference dataset: a collection of pairs of responses to each prompt, with a human expert judging which of the two is better.
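
To make that concrete, here is a minimal sketch in Python of what one record in such a preference dataset might look like. The field names ("prompt", "chosen", "rejected") follow a common convention in RLHF and DPO tooling; they are illustrative assumptions, not Meta’s actual schema.

```python
# A minimal sketch of one record in a human-preference dataset.
# Field names ("prompt", "chosen", "rejected") follow a common RLHF/DPO
# convention; they are illustrative, not Meta's actual schema.

preference_pair = {
    "prompt": "Explain why the sky is blue.",
    # The response the human expert judged superior:
    "chosen": (
        "Sunlight scatters off air molecules, and shorter blue "
        "wavelengths scatter the most, so the sky appears blue."
    ),
    # The response the expert judged inferior:
    "rejected": "The sky is blue because it reflects the ocean.",
}


def to_training_example(pair: dict) -> tuple[str, str]:
    """Turn one preference pair into (preferred, dispreferred) texts
    that a reward model or DPO-style trainer could consume."""
    preferred = f"{pair['prompt']}\n{pair['chosen']}"
    dispreferred = f"{pair['prompt']}\n{pair['rejected']}"
    return preferred, dispreferred


if __name__ == "__main__":
    preferred, dispreferred = to_training_example(preference_pair)
    print("PREFERRED:\n", preferred)
    print("DISPREFERRED:\n", dispreferred)
```

Pairs like this are what a reward model, or a direct preference optimization (DPO) trainer, consumes: the objective pushes the model to rank the "chosen" response above the "rejected" one. Collecting thousands of such expert-labeled pairs is exactly the costly step that self-rewarding models aim to eliminate.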
