[Chat] RLHF support SimPO #5850
base: main
Conversation
Thanks, Anbang. I left some comments; please have a look.
Thanks, Anbang. Please remove the TODO list in the README.
I left some comments. Please address them and then merge.
### Alternative Option For RLHF: Odds Ratio Preference Optimization
We support the method introduced in the paper [ORPO: Monolithic Preference Optimization without Reference Model](https://arxiv.org/abs/2403.07691) (ORPO), a reference-model-free alignment method that mixes the SFT loss with a reinforcement-learning loss using the odds ratio as the implicit reward, which enhances training stability and efficiency. (SimPO, another reference-model-free method supported here, is enabled in the DPO training script by setting the flag to disable the use of the reference model, setting the reward target margin, and enabling length normalization.) To use ORPO in alignment, use the [train_orpo.sh](./examples/training_scripts/train_orpo.sh) script. You can optionally set the value of `lambda`, which determines how strongly the reinforcement-learning loss affects training.
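For reference, here is a minimal PyTorch sketch of the ORPO objective as described above. It assumes `chosen_logps` and `rejected_logps` are length-normalized (per-token average) log-probabilities of the chosen and rejected responses; the names and defaults are illustrative, not the repository's exact implementation.

```python
import torch
import torch.nn.functional as F

def orpo_loss(chosen_logps, rejected_logps, sft_loss, lam=0.1):
    """Sketch of ORPO: SFT loss plus a lambda-weighted odds-ratio term.

    chosen_logps / rejected_logps: length-normalized log-probabilities,
    i.e. average per-token log p, so each value lies in (-inf, 0).
    """
    # log odds(y|x) = log p - log(1 - p), with log(1 - p) = log1p(-exp(log p))
    log_odds_chosen = chosen_logps - torch.log1p(-torch.exp(chosen_logps))
    log_odds_rejected = rejected_logps - torch.log1p(-torch.exp(rejected_logps))
    # odds-ratio loss: -log sigmoid of the log-odds gap between chosen and rejected
    odds_ratio_loss = -F.logsigmoid(log_odds_chosen - log_odds_rejected).mean()
    # `lam` plays the role of `lambda` above: how strongly the RL term affects training
    return sft_loss + lam * odds_ratio_loss
```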
#### ORPO Result
<p align="center">
<img width="1000" alt="image" src="https://raw.githubusercontent.com/hpcaitech/public_assets/main/applications/chat/ORPO_margin.png">
</p>
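Since this PR adds SimPO support, an analogous sketch of the SimPO objective from [SimPO: Simple Preference Optimization with a Reference-Free Reward](https://arxiv.org/abs/2405.14734) may help: the implicit reward is the length-normalized log-probability scaled by `beta`, offset by a reward target margin `gamma`, with no reference model. As above, this is a hedged illustration with assumed names and defaults, not the PR's exact code.

```python
import torch.nn.functional as F

def simpo_loss(chosen_logps, rejected_logps, beta=2.0, gamma=0.5):
    """Sketch of SimPO: reference-free loss on length-normalized log-probs.

    beta scales the implicit reward; gamma is the reward target margin.
    """
    return -F.logsigmoid(beta * (chosen_logps - rejected_logps) - gamma).mean()
```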
Please also add hardware requirements for SimPO and ORPO. Ideally, ORPO should be more efficient and compute-friendly.
To align with the overall style, you can present them in a table, as described for PPO.
In addition, since we already support LoRA in our training pipeline, please also provide hardware requirements with LoRA for each method. Thanks.