Survey: Training language models to follow instructions with human feedback
Reason for selection
The reference paper behind InstructGPT and ChatGPT (GPT-3.5), which have been called a disruptive innovation. OpenAI Lab
paper: https://arxiv.org/pdf/2203.02155.pdf
SaaS: https://chat.openai.com/
slide:
- https://speakerdeck.com/karakurist/bert-to-gpt-catch-up-survey
- https://speakerdeck.com/imai_eruel/chatgpt-imai
Overview
[Social challenge]
Tasks that require understanding instructions or commands written by humans and acting on them are needed across virtually every kind of business.
[Technical challenge]
Conventional natural language processing techniques struggled to deliver this kind of general-purpose AI. Doing so requires training on huge datasets, plus feedback from humans so that the answers are meaningful to them, and such training processes are costly and inefficient.
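[Proposal]
A framework that uses human feedback to train natural language processing models to follow instructions. Humans give feedback on the behavior (answers) the model generates, that feedback is used to build a reward model, and the reward model is then used to improve the language model over a large dataset. Because only part of that data carries human labels, the approach is in effect semi-supervised learning.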
[Results]
The models became better at following human instructions. In addition, the size of the dataset needed for training was greatly reduced, which made training more efficient.
Reinforcement Learning from Human Feedback (RLHF)
Three-stage finetuning
Following [Ziegler2019] and [Stiennon2020], GPT-3.5 is trained in three steps, as shown in the paper's training-pipeline figure.
Step 1 (finetune the pretrained GPT-3 with supervised learning): uses a dataset of pairs in which labelers wrote a desired answer (completion) for each input text (prompt).
Step 2 (rank generated outputs and train a reward model): labelers rank the outputs the model generates for each prompt, and these rankings are used to train a reward model that gives higher rewards to the answers humans (domain experts) prefer.
Step 3 (optimize the policy against the reward model with reinforcement learning using PPO): the output of the RM (reward model) is used as a scalar reward, and the policy is finetuned to maximize this reward with the PPO algorithm [Schulman2017].
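Steps 2 and 3 can be repeated iteratively, but only as a pair: new rankings are collected on the current policy, a new RM is trained on them, and then a new policy is trained against that RM. Repeating just one of the two achieves nothing. Viewed as a whole, the three steps amount to semi-supervised learning in which only part of the data is labeled.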
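The objectives behind steps 2 and 3 can be sketched in plain Python. This is a minimal illustration, assuming the pairwise ranking loss and the KL-penalized reward described in the InstructGPT paper; the function names, the dict-based inputs and the `beta` value are hypothetical choices for this example, not the paper's code.

```python
import math
from itertools import combinations


def log_sigmoid(z: float) -> float:
    """Numerically stable log(sigmoid(z))."""
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def ranking_to_pairs(ranked: list[str]) -> list[tuple[str, str]]:
    """Expand one labeler ranking (best first) into (preferred, rejected) pairs.

    K ranked completions yield K * (K - 1) / 2 comparisons.
    """
    return list(combinations(ranked, 2))


def reward_model_loss(scores: dict[str, float], ranked: list[str]) -> float:
    """Step 2: pairwise ranking loss for a single prompt.

    scores: reward-model output r(x, y) for each completion id.
    ranked: completion ids ordered from most to least preferred.
    """
    pairs = ranking_to_pairs(ranked)
    # -log sigmoid(r_w - r_l) is small when the preferred answer scores higher
    total = sum(-log_sigmoid(scores[w] - scores[l]) for w, l in pairs)
    return total / len(pairs)  # average over all C(K, 2) pairs


def rlhf_reward(rm_score: float, logp_policy: float, logp_sft: float,
                beta: float = 0.02) -> float:
    """Step 3: scalar reward maximized by PPO.

    The RM score is reduced by beta * log(pi_RL(y|x) / pi_SFT(y|x)), a per-sample
    KL-style penalty that keeps the policy close to the SFT model.
    beta = 0.02 is a hypothetical, illustrative value.
    """
    return rm_score - beta * (logp_policy - logp_sft)


if __name__ == "__main__":
    scores = {"a": 1.5, "b": 0.2, "c": -0.7}
    print(reward_model_loss(scores, ["a", "b", "c"]))  # ranking agrees with the RM
    print(reward_model_loss(scores, ["c", "b", "a"]))  # ranking disagrees: larger loss
    print(rlhf_reward(rm_score=1.0, logp_policy=-12.0, logp_sft=-15.0))
```

The paper's final PPO variant (PPO-ptx) also mixes in a pretraining-gradient term, which this sketch leaves out.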
Dataset
The training data comes from two sources: pairs written by 40 labelers hired through Upwork, and a dataset collected from users of a prototype version (InstructGPT). The labeler-written data comes in the three types below. The annotator agreement index κ, an indicator of label quality, was high.
- Plain: labelers wrote prompt/completion pairs for arbitrary tasks from the categories in the task table, with sufficient diversity across tasks.
- Few-shot: for an arbitrary instruction, labelers wrote multiple prompt/completion pairs phrased in different ways.
- User-based: based on the use cases described in waitlist applications for the OpenAI API, labelers wrote prompts corresponding to those use cases.
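For two annotators, an agreement index κ is commonly computed as Cohen's kappa: the observed agreement corrected for the agreement expected by chance, κ = (p_o − p_e) / (1 − p_e). The section does not say which κ variant was used, so this sketch assumes Cohen's kappa over categorical labels; the function name and label values are illustrative.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(labels_a: Sequence[Hashable], labels_b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators who labeled the same items."""
    if len(labels_a) != len(labels_b) or len(labels_a) == 0:
        raise ValueError("expected two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement p_o: fraction of items given the same label
    p_o = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement p_e: product of each annotator's label frequencies
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    p_e = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if p_e == 1.0:
        # Both annotators used one and the same label for every item; kappa is
        # undefined here, and this sketch treats it as perfect agreement.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


if __name__ == "__main__":
    a = ["good", "good", "bad", "bad"]
    b = ["good", "bad", "bad", "bad"]
    print(cohens_kappa(a, b))  # p_o = 0.75, p_e = 0.5, so kappa = 0.5
```

Unlike a raw agreement rate, κ is 0 when annotators agree only as often as chance would predict, which is why it is preferred as a label-quality indicator.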
The table shows the distribution of natural language task types in the conversations collected by the earlier-released InstructGPT; labelers identified which task category each conversation belongs to.
Number of samples created for each step:
Step | Samples | Supervision label |
---|---|---|
1: SFT | 13k | Ideal answer supplied as the completion |
2: RM | 33k | Rankings assigned to generated answers serve as the reward |
3: PPO | 31k | None |
To protect privacy, personally identifiable information (PII) was anonymized, and the train and test sets were split by user ID so that no user's data appears in both. Finally, a sample from the dataset is shown.
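Splitting by user ID means whole users, not individual records, are assigned to train or test, so no user's prompts leak across the split. A minimal sketch, assuming each record is a dict with a hypothetical `user_id` key and that a random 20% of users are held out:

```python
import random
from typing import Any


def split_by_user(records: list[dict[str, Any]], test_fraction: float = 0.2,
                  seed: int = 0) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records so every user ID appears in exactly one of train/test."""
    user_ids = sorted({r["user_id"] for r in records})
    random.Random(seed).shuffle(user_ids)  # deterministic shuffle of users
    n_test = round(len(user_ids) * test_fraction)
    test_users = set(user_ids[:n_test])
    train = [r for r in records if r["user_id"] not in test_users]
    test = [r for r in records if r["user_id"] in test_users]
    return train, test


if __name__ == "__main__":
    data = [{"user_id": f"u{i % 10}", "prompt": f"prompt {i}"} for i in range(50)]
    train, test = split_by_user(data)
    train_users = {r["user_id"] for r in train}
    test_users = {r["user_id"] for r in test}
    print(len(train), len(test), train_users & test_users)  # overlap is empty
```

Shuffling user IDs rather than records is the whole point: a record-level shuffle would let near-duplicate prompts from the same user land on both sides.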
Experiments
The figure shows that the models trained with reinforcement learning from human feedback (RLHF) score clearly higher than the other models, indicating that RLHF substantially improves instruction following as well as the relevance and consistency of responses. Response quality also improves as model size grows, although the gains tend to slow once the model reaches a certain size.
In some cases, PPO increased hallucinations.
An example of an improved sample.
Implementation
https://github.com/ggerganov/llama.cpp
Meta AI Research's LLaMA (13B) can run on an M1 MacBook Pro with 6 GB of memory and no GPU, at roughly 10 responses per second. LLaMA-13B is roughly on par with the 175B GPT-3.
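A back-of-the-envelope check of why a 13B model fits in about 6 GB: weight memory is roughly parameters × bits per weight / 8. This sketch assumes the weights are quantized to about 4 bits (llama.cpp supports 4-bit quantization) and ignores quantization scales, activations and the KV cache, so real memory use is somewhat higher.

```python
def weight_memory_gib(n_params: float, bits_per_weight: float) -> float:
    """Approximate memory needed for model weights alone, in GiB."""
    return n_params * bits_per_weight / 8 / 2**30


if __name__ == "__main__":
    for bits in (16, 8, 4):
        print(f"13B weights at {bits}-bit: {weight_memory_gib(13e9, bits):.1f} GiB")
    # 13B weights at 16-bit: 24.2 GiB
    # 13B weights at 8-bit: 12.1 GiB
    # 13B weights at 4-bit: 6.1 GiB
```

At 16-bit precision the same model would need about four times as much memory, which is why quantization is what makes laptop inference practical.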