Distillation with Reasoning: Can DeepSeek R1 Teach Better Than Humans?


Including reasoning "chains of thought" (CoT) in a model's output significantly improves answer quality, but it also increases inference cost.

We started from a dataset in which each training example contains:

1. A human expert's chain of thought.
2. The final answer.

We then extended this dataset by adding:

3. Synthetic R1 reasoning, i.e., the CoT generated by DeepSeek R1.
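
As an illustration, here is a minimal sketch of that extension step. Everything in it is an assumption rather than this article's actual pipeline: the endpoint, the model id, the field names (`question`, `expert_cot`, `answer`, `r1_cot`), and the `<think>` tag parsing are placeholders for whatever your serving setup uses.

```python
from openai import OpenAI

# Hypothetical OpenAI-compatible endpoint serving DeepSeek R1; swap in your
# actual provider's base URL and API key.
client = OpenAI(base_url="https://example-r1-host/v1", api_key="YOUR_KEY")

def add_synthetic_cot(entry: dict) -> dict:
    """Ask DeepSeek R1 the same question and keep its reasoning chain."""
    resp = client.chat.completions.create(
        model="deepseek-r1",  # assumed model id on the hypothetical host
        messages=[{"role": "user", "content": entry["question"]}],
    )
    text = resp.choices[0].message.content
    # R1-style outputs typically put the reasoning in <think>...</think>
    # before the final answer; keep only the reasoning part.
    cot = text.split("</think>")[0].removeprefix("<think>").strip()
    return {**entry, "r1_cot": cot}

dataset = [
    {"question": "...", "expert_cot": "...", "answer": "..."},  # placeholder rows
]
dataset = [add_synthetic_cot(e) for e in dataset]
```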

Then we fine-tuned three variants of the model (using LoRA on Llama-3.1-8B-Instruct), each with a different training target:

- Direct Answer Only: generate the final answer with no reasoning chain.
- Human Expert CoT: generate the final answer alongside a reasoning chain resembling the human expert's.
- Synthetic R1 CoT: generate the final answer alongside DeepSeek R1's synthetic reasoning chain.

A sketch of these training targets is shown after the note below. The table below summarizes average accuracy and reasoning length:

- Note: the accuracy of the 5-shot baseline may differ from numbers reported elsewhere because of differences in evaluation setup. The key focus is on comparing relative performance across distillation approaches, not on beating other models.
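
The sketch below shows how the three training targets might be constructed for LoRA fine-tuning, assuming Hugging Face `transformers` and `peft`. The LoRA hyperparameters, field names, and the `target_text` helper are illustrative assumptions, not this article's actual configuration.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

BASE = "meta-llama/Llama-3.1-8B-Instruct"
tokenizer = AutoTokenizer.from_pretrained(BASE)
model = AutoModelForCausalLM.from_pretrained(BASE)
# Wrap the base model with LoRA adapters (rank/alpha are assumed values).
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32, task_type="CAUSAL_LM"))

def target_text(entry: dict, variant: str) -> str:
    """Build the supervision string for one of the three fine-tuned variants."""
    if variant == "direct":     # Direct Answer Only
        return entry["answer"]
    if variant == "human_cot":  # Human Expert CoT + final answer
        return f"{entry['expert_cot']}\n\n{entry['answer']}"
    if variant == "r1_cot":     # Synthetic R1 CoT + final answer
        return f"{entry['r1_cot']}\n\n{entry['answer']}"
    raise ValueError(f"unknown variant: {variant}")
```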

In this study, synthetic reasoning CoTs from DeepSeek R1 appear superior to human-expert CoTs at improving performance, albeit at a higher inference cost due to their greater length.
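
That trade-off is easy to quantify: decoding cost scales roughly linearly with the number of generated tokens, so a variant that emits a long reasoning chain costs proportionally more per request. The token counts and price below are made-up illustrative numbers, not measurements from this study.

```python
PRICE_PER_1M_OUTPUT_TOKENS = 0.90  # assumed price in USD, not a real quote

def spend(avg_output_tokens: int, requests: int) -> float:
    """Approximate spend when cost scales linearly with generated tokens."""
    return avg_output_tokens * requests * PRICE_PER_1M_OUTPUT_TOKENS / 1_000_000

print(spend(avg_output_tokens=100, requests=1_000_000))  # direct answers: $90
print(spend(avg_output_tokens=800, requests=1_000_000))  # long R1-style CoTs: $720
```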

Fireworks AI Inference and Fine-Tuning Platform

DeepSeek R1 is available on the Fireworks AI platform. A user-friendly distillation interface will soon be part of FireOptimizer. If you need earlier access, please get in touch to explore your options.
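
For reference, a minimal way to query DeepSeek R1 on Fireworks AI is through its OpenAI-compatible chat completions API; the base URL and model id below follow Fireworks' public conventions, but check the current docs before relying on them.

```python
from openai import OpenAI

client = OpenAI(
    base_url="https://api.fireworks.ai/inference/v1",
    api_key="YOUR_FIREWORKS_API_KEY",
)
resp = client.chat.completions.create(
    model="accounts/fireworks/models/deepseek-r1",
    messages=[{"role": "user", "content": "Briefly explain model distillation."}],
)
print(resp.choices[0].message.content)
```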

Conclusions

By integrating reasoning-based data through distillation, organizations can drastically improve model performance without bearing the full cost of human-annotated datasets. DeepSeek R1's ability to produce long, high-quality reasoning chains makes it a powerful teacher model, showing that, in some cases, the machine might just out-teach the human.