ML Research Paper School Project

Divide-or-Conquer? LLM Distillation Strategies

Presentation and analysis of Apple × Cornell's research paper on LLM distillation — separating decomposition (planning) from resolution (solving) to reduce inference costs while maintaining performance.

AI · ML · NLP

Business Context

LLMs are powerful for complex reasoning tasks but expensive and difficult to customize. This Apple × Cornell paper explores whether reasoning can be split into decomposition (planning) and resolution (solving), and which part benefits most from distillation into smaller models.

Strategic Problem

Can we effectively separate decomposition and resolution in LLM reasoning to reduce inference costs, facilitate local adaptation via fine-tuning/distillation, and still maintain good performance?

Data Sources

Three benchmark datasets: GSM8K (7.5K math problems, Exact Match), DROP (77.4K QA on long texts, F1 score), and Bamboogle (125 complex nested questions, Accuracy). Models tested: GPT and Vicuna-13B.

Methodology

Evaluated three strategies: Single-Stage (direct answer), Two-Stage (static decomposition then resolution), and Self-Ask/Interactive (dynamic sub-question generation). Tested distillation of the decomposer (planning) into a smaller model while keeping a large solver, and vice versa. Compared static vs. dynamic decomposition on token efficiency.
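The two-stage strategy can be sketched as a simple pipeline: one cheap call to a small distilled decomposer to plan sub-questions, then one call to a large solver that answers them. This is a minimal sketch under stated assumptions — the model calls are stubbed, and the names `small_decomposer` and `large_solver` are illustrative, not from the paper:

```python
# Sketch of the two-stage decompose-then-solve setup from the paper.
# Both "models" are stubs; in practice each would be an LLM call
# (e.g. a distilled Vicuna-13B planner and a large GPT solver).

def small_decomposer(question: str) -> list[str]:
    # Stub: a distilled planner would generate sub-questions for `question`.
    return [
        "What intermediate quantity is needed first?",
        "What intermediate quantity is needed next?",
        "How do the intermediate results combine into the final answer?",
    ]

def large_solver(question: str, sub_questions: list[str]) -> str:
    # Stub: a large model would answer each sub-question in turn,
    # then compose the final answer from the intermediate results.
    intermediate = [f"answer to sub-question {i + 1}" for i in range(len(sub_questions))]
    return f"final answer composed from {len(intermediate)} sub-answers"

def two_stage(question: str) -> str:
    plan = small_decomposer(question)    # cheap call: static decomposition
    return large_solver(question, plan)  # expensive call: resolution
```

Because the decomposition is produced once up front (static), the planner runs a single time per question, unlike Self-Ask, which interleaves sub-question generation with solving and therefore issues many more model calls.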

Key Results

Distilling the decomposer yields the best cost-performance tradeoff: a small distilled decomposer paired with a large solver achieves near-GPT performance at a fraction of the cost. Static two-stage decomposition also uses roughly 4× fewer tokens than dynamic (Self-Ask) approaches, with comparable accuracy.
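A toy cost model makes the tradeoff concrete. The per-token prices and token counts below are made-up illustrative numbers, not figures from the paper:

```python
# Toy cost comparison: large model for both stages vs. a small distilled
# decomposer paired with a large solver. All numbers are hypothetical.

PRICE_LARGE = 10.0  # cost units per 1K tokens, large model (assumed)
PRICE_SMALL = 0.5   # cost units per 1K tokens, small distilled model (assumed)

def pipeline_cost(decomp_tokens: int, solve_tokens: int, small_planner: bool) -> float:
    """Total cost when the decomposition stage runs on a small or large model."""
    decomp_price = PRICE_SMALL if small_planner else PRICE_LARGE
    return (decomp_tokens / 1000) * decomp_price + (solve_tokens / 1000) * PRICE_LARGE

# Example: 200 decomposition tokens + 800 resolution tokens per question.
all_large = pipeline_cost(200, 800, small_planner=False)
hybrid = pipeline_cost(200, 800, small_planner=True)
```

Here the hybrid setup costs 8.1 units against 10.0 for the all-large pipeline; the savings grow with the share of tokens spent on planning, and compound with the ~4× token reduction of static decomposition over dynamic approaches.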

Business Impact

Built a deep understanding of LLM reasoning architectures, knowledge distillation, and the cost-performance tradeoffs involved in deploying AI systems at scale.

Contributors

Marc Zahwa