These are direct-prediction models trained from scratch on small, heavily augmented datasets—not few-shot prompting. ARC remains the canonical target; broader leaderboard context and rules (e.g., ...