BFCL (ACTIVE)

Get to the top of the BFCL leaderboard

0 submissions · Created 9/9/2025

Leaderboard

  • 👑 #1 Champion: 5F1NXne2... (100919α)
  • 🥈 #2 Runner-up: 5F1NXne2... (80735α)

BFCL — Information

What is BFCL?

  • The Berkeley Function-Calling Leaderboard (BFCL) evaluates how well an LLM uses tools/functions: choosing the right function, filling parameters precisely, handling single and parallel/multiple calls, and recovering from errors. (A sketch of a single test case follows this list.)
  • Newer BFCL suites include agentic skills (e.g., web search with injected failures, lightweight memory/state, and format-sensitivity checks for prompt-only models).
  • The public board reports Overall Accuracy and often surfaces cost and latency, rewarding models that are not only accurate but also efficient.
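To make this concrete, here is a minimal sketch of what a single function-calling test exercises. The schema, question, and checker below are illustrative inventions, not the real BFCL data format (which lives in the BFCL repo); they only show the shape of the task: pick the right function and fill its parameters exactly.

```python
# Hypothetical sketch of a single BFCL-style test item; the real dataset
# format differs in detail.
import json

# A tool schema the model is allowed to call (JSON-Schema-style parameters).
tools = [{
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string"},
            "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
        },
        "required": ["city"],
    },
}]

question = "What's the weather in Paris in celsius?"

# What a correct model output should decode to: right function, right args.
expected_call = {"name": "get_weather",
                 "arguments": {"city": "Paris", "unit": "celsius"}}

def matches(model_call: dict) -> bool:
    """Check that the function name and all arguments match exactly."""
    return (model_call.get("name") == expected_call["name"]
            and model_call.get("arguments") == expected_call["arguments"])

# A model answering with this call would score the item correct.
model_output = '{"name": "get_weather", "arguments": {"city": "Paris", "unit": "celsius"}}'
print(matches(json.loads(model_output)))  # True
```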

Why it matters

  • It’s an executable, end-to-end benchmark for tool use (not just next-token prediction).
  • It captures real agent behavior (multi-turn, retries, noisy web, state).
  • It highlights practical trade-offs (accuracy vs. cost/latency) and makes strong fine-tunes stand out.

Tracks & Goals

  • Open Track (any size): Place Top-20 overall on BFCL at the time you submit.

Prizes & Placements (Open Track)

  • Place #1: finish #1 on the leaderboard with a margin of ≥ 2 percentage points → 100% of bounty
  • Place #2: finish #1 on the leaderboard by any margin → 80% of bounty
  • Place #3: finish in the Top 10 on the leaderboard → 25% of bounty

(Margin definition: your absolute percentage-point lead in Overall Accuracy over the next-best model in the same track/bracket at the time of verification. A worked example follows.)
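As a worked example of the margin rule (all numbers hypothetical):

```python
# Hypothetical Overall Accuracy scores at verification time.
yours, next_best = 76.4, 74.1
margin = yours - next_best           # absolute percentage-point lead
print(f"margin = {margin:.1f} pts")  # 2.3 pts, clears the >= 2 pt bar for Place #1
```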

How evaluation works

  • Auto-evaluation: We pull your Hugging Face repo, load your handler.py, and run a pinned BFCL evaluator for reproducibility.
  • First-pass → verify: We score automatically (the first pass runs on CHUTES only). If you’re in range for a placement, we re-verify with your declared settings to confirm parity with the public board.
  • Leader hold: To claim a prize, your qualifying placement must hold on the public board for 7 consecutive days.
  • Scoring & tie-breakers: The primary metric is Overall Accuracy. Ties break by lower cost, then lower latency, then earlier submission (see the ordering sketch below).
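A sketch of that ordering as a sort key. The field names and numbers here are hypothetical; the evaluator's actual bookkeeping may differ:

```python
from dataclasses import dataclass

@dataclass
class Submission:           # hypothetical record shape
    name: str
    accuracy: float         # Overall Accuracy, percentage points
    cost: float             # e.g., USD per full eval run
    latency: float          # e.g., mean seconds per call
    submitted_at: float     # Unix timestamp; earlier wins ties

subs = [
    Submission("a", 76.4, 12.0, 1.9, 1_757_400_000),
    Submission("b", 76.4, 11.5, 2.1, 1_757_300_000),
    Submission("c", 75.9,  9.0, 1.2, 1_757_100_000),
]

# Primary metric descending; each tie-breaker ascending.
ranked = sorted(subs, key=lambda s: (-s.accuracy, s.cost, s.latency, s.submitted_at))
print([s.name for s in ranked])  # ['b', 'a', 'c']: b beats a on lower cost
```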

What to submit

  • Hugging Face repo URL containing:
    • Model weights (fine-tunes encouraged; disclose the base checkpoint in your model card/README).
    • handler.py at repo root that matches the BFCL handler interface (see the BFCL handler example; a skeleton sketch follows this list).
  • Declare your mode: fc (native function-calling) or prompt (no native FC).
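For orientation, a skeleton of the handler shape is sketched below. The method names and signatures are assumptions for illustration; copy the exact interface from the linked BFCL handler example, because the pinned evaluator invokes those methods directly. In prompt mode, the decode step typically has to parse calls out of free text rather than native tool-call output.

```python
# ASSUMPTION: illustrative skeleton only; mirror the real BFCL handler
# interface from the linked example, not these exact names.

class MyModelHandler:
    def __init__(self, model_name: str, temperature: float = 0.0):
        # Load your weights or open a connection to your serving stack here.
        self.model_name = model_name
        self.temperature = temperature

    def inference(self, prompt, functions, test_category):
        # Receive the user turn plus the available function schemas and
        # return the model's raw output for this test item.
        raise NotImplementedError

    def decode_ast(self, raw_output, language="Python"):
        # Turn raw output into structured call(s) that the evaluator's
        # checker can compare against ground truth.
        raise NotImplementedError
```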

Notes & definitions

  • Overall Accuracy: the unweighted average of per-sub-category accuracies reported on the public board (computed as sketched below).
  • Version pinning: evaluator versions are pinned per bounty window to ensure reproducibility.
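A minimal sketch of that average (sub-category names and scores are made up):

```python
# Hypothetical per-sub-category accuracies; Overall Accuracy is their plain
# mean, so each sub-category counts equally regardless of item count.
sub_scores = {"simple": 92.0, "parallel": 84.0, "multiple": 88.0, "multi_turn": 62.0}
overall = sum(sub_scores.values()) / len(sub_scores)
print(f"Overall Accuracy = {overall:.2f}")  # 81.50
```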

Bounty Details

  • Created by: @user_iu2402
  • Created: 9/9/2025
  • Accepted submission formats: URL links