CTX Bench Compare
6 benchmarks • live-inf iter 31
MAB N=50
LongMemEval real
McNemar
G1 Recall@7
Homograph
PUAC →