Stitch

Evaluations

Overall Resolution

31/50

62% resolution rate

Django

18/25

72% resolution rate

Sphinx

13/25

52% resolution rate
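The three resolution rates above follow directly from the resolved/total counts. A minimal sketch of that arithmetic, where the helper name `resolution_rate` is illustrative and not part of the dashboard:

```python
def resolution_rate(resolved: int, total: int) -> int:
    """Return resolved/total as a whole-number percentage."""
    return round(100 * resolved / total)

# Counts copied from the cards above; the overall figure is the sum of both repos.
rates = {
    "Django": resolution_rate(18, 25),
    "Sphinx": resolution_rate(13, 25),
    "Overall": resolution_rate(18 + 13, 25 + 25),
}
print(rates)
```

Because both repositories contribute 25 instances each, the overall rate is also the plain average of the two per-repository rates.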

Avg Cost / Instance

$0.287

Total $14.37 · flex tier
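The average cost per instance appears to be the run total divided by the number of instances evaluated. A short check under that assumption (variable names are illustrative):

```python
total_cost_usd = 14.37   # total reported for the run
instances = 50           # instances evaluated

# Mean cost per instance, formatted to three decimals like the card above.
avg_cost_usd = total_cost_usd / instances
print(f"${avg_cost_usd:.3f}")
```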

Avg Wall Time

181s

per instance (primary run)

Avg Cache Hit Rate

79%

prompt token cache hits (flex)

Instances Evaluated

50

LP-optimised subset · proxies the full 500-instance set

Tool Usage

QueryEngine tool calls across all 50 instances

Avg Tool Calls per Instance

QueryEngine tools only · mean calls per instance across all 50 runs

50 instances
| Instance | Repo | Status | Cost | Cache | Tokens | LLM Calls |
|---|---|---|---|---|---|---|
| django__django-11790 | django | | $0.226 | 77% | 230,022 | 16 |
| django__django-11815 | django | | $0.268 | 82% | 248,860 | 17 |
| django__django-11848 | django | | $0.117 | 74% | 117,811 | 11 |
| django__django-11880 | django | | $0.119 | 70% | 150,168 | 12 |
| django__django-11885 | django | | $0.775 | 81% | 794,208 | 34 |
| django__django-11951 | django | | $0.255 | 81% | 213,089 | 16 |
| django__django-11964 | django | | $0.364 | 77% | 245,592 | 16 |
| django__django-11999 | django | | $0.294 | 83% | 146,535 | 13 |
| django__django-12039 | django | | $0.271 | 81% | 251,168 | 18 |
| django__django-12050 | django | | $0.114 | 80% | 131,330 | 12 |
| django__django-12143 | django | | $0.240 | 79% | 201,022 | 16 |
| django__django-12155 | django | | $0.149 | 77% | 222,650 | 15 |
| django__django-12193 | django | | $0.355 | 80% | 155,559 | 14 |
| django__django-12209 | django | | $0.245 | 82% | 554,136 | 30 |
| django__django-12262 | django | | $0.307 | 77% | 225,197 | 15 |
| django__django-12273 | django | | $0.232 | 83% | 441,940 | 26 |
| django__django-12276 | django | | $0.293 | 77% | 216,967 | 15 |
| django__django-12304 | django | | $0.218 | 76% | 213,638 | 16 |
| django__django-12308 | django | | $0.160 | 78% | 254,508 | 19 |
| django__django-12325 | django | | $0.273 | 78% | 124,698 | 11 |
| django__django-12406 | django | | $0.603 | 85% | 536,443 | 27 |
| django__django-12708 | django | | $0.254 | 72% | 484,988 | 27 |
| django__django-12713 | django | | $0.213 | 72% | 180,965 | 13 |
| django__django-12774 | django | | $0.151 | 80% | 193,749 | 15 |
| django__django-9296 | django | | $0.123 | 65% | 169,510 | 13 |
| sphinx-doc__sphinx-10323 | sphinx | | $0.189 | 73% | 141,632 | 11 |
| sphinx-doc__sphinx-10435 | sphinx | | $0.197 | 74% | 212,339 | 16 |
| sphinx-doc__sphinx-10466 | sphinx | | $0.200 | 74% | 148,978 | 11 |
| sphinx-doc__sphinx-10673 | sphinx | | $0.522 | 100% | 1,171,642 | 37 |
| sphinx-doc__sphinx-11510 | sphinx | | $0.610 | 100% | 1,483,790 | 49 |
| sphinx-doc__sphinx-7590 | sphinx | | $0.615 | 100% | 1,113,090 | 47 |
| sphinx-doc__sphinx-7748 | sphinx | | $0.619 | 100% | 1,725,262 | 59 |
| sphinx-doc__sphinx-7757 | sphinx | | $0.201 | 70% | 176,228 | 13 |
| sphinx-doc__sphinx-7985 | sphinx | | $0.273 | 78% | 411,984 | 22 |
| sphinx-doc__sphinx-8035 | sphinx | | $0.314 | 81% | 499,291 | 26 |
| sphinx-doc__sphinx-8056 | sphinx | | $0.308 | 74% | 400,943 | 21 |
| sphinx-doc__sphinx-8265 | sphinx | | $0.395 | 71% | 537,258 | 21 |
| sphinx-doc__sphinx-8269 | sphinx | | $0.135 | 76% | 142,363 | 12 |
| sphinx-doc__sphinx-8475 | sphinx | | $0.189 | 75% | 172,744 | 14 |
| sphinx-doc__sphinx-8548 | sphinx | | $0.444 | 100% | 917,250 | 39 |
| sphinx-doc__sphinx-8551 | sphinx | | $0.232 | 65% | 197,190 | 13 |
| sphinx-doc__sphinx-8638 | sphinx | | $0.224 | 75% | 212,935 | 14 |
| sphinx-doc__sphinx-8721 | sphinx | | $0.230 | 74% | 179,147 | 12 |
| sphinx-doc__sphinx-9229 | sphinx | | $0.322 | 77% | 411,176 | 23 |
| sphinx-doc__sphinx-9230 | sphinx | | $0.347 | 70% | 437,377 | 20 |
| sphinx-doc__sphinx-9281 | sphinx | | $0.175 | 78% | 181,783 | 14 |
| sphinx-doc__sphinx-9320 | sphinx | | $0.194 | 79% | 258,184 | 19 |
| sphinx-doc__sphinx-9367 | sphinx | | $0.133 | 74% | 127,280 | 12 |
| sphinx-doc__sphinx-9461 | sphinx | | $0.452 | 99% | 730,643 | 35 |
| sphinx-doc__sphinx-9698 | sphinx | | $0.231 | 70% | 238,292 | 15 |

SWE-bench Verified Mini · GPT-5.4 (OpenAI flex tier)