
Alternative sorting in model ranking #9

Description

@stilkin-pxl

Would you consider adding different sorting strategies for the model ranking list?

Observed behavior:

The current sorting occasionally produces unintuitive ordering: models that perform similarly on green but very differently on orange and red are ranked by green alone.

Example in image:

Image

#16 and #17 have very similar green scores but differ a lot on orange and red. In this specific case I would assume GPT-5.4 to be a "better" pick than Gemini 3 Pro Preview (Low), given its higher percentage of partial challenges (orange); however, it ranks lower.

A similar example occurs just below that, at #18, #19 and #20: all three have a similar green score, but the bottom one appears slightly more favorable.

Potential suggestions:

Would it make sense to add different sorting methods (or adjust the existing one), for example:

  • sort by green first; when within 1%, sort by orange
  • sort by green first, then by a weighted factor of orange (e.g. 1/10th the value of green)
  • sort inversely by red overall, when tied sort by orange inversely (probably a bad idea)
  • ...

I assume some of these make less sense than others, but I was wondering whether one can be found that is less likely to produce these results.
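
To make the first two suggestions concrete, here is a minimal Python sketch. Everything in it is an assumption for illustration: the `ModelScore` type, the function names, the reading of green/orange/red as percentages of fully passed, partial, and failed challenges, and the sample numbers (made up, not taken from the ranking). For the "within 1%" idea it uses 1%-wide green buckets instead of a pairwise comparison, because "within 1% of each other" is not transitive and so cannot be used directly as a consistent sort order.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelScore:
    """Hypothetical record; the real ranking data may look different."""
    name: str
    green: float   # assumed: % of challenges fully passed
    orange: float  # assumed: % of challenges partially passed
    red: float     # assumed: % of challenges failed


def sort_green_bucketed(scores, bucket_width=1.0):
    """Sort by green in buckets of `bucket_width` %, then by orange.

    Models whose green score falls in the same bucket are treated as tied
    on green, and the higher orange score wins the tie.
    """
    return sorted(
        scores,
        key=lambda s: (-math.floor(s.green / bucket_width), -s.orange),
    )


def sort_weighted(scores, orange_weight=0.1):
    """Sort by green plus a fraction of orange (1/10th by default)."""
    return sorted(scores, key=lambda s: -(s.green + orange_weight * s.orange))


# Made-up numbers, for illustration only.
scores = [
    ModelScore("model_a", green=40.4, orange=10.0, red=49.6),
    ModelScore("model_b", green=40.2, orange=30.0, red=29.8),
    ModelScore("model_c", green=55.0, orange=5.0, red=40.0),
]

green_only = sorted(scores, key=lambda s: -s.green)
bucketed = sort_green_bucketed(scores)
weighted = sort_weighted(scores)

print([s.name for s in green_only])
print([s.name for s in bucketed])
print([s.name for s in weighted])
```

With these sample values, green-only sorting puts `model_a` ahead of `model_b`, while both alternatives rank `model_b` higher because of its larger orange share. One caveat of the bucket approach: two models just on either side of a bucket boundary (e.g. 39.9 and 40.1) are still ordered by green alone, so the weighted variant may behave more predictably.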
