EvolvingLMMs-Lab / lmms-eval Public

Fork 607
Star 4.3k

Pull requests: EvolvingLMMs-Lab/lmms-eval

14 Open 826 Closed

Pull requests list

fix(grounding/rec): guard empty metric bucket against ZeroDivisionError

#1371 opened Jun 25, 2026 by ts-kim Contributor

fix(visualwebbench): allow anonymous dataset load

#1370 opened Jun 25, 2026 by ts-kim Contributor

[WIP] ViZDoom

#1368 opened Jun 21, 2026 by pufanyi Collaborator

7 tasks

Add Qwen-native JSON coordinate variants for pointing tasks

#1361 opened Jun 5, 2026 by njb-nvidia Contributor

fix: guard choices[0] and message=None before content access (41 sites, 32 files)

#1332 opened May 17, 2026 by qizwiz

Feat/ollama model

#1322 opened May 5, 2026 by eliasubz

1 of 7 tasks

feat: vLLM-Omni for video generation models

#1314 opened Apr 27, 2026 by pufanyi Collaborator • Draft

fix(evaluator): auto-init gloo process group for multi-rank launches

#1306 opened Apr 23, 2026 by Luodian Contributor • Draft

2 tasks

fix(api/task): guard empty results list in process_results

#1305 opened Apr 23, 2026 by Luodian Contributor • Draft

2 tasks

fix(eval): abort all ranks on per-rank exception instead of deadlocking

#1304 opened Apr 23, 2026 by Luodian Contributor • Draft

2 tasks

Fix missing Task import for type annotation in evaluator

#1291 opened Apr 10, 2026 by luv-oct22

2 tasks

feat: add physics reasoning benchmarks (PhysBench, ContPhy, PhysGame, PhysicsRW, PhysReason)

#1272 opened Mar 26, 2026 by Luodian Contributor

4 tasks

feat: add VBench video generation evaluation benchmark

#1271 opened Mar 26, 2026 by Luodian Contributor

3 tasks

feat: add MiniMax as LLM judge provider (default model: MiniMax-M3)

#1263 opened Mar 22, 2026 by octo-patch

3 tasks done
