torch.compile + DeviceMesh bug: "'DeviceMesh' object has no attribute '_mesh_dim_names'" #3926

Description

@danielvegamyhre

Repro

  • PyTorch nightly build (CUDA 12.8+)
  • torchao at the latest main branch
  • On a B200 devgpu, run: torchrun --nproc_per_node=4 -m pytest test/prototype/moe_training/test_distributed.py -s -v

Error

FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config1-True-expert_tensor_parallel] - torch._dynamo.exc.InternalTorchDynamoError: AttributeError: 'DeviceMesh' object has no attribute '_mesh_dim_names'
FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config1-True-fsdp] - torch._inductor.exc.InductorError: AssertionError:
FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config1-True-fsdp_tp] - torch._inductor.exc.InductorError: AssertionError:
FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config2-True-expert_tensor_parallel] - torch._dynamo.exc.InternalTorchDynamoError: AttributeError: 'DeviceMesh' object has no attribute '_mesh_dim_names'
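
The four failures above fall into two groups: the expert_tensor_parallel cases fail in Dynamo with the missing `_mesh_dim_names` attribute, while the fsdp and fsdp_tp cases fail in Inductor with an empty AssertionError. A minimal sketch for grouping pytest FAILED lines by exception class is shown here; the helper name `group_failures` and the regular expression are illustrative assumptions, not part of torchao or pytest.

```python
import re
from collections import defaultdict

# The FAILED lines from the report, copied verbatim.
FAILED_LINES = [
    "FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config1-True-expert_tensor_parallel] - torch._dynamo.exc.InternalTorchDynamoError: AttributeError: 'DeviceMesh' object has no attribute '_mesh_dim_names'",
    "FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config1-True-fsdp] - torch._inductor.exc.InductorError: AssertionError:",
    "FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config1-True-fsdp_tp] - torch._inductor.exc.InductorError: AssertionError:",
    "FAILED test/prototype/moe_training/test_distributed.py::test_moe_training_parallel[recipe_config2-True-expert_tensor_parallel] - torch._dynamo.exc.InternalTorchDynamoError: AttributeError: 'DeviceMesh' object has no attribute '_mesh_dim_names'",
]

# Hypothetical pattern: captures the parametrization id inside [...]
# and the dotted exception class that follows " - ".
PATTERN = re.compile(r"^FAILED \S+?\[(?P<param>[^\]]+)\] - (?P<exc>[\w.]+):")


def group_failures(lines):
    """Map each top-level exception class to the test parametrizations that raised it."""
    groups = defaultdict(list)
    for line in lines:
        match = PATTERN.match(line)
        if match:
            groups[match.group("exc")].append(match.group("param"))
    return dict(groups)


if __name__ == "__main__":
    for exc, params in group_failures(FAILED_LINES).items():
        print(exc)
        for param in params:
            print("  ", param)
```

Grouping this way makes it easier to see that the report may cover two separate problems, one per compiler stage, even though the issue title names only the DeviceMesh error.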

Labels: bug, moe