Skip to content

Latest commit

 

History

History
193 lines (151 loc) · 6.41 KB

File metadata and controls

193 lines (151 loc) · 6.41 KB

Regression Test Analysis & Recommendation

Current State Analysis

Bash Script Tests (scripts/run-regression-tests.sh)

Currently running these tests:

  1. Phase 1: Unit Tests

    • cargo test --lib --features huggingface
    • Status: ❌ FAILING (serial_test missing - NOW FIXED)
  2. Phase 2: Regression Test Suite

    • cargo test --test regression_tests --features huggingface
    • Status: ✅ PASSING
  3. Phase 3: Build Verification

    • cargo build --release --features huggingface
    • Status: ✅ PASSING
  4. Phase 4: API Compatibility Tests

    • cargo test test_model_discovery --features huggingface
    • cargo test test_openai_api --features huggingface
    • Status: ❌ FAILING (no matching test names)
  5. Phase 5: Issue-Specific Regression Tests

    • cargo test test_qwen_model_template_detection --features huggingface (Issue #13)
    • cargo test test_custom_model_directory_environment_variables --features huggingface (Issue #12)
    • cargo test test_cli_model_dirs_option_compatibility --features huggingface
    • cargo test --no-default-features --features huggingface,llama-opencl,llama-vulkan gpu_backend (Issue #72)
    • Status: ❌ FAILING (no matching test names)
  6. Phase 6: Security & Error Handling

    • cargo test test_error_handling_robustness --features huggingface
    • Status: ❌ FAILING (no matching test name)
  7. Phase 7: Code Quality

    • cargo fmt -- --check
    • cargo clippy --features huggingface -- -D warnings
    • Status: ✅ PASSING

Rust Test Files (tests/*.rs)

Existing comprehensive test coverage:

  1. release_gate_integration.rs - ✅ Validates CI/CD gates
  2. mlx_support_regression_test.rs - ✅ 10 MLX-specific tests
  3. gpu_backend_tests.rs - ✅ 9 GPU backend tests (including Issue #72)
  4. gpu_layer_verification.rs - ✅ 4 GPU layer tests
  5. regression_tests.rs - ✅ General regression tests
  6. Various other test files - ✅ Model discovery, streaming, templates, etc.

Missing: MOE CPU Offloading Tests

CRITICAL GAP: No tests for --cpu-moe or --n-cpu-moe flags

Problem Identified

The bash script is calling test functions that don't exist:

  • test_model_discovery ❌ (no match)
  • test_openai_api ❌ (no match)
  • test_qwen_model_template_detection ❌ (no match)
  • test_custom_model_directory_environment_variables ❌ (no match)
  • test_cli_model_dirs_option_compatibility ❌ (no match)
  • test_error_handling_robustness ❌ (no match)

But we DO have working tests when we run:

  • cargo test gpu_backend ✅ (9 tests pass including Issue #72)
  • cargo test --test mlx_support_regression_test ✅ (10 tests)
  • cargo test --test release_gate_integration ✅ (9 tests)

Recommendation: Professional CI/CD Setup

Option A: Update Release Workflow (RECOMMENDED)

Pros:

  • Already have .github/workflows/release.yml with 6 gates
  • Professional CI/CD integration
  • Runs on every git tag push
  • Blocks releases automatically if tests fail
  • Already integrated with GitHub Actions

Implementation:

  1. Add MOE regression tests to tests/moe_cpu_offload_regression_test.rs
  2. Update .github/workflows/release.yml Gate 5 to run ALL tests:
    - name: "🚧 GATE 5/6: Test Suite Validation"
      run: |
        cargo test --features huggingface
        cargo test --features llama,llama-opencl,llama-vulkan gpu_backend
        cargo test --test mlx_support_regression_test
        cargo test --test moe_cpu_offload_regression_test  # NEW
  3. Delete or deprecate scripts/run-regression-tests.sh (redundant)
  4. Use GitHub Actions as the single source of truth

Option B: Fix Bash Script

Pros:

  • Can run locally without CI/CD
  • Faster iteration during development

Cons:

  • Duplicates CI/CD logic
  • Test names are wrong/don't exist
  • Maintenance burden (two systems to update)

Implementation:

  1. Fix test names in bash script to match actual tests
  2. Add MOE tests
  3. Still need to maintain both bash and CI/CD

Option C: Cargo Integration (BEST LONG-TERM)

Pros:

  • Single command: cargo test --workspace
  • Proper Rust tooling
  • IDE integration
  • No bash scripts to maintain
  • Works on all platforms

Cons:

  • Requires restructuring tests
  • More upfront work

Implementation:

  1. Create tests/release_gates.rs that runs all gate tests
  2. Use #[test] annotations with proper feature flags
  3. Run with cargo test --workspace --all-features
  4. CI/CD just calls cargo test

My Recommendation: OPTION A + C Hybrid

Immediate (Tonight - v1.7.2 Release):

  1. Add MOE regression test (15 mins):

    // tests/moe_cpu_offload_regression_test.rs
    #[test]
    fn test_cpu_moe_flag_exists() { ... }
    
    #[test]
    fn test_n_cpu_moe_flag_exists() { ... }
    
    #[test]
    fn test_moe_feature_compilation() { ... }
  2. Update release.yml Gate 5 to run all tests (5 mins)

  3. Delete bash script or mark deprecated (1 min)

  4. Test locally: cargo test --workspace (verify all pass)

  5. Push tag: CI/CD will block if anything fails

Long-term (Next Sprint):

  1. Migrate all tests to proper Rust test framework
  2. Use feature flags for conditional tests
  3. Single command: cargo test --workspace --all-features
  4. Pre-commit hooks via cargo-husky crate

Action Plan for Tonight

Step 1: Create MOE Regression Tests (NEW FILE)

// tests/moe_cpu_offload_regression_test.rs

Step 2: Update release.yml

Add to Gate 5:

  • cargo test --workspace
  • Specific MOE/MLX feature tests

Step 3: Verify Locally

cargo test --workspace
cargo test --features llama,llama-opencl,llama-vulkan
cargo test --test moe_cpu_offload_regression_test

Step 4: Commit & Tag

git add .
git commit -m "feat: add MOE regression tests, fix CI/CD gates for v1.7.2"
git tag v1.7.2
git push origin feature/mlx-native-support

CI/CD will automatically validate all gates before release.

Summary

Current Problem: Bash script calls non-existent test functions Root Cause: Tests exist but have different names Immediate Fix: Add MOE tests, use GitHub Actions release gates Long-term Fix: Migrate to pure Rust test framework with cargo test --workspace

Time to implement tonight: ~30 minutes Confidence level: ✅ High (we already have the infrastructure)