Edge Inference Planner

Hardware-aware optimizer for partitioning an edge AI pipeline across CPU, GPU, and NPU devices while respecting latency, energy, and memory budgets.

This repo is built to signal the kind of engineering depth that companies like NVIDIA and Apple care about:

hardware-aware scheduling instead of CRUD-heavy app work
performance tradeoff reasoning across latency, energy, and transfer overhead
exact search for small design spaces and beam search for larger ones
explainable results that make placement decisions easy to inspect

Problem

Modern on-device AI systems rarely run on one accelerator end-to-end. Real products split work across CPU, GPU, and NPU blocks depending on:

stage-level compute characteristics
model residency pressure on each device
transfer cost between accelerators
end-to-end latency and battery constraints

Edge Inference Planner turns that into a reproducible optimization problem.
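
One way to write that problem down for a linear pipeline of S stages is sketched below. This is an illustrative formulation, not necessarily the one the planner implements: the symbols (x_s for the device chosen for stage s, t and e for per-stage latency and energy, tau and epsilon for transfer latency and energy, m for stage memory, M_d for a device's memory budget, and f for the goal-dependent scoring function) are notation introduced here.

```latex
\begin{aligned}
\min_{x_1,\dots,x_S \in D}\quad & f\bigl(T(x),\, E(x)\bigr) \\
\text{where}\quad
T(x) &= \sum_{s=1}^{S} t_s(x_s) + \sum_{s=2}^{S} \tau_s(x_{s-1}, x_s), \\
E(x) &= \sum_{s=1}^{S} e_s(x_s) + \sum_{s=2}^{S} \varepsilon_s(x_{s-1}, x_s), \\
\text{subject to}\quad
& \sum_{s \,:\, x_s = d} m_s(d) \le M_d \quad \forall d \in D, \\
& T(x) \le L_{\max}, \qquad E(x) \le E_{\max}.
\end{aligned}
```

Here the transfer terms are zero whenever consecutive stages stay on the same device, and the last two constraints apply only when the scenario sets the corresponding cap.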

What It Does

Given a pipeline scenario in JSON, the planner:

models per-stage execution profiles on each device
applies inter-device transfer penalties when stages move across accelerators
tracks cumulative model memory on every device
enforces latency and energy caps
returns the top-ranked placements for a latency, efficiency, or balanced goal
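
The steps above can be sketched as a single evaluation function for one candidate placement. This is a minimal illustration, not the code in optimizer.py: the names StageProfile and evaluate, the argument shapes, and the fixed per-hop transfer cost are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StageProfile:
    """Hypothetical execution profile of one stage on one device."""
    latency_ms: float
    energy_mj: float
    memory_mb: float


def evaluate(placement, stages, transfer, budgets_mb,
             max_latency_ms=None, max_energy_mj=None):
    """Score one placement of a linear pipeline.

    placement:  list of device names, one per stage
    stages:     list of dicts mapping device name -> StageProfile
    transfer:   dict mapping (src, dst) -> (latency_ms, energy_mj); assumed
                constant per hop here, which is a simplification
    budgets_mb: dict mapping device name -> memory budget
    """
    latency = energy = 0.0
    memory = {device: 0.0 for device in budgets_mb}
    previous = None
    for device, profiles in zip(placement, stages):
        profile = profiles[device]
        latency += profile.latency_ms
        energy += profile.energy_mj
        memory[device] += profile.memory_mb  # cumulative residency per device
        if previous is not None and previous != device:
            xfer_ms, xfer_mj = transfer[(previous, device)]  # switch penalty
            latency += xfer_ms
            energy += xfer_mj
        previous = device

    feasible = all(memory[d] <= budgets_mb[d] for d in budgets_mb)
    if max_latency_ms is not None:
        feasible = feasible and latency <= max_latency_ms
    if max_energy_mj is not None:
        feasible = feasible and energy <= max_energy_mj
    return {"latency_ms": latency, "energy_mj": energy,
            "memory_mb": memory, "feasible": feasible}
```

A search strategy would call a function like this for each candidate, discard infeasible results, and rank the rest by the selected goal.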

Quickstart

1. Create a virtual environment

python -m venv .venv
# Windows
.venv\Scripts\activate
# macOS/Linux
source .venv/bin/activate

2. Install the project

pip install -e ".[dev]"

3. Run the sample workload

edge-inference-planner plan scenarios/mobile_vision_pipeline.json --goal balanced --top-k 3

4. Export reports

edge-inference-planner plan scenarios/mobile_vision_pipeline.json --format csv --output reports/mobile_vision.csv
edge-inference-planner plan scenarios/mobile_vision_pipeline.json --format html --output reports/mobile_vision.html

Example Output

Pipeline: mobile_vision_stack
Goal: balanced | Strategy: exact | Score: 1.85
Latency: 27.97 ms | Energy: 32.18 mJ | Switches: 2

Placement
stage                | device | exec ms | xfer ms | total ms | exec mJ | xfer mJ | memory MB
---------------------+--------+---------+---------+----------+---------+---------+----------
frame_decode         | gpu    | 3.10    | 0.00    | 3.10     | 5.40    | 0.00    | 420
resize_normalize     | gpu    | 1.90    | 0.00    | 1.90     | 3.40    | 0.00    | 350
backbone_embedding   | npu    | 7.40    | 1.08    | 8.48     | 8.10    | 0.72    | 1240
multimodal_fusion    | npu    | 4.70    | 0.00    | 4.70     | 4.90    | 0.00    | 700
detector_heads       | npu    | 6.20    | 0.00    | 6.20     | 5.10    | 0.00    | 540
temporal_smoother    | npu    | 1.40    | 0.00    | 1.40     | 1.20    | 0.00    | 96
renderer             | gpu    | 2.10    | 0.09    | 2.19     | 3.30    | 0.06    | 260
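
The header totals can be reproduced from the table rows: latency is the sum of exec ms and xfer ms, energy is the sum of exec mJ and xfer mJ, and Switches appears to count device changes between adjacent stages. The short check below uses only the numbers printed above; the tuple layout is just a convenience for this sketch.

```python
# Rows copied from the placement table above:
# (stage, device, exec_ms, xfer_ms, exec_mj, xfer_mj, memory_mb)
rows = [
    ("frame_decode",       "gpu", 3.10, 0.00, 5.40, 0.00, 420),
    ("resize_normalize",   "gpu", 1.90, 0.00, 3.40, 0.00, 350),
    ("backbone_embedding", "npu", 7.40, 1.08, 8.10, 0.72, 1240),
    ("multimodal_fusion",  "npu", 4.70, 0.00, 4.90, 0.00, 700),
    ("detector_heads",     "npu", 6.20, 0.00, 5.10, 0.00, 540),
    ("temporal_smoother",  "npu", 1.40, 0.00, 1.20, 0.00, 96),
    ("renderer",           "gpu", 2.10, 0.09, 3.30, 0.06, 260),
]

latency_ms = sum(r[2] + r[3] for r in rows)
energy_mj = sum(r[4] + r[5] for r in rows)
# A switch is any adjacent pair of stages placed on different devices.
switches = sum(1 for a, b in zip(rows, rows[1:]) if a[1] != b[1])

# Cumulative model memory per device.
memory_mb = {}
for _, device, *_rest, mem in rows:
    memory_mb[device] = memory_mb.get(device, 0) + mem

print(f"{latency_ms:.2f} ms | {energy_mj:.2f} mJ | {switches} switches")
print(memory_mb)
```

This prints 27.97 ms, 32.18 mJ, and 2 switches, matching the header, with 1030 MB resident on the GPU and 2576 MB on the NPU.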

Architecture

Scenario JSON
    |
    v
PipelineSpec -> Device profiles + stage profiles + transfer graph + constraints
    |
    v
Optimizer
    |- Exact branch-and-bound search for small spaces
    |- Beam search fallback for larger spaces
    |
    v
PlanResult
    |- ranked placements
    |- memory utilization summary
    |- stage-by-stage transfer costs
    |- optimization rationale

Detailed design notes live in docs/ARCHITECTURE.md.
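
To make the two optimizer strategies concrete, here is a small latency-only sketch of both on a linear pipeline. It is not the code in optimizer.py: the function names, the data layout, and the toy numbers are assumptions, and the real planner also handles energy, memory, and constraints.

```python
def step_cost(prefix, device, exec_ms, xfer_ms):
    """Cost of appending `device` for the next stage after `prefix`."""
    cost = exec_ms[len(prefix)][device]
    if prefix and prefix[-1] != device:
        cost += xfer_ms[(prefix[-1], device)]  # penalty for switching devices
    return cost


def exact_search(exec_ms, xfer_ms, devices):
    """Depth-first branch-and-bound over all placements."""
    best = {"cost": float("inf"), "placement": None}

    def visit(prefix, cost):
        # Bound: costs are non-negative, so a prefix that already matches or
        # exceeds the best complete placement cannot lead to an improvement.
        if cost >= best["cost"]:
            return
        if len(prefix) == len(exec_ms):
            best["cost"], best["placement"] = cost, prefix
            return
        for device in devices:
            visit(prefix + (device,),
                  cost + step_cost(prefix, device, exec_ms, xfer_ms))

    visit((), 0.0)
    return best["placement"], best["cost"]


def beam_search(exec_ms, xfer_ms, devices, beam_width=2):
    """Extend placements stage by stage, keeping the cheapest prefixes."""
    beam = [(0.0, ())]
    for _ in exec_ms:
        candidates = [
            (cost + step_cost(prefix, device, exec_ms, xfer_ms), prefix + (device,))
            for cost, prefix in beam
            for device in devices
        ]
        beam = sorted(candidates)[:beam_width]
    cost, placement = beam[0]
    return placement, cost


# Toy three-stage pipeline (made-up numbers, milliseconds).
devices = ["cpu", "gpu", "npu"]
exec_ms = [
    {"cpu": 4.0, "gpu": 3.0, "npu": 5.0},
    {"cpu": 20.0, "gpu": 9.0, "npu": 7.0},
    {"cpu": 3.0, "gpu": 2.0, "npu": 4.0},
]
xfer_ms = {(a, b): 0.5 for a in devices for b in devices if a != b}

print(exact_search(exec_ms, xfer_ms, devices))
print(beam_search(exec_ms, xfer_ms, devices))
```

On this toy input both strategies pick gpu, npu, gpu at 13.0 ms. In general the exact search is guaranteed to find the optimum but its worst case grows as devices to the power of stages, while beam search does bounded work per stage and may miss the optimum when a good placement starts with an expensive prefix.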

Scenario Schema

Each scenario includes:

devices: accelerator memory budgets
links: transfer cost between accelerators
stages: per-device latency, energy, and memory requirements
constraints: optional end-to-end caps

See scenarios/mobile_vision_pipeline.json for a complete example.
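
For orientation, the sketch below shows what a scenario with those four sections might look like and how its top-level shape could be checked. Only the section names devices, links, stages, and constraints come from the list above; every nested field name (memory_mb, from, to, latency_ms, energy_mj, profiles, max_latency_ms), the values, and the helper check_scenario are assumptions for illustration, so treat the shipped scenario file as the authoritative reference.

```python
import json

# "constraints" is optional according to the schema list, so it is not required.
REQUIRED_KEYS = ("devices", "links", "stages")


def check_scenario(scenario):
    """Return a list of missing top-level sections; empty means none missing."""
    return [f"missing key: {key}" for key in REQUIRED_KEYS if key not in scenario]


# Hypothetical scenario document; nested field names are illustrative only.
example = json.loads("""
{
  "devices": {"gpu": {"memory_mb": 2048}, "npu": {"memory_mb": 4096}},
  "links": [
    {"from": "gpu", "to": "npu", "latency_ms": 1.0, "energy_mj": 0.7}
  ],
  "stages": [
    {
      "name": "frame_decode",
      "profiles": {
        "gpu": {"latency_ms": 3.1, "energy_mj": 5.4, "memory_mb": 420}
      }
    }
  ],
  "constraints": {"max_latency_ms": 33.0}
}
""")

print(check_scenario(example))
print(check_scenario({"devices": {}}))
```

The first call reports no problems; the second reports the missing links and stages sections.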

Project Structure

Edge Inference Planner/
|-- docs/
|-- scenarios/
|-- src/edge_inference_planner/
|   |-- cli.py
|   |-- models.py
|   |-- optimizer.py
|   |-- report.py
|   `-- scenario.py
|-- tests/
|-- pyproject.toml
`-- README.md

Why This Repo Is Stronger Than a Generic Portfolio App

It demonstrates optimization and systems reasoning instead of only framework familiarity.
It produces inspectable outputs with tradeoffs that are easy to discuss in interviews.
It maps cleanly to edge AI, silicon, graphics, and applied ML platform teams.

Roadmap

Add DAG support for non-linear pipelines
Add thermal throttling models and quantization knobs
Plug in measured hardware benchmarks instead of hand-authored scenario profiles

License

MIT License. See LICENSE.
