ggml: optimize concat op by replacing per-element memcpy with row-level memcpy by sirohikartik · Pull Request #24575 · ggml-org/llama.cpp

sirohikartik · 2026-06-13T14:07:26Z

Overview

Optimize ggml_compute_forward_concat_any by replacing per-element memcpy with row-level memcpy.

The original implementation called memcpy once per scalar element, with a branch inside the innermost loop to select between src0 and src1. For a typical KV-cache concat of shape [4096 x 1 x 16 x 1] along dim=2, this results in 65,536 separate memcpy calls of 4 bytes each.
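To make the cost concrete, here is a minimal C sketch of the per-element pattern described above. It is illustrative only, not the code from ggml or from this PR: `tensor4d`, `make_contiguous`, and `concat_per_element` are hypothetical names, the layout loosely follows ggml's `ne` (elements per dim) / `nb` (bytes per dim) naming, and the returned counter exists only to show how many `memcpy` calls are issued. It also assumes both sources match the destination on every dimension except `dim`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-D tensor view: ne[] = elements per dim, nb[] = byte stride
   per dim (loosely following ggml's naming; not ggml's actual struct). */
typedef struct {
    int64_t ne[4];
    size_t  nb[4];
    char   *data;
} tensor4d;

/* Fill in a contiguous layout: nb[0] = elem_size, nb[k] = nb[k-1] * ne[k-1]. */
static void make_contiguous(tensor4d *t, int64_t ne0, int64_t ne1, int64_t ne2,
                            int64_t ne3, size_t elem_size, void *data) {
    t->ne[0] = ne0; t->ne[1] = ne1; t->ne[2] = ne2; t->ne[3] = ne3;
    t->nb[0] = elem_size;
    for (int k = 1; k < 4; k++)
        t->nb[k] = t->nb[k - 1] * (size_t)t->ne[k - 1];
    t->data = (char *)data;
}

/* Per-element concat along `dim`: one branch and one memcpy per element.
   Returns the number of memcpy calls issued. */
static size_t concat_per_element(const tensor4d *src0, const tensor4d *src1,
                                 tensor4d *dst, int dim, size_t elem_size) {
    assert(dim >= 0 && dim < 4);
    int64_t o[4] = {0, 0, 0, 0};
    o[dim] = src0->ne[dim];  /* where src1 starts inside dst along dim */
    size_t calls = 0;
    for (int64_t i3 = 0; i3 < dst->ne[3]; i3++)
    for (int64_t i2 = 0; i2 < dst->ne[2]; i2++)
    for (int64_t i1 = 0; i1 < dst->ne[1]; i1++)
    for (int64_t i0 = 0; i0 < dst->ne[0]; i0++) {
        const char *x;
        /* Branch in the innermost loop: pick the source for this element. */
        if (i0 < src0->ne[0] && i1 < src0->ne[1] &&
            i2 < src0->ne[2] && i3 < src0->ne[3]) {
            x = src0->data + i0*src0->nb[0] + i1*src0->nb[1]
                           + i2*src0->nb[2] + i3*src0->nb[3];
        } else {
            x = src1->data + (i0 - o[0])*src1->nb[0] + (i1 - o[1])*src1->nb[1]
                           + (i2 - o[2])*src1->nb[2] + (i3 - o[3])*src1->nb[3];
        }
        char *y = dst->data + i0*dst->nb[0] + i1*dst->nb[1]
                            + i2*dst->nb[2] + i3*dst->nb[3];
        memcpy(y, x, elem_size);  /* e.g. 4 bytes for fp32 */
        calls++;
    }
    return calls;
}
```

Treating [4096 x 1 x 16 x 1] as the output shape (for example, two [4096 x 1 x 8 x 1] inputs concatenated along dim=2), the counter reaches 4096 x 16 = 65,536, matching the figure above.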

This PR splits the loop into two separate regions (one per source tensor), eliminating the per-element branch, and collapses the i0 loop entirely so that each memcpy call copies one full row instead of one element.
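A minimal sketch of the two-region, row-level approach follows. It is an illustration of the idea under stated assumptions, not the PR's actual diff: `tensor4d`, `make_contiguous`, `copy_region`, and `concat_rows` are hypothetical names, each tensor's rows are assumed contiguous (`nb[0]` equals the element size), the sources are assumed to match the destination on every dimension except `dim`, and any multi-threaded work splitting is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical 4-D tensor view (ggml-like ne/nb naming; not ggml's struct). */
typedef struct { int64_t ne[4]; size_t nb[4]; char *data; } tensor4d;

/* Fill in a contiguous layout: nb[0] = elem_size, nb[k] = nb[k-1] * ne[k-1]. */
static void make_contiguous(tensor4d *t, int64_t ne0, int64_t ne1, int64_t ne2,
                            int64_t ne3, size_t elem_size, void *data) {
    t->ne[0] = ne0; t->ne[1] = ne1; t->ne[2] = ne2; t->ne[3] = ne3;
    t->nb[0] = elem_size;
    for (int k = 1; k < 4; k++)
        t->nb[k] = t->nb[k - 1] * (size_t)t->ne[k - 1];
    t->data = (char *)data;
}

/* Copy every row of `src` into `dst`, shifted by offset `o` (in elements).
   Returns the number of memcpy calls issued. */
static size_t copy_region(const tensor4d *src, tensor4d *dst,
                          const int64_t o[4], size_t elem_size) {
    const size_t row_bytes = (size_t)src->ne[0] * elem_size;
    size_t calls = 0;
    for (int64_t i3 = 0; i3 < src->ne[3]; i3++)
    for (int64_t i2 = 0; i2 < src->ne[2]; i2++)
    for (int64_t i1 = 0; i1 < src->ne[1]; i1++) {
        const char *x = src->data + i1*src->nb[1] + i2*src->nb[2] + i3*src->nb[3];
        char *y = dst->data + o[0]*dst->nb[0] + (i1 + o[1])*dst->nb[1]
                            + (i2 + o[2])*dst->nb[2] + (i3 + o[3])*dst->nb[3];
        memcpy(y, x, row_bytes);  /* one memcpy per row, no per-element branch */
        calls++;
    }
    return calls;
}

/* Two-region concat: region 1 copies all of src0, region 2 all of src1. */
static size_t concat_rows(const tensor4d *src0, const tensor4d *src1,
                          tensor4d *dst, int dim, size_t elem_size) {
    assert(dim >= 0 && dim < 4);
    assert(src0->nb[0] == elem_size && src1->nb[0] == elem_size &&
           dst->nb[0] == elem_size);
    const int64_t o0[4] = {0, 0, 0, 0};
    int64_t o1[4] = {0, 0, 0, 0};
    o1[dim] = src0->ne[dim];  /* src1 starts right after src0 along dim */
    return copy_region(src0, dst, o0, elem_size)
         + copy_region(src1, dst, o1, elem_size);
}
```

Because each region iterates over its own source's extents, every index in region 1 belongs to src0 by construction, which is what removes the branch. For two [4096 x 1 x 8 x 1] fp32 inputs concatenated along dim=2, this sketch issues 16 memcpy calls of 16 KiB each instead of 65,536 four-byte calls. Applying the offset to the destination column also lets the same code handle concatenation along dim=0, where each source fills part of every destination row.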

Benchmark

Isolated microbenchmark using the same tensor layout and loop logic as the kernel.
Shape: [4096 x 1 x 16 x 1], concat dim=2, fp32, 200 runs
Apple Silicon (M1 Air):

| | old | new | speedup |
|---|---|---|---|
| warm cache | 1.61 ms/call | 0.012 ms/call | 133x |
| cold cache | 1.59 ms/call | 0.019 ms/call | 83x |

Cold-cache timings were measured by flushing 64 MB through memory before every call.
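The cold-cache methodology can be sketched as follows. This is an assumption about how such a harness might look, not the PR author's benchmark code: `flush_cache`, `copy_job`, `run_copy`, and `ms_per_call` are hypothetical, the flush touches one byte every 64 bytes (which reaches every cache line on machines with 64- or 128-byte lines), and timing uses POSIX `clock_gettime(CLOCK_MONOTONIC)`.

```c
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime under strict C modes */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical flush: touch one byte every 64 bytes of a large buffer so data
   previously in cache is very likely evicted. Returning a sum keeps the
   compiler from optimizing the loop away. */
static uint64_t flush_cache(unsigned char *buf, size_t n) {
    uint64_t sum = 0;
    for (size_t i = 0; i < n; i += 64) {
        buf[i] = (unsigned char)(buf[i] + 1);
        sum += buf[i];
    }
    return sum;
}

/* Hypothetical stand-in workload: a single memcpy of n bytes. */
typedef struct { void *dst; const void *src; size_t n; } copy_job;

static void run_copy(void *ctx) {
    copy_job *job = (copy_job *)ctx;
    memcpy(job->dst, job->src, job->n);
}

/* Average milliseconds per call of fn(ctx) over `runs` runs. If flush_buf is
   non-NULL, the cache is flushed before every call, outside the timed region,
   to approximate cold-cache conditions. */
static double ms_per_call(void (*fn)(void *), void *ctx, int runs,
                          unsigned char *flush_buf, size_t flush_n) {
    assert(runs > 0);
    volatile uint64_t sink = 0;
    double total_ns = 0.0;
    for (int r = 0; r < runs; r++) {
        if (flush_buf != NULL)
            sink += flush_cache(flush_buf, flush_n);
        struct timespec t0, t1;
        clock_gettime(CLOCK_MONOTONIC, &t0);
        fn(ctx);
        clock_gettime(CLOCK_MONOTONIC, &t1);
        total_ns += (double)(t1.tv_sec - t0.tv_sec) * 1e9
                  + (double)(t1.tv_nsec - t0.tv_nsec);
    }
    (void)sink;
    return total_ns / runs / 1e6;
}
```

Keeping the flush outside the timed region means only the kernel itself is measured while its inputs start out evicted. To mirror the setup described above, `flush_n` would be 64 MB (`64u << 20`) and `runs` would be 200.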

Requirements

I have read and agree with the contributing guidelines
AI usage disclosure: YES. I used Claude Sonnet 4.6 to help understand the existing code, identify the inefficiency, and verify the approach. All changes were reviewed and benchmarked by me.


sirohikartik · 2026-06-13T14:08:08Z

Hi @ggerganov, I think this is ready for review.

ggml: optimize concat op by replacing per-element memcpy with row-level memcpy

610531d

sirohikartik requested a review from ggerganov as a code owner June 13, 2026 14:07

github-actions bot added the ggml label (changes relating to the ggml tensor library for machine learning) on Jun 13, 2026

sirohikartik wants to merge 1 commit into ggml-org:master from sirohikartik:optim/concat-forward
