The Fastest Way to Copy Memory in C#

There’s a very tempting low-level C# benchmark hiding in plain sight. Take a block of bytes. Copy it somewhere else. Try the obvious version. Try the unsafe version. Try the version with MemoryMarshal. Try the version that looks like it escaped from a runtime PR. Then ask the dangerous question:
Can ugly C# beat the nice API?
That’s the hook. The useful part is what the benchmark teaches you about modern .NET. Most of the time, the fastest memory copy in C# is the simple one:
source.CopyTo(destination);
That doesn’t sound exciting enough for a low-level blog post, but that’s exactly why it’s worth looking at. Span<T>.CopyTo isn’t just nice syntax. It gives the runtime a very clear description of what you want. You’re not asking it to run a C# loop. You’re asking it to copy a contiguous region of memory. Once the runtime knows that, it can hand the work to its own optimized memmove routine, which copies in wide blocks instead of one element at a time.
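A quick sketch of the contract CopyTo gives you, written as a .NET 6+ top-level program: the destination may be longer than the source, and a destination that’s too short is rejected before anything is written. TryCopyTo is the non-throwing variant. The array contents are illustrative.

```csharp
using System;

byte[] source = { 1, 2, 3, 4 };
byte[] destination = new byte[6];

// Copies all four bytes into the start of destination; the last two stay 0.
source.AsSpan().CopyTo(destination);

// A destination that is too short makes CopyTo throw ArgumentException
// instead of writing a partial copy.
byte[] tooSmall = new byte[2];
bool threw = false;
try
{
    source.AsSpan().CopyTo(tooSmall);
}
catch (ArgumentException)
{
    threw = true;
}

// TryCopyTo reports the same condition as a bool instead of throwing.
bool copied = source.AsSpan().TryCopyTo(tooSmall);

Console.WriteLine(string.Join(",", destination));      // 1,2,3,4,0,0
Console.WriteLine($"threw={threw}, copied={copied}");  // threw=True, copied=False
```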
The fake contest
The gimmick version of this post is simple. We line up a few contenders and make them fight.
for (var i = 0; i < source.Length; i++)
{
    destination[i] = source[i];
}
Against:
source.AsSpan().CopyTo(destination);
Against:
Buffer.BlockCopy(source, 0, destination, 0, source.Length);
Against:
unsafe
{
    fixed (byte* src = source)
    fixed (byte* dst = destination)
    {
        Buffer.MemoryCopy(src, dst, destination.Length, source.Length);
    }
}
And then the dangerous looking one:
unsafe
{
    fixed (byte* src = source)
    fixed (byte* dst = destination)
    {
        Unsafe.CopyBlockUnaligned(dst, src, (uint)source.Length);
    }
}
This is where a lot of posts stop. They run the benchmark, paste the table, and crown a winner. That’s usually the least interesting result. The real lesson is that memory copying has different bottlenecks at different sizes. For tiny buffers, call overhead and bounds checks can dominate. For medium buffers, the JIT and runtime helpers matter more. For large buffers, you often stop measuring C# and start measuring memory bandwidth. At that point, your clever method isn’t racing another method. It’s racing the machine.
The decision tree I actually use
This is roughly how I’d think about it in production code.
The simple path is deliberately short. If you have Span<T>, use CopyTo. If you have arrays, turn them into spans and use CopyTo. If you’re at a pointer boundary, Buffer.MemoryCopy becomes reasonable. If you’re reaching for Unsafe.CopyBlockUnaligned, the code should be small, isolated, benchmarked, and heavily justified. That last bit is important. Unsafe isn’t a personality trait.
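Here’s a minimal sketch of the “turn arrays into spans” branch. Slicing with AsSpan(start, length) covers the offset arguments you’d otherwise pass to Array.Copy or Buffer.BlockCopy, and the same CopyTo call works whether the destination is an array or stack memory. The values are illustrative.

```csharp
using System;

byte[] source = { 10, 20, 30, 40, 50 };
byte[] destination = new byte[8];

// Equivalent to Array.Copy(source, 1, destination, 4, 3):
// take 3 bytes starting at source index 1, write them at destination index 4.
source.AsSpan(1, 3).CopyTo(destination.AsSpan(4));

// The same call shape works for any contiguous memory, not only arrays.
// Here the destination lives on the stack.
Span<byte> scratch = stackalloc byte[3];
source.AsSpan(2, 3).CopyTo(scratch);

Console.WriteLine(string.Join(",", destination));       // 0,0,0,0,20,30,40,0
Console.WriteLine(string.Join(",", scratch.ToArray())); // 30,40,50
```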
A benchmark worth running
Here’s a simple BenchmarkDotNet setup. I’d run this with a few sizes because one size can lie to you.
using System;
using System.Runtime.CompilerServices;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Entry point for `dotnet run -c Release`.
// The two unsafe benchmarks need <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the .csproj.
BenchmarkRunner.Run<MemoryCopyBenchmarks>();

[MemoryDiagnoser]
public class MemoryCopyBenchmarks
{
    private byte[] _source = null!;
    private byte[] _destination = null!;

    // Tiny, small, 4 KiB and 1 MiB copies: each size stresses a different bottleneck.
    [Params(32, 256, 4096, 1_048_576)]
    public int Size { get; set; }

    [GlobalSetup]
    public void Setup()
    {
        _source = new byte[Size];
        _destination = new byte[Size];
        Random.Shared.NextBytes(_source);
    }

    [Benchmark(Baseline = true)]
    public void SpanCopyTo()
    {
        _source.AsSpan().CopyTo(_destination);
    }

    [Benchmark]
    public void ManualLoop()
    {
        for (var i = 0; i < _source.Length; i++)
        {
            _destination[i] = _source[i];
        }
    }

    [Benchmark]
    public void ArrayCopy()
    {
        Array.Copy(_source, _destination, _source.Length);
    }

    [Benchmark]
    public void BufferBlockCopy()
    {
        Buffer.BlockCopy(_source, 0, _destination, 0, _source.Length);
    }

    [Benchmark]
    public unsafe void BufferMemoryCopy()
    {
        fixed (byte* src = _source)
        fixed (byte* dst = _destination)
        {
            Buffer.MemoryCopy(src, dst, _destination.Length, _source.Length);
        }
    }

    [Benchmark]
    public unsafe void UnsafeCopyBlockUnaligned()
    {
        fixed (byte* src = _source)
        fixed (byte* dst = _destination)
        {
            Unsafe.CopyBlockUnaligned(dst, src, (uint)_source.Length);
        }
    }
}
Then run it properly.
dotnet run -c Release
Don’t run this under the debugger. Don’t trust one buffer size. Don’t trust one machine. That sounds fussy, but it’s the difference between learning something and just collecting numbers that confirm what you wanted to believe.
What you’ll probably see
The exact numbers will vary, but the shape is usually more useful than the table. For tiny copies, the manual loop can look surprisingly competitive. That doesn’t mean it’s better. It often means the work is so small that measurement noise, inlining, and setup details are now part of the result.
For normal array-to-array copies, Span<T>.CopyTo, Array.Copy, and Buffer.BlockCopy are usually hard to embarrass. They express the operation clearly, and the runtime has spent years getting those paths into good shape.
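One detail the byte[] benchmark hides: Buffer.BlockCopy counts bytes, while Array.Copy and Span<T>.CopyTo count elements. With byte arrays those are the same number, so the difference only shows up with wider element types. A small sketch with int[]:

```csharp
using System;

int[] source = { 1, 2, 3, 4 };

// Array.Copy (like Span<T>.CopyTo) takes a length in elements.
int[] viaArrayCopy = new int[4];
Array.Copy(source, viaArrayCopy, source.Length);

// Buffer.BlockCopy takes offsets and a count in bytes, even for int[].
// Passing the element count copies only 4 bytes, i.e. a single int.
int[] viaBlockCopyWrong = new int[4];
Buffer.BlockCopy(source, 0, viaBlockCopyWrong, 0, source.Length);

// Scaling by the element size copies the whole array.
int[] viaBlockCopy = new int[4];
Buffer.BlockCopy(source, 0, viaBlockCopy, 0, source.Length * sizeof(int));

Console.WriteLine(string.Join(",", viaArrayCopy));      // 1,2,3,4
Console.WriteLine(string.Join(",", viaBlockCopyWrong)); // 1,0,0,0
Console.WriteLine(string.Join(",", viaBlockCopy));      // 1,2,3,4
```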
For pointer-based code, Buffer.MemoryCopy makes sense when you’re already in unsafe territory. It takes the destination size as well as the number of bytes to copy, and throws an ArgumentOutOfRangeException if the copy wouldn’t fit, so the API still knows how much space is available on the target side.
Unsafe.CopyBlockUnaligned looks like the exciting one, but it comes with sharp edges. It takes a uint byte count, so every call site needs a cast. It does no bounds checking, so a wrong length quietly writes past the end of your buffer. It doesn’t make your code magically faster just because the class is called Unsafe. That’s the annoying truth of low-level .NET performance: the clean API is often already close to the metal.
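If you do end up there, one way to keep it small and isolated is a bounds-checked wrapper around the ref overload of Unsafe.CopyBlockUnaligned, which needs neither the unsafe keyword nor pinning. This is a sketch, not a recommendation: CopyUnchecked is a hypothetical name, and the checks are there precisely because the API itself does none.

```csharp
using System;
using System.Runtime.CompilerServices;
using System.Runtime.InteropServices;

// Hypothetical helper: a bounds-checked wrapper around Unsafe.CopyBlockUnaligned.
// The ref overload compiles without unsafe blocks, but it still trusts the length blindly.
static void CopyUnchecked(ReadOnlySpan<byte> source, Span<byte> destination)
{
    // CopyBlockUnaligned would happily write past the end of destination,
    // so the length check is the caller's job.
    if (source.Length > destination.Length)
    {
        throw new ArgumentException("Destination is too small.", nameof(destination));
    }

    if (source.IsEmpty)
    {
        return;
    }

    // No guarantee is made for overlapping regions; use CopyTo if the spans might overlap.
    Unsafe.CopyBlockUnaligned(
        ref MemoryMarshal.GetReference(destination),
        ref MemoryMarshal.GetReference(source),
        (uint)source.Length); // Span lengths are never negative, so the cast is safe.
}

byte[] src = { 1, 2, 3, 4, 5 };
byte[] dst = new byte[5];
CopyUnchecked(src, dst);
Console.WriteLine(string.Join(",", dst)); // 1,2,3,4,5
```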
The overlap problem
There’s one detail you shouldn’t ignore: overlapping memory. Imagine you have a buffer and you want to shift bytes inside the same array.
var buffer = new byte[] { 1, 2, 3, 4, 5 };
buffer.AsSpan(0, 4).CopyTo(buffer.AsSpan(1));
This kind of operation is easy to get wrong if you write the loop yourself. If you copy forwards when you should copy backwards, you overwrite data before you’ve had a chance to move it. Span<T>.CopyTo is designed to handle overlap, and Buffer.MemoryCopy also handles overlapping regions. Unsafe.CopyBlockUnaligned makes no such promise, since it maps to the IL cpblk instruction, whose behavior on overlapping regions is unspecified. That’s a very practical reason to avoid the heroic hand-written version unless you’ve proved you need it. A bad memory copy doesn’t fail politely. It gives you correct-looking data, then one day it falls over for no obvious reason.
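The sketch below makes the forward/backward problem concrete with three versions of the same one-slot shift on the buffer from the example above: a naive forward loop, a backward loop, and Span<T>.CopyTo. The two helper names are hypothetical.

```csharp
using System;

// Naive forward shift of buffer[0..count) one slot to the right.
// Each write lands on a byte the loop hasn't read yet.
static void ShiftRightForward(byte[] buffer, int count)
{
    for (var i = 0; i < count; i++)
    {
        buffer[i + 1] = buffer[i];
    }
}

// Walking from the end reads every byte before it is overwritten,
// which is correct when the destination starts after the source.
static void ShiftRightBackward(byte[] buffer, int count)
{
    for (var i = count - 1; i >= 0; i--)
    {
        buffer[i + 1] = buffer[i];
    }
}

var forward = new byte[] { 1, 2, 3, 4, 5 };
ShiftRightForward(forward, 4);

var backward = new byte[] { 1, 2, 3, 4, 5 };
ShiftRightBackward(backward, 4);

// CopyTo picks a safe direction for overlapping spans on its own.
var viaCopyTo = new byte[] { 1, 2, 3, 4, 5 };
viaCopyTo.AsSpan(0, 4).CopyTo(viaCopyTo.AsSpan(1));

Console.WriteLine(string.Join(",", forward));   // 1,1,1,1,1
Console.WriteLine(string.Join(",", backward));  // 1,1,2,3,4
Console.WriteLine(string.Join(",", viaCopyTo)); // 1,1,2,3,4
```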
MemoryMarshal is not a copy button
MemoryMarshal is useful here, but not because it gives you a secret faster copy API. It’s useful because it lets you reinterpret memory.
using System.Runtime.InteropServices;
Span<int> values = stackalloc int[4] { 10, 20, 30, 40 };
Span<byte> bytes = MemoryMarshal.AsBytes(values);
No data moved there. You now have a byte-shaped view over the same memory. That can be very useful when writing binary protocols, serialisers, hashing code, or interop-heavy code. But it comes with all the usual low-level concerns. You need to care about layout. You need to care about endianness. You need to avoid types that contain managed references. The fastest copy is the one you don’t do, but only when viewing the same memory is actually safe for your problem.
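A minimal sketch of the endianness point, assuming you want a portable little-endian encoding: writing through the AsBytes view changes the underlying ints, and BinaryPrimitives lets you choose the byte order explicitly instead of inheriting the machine’s.

```csharp
using System;
using System.Buffers.Binary;
using System.Runtime.InteropServices;

Span<int> values = stackalloc int[] { 10, 20, 30, 40 };

// A byte view over the same 16 bytes; nothing is copied.
Span<byte> bytes = MemoryMarshal.AsBytes(values);

// Write the first int through the byte view with an explicit byte order.
BinaryPrimitives.WriteInt32LittleEndian(bytes, 0x01020304);

// Reading back with the same explicit order is portable.
int roundTrip = BinaryPrimitives.ReadInt32LittleEndian(bytes);

// Reading the int directly uses the machine's byte order:
// 0x01020304 on little-endian hardware, 0x04030201 on big-endian.
int nativeView = values[0];

Console.WriteLine(bytes.Length);            // 16
Console.WriteLine(roundTrip == 0x01020304); // True
Console.WriteLine(nativeView == (BitConverter.IsLittleEndian ? 0x01020304 : 0x04030201)); // True
```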
The hidden cost of going unsafe
Unsafe code has a cost even when it’s faster. You’ve made the code harder to review. You’ve made it easier to introduce memory corruption. You’ve limited who can comfortably maintain it. Pointer code also needs unsafe blocks enabled for the whole project (<AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the .csproj), which means the decision leaks beyond the method itself. That doesn’t mean unsafe code is bad. It means it needs to earn its place. A small unsafe helper buried behind a clear method can be reasonable in a hot path. Sprinkling pointer tricks through business code is a mess. Performance work should reduce pain, not move it from the CPU to every future code review.
A better production wrapper
If I had a real hot path where I wanted to keep the option open, I’d hide the decision behind a boring method.
public static class ByteCopy
{
    public static void Copy(ReadOnlySpan<byte> source, Span<byte> destination)
    {
        source.CopyTo(destination);
    }
}
That looks pointless at first, but it gives you a seam. Most systems should stop here. If profiling later proves this exact copy is a bottleneck, you can isolate the uglier version.
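public static class ByteCopy
{
    // Keeps the same span signature as the simple version, so callers don't change.
    // Requires <AllowUnsafeBlocks>true</AllowUnsafeBlocks> in the project file.
    public static unsafe void Copy(ReadOnlySpan<byte> source, Span<byte> destination)
    {
        if (destination.Length < source.Length)
        {
            throw new ArgumentException("Destination is too small.", nameof(destination));
        }

        // fixed can pin spans directly (via GetPinnableReference).
        fixed (byte* src = source)
        fixed (byte* dst = destination)
        {
            Buffer.MemoryCopy(src, dst, destination.Length, source.Length);
        }
    }
}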
Even then, I’d want a benchmark, tests around edge cases, and a comment explaining why Span<T>.CopyTo wasn’t enough. The comment shouldn’t say "for performance". That’s too vague. It should say what was measured, what improved, and why the trade-off is acceptable.
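For the “tests around edge cases” part, here’s one possible shape: a small hypothetical harness (CheckCopy and Require are illustrative names, not an existing API) that runs the same cases against any array-to-array copy. You could pass (s, d) => ByteCopy.Copy(s, d) to it, and it should agree with the CopyTo baseline whichever ByteCopy you ship.

```csharp
using System;

// Hypothetical edge-case checks for any array-to-array copy implementation.
static void CheckCopy(string name, Action<byte[], byte[]> copy)
{
    // Empty source: nothing to copy, and nothing should throw.
    copy(Array.Empty<byte>(), Array.Empty<byte>());

    // Exact fit.
    var exact = new byte[3];
    copy(new byte[] { 1, 2, 3 }, exact);
    Require(exact.AsSpan().SequenceEqual(new byte[] { 1, 2, 3 }), name, "exact fit");

    // Larger destination: bytes past the copied region must be untouched.
    var larger = new byte[] { 9, 9, 9, 9, 9 };
    copy(new byte[] { 1, 2 }, larger);
    Require(larger.AsSpan().SequenceEqual(new byte[] { 1, 2, 9, 9, 9 }), name, "larger destination");

    // Too-small destination must be rejected, not silently truncated or overrun.
    var threw = false;
    try
    {
        copy(new byte[] { 1, 2, 3 }, new byte[2]);
    }
    catch (ArgumentException)
    {
        threw = true;
    }
    Require(threw, name, "too-small destination");

    Console.WriteLine($"{name}: ok");
}

static void Require(bool condition, string name, string caseName)
{
    if (!condition)
    {
        throw new InvalidOperationException($"{name} failed: {caseName}");
    }
}

CheckCopy("Span.CopyTo", (s, d) => s.AsSpan().CopyTo(d));
CheckCopy("Array.Copy", (s, d) => Array.Copy(s, d, s.Length));
```

A deliberately broken implementation, such as one that quietly truncates to the destination length, fails the too-small case, which is exactly the kind of bug a swapped-in fast path tends to introduce.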
So what’s the fastest way?
For normal managed code, start with Span<T>.CopyTo. For array-heavy code, use spans unless an existing API already gives you exactly what you need. For native interop, Buffer.MemoryCopy is a reasonable tool once you’re already dealing with pointers. For Unsafe.CopyBlockUnaligned, be sceptical. It can be useful, but it’s not the default upgrade path from CopyTo. In many cases, it just makes the code scarier without moving the numbers enough to matter. The fun benchmark is trying to beat the runtime. The useful lesson is knowing when not to try.





