1. Stack-Only Types with ref struct and Span<T>
Span<T> is the cornerstone of zero-allocation memory manipulation. It represents a contiguous region of memory that can live on the stack, on the managed heap, or in unmanaged memory. Because Span<T> is a ref struct, it cannot be boxed, stored in a field of a heap object, captured by closures, or kept alive across an await.
Implementation:
Replace string manipulation and array slicing with Span<T> to avoid intermediate allocations.
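// Anti-pattern: allocates an array, a new string per token, and a list
// (needs using System.Linq; and using System.Collections.Generic;)
public static List<string> ParseNaive(string input)
{
    return input.Split(',').Select(s => s.Trim()).ToList();
}

// A span cannot be a generic type argument before C# 13 / .NET 9, so
// Action<ReadOnlySpan<char>> does not compile there. A custom delegate works everywhere.
public delegate void TokenHandler(ReadOnlySpan<char> token);

// Solution: zero-allocation parsing using Span
public static void ParseZeroAlloc(ReadOnlySpan<char> input, TokenHandler onToken)
{
    while (!input.IsEmpty)
    {
        var commaIndex = input.IndexOf(',');
        if (commaIndex == -1)
        {
            // Last token: no comma left
            onToken(input.Trim());
            break;
        }
        onToken(input.Slice(0, commaIndex).Trim());
        input = input.Slice(commaIndex + 1); // Advance past the comma
    }
}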
Architecture Rationale:
Use Span<T> when processing data buffers, parsing protocols, or transforming streams. The constraint that Span<T> cannot escape the stack forces a design where processing happens synchronously or via callbacks, which aligns with high-performance patterns.
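As a further illustration of the slicing pattern above, the sketch below sums comma-separated integers without creating any substrings. The names CsvSum and Sum are hypothetical, and the code assumes every non-empty token is a valid integer.

```csharp
using System;
using System.Globalization;

public static class CsvSum
{
    // Sums comma-separated integers by slicing the input instead of splitting it.
    // Assumption: every non-empty token parses as an int; empty tokens are skipped.
    public static int Sum(ReadOnlySpan<char> input)
    {
        int total = 0;
        while (!input.IsEmpty)
        {
            int comma = input.IndexOf(',');
            ReadOnlySpan<char> token = comma < 0 ? input : input.Slice(0, comma);
            token = token.Trim();
            if (!token.IsEmpty)
            {
                // The span overload of int.Parse reads the characters in place.
                total += int.Parse(token, NumberStyles.Integer, CultureInfo.InvariantCulture);
            }
            input = comma < 0 ? ReadOnlySpan<char>.Empty : input.Slice(comma + 1);
        }
        return total;
    }
}
```

A string converts implicitly to ReadOnlySpan<char>, so a call such as CsvSum.Sum("10, 20 ,30") needs no extra conversion.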
2. Object and Buffer Pooling
When allocation is unavoidable (e.g., complex object graphs or large buffers), reuse memory via pooling.
Object Pooling:
For reference types that are expensive to construct and frequently used.
using Microsoft.Extensions.ObjectPool;

// Define a policy to create objects and reset them before reuse
public class MyObjectPolicy : IPooledObjectPolicy<MyObject>
{
    public MyObject Create() => new MyObject();

    public bool Return(MyObject obj)
    {
        obj.Reset(); // Critical: clear state so it does not leak to the next user
        return true; // true keeps the object in the pool
    }
}

// Usage
var pool = ObjectPool.Create(new MyObjectPolicy());
var obj = pool.Get();
try
{
    // Use obj
}
finally
{
    // Return the instance so it can be reused. If it is never returned,
    // the default pool simply creates new instances and the pooling benefit is lost.
    pool.Return(obj);
}
Array Pooling:
For temporary buffers, ArrayPool<T>.Shared is the standard. It maintains thread-local buckets to minimize contention.
using System.Buffers;

byte[] buffer = ArrayPool<byte>.Shared.Rent(1024);
try
{
    // Use buffer. Note: buffer.Length >= 1024.
    // Always check actual length if relying on exact size.
    ProcessData(buffer.AsSpan(0, 1024));
}
finally
{
    ArrayPool<byte>.Shared.Return(buffer, clearArray: false);
    // Set clearArray: true only if handling sensitive data.
}
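To show the rent/return pattern in a complete method, here is a minimal sketch that copies one stream to another through a pooled buffer. PooledCopy and its Copy method are hypothetical names chosen for this example.

```csharp
using System;
using System.Buffers;
using System.IO;

public static class PooledCopy
{
    // Copies source to destination through a rented buffer and returns the byte count.
    public static long Copy(Stream source, Stream destination, int bufferSize = 4096)
    {
        byte[] buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        try
        {
            long total = 0;
            int read;
            // The rented array may be longer than bufferSize; using its full length is fine here.
            while ((read = source.Read(buffer, 0, buffer.Length)) > 0)
            {
                // Only the first 'read' bytes are valid; the rest may hold stale data.
                destination.Write(buffer, 0, read);
                total += read;
            }
            return total;
        }
        finally
        {
            // Returned even if Read or Write throws.
            ArrayPool<byte>.Shared.Return(buffer);
        }
    }
}
```

Because a rented array is not cleared and may be oversized, the sketch writes only the bytes that Read reported.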
3. Struct Optimization and in Parameters
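Large structs copied by value can degrade performance and add stack pressure. Declare them as readonly struct and pass them with in parameters so they travel by reference without allowing mutation. The readonly modifier matters: calling members of a non-readonly struct through an in parameter can make the compiler insert defensive copies.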
// Efficient struct definition
public readonly struct Point3D
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Point3D(double x, double y, double z) => (X, Y, Z) = (x, y, z);
}

// Pass by read-only reference
public static double Distance(in Point3D p1, in Point3D p2)
{
    // p1 and p2 are passed by reference, so neither struct is copied
    double dx = p2.X - p1.X;
    double dy = p2.Y - p1.Y;
    double dz = p2.Z - p1.Z;
    return Math.Sqrt(dx * dx + dy * dy + dz * dz);
}
4. GC Configuration Tuning
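Server GC is optimized for throughput and collects in parallel on multiple threads. Configure the runtime via [appname].runtimeconfig.json or environment variables such as DOTNET_gcServer=1.
{
    "runtimeOptions": {
        "configProperties": {
            "System.GC.Server": true,
            "System.GC.Concurrent": true,
            "System.GC.RetainVM": true,
            "System.GC.HeapHardLimit": 2147483648
        }
    }
}
Server: true: Enables multi-threaded GC, essential for multi-core servers.
Concurrent: true: Enables background collection, so application threads keep running during most of a Gen2 collection.
RetainVM: true: Keeps freed segments on a standby list instead of releasing them to the OS, reducing allocation latency for future requests.
HeapHardLimit: 2147483648: Caps the GC heap at 2 GiB (the value is in bytes).
Latency mode: System.GC.LatencyMode and System.GC.NoGCRegion are not documented runtimeconfig.json settings. Set GCSettings.LatencyMode in code instead. LowLatency suppresses Gen2 collections during short critical sections but is available only with workstation GC; with Server GC use SustainedLowLatency. For a bounded section with no collections at all, use GC.TryStartNoGCRegion and GC.EndNoGCRegion.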
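Latency mode and no-GC regions can be controlled from code through GCSettings.LatencyMode and GC.TryStartNoGCRegion. The sketch below wraps both in helper methods. LatencyScope and its method names are hypothetical, and the allocation budget passed to TryRunWithoutGC is an assumption the caller has to size for the workload.

```csharp
using System;
using System.Runtime;

public static class LatencyScope
{
    // Runs 'work' under SustainedLowLatency, then restores the previous mode.
    public static void Run(Action work)
    {
        GCLatencyMode previous = GCSettings.LatencyMode;
        try
        {
            GCSettings.LatencyMode = GCLatencyMode.SustainedLowLatency;
            work();
        }
        finally
        {
            GCSettings.LatencyMode = previous;
        }
    }

    // Tries to run 'work' with collections suppressed.
    // 'totalSize' is the number of bytes the section may allocate.
    public static bool TryRunWithoutGC(long totalSize, Action work)
    {
        if (!GC.TryStartNoGCRegion(totalSize))
        {
            return false; // The runtime could not reserve the requested memory.
        }
        try
        {
            work();
        }
        finally
        {
            // The region ends on its own if the budget is exceeded,
            // so check the mode before calling EndNoGCRegion.
            if (GCSettings.LatencyMode == GCLatencyMode.NoGCRegion)
            {
                GC.EndNoGCRegion();
            }
        }
        return true;
    }
}
```

Restoring the previous mode in a finally block keeps a failed critical section from leaving the process in a low-latency mode indefinitely.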
Pitfall Guide
1. Copying Large Structs
Mistake: Passing structs larger than 16-32 bytes by value.
Impact: Each call copies the whole struct unless the JIT can inline the call and elide the copy. For a 64-byte struct passed in a tight loop, those copies add memory traffic and stack pressure.
Fix: Use in parameters or ref returns. Measure struct size with Unsafe.SizeOf<T>(); sizeof(T) on a user-defined struct requires an unsafe context.
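A minimal sketch of measuring a struct and passing it by read-only reference follows. Vector3D and StructInfo are hypothetical names; Vector3D has the same three-double shape as the Point3D example earlier in the article.

```csharp
using System;
using System.Runtime.CompilerServices;

// Hypothetical struct with three double fields (3 * 8 = 24 bytes).
public readonly struct Vector3D
{
    public double X { get; }
    public double Y { get; }
    public double Z { get; }

    public Vector3D(double x, double y, double z) => (X, Y, Z) = (x, y, z);
}

public static class StructInfo
{
    // Unsafe.SizeOf<T> works in safe code and with generic type parameters.
    public static int SizeOf<T>() where T : struct => Unsafe.SizeOf<T>();

    // 'in' passes a read-only reference, so the 24 bytes are not copied.
    public static double LengthSquared(in Vector3D v) => v.X * v.X + v.Y * v.Y + v.Z * v.Z;
}
```

At 24 bytes this struct sits inside the 16-32 byte range named above, so whether in pays off here is something to benchmark rather than assume.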
2. Large Object Heap (LOH) Fragmentation
Mistake: Frequently allocating arrays or objects of 85,000 bytes or more.
Impact: Objects of 85,000 bytes or more go to the LOH. The LOH is collected only as part of Gen2 collections and is not compacted by default. Frequent LOH allocations lead to fragmentation and can cause OutOfMemoryException even when total memory usage is low.
Fix: Use ArrayPool<T> for large buffers. If large objects are necessary, request a one-time compaction by setting GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce; before a blocking Gen2 collection. The setting resets to Default after that collection.
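The explicit compaction step can be sketched as below. LohMaintenance is a hypothetical helper name. Forcing a blocking full collection is expensive, so this is a sketch for rare maintenance points, not for hot paths.

```csharp
using System;
using System.Runtime;

public static class LohMaintenance
{
    // Requests LOH compaction and triggers the blocking Gen2 collection that performs it.
    public static void CompactOnce()
    {
        GCSettings.LargeObjectHeapCompactionMode = GCLargeObjectHeapCompactionMode.CompactOnce;
        GC.Collect(GC.MaxGeneration, GCCollectionMode.Forced, blocking: true, compacting: true);
        // After this collection the runtime resets the mode to Default.
    }
}
```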
3. Boxing and Unboxing
Mistake: Passing value types to interfaces or object parameters.
Impact: The value type is boxed onto the heap, creating an allocation. Unboxing requires type checking and copying.
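Fix: Use generic constraints (where T : IComparable<T>) instead of interface-typed parameters; the non-generic IComparable still boxes the argument passed to CompareTo(object). Use ref structs where possible, because any attempt to box them is a compile error.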
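The difference between an interface-typed parameter and a generic constraint can be sketched as follows. MaxFinder and its methods are hypothetical names for illustration.

```csharp
using System;

public static class MaxFinder
{
    // Boxes: a value type argument is converted to IComparable, which allocates on the heap.
    public static IComparable MaxBoxed(IComparable a, IComparable b)
        => a.CompareTo(b) >= 0 ? a : b;

    // No boxing: T stays a value type and CompareTo(T) is called on it directly.
    public static T Max<T>(T a, T b) where T : IComparable<T>
        => a.CompareTo(b) >= 0 ? a : b;
}
```

Calling MaxFinder.Max(3, 7) works on the two ints as they are, while MaxFinder.MaxBoxed(3, 7) boxes both arguments and returns a boxed result.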
4. Closure Allocations
Mistake: Capturing variables in lambdas or local functions within hot paths.
Impact: The compiler generates a hidden class to hold captured variables. This class is allocated on the heap every time the delegate is created.
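Fix: Avoid closures in tight loops. Pass data via parameters or use struct state objects. Mark lambdas static (C# 9 and later) so that an accidental capture becomes a compile error. If using local functions, ensure they don't capture outer variables unnecessarily, and avoid converting capturing local functions to delegates.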
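Passing state through a parameter instead of capturing it can be sketched like this. ClosureDemo and CountWhere are hypothetical names; the static lambda syntax needs C# 9 or later.

```csharp
using System;

public static class ClosureDemo
{
    // Captures 'threshold': each call allocates a closure object and a delegate.
    public static int CountAboveCapturing(int[] values, int threshold)
        => CountWhere(values, v => v > threshold);

    // A static lambda cannot capture. 'threshold' travels as an explicit state
    // argument, so the compiler can cache and reuse one delegate instance.
    public static int CountAbove(int[] values, int threshold)
        => CountWhere(values, threshold, static (v, t) => v > t);

    private static int CountWhere(int[] values, Func<int, bool> predicate)
    {
        int count = 0;
        foreach (int v in values)
        {
            if (predicate(v)) count++;
        }
        return count;
    }

    private static int CountWhere<TState>(int[] values, TState state, Func<int, TState, bool> predicate)
    {
        int count = 0;
        foreach (int v in values)
        {
            if (predicate(v, state)) count++;
        }
        return count;
    }
}
```

Both methods return the same count; only the allocation behavior differs.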
5. Event Handler Leaks
Mistake: Subscribing to events without unsubscribing, especially with long-lived publishers.
Impact: The subscriber object cannot be garbage collected because the publisher holds a reference via the delegate. This causes memory leaks that manifest as OOM over time.
Fix: Implement IDisposable and unsubscribe in Dispose. Use a weak event pattern (for example, WeakEventManager in WPF) for scenarios where unsubscription is difficult.
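A minimal sketch of the subscribe-in-constructor, unsubscribe-in-Dispose pattern follows. PriceFeed and PriceDisplay are hypothetical types; SubscriberCount exists only so the effect can be observed.

```csharp
using System;

// Hypothetical long-lived publisher.
public sealed class PriceFeed
{
    public event EventHandler<decimal> PriceChanged;

    public void Publish(decimal price) => PriceChanged?.Invoke(this, price);

    // Number of handlers currently attached (for demonstration only).
    public int SubscriberCount => PriceChanged?.GetInvocationList().Length ?? 0;
}

// Short-lived subscriber that detaches itself when disposed.
public sealed class PriceDisplay : IDisposable
{
    private readonly PriceFeed _feed;

    public decimal LastPrice { get; private set; }

    public PriceDisplay(PriceFeed feed)
    {
        _feed = feed;
        _feed.PriceChanged += OnPriceChanged;
    }

    private void OnPriceChanged(object sender, decimal price) => LastPrice = price;

    // Removing the handler drops the publisher's reference to this object.
    public void Dispose() => _feed.PriceChanged -= OnPriceChanged;
}
```

Wrapping the subscriber in a using statement makes the unsubscribe step hard to forget.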
6. Async State Machine Overhead
Mistake: Overusing async/await for CPU-bound work or in extremely tight loops.
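Impact: Every async method compiles to a state machine struct. When the method suspends at an await that has not already completed, the state machine is moved to the heap and a task object is allocated. Even on the synchronous path, a method returning Task<T> usually allocates a task for its result.
Fix: Use ValueTask<T> (and IValueTaskSource<T> where pooling is needed) for methods that often complete synchronously, and await each ValueTask only once. Profile to ensure async is only used for I/O bound operations.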
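The ValueTask recommendation can be sketched with a cache-backed lookup, where a cache hit completes synchronously without allocating a task. CachedLookup and the loader delegate are hypothetical; the sketch also accepts that two concurrent misses for the same key may both call the loader.

```csharp
using System;
using System.Collections.Concurrent;
using System.Threading.Tasks;

public sealed class CachedLookup
{
    private readonly ConcurrentDictionary<int, string> _cache = new ConcurrentDictionary<int, string>();
    private readonly Func<int, Task<string>> _load; // Hypothetical slow loader (e.g. I/O)

    public CachedLookup(Func<int, Task<string>> load) => _load = load;

    public ValueTask<string> GetAsync(int key)
    {
        // Cache hit: wrap the value directly, no Task and no state machine on the heap.
        if (_cache.TryGetValue(key, out string value))
        {
            return new ValueTask<string>(value);
        }
        // Cache miss: fall back to a regular async method.
        return new ValueTask<string>(LoadAndCacheAsync(key));
    }

    private async Task<string> LoadAndCacheAsync(int key)
    {
        string value = await _load(key).ConfigureAwait(false);
        _cache[key] = value;
        return value;
    }
}
```

Splitting the synchronous check from the async fallback keeps the hot path free of the async machinery.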
7. Ignoring GC.KeepAlive
Mistake: Relying on finalizers for unmanaged resources without KeepAlive.
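Impact: The GC may collect an object, and run its finalizer, as soon as the JIT reports no further references to it. That can happen before the method ends, even if an unmanaged handle owned by the object is still in use.
Fix: Call GC.KeepAlive(obj) at the end of methods using unmanaged resources tied to the object's lifetime. Where possible, wrap the handle in a SafeHandle so the runtime tracks its lifetime for you.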
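The early-collection hazard and the KeepAlive fix can be sketched as follows. NativeBuffer and KeepAliveDemo are hypothetical types that use Marshal.AllocHGlobal to stand in for any unmanaged resource released by a finalizer.

```csharp
using System;
using System.Runtime.InteropServices;

// Hypothetical wrapper that frees unmanaged memory in its finalizer.
public sealed class NativeBuffer
{
    private IntPtr _ptr;

    public NativeBuffer(int size) => _ptr = Marshal.AllocHGlobal(size);

    public IntPtr Pointer => _ptr;

    ~NativeBuffer()
    {
        Marshal.FreeHGlobal(_ptr);
        _ptr = IntPtr.Zero;
    }
}

public static class KeepAliveDemo
{
    public static byte WriteAndRead(byte value)
    {
        var buffer = new NativeBuffer(16);
        IntPtr p = buffer.Pointer; // Last ordinary use of 'buffer'

        // Without KeepAlive below, 'buffer' could be finalized while 'p' is still in use.
        Marshal.WriteByte(p, value);
        byte result = Marshal.ReadByte(p);

        GC.KeepAlive(buffer); // Extends the object's lifetime to this point
        return result;
    }
}
```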
Production Bundle
Action Checklist
Decision Matrix
| Scenario | Recommended Approach | Why | Cost Impact |
|---|---|---|---|
| High-throughput JSON/XML Parsing | Utf8JsonReader / Span<T> | Avoids string allocations; processes bytes directly. | Reduces CPU by 30%, Latency by 80%. |
| Frequent small object creation | ObjectPool<T> | Reuses reference types; avoids Gen0 pressure. | Low latency, slight complexity increase. |
| Large buffer processing (>85KB) | ArrayPool<T> | Prevents LOH fragmentation; reuses memory. | Eliminates LOH OOM risk; stable throughput. |
| Real-time trading/Control loops | ref struct + stackalloc | Zero heap allocation; deterministic execution. | No unsafe context needed when the result is assigned to Span<T>; keep buffers small to avoid stack overflow; max performance. |
| Background batch processing | Standard LINQ / POCOs | Development speed prioritized; GC handles load. | Lower dev cost; acceptable latency variance. |
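For the ref struct + stackalloc row, a minimal sketch of a stack-allocated scratch buffer follows. StackBuffer and CountHexDigits are hypothetical names. No unsafe keyword appears because the stackalloc result is assigned to a Span<char>.

```csharp
using System;

public static class StackBuffer
{
    // Formats 'value' as hexadecimal into a stack buffer and returns the digit count.
    public static int CountHexDigits(uint value)
    {
        // Eight chars is enough for any 32-bit value in hex; the buffer lives on the stack.
        Span<char> buffer = stackalloc char[8];
        value.TryFormat(buffer, out int written, "X");
        return written;
    }
}
```

The buffer size is a small compile-time constant here; sizing stackalloc from untrusted input risks a stack overflow.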
Configuration Template
Create runtimeconfig.template.json in your project root to enforce production GC settings.
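{
    "configProperties": {
        "System.GC.Server": true,
        "System.GC.Concurrent": true,
        "System.GC.RetainVM": true,
        "System.GC.HeapHardLimit": 0,
        "System.Threading.ThreadPool.MinThreads": 50,
        "System.Threading.ThreadPool.MaxThreads": 200
    }
}
Note: HeapHardLimit is in bytes; the 0 above is a placeholder, so adjust it based on container memory limits. Latency mode and no-GC regions are not documented runtimeconfig settings, so set GCSettings.LatencyMode in code. The GCLatencyMode values are Batch (0), Interactive (1), LowLatency (2), SustainedLowLatency (3), and NoGCRegion (4, entered only through GC.TryStartNoGCRegion). Interactive is the default when concurrent GC is enabled; use LowLatency or SustainedLowLatency only around measured critical sections.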
Quick Start Guide
- Install Benchmarking Tools:
dotnet add package BenchmarkDotNet
dotnet tool install -g dotnet-counters
dotnet tool install -g dotnet-gcdump
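- Create Baseline Benchmark:
using System.Collections.Generic;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

[MemoryDiagnoser]
public class MemoryBenchmarks
{
    [Benchmark]
    public List<string> NaiveParsing()
    {
        return "item1,item2,item3".Split(',').ToList();
    }
}

public static class Program
{
    // Entry point; forwards command-line arguments such as --filter to BenchmarkDotNet
    public static void Main(string[] args) =>
        BenchmarkSwitcher.FromAssembly(typeof(Program).Assembly).Run(args);
}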
- Run and Analyze:
dotnet run -c Release -- --filter "*MemoryBenchmarks*"
Observe the Allocated column, which reports bytes allocated per operation. If it is above zero, proceed to optimization.
- Apply Span Optimization:
Refactor code to use ReadOnlySpan<char> and process tokens via callbacks or spans without allocating lists.
- Validate Improvement:
Re-run the benchmark. Confirm that Allocated drops to zero (BenchmarkDotNet may print this as "-") and that Mean latency improves. Commit changes with performance evidence.
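For a quick check outside BenchmarkDotNet, allocations on the current thread can be compared with GC.GetAllocatedBytesForCurrentThread. The sketch below is a rough sanity check under that assumption, not a replacement for the benchmark; AllocationCheck and its members are hypothetical names.

```csharp
using System;

public static class AllocationCheck
{
    // Counts comma-separated tokens by scanning the span; creates no strings or arrays.
    public static int CountTokens(ReadOnlySpan<char> input)
    {
        if (input.IsEmpty) return 0;
        int count = 1;
        foreach (char c in input)
        {
            if (c == ',') count++;
        }
        return count;
    }

    // Returns the bytes allocated on this thread while 'action' runs once.
    public static long MeasureAllocatedBytes(Action action)
    {
        action(); // Warm-up so one-time setup costs are not counted
        long before = GC.GetAllocatedBytesForCurrentThread();
        action();
        return GC.GetAllocatedBytesForCurrentThread() - before;
    }
}
```

Measuring "item1,item2,item3".Split(',') this way should report a positive number, while the span-based CountTokens call is expected to report far less; confirm the exact figures with the Allocated column.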