Stack vs Heap
allocation · escape analysis · GC

| Stack | Heap |
|---|---|
| Per-goroutine (starts small, typically 2 KB) | Shared across all goroutines |
| Grows/shrinks automatically | Managed by the GC |
| Fast: bump-pointer allocation | Slower: GC must scan and collect |
| Freed when the function returns | Freed once no references remain |

Rule: if a value's lifetime is bounded by a function call → stack
if it escapes (returned by pointer, sent on a channel, stored somewhere longer-lived) → heap
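The rule can be checked empirically. The sketch below is an illustration under assumptions: `User`, `ageNextYear`, and the sink variables are hypothetical names, and `//go:noinline` is used only so that inlining does not hide the difference. It uses `testing.AllocsPerRun` to count heap allocations for a value that stays local and for one returned by pointer.

```go
package main

import (
	"fmt"
	"testing"
)

// User is a hypothetical type used only for this illustration.
type User struct {
	Name string
	Age  int
}

// Sinks keep results reachable so the compiler cannot discard the work.
var (
	sinkInt  int
	sinkUser *User
)

// ageNextYear uses a User that never outlives the call, so it can stay on the stack.
//
//go:noinline
func ageNextYear(age int) int {
	u := User{Name: "Alice", Age: age}
	return u.Age + 1
}

// newUser returns a pointer to its local, so the value has to move to the heap.
//
//go:noinline
func newUser(age int) *User {
	u := User{Name: "Alice", Age: age}
	return &u
}

// stackAllocs reports heap allocations per call for the local-value case.
func stackAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { sinkInt = ageNextYear(41) })
}

// heapAllocs reports heap allocations per call for the returned-pointer case.
func heapAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { sinkUser = newUser(41) })
}

func main() {
	fmt.Printf("value kept local: %.0f allocs/op\n", stackAllocs())
	fmt.Printf("pointer returned: %.0f allocs/op\n", heapAllocs())
}
```

Counting allocations this way complements the compiler's `-m` output: the flag reports what the compiler decided, while the measurement confirms what happens at run time.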
What causes heap allocation
Escape
    // Returning a pointer to a local: the value is moved to the heap
    func newUser() *User {
        u := User{Name: "Alice"} // u escapes
        return &u
    }

    // Storing a non-pointer value in an interface usually heap-allocates a copy
    var i any = User{}

    // Sending a pointer on a channel: the pointee escapes
    ch <- &User{}

    // A closure that captures a variable can force it to the heap;
    // here x is also passed to fmt.Println as an interface value
    x := 42
    f := func() { fmt.Println(x) }
    _ = f
Escape analysis: see the decisions
-gcflags
    # Show escape analysis decisions
    go build -gcflags="-m" ./...

    # More verbose (shows why)
    go build -gcflags="-m=2" ./...

    # Sample output (exact wording and positions vary by Go version):
    # ./main.go:8:2: moved to heap: u
    # ./main.go:12:14: x escapes to heap
    # ./main.go:5:17: User{} does not escape

    # Disable inlining to see more escape detail
    go build -gcflags="-m -l" ./...
pprof Profiling
CPU · memory · goroutine · go tool pprof
Profile before you optimize. pprof shows where CPU time and memory are actually spent; without it, optimizations are guesswork. Always benchmark before and after a change to confirm impact.
HTTP pprof endpoint: always-on profiling
net/http/pprof
    import (
        "log"
        "net/http"
        _ "net/http/pprof" // registers /debug/pprof/ routes on http.DefaultServeMux
    )

    func main() {
        // Expose on a separate, non-public address (never on your public port)
        go func() {
            log.Println(http.ListenAndServe("localhost:6060", nil))
        }()
        // ... start your actual server
    }

    # Capture a 30s CPU profile (quote the URL so the shell leaves "?" alone)
    go tool pprof "http://localhost:6060/debug/pprof/profile?seconds=30"

    # Heap profile
    go tool pprof http://localhost:6060/debug/pprof/heap

    # Goroutine dump
    go tool pprof http://localhost:6060/debug/pprof/goroutine

    # Open the interactive web UI for a saved profile
    go tool pprof -http=:8081 cpu.prof
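One way to keep the profiling routes off a public listener is to register the handlers on a dedicated mux. The following is a minimal sketch, not the only approach: `newDebugMux` and `probe` are hypothetical helper names, and the listen address in the closing comment is an assumption. Note that importing `net/http/pprof` still registers its routes on `http.DefaultServeMux`, so this only helps when the public server uses its own mux as well.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"net/http/pprof"
)

// newDebugMux registers the pprof handlers on a private mux that can be
// served on an internal address only.
func newDebugMux() *http.ServeMux {
	mux := http.NewServeMux()
	mux.HandleFunc("/debug/pprof/", pprof.Index)
	mux.HandleFunc("/debug/pprof/cmdline", pprof.Cmdline)
	mux.HandleFunc("/debug/pprof/profile", pprof.Profile)
	mux.HandleFunc("/debug/pprof/symbol", pprof.Symbol)
	mux.HandleFunc("/debug/pprof/trace", pprof.Trace)
	return mux
}

// probe sends an in-memory request to the debug mux and returns the status code.
func probe(path string) int {
	rec := httptest.NewRecorder()
	req := httptest.NewRequest(http.MethodGet, path, nil)
	newDebugMux().ServeHTTP(rec, req)
	return rec.Code
}

func main() {
	fmt.Println("index status:    ", probe("/debug/pprof/"))
	fmt.Println("goroutine status:", probe("/debug/pprof/goroutine?debug=1"))
	// In a real service (address is an assumption):
	//   go http.ListenAndServe("localhost:6060", newDebugMux())
}
```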
Programmatic profiling
runtime/pprof
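    import (
        "log"
        "os"
        "runtime"
        "runtime/pprof"
    )

    // CPU profile
    f, err := os.Create("cpu.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer f.Close()
    if err := pprof.StartCPUProfile(f); err != nil {
        log.Fatal(err)
    }
    defer pprof.StopCPUProfile()

    // Heap profile (call after the workload)
    mf, err := os.Create("mem.prof")
    if err != nil {
        log.Fatal(err)
    }
    defer mf.Close()
    runtime.GC() // run GC so the profile reflects up-to-date statistics
    if err := pprof.WriteHeapProfile(mf); err != nil {
        log.Fatal(err)
    }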
Benchmark + pprof
-cpuprofile
    # Run a benchmark and produce profiles in one step
    # (profile flags work with a single package only, so avoid ./... here)
    go test -bench=BenchmarkFoo \
        -cpuprofile=cpu.prof \
        -memprofile=mem.prof \
        .

    # Analyze the CPU profile
    go tool pprof cpu.prof

    # Inside the pprof REPL:
    #   top10      top 10 functions by self time
    #   list Foo   annotated source for functions matching Foo
    #   web        open the call graph in a browser (requires Graphviz)
Reducing Allocations
sync.Pool · strings.Builder · pre-alloc
sync.Pool: reuse temporary objects
sync.Pool
    // Pool holds temporary objects to reduce GC pressure.
    // Objects may be dropped at any time, so never rely on state stored in them.
    var bufPool = sync.Pool{
        New: func() any { return new(bytes.Buffer) },
    }

    func buildResponse(data []byte) []byte {
        buf := bufPool.Get().(*bytes.Buffer)
        buf.Reset() // always reset before use
        defer bufPool.Put(buf)

        buf.Write(data)
        // Copy the bytes out: buf.Bytes() aliases the pooled buffer, which
        // another caller may overwrite once it is back in the pool.
        return append([]byte(nil), buf.Bytes()...)
    }

    // Without sync.Pool: a new Buffer (and its backing array) per call
    // With sync.Pool: the Buffer is reused; only the returned copy is allocated
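The effect of pooling can be measured rather than assumed. This is a small sketch under assumptions: `renderPooled`, `renderFresh`, and the helper names are hypothetical, and the output goes to `io.Discard` so only buffer management is measured. Exact counts depend on the Go version and on GC timing, since the pool may be emptied at any collection.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"sync"
	"testing"
)

var bufPool = sync.Pool{
	New: func() any { return new(bytes.Buffer) },
}

// renderPooled formats a greeting through a pooled buffer and writes it to w.
func renderPooled(w io.Writer, name string) {
	buf := bufPool.Get().(*bytes.Buffer)
	buf.Reset()
	defer bufPool.Put(buf)
	buf.WriteString("hello, ")
	buf.WriteString(name)
	w.Write(buf.Bytes()) // the bytes are consumed before Put runs
}

// renderFresh does the same work with a new buffer on every call.
func renderFresh(w io.Writer, name string) {
	buf := new(bytes.Buffer)
	buf.WriteString("hello, ")
	buf.WriteString(name)
	w.Write(buf.Bytes())
}

// pooledAllocs reports heap allocations per call with the pool.
func pooledAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { renderPooled(io.Discard, "gopher") })
}

// freshAllocs reports heap allocations per call without the pool.
func freshAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { renderFresh(io.Discard, "gopher") })
}

func main() {
	fmt.Printf("pooled: %.0f allocs/op\n", pooledAllocs())
	fmt.Printf("fresh:  %.0f allocs/op\n", freshAllocs())
}
```

The pooled version writes the bytes out before the buffer returns to the pool. Handing `buf.Bytes()` to a caller after `Put` would expose memory that the next user of the buffer can overwrite.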
strings.Builder: build strings with fewer allocations
strings.Builder
    // Concatenating with + allocates a new string on every iteration
    var s string
    for _, w := range words {
        s += w // O(n) allocations, O(n²) bytes copied
    }

    // strings.Builder: with an adequate Grow hint, one allocation up front;
    // String() returns the result without copying
    var sb strings.Builder
    sb.Grow(len(words) * 8) // capacity hint, if known
    for _, w := range words {
        sb.WriteString(w)
    }
    result := sb.String()
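To see the difference in numbers, the two approaches can be compared with `testing.AllocsPerRun`. The sketch below uses hypothetical helpers (`joinConcat`, `joinBuilder`, `sampleWords`) and computes the exact total length for `Grow`, which is an assumption that the input is available up front.

```go
package main

import (
	"fmt"
	"strings"
	"testing"
)

// sink keeps results reachable so the compiler cannot discard the work.
var sink string

// joinConcat builds the result with repeated string concatenation.
func joinConcat(words []string) string {
	var s string
	for _, w := range words {
		s += w
	}
	return s
}

// joinBuilder sizes the builder once, then appends every word.
func joinBuilder(words []string) string {
	total := 0
	for _, w := range words {
		total += len(w)
	}
	var sb strings.Builder
	sb.Grow(total) // exact size, so the buffer never has to grow again
	for _, w := range words {
		sb.WriteString(w)
	}
	return sb.String()
}

// sampleWords returns 100 short words as test input.
func sampleWords() []string {
	words := make([]string, 100)
	for i := range words {
		words[i] = "word"
	}
	return words
}

func concatAllocs() float64 {
	words := sampleWords()
	return testing.AllocsPerRun(100, func() { sink = joinConcat(words) })
}

func builderAllocs() float64 {
	words := sampleWords()
	return testing.AllocsPerRun(100, func() { sink = joinBuilder(words) })
}

func main() {
	fmt.Printf("concat:  %.0f allocs/op\n", concatAllocs())
	fmt.Printf("builder: %.0f allocs/op\n", builderAllocs())
}
```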
Pre-allocate slices and maps
make with cap
    // Without pre-alloc: the backing array is reallocated and copied as it grows
    // (roughly doubling while small, by smaller factors once large)
    var results []int
    for i := range items {
        results = append(results, process(items[i]))
    }

    // With pre-alloc: no reallocations
    results = make([]int, 0, len(items))
    for i := range items {
        results = append(results, process(items[i]))
    }

    // Pre-alloc map
    m := make(map[string]int, len(keys))
    for _, k := range keys {
        m[k]++
    }
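Reallocation can be observed directly by watching the slice capacity change. This is an illustrative sketch; `countReallocs` is a hypothetical helper, and the exact number of growth steps depends on the Go version's growth policy, so no specific count is assumed for the case without pre-allocation.

```go
package main

import "fmt"

// countReallocs appends n ints to a slice created with the given initial
// capacity and counts how often append moved to a bigger backing array.
func countReallocs(initialCap, n int) int {
	s := make([]int, 0, initialCap)
	reallocs := 0
	for i := 0; i < n; i++ {
		before := cap(s)
		s = append(s, i)
		if cap(s) != before {
			reallocs++
		}
	}
	return reallocs
}

func main() {
	fmt.Println("reallocations without pre-alloc:", countReallocs(0, 1000))
	fmt.Println("reallocations with pre-alloc:   ", countReallocs(1000, 1000))
}
```

Each reallocation also copies every element appended so far, which is why the cost grows with the final slice length.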
Inlining
-gcflags · inline budget · noinline
The Go compiler inlines small functions at their call sites, eliminating call overhead and enabling further optimizations. A function is inlineable if its estimated cost, counted in AST nodes, fits within the inline budget (80 by default). Functions that call recover() and large functions are not inlined. The rules for closures and defer have changed between Go releases, so check the compiler output rather than assuming.
Check what gets inlined
-gcflags -m
    # -m shows inlining decisions alongside escape analysis
    go build -gcflags="-m" ./...

    # Output examples:
    # ./math.go:5:6: can inline Add
    # ./math.go:12:6: cannot inline Process: function too complex
    # ./main.go:20:12: inlining call to Add
Writing inlineable functions
Small functions
    // Inlineable: simple, small, no recover
    func max(a, b int) int {
        if a > b {
            return a
        }
        return b
    }

    // Not inlineable: too complex, or uses recover
    func safeDiv(a, b int) (result int) {
        defer func() {
            if r := recover(); r != nil {
                result = 0
            }
        }()
        return a / b
    }

    // Force no inlining (useful for benchmarks)
    //go:noinline
    func doWork() {}
Goroutine Leaks
blocked goroutines · goleak · runtime
A leaked goroutine is one that is blocked indefinitely, for example waiting on a channel that will never be written to or a mutex that will never be released. A leaked goroutine keeps its stack and everything it references alive for the life of the process. In servers, leaks compound over time and can end in OOM crashes.
Common leak patterns and fixes
Leak patterns
    // LEAK: goroutine blocks on send if nobody reads
    ch := make(chan int) // unbuffered
    go func() {
        result := doWork()
        ch <- result // blocks forever if the caller already returned
    }()
    return // goroutine is now leaked

    // FIX: buffered channel, so the single send always completes
    ch := make(chan int, 1)

    // FIX: use context to stop the goroutine
    go func() {
        select {
        case ch <- doWork():
        case <-ctx.Done(): // exits if the context is cancelled
        }
    }()
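The context-based fix can be exercised end to end. In this minimal sketch, `startWorker` and `workerExitsOnCancel` are hypothetical names and the value 42 stands in for a real result. The worker closes a `done` channel on exit, so the test can observe termination without sleeping or counting goroutines.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

// startWorker sends one result on out unless ctx is cancelled first.
// The returned channel is closed when the goroutine exits.
func startWorker(ctx context.Context, out chan<- int) <-chan struct{} {
	done := make(chan struct{})
	go func() {
		defer close(done)
		select {
		case out <- 42: // hypothetical result
		case <-ctx.Done(): // nobody is reading; give up instead of leaking
		}
	}()
	return done
}

// workerExitsOnCancel starts a worker whose receiver never shows up, cancels
// the context, and reports whether the goroutine exited.
func workerExitsOnCancel() bool {
	ctx, cancel := context.WithCancel(context.Background())
	out := make(chan int) // unbuffered and never read
	done := startWorker(ctx, out)
	cancel()
	select {
	case <-done:
		return true
	case <-time.After(2 * time.Second):
		return false
	}
}

func main() {
	fmt.Println("worker exited after cancel:", workerExitsOnCancel())
}
```

Cancellation only unblocks the send. If the work itself is long-running, it also needs to check the context to stop early.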
Detect leaks in tests with goleak
goleak
    import (
        "testing"

        "go.uber.org/goleak"
    )

    func TestMain(m *testing.M) {
        goleak.VerifyTestMain(m)
    }

    // Or per test:
    func TestWorker(t *testing.T) {
        defer goleak.VerifyNone(t)
        // test code here
        // goleak.VerifyNone fails the test if any goroutine leaked
    }
Count goroutines at runtime
runtime
    // Quick check: compare goroutine counts. This is racy because goroutines
    // may still be winding down, so treat it as a hint rather than proof.
    before := runtime.NumGoroutine()
    doSomething()
    after := runtime.NumGoroutine()
    if after > before {
        t.Errorf("goroutine leak: before=%d after=%d", before, after)
    }

    // The pprof goroutine profile shows stack traces of all live
    // goroutines, which helps identify what is blocked
Memory Efficiency
struct layout · value types · GC pressure
Struct field ordering: reduce padding
Alignment
    // Bad: the compiler inserts 14 bytes of padding (on 64-bit platforms)
    type Bad struct {
        a bool  // 1 byte + 7 bytes padding
        b int64 // 8 bytes
        c bool  // 1 byte + 7 bytes padding
    } // total: 24 bytes

    // Good: largest fields first
    type Good struct {
        b int64 // 8 bytes
        a bool  // 1 byte
        c bool  // 1 byte + 6 bytes padding
    } // total: 16 bytes

    // Check size: unsafe.Sizeof(Bad{}) → 24
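The layout can be inspected with `unsafe.Sizeof` and `unsafe.Offsetof`. This sketch repeats the two struct definitions so that it is self-contained; `sizeBad` and `sizeGood` are hypothetical helpers. The 24-byte and 16-byte figures apply to typical 64-bit platforms, and 32-bit targets may align `int64` differently.

```go
package main

import (
	"fmt"
	"unsafe"
)

// Bad interleaves small and large fields, which forces padding.
type Bad struct {
	a bool
	b int64
	c bool
}

// Good places the largest field first so the small fields share one word.
type Good struct {
	b int64
	a bool
	c bool
}

func sizeBad() uintptr  { return unsafe.Sizeof(Bad{}) }
func sizeGood() uintptr { return unsafe.Sizeof(Good{}) }

func main() {
	var bad Bad
	var good Good
	fmt.Println("Bad size: ", sizeBad(),
		"offsets a/b/c:", unsafe.Offsetof(bad.a), unsafe.Offsetof(bad.b), unsafe.Offsetof(bad.c))
	fmt.Println("Good size:", sizeGood(),
		"offsets b/a/c:", unsafe.Offsetof(good.b), unsafe.Offsetof(good.a), unsafe.Offsetof(good.c))
}
```

The saving matters most for types stored in large slices or maps, where the per-element difference is multiplied by the element count.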
Value types vs pointer types
Values
    // Prefer values for small structs (≤ ~3 words):
    // copying a few words is cheap and avoids pointer indirection
    type Point struct{ X, Y float64 }
    func distance(p Point) float64 { ... } // value: can stay on the stack

    // Use pointers for large structs to avoid copying
    type Config struct{ ... }       // 100+ fields
    func process(c *Config) { ... } // pointer: one word (8 bytes on 64-bit) is copied

    // Slice of values vs slice of pointers:
    []Point  // contiguous memory, cache friendly
    []*Point // scattered heap objects, more GC work
Avoiding fmt.Sprintf for hot paths
Allocations
    // fmt.Sprintf allocates the result string and passes its arguments as interface values
    msg := fmt.Sprintf("user %d not found", id) // alloc

    // strconv.AppendInt writes into a caller-supplied buffer, so it need not allocate
    var buf [32]byte
    b := strconv.AppendInt(buf[:0], int64(id), 10) // no alloc while buf stays on the stack

    // errors.New at package level creates a sentinel error once; reuse it
    var ErrNotFound = errors.New("not found")

    // Return ErrNotFound instead of fmt.Errorf("not found") in hot paths.
    // Use fmt.Errorf (with %w) only when wrapping with context at boundaries.
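The two formatting approaches can be compared with `testing.AllocsPerRun`. This is a sketch under assumptions: `notFoundSprintf` and `appendNotFound` are hypothetical helpers, and the 64-byte stack buffer is assumed to be large enough for the message. Exact allocation counts vary with the Go version and with escape analysis, so the example only compares the two.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// Sinks keep results reachable so the compiler cannot discard the work.
var (
	sinkStr string
	sinkLen int
)

// notFoundSprintf builds the message with fmt.Sprintf.
func notFoundSprintf(id int) string {
	return fmt.Sprintf("user %d not found", id)
}

// appendNotFound appends the same message to dst and returns the extended slice.
func appendNotFound(dst []byte, id int) []byte {
	dst = append(dst, "user "...)
	dst = strconv.AppendInt(dst, int64(id), 10)
	return append(dst, " not found"...)
}

func sprintfAllocs() float64 {
	return testing.AllocsPerRun(1000, func() { sinkStr = notFoundSprintf(123456) })
}

func appendAllocs() float64 {
	return testing.AllocsPerRun(1000, func() {
		var buf [64]byte // large enough for this message, so append never grows it
		sinkLen = len(appendNotFound(buf[:0], 123456))
	})
}

// sameText reports whether both approaches produce identical output.
func sameText() bool {
	var buf [64]byte
	return string(appendNotFound(buf[:0], 123456)) == notFoundSprintf(123456)
}

func main() {
	fmt.Printf("Sprintf: %.0f allocs/op\n", sprintfAllocs())
	fmt.Printf("Append:  %.0f allocs/op\n", appendAllocs())
	fmt.Println("same text:", sameText())
}
```

The append style pays off when the bytes go straight to a writer or an existing buffer. Converting the result to a `string` allocates again.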
Reading Benchmark Output
ns/op · B/op · allocs/op · benchstat
    go test -bench=. -benchmem ./...

    BenchmarkEncode-8   1234567   42.3 ns/op   64 B/op   1 allocs/op

    BenchmarkEncode   benchmark name
    -8                GOMAXPROCS used for the run (defaults to the CPU count)
    1234567           iterations run
    42.3 ns/op        wall time per operation
    64 B/op           bytes allocated per op
    1 allocs/op       heap allocations per op
benchstat: compare two runs
benchstat
    # Install
    go install golang.org/x/perf/cmd/benchstat@latest

    # Run before and after, save output
    go test -bench=. -count=5 ./... > before.txt
    # (make your change)
    go test -bench=. -count=5 ./... > after.txt

    # Compare
    benchstat before.txt after.txt

    # Output shows the % change and p-value (layout varies by benchstat version):
    # BenchmarkFoo 42.3ns ± 2% 31.1ns ± 1% -26.5% (p=0.008)
What to look for
Interpretation
    // High allocs/op → look for heap escapes,
    //   string concatenation, interface conversions
    // High B/op with low allocs/op → large objects;
    //   consider pooling or in-place updates
    // High ns/op with 0 allocs → CPU-bound;
    //   profile with -cpuprofile to find hot spots
    // Noisy results (wide ± range) → run with -count=10,
    //   close background apps, use benchstat for stats
    // Always measure the whole system; microbenchmarks
    //   can be misleading if the hot path is elsewhere
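The same metrics can be read programmatically. The sketch below runs a benchmark loop in-process with `testing.Benchmark`, which the standard library provides for use outside `go test`; `encode` and `runEncodeBenchmark` are hypothetical names. In a `_test.go` file the loop body would normally live in a `func BenchmarkEncode(b *testing.B)` instead.

```go
package main

import (
	"fmt"
	"strconv"
	"testing"
)

// sink keeps results reachable so the compiler cannot discard the work.
var sink string

// encode is a hypothetical function under test.
func encode(id int) string {
	return "id=" + strconv.Itoa(id)
}

// runEncodeBenchmark runs the benchmark loop in-process and returns the result.
// This takes roughly one second with the default benchmark time.
func runEncodeBenchmark() testing.BenchmarkResult {
	return testing.Benchmark(func(b *testing.B) {
		b.ReportAllocs()
		for i := 0; i < b.N; i++ {
			sink = encode(i + 1000)
		}
	})
}

// encodeAllocsPerOp returns the allocs/op figure from a fresh benchmark run.
func encodeAllocsPerOp() int64 {
	return runEncodeBenchmark().AllocsPerOp()
}

func main() {
	r := runEncodeBenchmark()
	fmt.Println("iterations:", r.N)
	fmt.Println("ns/op:     ", r.NsPerOp())
	fmt.Println("B/op:      ", r.AllocedBytesPerOp())
	fmt.Println("allocs/op: ", r.AllocsPerOp())
}
```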
Quick Reference
Cheat-sheet

| Concept | Command / Call | Notes |
|---|---|---|
| Escape analysis | go build -gcflags="-m" ./... | Shows what escapes to heap |
| Verbose escape | go build -gcflags="-m=2" ./... | Shows why each decision was made |
| Disable inlining | go build -gcflags="-l" ./... | For debugging; not for production |
| Inlining decisions | go build -gcflags="-m" ./... | "can inline" / "inlining call to" |
| CPU profile (HTTP) | /debug/pprof/profile?seconds=30 | Requires import _ "net/http/pprof" |
| Heap profile (HTTP) | /debug/pprof/heap | Current in-use allocations |
| Goroutine dump | /debug/pprof/goroutine | Stack traces of all goroutines |
| Analyze profile | go tool pprof -http=:8081 cpu.prof | Web flame graph |
| Bench + profile | go test -bench=. -cpuprofile=cpu.prof | Profile a benchmark |
| Compare benchmarks | benchstat before.txt after.txt | golang.org/x/perf/cmd/benchstat |
| Object pool | sync.Pool{New: func() any { … }} | GC may evict any time; always Reset |
| String build | strings.Builder + Grow(n) | One up-front allocation when n is large enough |
| Pre-alloc slice | make([]T, 0, n) | Avoids reallocation in append loop |
| Pre-alloc map | make(map[K]V, n) | Fewer rehash events |
| Struct size | unsafe.Sizeof(T{}) | Order fields largest-first to reduce padding |
| Goroutine count | runtime.NumGoroutine() | Monitor for leaks |
| Detect leaks in tests | goleak.VerifyNone(t) | go.uber.org/goleak |
| Force no inlining | //go:noinline directive | Above the function declaration |
| Alloc-free int→str | strconv.AppendInt(buf[:0], n, 10) | Appends to existing buffer |
| GC before profile | runtime.GC() | Accurate heap snapshot |