Google's Performance Optimization Approach: Practical Techniques and Engineering Thinking

This classic technical guide, written by two legendary Google engineers, Jeff Dean and Sanjay Ghemawat, summarizes principles and concrete techniques for performance optimization drawn from Google's years of experience building high-performance software.

Core concept: Re-examining "premature optimization"

The article begins by correcting a common industry misreading of Donald Knuth's famous quote, "Premature optimization is the root of all evil."

• The critical 3%: Knuth's point was not to waste time on non-critical code, but he also insisted that we should never pass up the opportunity to optimize that critical 3% of code paths.
• Engineering literacy: In mature engineering disciplines, a 12% performance improvement is a substantial achievement and should never be dismissed as marginal.
• Default to efficiency: The quote is not an excuse to write inefficient code. When a more efficient alternative does not significantly increase complexity or reduce readability, choose it by default.

Methodology: Estimation and Measurement

• Cultivating intuition: Excellent engineers need the ability to do "back-of-the-envelope" calculations. A clear sense of how long low-level machine operations take lets you rule out inefficient designs before writing any code.
• Measurement is king: Don't guess at the bottleneck; profiling is the primary tool.
• Facing a "flat" profile: When the profile shows no obvious hot spots, the low-hanging fruit has been picked. At that point, focus on accumulating small optimizations, restructuring loops, or reworking the algorithm at a higher level.
The article provides numerous concrete code changes as examples, mainly covering the following dimensions:

A. Memory and Data Structures (the core of optimization)

• Compact layout: Cache is extremely valuable on modern CPUs. Laying out memory so that frequently accessed data sits adjacent in physical memory significantly reduces cache misses.
• Use indexes instead of pointers: On a 64-bit machine, a pointer occupies 8 bytes. Where possible, use smaller integer indices instead of pointers; this saves memory and keeps the data contiguous.
• Flattened storage: Avoid node-based containers (such as std::map and std::list), which fragment memory. Prefer contiguous-memory containers (such as std::vector and absl::flat_hash_map).
• Small-object optimization: For collections that typically hold few elements, use containers with "inline storage" (such as absl::InlinedVector) to avoid heap allocation.

B. API Design and Usage

• Batch interfaces: Design interfaces that process multiple elements at once. This reduces function-call overhead and, more importantly, amortizes the cost of acquiring locks.
• View types: Use std::string_view or absl::Span for function parameters wherever possible to avoid unnecessary data copying.

C. Reduce Memory Allocation

Memory allocation is not cheap: it consumes allocator time and also hurts cache locality.

• Reserve space: If you roughly know a vector's final size, call .reserve() first to avoid repeated copies from resizing.
• Object reuse: In a loop, hoist the declaration of temporary variables outside the loop to avoid repeated construction and destruction.
• Arena memory pools: For a complex group of objects with a shared lifetime, an arena allocator can greatly improve performance and simplify destruction.

D. Algorithm Improvement

This is the "nuclear weapon" of performance work.
Optimizing an O(N²) algorithm to O(N log N) or O(N) yields gains that far exceed any code-level fine-tuning.

• Case study: The article demonstrates how replacing a sorted-set intersection with a simple hash-table lookup significantly reduces time complexity.

E. Avoid Unnecessary Work

• Use fast paths: Write dedicated handling for the most common cases. For example, when processing strings that consist entirely of ASCII characters, take a fast path that skips the complex UTF-8 decoding logic.