CPU and GPU Resource Profiling for Games
When a game "lags," the first instinct is to open Stats in the Game view and look at fps. That tells you almost nothing. Stats shows an averaged value, hides spikes, does not separate CPU time from GPU time, and does not show which code is eating the frame. For a real diagnosis you need the Profiler attached to a standalone build running on the target hardware.
The difference between "42 fps average" and "42 fps with drops to 18 every third frame" is the difference between comfortable play and the feeling that the game is broken. You can only see it on a frame time graph, not on an fps counter.
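To make the point concrete, here is a minimal sketch in Python (used only as a neutral language for the arithmetic, not Unity code). The two frame time lists are invented for illustration: both average about 42 fps, but one of them spends every third frame at roughly 18 fps.

```python
from statistics import mean

def summarize(frame_times_ms):
    """Return average fps and worst-frame stats for a list of frame times in ms."""
    avg_ms = mean(frame_times_ms)
    worst_ms = max(frame_times_ms)
    return {
        "avg_fps": round(1000.0 / avg_ms, 1),
        "worst_frame_ms": worst_ms,
        "worst_frame_fps": round(1000.0 / worst_ms, 1),
    }

# Hypothetical captures: both average 23.8 ms per frame (about 42 fps).
steady = [23.8] * 90
spiky = [7.9, 7.9, 55.6] * 30  # every third frame takes 55.6 ms (about 18 fps)

print(summarize(steady))
print(summarize(spiky))
```

An fps counter reports the same average for both lists. Only the per-frame values show that the second capture stutters.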
CPU bottleneck vs GPU bottleneck — how to distinguish
The first question in any optimization pass is where the bottleneck is. If the CPU is the bottleneck, the GPU waits. If the GPU is the bottleneck, the CPU waits. Applying optimization techniques without knowing which case you are in wastes time.
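Diagnosis in the Unity Profiler: open the CPU Usage module and look at Gfx.WaitForPresent or Graphics.PresentAndSync (newer Unity versions label the former Gfx.WaitForPresentOnGfxThread). If these markers take 8+ ms of a 16.6 ms frame budget, you are GPU-bound: the CPU has already handed its work to the GPU and is just waiting. Where possible, check this with VSync disabled, because with VSync on these markers can also include time spent waiting for the display refresh.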
If your scripts, Physics.Processing, or other work under PlayerLoop take most of the frame time while Gfx.WaitForPresent is minimal, you are CPU-bound.
These are fundamentally different optimization paths. GPU-bound: reduce shader complexity, overdraw, and fill rate. CPU-bound: optimize scripts, move work to the Job System, and reduce the number of Update() calls.
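The decision rule above can be written down as a small sketch. It is Python for illustration only and assumes you have already read the marker timings off a Profiler capture. The 8 ms threshold comes from the text. The 1 ms "minimal wait" cutoff and the "more than half the frame" rule for CPU work are assumptions made for this example.

```python
def classify_bottleneck(frame_ms, gpu_wait_ms, cpu_work_ms,
                        gpu_wait_threshold_ms=8.0, minimal_wait_ms=1.0):
    """Classify one frame from hand-collected Profiler numbers.

    frame_ms    -- total frame time
    gpu_wait_ms -- time spent in Gfx.WaitForPresent / Graphics.PresentAndSync
    cpu_work_ms -- time spent in scripts, physics and other CPU work
    """
    if gpu_wait_ms >= gpu_wait_threshold_ms:
        return "GPU-bound"
    # Assumed rule: CPU work dominates the frame and the CPU barely waits.
    if gpu_wait_ms <= minimal_wait_ms and cpu_work_ms > frame_ms / 2:
        return "CPU-bound"
    return "inconclusive"

print(classify_bottleneck(frame_ms=16.6, gpu_wait_ms=9.0, cpu_work_ms=5.0))
print(classify_bottleneck(frame_ms=22.0, gpu_wait_ms=0.3, cpu_work_ms=18.0))
print(classify_bottleneck(frame_ms=16.6, gpu_wait_ms=4.0, cpu_work_ms=6.0))
```

The "inconclusive" branch is deliberate: when neither side clearly dominates, more captures are a safer next step than picking an optimization path.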
Deep CPU profiling
Deep Profile in Unity is a powerful tool, but it comes with overhead: it instruments every method call, which itself slows the game down and distorts timings. Use it for targeted diagnosis of a specific subsystem, not as a permanent mode.
What to look for in a CPU profile:
Managed heap allocations in Update(). They show up in the Profiler as highlighted GC.Alloc markers. Any allocation in a hot path (Update, FixedUpdate, OnCollisionEnter) can trigger a GC.Collect later. On mobile, a GC.Collect is a 2–20 ms spike. The fixes are caching references, pooling objects, interning strings, and replacing LINQ with manual loops.
Physics.Processing takes more than 4 ms. The usual causes are overly complex colliders (a Mesh Collider where a Capsule would do), a Fixed Timestep that is too small, or too many Rigidbody components using continuous collision detection. Start with the Physics Debugger: visualize the sleep state of every Rigidbody and find the ones that stay awake for no reason.
NavMesh.CalculatePath called every frame for 40 agents. By default every NavMeshAgent is updated on every tick. With a large number of agents, split them into groups and update each group every other frame or every N frames, depending on distance to the player.
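The distance-based staggering from the last item can be sketched as follows. This is a language-neutral illustration in Python, not Unity API code. The distance thresholds, the intervals, and the helper names `update_interval` and `agents_to_update` are all hypothetical.

```python
def update_interval(distance, near=15.0, far=40.0):
    """Frames between path updates for an agent at this distance from the player."""
    if distance < near:
        return 1  # close agents: every frame
    if distance < far:
        return 2  # mid-range agents: every other frame
    return 4      # distant agents: every fourth frame

def agents_to_update(frame, distances):
    """Return indices of agents that are due for an update on this frame.

    The agent index shifts the phase, so agents sharing an interval
    are spread across frames instead of all updating together.
    """
    due = []
    for index, distance in enumerate(distances):
        if (frame + index) % update_interval(distance) == 0:
            due.append(index)
    return due

# 40 hypothetical agents: 10 near, 10 mid-range, 20 far.
distances = [5.0] * 10 + [25.0] * 10 + [60.0] * 20
per_frame = [len(agents_to_update(frame, distances)) for frame in range(8)]
print(per_frame)
```

With these made-up numbers the per-frame load drops from 40 path updates to 20, and it stays flat because the phase offset keeps the groups from bunching up on the same frame.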
GPU profiling
RenderDoc is a mandatory tool for any serious GPU profiling. It attaches to an Android or PC build, captures a single frame, and shows every draw call with its GPU time, input and output textures, and pipeline state. This is where you see which shader eats 60% of the GPU time.
The Unity Frame Debugger is easier to use but less detailed. It shows render order, why objects are not batched, and render target state. For an initial diagnosis it is enough.
On mobile devices, use Arm Streamline (Mali) or Snapdragon Profiler (Adreno). They show metrics that are unavailable in Unity: memory bandwidth, ALU utilization, texture cache miss rate. Texture cache misses (many small textures instead of an atlas) or high bandwidth (textures without mipmaps) are often the true cause of lag when the draw call count looked fine.
A real case: a mobile arcade runner at 45 fps on a Snapdragon 730. The CPU profile was clean, with scripts taking under 3 ms. On the GPU side, Snapdragon Profiler showed a suspiciously high fill rate. RenderDoc showed why: a custom distortion shader on the water sampled a GrabPass (screen-space texture) every frame, and it sat in the Transparent queue on top of three other blended layers. Replacing the GrabPass with a pre-baked cubemap for background reflections and moving the water mesh lower in the Z-order removed 11 ms of GPU time. Result: a stable 58–60 fps.
Profiling process
First define the target metrics: fps budget (30/60/120), the corresponding frame time (33.3/16.6/8.3 ms), and platform. Without target metrics it is unclear what counts as "good enough."
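The frame time budget is simply 1000 ms divided by the target fps. A minimal Python sketch of that arithmetic follows, with a helper that counts how many captured frames miss the budget. The capture values are invented for illustration.

```python
def frame_budget_ms(target_fps):
    """Time available per frame, in milliseconds, at the target frame rate."""
    return 1000.0 / target_fps

def frames_over_budget(frame_times_ms, target_fps):
    """Count captured frames that exceeded the budget for the target frame rate."""
    budget = frame_budget_ms(target_fps)
    return sum(1 for t in frame_times_ms if t > budget)

capture = [15.9, 16.2, 31.0, 16.1, 16.4, 24.8, 16.0]  # hypothetical frame times, ms

for fps in (30, 60, 120):
    print(fps, "fps budget:", round(frame_budget_ms(fps), 2), "ms,",
          "frames over budget:", frames_over_budget(capture, fps))
```

The same capture passes at 30 fps, misses two frames at 60 fps, and misses every frame at 120 fps, which is why the target has to be fixed before judging any profile.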
Profile several scenarios: idle (character standing still), peak load (combat with maximum effects), and scene transitions. Each scenario gets its own Profiler capture.
Write a report listing the specific bottlenecks, their cost in ms, and suggested fixes. Prioritize by effort-to-gain ratio.
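One way to apply the effort-to-gain ordering is to rank findings by milliseconds saved per day of work. The Python sketch below is one possible reading of that rule, not a prescribed method. The `Bottleneck` type and all gain and effort figures are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Bottleneck:
    name: str
    gain_ms: float      # estimated frame time saved by the fix
    effort_days: float  # estimated work needed for the fix

def prioritize(bottlenecks):
    """Sort so the most milliseconds saved per day of effort comes first."""
    return sorted(bottlenecks, key=lambda b: b.gain_ms / b.effort_days, reverse=True)

# Hypothetical report entries.
findings = [
    Bottleneck("GrabPass in water shader", gain_ms=11.0, effort_days=2.0),
    Bottleneck("GC.Alloc in Update()", gain_ms=3.0, effort_days=0.5),
    Bottleneck("Mesh Colliders on props", gain_ms=2.0, effort_days=1.0),
]

for item in prioritize(findings):
    print(f"{item.name}: {item.gain_ms / item.effort_days:.1f} ms per day")
```

Note that the largest absolute gain does not necessarily come first: in this made-up list a half-day allocation fix outranks the bigger but slower shader fix.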
| Task Scope | Estimated Timeline |
|---|---|
| CPU/GPU profiling + report (1–2 scenes) | 2–4 days |
| Deep audit + fix top-3 bottlenecks | 1–2 weeks |
| Comprehensive optimization for specific platform | 3–8 weeks |
Cost is determined after reviewing the project and target platforms.





