Mobile Game Rendering Optimization
On a Samsung Galaxy A52 (Adreno 618), a game holds 28–32 FPS at target 60. On Xiaomi Redmi Note 11 with Helio G96 and Mali-G57—stable 55–60 FPS. Different performance on similarly-priced devices is typical for mobile gamedev. The reason: optimization was done on a flagship, and mid-range GPUs with different architectures show a different picture.
Diagnosis: What's the Bottleneck
CPU-bound vs GPU-bound
Before optimizing, identify the bottleneck. In Unity: Frame Debugger + Profiler. Enable Profile GPU in Android Player Settings and check Profiler → GPU. If GPU time per frame approaches 16ms (60 FPS) but CPU time is significantly less—GPU-bound. If opposite—CPU-bound.
On Unreal Engine: stat GPU in console, ProfileGPU command, RenderDoc for frame capture. r.ScreenPercentage 50—quick test: if FPS jumps on resolution halving, it's GPU-bound.
// Unreal console commands for diagnosis
stat unit // overall CPU/GPU/frame time breakdown
stat drawcalls // draw calls this frame
r.ShowFlag.Rendering 1 // detailed rendering stats
Draw Calls
On mobile, draw call overhead exceeds PC/console. 500+ draw calls per frame—red zone for mid-range Android. Each unique material = separate draw call. Each MeshRenderer with unique material = another.
Static batching (Unity): objects with identical material combine into one mesh. Requirement—same Material asset (not just identical settings). Mark as Static in Inspector. Works automatically on build.
GPU Instancing: for repeated objects (grass, trees, enemies of same type):
// Material must support instancing
material.enableInstancing = true;
// Draw 1000 instances in one call
Graphics.DrawMeshInstanced(mesh, 0, material, matrices, 1000);
SRP Batcher (Unity URP/HDRP): automatically batches objects with different materials but same shader. Enable in URP Asset → SRP Batcher = enabled. Simplest way to cut draw calls without manual batching.
Shaders for Mobile GPUs
Tile-Based Rendering (TBIMR)
Mobile GPUs (Adreno, Mali, PowerVR, Apple) use Tile-Based Immediate Mode Rendering. Screen divides into tiles, each renders completely in fast on-chip memory. This means:
-
Framebuffer fetch—reading from current framebuffer within a tile—practically free. Use for deferred lighting:gl_LastFragDatain GLSL (EXT_shader_framebuffer_fetch). -
Depth pre-passon mobile—often unnecessary overhead, TBIMR efficiently handles depth test within tile. -
Discardin fragment shader (alpha-test, clip)—on tile-based GPU kills early depth test for entire tile. Replace with alpha-blend or alpha-to-coverage where possible.
Precision Qualifiers in GLSL/Metal
// SLOW — highp everywhere by default
uniform highp mat4 ModelMatrix;
varying highp vec2 TexCoord;
// FAST — minimum necessary precision
uniform highp mat4 ModelMatrix; // matrices—highp required
varying mediump vec2 TexCoord; // UV coords—mediump sufficient
varying lowp vec4 VertexColor; // color—lowp
On Mali, switching from highp to mediump for texture samplers—10 to 25% fragment shader boost.
ALU vs Texture Fetch
On most mobile GPUs, texture fetch is cheaper than heavy ALU operations (sin, pow, sqrt). Pre-baked lookup tables in texture are faster than shader computation:
// Slow: compute fresnel in shader
float fresnel = pow(1.0 - dot(viewDir, normal), 5.0);
// Fast: lookup texture
float fresnel = texture2D(fresnelLUT, vec2(dot(viewDir, normal), roughness)).r;
Resolution and Dynamic Resolution Scaling
Rendering at native iPhone 15 Pro resolution (2556×1179)—overkill for intensive mobile game. Standard practice: render scale 0.7–0.85 from native with upscaling.
Unity URP Dynamic Resolution:
ScalableBufferManager.ResizeBuffers(0.75f, 0.75f); // 75% of native
Unreal Mobile Super Resolution (MSR)—built-in temporal upscaler for mobile since Unreal 5.1+. r.Mobile.TemporalAA 1. Quality near-native with less GPU load.
Adaptive Performance (Samsung Game SDK + Unity): auto-reduces load on overheat:
using UnityEngine.AdaptivePerformance;
var ap = Holder.Instance;
ap.ThermalStatus.ThermalMetrics // current temperature
ap.PerformanceStatus.PerformanceMetrics // bottleneck info
Case: 40 → 58 FPS on Adreno 618
Runner game: on Galaxy A52—40 FPS. AGI profiling showed: Fragment ALU 87%, fragment bandwidth overloaded. Three changes:
- Water shader: replaced
pow(fresnel, 5.0)with LUT texture → -8ms GPU -
highp→mediumpfor all texture samplers → -4ms GPU - Dynamic resolution 0.80 vs native → -6ms GPU
Result: 40 to 58 FPS without visual style change. Pro devices unchanged, already 60 FPS with headroom.
Timeline
Profiling and rendering analysis—2–3 days. Shader optimization, batching, dynamic resolution—1–3 weeks depending on project state.







