Developing Shaders for Mobile Games
A mobile GPU isn't a desktop GPU with lower clock speeds. The tile-based deferred rendering (TBDR) architecture on Apple GPUs (PowerVR legacy) and Arm Mali is fundamentally different from immediate mode rendering in desktop Nvidia/AMD. A shader written without understanding this difference performs 3–5 times slower on mobile than it could.
Architectural Limitations That Change Shader Approach
Fillrate and Overdraw. Mobile GPUs are sensitive to overdraw—when one pixel gets rasterized multiple times. Each extra draw call over already drawn pixels is expensive. For Unity: Sorting Layer + correct Queue in shader (Geometry before Transparent) allow the renderer to use early-z rejection. Transparent objects kill early-z—keep their count minimal.
Precision. On mobile, the difference between float (32-bit), half (16-bit), and fixed (10-bit, obsolete) is noticeable. Arm Mali and Adreno support native 16-bit arithmetic, and on them half runs twice as fast as float. Use float for vertex positions, half for colors and UVs. In GLSL ES 3.0 this is highp vs mediump.
Texture Sampling. Mip-mapping is mandatory. Without it, texture samplers work with full resolution even for tiny objects on screen—a bandwidth killer on mobile with shared CPU/GPU memory.
Shaders in Unity: URP vs Built-in
Universal Render Pipeline is standard for mobile Unity games. Shader Graph enables visual shader building, but bottlenecks still require HLSL edits. A custom SubGraph in Shader Graph is a reusable block compiled once.
Example of a simple dissolve shader in URP HLSL:
half4 frag(Varyings input) : SV_Target {
half4 baseColor = SAMPLE_TEXTURE2D(_BaseMap, sampler_BaseMap, input.uv);
half noiseValue = SAMPLE_TEXTURE2D(_NoiseMap, sampler_NoiseMap, input.uv).r;
// cutout by threshold with soft edge
half edge = smoothstep(_Threshold - _EdgeWidth, _Threshold + _EdgeWidth, noiseValue);
clip(noiseValue - _Threshold); // discard pixels
half3 edgeColor = lerp(baseColor.rgb, _EdgeColor.rgb, edge);
return half4(edgeColor, baseColor.a);
}
clip() is fast on mobile—TBDR architecture discards the tile before rasterization if all pixels are clipped.
Post-effects: What's Acceptable on Mobile
Full-screen post-effects (bloom, depth of field, motion blur)—expensive. Strategy:
-
Bloom—use
Kawase blurinstead of Gaussian. Kawase uses fewer samples for comparable visual quality. In URPBloomoverride withHigh Quality Filteringoff for mobile preset. -
Color Correction—
Tonemapping+Color Adjustmentsnearly free, done in one pass. - DOF—only for cinematic scenes, not gameplay. Bokeh DOF on mobile—roughly 3 ms extra on Snapdragon 778G.
-
Motion Blur—better to simulate via
Trail Rendereron specific objects than a full-screen velocity buffer.
Particle Shaders and VFX
VFX Graph requires Compute Shader—supported on Vulkan (Android 7+) and Metal (iOS 11+). The old Particle System works through CPU, worse with many particles. For mobile: VFX Graph on modern devices, CPU particles as fallback for budget phones.
Particle shaders should be as simple as possible: Unlit, Additive blending, half precision, no Lighting. Lit shaders on 500 particles—common mistake costing –10 FPS.
Godot and GLSL ES
In Godot 4, shaders use GLSL with proprietary extensions (uniform, varying replaced with uniform and out). Godot's shader language compiles to SPIR-V (Vulkan) or GLSL ES (compatibility). For mobile export, Godot uses Compatibility renderer based on OpenGL ES 3.0—wider support than Vulkan on old Android.
Estimate
Shader volume depends heavily on visual style: stylized 2D with cel-shading—1–2 custom shaders, realistic 3D with PBR, water, atmosphere—10+. Timeline: 3 days to 4–6 weeks. Cost calculated individually.







