Developing VoIP Voice Calls in a Mobile Application
Implementing VoIP in a mobile application is not one task but several interconnected technical layers: the signaling protocol, media transport, audio session management, and integration with the system call APIs. Skip or oversimplify any one of them and the call will work in a demo but break in production: echo, audio dropouts when a notification arrives, no way to accept a call from the locked screen.
VoIP Call Architecture
A voice call consists of two independent streams:
Signaling — the control channel: who is calling, accept/decline, hang up. Usually carried over WebSocket, SIP, or XMPP. Signaling messages are small (JSON, hundreds of bytes) but require reliable delivery.
Media stream — the audio between participants. Carried via WebRTC (RTP over UDP) or a proprietary protocol (Twilio, Vonage). UDP trades packet loss for low latency, which is the right trade-off for voice: latency matters more than 1–2% packet loss.
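There is no standard wire format for a custom WebSocket signaling channel; a minimal sketch of the receiving side, with illustrative message types and field names (invite, callId, from are assumptions, not part of any spec):

```kotlin
import org.json.JSONObject

// Sketch of a signaling dispatcher. Each message is a small JSON object
// sent over the WebSocket, e.g. {"type":"invite","callId":"42","from":"alice"}.
// Real systems also carry SDP offers/answers and ICE candidates
// over the same channel.
fun handleSignaling(raw: String, onIncomingCall: (callId: String, from: String) -> Unit) {
    val msg = JSONObject(raw)
    when (msg.getString("type")) {
        "invite"  -> onIncomingCall(msg.getString("callId"), msg.getString("from"))
        "accept"  -> { /* start the SDP/ICE exchange, connect media */ }
        "decline",
        "hangup"  -> { /* tear down the call UI and media */ }
    }
}
```

Reliable delivery matters here: if an invite or hangup is lost, one side keeps ringing, which is why signaling stays on TCP-based transports while media goes over UDP.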
Implementing the media layer yourself means WebRTC. The managed alternatives are Twilio Voice, Vonage Voice, and Agora; they differ in complexity, flexibility, and cost.
iOS: CallKit — Indispensable
On iOS, VoIP without CallKit is technically possible, but:
- The app doesn't get an audio session for an incoming call on the locked screen
- The system doesn't show its native incoming-call screen (the one users expect)
- Since iOS 13, apps without CallKit don't receive PushKit VoIP notifications
CallKit is the system framework for integrating VoIP calls into the phone interface. It shows the native incoming-call screen, manages the audio session, supports Bluetooth/AirPods, and lists calls in the phone's call history.
import CallKit

class CallManager: NSObject {
    let provider: CXProvider
    let callController = CXCallController()

    override init() {
        let config = CXProviderConfiguration()
        config.supportsVideo = false
        config.maximumCallsPerCallGroup = 1
        config.supportedHandleTypes = [.generic, .phoneNumber]
        provider = CXProvider(configuration: config)
        super.init()
        provider.setDelegate(self, queue: nil)
    }

    func reportIncomingCall(uuid: UUID, callerName: String) {
        let update = CXCallUpdate()
        update.remoteHandle = CXHandle(type: .generic, value: callerName)
        update.hasVideo = false
        provider.reportNewIncomingCall(with: uuid, update: update) { error in
            // If error is nil, the system incoming-call UI is showing.
            // Answering continues in the CXAnswerCallAction delegate method below.
        }
    }
}

extension CallManager: CXProviderDelegate {
    func providerDidReset(_ provider: CXProvider) {
        // Tear down all active calls and audio.
    }

    func provider(_ provider: CXProvider, perform action: CXAnswerCallAction) {
        // Configure the audio session and start the media stream here.
        action.fulfill()
    }
}
PushKit delivers incoming calls while the app is in the background. Unlike a regular APNs push, a PushKit push wakes the app immediately with high priority. But since iOS 13, Apple requires calling reportNewIncomingCall immediately upon receiving a VoIP push; otherwise the system terminates the app. You cannot make a network request before reporting the call to CallKit.
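A minimal sketch of the PushKit entry point, reporting the call synchronously inside the push handler before any other work (the "callerName" payload key is an assumption about your push format, not an Apple API):

```swift
import PushKit

extension CallManager: PKPushRegistryDelegate {
    func pushRegistry(_ registry: PKPushRegistry,
                      didUpdate pushCredentials: PKPushCredentials,
                      for type: PKPushType) {
        // Send pushCredentials.token to your signaling server.
    }

    func pushRegistry(_ registry: PKPushRegistry,
                      didReceiveIncomingPushWith payload: PKPushPayload,
                      for type: PKPushType,
                      completion: @escaping () -> Void) {
        // iOS 13+: report the call to CallKit before anything else.
        // Returning without reporting a call gets the app terminated.
        let caller = payload.dictionaryPayload["callerName"] as? String ?? "Unknown"
        reportIncomingCall(uuid: UUID(), callerName: caller)
        completion()
    }
}
```

Any caller details not present in the push payload have to be fetched after the call is reported, then applied via a CXCallUpdate.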
Android: ConnectionService and Telecom API
The Android equivalent of CallKit is ConnectionService from the android.telecom package. It lets the app register as a "phone account" with the system, show calls on the lock screen, and manage audio routing.
For incoming calls in the background, send an FCM push with priority: high. Starting services from the background is restricted on modern Android, but a high-priority FCM message is one of the documented exemptions, and the phoneCall foreground service type covers exactly the incoming-call case (on Android 14+ the type must be declared in the manifest along with the FOREGROUND_SERVICE_PHONE_CALL permission).
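A minimal ConnectionService sketch, assuming the app has already registered a PhoneAccount with TelecomManager and reports incoming calls via TelecomManager.addNewIncomingCall; the class name VoipConnectionService is illustrative:

```kotlin
import android.telecom.Connection
import android.telecom.ConnectionRequest
import android.telecom.ConnectionService
import android.telecom.DisconnectCause
import android.telecom.PhoneAccountHandle

// The system instantiates this service when the app reports an incoming
// call for its registered PhoneAccount.
class VoipConnectionService : ConnectionService() {
    override fun onCreateIncomingConnection(
        account: PhoneAccountHandle?,
        request: ConnectionRequest?
    ): Connection = object : Connection() {
        override fun onAnswer() {
            setActive()  // tell the system the call is now live
            // start the media stream here
        }
        override fun onDisconnect() {
            setDisconnected(DisconnectCause(DisconnectCause.LOCAL))
            destroy()
        }
    }.apply {
        setRinging()     // system shows the incoming-call UI
    }
}
```

The Connection object is the contract with the system: state changes (setRinging, setActive, setDisconnected) drive the lock-screen UI and audio focus, so the media layer should be started and stopped from its callbacks.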
Audio session management goes through AudioManager:
val audioManager = context.getSystemService(Context.AUDIO_SERVICE) as AudioManager
audioManager.mode = AudioManager.MODE_IN_COMMUNICATION
audioManager.isSpeakerphoneOn = false
// on termination:
audioManager.mode = AudioManager.MODE_NORMAL
Bluetooth headsets are a separate headache: the BluetoothHeadset profile, an SCO connection for voice (not A2DP!), and a BroadcastReceiver for ACTION_SCO_AUDIO_STATE_UPDATED. Without explicit SCO management, audio plays through the phone speaker even with a Bluetooth headset connected.
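A sketch of explicit SCO management during a call (these AudioManager APIs are deprecated on Android 12+ in favor of setCommunicationDevice, but remain the common path; BLUETOOTH_CONNECT permission is required on Android 12+):

```kotlin
import android.content.BroadcastReceiver
import android.content.Context
import android.content.Intent
import android.content.IntentFilter
import android.media.AudioManager

// Assumes MODE_IN_COMMUNICATION is already set for the call.
fun startBluetoothAudio(context: Context, audioManager: AudioManager) {
    context.registerReceiver(object : BroadcastReceiver() {
        override fun onReceive(ctx: Context, intent: Intent) {
            val state = intent.getIntExtra(
                AudioManager.EXTRA_SCO_AUDIO_STATE,
                AudioManager.SCO_AUDIO_STATE_ERROR
            )
            if (state == AudioManager.SCO_AUDIO_STATE_CONNECTED) {
                // Voice now routes through the headset's SCO link.
                audioManager.isBluetoothScoOn = true
            }
        }
    }, IntentFilter(AudioManager.ACTION_SCO_AUDIO_STATE_UPDATED))

    audioManager.startBluetoothSco()  // asynchronous; wait for the broadcast
}
```

startBluetoothSco is asynchronous, which is exactly why the broadcast matters: flipping isBluetoothScoOn before SCO_AUDIO_STATE_CONNECTED arrives silently leaves audio on the speaker.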
WebRTC for Media Layer
If you use WebRTC directly (not via Twilio/Vonage), you need a TURN server for NAT traversal. Without one, calls work only on the same network: the classic demo bug that surfaces as soon as a client sits behind corporate NAT.
coturn is an open-source TURN server that can be deployed on a VPS in a few hours. ICE configuration:
// Android WebRTC SDK
val iceServers = listOf(
    PeerConnection.IceServer.builder("stun:stun.example.com:3478").createIceServer(),
    PeerConnection.IceServer.builder("turn:turn.example.com:3478")
        .setUsername("user")
        .setPassword("password")
        .createIceServer()
)
Codecs: Opus for audio (adaptive bitrate, degrades gracefully under packet loss). The WebRTC SDK includes it by default.
Typical Production Errors
Echo. Appears when AudioManager.MODE_IN_COMMUNICATION is not set, so the system never enables hardware echo cancellation. The WebRTC SDK includes software AEC, but the hardware path (enabled via the mode) is more reliable.
Call interrupted by an incoming SMS. AVAudioSession on iOS loses focus. Subscribe to AVAudioSession.interruptionNotification and reactivate the session after the interruption ends.
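A minimal sketch of the interruption observer (observer lifetime management and error handling omitted):

```swift
import AVFoundation

// Reactivate the audio session when an interruption (SMS alert,
// cellular call, alarm) ends.
NotificationCenter.default.addObserver(
    forName: AVAudioSession.interruptionNotification,
    object: AVAudioSession.sharedInstance(),
    queue: .main
) { notification in
    guard let info = notification.userInfo,
          let typeValue = info[AVAudioSessionInterruptionTypeKey] as? UInt,
          let type = AVAudioSession.InterruptionType(rawValue: typeValue)
    else { return }

    if type == .ended {
        try? AVAudioSession.sharedInstance().setActive(true)
    }
}
```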
Latency above 300 ms. Usually a TURN relay instead of a direct P2P path. Check the ICE candidate type in WebRTC stats: relay instead of host or srflx.
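One way to check this from the Android WebRTC SDK is to dump the candidate types from getStats; the stats field names follow the W3C getStats identifiers, and a fuller check would match the selected candidate pair's localCandidateId rather than listing all local candidates:

```kotlin
import org.webrtc.PeerConnection
import org.webrtc.RTCStatsReport

// Sketch: is media going through a TURN relay?
fun logLocalCandidateTypes(pc: PeerConnection) {
    pc.getStats { report: RTCStatsReport ->
        report.statsMap.values
            .filter { it.type == "local-candidate" }
            .forEach { stats ->
                // "host" / "srflx" = direct or reflexive path,
                // "relay" = TURN relay (adds a round trip through the server)
                println("candidate type: ${stats.members["candidateType"]}")
            }
    }
}
```

If everything relays, the usual causes are a blocked UDP path or a TURN server deployed far from both participants.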
What's Included
We design the architecture to your requirements (managed SDK vs. raw WebRTC), implement the CallKit/ConnectionService integration, and set up the signaling protocol, media transport, and TURN server. We test on real devices across different networks, Bluetooth headsets, and interruptions.
Timeline: 2–5 weeks, depending on the chosen stack and audio quality requirements.