Implementing AI Scene Recognition for Smart Home Automation in Mobile App
The task sounds simple: the phone "sees" what's happening in the room and automatically controls lights, climate, and blinds. In practice it breaks into three independent problems: reliable on-device scene recognition without a server, low-latency IoT device control, and automation logic that doesn't annoy users with false triggers.
On-Device Scene Recognition: CoreML vs TFLite
Sending camera frames to a server for classification is a poor fit for home automation: 200–500 ms of latency is unacceptable, and there are obvious privacy concerns. Everything must work locally.
iOS: CoreML + Vision framework
Apple's Vision framework ships scene classification out of the box via VNClassifyImageRequest. It works offline and returns VNClassificationObservation results with confidence scores:
import Vision
import CoreML

class SceneClassifier {
    private lazy var request: VNClassifyImageRequest = {
        let r = VNClassifyImageRequest { [weak self] request, error in
            self?.handleResults(request.results as? [VNClassificationObservation])
        }
        return r
    }()

    func classify(pixelBuffer: CVPixelBuffer) {
        let handler = VNImageRequestHandler(cvPixelBuffer: pixelBuffer, options: [:])
        try? handler.perform([request]) // errors are non-fatal here; log them in production
    }

    private func handleResults(_ results: [VNClassificationObservation]?) {
        // Observations arrive sorted by confidence; take the best one above the threshold
        guard let top = results?.first(where: { $0.confidence > 0.6 }) else { return }
        // top.identifier: "bedroom", "kitchen", "living_room", "bathroom"
        SmartHomeAutomation.shared.triggerScene(top.identifier)
    }
}
VNClassifyImageRequest returns 3000+ categories, while home automation needs only ~20. Filter by confidence > 0.6 and by the identifiers you care about. Don't classify more often than every 2–3 seconds: the camera delivers frames at 30 FPS, but one classification every few seconds is enough for room detection and saves battery.
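The 2–3 second cadence can be enforced with a plain timestamp gate in front of the classifier call. A minimal Kotlin sketch (class name and interval are illustrative; the same pattern applies on iOS):

```kotlin
// Drops frames that arrive sooner than `intervalMs` after the last
// accepted one, so classification runs every ~2 s instead of at 30 FPS.
class FrameThrottler(private val intervalMs: Long = 2000) {
    private var lastAcceptedAt: Long = 0

    // Returns true when this frame should be passed to the classifier.
    fun shouldClassify(nowMs: Long = System.currentTimeMillis()): Boolean {
        if (nowMs - lastAcceptedAt < intervalMs) return false
        lastAcceptedAt = nowMs
        return true
    }
}
```

In the camera callback: `if (throttler.shouldClassify()) classify(frame)`; everything else is dropped for free.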
For custom scenarios (recognizing specific furniture or people in frame), use Create ML for MobileNetV3 fine-tuning on a custom dataset. Export to .mlpackage, ~4 MB in size.
Android: ML Kit Scene Detection + TFLite
Google ML Kit runs fully on-device and offline; the base image-labeling client covers common room labels:
val image = InputImage.fromMediaImage(mediaImage, rotation)
val labeler = ImageLabeling.getClient(
    ImageLabelerOptions.Builder()
        .setConfidenceThreshold(0.65f)
        .build()
)
labeler.process(image)
    .addOnSuccessListener { labels ->
        val sceneLabel = labels.firstOrNull { it.text in SMART_HOME_SCENES }
        sceneLabel?.let { automationEngine.trigger(it.text, it.confidence) }
    }
SMART_HOME_SCENES is the set of "bedroom", "kitchen", "living room", "bathroom", "office". For custom models, use the TFLite Interpreter with a .tflite file optimized for target devices via TensorFlow Lite Model Maker.
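Label strings returned by the labeler don't always match the snake_case identifiers used elsewhere ("Living room" vs "living_room"), so it's worth normalizing before matching. A sketch, assuming the scene set above:

```kotlin
// Illustrative whitelist; compare case-insensitively, then map the
// matched label to a canonical snake_case scene id.
val SMART_HOME_SCENES = setOf("bedroom", "kitchen", "living room", "bathroom", "office")

fun canonicalScene(label: String): String? {
    val normalized = label.trim().lowercase()
    return if (normalized in SMART_HOME_SCENES) normalized.replace(' ', '_') else null
}
```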
A personalized model via TFLite transfer learning: 500–1000 photos per class, MobileNetV2 fine-tuning, INT8-quantized export. The result is a ~2 MB model with inference under 50 ms on a Snapdragon 778G.
Integration with IoT Automation
Scene recognition is just the trigger. Next is automation logic without false positives.
Debounce and confidence threshold. The classifier may waver between "bedroom" and "living_room" in poor light. The pattern: a scene change counts only if one category dominates for 3 seconds straight with confidence > 0.7:
class SceneDebouncer(private val windowMs: Long = 3000) {
    private var currentScene: String? = null
    private var firstSeenAt: Long = 0
    private var fired = false

    fun process(scene: String, confidence: Float): String? {
        if (confidence < 0.7f) return null
        val now = System.currentTimeMillis()
        if (scene != currentScene) {
            // New candidate scene: restart the dominance window
            currentScene = scene
            firstSeenAt = now
            fired = false
            return null
        }
        // Fire once per scene change, not on every frame after the window elapses
        if (fired || now - firstSeenAt < windowMs) return null
        fired = true
        return scene
    }
}
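The window logic is easy to unit-test if the clock is injected instead of read from System.currentTimeMillis(). A sketch of the same debounce pattern with an injectable clock, plus a guard (an addition here) so a confirmed scene fires only once:

```kotlin
// Debounce with the clock passed in, so the 3-second window can be
// exercised in tests without sleeping.
class TestableSceneDebouncer(
    private val windowMs: Long = 3000,
    private val clock: () -> Long
) {
    private var currentScene: String? = null
    private var firstSeenAt: Long = 0
    private var fired = false

    fun process(scene: String, confidence: Float): String? {
        if (confidence < 0.7f) return null
        val now = clock()
        if (scene != currentScene) {
            currentScene = scene
            firstSeenAt = now
            fired = false
            return null
        }
        if (fired || now - firstSeenAt < windowMs) return null
        fired = true
        return scene
    }
}
```

Advancing a fake clock makes the behavior explicit: the first confident sighting starts the window, the scene is returned once the window elapses, and repeats after that are suppressed.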
IoT commands via MQTT or Matter. After the scene is confirmed, publish a command to the MQTT broker or send it through a Matter controller:
// MQTT (Eclipse Paho): QoS and retained are set on the message itself
val message = MqttMessage(
    """{"scene":"bedroom","timestamp":${System.currentTimeMillis()}}""".toByteArray()
).apply {
    qos = 1
    isRetained = false
}
mqttClient.publish("home/automation/scene", message)

// Matter (Android CHIP SDK via Google Home Mobile SDK): generated
// cluster wrappers send commands to a commissioned device
ChipClusters.OnOffCluster(devicePtr, /* endpointId = */ 1)
    .on(object : ChipClusters.DefaultClusterCallback {
        override fun onSuccess() { /* light turned on */ }
        override fun onError(error: Exception) { /* retry or surface the failure */ }
    })
Schedule and context. Scene automation should consider time of day: "bedroom" at 23:00 → dim the lights; "bedroom" at 7:00 → open the blinds. Context is added via a time-of-day filter in the automation rules at the app level.
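At the app level this can start as a plain lookup from (scene, hour) to an action. A minimal sketch: the scene ids and action names are illustrative, and real rules would come from user configuration rather than hardcoded ranges:

```kotlin
// Hypothetical rule table keyed by scene id and hour of day (0-23).
fun actionFor(scene: String, hour: Int): String? = when (scene) {
    "bedroom" -> when (hour) {
        in 22..23, in 0..5 -> "dim_lights"   // late evening / night
        in 6..9 -> "open_blinds"             // morning
        else -> null
    }
    "kitchen" -> if (hour in 6..9) "start_coffee_machine" else null
    else -> null
}
```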
Privacy: Camera in Home Context
An app with constant camera access is a red flag for users and for App Store/Google Play reviewers. The rules:
- Classification only when user explicitly enables "Scene Detection" mode
- No frames are saved; nothing leaves the device
- On iOS — NSCameraUsageDescription with an explicit explanation of local processing
- A privacy manifest on iOS 17+ with the NSPrivacyAccessedAPICategoryCamera declaration
App Store rejection under 4.3 (Spam) or for privacy violations stemming from opaque camera use is a real risk. The App Privacy Report description must be honest.
Stages and Timeline
- Audit requirements: target devices, IoT protocols (MQTT, Matter, Zigbee via a hub, HomeKit), the set of trigger scenes
- Develop the classification model: built-in or custom with fine-tuning
- Integrate with the MQTT broker or Matter SDK
- Implement debounce logic and automation rules
- Test in real conditions: different lighting, camera angles, mixed scenes
Basic recognition with 5–10 scenes and MQTT commands: 2–4 weeks. A custom ML model with fine-tuning plus full Matter/HomeKit integration: 2–3 months. Cost depends on the supported IoT protocols and the complexity of the automation logic.