AI object tracking in mobile app video stream

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications to popular stores such as Google Play, App Store, Amazon, AppGallery, and others.

AI Object Tracking in Video Streams for Mobile Apps

Object tracking is a separate task from detection. A detector says "there's a car here" on each frame independently. A tracker says "this is the same car #7 that was on the left in the previous frame." A typical failure of naive approaches is losing object identity: the object leaves the frame (or is briefly occluded), returns, and the tracker assigns it a new ID.
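To make the difference concrete, here is a minimal sketch of the simplest possible tracker: it carries an ID from one frame to the next by bounding-box overlap (IoU). Box and NaiveIouTracker are hypothetical names used only for this illustration, and the 0.3 overlap threshold is an assumption. The last two calls reproduce the identity loss described above: after one frame without a detection, the same object comes back under a new ID.

```kotlin
// Axis-aligned box in pixel coordinates (hypothetical type for this sketch).
data class Box(val left: Float, val top: Float, val right: Float, val bottom: Float) {
    val area: Float get() = maxOf(0f, right - left) * maxOf(0f, bottom - top)
}

// Intersection over union of two boxes: 0 = no overlap, 1 = identical.
fun iou(a: Box, b: Box): Float {
    val w = minOf(a.right, b.right) - maxOf(a.left, b.left)
    val h = minOf(a.bottom, b.bottom) - maxOf(a.top, b.top)
    if (w <= 0f || h <= 0f) return 0f
    val inter = w * h
    return inter / (a.area + b.area - inter)
}

// Each detection inherits the ID of the best-overlapping box from the previous
// frame, or gets a fresh ID when nothing overlaps enough. There is no memory
// beyond one frame, which is exactly why identities get lost.
class NaiveIouTracker(private val iouThreshold: Float = 0.3f) {
    private var nextId = 1
    private var previous: Map<Int, Box> = emptyMap()

    fun update(detections: List<Box>): Map<Int, Box> {
        val free = previous.toMutableMap()
        val current = mutableMapOf<Int, Box>()
        for (det in detections) {
            val best = free.maxByOrNull { iou(it.value, det) }
            val id = if (best != null && iou(best.value, det) >= iouThreshold) {
                val key = best.key
                free.remove(key)
                key
            } else {
                nextId++
            }
            current[id] = det
        }
        previous = current
        return current
    }
}

fun main() {
    val tracker = NaiveIouTracker()
    println(tracker.update(listOf(Box(0f, 0f, 10f, 10f))).keys)  // first frame: ID 1
    println(tracker.update(listOf(Box(2f, 0f, 12f, 10f))).keys)  // small shift: still ID 1
    println(tracker.update(emptyList()).keys)                     // detection missed
    println(tracker.update(listOf(Box(2f, 0f, 12f, 10f))).keys)  // object is back: ID 2
}
```

Real trackers avoid this by keeping unmatched tracks alive for several frames and predicting where they should reappear.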

Classification of Tracking Tasks

SOT (Single Object Tracking) — tracking one selected object. User taps an object → app follows it. Applications: sports broadcasts, tracking a specific person in frame. Algorithms: SiamFC, OSTrack, STARK.

MOT (Multi-Object Tracking) — simultaneous tracking of all objects of a target class. Applications: visitor counting, traffic control, production conveyors. Algorithms: SORT, ByteTrack, StrongSORT, OC-SORT.

MOT: Detector + Tracker Pipeline

Standard pipeline for mobile:

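// iOS: YOLOv8 detection + SORT tracking
// YOLOv8Detector, SORTTracker, Detection and TrackedObject are app-defined types (not shown)
import CoreVideo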
class MultiObjectTracker {

    private let detector: YOLOv8Detector
    private let tracker: SORTTracker

    // SORT parameters—important to tune for your task
    init(targetClass: String,
         maxAge: Int = 10,          // frames without detection before track removal
         minHits: Int = 3,          // frames of detection to confirm track
         iouThreshold: Float = 0.3) {
        self.detector = YOLOv8Detector(targetClass: targetClass)
        self.tracker = SORTTracker(maxAge: maxAge,
                                   minHits: minHits,
                                   iouThreshold: iouThreshold)
    }

    func processFrame(_ pixelBuffer: CVPixelBuffer) async -> [TrackedObject] {
        // 1. Detection on current frame
        let detections = await detector.detect(pixelBuffer)

        // 2. Tracker update
        let tracks = tracker.update(detections: detections.map { det in
            Detection(bbox: det.boundingBox, confidence: det.confidence)
        })

        // 3. Convert to TrackedObject
        return tracks.map { track in
            TrackedObject(
                id: track.trackId,
                boundingBox: track.bbox,
                isConfirmed: track.hitStreak >= tracker.minHits,
                velocity: track.kalmanFilter.velocity  // from Kalman state
            )
        }
    }
}

maxAge = 10 means a track lives for 10 frames without detection (object behind obstacle). At 30 FPS, this is 333 ms—sufficient for most brief occlusions.
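The frame-to-time arithmetic and the maxAge/minHits bookkeeping can be sketched as follows. This is an illustration under assumptions, not the SORT implementation itself: maxAgeFor, maxAgeToMs and TrackLifecycle are hypothetical names, and the rule "expire once more than maxAge frames have passed without a match" is one common convention.

```kotlin
import kotlin.math.ceil

// Frames a track must survive without detections to bridge an occlusion
// of the given duration at the given frame rate.
fun maxAgeFor(occlusionMs: Int, fps: Int): Int = ceil(occlusionMs * fps / 1000.0).toInt()

// How long a given maxAge lasts in wall-clock time (whole milliseconds).
fun maxAgeToMs(maxAge: Int, fps: Int): Int = maxAge * 1000 / fps

// Per-track counters that drive confirmation and removal.
class TrackLifecycle(private val maxAge: Int = 10, private val minHits: Int = 3) {
    var hitStreak = 0
        private set
    var framesSinceUpdate = 0
        private set

    // Confirmed after minHits consecutive matched frames.
    val isConfirmed: Boolean get() = hitStreak >= minHits

    // Expired once more than maxAge frames passed without a match.
    val isExpired: Boolean get() = framesSinceUpdate > maxAge

    fun onFrame(matched: Boolean) {
        if (matched) {
            hitStreak++
            framesSinceUpdate = 0
        } else {
            hitStreak = 0
            framesSinceUpdate++
        }
    }
}

fun main() {
    println(maxAgeToMs(maxAge = 10, fps = 30))     // 333
    println(maxAgeFor(occlusionMs = 500, fps = 30)) // 15

    val track = TrackLifecycle()
    repeat(3) { track.onFrame(matched = true) }
    println(track.isConfirmed)                      // true
    repeat(11) { track.onFrame(matched = false) }
    println(track.isExpired)                        // true
}
```

If the camera frame rate changes (for example, 60 FPS on newer devices), recompute maxAge from the occlusion duration you want to tolerate rather than keeping a fixed frame count.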

ByteTrack: Better Than SORT for Occlusions

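SORT pipelines typically keep only high-confidence detections and discard the rest before association. ByteTrack keeps the low-confidence detections as well and uses them in a second association pass against tracks that stayed unmatched in the first pass. A partially occluded object often still produces a weak detection, so this markedly reduces track loss during occlusions: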

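// Android: ByteTrack association (simplified)
// Detection, STrack and linearAssignment() are app-defined (not shown). Assumptions:
// linearAssignment returns Triple(matches, unmatchedTracks, unmatchedDetections),
// each match exposes .track and .detection, and STrack has update(detection).
class ByteTracker(
    private val lowThresh: Float = 0.1f,       // below this, detections are discarded
    private val highThresh: Float = 0.5f,      // boundary between low and high confidence
    private val newTrackThresh: Float = 0.6f,  // minimum confidence to start a new track
    private val matchThresh: Float = 0.8f      // association threshold for the first pass
) {
    private val trackedStracks = mutableListOf<STrack>()
    private val lostStracks = mutableListOf<STrack>()

    fun update(detections: List<Detection>): List<STrack> {
        // Split detections into high/low confidence
        val highDetections = detections.filter { it.confidence >= highThresh }
        val lowDetections = detections.filter {
            it.confidence >= lowThresh && it.confidence < highThresh
        }

        // 1. Associate high-confidence detections with active tracks
        val (matches1, unmatchedTracks1, unmatchedDets1) =
            linearAssignment(trackedStracks, highDetections, matchThresh)

        // 2. Associate low-confidence detections with tracks left unmatched in step 1
        val (matches2, unmatchedTracks2, _) =
            linearAssignment(unmatchedTracks1, lowDetections, 0.5f)

        // 3. Update every matched track with its new detection
        val matched = matches1 + matches2
        matched.forEach { it.track.update(it.detection) }

        // 4. Tracks unmatched in both passes become lost; a full implementation
        //    retries them on later frames and drops them after maxAge frames
        lostStracks += unmatchedTracks2

        // 5. Start new tracks only from confident, unassociated detections
        val newTracks = unmatchedDets1
            .filter { it.confidence >= newTrackThresh }
            .map { STrack(it) }

        // Persist state for the next frame
        trackedStracks.clear()
        trackedStracks += matched.map { it.track } + newTracks
        return trackedStracks.toList()
    }
}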
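The linearAssignment helper is not shown in the listing above. An optimal solver such as the Hungarian algorithm is the usual choice for this step; the sketch below swaps in a simpler greedy matcher over a cost matrix (for example, cost = 1 - IoU), which is easier to read but can return a suboptimal pairing in crowded scenes. Match, Assignment and greedyAssignment are hypothetical names for this illustration, and indices are used instead of track objects to keep the block self-contained.

```kotlin
data class Match(val trackIndex: Int, val detectionIndex: Int, val cost: Float)

data class Assignment(
    val matches: List<Match>,
    val unmatchedTracks: List<Int>,
    val unmatchedDetections: List<Int>
)

// Greedy assignment over a cost matrix (rows = tracks, columns = detections).
// Pairs are accepted in order of increasing cost; pairs above maxCost are rejected.
fun greedyAssignment(cost: Array<FloatArray>, numDetections: Int, maxCost: Float): Assignment {
    // Every (track, detection) pair that passes the gate, cheapest first
    val candidates = cost.indices
        .flatMap { t -> (0 until numDetections).map { d -> Match(t, d, cost[t][d]) } }
        .filter { it.cost <= maxCost }
        .sortedBy { it.cost }

    val usedTracks = mutableSetOf<Int>()
    val usedDets = mutableSetOf<Int>()
    val matches = mutableListOf<Match>()
    for (c in candidates) {
        // Each track and each detection can be used at most once
        if (c.trackIndex in usedTracks || c.detectionIndex in usedDets) continue
        usedTracks += c.trackIndex
        usedDets += c.detectionIndex
        matches += c
    }
    return Assignment(
        matches,
        unmatchedTracks = cost.indices.filter { it !in usedTracks },
        unmatchedDetections = (0 until numDetections).filter { it !in usedDets }
    )
}

fun main() {
    // Two tracks, three detections; cost = 1 - IoU
    val cost = arrayOf(
        floatArrayOf(0.1f, 0.9f, 0.7f),
        floatArrayOf(0.6f, 0.2f, 0.95f)
    )
    // Track 0 pairs with detection 0, track 1 with detection 1; detection 2 stays unmatched
    println(greedyAssignment(cost, numDetections = 3, maxCost = 0.8f))
}
```

In a two-pass scheme, the unmatched track indices from the first call become the rows of the second cost matrix, built against the low-confidence detections.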

SOT: Tap-to-Track

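// iOS: user selects object with tap, app follows
import Vision
import CoreVideo

protocol SingleObjectTrackerDelegate: AnyObject {
    func didUpdateTracking(boundingBox: CGRect, confidence: Float)
}

class SingleObjectTracker {

    weak var delegate: SingleObjectTrackerDelegate?

    // Use Vision VNTrackObjectRequest. Tracking requests must go through one
    // VNSequenceRequestHandler that is reused for every frame of the sequence.
    private let sequenceHandler = VNSequenceRequestHandler()
    private var trackingRequest: VNTrackObjectRequest?

    // `point` is the tap location in Vision's normalized coordinates
    // (0...1 on both axes, origin at the bottom-left corner)
    func initializeTracking(at point: CGPoint, in frame: CVPixelBuffer) {
        let side: CGFloat = 0.1  // initial box: 10% of the frame, centered on the tap
        let observation = VNDetectedObjectObservation(
            boundingBox: CGRect(x: point.x - side / 2, y: point.y - side / 2,
                                width: side, height: side)
        )

        let request = VNTrackObjectRequest(
            detectedObjectObservation: observation
        ) { [weak self] request, _ in
            guard let obs = request.results?.first as? VNDetectedObjectObservation else { return }
            // Feed the latest observation back so the next frame continues from it
            (request as? VNTrackObjectRequest)?.inputObservation = obs
            self?.delegate?.didUpdateTracking(boundingBox: obs.boundingBox,
                                              confidence: obs.confidence)
        }

        request.trackingLevel = .accurate  // vs .fast
        trackingRequest = request
    }

    func trackInFrame(_ pixelBuffer: CVPixelBuffer) {
        guard let request = trackingRequest else { return }
        try? sequenceHandler.perform([request], on: pixelBuffer)
    }
}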

trackingLevel = .accurate selects a heavier tracker that favors accuracy over speed (reportedly correlation-based, versus optical flow for .fast). As a rough, device-dependent guide: .fast runs at 50+ FPS but loses the track on fast motion; .accurate runs at 20–30 FPS and is more robust to fast-moving objects. Choose based on your task.

Track Rendering

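// Android: Jetpack Compose overlay
// TrackedObject, generateTrackColors, toScreenRect and toOffset are app-defined (not shown)
import android.graphics.Paint
import androidx.compose.foundation.Canvas
import androidx.compose.runtime.Composable
import androidx.compose.runtime.remember
import androidx.compose.ui.Modifier
import androidx.compose.ui.geometry.Size
import androidx.compose.ui.graphics.drawscope.Stroke
import androidx.compose.ui.graphics.drawscope.drawIntoCanvas
import androidx.compose.ui.graphics.nativeCanvas
import androidx.compose.ui.graphics.toArgb

@Composable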
fun TrackingOverlay(
    tracks: List<TrackedObject>,
    imageSize: Size,
    modifier: Modifier = Modifier
) {
    val colors = remember { generateTrackColors(maxTracks = 100) }

    Canvas(modifier = modifier) {
        tracks.forEach { track ->
            val color = colors[track.id % colors.size]
            val rect = track.boundingBox.toScreenRect(imageSize, size)

            // Bounding box
            drawRect(color = color, topLeft = rect.topLeft,
                     size = rect.size, style = Stroke(width = 3f))

            // ID badge
            drawIntoCanvas { canvas ->
                canvas.nativeCanvas.drawText(
                    "ID: ${track.id}",
                    rect.left + 4f,
                    rect.top + 20f,
                    Paint().apply { this.color = color.toArgb(); textSize = 32f }
                )
            }

            // Velocity vector (optional)
            if (track.velocity != null) {
                drawLine(
                    color = color.copy(alpha = 0.6f),
                    start = rect.center,
                    end = rect.center + track.velocity.toOffset(scale = 20f),
                    strokeWidth = 2f
                )
            }
        }
    }
}
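The overlay relies on a toScreenRect helper that the listing does not define. One possible version is sketched below, assuming bounding boxes arrive in image pixel coordinates and the camera preview is scaled to fill the canvas with a center crop. RectF and SizeF here are plain stand-in types so the block runs without Android or Compose; rotation and front-camera mirroring are deliberately ignored.

```kotlin
// Stand-in geometry types (hypothetical; a Compose app would use Rect and Size).
data class SizeF(val width: Float, val height: Float)
data class RectF(val left: Float, val top: Float, val right: Float, val bottom: Float)

// Maps a box from image pixel coordinates to canvas coordinates, assuming the
// preview is scaled uniformly to fill the canvas and cropped around the center.
fun RectF.toScreenRect(imageSize: SizeF, canvasSize: SizeF): RectF {
    // Fill scaling uses the larger of the two axis ratios
    val scale = maxOf(
        canvasSize.width / imageSize.width,
        canvasSize.height / imageSize.height
    )
    // Offset of the scaled image relative to the canvas (negative on the cropped axis)
    val dx = (canvasSize.width - imageSize.width * scale) / 2f
    val dy = (canvasSize.height - imageSize.height * scale) / 2f
    return RectF(
        left * scale + dx,
        top * scale + dy,
        right * scale + dx,
        bottom * scale + dy
    )
}

fun main() {
    val image = SizeF(640f, 480f)
    val canvas = SizeF(1080f, 1920f)
    // The box starts at the image center, so it should start at the canvas center (540, 960)
    println(RectF(320f, 240f, 480f, 360f).toScreenRect(image, canvas))
}
```

If the preview uses fit scaling (letterboxing) instead of fill, replace maxOf with minOf; the offsets then become the letterbox margins.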

Timeline Estimates

SOT (Vision VNTrackObjectRequest) with tap for object selection takes 2–3 days. MOT with YOLOv8 + ByteTrack, track rendering, multiple object classes, and iOS + Android support requires 1–2 weeks.