AI object counting in mobile camera frame

NOVASOLUTIONS.TECHNOLOGY develops, supports, and maintains iOS, Android, and PWA mobile applications. We have extensive experience publishing mobile applications on popular marketplaces such as Google Play, the App Store, Amazon, AppGallery, and others.
Development and support of all types of mobile applications:
Information and entertainment mobile applications
News apps, games, reference guides, online catalogs, weather apps, fitness and health apps, travel apps, educational apps, social networks and messengers, quizzes, blogs and podcasts, forums, aggregators
E-commerce mobile applications
Online stores, B2B apps, marketplaces, online exchanges, cashback services, dropshipping platforms, loyalty programs, food and goods delivery, payment systems.
Business process management mobile applications
CRM systems, ERP systems, project management, sales team tools, financial management, production management, logistics and delivery management, HR management, data monitoring systems
Electronic services mobile applications
Classified ads platforms, online schools, online cinemas, electronic service platforms, cashback platforms, video hosting, thematic portals, online booking and scheduling platforms, online trading platforms

These are just some of the types of mobile applications we work with, and each of them may have its own specific features and functionality, tailored to the specific needs and goals of the client.


AI Object Counting from Camera Frame in Mobile Apps

Counting objects via camera seems simple but hides several non-trivial issues: overlapping objects, objects at different scales in one frame, and the main trap—double counting when the camera moves. An industrial warehouse, a herd of animals, coins on a table—each scenario has its own characteristics.

Two Approaches: Detection vs Density Estimation

Detection-based counting — YOLOv8 or RT-DETR detects each object; count = number of detections. Works with low density (up to 50–100 objects per frame) when objects don't overlap heavily.

Density map estimation — a CNN predicts a density map; count = integral of the map. Used for high density: crowds, grain in a bin, cells under a microscope. CSRNet, DM-Count, and Bayesian Loss (BL) are representative architectures.

import Vision

// Object categories the app can count
enum CountableObject {
    case vehicle, sparsePerson, crowd, grain, cell, shelfProduct
}

// iOS: method selection based on expected density
enum CountingStrategy {
    case detection(model: VNCoreMLModel)      // < 100 objects per frame
    case densityMap(model: VNCoreMLModel)     // > 100 objects per frame
    case hybrid                               // adaptive selection at runtime
}

class AdaptiveObjectCounter {

    // Pre-loaded Core ML models for each strategy
    private let vehicleDetector: VNCoreMLModel
    private let densityEstimator: VNCoreMLModel

    init(vehicleDetector: VNCoreMLModel, densityEstimator: VNCoreMLModel) {
        self.vehicleDetector = vehicleDetector
        self.densityEstimator = densityEstimator
    }

    func selectStrategy(for objectClass: CountableObject) -> CountingStrategy {
        switch objectClass {
        case .vehicle, .sparsePerson:
            return .detection(model: vehicleDetector)
        case .crowd, .grain, .cell:
            return .densityMap(model: densityEstimator)
        case .shelfProduct:
            return .hybrid
        }
    }
}
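The "hybrid" case above can be resolved at runtime. One plausible approach (a language-agnostic sketch in Python; the 100-object ceiling and the `select_strategy` helper are illustrative assumptions, not part of the app code): run a cheap detection pass first, and if the detector approaches its practical limit, switch to density-map estimation.

```python
# Hypothetical sketch: resolving the hybrid strategy from a quick first pass.
DETECTION_LIMIT = 100  # assumed practical ceiling for detection-based counting

def select_strategy(preliminary_count: int, limit: int = DETECTION_LIMIT) -> str:
    """Choose a counting method from a quick first-pass detection count."""
    if preliminary_count < limit * 0.8:
        return "detection"    # sparse scene: per-object boxes are reliable
    return "density_map"      # dense scene: the detector likely undercounts

print(select_strategy(12))   # sparse shelf -> detection
print(select_strategy(95))   # crowded frame -> density_map
```

The 0.8 margin is a guess at where detectors start to saturate; in practice it would be tuned per model and object class.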

Detection-Based: Implementation with Deduplication

class DetectionCounter {

    private let detectionModel: VNCoreMLModel

    init(detectionModel: VNCoreMLModel) {
        self.detectionModel = detectionModel
    }

    func count(in sampleBuffer: CMSampleBuffer,
               targetClass: String) async throws -> CountResult {
        guard let pixelBuffer = CMSampleBufferGetImageBuffer(sampleBuffer) else {
            throw CounterError.invalidFrame
        }

        let request = VNCoreMLRequest(model: detectionModel)
        request.imageCropAndScaleOption = .scaleFill

        try VNImageRequestHandler(cvPixelBuffer: pixelBuffer).perform([request])

        let observations = (request.results as? [VNRecognizedObjectObservation]) ?? []

        // Filter by class and confidence
        let targetObjects = observations.filter { obs in
            obs.labels.first?.identifier == targetClass &&
            obs.confidence >= 0.4
        }

        // NMS to eliminate duplicate bounding boxes
        let deduplicated = applyNMS(targetObjects, iouThreshold: 0.45)

        return CountResult(
            count: deduplicated.count,
            detections: deduplicated,
            confidence: deduplicated.map { $0.confidence }.average()
        )
    }

    private func applyNMS(_ observations: [VNRecognizedObjectObservation],
                          iouThreshold: Float) -> [VNRecognizedObjectObservation] {
        // Sort by confidence (descending), greedily keep non-overlapping boxes
        let sorted = observations.sorted { $0.confidence > $1.confidence }
        var kept: [VNRecognizedObjectObservation] = []

        for obs in sorted {
            let overlapping = kept.contains { existingObs in
                iou(obs.boundingBox, existingObs.boundingBox) > iouThreshold
            }
            if !overlapping { kept.append(obs) }
        }
        return kept
    }

    private func iou(_ a: CGRect, _ b: CGRect) -> Float {
        let intersection = a.intersection(b)
        guard !intersection.isNull else { return 0 }
        let interArea = intersection.width * intersection.height
        let unionArea = a.width * a.height + b.width * b.height - interArea
        return unionArea > 0 ? Float(interArea / unionArea) : 0
    }
}

extension Array where Element == Float {
    func average() -> Float {
        isEmpty ? 0 : reduce(0, +) / Float(count)
    }
}

Note that VNCoreMLRequest does not run non-maximum suppression for you unless the Core ML model itself includes an NMS layer. If the model outputs raw boxes, apply NMS manually; otherwise objects near crop boundaries can be counted twice.
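The IoU computation that NMS relies on is language-agnostic; a minimal sketch in Python (boxes as (x, y, width, height) tuples in normalized coordinates, mirroring Vision's boundingBox):

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

def nms(detections, iou_threshold=0.45):
    """Greedy NMS: keep the highest-confidence box, drop overlapping ones."""
    kept = []
    for box, conf in sorted(detections, key=lambda d: -d[1]):
        if all(iou(box, k[0]) <= iou_threshold for k in kept):
            kept.append((box, conf))
    return kept

# Two near-identical boxes on one object collapse to a single detection
dets = [((0.1, 0.1, 0.2, 0.2), 0.9),
        ((0.11, 0.1, 0.2, 0.2), 0.7),
        ((0.6, 0.6, 0.2, 0.2), 0.8)]
print(len(nms(dets)))  # 2
```

The 0.45 threshold trades off merging adjacent objects against keeping duplicates; tightly packed objects usually need a higher threshold.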

Density Map for High Density

// Android: density map estimation via TFLite
class DensityMapCounter(context: Context) {

    private val interpreter: Interpreter by lazy {
        val model = FileUtil.loadMappedFile(context, "csrnet_lite.tflite")
        Interpreter(model, Interpreter.Options().apply {
            addDelegate(GpuDelegate())
            setNumThreads(4)
        })
    }

    fun estimate(bitmap: Bitmap): Int {
        // Model input size is typically 512×512 or another multiple of 16
        val resized = Bitmap.createScaledBitmap(bitmap, 512, 512, true)
        val inputBuffer = TensorImage.fromBitmap(resized).buffer

        // Output tensor: the density map (here assumed at input resolution;
        // many architectures emit a downsampled map instead)
        val outputBuffer = TensorBuffer.createFixedSize(
            intArrayOf(1, 512, 512, 1), DataType.FLOAT32
        )

        interpreter.run(inputBuffer, outputBuffer.buffer)

        // The integral (sum) of the density map is the estimated count
        val densitySum = outputBuffer.floatArray.sum()

        return densitySum.roundToInt()
    }
}
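Why summing the map yields a count: in the training data, each annotated object is represented by a kernel that integrates to 1.0, so the integral over the whole map equals the number of objects. A toy illustration in Python (the 3×3 kernel and coordinates are made up for the example; a real model like CSRNet predicts the map from pixels):

```python
# Toy density map: each object contributes a kernel summing to 1.0,
# so the sum over the whole map equals the object count.

def place_kernel(density, cx, cy):
    """Add a tiny 3x3 kernel summing to 1.0 centered at (cx, cy)."""
    kernel = [[0.05, 0.1, 0.05],
              [0.1,  0.4, 0.1],
              [0.05, 0.1, 0.05]]  # weights sum to 1.0
    for dy in range(3):
        for dx in range(3):
            density[cy + dy - 1][cx + dx - 1] += kernel[dy][dx]

width, height = 16, 16
density = [[0.0] * width for _ in range(height)]
for x, y in [(3, 3), (8, 5), (12, 12)]:  # three objects
    place_kernel(density, x, y)

count = round(sum(sum(row) for row in density))
print(count)  # 3
```

This is also why a downsampled output map still works: as long as each object's kernel still integrates to 1.0 at the lower resolution, the sum is unchanged.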

Counting with Camera Motion: Tracking

If the user smoothly pans the camera (warehouse, auditorium), tracking is needed to avoid counting the same objects twice:

class TrackingObjectCounter {

    private var tracker = ByteTracker()  // BYTE tracking algorithm
    private var countedIds: Set<Int> = []  // unique IDs in session

    func processFrame(_ detections: [Detection]) -> TrackingCountResult {
        let tracks = tracker.update(detections: detections)

        // New IDs = new objects entering frame
        let newIds = tracks.map { $0.trackId }.filter { !countedIds.contains($0) }
        countedIds.formUnion(newIds)

        return TrackingCountResult(
            currentFrameCount: tracks.count,    // in frame now
            totalUniqueCount: countedIds.count  // total in session
        )
    }
}

ByteTrack is a strong choice for this task: it also associates low-confidence detections during matching, which keeps track IDs stable through brief occlusions.
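The dedup-by-tracking idea itself is simple and can be sketched without a full ByteTrack implementation (Python, with a naive greedy IoU matcher; `SimpleTracker` and its 0.3 threshold are illustrative assumptions): stable track IDs are assigned across frames, and the session total is the number of unique IDs ever seen.

```python
def iou(a, b):
    """Intersection-over-union of two (x, y, w, h) boxes."""
    ax2, ay2 = a[0] + a[2], a[1] + a[3]
    bx2, by2 = b[0] + b[2], b[1] + b[3]
    iw = max(0.0, min(ax2, bx2) - max(a[0], b[0]))
    ih = max(0.0, min(ay2, by2) - max(a[1], b[1]))
    inter = iw * ih
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0

class SimpleTracker:
    def __init__(self, iou_threshold=0.3):
        self.threshold = iou_threshold
        self.tracks = {}      # track id -> last seen box
        self.next_id = 0
        self.counted = set()  # unique ids seen this session

    def update(self, boxes):
        """Match frame detections to tracks; return total unique count."""
        new_tracks = {}
        for box in boxes:
            # match to the best-overlapping existing track, else start a new one
            best = max(self.tracks.items(),
                       key=lambda t: iou(box, t[1]), default=None)
            if best and iou(box, best[1]) >= self.threshold:
                tid = best[0]
                del self.tracks[tid]  # each track matches at most once
            else:
                tid = self.next_id
                self.next_id += 1
            new_tracks[tid] = box
            self.counted.add(tid)
        self.tracks = new_tracks
        return len(self.counted)

tracker = SimpleTracker()
tracker.update([(0.1, 0.1, 0.2, 0.2)])                         # one object
tracker.update([(0.12, 0.1, 0.2, 0.2), (0.6, 0.6, 0.2, 0.2)])  # it moved; one new
print(tracker.update([(0.14, 0.1, 0.2, 0.2)]))                 # total unique: 2
```

A production tracker additionally handles motion prediction (Kalman filtering), track death after missed frames, and confidence-aware matching, which is exactly what ByteTrack contributes.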

Timeline Estimates

Detection-based counting with a ready model (single object class) and counter UI takes 3–5 days. Adaptive system with detection + density map, tracking for camera motion, multiple object classes, and iOS + Android support requires 1–2 weeks.