AI 3D Model Generation from Photo in Mobile Apps
Generating a 3D object from a single photo is one of the most resource-intensive tasks in mobile AI. Classical approaches require dozens of photos (photogrammetry) or special hardware (LiDAR). Neural single-image generation became practical in 2024, but quality is still significantly limited when everything runs on-device.
Architectural Variants
Fully on-device: a lightweight model such as Depth Pro (Apple, 2024) estimates depth, which is then converted into a point cloud; a mobile edition of One-2-3-45 is another option. The result is a rough 3D structure, suitable for AR preview but not for professional export.
Hybrid: the depth map and object segmentation are computed on-device, and full 3D reconstruction runs on a server via Zero123++, One-2-3-45, or TripoSR. The server returns an .obj or .glb file.
LiDAR-augmented: iPhone 12 Pro and later Pro models, as well as iPad Pro (2020 and later), have a LiDAR scanner. ARKit exposes the real scene mesh through ARMeshAnchor. LiDAR mesh plus camera texture plus AI texture inpainting gives a good-quality result without a server.
On-Device: DepthPro for Initial Depth
Apple Depth Pro (2024) is a foundation model for metric depth estimation from a single image. Once converted to Core ML, it can be called like this:
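import CoreML
import UIKit
// DepthPro is the class Xcode generates from the converted model; the feature names "image" and "depth" depend on the conversion.
let model = try DepthPro(configuration: MLModelConfiguration())
guard let cgImage = sourceImage.cgImage,
      let constraint = model.model.modelDescription.inputDescriptionsByName["image"]?.imageConstraint
else { fatalError("Missing CGImage or image input constraint") }
// Input image → depth map (MLFeatureValue scales the image to the model's input size)
let inputImage = try MLFeatureValue(cgImage: cgImage, constraint: constraint, options: nil)
let prediction = try model.model.prediction(from: MLDictionaryFeatureProvider(dictionary: ["image": inputImage]))
let depthArray = prediction.featureValue(for: "depth")?.multiArrayValue // metric depth in meters, shape [1, H, W]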
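Depth map → point cloud: for each pixel (x, y) with known depth Z, compute the 3D coordinate with the pinhole camera model: X = (x - cx) * Z / f, Y = (y - cy) * Z / f, where (cx, cy) is the principal point (usually the image center) and f is the focal length in pixels. EXIF stores the focal length in millimeters, so it has to be converted to pixels using the sensor size. The result is a point cloud.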
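The unprojection described above can be sketched as follows. This is a minimal illustration, not the pipeline's actual code: it assumes a row-major depth array in meters and a principal point supplied by the caller. The function names `unproject` and `focalLengthInPixels` and all parameter names are hypothetical.

```swift
/// Unprojects a row-major depth map (meters) into camera-space points
/// with the pinhole model: X = (u - cx) * Z / fx, Y = (v - cy) * Z / fy.
/// Assumption: fx, fy are focal lengths in pixels; (cx, cy) is the principal point.
func unproject(depth: [Float], width: Int, height: Int,
               fx: Float, fy: Float, cx: Float, cy: Float) -> [SIMD3<Float>] {
    precondition(depth.count == width * height, "depth must hold width * height values")
    var points: [SIMD3<Float>] = []
    points.reserveCapacity(depth.count)
    for v in 0..<height {
        for u in 0..<width {
            let z = depth[v * width + u]
            // Skip invalid samples (zero, negative, NaN or infinite depth).
            guard z.isFinite, z > 0 else { continue }
            let x = (Float(u) - cx) * z / fx
            let y = (Float(v) - cy) * z / fy
            points.append(SIMD3<Float>(x, y, z))
        }
    }
    return points
}

/// Converts an EXIF focal length in millimeters to pixels.
/// The sensor width is an assumed input; it is not always present in EXIF.
func focalLengthInPixels(focalMM: Float, sensorWidthMM: Float, imageWidth: Int) -> Float {
    focalMM / sensorWidthMM * Float(imageWidth)
}
```

The points come out in image conventions (+Y down, +Z away from the camera). RealityKit uses +Y up with the camera looking along -Z, so a point (x, y, z) would typically be displayed as (x, -y, -z).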
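To visualize the point cloud in AR with RealityKit, build a ModelEntity from a custom MeshDescriptor. MeshDescriptor supports only triangle and polygon primitives, so each point is drawn as a tiny triangle:
import RealityKit
let s: Float = 0.002 // triangle size in meters
var descriptor = MeshDescriptor(name: "pointCloud")
// points is [SIMD3<Float>]; each point expands to three vertices (single-sided, facing +Z)
descriptor.positions = MeshBuffers.Positions(points.flatMap { p in
    [p, p + SIMD3<Float>(s, 0, 0), p + SIMD3<Float>(0, s, 0)]
})
descriptor.primitives = .triangles(Array(0..<UInt32(points.count * 3)))
let mesh = try MeshResource.generate(from: [descriptor])
let entity = ModelEntity(mesh: mesh, materials: [UnlitMaterial(color: .white)])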
This is a point cloud, not a full 3D model with a mesh: it works visually for a demo but needs meshing before export.
Meshing: Poisson or Marching Cubes
Point cloud → polygonal mesh via the Poisson Surface Reconstruction algorithm. On mobile this runs through Open3D (a C++ library called through an Objective-C++ bridge) or custom Metal compute shaders. Poisson reconstruction needs a consistently oriented normal at each point; normals are estimated from each point's local neighborhood via PCA.
This is non-trivial on mobile: Open3D compiled for iOS/Android adds roughly 15 MB of binary, requires C++17, and should run on a background thread. The result is an .obj file with the mesh.
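When the points come straight from a depth map, they form a regular grid, and normals can be approximated without a neighbor search. The sketch below is a cheaper substitute for the PCA estimate mentioned above, not an implementation of it: it takes the cross product of horizontal and vertical neighbor differences. It assumes one valid point per pixel in row-major order and a camera at the origin; `gridNormals` is a hypothetical name.

```swift
/// Approximates per-point normals for an organized point grid
/// (one point per depth pixel, row-major, camera at the origin).
func gridNormals(points: [SIMD3<Float>], width: Int, height: Int) -> [SIMD3<Float>] {
    precondition(points.count == width * height, "expected one point per pixel")
    func cross(_ a: SIMD3<Float>, _ b: SIMD3<Float>) -> SIMD3<Float> {
        SIMD3<Float>(a.y * b.z - a.z * b.y,
                     a.z * b.x - a.x * b.z,
                     a.x * b.y - a.y * b.x)
    }
    var normals = [SIMD3<Float>](repeating: .zero, count: points.count)
    for v in 0..<height {
        for u in 0..<width {
            // Neighbor differences, clamped at the image borders.
            let left = points[v * width + max(u - 1, 0)]
            let right = points[v * width + min(u + 1, width - 1)]
            let up = points[max(v - 1, 0) * width + u]
            let down = points[min(v + 1, height - 1) * width + u]
            var n = cross(right - left, down - up)
            let length = (n * n).sum().squareRoot()
            // Degenerate neighborhood: leave the normal at zero.
            guard length > 0 else { continue }
            n /= length
            // Flip so the normal points back toward the camera;
            // Poisson reconstruction needs a consistent orientation.
            let p = points[v * width + u]
            if (n * p).sum() > 0 { n = -n }
            normals[v * width + u] = n
        }
    }
    return normals
}
```

Normals computed this way are noisy at depth discontinuities (object silhouettes), which is where a PCA estimate over a larger neighborhood tends to behave better.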
LiDAR Path: ARKit ARMeshAnchor
On an iPhone with LiDAR, the most reliable path is ARKit scene reconstruction:
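import ARKit

let configuration = ARWorldTrackingConfiguration()
// Scene reconstruction is only supported on devices with LiDAR
if ARWorldTrackingConfiguration.supportsSceneReconstruction(.meshWithClassification) {
    configuration.sceneReconstruction = .meshWithClassification
}

// In the ARSessionDelegate. This fires repeatedly while the mesh is refined,
// so a real app would usually export once, after scanning finishes.
func session(_ session: ARSession, didUpdate anchors: [ARAnchor]) {
    for anchor in anchors.compactMap({ $0 as? ARMeshAnchor }) {
        let geometry = anchor.geometry
        // geometry.vertices, geometry.faces, geometry.normals: a ready mesh
        // exportMesh is the app's own export routine
        exportMesh(geometry: geometry, transform: anchor.transform)
    }
}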
ARMeshAnchor.geometry.vertices is an ARGeometrySource backed by a Metal buffer. Export to .obj:
import ARKit

func exportToOBJ(geometry: ARMeshGeometry, transform: simd_float4x4) -> String {
    var obj = ""
    let vertices = geometry.vertices
    // Vertices are packed float3 values (12-byte stride), so read them with the
    // source's own offset and stride instead of rebinding to SIMD3<Float> (16-byte stride).
    let base = vertices.buffer.contents()
    for i in 0..<vertices.count {
        let pointer = base.advanced(by: vertices.offset + vertices.stride * i)
        let v = pointer.assumingMemoryBound(to: (Float, Float, Float).self).pointee
        // Anchor-local → world coordinates
        let world = transform * SIMD4<Float>(v.0, v.1, v.2, 1)
        obj += "v \(world.x) \(world.y) \(world.z)\n"
    }
    // Similarly for faces (indices)
    return obj
}
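For the faces, the OBJ side of the work can be sketched independently of ARKit. The helper below is a hypothetical illustration that assumes the indices have already been copied out of geometry.faces (an ARGeometryElement, which exposes buffer, count, indexCountPerPrimitive and bytesPerIndex) into a plain array. `objFaceLines` and `vertexOffset` are names invented for this example.

```swift
/// Formats mesh indices as OBJ face lines.
/// OBJ indices are 1-based, while mesh index buffers are 0-based.
/// vertexOffset is the number of vertices already written to the file,
/// which matters when several anchors are merged into one OBJ.
func objFaceLines(indices: [UInt32], indicesPerFace: Int = 3, vertexOffset: Int = 0) -> String {
    precondition(indicesPerFace > 0 && indices.count % indicesPerFace == 0,
                 "index count must be a multiple of indicesPerFace")
    var lines = ""
    for start in stride(from: 0, to: indices.count, by: indicesPerFace) {
        let face = indices[start..<start + indicesPerFace]
            .map { String(Int($0) + 1 + vertexOffset) }
        lines += "f " + face.joined(separator: " ") + "\n"
    }
    return lines
}
```

Each ARMeshAnchor numbers its vertices from zero, so when the vertices of several anchors are appended to the same file, the running vertex count has to be passed as vertexOffset for every anchor after the first.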
Mesh texturing means projecting the video frame onto the mesh via UV mapping. This is a separate task; without it the mesh stays gray.
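The core of that projection is the pinhole model applied per vertex. The sketch below is a simplified illustration under stated assumptions: the point is already in camera space with image conventions (+Y down, +Z in front of the camera), and the intrinsics are in pixels. `textureCoordinate` is a hypothetical helper; inside an ARKit session, ARCamera's projectPoint(_:orientation:viewportSize:) does the equivalent work for world-space points.

```swift
/// Projects a camera-space point to normalized texture coordinates in [0, 1].
/// Returns nil when the point is behind the camera or outside the frame.
func textureCoordinate(for point: SIMD3<Float>,
                       fx: Float, fy: Float, cx: Float, cy: Float,
                       imageWidth: Float, imageHeight: Float) -> SIMD2<Float>? {
    guard point.z > 0 else { return nil }
    // Pinhole projection to pixel coordinates, then normalization.
    let u = (fx * point.x / point.z + cx) / imageWidth
    let v = (fy * point.y / point.z + cy) / imageHeight
    guard (0...1).contains(u), (0...1).contains(v) else { return nil }
    return SIMD2<Float>(u, v)
}
```

OBJ texture coordinates have their origin at the bottom-left corner, so the value written to a vt line would be (u, 1 - v). Vertices that return nil are not visible in this frame and need another frame or inpainting.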
Server Generation: TripoSR and Zero123++
For high quality without LiDAR, use a server pipeline. TripoSR (Stability AI and Tripo AI, 2024) takes one photo and generates a mesh in about 0.5–1 second on a server GPU (the authors report under 0.5 seconds on an NVIDIA A100). API:
func generateModel(from image: UIImage) async throws -> URL {
    guard let imageData = image.jpegData(compressionQuality: 0.9) else {
        throw URLError(.cannotDecodeContentData)
    }
    // The boundary must be unique and must not occur in the payload
    let boundary = "Boundary-\(UUID().uuidString)"
    var request = URLRequest(url: URL(string: "https://api.example.com/triposr")!)
    request.httpMethod = "POST"
    request.setValue("multipart/form-data; boundary=\(boundary)", forHTTPHeaderField: "Content-Type")
    // ... upload + poll
}
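The request body for such an upload can be sketched as below. This is an illustration only: the endpoint above is a placeholder, and the field name "image", the file name and the MIME type are assumptions about what a real server would expect. `multipartBody` is a hypothetical helper.

```swift
import Foundation

/// Builds a multipart/form-data body containing a single file part.
/// Field name, file name and MIME type are assumptions about the endpoint.
func multipartBody(fileData: Data, boundary: String,
                   fieldName: String = "image",
                   fileName: String = "photo.jpg",
                   mimeType: String = "image/jpeg") -> Data {
    var body = Data()
    // Part header: opening boundary, disposition and content type.
    body.append(Data("--\(boundary)\r\n".utf8))
    body.append(Data("Content-Disposition: form-data; name=\"\(fieldName)\"; filename=\"\(fileName)\"\r\n".utf8))
    body.append(Data("Content-Type: \(mimeType)\r\n\r\n".utf8))
    // Raw file bytes, followed by the closing boundary.
    body.append(fileData)
    body.append(Data("\r\n--\(boundary)--\r\n".utf8))
    return body
}
```

The boundary passed here has to match the one in the Content-Type header. The resulting data can be assigned to the request's httpBody or sent with URLSession's upload(for:from:).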
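The result is a .glb file. RealityKit loads USD-based formats (.usdz, .reality) rather than glTF, so either have the server return a .usdz as well or convert the file before loading. A .usdz downloaded to a local file URL loads via Entity.loadModel(contentsOf: url).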
AR Preview of Result
Every variant ends the same way: the 3D object is shown in AR via RealityKit (ARView) or SceneKit (ARSCNView). The user can "place" the object on a real surface, rotate it, and scale it. This covers scenarios such as "see how furniture looks in the room" or "show a product in AR".
Export: .usdz for iOS (native Apple format, supports AR Quick Look), .glb for Android and web.
Process
Choose the architecture for the task (LiDAR, on-device depth, or server), then implement the capture and processing pipeline, the AR preview, and export to the required formats. Separately, test on difficult objects: glass surfaces, thin details, and monotone colors.
Timeline Estimates
LiDAR-based scanning with iOS export takes 3–5 weeks. Full pipeline with on-device depth + server reconstruction + AR preview on both platforms requires 8–14 weeks.







