Document Upload and Indexing for RAG in Mobile Applications
A user attaches a PDF from Files.app or the gallery, hits Upload, and within seconds can ask questions about the document. Behind those seconds lies a pipeline: upload, parsing, chunking, embedding creation, and a write to the vector DB. Each stage has its own bottlenecks.
File Upload from Mobile: Technical Details
Android. ActivityResultContracts.GetContent() with "application/pdf" or "*/*" is the right approach on Android 13+. You get a content:// Uri. For a server upload, you need an InputStream:
```kotlin
val uri: Uri = // from ActivityResult
val bytes = contentResolver.openInputStream(uri)?.use { it.readBytes() }
    ?: throw IOException("Failed to open file")

// Multipart upload via OkHttp
val requestBody = MultipartBody.Builder()
    .setType(MultipartBody.FORM)
    .addFormDataPart(
        "file", filename,
        bytes.toRequestBody("application/octet-stream".toMediaType())
    )
    .build()
```
For large files (50+ MB), use a chunked upload. Don't read the entire file into a ByteArray at once: on devices with 2 GB RAM this causes an OutOfMemoryError. Stream the InputStream directly through OkHttp by overriding RequestBody.writeTo.
iOS. UIDocumentPickerViewController with UTType.pdf, UTType.plainText, etc. You get a file:// URL. For the upload:
```swift
let data = try Data(contentsOf: fileURL)

// For large files, upload from a file or stream instead
var request = URLRequest(url: uploadEndpoint)
request.httpMethod = "POST"
let (_, response) = try await URLSession.shared.upload(for: request, from: data)
```
Data(contentsOf:) for files over 20 MB is a bad idea on iOS. Use URLSession.shared.uploadTask(with:fromFile:) instead: it reads the file in chunks without loading it all into memory.
Upload Progress. URLSession exposes task progress on iOS; with OkHttp on Android, wrap RequestBody.writeTo in a counting sink. A progress bar during document upload is mandatory: the user must understand what's happening with their 10 MB file.
Server Pipeline: From File to Vectors
After receiving the file, the backend runs an async pipeline. The synchronous response to the client is {"job_id": "abc123", "status": "processing"}; the client polls for status or receives a push.
```python
# FastAPI + Celery task
@app.post("/api/documents")
async def upload_document(file: UploadFile, user_id: str = Depends(get_user_id)):
    # Save file
    file_path = save_to_storage(await file.read(), file.filename)
    # Start async processing
    job = process_document.delay(file_path, user_id, file.content_type)
    return {"job_id": job.id, "status": "processing"}

@celery.task
def process_document(file_path: str, user_id: str, content_type: str):
    # 1. Parsing
    text = extract_text(file_path, content_type)
    # 2. Chunking
    chunks = split_into_chunks(text, chunk_size=500, overlap=50)
    # 3. Embeddings batch
    embeddings = create_embeddings_batch(chunks)
    # 4. Upsert to vector DB
    upsert_to_vector_store(chunks, embeddings, user_id)
    # 5. Update status
    update_document_status(file_path, "completed")
```
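The split_into_chunks helper in the task above can be sketched as a sliding window; a minimal version, counting chunk_size and overlap in words for simplicity (a production version would count tokens of the embedding model):

```python
def split_into_chunks(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Sliding-window chunking over whitespace-separated words.

    Consecutive chunks share `overlap` words so that sentences cut at
    a boundary still appear intact in at least one chunk.
    """
    words = text.split()
    if not words:
        return []
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks
```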
Document Parsing
| Format | Tool | Notes |
|---|---|---|
| PDF (text) | PyMuPDF (fitz) | Fast, preserves structure |
| PDF (scanned) | Tesseract + pdf2image | Slow, needs OCR |
| DOCX | python-docx | Without images |
| TXT / MD | Native | Trivial |
| HTML | BeautifulSoup | Need tag cleanup |
| XLSX | openpyxl | Tables → text per row |
PyMuPDF is the best choice for PDF: roughly 10x faster than PyPDF2, handles Cyrillic correctly, and preserves font information (useful for heading detection).
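The extract_text step from the pipeline can dispatch on content type along the lines of the table above. A sketch with only the trivial text path implemented; the commented entries are placeholders a real implementation would back with PyMuPDF and python-docx:

```python
from pathlib import Path

def parse_plain_text(path: str) -> str:
    # TXT / MD: no parsing needed, just decode
    return Path(path).read_text(encoding="utf-8", errors="replace")

# MIME type -> parser function
PARSERS = {
    "text/plain": parse_plain_text,
    "text/markdown": parse_plain_text,
    # "application/pdf": parse_pdf,   # PyMuPDF: fitz.open(path), page.get_text()
    # "application/vnd.openxmlformats-officedocument.wordprocessingml.document":
    #     parse_docx,                 # python-docx
}

def extract_text(file_path: str, content_type: str) -> str:
    try:
        parser = PARSERS[content_type]
    except KeyError:
        raise ValueError(f"Unsupported content type: {content_type}")
    return parser(file_path)
```

Rejecting unknown types explicitly keeps a bad upload from silently producing an empty, unsearchable document.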
Showing Indexing Status on Mobile
While the document is being processed, show progress. Two options:

Polling. Every 2–3 seconds, request /api/documents/{job_id}/status. Simple and works everywhere. Downside: extra requests.

WebSocket / SSE. The client subscribes to events for the job_id. The backend sends updates: {"step": "chunking", "progress": 0.3} → {"step": "embedding", "progress": 0.7} → {"step": "completed"}. Better UX, but harder to implement on the client in background mode.
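The polling option can be sketched like this (pure Python for illustration; fetch_status stands in for the HTTP call to /api/documents/{job_id}/status, and sleep is injectable so the loop is testable):

```python
import time

def poll_until_done(fetch_status, interval: float = 2.0,
                    timeout: float = 300.0, sleep=time.sleep):
    """Poll fetch_status() until it reports a terminal step.

    fetch_status is assumed to return a dict like
    {"step": "chunking", "progress": 0.3}; "completed" and "failed"
    are treated as terminal states.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = fetch_status()
        if status.get("step") in ("completed", "failed"):
            return status
        sleep(interval)
    raise TimeoutError("indexing status polling timed out")
```

The timeout matters: without it, a crashed worker leaves the client spinning forever on "processing".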
After indexing completes, notify the user and refresh the document list. Documents are stored with metadata: filename, size, upload date, chunk count, status.
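That metadata can be modeled as a small record; a sketch of the fields listed above (field names are illustrative, not a fixed schema):

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class DocumentMeta:
    document_id: str
    user_id: str
    filename: str
    size_bytes: int
    chunk_count: int = 0
    status: str = "processing"  # processing | completed | failed
    uploaded_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc))
```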
Managing User Documents
The user must be able to delete documents. Deletion means:
- Delete the file from storage
- Delete all chunks from the vector DB (by user_id + document_id)
- Update the record in the relational DB
In Pinecone: index.delete(filter={"document_id": "xyz"}, namespace=user_id). In pgvector: DELETE FROM documents WHERE document_id = $1 AND user_id = $2.
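Since deletion touches all three stores, the ordering is worth pinning down. A sketch with the store operations injected as callables (hypothetical names, so the same sequence works for Pinecone or pgvector): chunks go first, so a failure partway through never leaves orphaned vectors behind, and a retry simply re-runs the idempotent sequence.

```python
def delete_document(document_id: str, user_id: str,
                    delete_vectors, delete_file, delete_record):
    """Remove a document from the vector DB, file storage, and relational DB.

    Order matters: vectors first (orphaned chunks would still surface
    in search results), then the file, then the metadata record.
    """
    delete_vectors(document_id=document_id, user_id=user_id)
    delete_file(document_id=document_id)
    delete_record(document_id=document_id, user_id=user_id)
```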
Implementation Timeline
File upload with progress on mobile → backend pipeline with task queue → format parsing → chunking and embeddings → vector DB upsert → status API and push notifications → document management → testing on real files.
An MVP with PDF + TXT, basic chunking, and pgvector: 3–4 weeks. The full pipeline with OCR, multiple formats, an async queue, and WebSocket status: 6–8 weeks.