How I Built an OCR System with ML at Bizcotap

The problem, the approach, and the tech behind automating document intelligence for NFC business cards.


At Bizcotap we were solving a simple but annoying problem: people would tap or scan a business card, and then still have to type out the contact details by hand. The card held the information — the app just wasn't reading it.

That's what the OCR project was about.

The problem

Bizcotap lets professionals share contact information via NFC chips and QR codes. The digital side was solid, but a large share of our users were coming from traditional paper cards. They'd take a photo, save it, and then manually re-enter name, phone, email, and company into their contacts.

That's too much manual work for information already printed on the card. We needed to go from photo → structured contact data automatically.

The approach

The system needed to do two things:

  1. Extract text from an image (the OCR part)
  2. Understand what the text means (the NLP part — is this a phone number? an email? a job title?)

For OCR I used a pre-trained model fine-tuned on business card layouts. Business cards are one of the harder OCR problems: non-standard fonts, logos overlapping text, dark backgrounds, and rotated layouts are all common, and the model needed to be robust to every one of them.

For parsing the extracted text into structured fields, I built a lightweight NLP classifier that labelled each token as a name, email, phone, company, role, or address. It ran on the extracted strings once OCR had completed.
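
To make the field-labelling step concrete, here is a minimal rule-based sketch in Python. It is not the classifier used at Bizcotap: it swaps the NLP model for regexes and keyword lists, and it labels whole lines rather than individual tokens. The patterns, the keyword sets, and the names classify_line and parse_card are all hypothetical.

```python
import re

# Hypothetical patterns and keyword lists, for illustration only.
EMAIL_RE = re.compile(r"^[\w.+-]+@[\w-]+(\.[\w-]+)+$")
PHONE_RE = re.compile(r"^\+?[\d\s().-]{7,20}$")
ROLE_WORDS = {"engineer", "manager", "director", "founder", "ceo", "cto", "designer"}
COMPANY_WORDS = {"ltd", "inc", "llc", "gmbh", "corp"}


def classify_line(line: str) -> str:
    """Return a field label for one line of OCR output."""
    text = line.strip()
    words = set(re.findall(r"[a-z]+", text.lower()))
    if EMAIL_RE.match(text):
        return "email"
    # Require at least 7 digits so strings of punctuation are not phones.
    if PHONE_RE.match(text) and sum(c.isdigit() for c in text) >= 7:
        return "phone"
    if words & ROLE_WORDS:
        return "role"
    if words & COMPANY_WORDS:
        return "company"
    # Crude assumption: remaining lines with digits are street addresses.
    if any(c.isdigit() for c in text):
        return "address"
    return "name"


def parse_card(lines: list[str]) -> dict[str, str]:
    """Build a contact dict, keeping the first line seen for each label."""
    contact: dict[str, str] = {}
    for line in lines:
        if line.strip():
            contact.setdefault(classify_line(line), line.strip())
    return contact
```

Calling parse_card on the list of lines returned by the OCR step yields a dict keyed by field name. Rules like these break quickly on real cards (a company called "Director Films", a name with digits), which is the usual argument for a trained classifier.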

The stack

Flutter (Dart)   — mobile client, camera capture, contact save
Python           — OCR model serving, NLP pipeline
Firebase         — image storage, real-time sync
Node.js          — REST API between mobile and ML backend

The Flutter app captured the image, sent it to the Python backend via the Node.js API, received the parsed contact JSON back, and pre-filled the contact form. The user reviewed the result and tapped save — one step instead of manual typing.
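
The parsed contact JSON that travels back to the app could be modelled on the Python side roughly as follows. This is a sketch under assumptions: the field names mirror the categories listed earlier, but the actual keys and the ParsedContact class are hypothetical and not taken from the Bizcotap API.

```python
import json
from dataclasses import asdict, dataclass, fields
from typing import Optional


@dataclass
class ParsedContact:
    # Field names are assumed; every field is optional because OCR may
    # fail to find any one of them on a given card.
    name: Optional[str] = None
    email: Optional[str] = None
    phone: Optional[str] = None
    company: Optional[str] = None
    role: Optional[str] = None
    address: Optional[str] = None

    def to_json(self) -> str:
        """Serialize for the client, dropping fields that were not found."""
        found = {k: v for k, v in asdict(self).items() if v is not None}
        return json.dumps(found)

    @classmethod
    def from_json(cls, payload: str) -> "ParsedContact":
        """Parse a payload, ignoring keys this sketch does not know about."""
        data = json.loads(payload)
        known = {f.name: data[f.name] for f in fields(cls) if f.name in data}
        return cls(**known)
```

Leaving missing fields out of the payload lets the client show an empty form field for the user to fill in, instead of a placeholder value they would have to delete.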

What I learned

Model accuracy is a moving target. The first version worked well on standard white cards but struggled with colourful or low-contrast designs. I ended up building a small test set of ~200 cards photographed in different lighting conditions, which drove most of the accuracy improvements.
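
A test set like this is only useful with a repeatable score attached. One simple option, sketched below, is per-field exact-match accuracy. The function names and the dict-per-card data format are assumptions for illustration, not the harness I actually used.

```python
from collections import defaultdict


def normalize(value: str) -> str:
    """Lowercase and collapse whitespace so trivial differences do not count as errors."""
    return " ".join(value.lower().split())


def field_accuracy(predictions: list[dict], ground_truth: list[dict]) -> dict:
    """Per-field exact-match accuracy over paired lists of contact dicts."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for pred, truth in zip(predictions, ground_truth):
        for field, expected in truth.items():
            total[field] += 1
            # A field missing from the prediction counts as wrong.
            if normalize(pred.get(field, "")) == normalize(expected):
                correct[field] += 1
    return {field: correct[field] / total[field] for field in total}
```

Scoring each field separately shows where the pipeline is weak. A drop in phone accuracy on dark cards, for example, points at OCR, while names being labelled as companies points at the parsing step.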

Preprocessing matters more than model choice. Converting the image to grayscale, binarizing it, and deskewing it before passing it to the OCR model consistently outperformed switching to a "better" model on raw input.
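
As an illustration of the first two preprocessing steps, here is a dependency-free Python sketch of grayscale conversion and binarization. It assumes Otsu's method for picking the threshold, which is a common choice but not necessarily the one used in production. A real pipeline would normally use an image library for this, and deskewing is left out because it needs image rotation. The images are plain nested lists and the function names are hypothetical.

```python
def to_grayscale(rgb_image):
    """Convert rows of (r, g, b) tuples to rows of 0-255 luminance ints (BT.601 weights)."""
    return [
        [round(0.299 * r + 0.587 * g + 0.114 * b) for (r, g, b) in row]
        for row in rgb_image
    ]


def otsu_threshold(gray):
    """Pick the threshold that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for value in row:
            hist[value] += 1
    total = sum(hist)
    sum_all = sum(i * count for i, count in enumerate(hist))
    best_t, best_var = 0, -1.0
    w_bg, sum_bg = 0, 0
    for t in range(256):
        w_bg += hist[t]          # pixels at or below t (background class)
        sum_bg += t * hist[t]
        if w_bg == 0:
            continue
        w_fg = total - w_bg      # pixels above t (foreground class)
        if w_fg == 0:
            break
        mean_bg = sum_bg / w_bg
        mean_fg = (sum_all - sum_bg) / w_fg
        variance = w_bg * w_fg * (mean_bg - mean_fg) ** 2
        if variance > best_var:
            best_var, best_t = variance, t
    return best_t


def binarize(gray, threshold):
    """Map pixels above the threshold to 255 (white) and the rest to 0 (black)."""
    return [[255 if value > threshold else 0 for value in row] for row in gray]
```

Typical use is binarize(gray, otsu_threshold(gray)) on the output of to_grayscale. A single global threshold struggles with uneven lighting, which is one reason photographing the test cards under different conditions mattered.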

Users will trust the result if the UX lets them correct it easily. Rather than showing a success message, the app shows the pre-filled form so users can fix any error before saving. This removed most of the friction from wrong extractions.

Result

The feature cut the time from card scan to saved contact from ~90 seconds (manual entry) to under 5 seconds. It became one of the most-used features in the Bizcotap app.

If you're working on something similar — OCR pipelines, mobile ML integration, or just want to talk through the architecture — feel free to reach out at uiteka11@gmail.com.