
This project won the IDCamp Challenge 2023 as one of the top 20 products. It was posted on Instagram along with its demo. Try it here

Reason

I’ve imagined: what if the phone could just speak out what the camera sees? This would help people with disabilities understand their surroundings. The AI models are already out there, just waiting to be used.

However, this kind of technology is now easily accessible via the ChatGPT app. As far as I remember, it wasn’t available at that time. So I built this app and also submitted it to the IDCamp Challenge, which was themed around inclusivity through technology.

Initial Build

The project initially used an image-to-text model from Hugging Face, sending the picture to the Inference API. Since the returned caption is in English, I translated it using the DeepL API. A little frontend code with Vue.js and there we have it. However, this wasn’t efficient: all of the requests were made in the client, and there was a risk of exposing the API keys to the public.
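
For illustration, here is a minimal sketch of that first flow. The real app made the equivalent fetch calls from the Vue.js frontend; I’m writing it in Python only to keep this post’s snippets in one language, and the model name, DeepL free-tier endpoint, and Indonesian target language are assumptions rather than the exact values the app used.

```python
import requests

HF_TOKEN = "hf_..."   # Hugging Face token -- in the original app the keys lived client-side
DEEPL_KEY = "..."     # DeepL API key -- same exposure problem

def caption_image(image_bytes: bytes) -> str:
    # Send the raw image to a Hugging Face image-to-text model via the Inference API.
    # The exact model is an assumption; any image-captioning model responds the same way.
    res = requests.post(
        "https://api-inference.huggingface.co/models/Salesforce/blip-image-captioning-base",
        headers={"Authorization": f"Bearer {HF_TOKEN}"},
        data=image_bytes,
    )
    res.raise_for_status()
    return res.json()[0]["generated_text"]

def translate(text: str, target_lang: str = "ID") -> str:
    # Translate the English caption with the DeepL API (Indonesian target is an assumption).
    res = requests.post(
        "https://api-free.deepl.com/v2/translate",
        headers={"Authorization": f"DeepL-Auth-Key {DEEPL_KEY}"},
        data={"text": text, "target_lang": target_lang},
    )
    res.raise_for_status()
    return res.json()["translations"][0]["text"]

with open("photo.jpg", "rb") as f:
    print(translate(caption_image(f.read())))
```

Two separate services, two separate keys, and both sitting in the browser: that is the part the migration fixes.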

At that time I wasn’t aware of Azure AI Services. When I got the chance to join a bootcamp, earn an Azure certification, and enter another competition, I decided to move the project to Azure AI Services.

The Migration

Azure AI Services provides a convenient single resource for all the AI services. Since I was already on Azure, I also implemented text-to-speech so that the returned, translated text is read out loud. What I did was build a simple FastAPI server to handle the requests made from the frontend and return only the result. This way the API key is stored in the backend service and, theoretically, the data flow is more efficient: the image captioning, translation, and speech synthesis requests are made in one place. The backend code is available on my GitHub.
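
The backend roughly looks like the sketch below. This is not the project’s actual code, just a minimal illustration of the three Azure calls sitting behind one FastAPI route; the route name, environment variable names, Indonesian target language, and voice are my own placeholders.

```python
# Minimal sketch, assuming a multi-service Azure AI Services resource.
import base64
import os

import requests
from fastapi import FastAPI, UploadFile
import azure.cognitiveservices.speech as speechsdk

app = FastAPI()

AZURE_KEY = os.environ["AZURE_AI_KEY"]            # stays on the server, never shipped to the client
AZURE_REGION = os.environ["AZURE_AI_REGION"]
AZURE_ENDPOINT = os.environ["AZURE_AI_ENDPOINT"]  # e.g. https://<resource>.cognitiveservices.azure.com

@app.post("/describe")
async def describe(image: UploadFile):
    # 1. Image captioning (Azure AI Vision, Describe feature)
    vision = requests.post(
        f"{AZURE_ENDPOINT}/vision/v3.2/analyze",
        params={"visualFeatures": "Description"},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Content-Type": "application/octet-stream",
        },
        data=await image.read(),
    )
    caption = vision.json()["description"]["captions"][0]["text"]

    # 2. Translation (Azure AI Translator), English caption -> Indonesian
    translation = requests.post(
        "https://api.cognitive.microsofttranslator.com/translate",
        params={"api-version": "3.0", "to": "id"},
        headers={
            "Ocp-Apim-Subscription-Key": AZURE_KEY,
            "Ocp-Apim-Subscription-Region": AZURE_REGION,
            "Content-Type": "application/json",
        },
        json=[{"text": caption}],
    )
    translated = translation.json()[0]["translations"][0]["text"]

    # 3. Speech synthesis (Azure AI Speech), returned to the frontend as base64 audio
    speech_config = speechsdk.SpeechConfig(subscription=AZURE_KEY, region=AZURE_REGION)
    speech_config.speech_synthesis_voice_name = "id-ID-GadisNeural"
    synthesizer = speechsdk.SpeechSynthesizer(speech_config=speech_config, audio_config=None)
    audio = synthesizer.speak_text_async(translated).get().audio_data

    return {
        "caption": caption,
        "translated": translated,
        "audio_base64": base64.b64encode(audio).decode(),
    }
```

The point is that the frontend now sends a single request with the image and gets back the caption, the translation, and the audio, while the keys never leave the server.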

Updated: