Dao Nguyen Duong

AI Engineer

nguyenduongyht@gmail.com | Vietnam | GitHub | LinkedIn

Professional Summary

AI engineer passionate about building production-ready machine learning systems. Experienced in large language models, natural language processing, deep learning, and full-stack software engineering. Strong expertise in advanced retrieval systems, multimodal learning, and document intelligence. Published researcher in Vietnamese NLP with multiple competition awards.

Work Experience

AI Engineer (Natural Language Processing - Data Ingestion Pipeline)

2024-05 - Present

Vision Team, FPT SmartCloud (FCI), FPT

Leading the development of advanced document intelligence and data ingestion systems for healthcare and administrative documents. Building retrieval and classification systems for complex PDF processing.

Key Achievements:

  • Developed and deployed text-based retrieval and classification models for OCR-extracted healthcare data (ICD-9, ICD-10, surgical datasets)
  • Designed and implemented an advanced RAG-based question-answering system for healthcare insurance PDF documents, with module-level performance evaluation
  • Implemented a modular, extensible multimodal data ingestion pipeline for administrative PDF/DOCX/PPTX documents, delivering structured outputs in HTML and JSON formats
  • Developed and enhanced models for language detection, table merging, text mapping, reading order, and document structure analysis

Technologies:

Advanced retrieval, Multimodal vision-language models, Ragas, Docling, PaddleOCR, Elasticsearch, Kotaemon, LlamaIndex, Langfuse, Grafana, Prometheus, Docker, Unstructured

AI Engineer (Natural Language Processing - Chatbot system)

2019-09 - 2024-04

Innovation Center, VNPT-IT, VNPT

Designed and built neural network models and retrieval systems for VNPT's SMARTBOT framework. Handled back-end engineering for scaling and improving performance of retrieval-based systems.

Key Achievements:

  • Designed neural network models for VNPT's SMARTBOT system, including intent classification, rule-based entity recognition, statistical entity recognition, out-of-vocabulary detection, and accent-correction models
  • Designed a legal document retrieval system and an address-matching algorithm for VNPT's eKYC, integrated with an OCR system
  • Built, served, scaled, and improved the performance of a retrieval-based system using Python, Elasticsearch, FastAPI, AsyncIO, Redis, and parallel programming techniques

Technologies:

Hugging Face Transformers, TensorFlow, PyTorch, ONNX, Rasa, Redis, REST APIs, Flask, FastAPI, AsyncIO, Docker, Elasticsearch, Pandas, Trie data structure, LlamaIndex, Scrapy, Beautifulsoup4

Data Scientist Intern

2018-06 - 2018-08

Viettel Research and Development Institute

Worked on information extraction problems in natural language processing within a team environment.

Key Achievements:

  • Worked in a team of five to solve information extraction problems in natural language processing
  • Wrote an internal technical paper on 'Supervised Methods for Named Entity Recognition problem'

Quality of Experience Research

2017-07 - 2018-05

HUST's Embedded System and Reconfigurable Lab - University of Aizu's Communication Lab

Researched machine learning approaches for video quality assessment.

Key Achievements:

  • Developed machine learning models for the Video Quality Assessment (VQA) problem in video streaming

Education

Engineering Degree in Electronics and Telecommunications

2019

Hanoi University of Science and Technology

Major: Electronics and Telecommunications - Talent Program

Skills

NLP & Language Models

Hugging Face Transformers, TensorFlow, PyTorch, ONNX, Rasa, LlamaIndex, LLMs, BERT, Intent Classification, Named Entity Recognition

Machine Learning

Deep Learning, CNN models, Video Quality Assessment, Fine-tuning, Retrieval-Augmented Generation (RAG), Prompt Engineering

Document Intelligence

PaddleOCR, Docling, Multimodal vision-language models, Advanced Retrieval, Table Merging, Reading Order Analysis

Backend & Infrastructure

FastAPI, AsyncIO, Redis, Elasticsearch, Flask, REST APIs, Docker, Parallel Programming

Data Processing

Pandas, Trie data structure, Scrapy, Beautifulsoup4, Crawling

Data Engineering

IBM Data Engineer Specialization, MLOps Specialization

Cloud & Tools

Kotaemon, Langfuse, Grafana, Prometheus, Docker, Unstructured

Awards & Honors

Top-1 in VLSP 2025 Shared Task: Vietnamese Temporal Question Answering

2025

VLSP 2025

Sub-Task 2 DurationQA - Applied retrieval and fine-tuning techniques to Qwen models

Top-5 (#2 Leaderboard) in Trustii.io's 'Secure RAG System' Data Challenge

2024

Trustii.io

Built and deployed an offline, secure, and efficient RAG system for Understand.Tech

5th/71 in Zindi's TechCabal Ewe Audio Translation Challenge

2024

Zindi / TechCabal

Trained an on-edge CNN model for an audio classification problem

Top 5 in Zalo AI Challenge 2023

2023

Zalo AI

Designed a Large Language Model (LLM) for elementary maths problem solving

Fifth Prize in Zalo AI Challenge 2019

2019

Zalo AI

Vietnamese Wiki Question Answering Challenge

Encouragement Prize in Vietnam Olympiad of Informatics

2013

Vietnam Olympiad of Informatics (Ministry of Education)

National informatics olympiad organized by the Ministry of Education
