WORK EXPERIENCE
09-2023 - Present | Data scientist / ML Engineer
BNP Paribas, France
Spreadauto project:
- Use of NLP algorithms to detect, extract and parse financial
tables.
- Code base refactoring, with migration from an old to a new
Python version.
- Model retraining and construction of its automated workflow.
- Creation of a Docker container to be used with Kubernetes.
Valuation Report project:
- Set up an LLM served with vLLM on Docker, deployed with
Jenkins and Ansible, with queuing capabilities, on a server with
4 NVIDIA A100 GPUs.
- Put an LLM with RAG into production to extract
financial data from financial reports.
- Used an IBM queue system to scale the API.
Created a REST API with database persistence, plus MLOps and
DevOps pipelines using Jenkins and Kubernetes for production.
Python SQL Linux Numpy REST API Docker Scikit-Learn Pandas
NVIDIA RAG Large Language Models (LLM) Ansible
Kubernetes Jenkins Oracle
01-2020 - 05-2023 | NLP Data Officer
Gino LegalTech, France
Creation of a complete AI solution in production:
- Extract text from contracts (OCR)
- Classify contract clauses
- Identify and extract key terms
- Link related words (e.g., address + company name)
- Improve accuracy with active learning
- Develop an annotation tool
- 80% accuracy with 50 contracts, 90% with 100
- Stable and scalable, with optimised RAM & CPU usage
Python NLP SQL OCR Tesseract Cython Linux Scikit-Learn GIT
Numpy MariaDB Spacy Python3 Redis MLFlow OpenCV Tensorflow
PyTorch Celery Nltk Django CI/CD REST API Docker Qt SaaS
DevOps C++ Gensim NGINX Azure
01-2018 - 12-2019 | Data scientist
Freelance, France
Stock Exchange Prediction Optimization Project
Objective: Optimize real-time stock market prediction by
extrapolating stock option data.
Key Subprojects:
- Data Collection:
  - Developed a Python web scraping tool using Selenium &
    BeautifulSoup.
  - Scrapes stock option data in real time.
  - Stores data on a web server using a Redis database.
- Data Analysis & Prediction:
  - Applied deep learning & machine learning models:
    LSTM, Perceptron, Random Forest.
  - Real-time peak detection for risk assessment.
- Optimization:
  - Speed Optimization: algorithms converted to C++ & Fortran
    for processing under 1 second.
  - Visualization: used OpenCL & Vispy for real-time data
    display; achieved latency under 30ms.
Python NLP SQL Optimization Cython Scikit-Learn Numpy
MariaDB Spacy Redis C++ Fortran Selenium
08-2017 - 12-2017 | Data scientist
Prof en Poche, France
Developed an AI-powered chatbot that solves math exercises
step by step and provides relevant course materials.
Built an image recognition system using Tesseract OCR to detect
equations in photos and convert them into LaTeX format.
Created a seq2seq neural network to enhance AI performance
in understanding and solving math problems.
Improved the speed and accuracy of the AI, making it more
efficient in processing and delivering solutions.
Designed a second AI system capable of solving any middle to
high school math problem step by step with LaTeX-based
explanations.
Implemented growth hacking strategies to evaluate marketing
impact on the mobile app.
Contributed to the Business Plan development and conducted a
financial analysis for the company.
Python NLP Python3 LaTeX
02-2017 - 07-2017 | Marketing MBA internship
Prof en Poche, France
Pau area, France
• Used growth hacking methods for a mobile application:
evaluated the impact of several marketing approaches on
potential customers and carried out a qualitative study of their
market behavior
• Contributed to the Business Plan development and the
company's financial analysis
• Created an artificial intelligence system: for a given photo, it
runs Tesseract OCR, then sends students the course and exercise
content associated with that picture. It detects the presence of
equations and converts them into LaTeX format to solve them
step by step
• Developed a neural network based on the "seq2seq
attention" model to improve the artificial intelligence's results
• Developed parsing software in Java 8 and Python 2.7
• Developed a VBA program that puts Office files in sequence
Python OCR Tesseract Java Office VBA LaTeX
02-2016 - 07-2016 | Geophysics for geomodelling: neural
network classification
Total, France
Pau area, France
• Tested and parameterized the Sismage neural network to
model seismic attributes from substacks; proposed a tutorial.
• Created several Java plugins for the Sismage software
Java
06-2015 - 06-2015 | Seismology Internship
Technological Educational Institute of Crete, France
• Magnitude and Hypocenter calculation
• Seismic catalogue creation on Zmap
06-2014 - 06-2014 | Gravimetry Internship
Istituto Nazionale di Geofisica e Vulcanologia, France
Sicily (region), Italy
• Field measurements and data processing in MATLAB
MATLAB