Apache Spark is an open source framework, initially developed at AMPLab. It is a parallel processing infrastructure for running analytics applications on clusters of machines.
Designed and built a Modern Data Platform (Dagster / Airbyte / dbt) using Terraform, Helm/K8s, and AWS EKS
Developed Spark streaming and batch jobs in Python for a new AI application
MLOps with MLflow for LLMs & ML models
Project management of a €5M healthcare DWH build
On-demand training on Apache Cassandra & Apache Spark for IT professionals
Stack: AWS, Apache Spark, Apache Cassandra
Data pipeline development using Apache Spark and a dedicated Spark Java framework
Stack: Cloudera, AWS, Apache Spark
MLOps consulting, Apache Spark training
Stack: AWS, MLflow, Apache Spark
Designed and implemented pipelines using PySpark / Scala to populate an AWS Redshift DWH
Stack: Databricks, AWS S3, Redshift, Lambda, Python, Scala, CDP, CI/CD, GitHub
Consulting on Apache Spark, Apache Cassandra, MLOps
Master Data Management and data governance consulting for Pole Emploi data owners and data stewards
Developed data pipelines in Apache Spark (Java) and a Java data lineage application, based on Apache Atlas data, for data stewards
Courses: Introduction to Big Data & Data Engineering, MLOps
Stack: AWS, Apache Spark, Apache Kafka, MDP