Nabil - SQL Server Project Manager
Ref: 111122F002
94500 Champigny-sur-Marne
Project Manager, Trainer, Architect (61 years old)
Freelance
Experience
Freelance (since January 2007)
Intermarché (April 2021 - present):
Role:
Big Data/Spark Expert and Solution Architect
Achievements:
Improvement proposals for quality metrics, data governance, cloud resource cost reduction, resource monitoring, architecture patterns and infrastructure templates.
Data-sharing solution: design and implementation.
Data lake management: backup strategy, environment refresh.
Azure Databricks pipeline optimization, audit and best practices.
Azure Data Hub architecture improvement: adding Delta Lake (ADLS Gen2), metadata management and the architecture documentation process.
Data quality improvement with data profiling, data validation and data observability (Deequ, a Spark library).
CI/CD pipeline review and development of best practices.
EDF (12/2020 - 02/2021, 50 days):
Role:
Auditor
Technical Environment:
Elastic Stack, Java/Python, Jenkins
Achievements:
Audit of an Elasticsearch application (version 6.8, 400 TB of data, cluster of 105 nodes)
Observability domain (Index Lifecycle Management): streams of technical and application logs and metrics
Review of Java and Python code (DevSecOps best practices)
Review of the Elasticsearch data model (static/dynamic schemas, use of analyzers, index usage, field normalization and naming, ECS compatibility, continuous aggregation, …)
Banque de France (August 2017 - March 2021):
Role:
Big Data/Spark Expert
Technical Environment:
Hortonworks HDP, Hive/LLAP, Tez, Spark, Sqoop, Kafka, Java, Python, Kubernetes, OpenShift, Zeppelin, Jenkins
Achievements:
Hortonworks platform administration (HDP 2.x - 3.1, HDF 2.x - 3.x): installation, cluster configuration, security configuration (Kerberization, data encryption), service management, queue design and configuration, HDFS and YARN sizing, Knox and Ranger configuration, cluster monitoring
Defining the best practices for Hive (partitions, bucketing, ACID and LLAP) and for Spark.
LLAP configuration and resources allocation
Architecting the ingestion strategy (batch & near real-time process, Change Data Capture, Streaming, Triggers)
Architecting a pricing assets application on an open source and distributed environment
Installation of a Spark Cluster on Kubernetes
Designing and Deploying Elasticsearch indices and analyzers for an internal application
Analysis, optimization, performance improvement and data anonymization (Java, Spark, Hortonworks)
Designing Kubernetes patterns for database, logging and monitoring systems (Postgres, Barman, Filebeat, Elasticsearch, Logstash, Kibana, Fluentd, Prometheus, Grafana, Kafka, Pulsar, etc.)
Auditing many Big Data applications and Pipelines (Sqoop, Hive, Spark, Elasticsearch, HBase, Tez)
Optimizing many Spark, Hive/Tez applications/Pipelines in Java and Python
Defining Python best practices
Proposal for a Big Data ingestion strategy
Designing the data lake: data zones (Silver, Bronze, Gold and Sensitive), data catalog and lineage
Adding new functionality to AWS Deequ (data validation development in Spark and Scala)
Customizing Spark (SDL and VTL parsers) for the ECB AnaCredit application
Migration impact assessment from HDP 2.6.3 to CDP (Cloudera Runtime 7.2.6): identifying what projects had to change (HiveQL, Spark, etc.)
Auditing Java code and the Elasticsearch data model (ILM, logs and metrics)
Testing and validating Sqoop for data ingestion from relational database management systems (Oracle, Postgres, MS SQL Server) into the data lake over secure, authenticated connections
Qualifying Kafka for data streaming and real-time data analysis
Auditing, qualifying and designing the Big Data development environment and its integration with Hortonworks Data Platform (Hadoop) for a data lake use case (Pig, Sqoop, Hive, Spark, HBase, Elasticsearch, …)
Spark pipelines
Next Challenge (2016 – 2017):
Role:
Managing a Big Data and Data Science training team
Technical Environment:
Cloudera CDH, Kafka, Pulsar, Cassandra, MongoDB, Scala, Python
Achievements:
Designing a training program in Big Data domains
Advising and helping customers choose and set up Big Data architectures (Lambda, Kappa, Zeta, SMACK, Data Lake)
Orange-CloudWatt (2016):
Role:
Product Owner for Big Data as a Service
Technical Environment:
OpenStack
Achievements:
Setting up many Hadoop production environments (Cloudera, Hortonworks, MapR)
Managing requirements
Management of schedule and defining priorities
Benchmarking Hadoop Distributions (Cloudera, Hortonworks, MapR)
Defining recommendations for customers using Orange Cloud platforms
Setting up pre-configured versions with optimized operating system and Hadoop parameters
Developing the cloud distributed image factory process
Designing the technical architecture for very large clusters
Architecting data science use cases for customers: crawling social networks, storing the data in HDFS, processing still images with Spark, tagging them automatically, and inserting the produced tags into Elasticsearch to feed a graph-oriented browser
Umanis (2015 – 2016):
Role:
Big Data Architect
Technical Environment:
Cloudera CDH, SAP Hana, Elasticsearch
Achievements:
Designing a big data architecture (lambda architecture) for data integration including Hadoop cluster, SAP Hana Cluster and Talend Data Integration
Development of a near-real-time analytical dashboard with seller performance graphics and objective alignment
Design of a system to predict future incomes, using Spark machine learning on historical data and environmental data such as weather and holidays
Crawling many internet sources (Arab-world political, social and cultural blogs and newspapers in Arabic, French or English), storing the data on a Hadoop cluster and indexing it with Elasticsearch; improving Elasticsearch search by generating dynamic multilingual thesauri based on a deep learning model
Amundi (2012 – 2015):
Role:
Database Architect
Technical Environment:
Oracle, Sybase, Ingres, MS SQL Server, MySQL, MongoDB
Achievements:
Administration of MongoDB NoSQL database: Installation, Performance, Optimization, Backup and High Availability of the system.
Administration of Oracle, Sybase, Ingres, MS SQL Server and MySQL Databases.
Collecting Financial Data from databases and Internet Financial Sites.
Data quality analysis; building pricing provisions using neural network models and machine learning algorithms.
Société Générale (2007 – 2011):
Role:
Database Architect
Technical Environment:
Oracle, Ingres, MS SQL Server
Achievements:
Defining best practices for database administration
Establishing the team organization
Developing the database infrastructure architecture
Managing the DBA team
Developing the roadmap
Managing the schedule
Tunisian Ministry of Higher Education & Research (2003 - 2007):
Role:
Professor/Researcher
Achievements:
Head of the Computer Science department
Teaching C++, logic, databases and algorithms courses
Natural Language Processing research
Neuroware (1997 – 2003):
Role:
Database Administrator
Technical Environment:
Unix, Windows, Oracle, Ingres, MS SQL Server
Achievements:
Architecting and administering many databases (Oracle, MS SQL Server, MySQL)
Participation in the setup of Michelin's European data center (Oracle Database)
Design and development of an archiving/unarchiving manager for Oracle databases (Thomson)
Database administration and performance analysis for Oracle and Sybase (Cegetel)
Oracle database administration and upgrades (Sanofi)
Arab World Institute - Paris (1992 – 1997):
Role:
NLP Researcher
Achievements:
Design and development of Natural Language Processing tools
Design and development of tools for automatic document classification
Automatic translation between English, French and Arabic
Automatic construction of full-form dictionaries
Design and implementation of a C++ library for neural network algorithms
Skills
Data Science
Natural Language Processing (NLTK, Gensim, spaCy, Spark NLP)
Deep Learning (TensorFlow, Theano, Keras)
Machine Learning (SparkML)
Big Data
Hadoop Ecosystem (Cloudera, Hortonworks)
NoSQL Data Stores (MongoDB, Cassandra)
Elastic (Search, Logstash, Kibana)
Data Streaming (Kafka, Pulsar and Spark)
Data Management
Data Quality (Talend, AWS Deequ)
Data Integration (Talend, Azure Data Factory)
IT Governance
Enterprise Architecture
COBIT
Project Management (PMI, Agile - SCRUM)
Data Analysis
IT System & Cloud Administration
Database Administration
Oracle, MS SQL Server, MySQL, SAP Hana
Operating System Administration
Windows Server, Linux (CentOS)
Big Data Environment
Hadoop (Cloudera, Hortonworks), HDFS, YARN, Spark (Core, SQL, Streaming, ML, GraphX; Scala, Python), MapReduce (Hive, Pig), Flume, Kafka, MongoDB, containers (Docker, cgroups), Mesos, Kubernetes, OpenShift
Programming Languages
C++, Java, Python, Scala
Cloud
OpenStack, Kubernetes, AWS, OpenShift, Azure
Azure: Databricks/Spark, Data Factory, SQL Database, Data Lake, Blob Storage, Event Hubs
Education
PhD, Paris 11 (October 1988 - 1992)
PhD in Computer Science
Master, Paris 11 (1987 - 1988)
Master in Electronics and Computer Science
Awards and Certificates
Databricks training: Certificate of Completion of Apache Spark Tuning & Best Practices, June 2019
Certificate in Deep Learning, Coursera, 2018
TOGAF 9 Level 1 & 2 Certificate, The Open Group, 2015
Zachman Certified, 2014
MongoDB Certified Developer Associate, 2014
CISM (Certified Information Security Manager), ISACA, 2011
COBIT Foundation Certificate, ISACA, 2011
CISA (Certified Information Systems Auditor), ISACA, 2010
OCA (Oracle Certified Associate), Oracle, 2003
Languages
English, French and Arabic