Prewave is making supply chains more resilient by monitoring and predicting supply chain risks through AI technology.
My mission is to:
- improve and maintain Prewave’s data processing pipeline and apply its data science toolset to the data
- analyse, review and improve existing algorithms
- design efficient distributed algorithms
- improve performance of SQL queries (couchbase, postgres)
Ankbot project: real-time and distributed stream-based programming
- implemented a stream-based framework for designing and running concurrent and distributed algorithms
- the objective is to be simpler to use and faster than Spark + Kafka; I hope this objective can be reached; by trying we cannot fail; whatever the outcome it is time well spent
- implemented buffered and typed streams; the buffered streams use space- and time-efficient serialization algorithms (encode/decode)
- using the buffered and typed streams, implemented a column-based table with dynamic partitioning
- tables can be written by parallel processes running on different computing nodes
- tables can be read as files (non-blocking mode) or as never-ending streams (blocking mode)
- in blocking mode we can read table rows in real-time while other processes write data to the tables
- tables have tagged offsets, thus we can read tables starting from a given saved offset
- we can use Ankbot tables to handle real-time events; unlike Kafka, no repartitioning and no duplicates
- added streaming windows to reactors (i.e. batch length, batch time, slide length and slide time windows)
- implemented fast Hashtable and HashSet algorithms, with the possibility to merge two HashSets/Hashtables into one HashSet/Hashtable
- implemented a simple and fast logging library using macros; an alternative to lightbend scala-logging that is a library wrapping SLF4J.
- we use Docker swarm to launch the program in distributed mode
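The length-based windows listed above (batch length and slide length) can be illustrated with a minimal Python sketch. The function names `batch_by_length` and `slide_by_length` and their parameters are hypothetical and are not Ankbot's API; the sketch only assumes that a batch window emits consecutive non-overlapping groups and that a slide window emits overlapping groups.

```python
from collections import deque


def batch_by_length(stream, size):
    """Yield consecutive, non-overlapping batches of at most `size` items."""
    batch = []
    for item in stream:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        # Emit the trailing partial batch once the stream ends.
        yield batch


def slide_by_length(stream, size, step=1):
    """Yield overlapping windows of `size` items, advancing by `step` items."""
    window = deque(maxlen=size)  # the oldest item is dropped automatically
    for count, item in enumerate(stream, start=1):
        window.append(item)
        if count >= size and (count - size) % step == 0:
            yield list(window)


batches = list(batch_by_length(range(7), 3))
windows = list(slide_by_length(range(5), 3))
```

Time-based windows would follow the same shape, with a timestamp comparison in place of the item count.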
In progress:
- write docs before releasing the concurrent version of Ankbot to whoever is interested in adding concurrency to their programs
- create an Enterprise Ankbot Platform (distributed and real-time event streaming); to come: implement AnkbotML for machine learning applications, though many other applications may benefit from distributed computing
Learning a new language is not difficult, but setting up the environment and tooling is an issue, especially if there are many different solutions and frameworks; while learning Clojure/ClojureScript I ran into a complex and changing ecosystem.
This motivated me to create my own framework (based on ideas from ANKBIT Javascript that I developed around 2010).
Clojure/ClojureScript is not difficult to learn (having a strong background in Common Lisp and Java), but for beginners it is not easy to start developing applications because the ecosystem is huge and changing, and it is a real challenge to choose among all the existing tools and frameworks.
The objective is to create an online environment for developing client and server applications. Some of the niceties of the framework follow:
- managed state of components
- hot reload; any code change (client and server) re-renders the page in less than 3 seconds (we are not using figwheel; we developed our own solution)
- remote procedure calls with callbacks
- software events (on/emit)
- it is possible to use reagent/react components and external scripts (e.g. JavaScript)
- version control system (ie. git)
Design and implement a program to query data, using Java and Spark SQL,
from different sources: Elasticsearch, BigQuery, Google Storage and Azure
Storage, HDFS. The users create views from different data sources and write
Spark SQL queries. The system makes it possible to correlate data from different sources,
and is mainly used by data scientists and security engineers.
Design and development using Scala, Spark, Python, Kafka, Mesos,
Marathon, HDFS, Yarn, Docker, Cassandra, Elasticsearch, Ansible, ...
- devops: linux, docker, hadoop, kafka, elk, mesos, marathon, ansible, git, prometheus, ...
- design and implement gladiator, a proxy server with marathon/mesos automatic service discovery (implemented in Go 1.8; the name GLADiator comes from the following brainstorming chain: Service Auto Discovery -> SAD -> GLAD -> GLADiator :)
- designed Connector, a service discovery tool used with a proxy server (e.g. HAProxy, nginx)
- I implemented filebot, a simple log management solution using node.js; many node.js processes index log records in parallel into elasticsearch. The solution is powerful, fast and easy to configure. For each log message, filebot matches the message against user-defined regex patterns, sends the result to user-defined JavaScript handlers, and indexes the returned JSON documents into elasticsearch.
- write and refactor ansible scripts (e.g. filebeat, logstash, kibana, elasticsearch, ...)
- create ansible modules (as much as possible, I prefer using parameterised roles as modules instead of creating modules using python)
- design logstash patterns and create kibana dashboards
- create/manage elasticsearch snapshots/backups
- create test clusters using docker containers; a test cluster can be created using a base image or a committed test cluster. The objective is to create
whole iso-prod test environments by deploying all ansible playbooks in test clusters (work not finished because of docker limitations).
- currently diving into mesos, kafka, spark, zookeeper, cassandra, hive, scala, akka, ...
- add elasticsearch plugins: shield, marvel, watcher (groovy, javascript, mustache script)
- implement elasticsearch watchers to track different system resources and application data conditions
- write a script to detect kafka partitions/replicas on the same rack or replicas not in-sync; generate a json document to reassign the replicas to different nodes.
- write a script to backup/restore elasticsearch indexes (using scroll and bulk); also, the script can generate csv files from elasticsearch documents.
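The Kafka rack check described above can be sketched as follows in Python. The input shapes (`partitions`, `broker_rack`) and the function names are assumptions made for illustration, not the original script; only the output layout follows the JSON format read by Kafka's partition reassignment tool (`version`, `partitions`, `topic`, `partition`, `replicas`).

```python
import json


def find_problems(partitions, broker_rack):
    """Return partitions that share a rack between replicas or have replicas out of sync."""
    problems = []
    for p in partitions:
        racks = [broker_rack[b] for b in p["replicas"]]
        same_rack = len(set(racks)) < len(racks)
        out_of_sync = sorted(set(p["replicas"]) - set(p["isr"]))
        if same_rack or out_of_sync:
            problems.append({"topic": p["topic"], "partition": p["partition"],
                             "same_rack": same_rack, "out_of_sync": out_of_sync})
    return problems


def spread_replicas(replicas, broker_rack):
    """Keep the first replica on each rack, then fill gaps from unused racks.

    Hypothetical placement policy: falls back to the original list when the
    cluster has fewer racks than replicas.
    """
    chosen, used_racks = [], set()
    for b in replicas:
        if broker_rack[b] not in used_racks:
            chosen.append(b)
            used_racks.add(broker_rack[b])
    for b in sorted(broker_rack):
        if len(chosen) == len(replicas):
            break
        if b not in chosen and broker_rack[b] not in used_racks:
            chosen.append(b)
            used_racks.add(broker_rack[b])
    return chosen if len(chosen) == len(replicas) else list(replicas)


def reassignment_plan(partitions, broker_rack):
    """Build a reassignment document for partitions with replicas sharing a rack."""
    flagged = {(x["topic"], x["partition"])
               for x in find_problems(partitions, broker_rack) if x["same_rack"]}
    moves = [{"topic": p["topic"], "partition": p["partition"],
              "replicas": spread_replicas(p["replicas"], broker_rack)}
             for p in partitions if (p["topic"], p["partition"]) in flagged]
    return {"version": 1, "partitions": moves}


# Made-up cluster: brokers 1 and 2 share rack r1.
broker_rack = {1: "r1", 2: "r1", 3: "r2", 4: "r3"}
partitions = [
    {"topic": "events", "partition": 0, "replicas": [1, 2, 3], "isr": [1, 2, 3]},
    {"topic": "events", "partition": 1, "replicas": [3, 4, 1], "isr": [3, 4]},
]
plan = reassignment_plan(partitions, broker_rack)
plan_json = json.dumps(plan, indent=2)
```

In this sketch, out-of-sync replicas are only reported by `find_problems`; whether they should also be moved is a policy decision the source does not specify.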
keywords: cloudera/hadoop (yarn, mapreduce, hdfs), mesos, spark, kafka, zookeeper, elasticsearch, icinga2, node.js, cassandra, pig, hive, ansible, java,
scala, groovy, mustache, python, akka, linux (RHEL 6.7), git,
terraform, carrefour.io (OpenStack), REST, agile, scrum, HP Insight Cluster
Management Utility, iLO4, ...
Taoufik ********
1. Design and implement a new web IDE, ANKBIT. With ANKBIT you develop web components/applications online. An ANKBIT is a generalisation of HTML tags (an HTML tag is a primitive ANKBIT). I have a working prototype, and I am currently designing a library of ANKBIT web components (animation, layout, style, ...).
Adding WebRTC to ANKBIT allows easy development of peer-to-peer web applications (e.g. video conferencing).
I am an experienced Common Lisp Architect/Programmer and would like to
join a dream team to work on challenging projects.
I have a strong general interest in functional programming languages such
as Lisp, Haskell, Clean, Erlang, Ruby, F#, ...
I strive to be a creative Architect/Programmer who solves difficult issues with
simple solutions.
Lately I designed a fast and powerful pattern matcher (see examples of
use below) that I am using daily for all my Lisp projects; and currently I am
designing a new programming language: lazy evaluation, partial evaluation,
complete sharing, generalized lisp macros (based on pattern matching)
29/05/2015 Update: pattern matching
I improved the pattern matching tool (written in Common Lisp) in many areas:
the most important additions are matching any cyclic graph, patterns with
general recursion, and generic patterns (types):
Examples of matching graphs with cycles:
1. match an infinite list of ones:
(match '#1=(1 . #1#) (^foo=(1 . ^foo) 'ok))
2. a complex one
(match '#1=(1 #2=(2 #1# 3 #1# . #2#) . #1#)
  (^foo=(1 ^bar=(2 ^foo 3 ^foo . ^bar) . ^foo) 'ok)
  (_ 'ko))
Define patterns:
(defpattern some (x) `(~^foo=(:or (,x) (,x . ^foo)) . _))
(defpattern lone (x) `~(:or nil (,x)))
(defpattern all (x) `^foo=(:or nil (,x . ^foo)))
Define matchers:
(defmatcher term ()
`(((?x= #^fact (:and ?op (:or * /)) ?y= #^term)
(funcall op x y))
#^fact=(((:and ?x (:integer)) x)
(?x= #^expr x))))
(defmatcher expr ()
`(((?x= #^term (:and ?op (:or + -)) ?y= #^expr) (funcall op x y))
(?x= #^term x)))
;;; while term and expr are globally defined matchers, fact is locally defined
;;; inside the term matcher
> (match '(2 + (3 * 4)) (?x= #^expr x))
14
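As a rough cross-check of the grammar that the term/expr matchers appear to encode, here is a minimal Python sketch of the same right-recursive rules over nested lists. It does not use or reproduce the pattern matching library; the function names are hypothetical, and the reading of `fact` as "an integer or a nested expression" is an assumption drawn from the matcher definitions above.

```python
import operator

TERM_OPS = {"*": operator.mul, "/": operator.truediv}
EXPR_OPS = {"+": operator.add, "-": operator.sub}


def parse_fact(tokens, i):
    """fact := integer | nested expression (a sub-list)."""
    tok = tokens[i]
    if isinstance(tok, int):
        return tok, i + 1
    if isinstance(tok, list):
        return evaluate(tok), i + 1
    raise ValueError(f"unexpected token: {tok!r}")


def parse_term(tokens, i):
    """term := fact (* | /) term | fact   (right-recursive, as in the matcher)."""
    x, i = parse_fact(tokens, i)
    if i < len(tokens) and isinstance(tokens[i], str) and tokens[i] in TERM_OPS:
        op = TERM_OPS[tokens[i]]
        y, i = parse_term(tokens, i + 1)
        return op(x, y), i
    return x, i


def parse_expr(tokens, i):
    """expr := term (+ | -) expr | term   (right-recursive, as in the matcher)."""
    x, i = parse_term(tokens, i)
    if i < len(tokens) and isinstance(tokens[i], str) and tokens[i] in EXPR_OPS:
        op = EXPR_OPS[tokens[i]]
        y, i = parse_expr(tokens, i + 1)
        return op(x, y), i
    return x, i


def evaluate(tokens):
    """Evaluate a nested token list such as [2, "+", [3, "*", 4]]."""
    value, i = parse_expr(tokens, 0)
    if i != len(tokens):
        raise ValueError("trailing tokens")
    return value


result = evaluate([2, "+", [3, "*", 4]])  # same input as the Lisp example
```

Because both rules recurse on the right, operators of equal precedence group to the right in this sketch (so `8 - 3 - 2` is read as `8 - (3 - 2)`), and `/` uses Python's float division rather than Lisp's exact rationals.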