Norbert Eke's Portfolio

Bio.

I am Norbert Eke, an enthusiastic, intellectually curious, data-driven, and solution-oriented Data Scientist with strong problem-solving skills and expertise in machine learning & data analysis. I completed my graduate studies at Carleton University in Ottawa, Canada, and was awarded the degree of Master of Computer Science with Specialization in Data Science (Class of 2020).

I worked in Canada for a short period, then in 2021 I relocated to Zürich, Switzerland, where I currently live and work. I have 2.5 years of work experience in data science and machine learning. I am a motivated person with excellent leadership and communication skills: a self-starter and a proactive, goal-oriented, hands-on learner with a passion for growing within my area of expertise and for creating meaningful, impactful work using new data science and machine learning techniques. I keep a positive mindset and am always looking to gain valuable experience in data science. I find professional fulfillment in innovation, impact, and making a meaningful contribution to society. I am a big fan of Bayesian statistics, customer lifetime value modeling, classification tasks, cluster analysis, outlier detection, recommendation systems, and data visualization. I enjoy analyzing data and solving real-life problems using data science and machine learning.

During my graduate studies, I conducted research in Dr. Olga Baysal’s Software Analytics lab. My Master’s thesis, entitled "Cross-Platform Software Developer Expertise Learning", focuses on defining and quantifying the expertise of software developers based on publicly available data from GitHub and Stack Overflow. To achieve this goal, I worked with LDA (Latent Dirichlet Allocation) topic models, which gave me in-depth knowledge of Bayesian statistics.

My skill set includes (but is not limited to):

• Programming: Python, R, SQL, Java, HTML, JavaScript, Algorithm Design, Data Structures, Object‑Oriented Programming

• Databases: Extensive experience with SQL, Database Design, BigQuery, MySQL, PostgreSQL, NoSQL

• Software Engineering: Experienced with Client‑Server Applications (project work), Agile Development, Algorithm Implementation, CI/CD, Git, Deployment, Unit Testing, Writing Reports & Documentation

• Cloud & Deployment: GCP (Vertex AI, App Engine, Compute Engine, Cloud Run, Cloud Build, Cloud Functions & Cloud Scheduler), Flask, Docker, Familiarity with Linux environments

• ML & DS Libraries: TensorFlow, Keras, Scikit‑Learn, Scikit‑Optimize, Imbalanced‑Learn, PyCaret, StatsModels, Lifetimes, Tslearn, Sktime, NumPy, Pandas, Matplotlib, Plotly, Seaborn, Streamlit, spaCy, Gensim, NLTK, dedupe, TextBlob

• Machine Learning: Traditional ML (Random Forest, Decision Trees, SVM), Regression, Classification, Clustering, Time‑series Forecasting, Feature Selection, Outlier Detection, NLP, Hyper‑parameter Optimization

• Deep Learning: Deep Neural Nets for NLP & Time‑series Forecasting using MLP, CNN, RNN, LSTM, GRU, GAN

• Statistics: Descriptive Stats, Probability Theory, Regression Analysis, Bayesian Statistics, Statistical Modeling, Bayesian A/B Testing, Inference, Dimensionality Reduction, Hypothesis Testing

• Data Analysis: Extensive experience with Data Acquisition, Cleaning & Visualization (Streamlit & Looker dashboards), Feature Engineering, Data Mining, Predictive Modeling, Handling Unstructured Data

• Interpersonal Skills: Teamwork, Intellectually Curious, Continuous Learner, Leadership & Communication Skills, Time Management, Self‑Starter, Motivated, Analytical Thinker, Data‑Driven Problem Solver

My interests include, but are not limited to, machine learning, healthcare applications of machine learning, financial data analysis, applications of deep learning, time series analysis & forecasting, outlier detection, classification tasks, cluster analysis, text analytics, data mining, sports analytics, real-time data analytics and simulations, motorsports data analytics & race strategy, conveying useful information through dashboard visualizations, and applications of computer vision and natural language processing.

Timeline.

April 2023 - present: I started a new position as a Data Scientist at EF Education First in Zürich, Switzerland, where I continue to grow my expertise in data science and machine learning.
July 2021 - April 2023: I worked as a Data Scientist at Migros Online, Switzerland’s largest digital supermarket, where I grew my skill set, transformed raw data into actionable insights, and always strove to tell a story through data.
June 2021: I relocated to Zürich, Switzerland to continue my professional journey here, in the heart of Europe.
February 2021 - June 2021: I worked as a Data Scientist in Health Canada's Data Analytics and Reporting Team (DART) in the Regulatory Operations and Enforcement Branch in Ottawa, Canada.
June 19th, 2020: Graduated from Carleton University in Ottawa, Canada, with the degree of Master of Computer Science with Specialization in Data Science. See my LinkedIn post about it.
April 21st, 2020: Successfully defended my Master's thesis, titled "Cross-Platform Software Developer Expertise Learning", at Carleton University in Ottawa, Canada.
September 2018 - April 2020: Graduate Research and Teaching Assistant at Carleton University
Master's thesis on mining Stack Overflow and GitHub data to create a novel approach to cross-platform software developer expertise learning
September 2019: Featured in Carleton University's Eureka! magazine:
I also wrote a LinkedIn post about the article, which was featured on Carleton University's Instagram and LinkedIn pages as well
May 2019 - August 2019: Data Scientist Intern at National Research Council Canada:
Worked in NRC's Data Analytics Center in Ottawa and completed a 4-month contract for a government client
September 2018 - June 2020: Carleton University - M.Sc. in Computer Science with Specialization in Data Science:
Data Mining, Machine Learning, NLP, Deep Learning and Empirical Software Engineering. Adviser: Prof. Olga Baysal
May 2017 - August 2017: Undergraduate Researcher at University of British Columbia Okanagan:
Received an Undergraduate Research Award and worked on a modern approach to feature-based opinion mining using word embeddings
September 2015 - April 2018: Undergraduate Teaching Assistant at University of British Columbia Okanagan:
Helped students apply concepts taught in lectures via hands-on programming
September 2014 - June 2018: University of British Columbia Okanagan - B.Sc. Honours in Computer Science, Minor in Data Science
Completed an undergraduate thesis (a data science project) under the supervision of Prof. Abdallah Mohamed and Prof. Jeffrey Andrews

Coursera Courses (2020)

Introduction to TensorFlow for Artificial Intelligence, Machine Learning, and Deep Learning:

My Google Colab Notebooks:
1: Simple House Price Prediction
2: MNIST Classifier without Convolutions
3: MNIST Classifier with Convolutions
4: Fashion MNIST Classifier without Convolutions
5: Fashion MNIST Classifier with Convolutions
6: Horse-or-Human Classifier 1
7: Horse-or-Human Classifier 2
8: Happy-or-Sad Emoji Classifier

Convolutional Neural Networks in TensorFlow:

My Google Colab Notebooks:
1: Cat-or-Dog Classifier 1
2: Cat-or-Dog Classifier 2
3: Cat-or-Dog Classifier with Data Augmentation
4: Horse-or-Human Classifier with Data Augmentation

Natural Language Processing in TensorFlow:

My Google Colab Notebooks:
1: Intro to TensorFlow's Tokenizer and Padding
2: Sarcasm data Tokenization and Padding
3: BBC News data Tokenization and Padding
4: Training Embedding Layer and Classifier on IMDB reviews
5: Training Embedding Layer and Classifier on Sarcasm data
6: Training Embedding Layer and Classifier on IMDB with Subwords
7: Training Embedding Layer and Classifier on BBC News
8: Single Layer LSTM Classifier on IMDB with Subwords
9: Multi Layer LSTM Classifier on IMDB with Subwords
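The tokenization and padding notebooks above use TensorFlow's Keras `Tokenizer` and `pad_sequences`. As a dependency-free illustration of what those two steps do, here is a simplified sketch in plain Python. It is not the TensorFlow API and not code from the notebooks; it assumes Keras-like conventions (index 0 reserved for padding, an out-of-vocabulary token at index 1, more frequent words getting smaller indices, pre-padding and pre-truncation by default) and uses a simplified word regex in place of Keras' punctuation filters.

```python
import re
from collections import Counter


def tokenize(text):
    # Simplified tokenizer: lower-case, keep runs of letters, digits and apostrophes.
    return re.findall(r"[a-z0-9']+", text.lower())


def fit_vocab(sentences, oov_token="<OOV>"):
    """Build a word index where more frequent words get smaller indices.

    Index 0 is left free for padding; the OOV token (if given) takes index 1.
    """
    counts = Counter()
    for sentence in sentences:
        counts.update(tokenize(sentence))
    # Counter keeps first-seen order and sorted() is stable (even with reverse=True),
    # so words with equal counts keep the order in which they first appeared.
    ranked = [w for w, _ in sorted(counts.items(), key=lambda kv: kv[1], reverse=True)]
    if oov_token is not None:
        ranked = [oov_token] + ranked
    return {word: i for i, word in enumerate(ranked, start=1)}


def to_sequences(sentences, word_index, oov_token="<OOV>"):
    """Map each sentence to a list of integer ids; unknown words become the OOV id."""
    oov_id = word_index.get(oov_token)
    sequences = []
    for sentence in sentences:
        seq = []
        for word in tokenize(sentence):
            if word in word_index:
                seq.append(word_index[word])
            elif oov_id is not None:
                seq.append(oov_id)
        sequences.append(seq)
    return sequences


def pad(sequences, maxlen=None, padding="pre", truncating="pre", value=0):
    """Pad (or truncate) every sequence to the same length."""
    if maxlen is None:
        maxlen = max((len(s) for s in sequences), default=0)
    padded = []
    for seq in sequences:
        if len(seq) > maxlen:
            # "pre" drops tokens from the start, "post" drops them from the end.
            seq = seq[len(seq) - maxlen:] if truncating == "pre" else seq[:maxlen]
        fill = [value] * (maxlen - len(seq))
        padded.append(fill + seq if padding == "pre" else seq + fill)
    return padded
```

For example, fitting on `["I love my dog", "I love my cat", "You love my dog!"]` and encoding `"I really love my dog"` maps the unseen word "really" to the OOV index, and padding to length 7 prepends two zeros.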

Academic Research Projects (2017 - 2020)

Master's Thesis: Cross-Platform Software Developer Expertise Learning

In today's world, software development is a competitive field. Being an expert gives software engineers opportunities to find better, higher-paying jobs. Recruiters are always searching for the right talent, but it is difficult to determine a developer's expertise from their resume alone. Solving this problem requires expertise detection algorithms. Several questions arise when expertise is put into practice: how can developer expertise be defined, measured, extracted, or even learned? Our work aims to provide recruiters with a data-driven alternative to reading a candidate's CV or resume. In this thesis, we propose three novel, robust, data-driven techniques for expertise learning based on topic modeling. Our extensive analysis of cross-platform developer expertise suggests that using multiple collaborative platforms is the optimal path towards gaining more knowledge and becoming an expert, as cross-platform expertise tends to be more diverse, creating opportunities for more effective learning through collaboration.

Eke, Norbert

Defended April 21st, 2020
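The abstract argues that cross-platform expertise tends to be more diverse. One common way to put a number on the diversity of a topic-model profile is the Shannon entropy of a developer's averaged topic distribution. The sketch below is only an illustration of that idea, not the measure used in the thesis; it assumes per-document topic distributions (for example from LDA) are already available, and the example distributions are made up.

```python
import math


def expertise_profile(doc_topics):
    """Average per-document topic distributions (e.g. LDA output) into one profile."""
    if not doc_topics:
        raise ValueError("need at least one document")
    n = len(doc_topics)
    return [sum(column) / n for column in zip(*doc_topics)]


def topic_entropy(profile):
    """Shannon entropy in bits; higher values mean expertise spread over more topics."""
    total = sum(profile)
    return -sum((p / total) * math.log2(p / total) for p in profile if p > 0)


if __name__ == "__main__":
    # Hypothetical 4-topic distributions for one developer's documents on each platform.
    stack_overflow = [[0.9, 0.1, 0.0, 0.0], [0.8, 0.2, 0.0, 0.0]]
    github = [[0.0, 0.0, 0.7, 0.3]]
    for name, docs in [("SO only", stack_overflow),
                       ("GitHub only", github),
                       ("cross-platform", stack_overflow + github)]:
        print(name, round(topic_entropy(expertise_profile(docs)), 3))
```

Pooling all documents weights each platform by how many documents it contributes; averaging the two platform profiles instead would give each platform equal weight. Either way, in this toy example the cross-platform profile has higher entropy than either single-platform profile.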

Exploring the Evolution of Stack Overflow Discussions Using Sentimental Analysis on Comments

Stack Overflow is a popular Q&A forum for software developers, providing a large amount of discussion in the form of posts and their comments. SO posts evolve over time, in both text and code snippets, and so does the discussion associated with them. In this paper, we investigate the evolution of SO posts with respect to SO discussions, a factor usually ignored by techniques aimed at finding the relevance of a post for a particular objective. To accomplish our goal, we mine the SOTorrent dataset, which provides the version history of posts and comments along a timeline. We then study the characteristics of discussions, in the form of comments, with respect to the evolution timeline of the post. Our results demonstrate that, on average, the sentiment trend favors positive sentiment as posts become more stable over time, indicating more approval from the SO community in the comment section.

Eke, Norbert and Manes, Saraj Singh
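A core step in this kind of analysis is placing each comment on the post's version timeline. The sketch below assumes the version timestamps of a post and a sentiment polarity score per comment (in [-1, 1]) are already available; the input shapes are hypothetical and the scoring itself is left out. Each comment is assigned to the latest post version created at or before it, and polarity is averaged per version.

```python
from bisect import bisect_right
from collections import defaultdict


def sentiment_by_version(version_times, comments):
    """Average comment polarity per post version.

    version_times: ascending creation timestamps of one post's versions (hypothetical input).
    comments: iterable of (timestamp, polarity) pairs, polarity assumed to be in [-1, 1].
    Returns {version_index: mean polarity}; version 0 is the original post.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for timestamp, polarity in comments:
        # Index of the latest version created at or before the comment.
        idx = bisect_right(version_times, timestamp) - 1
        if idx < 0:
            continue  # comment predates the first version: skip it
        sums[idx] += polarity
        counts[idx] += 1
    return {i: sums[i] / counts[i] for i in sorted(counts)}


if __name__ == "__main__":
    versions = [0, 10, 20]  # made-up timestamps of three post versions
    comments = [(1, -0.5), (5, 0.1), (12, 0.3), (25, 0.6), (30, 0.4)]
    print(sentiment_by_version(versions, comments))
```

Comparing the per-version means (or fitting a trend line through them) across many posts is one way to check whether sentiment becomes more positive as a post stabilizes.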

Linking Stack Overflow and GitHub Public Data for Mining Purposes

Developer expertise learning and recommendation is the task of defining and quantifying developers' areas and levels of expertise, then creating a top-n ranking of the developers most qualified to perform a task. Applying software repository mining to this task would allow the creation of a developer expertise profile consisting of topical expertise and interest distributions learned from Stack Overflow and GitHub public data. This project builds a database of Stack Overflow and GitHub public data, then links the two together based on a common attribute.

Eke, Norbert
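The description above does not name the common attribute used for linking. Purely as a hypothetical illustration, the sketch below uses an MD5 hash of a trimmed, lower-cased e-mail address as the shared key and performs an inner join between the two user tables. The record schema (`id`, `key`) is an assumption, and keys that occur more than once on either platform are treated as ambiguous and left unlinked.

```python
import hashlib


def email_key(email):
    """Hypothetical linking key: MD5 hex digest of the trimmed, lower-cased e-mail."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def _unique_index(users):
    """Map key -> user id, dropping keys that appear more than once."""
    index, duplicates = {}, set()
    for user in users:
        key = user["key"]
        if key in index:
            duplicates.add(key)
        index[key] = user["id"]
    for key in duplicates:
        del index[key]
    return index


def link_accounts(so_users, gh_users):
    """Inner-join Stack Overflow and GitHub users on their shared key.

    Both inputs are lists of dicts with 'id' and 'key' fields (hypothetical schema).
    Returns sorted (stack_overflow_id, github_id) pairs.
    """
    so_index = _unique_index(so_users)
    gh_index = _unique_index(gh_users)
    shared = so_index.keys() & gh_index.keys()
    return sorted((so_index[k], gh_index[k]) for k in shared)
```

Dropping ambiguous keys trades recall for precision: a wrong link would merge two different developers' activity into one expertise profile, which is usually worse than leaving an account unlinked.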

Anomaly detection with Generative Adversarial Networks and text patches

In this research work, we explored the possibility of adapting image-based anomaly detection to text-based anomaly detection. Two main approaches are proposed: anomaly detection as a classification task, and unsupervised anomaly detection using text patches. Both approaches explore the use of generative adversarial networks to perform anomaly detection, and the results presented show that this direction can be fruitful.

Eke, Norbert and Drozdyuk, Andriy

Honours Thesis: Identification and Classification of Sexual Predatory Behavior in Online Chat-Room Environments

According to the Crimes Against Children Research Center, one in five U.S. teenagers who regularly use the Internet have received an unwanted sexual solicitation via the web. Online environments such as chat-rooms are increasingly dangerous: predatory behaviour there is more and more frequent, creating an unsafe environment for minors. This project aims to design an approach that helps online communities enhance their members' safety by detecting malicious conversations of a sexual nature. It combines computational linguistics with statistical machine learning to uncover the insight contained in conversations, then predicts whether a specific conversation should be flagged for containing sexual predatory behaviour. The contribution of this novel approach is twofold: first, it captures contextual details by emphasizing the insight that lies within the conversation; second, it uses a two-stage classification system that is highly flexible and can be customized to detect and classify other malicious textual data.

Eke, Norbert and Mohamed, Abdallah and Andrews, Jeffrey
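The two-stage idea can be sketched as a small pipeline: stage 1 scores whole conversations, and only conversations above a threshold reach stage 2, which scores each participant's messages separately. The abstract does not describe the features or models used in either stage, so the scorers here are pluggable callables, and the keyword-based `toy_*` functions are hypothetical placeholders standing in for trained classifiers. Thresholds are likewise arbitrary.

```python
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, str]  # (author_id, text)


def two_stage_flag(conversations: Dict[str, List[Message]],
                   conversation_score: Callable[[List[Message]], float],
                   author_score: Callable[[List[str]], float],
                   conversation_threshold: float = 0.5,
                   author_threshold: float = 0.5) -> Dict[str, List[str]]:
    """Return {conversation_id: [flagged author ids]} for conversations flagged in stage 1."""
    results: Dict[str, List[str]] = {}
    for conv_id, messages in conversations.items():
        # Stage 1: decide whether the whole conversation looks suspicious.
        if conversation_score(messages) < conversation_threshold:
            continue
        # Stage 2: group messages by author and score each participant separately.
        by_author: Dict[str, List[str]] = {}
        for author, text in messages:
            by_author.setdefault(author, []).append(text)
        results[conv_id] = sorted(a for a, texts in by_author.items()
                                  if author_score(texts) >= author_threshold)
    return results


# Hypothetical placeholder scorers: fraction of messages containing the word "flag".
def toy_conversation_score(messages: List[Message]) -> float:
    return sum("flag" in text.lower() for _, text in messages) / max(len(messages), 1)


def toy_author_score(texts: List[str]) -> float:
    return sum("flag" in text.lower() for text in texts) / max(len(texts), 1)
```

Keeping the stages separate is what makes the design reusable: swapping in different stage-1 and stage-2 models adapts the same pipeline to other kinds of malicious text, and stage 2 only runs on the small subset of conversations that stage 1 flags.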

Feature Based Opinion Mining: A Modern Approach

In a world where customers can buy products online with a few clicks, prospective customers need to consider the opinions and satisfaction levels of previous customers. To help them understand what previous customers have said, an automated technique that summarizes the opinions of thousands of customers is desirable. A promising technique has been developed that combines continuous vector representation models, natural language processing techniques, and statistical machine learning models. Tested on labelled datasets, the technique correctly extracts over 80% of opinions. Future research can focus on addressing the technique's limitations on edge cases.

Eke, Norbert and Andrews, Jeffrey and Mohamed, Abdallah
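One piece of feature-based opinion mining that continuous vector representations make easy is deciding which product feature an opinion phrase talks about. The sketch below shows only that step, under stated assumptions: it is not the published technique, the three-dimensional word vectors are made up (a real system would use trained embeddings), and the `min_sim` threshold is arbitrary. An opinion phrase is represented by the average of its known word vectors and assigned to the product feature with the highest cosine similarity.

```python
import math


def cosine(u, v):
    """Cosine similarity; 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def phrase_vector(words, embeddings):
    """Average the vectors of the words we have embeddings for (None if none are known)."""
    vectors = [embeddings[w] for w in words if w in embeddings]
    if not vectors:
        return None
    return [sum(component) / len(vectors) for component in zip(*vectors)]


def assign_to_feature(words, features, embeddings, min_sim=0.5):
    """Return the most similar product feature name, or None if nothing is close enough."""
    vec = phrase_vector(words, embeddings)
    if vec is None:
        return None
    best, best_sim = None, -1.0
    for name, feature_vec in features.items():
        sim = cosine(vec, feature_vec)
        if sim > best_sim:
            best, best_sim = name, sim
    return best if best_sim >= min_sim else None


if __name__ == "__main__":
    # Toy embeddings, made up for illustration only.
    emb = {"battery": [1.0, 0.0, 0.0], "charge": [0.9, 0.1, 0.0],
           "screen": [0.0, 1.0, 0.0], "bright": [0.1, 0.9, 0.0],
           "great": [0.0, 0.0, 1.0]}
    features = {"battery": emb["battery"], "display": emb["screen"]}
    print(assign_to_feature(["bright", "screen"], features, emb))
```

With these toy vectors, "bright screen" lands on the display feature, while a phrase like "great" that is close to no feature is left unassigned rather than forced onto one.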

Academic Course Projects (2015-2019)

Early Data Science Work

December 2017. A repository showcasing my data science research from 2015 to 2017, summarized in one portfolio document

Exploratory Topic Modeling

June 2016. Exploratory project in Deep Learning and Topic Modelling combined with Natural Language Processing in order to find topics within textual data.

Formula 1 Fan Forum

March 2017. A forum-style web application, with client-side and server-side development backed by a database.

Text Entailment and Semantic Relatedness

March 2019. In this project deep neural networks were used to solve the task of Text Entailment and Semantic Relatedness.

Consulting for Statistical Society of Canada

December 2017. Consulting project to create a better, data-driven conference schedule for the Statistical Society of Canada

Scenic Route Generator for Touristic Attractions

October 2016. In this project we designed a local tourism route creator based on attractions within the Okanagan valley.

Data Collector

May-August 2015. Implemented my own scraping algorithm for airlinequality.com

Chain-HashMap Implementation

August 2016. Implementation of a chained HashMap (collisions resolved by separate chaining), with the help of a data structures textbook.
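A chain hash map resolves collisions by separate chaining: every bucket holds a small list of (key, value) pairs, and a lookup only scans the bucket selected by the key's hash. The following is a Python sketch of the technique, not the original textbook-based implementation; the initial capacity of 11, the 0.75 load-factor limit, and the `2n + 1` growth rule are arbitrary choices made here.

```python
class ChainHashMap:
    """Hash map using separate chaining: each bucket is a list of (key, value) pairs."""

    def __init__(self, capacity=11, max_load=0.75):
        self._buckets = [[] for _ in range(capacity)]
        self._size = 0
        self._max_load = max_load

    def _bucket(self, key):
        # The key's hash picks exactly one bucket; only that chain is searched.
        return self._buckets[hash(key) % len(self._buckets)]

    def __setitem__(self, key, value):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                bucket[i] = (key, value)  # existing key: overwrite in place
                return
        bucket.append((key, value))
        self._size += 1
        # Grow the table when chains get too long on average.
        if self._size / len(self._buckets) > self._max_load:
            self._resize(2 * len(self._buckets) + 1)

    def __getitem__(self, key):
        for k, v in self._bucket(key):
            if k == key:
                return v
        raise KeyError(key)

    def __delitem__(self, key):
        bucket = self._bucket(key)
        for i, (k, _) in enumerate(bucket):
            if k == key:
                del bucket[i]
                self._size -= 1
                return
        raise KeyError(key)

    def __contains__(self, key):
        try:
            self[key]
            return True
        except KeyError:
            return False

    def __len__(self):
        return self._size

    def _resize(self, new_capacity):
        # Rehash every pair, since bucket positions depend on the table size.
        pairs = [pair for bucket in self._buckets for pair in bucket]
        self._buckets = [[] for _ in range(new_capacity)]
        for k, v in pairs:
            self._bucket(k).append((k, v))
```

Keeping the load factor bounded is what keeps the expected chain length, and therefore the expected cost of `get`, `set` and `delete`, constant.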

Skip List Implementation

August 2016. Implementation of a Skip List using a SortedMap
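A skip list stores keys in a sorted linked list with extra "express lanes": each node is promoted to higher levels with probability `p`, so searches can skip over many nodes and take expected O(log n) time, which makes it a natural backing structure for a sorted map. The original project appears to have been written against a `SortedMap` interface; the sketch below is an independent Python version supporting only `put`, `get` and in-order iteration, with `MAX_LEVEL = 16` and `p = 0.5` as assumed parameters.

```python
import random


class _Node:
    __slots__ = ("key", "value", "forward")

    def __init__(self, key, value, level):
        self.key = key
        self.value = value
        self.forward = [None] * level  # forward[i] = next node on level i


class SkipList:
    """Sorted map backed by a probabilistic skip list."""

    MAX_LEVEL = 16

    def __init__(self, p=0.5, seed=None):
        self._p = p
        self._rng = random.Random(seed)
        self._head = _Node(None, None, self.MAX_LEVEL)  # sentinel, never compared
        self._level = 1  # number of levels currently in use
        self._size = 0

    def _random_level(self):
        # Each extra level is kept with probability p (geometric distribution).
        level = 1
        while level < self.MAX_LEVEL and self._rng.random() < self._p:
            level += 1
        return level

    def _predecessors(self, key):
        """For every level, the last node whose key is smaller than `key`."""
        update = [self._head] * self.MAX_LEVEL
        node = self._head
        for i in range(self._level - 1, -1, -1):
            while node.forward[i] is not None and node.forward[i].key < key:
                node = node.forward[i]
            update[i] = node
        return update

    def get(self, key, default=None):
        candidate = self._predecessors(key)[0].forward[0]
        if candidate is not None and candidate.key == key:
            return candidate.value
        return default

    def put(self, key, value):
        update = self._predecessors(key)
        candidate = update[0].forward[0]
        if candidate is not None and candidate.key == key:
            candidate.value = value  # existing key: overwrite
            return
        level = self._random_level()
        # Levels above the old top already point at the head sentinel in `update`.
        self._level = max(self._level, level)
        node = _Node(key, value, level)
        for i in range(level):
            node.forward[i] = update[i].forward[i]
            update[i].forward[i] = node
        self._size += 1

    def __len__(self):
        return self._size

    def items(self):
        """Yield (key, value) pairs in ascending key order by walking level 0."""
        node = self._head.forward[0]
        while node is not None:
            yield node.key, node.value
            node = node.forward[0]
```

Passing a `seed` makes the node levels, and therefore the structure's shape, reproducible, which is convenient for testing; the map's contents do not depend on it.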

Software Eng. Capstone

September 2017-April 2018. Undergraduate Capstone Project: a torrent-based video-sharing service backend and web application frontend for use with Raspberry Pis