Bio. I am a Data Science graduate with problem-solving strengths and newly acquired skills in big data analysis. I am also an intellectually curious individual with a passion for new data mining and machine learning techniques. I am a big fan of Bayesian statistics and data visualization using ggplot2. I enjoy designing complex statistical and algorithmic solutions to problems.

My Master's thesis research focused on Data Mining, Machine Learning, NLP and Deep Learning. I graduated from Carleton University's Data Science program in May 2020 with a Master of Computer Science Specialization in Data Science degree. I am actively looking for a data scientist position. My key technical skills include, but are not limited to, Python, R, SQL, Java, report writing, traditional machine learning, deep learning, data mining, data analysis, data cleaning and wrangling, data visualization, statistical modeling, EDA, algorithms and experiment design, as well as strong analytical and communication skills.

My interests include, but are not limited to, applied Machine Learning, Data Mining, Medical Imaging, Knowledge Discovery, Big Data Analytics, Real-time Analytics, Motorsports Data Analytics & Race Strategy.

April 21st 2020: Successfully defended my Master's thesis titled Cross-Platform Software Developer Expertise Learning and graduated with the Master of Computer Science Specialization in Data Science degree from Carleton University in Ottawa, Canada.
September 2018 - April 2020: Graduate Research and Teaching Assistant at Carleton University
Master's thesis on mining Stack Overflow and GitHub to create a novel approach to cross-platform software developer expertise learning
September 2019: Featured in Carleton University's Eureka! magazine:
My article was shared in a LinkedIn post and was also featured on Carleton University's Instagram and LinkedIn pages
May 2019 - August 2019: Data Scientist Intern at National Research Council Canada:
Worked in NRC's Data Analytics Center in Ottawa and completed a 4-month contract for a government client
September 2018 - June 2020: Carleton University - M.Sc. in Computer Science with Specialization in Data Science:
Data Mining, Machine Learning, NLP, Deep Learning and Empirical Software Engineering. Adviser: Prof. Olga Baysal
May 2017 - August 2017: Undergraduate Researcher at University of British Columbia Okanagan:
Received an Undergraduate Research Award and worked on a modern approach to feature-based opinion mining using word embeddings
September 2015 - April 2018: Undergraduate Teaching Assistant at University of British Columbia Okanagan:
Helped students apply concepts taught in lectures via hands-on programming
September 2014 - June 2018: University of British Columbia Okanagan - B.Sc. Honours in Computer Science, Minor in Data Science
Completed an Undergraduate Thesis Data Science project under the supervision of Prof. Abdallah Mohamed and Prof. Jeffrey Andrews

Research Projects

Master's Thesis: Cross-Platform Software Developer Expertise Learning
In today's world, software development is a competitive field. Being an expert gives software engineers opportunities to find better, higher-paying jobs. Recruiters are always searching for the right talent, but it is difficult to determine the expertise of a developer only from reviewing their resume. To solve this problem, expertise detection algorithms are needed. A few problems arise when expertise is put into application: how can developer expertise be defined, measured, extracted or even learnt? Our work attempts to provide recruiters with a data-driven alternative to reading the candidate's CV or resume. In this thesis, we propose three novel, robust, data-driven techniques for expertise learning based on topic modeling. Our extensive analysis of cross-platform developer expertise suggests that using multiple collaborative platforms is the optimal path towards gaining more knowledge and becoming an expert, as cross-platform expertise tends to be more diverse, thus creating opportunities for more effective learning by collaboration.
Eke, Norbert
Defended April 21st, 2020
Exploring the Evolution of Stack Overflow Discussions Using Sentimental Analysis on Comments
Stack Overflow (SO) is a popular Q&A forum for software developers, providing a large amount of discussion in the form of posts and their comments. SO posts evolve with time, both in text and code snippets, and so does the discussion associated with them. In this paper, we investigate the evolution of SO posts with respect to SO discussions, a factor usually ignored in techniques that aim to find the relevance of a post for a particular objective. To accomplish our goal, we mine the SOTorrent data set, which provides the version history of posts and comments along with a timeline. We then study the characteristics of discussions in the form of comments with respect to the evolution timeline of a post. Our results demonstrate that, on average, the sentiment trend favors positive sentiment as posts become more stable with time, indicating more approval from the SO community in the comment section.
Eke, Norbert and Manes, Saraj Singh
Linking Stack Overflow and GitHub Public Data for Mining Purposes
Developer expertise learning and recommendation is the task of defining and quantifying the expertise areas and levels of developers, then creating a top-n ranking of the developers who are most qualified to perform a task. A software repository mining approach to this task would allow the creation of a developer expertise profile consisting of topical expertise and interest distributions learned from Stack Overflow and GitHub public data. This project addresses building a database consisting of Stack Overflow and GitHub public data, then linking them together based on a common attribute.
Eke, Norbert
Anomaly detection with Generative Adversarial Networks and text patches
In this research work, the possibility of adapting image-based anomaly detection to text-based anomaly detection was explored. Two main approaches are proposed, namely anomaly detection as a classification task and unsupervised anomaly detection using text patches. Both approaches explore the use of generative adversarial networks to perform anomaly detection, and the results presented show that this can be fruitful.
Eke, Norbert and Drozdyuk, Andriy
Honours Thesis: Identification and Classification of Sexual Predatory Behavior in Online Chat-Room Environments
According to the Crimes Against Children Research Center, one in five U.S. teenagers who regularly use the Internet have received an unwanted sexual solicitation via the web. There is an increasing danger in online environments such as chat-rooms, where predatory behaviour is more and more frequent, creating an unsafe environment for minors. This project aims to design an approach for online communities to enhance their members' safety by detecting malicious conversations of a sexual nature. This project joins the powers of computational linguistics with statistical machine learning to decipher the insight lying in conversations, then make predictions on whether or not a specific conversation should be flagged for containing sexual predatory behaviour. The contribution of this novel approach is two-fold: firstly, the approach is able to capture contextual details by putting an emphasis on the insight that lies within the conversation, and secondly, it contains a two-stage classification system, which is highly flexible and customizable for detecting and classifying other malicious textual data.
Eke, Norbert and Mohamed, Abdallah and Andrews, Jeffrey
Feature Based Opinion Mining: A Modern Approach
In a world where customers can buy products with a few clicks online, future customers must consider the opinions and satisfaction levels of previous customers. To help one understand what previous customers have said, an automated technique that summarizes the opinions of thousands of customers is desirable. A promising technique has been developed that combines continuous vector representation models, natural language processing techniques and statistical machine learning models. This technique has been tested on labelled datasets and it extracts over 80% of opinions correctly. Future research can focus on addressing the technique's limitations on edge cases.
Eke, Norbert and Andrews, Jeffrey and Mohamed, Abdallah

Coursera Courses

Side Projects

Early Data Science Work
December 2017. A repository dedicated to showcasing my Data Science research from 2015 to 2017 summarized into one portfolio document
Exploratory Topic Modeling
June 2016. Exploratory project in Deep Learning and Topic Modelling combined with Natural Language Processing in order to find topics within textual data.
Formula 1 Fan Forum
March 2017. Forum-style client-side and server-side web development backed by a database.
Text Entailment and Semantic Relatedness
March 2019. In this project, deep neural networks were used to solve the tasks of Text Entailment and Semantic Relatedness.
Consulting for Statistical Society of Canada
December 2017. Consulting project to create a better, data-driven conference schedule for the Statistical Society of Canada.
Scenic Route Generator for Touristic Attractions
October 2016. In this project, we designed a local tourism route creator based on attractions within the Okanagan Valley.
Data Collector
May-August 2015. Implemented my own scraping algorithm for
Chain-HashMap Implementation
August 2016. Implementation of a Chain HashMap with the help of a data structures textbook.
Skip List Implementation
August 2016. Implementation of a Skip List using a SortedMap
Software Eng. Capstone
September 2017-April 2018. Undergraduate Capstone Project: A torrent-based video sharing service backend and web application front end for use with Raspberry Pis
Road Line & Sign Detection
April 2017. In this project, we applied well-known computer vision techniques such as SIFT and the Hough transform to detect road lines and signs
Spam Email Detection
April 2016. Spam email detection in R using statistical machine learning
Academic Project Proposals and Reports
March 2019. A repository dedicated to showcasing my latest academic projects and thesis work during my Master's degree
Game of the Amazons AI
April 2017. We built a state-space-based AI agent capable of playing the Game of the Amazons using a heuristic function and the Minimax algorithm
Book Citation Generator
April 2017. We designed a digital library and book citation generator using a database

© Norbert Eke
Design adapted from Ekaba Bisong