WELLINGTON NZOKA MBITHI

Summary

Experienced data scientist with a strong background in statistical analysis, machine learning, and data visualization. Proficient in Python, R, SQL, and various data processing tools, with a focus on delivering actionable insights. Collaborative approach and adaptability consistently lead to impactful results in dynamic environments. Recognized for problem-solving abilities and innovative thinking in leveraging data to drive business decisions.

Overview

23 years of professional experience

Work History

International Consultant

UNFPA Pacific Sub Regional
07.2024 - 12.2024
  • UNFPA Pacific Sub-Regional Office
  • Data analysis: constructed indicators from 1 TB of MICS/DHS datasets to provide a baseline on basic Sexual and Reproductive Health (SRH) and family planning knowledge and attitudes among women, men and young people in Kiribati (2024), Samoa (2024) and Vanuatu (2024)
  • Applied unsupervised machine learning techniques, Principal Component Analysis (PCA) and K-modes clustering, in Python and R
  • Analytical framework for exploratory data analysis: data cleaning, descriptive statistics, correlation analysis, heat maps, PCA and K-modes clustering
  • The elbow method / scree plot is a graphical technique for choosing the number of clusters in a K-modes clustering algorithm
  • The model was built with 3 clusters (3 Ntiles), the optimal number under the elbow method
  • The Kaiser-Meyer-Olkin (KMO) Measure of Sampling Adequacy varies between 0 and 1; values closer to 1 are better, and 0.6 is a suggested minimum acceptable value
  • The model's KMO value was 0.826, comfortably above the suggested minimum
  • Final report disseminated to global senior leadership, with an executive summary, a clear methodology combining health, biostatistics and data science, and results and visualizations with recommendations on policy implications.
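
For readers unfamiliar with the KMO statistic quoted above, the following is a minimal sketch of how an overall KMO value can be computed from a correlation matrix and its inverse. It is illustrative only and is not the code used in the UNFPA analysis: the function names (`correlation_matrix`, `invert`, `kmo_overall`) and the synthetic single-factor data are assumptions made for this example, and only the Python standard library is used so that it runs without extra packages.

```python
import random
from math import sqrt


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equal-length numeric columns."""
    n = len(columns[0])
    means = [sum(c) / n for c in columns]
    centered = [[x - m for x in c] for c, m in zip(columns, means)]
    norms = [sqrt(sum(x * x for x in c)) for c in centered]
    p = len(columns)
    return [
        [sum(a * b for a, b in zip(centered[i], centered[j])) / (norms[i] * norms[j])
         for j in range(p)]
        for i in range(p)
    ]


def invert(matrix):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    p = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(p)]
           for i, row in enumerate(matrix)]
    for col in range(p):
        pivot = max(range(col, p), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(p):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[p:] for row in aug]


def kmo_overall(columns):
    """Overall KMO: sum of squared correlations divided by the sum of squared
    correlations plus squared partial correlations (off-diagonal terms only)."""
    corr = correlation_matrix(columns)
    inv = invert(corr)
    p = len(corr)
    r2 = q2 = 0.0
    for i in range(p):
        for j in range(p):
            if i != j:
                # Partial correlation of i and j given all other variables.
                partial = -inv[i][j] / sqrt(inv[i][i] * inv[j][j])
                r2 += corr[i][j] ** 2
                q2 += partial ** 2
    return r2 / (r2 + q2)


# Synthetic data (hypothetical): six variables driven by one shared factor.
rng = random.Random(0)
factor = [rng.gauss(0, 1) for _ in range(500)]
columns = [[f + 0.5 * rng.gauss(0, 1) for f in factor] for _ in range(6)]

kmo = kmo_overall(columns)
print(f"Overall KMO = {kmo:.3f}")
```

Variables that share a strong common factor give a KMO close to 1, while largely unrelated variables give values near 0.5 or below, which is why 0.6 is often treated as the floor for factor-style analyses.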

Data analysis and ICT Expert

Helen Keller International/NTAP Consultants
04.2024 - 06.2024
  • HKI-NTAP Independent Third-Party Monitoring and Post-Event Coverage Survey in Kenya
  • Skilled at working independently and collaboratively in a team environment.
  • Worked well in a team setting, providing support and guidance.
  • Literature review on Vitamin A supplementation in Kenya, Sub-Saharan Africa
  • Designed and maintained large-scale databases for unstructured and structured data, and for data collection mobile apps
  • Prepared training materials and user guides for the SurveyCTO tool
  • Customized and reviewed the SurveyCTO mobile app
  • Prepared the data analysis plan using R and Python for exploratory data analysis
  • Led data analysis using STATA and Python, and geospatial analysis to identify key spatial patterns and correlations.

Technical Specialist - Data Scientist and Analytics

UNFPA South Sudan
05.2019 - 02.2024


Key Accomplishments:

  • Planning, staffing, coordinating, managing and budgeting the digital Population Estimation Survey (PES) 2021 with the UN, ministries and the National Bureau of Statistics
  • Scientific Data analytics, AI & ML, and Population Data Modelling
  • Strengthening the capacity of MOH and NBS on national health information system and programming skills to improve the availability, quality, and use of big data
  • Support population modelling and population projection using Spectrum, Padis, Mortpak, Stata, Demographic Analysis and Population Projection System (DAPPS) and the Population Analysis System (PAS) software
  • Provide technical support on the Population survey big data using Python, Jupyter, Scikit-learn, Anaconda, R, for population Modelling and Exploratory analysis
  • Use of visualization technologies (Qlik, Looker, Tableau, Bokeh, Dash, Power BI, Shiny, Plotly, Flask, Seaborn, Google Charts, Dundas BI) to unlock the value in health and big data on population surveys
  • Developed data visualization dashboards, analytics and storytelling to communicate insights using Tableau, Power BI and Python (Pandas, Matplotlib, seaborn)
  • Sub-national analysis of survey data using small area population estimation with Python and GIS
  • Analyze and interpret big survey data and complex datasets in a MySQL database using Python and R under the Hadoop and PySpark frameworks
  • Used R and Python to extract useful insights that inform strategic business decisions
  • Develop, validate, and implement predictive models and statistical analyses
  • Monitor and maintain the performance of existing data models and implement improvements as necessary
  • Applying data science techniques such as exploratory data analysis (EDA), correlation, heat maps and regression to uncover insights from large datasets
  • Worked with geospatial clustering algorithms to identify migration spots and K-nearest neighbors for household size in the sampled clusters
  • Mathematical modeling of Infectious diseases support to MOH and NBS
  • PES 2021 Thematic Analytic reporting on Population distribution, Household characteristics, Education, Economic activities, Health and fertility, Migration, Gender, Youth and young people, Disability, Elderly population, Agriculture/livestock and Data quality/GIS/innovation for dissemination and information use
  • Used Python to implement random forests (random decision forests), a machine learning algorithm that combines multiple decision trees, to produce the population estimates with improved accuracy and reduced overfitting
  • The country benefited from modelling of the most recent population estimates, aggregated by sub-national level, age and sex, to address 100% under-coverage
  • Key geospatial covariates were vegetative index, distance to conflicts (2020-2021), elevation, distance to water bodies and distance to main roads
  • The model structure was flexible enough to recover the observed population counts, as 95.8% of the observations were within the 95% credible interval
  • The squared Pearson correlation coefficient (R²) between observed and predicted populations was 0.832, indicating good overall correspondence
  • Innovation, knowledge management and use of Information technology
  • Documenting on current and emerging Population dynamics trends from projects, strategies, approaches, and ongoing experience for lessons learned
  • Use of geospatial data analysis methods, including leveraging traditional data, Big Data, satellite data, and health service delivery data, to provide timely data and evidence in population, health and sustainable development issues
  • Working with the UNFPA HQ for modules in Use of GIS in census, Population Modelling using random forest, Census and surveying, modelled population estimates to address census under-coverage, Computer Assisted Personal Interviewing (CAPI) solutions for census and UNFPA Census Strategy
  • Using Python, Jupyter, and Scikit-learn built time series forecasting models for Family planning commodities supply prediction and anomaly detection
  • Strengthening the capacity of MOH and National statistical offices and the Government
  • Training on basic and intermediate population modelling and practice with R and Python, delivered with WorldPop, UNFPA HQ and ESARO and the University of Southampton to a team of university and government staff
  • Strengthening capacity on the use of Geospatial data to support Population and development, Health, roads, Agriculture, service delivery etc
  • Training on PES sampling and data weighting in preparation for modelling and statistical/spatial analysis
  • Develop geospatial data dissemination and visualization approaches
  • Using basic machine learning techniques and statistical methods to identify relevant features and variables in structured and unstructured sources of information and data
  • Training on data quality and Quality assurance to support the collection of quality survey data in the field.
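
The two validation figures quoted above for the population model (the share of observations falling inside the 95% credible interval, and the squared Pearson correlation between observed and predicted counts) can be computed as in the sketch below. This is an illustration under stated assumptions, not the project's code: the function names and every number in the example are hypothetical, and the interval bounds are simply supplied as inputs rather than derived from a fitted model.

```python
def squared_pearson(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted))
    var_o = sum((o - mean_o) ** 2 for o in observed)
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    return cov * cov / (var_o * var_p)


def interval_coverage(observed, lower, upper):
    """Fraction of observed values lying inside [lower, upper] intervals."""
    inside = sum(1 for o, lo, hi in zip(observed, lower, upper) if lo <= o <= hi)
    return inside / len(observed)


# Hypothetical cluster-level population counts and model output.
observed = [120, 340, 95, 410, 230]
predicted = [130, 310, 100, 400, 250]
lower = [100, 280, 80, 350, 235]   # assumed lower bounds of 95% intervals
upper = [160, 345, 125, 450, 270]  # assumed upper bounds of 95% intervals

r2 = squared_pearson(observed, predicted)
coverage = interval_coverage(observed, lower, upper)
print(f"R^2 = {r2:.3f}, interval coverage = {coverage:.0%}")
```

Coverage close to the nominal 95% suggests the model's uncertainty is well calibrated, while R² summarizes how closely the point predictions track the observed counts.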

Data Scientist - HIV Data Analytics

UNICEF KENYA
11.2015 - 05.2019

1. HIV Data analytics & use, quality assurance and HIV Modelling in Kenya

  • Ensure that the Country Office and national partners have timely and accurate measurement of change in conditions at the national and county levels, both through monitoring of HIV trends and through sector-specific management information systems (e.g. DHIS, EID, Viral Load, TB Tibu System)
  • Provide Kenya HIV prevalence estimates for both national and county planning, jointly with CDC, UNAIDS, UNICEF, NASCOP, NACC and other HIV partners, using HIV mathematical modelling tools (EPP Spectrum)
  • Support UNICEF in national and county-level data collection and analysis on HIV at the outcome/impact levels of relevant indicators, in collaboration with international partners
  • Provide technical support on exploratory data analysis using Python, Jupyter, Anaconda, R and RStudio for programming
  • Using Python, conduct predictive analytics (regression analysis, PCA, survival analysis) to extract useful insights that inform strategic program decisions
  • Using Python, Jupyter and Scikit-learn, built time series forecasting models for HIV commodities supply prediction and anomaly detection
  • Big data processing using a MySQL database with Python and R under the Hadoop and PySpark frameworks
  • Perform biannual Data Quality Assessments in the 47 counties across HTS, PMTCT and Care & Treatment

2. Programme management, monitoring and delivery of results

  • Ensure that the Country Office and partners have the necessary information to assess progress towards expected results established in work plans, with special attention paid to identifying proper HIV indicators and means of verification during the planning phases, measuring progress in removing bottlenecks and barriers, and measuring the quality of MOH/UNICEF implementation of its commitments to the host nation
  • Lead on supporting county teams in developing a system for monitoring and evaluating HIV program performance

3. Research, surveys and evaluations

  • Working with NASCOP, NACC, KNBS, CDC, UNAIDS, WHO and other partners within the technical working groups conducting the AIDS Impact Assessments in Kenya (PHIA/KAIS) for 2017. Contributions include PHIA protocols, informed consent forms, sampling, questionnaires, Standard Operating Procedures, training materials and training plans, TOTs, data management and analysis, country-specific work plans and corresponding budgets, and the collection, analysis and synthesis of data
  • Designed and maintained large-scale databases for unstructured and structured data, and for data collection mobile apps
  • Exploratory Data Analysis (EDA) to identify trends and correlations in survey data
  • Developed data visualization dashboards, analytics and storytelling to communicate insights using Tableau, Power BI and Python (Pandas, Matplotlib and seaborn)
  • Use of a centralized, integrated and publicly accessible data repository (DHIS) for storing and maintaining all national health data

4. County Capacity development and support

  • Strengthening the capacity of NBS and MOH technical programs in data management and in alignment between health actors, for transparency among partners
  • Revise MOH HIV monitoring tools with NASCOP, NACC, MOH and partners, train MOH staff on HIV tools and guidelines, and support roll-out in the counties
  • Use of visualization technologies and methodologies, such as Tableau, Power BI, Bokeh, Dash, Shiny, Plotly, Flask, Google Charts and Dundas BI, to unlock the value in UNICEF data

Senior Monitoring and Evaluation Specialist - CQI

ICAP at Columbia University
11.2013 - 11.2015

  • Analyzing qualitative and quantitative data from EMR systems (CPAD, OpenMRS) and clinical studies, including cohort data, using SPSS, SAS and STATA
  • Data presentation using Tableau or MS Excel pivot tables for tables, charts and graphs for assessments and evaluations, in line with data analysis plans
  • Exploratory Data Analysis (EDA) to identify trends and correlations in survey data
  • Developed data visualization dashboards, analytics and storytelling to communicate insights using Tableau, Power BI and Python (Pandas, Matplotlib and seaborn)
  • Using Python, Jupyter and Scikit-learn, built time series forecasting models for HIV commodities supply prediction and anomaly detection
  • Big data processing using a MySQL database with Python and R under the Hadoop and PySpark frameworks
  • Analyze and interpret big survey data and complex datasets in a MySQL database using Python and R under the Hadoop and PySpark frameworks
  • Development and implementation of frameworks and strategic plans in health information systems, e.g. M&E plans and registers for routine data collection and program monitoring; developing plans to ensure data quality

  • Making recommendations for program improvement (CQI) based on data from individual country programs and drawing on lessons learned across the organization.
  • Supporting ICAP Global Nursing Capacity Building Program (GNCBP) Program in Nurses training in HIV and Implement CQI in select Health Facilities.
  • Providing training and technical assistance to ensure CQI and M&E integration in all program areas; increasing program teams' comfort with M&E, data activities, quality improvement processes and developing peer reviewed journal Sections.
  • Support health facilities to develop Quality Improvement work plans and standards of care, and evaluate them regularly
  • Support M&E teams across Programme Area on Standards of Care Indicators, DQAs and Donor Reporting.
  • Data quality Assurance using DATIM, KePMS and DHIS and provide feedback to Government (MOH and KNBS), senior technical team and CDC/PEPFAR.

Associate

ICF INTERNATIONAL Nairobi
08.2012 - 12.2013
  • Monitoring & Evaluation, Programme Support
  • With the Team lead and Monitoring and Evaluation Advisor, prepare quality and timely monthly, quarterly and annual narrative and financial reports on program achievements as per donor requirements and dissemination of results and lessons learnt for AfyaInfo program improvement
  • Developing of concept notes, proposals, Annual workplan and budget for the continuation of funds and ensuring proposed outputs are aligned with programme outcomes and impacts
  • Development and implementation of frameworks and strategic plans in health information systems, e.g. Kenya Health Sector Strategic Plan, Community Health Worker Policy and training manuals
  • Training, Managing and reporting on NGI PEPFAR indicators through KePMS from all USG Implementing partners to PEPFAR, CDC and USAID
  • Strengthening MOH; Division of HMIS
  • Strengthening of the MOH and NBS national health information system to improve the availability, quality and use of health data and information as a basis for better results and accountability
  • Build capacity in quality collection, analysis, dissemination, and use of health information at the National and County levels for program planning and management use
  • Development, implementation and enhancement of a centralized, integrated and publicly accessible DHIS2 data repository for storing and maintaining all national health data
  • Developed data Visualization dashboards, Analytics & Storytelling to communicate insights using Tableau, Power BI and Python (Pandas, Matplotlib, seaborn)
  • Assessments and Evaluations
  • Develop protocols and instruments (e.g. questionnaires, checklists, discussion guides and interview forms) for data collection
  • Contribute to assessments and operational research development, analysis and reporting on findings and recommendations, e.g. WHO SARAM, health infrastructure assessments, KMTC training needs, DDIU assessment, KDHS 2013/14 and Community Health Strategy studies, for dissemination and information use

Monitoring and Evaluation

PATH
06.2011 - 07.2012
  • Associate (APHIA+ Western Kenya USAID Project)
  • Designing and implementing M&E activities and special studies for integrated public health programs that include HIV/AIDS, MNCH, FP/RH, Malaria, Nutrition, TB, WASH, VMMC, OVC, NTDs, Community Health Strategy and social determinants of health
  • Strengthening of the MOH and NBS national health information system to improve the availability, quality and use of health data and information as a basis for better results and accountability
  • Provide technical assistance to health care facilities, including review of performance and quality of service delivery, patient tracing systems, on-the-job mentoring for health workers, DQA/DQI and use of strategic information
  • Assessments and evaluations: contribute to assessments and operational research development, analysis and reporting on findings, e.g. Baseline APHIA Plus assessment and LQAS in Western Province.

Monitoring and Evaluation Intern

ICAP
01.2011 - 05.2011
  • Provide technical assistance to health care facilities including review of performance and quality of service delivery, assessment of M&E strategies, implementation and evaluation of patient tracking systems, data management, data quality assurance, data summarization and analysis
  • Provide expertise for developing indicators for monitoring and evaluation framework.

Monitoring and Evaluation officer intern

IntraHealth International
06.2010 - 08.2010
  • Analysis of quantitative and qualitative data using SPSS, ATLAS.ti 8, Dedoose, QDAP and NVivo 8
  • Actively participated in field work as PNA Phase II team leader (training and data collection using EpiSurveyor)

Co-Consultant and Field Manager

Africa Infectious Disease Village Clinics, Inc
10.2009 - 12.2009
  • Programming and synchronizing Personal Digital Assistants (PDAs) used for data capture with desktop applications and GPS/GIS mapping.

Education Specialist

Mutini High School Kitui
09.2001 - 08.2008
  • Curriculum development: Head of Maths/Science department, holding regular departmental meetings, assisting newly qualified teachers on subject matters, maintaining records of work and preparing schemes of work for all classes
  • Provision of Maths, Chemistry and Computer testing exams for Kitui District

Education

Post Graduate Program - Data Science, Deep and Machine Learning: Making Data-Driven Decisions

Massachusetts Institute of Technology (MIT): Institute for Data, Systems, and Society
06.2024

Post Graduate Program - Data Science, Deep and Machine Learning

University of Cape Town
11.2023

Master of Epidemiology and Biostatistics

Moi University, School of Public Health and Biostatistics
01.2010

Graduate Diploma - Project Management

Kenya Institute of Management
01.2010

Bachelor of Science - Mathematics, Chemistry and Computer

Egerton University
01.2000

Skills

  • Python
  • R
  • SQL
  • Scikit-learn
  • TensorFlow
  • NumPy
  • Pandas
  • Matplotlib
  • Deep Learning
  • Deep Neural Networks
  • CNN models
  • RNN models
  • NLP
  • SVMs
  • Decision trees
  • K-means clustering
  • PCA
  • Bagging
  • Boosting
  • Stacking
  • A/B Testing
  • ETL
  • Data science pipeline
  • Cleaning
  • Wrangling
  • Pre-processing
  • Modeling
  • Visualization
  • Interpretation
  • Exploratory Data Analysis
  • Unstructured data
  • Structured data
  • Hypothesis testing
  • Regression
  • Prediction
  • Basic Statistics
  • Advanced Statistics
  • Recommendation systems
  • Networking
  • Graphical Models
  • Git
  • GitHub
  • STATA
  • SPSS
  • SurveyCTO
  • CSPro
  • ODK
  • KoboCollect
  • Training and facilitation
  • Business process improvement

Affiliations

  • Center for Epidemiological Modelling and Analysis (CEMA) Kenya
  • Epidemiologic Management & Consulting - Epidemiology and Public Health Consultants
  • UN STATS
  • Monitoring and Evaluation Professional
  • Research, Monitoring and Evaluation
  • Global Public Health - HIV/AIDS, Malaria and TB
  • Applied Biostatistics & Bioinformatics
  • American Evaluation Association
  • DHS User forum
  • STATA & SPSS users
  • R, Python, Data science
  • AI and ML professionals

Peer Reviewed Publications

  • Olajide A, Tukwasibwe F, Mbithi W: South Sudan Population Estimation Survey, 2021: Up-to-date Modelled Population Estimates, April 2024. Presented in UAPS Conference, Malawi.
  • South Sudan Population Estimation Thematic Reports 2021
  • South Sudan Modelled Population Estimates Report 2021
  • Mohamed A, Franklin E, Maureen J, Sainan Z, Witness C, Wellington M: A GIS-based Modeling Approach to Assess Physical Access to Family Planning Services in South Sudan December 2021
  • E. Gitau, M. Chizororo, N. W Mbithi, U. Gilbert: A Needs Assessment on HIV Response for Children and Adolescents in Kakuma Refugee Camp in Kenya.
  • W. Mbithi, U. Gilbert, E. Gitau, J. Odhiambo, J. Gitonga: Innovative approach to adolescent and Young people information collection and dissemination.
  • A Needs Assessment on Adolescents and decision-makers tool conducted in select counties of Kenya (2015).
  • NASCOP/NACC Kenya Modelled HIV Estimates and Profiles (2016)
  • MOH Kenya Training package prepared with seven (HIS) modules: Implementing the Master Facility List (MFL), Implementing the Master Community Unit List (MCUL), District Health Information Software (DHIS), General Data Management, Data Quality, Data Demand and Information Use and Inpatient Medical Record System and ICD-10 Coding. June 2013.
  • MOH Kenya Training package prepared: Modules Prepared for Training Community Health Workers in Division of Community Health. June 2013.
  • Republic of Kenya DDIU Assessment Report: Nov 2013, AfyaInfo
  • Republic of Kenya National DQA Assessment Report: October 2013, AfyaInfo
  • UNWHO/Kenya Service Availability and Readiness Assessment (SARAM); June 2013
  • Republic of Kenya NHIS Infrastructure Assessment Report: October 2012, AfyaInfo
  • Kenya Medical Training College (KMTC) Training Needs Assessment (TNA) Report on the KMTC HRIO Curriculum: Sep 2012: AfyaInfo Project.
  • Rapid Child Survival Indicator Survey in Western and Eastern Provinces Using the Lot Quality Assurance (LQAS) Methodology June 2012
  • APHIA plus Western Kenya Project: Baseline Health Assessment Report August 16, 2011 TB, PMTCT and ART Component Analysis.
  • Study to assess the appropriateness, leveraging, and sustainability of CDC scaled-up evidence based HIV/AIDS community prevention intervention in Western Kenya
  • NASCOP Evidence Based Intervention assessment for HIV/AIDS prevention intervention activities.

Training

  • 01/01/24, Power BI and Tableau training with MIT Great Learning, Alison
  • 01/01/23, Power BI and Tableau training with Simplilearn, Alison, Coursera
  • 09/01/23, UNICEF Global MICS Round 7 Training in Johannesburg, South Africa.
  • 01/01/21, Population modelling using Population data and R software
  • 12/01/19, Geospatial data analysis, Map delineation using High resolution satellite imagery, World Pop, University of Southampton, UK
  • 01/01/19, Training on Power BI and Tableau for data visualization and dashboards, and GitHub, with MIT Great Learning, Alison and Coursera
  • 08/01/19, Mapping Spatial Access to Health Facilities GIS-based Accessibility Analysis Small area Estimation, Lusaka Zambia
  • 10/01/18, International conference on Adolescence HIV; Cape Town South Africa
  • 08/01/18, Sample size determination, Weighting and Sampling for DHS, Training by Measure DHS team, Calverton, USA
  • 01/01/18, Training of Trainers, interviewers and use of ODK for the PHIA Survey, Kenya.
  • 06/01/17, Information presentation and Advanced Analysis using QGIS, Tableau Software, Power BI Pro, in Kenya: By WHO-Brazzaville
  • 06/01/17, UNICEF Results Based Management (RBM) and Theory of change Training in Nairobi, Kenya; By UNICEF-ESARO and NY team
  • 10/01/16, UNICEF Global MICS Round 6 Training in Nairobi, Kenya; By Attila Hancioglu and NY team
  • 01/01/16, Training on MySQL and Python programming language with GeeksforGeeks, W3Schools, Alison and Coursera.
  • 09/01/16, Kenya HIV data modelling for Care and treatment and PMTCT Using EPP modelling spectrum, Dr John Stover Avenir Health USA and Oxford College
  • 11/01/14, Global Training on Quality and Quality Improvement Johannesburg, South Africa
  • 09/01/13, Infant and young child feeding certificate from Cornell University
  • 09/01/12, Equity and Social Justice Focused Evaluations, Country-Led M&E Systems and Emerging Trends in Evaluations. Mymande.org UNICEF
  • 09/01/12, Impact Evaluation of Population, Health, and Nutrition Programs, South Africa Measure Evaluation
  • 03/01/12, Performance monitoring and Evaluation (USAID) and Monitoring & Evaluation Plan and Frameworks at Pathwise (PATH e-learning), Data quality at e-learning Global fund.
  • 02/01/12, Data for Decision Making by Dr Hozumi Dai, PATH Seattle
  • 11/01/11, Research development and Ethics. By Terry Elliot, PATH Seattle
  • 07/01/10, Monitoring & Evaluation Fundamentals, Monitoring & Evaluation of HIV Programs, PMTCT, Sexual Reproductive Health, Data Quality, Nutrition, DHS: Data use and Malaria. Self-study at Global Health e-learning center-USAID
  • 07/01/10, Participant in Monitoring and Evaluation workshop. Facilitators: Monitoring & Evaluation dep Team -Intra Health International
  • 01/01/10, Global Quantitative and Qualitative data collection and Analysis, Global Health Program Development and Implementation, George Washington University, U.S.A

Languages

French
Intermediate (B1)
English
Advanced (C1)
