Projects | Ryan Pontius

Diabetes Classification Case Studies

I developed two machine learning case studies using harmonized open-source continuous glucose monitoring (CGM) datasets from the Glucose-ML collection to evaluate diabetes status classification from participant-level glucose features. The first case study benchmarked logistic regression, random forest, and XGBoost models across 13 datasets to classify participants as T1D, T2D, prediabetes, or no diabetes. The second compared single-dataset and multi-dataset training to evaluate how dataset diversity and sample size influence classification performance.

Python • Scikit-Learn • Machine Learning

View Code
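
As an illustration of the "participant-level glucose features" used in the classification case studies, here is a minimal sketch that summarizes one participant's CGM readings. The feature set (mean, coefficient of variation, time in range) and the 70-180 mg/dL thresholds are assumptions chosen for the example, and `participant_features` is a hypothetical helper, not the project's actual code.

```python
from statistics import mean, pstdev


def participant_features(glucose_mg_dl, low=70.0, high=180.0):
    """Summarize one participant's CGM readings (mg/dL) as a feature dict.

    Hypothetical helper: the features and the 70-180 mg/dL range are
    assumptions for illustration, not the project's confirmed feature set.
    """
    if not glucose_mg_dl:
        raise ValueError("no glucose readings for this participant")

    n = len(glucose_mg_dl)
    avg = mean(glucose_mg_dl)
    sd = pstdev(glucose_mg_dl)  # population standard deviation

    in_range = sum(low <= g <= high for g in glucose_mg_dl)
    above = sum(g > high for g in glucose_mg_dl)

    return {
        "mean_glucose": avg,
        "cv_percent": 100.0 * sd / avg,   # glycemic variability
        "time_in_range": in_range / n,    # fraction of readings in [low, high]
        "time_above_range": above / n,    # fraction of readings above high
    }


features = participant_features([95, 110, 150, 200, 185, 70])
```

One row of such features per participant would then form the input table for a classifier such as logistic regression or a random forest.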

Glucose-ML Standardization Workflows

I developed a Python-based data acquisition and harmonization pipeline for the Glucose-ML collection to support reproducible use of open-source continuous glucose monitoring (CGM) datasets. The workflow automates dataset downloads from public repositories, validates and extracts raw files, and then standardizes heterogeneous CGM data into consistent participant-level formats. It harmonizes dataset-specific differences in file structure, timestamps, glucose units, metadata, and coverage criteria to generate analysis-ready time-series CGM data for downstream machine learning and statistical analyses.

Python • Bash

View Code
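
To make the timestamp and glucose-unit harmonization in the Glucose-ML workflow concrete, the sketch below normalizes a single raw reading to an ISO 8601 timestamp and a value in mg/dL. The function name, the output field names, and the choice of mg/dL as the target unit are assumptions for illustration; the actual pipeline may handle these differently.

```python
from datetime import datetime

# Approximate conversion factor, from glucose's molar mass (~180.16 g/mol).
MGDL_PER_MMOLL = 18.016


def harmonize_reading(raw_time, time_format, value, unit):
    """Return one CGM reading with an ISO timestamp and glucose in mg/dL.

    Hypothetical helper: `time_format` is the dataset-specific strptime
    pattern, and `unit` is either "mg/dL" or "mmol/L".
    """
    timestamp = datetime.strptime(raw_time, time_format)

    if unit == "mg/dL":
        glucose = float(value)
    elif unit == "mmol/L":
        glucose = float(value) * MGDL_PER_MMOLL
    else:
        raise ValueError(f"unsupported glucose unit: {unit!r}")

    return {
        "timestamp": timestamp.isoformat(),
        "glucose_mg_dl": round(glucose, 1),
    }


# A day-first timestamp with a mmol/L value, as one dataset might report it.
reading = harmonize_reading("01/02/2023 14:05", "%d/%m/%Y %H:%M", 5.5, "mmol/L")
```

Keeping the dataset-specific format string as a parameter means each source dataset only needs a small configuration entry rather than its own parsing code.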

GONE Workflow

I developed a Snakemake-based pipeline to run GONE analyses for estimating recent effective population size from whole-genome sequencing data. The workflow automates population assignment, VCF filtering, SNP subsampling, and conversion to PLINK formats, and runs multiple random-seed replicates to improve the robustness of inference. It also integrates downstream analyses, including runs of homozygosity and nucleotide diversity, enabling scalable and reproducible population genomic analyses.

Snakemake Pipeline • Python • Bioinformatics • HPC

View Code
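
The seeded SNP subsampling in the GONE workflow can be sketched as follows. This is an illustrative stand-in written in plain Python, not the pipeline's Snakemake rules; `subsample_replicates` and the example SNP identifiers are hypothetical.

```python
import random


def subsample_replicates(snp_ids, n_snps, seeds):
    """Draw one reproducible SNP subsample per random seed.

    Hypothetical helper: assumes `snp_ids` is a list of unique identifiers.
    Each subsample keeps the SNPs in their original (genomic) order.
    """
    replicates = {}
    for seed in seeds:
        rng = random.Random(seed)          # independent, seeded generator
        keep = set(rng.sample(snp_ids, n_snps))
        replicates[seed] = [snp for snp in snp_ids if snp in keep]
    return replicates


# Made-up identifiers standing in for filtered VCF sites.
snps = [f"chr1:{pos}" for pos in range(1000, 1100)]
replicates = subsample_replicates(snps, n_snps=20, seeds=[1, 2, 3])
```

Because each replicate depends only on its seed, rerunning the workflow reproduces the same subsamples, and differences between replicates give a rough sense of how sensitive the estimate is to which SNPs were drawn.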

Stairway Workflow

I developed a Snakemake-based pipeline to perform Stairway Plot analyses for reconstructing long-term effective population size (Ne) trajectories from reference haplotypes. The workflow integrates outputs from PAV and MSMC, automates input preparation (including reference genome configuration and chromosome extraction), and incorporates validation steps to ensure reliable downstream inference. Applied across 100+ custom reference genomes from the California Conservation Genomics Project, this pipeline supports scalable and reproducible demographic reconstruction.

Snakemake Pipeline • Python • Bash • R • Bioinformatics • HPC

View Code

NCBI SRA/BioSample Metadata Generator

I developed a Python tool to convert MongoDB sample metadata into structured SRA and BioSample sheets, streamlining the NCBI submission process. The tool handles read pair matching, validates sample metadata, and adapts formatting based on species type (plant, vertebrate, or invertebrate).

Python • MongoDB • NCBI

View Code
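
Read pair matching for the SRA sheet can be sketched as grouping FASTQ filenames by sample and pairing forward and reverse reads. The `<sample>_R1.fastq.gz` / `<sample>_R2.fastq.gz` naming convention and the `match_read_pairs` helper are assumptions for the example; the tool's real matching rules may differ.

```python
import re
from collections import defaultdict

# Assumed naming convention: <sample>_R1.fastq.gz and <sample>_R2.fastq.gz
READ_PATTERN = re.compile(r"^(?P<sample>.+)_R(?P<read>[12])\.fastq\.gz$")


def match_read_pairs(filenames):
    """Pair R1/R2 files by sample name.

    Returns (pairs, unpaired): a dict of sample -> (R1 file, R2 file),
    and a sorted list of samples that are missing one mate.
    """
    reads = defaultdict(dict)
    for name in filenames:
        match = READ_PATTERN.match(name)
        if match is None:
            continue  # skip files that do not follow the assumed convention
        reads[match["sample"]][match["read"]] = name

    pairs, unpaired = {}, []
    for sample, found in sorted(reads.items()):
        if "1" in found and "2" in found:
            pairs[sample] = (found["1"], found["2"])
        else:
            unpaired.append(sample)
    return pairs, unpaired


files = ["A_R1.fastq.gz", "A_R2.fastq.gz", "B_R1.fastq.gz", "notes.txt"]
pairs, unpaired = match_read_pairs(files)
```

Reporting unpaired samples separately, rather than silently dropping them, gives the validation step something concrete to flag before submission.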
