Flagship Case Study

Plasma Proteomics and Machine Learning for Parkinson's Disease Prediction

Parkinson's disease is difficult to diagnose early, and there are no reliable blood-based biomarkers for population screening. This project applies machine learning to plasma protein data from more than 52,000 UK Biobank participants to predict Parkinson's disease before clinical diagnosis.

  • Research question: can plasma proteomics support earlier Parkinson's disease prediction, and what do the model errors reveal about disease heterogeneity?
  • Data: UK Biobank plasma proteomics across more than 52,000 participants, with external validation in PPMI.
  • Method: supervised machine learning, SHAP-based explanation, age-stratified calibration, and misclassification analysis.
  • Outcome: interpretable risk modelling with explicit transfer checks across cohorts.
  • Insight: false-positive predictions systematically resemble one proteomic subtype more than another, suggesting that model errors reflect biological overlap rather than random noise.
  • Takeaway: current Parkinson's protein panels may not cleanly distinguish early neurodegeneration from metabolic comorbidity, so future diagnostic tools need to account for disease heterogeneity to be clinically reliable.
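One way to check whether false positives lean towards a particular subtype is to assign each false positive to its nearest subtype centroid in protein space and count the assignments. The sketch below is a minimal illustration, assuming subtype centroids have already been derived (for example by clustering confirmed cases). The function names, the Euclidean distance and the two-protein toy data are hypothetical and not taken from the project.

```python
import math

def nearest_subtype(profile, centroids):
    """Return the subtype whose centroid is closest (Euclidean) to a protein profile."""
    return min(centroids, key=lambda name: math.dist(profile, centroids[name]))

def false_positive_subtype_counts(profiles, y_true, y_pred, centroids):
    """Count false positives (predicted 1, observed 0) by their nearest subtype."""
    counts = {name: 0 for name in centroids}
    for profile, truth, pred in zip(profiles, y_true, y_pred):
        if pred == 1 and truth == 0:
            counts[nearest_subtype(profile, centroids)] += 1
    return counts

# Hypothetical centroids and participants, two proteins each.
centroids = {"subtype_A": (1.0, 0.0), "subtype_B": (0.0, 1.0)}
profiles = [(0.9, 0.1), (0.8, 0.2), (0.1, 0.9), (0.5, 0.5)]
y_true = [0, 0, 0, 1]
y_pred = [1, 1, 0, 1]
print(false_positive_subtype_counts(profiles, y_true, y_pred, centroids))
```

A skewed count on its own is only suggestive; in practice it would be compared against the split expected by chance, for example with a permutation test.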
Project Cards
Project 1

Plasma Proteomics and Machine Learning for Parkinson's Disease Prediction

Predict Parkinson's disease before clinical diagnosis from plasma proteomics and validate the learned signal in PPMI.

  • Data: UK Biobank plasma proteomics across more than 52,000 participants, with external validation in PPMI.
  • Method: supervised machine learning, SHAP-based explanation, age-stratified calibration, and misclassification analysis.
  • Outcome: interpretable risk modelling with explicit transfer checks across cohorts.
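Age-stratified calibration can be checked by grouping participants into age bands and comparing the mean predicted risk with the observed event rate in each band. This is a minimal sketch: the 10-year band width, the function name and the toy data are assumptions, and a full analysis would usually add confidence intervals or calibration slopes per band.

```python
from collections import defaultdict

def age_stratified_calibration(ages, risks, outcomes, band_width=10):
    """Summarise calibration within fixed-width age bands.

    Returns {(lower, upper): (n, mean_predicted_risk, observed_event_rate)}.
    """
    bands = defaultdict(list)
    for age, risk, outcome in zip(ages, risks, outcomes):
        lower = int(age // band_width) * band_width  # e.g. age 57 -> band starting at 50
        bands[lower].append((risk, outcome))
    summary = {}
    for lower in sorted(bands):
        rows = bands[lower]
        n = len(rows)
        mean_risk = sum(r for r, _ in rows) / n
        observed_rate = sum(o for _, o in rows) / n
        summary[(lower, lower + band_width)] = (n, mean_risk, observed_rate)
    return summary

# Hypothetical toy data: ages, predicted risks and observed outcomes (1 = diagnosed).
result = age_stratified_calibration([52, 58, 61, 67], [0.1, 0.3, 0.2, 0.6], [0, 0, 0, 1])
for band, (n, mean_risk, observed) in result.items():
    print(band, n, round(mean_risk, 2), observed)
```

A band where mean predicted risk sits well above the observed rate indicates over-prediction in that age group, which a single pooled calibration curve can hide.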
Project 2

Multimodal EHR and proteomics fusion

Fuse EHR and proteomics to learn patient representations for stratification and prediction.

  • Data: structured EHR with targeted proteomics.
  • Method: deep learning and representation learning for multimodal fusion.
  • Outcome: ongoing work on subgroup discovery and interpretable latent structure.
Project 3

Sleep, proteomics, and Parkinson's disease risk

Test whether sleep disturbances contribute to Parkinson's risk through proteomic and digital biomarker pathways.

  • Data: UK Biobank sleep, proteomics, and related biomarker signals.
  • Method: Mendelian randomisation and causal machine learning.
  • Outcome: ongoing work on causal pathways and mediator discovery.
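In two-sample Mendelian randomisation, a common first estimate is the fixed-effect inverse-variance weighted (IVW) estimator. It pools the per-variant ratio estimates (variant-outcome effect divided by variant-exposure effect), weighting each variant by its squared exposure effect over the outcome variance. A minimal sketch, assuming independent instruments and summary statistics already harmonised to the same effect allele; the numbers are made up, and IVW is usually reported alongside more robust estimators such as MR-Egger or the weighted median.

```python
import math

def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW causal estimate and its standard error from summary statistics."""
    # Each variant contributes its ratio estimate, weighted by beta_exposure^2 / se_outcome^2.
    numerator = sum(bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome))
    denominator = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    estimate = numerator / denominator
    standard_error = math.sqrt(1.0 / denominator)
    return estimate, standard_error

# Hypothetical summary statistics for three independent variants.
beta_x = [0.10, 0.20, 0.15]
beta_y = [0.05, 0.10, 0.075]
se_y = [0.01, 0.02, 0.015]
print(ivw_estimate(beta_x, beta_y, se_y))
```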
Project 4

Transcriptomics in REM sleep behaviour disorder

Quantify the risk of phenoconversion to an alpha-synucleinopathy from whole-blood transcriptome data.

  • Data: whole-blood transcriptome profiles from bulk RNA-seq.
  • Method: transcriptomic preprocessing and downstream modelling pipelines.
  • Outcome: supervised project work on phenoconversion risk estimation.
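Bulk RNA-seq counts are usually normalised for sequencing depth before downstream modelling. A minimal sketch of log2 counts per million (CPM) with a pseudocount follows; it is a simplified stand-in for what packages such as edgeR provide (edgeR's implementation differs in detail), and the function name and default pseudocount are assumptions rather than the project's pipeline.

```python
import math

def log_cpm(counts, prior_count=1.0):
    """log2 counts per million for a genes x samples count matrix (list of rows)."""
    n_genes, n_samples = len(counts), len(counts[0])
    # Library size = total counts per sample (column sums).
    lib_sizes = [sum(counts[g][s] for g in range(n_genes)) for s in range(n_samples)]
    return [
        [math.log2((counts[g][s] + prior_count) / lib_sizes[s] * 1e6) for s in range(n_samples)]
        for g in range(n_genes)
    ]

# Hypothetical counts: 3 genes x 2 samples, the second sequenced twice as deeply.
counts = [[10, 20], [30, 60], [0, 0]]
for row in log_cpm(counts):
    print([round(v, 2) for v in row])
```

The pseudocount keeps zero counts finite on the log scale; with `prior_count=0`, genes with the same share of reads in each sample get identical values regardless of depth.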
Project 5

Single-nucleus analysis of motor cortex in Parkinson's disease

Identify disease-relevant cell populations and molecular programs in the primary motor cortex.

  • Data: single-nucleus transcriptomic profiles.
  • Method: cell-type specific analysis and molecular signature discovery.
  • Outcome: supervised project work on disease-relevant cell populations.
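Cell-type specific comparisons in single-nucleus data are often made on pseudobulk profiles: counts from all nuclei of one cell type in one donor are summed, so that donors rather than individual nuclei become the unit of replication. The source does not say which approach the project uses; this is a minimal sketch of that one strategy, with hypothetical names and toy counts.

```python
def pseudobulk(counts, sample_ids, cell_types):
    """Sum per-nucleus count vectors into one profile per (sample, cell type)."""
    profiles = {}
    for vector, sample, cell_type in zip(counts, sample_ids, cell_types):
        key = (sample, cell_type)
        if key not in profiles:
            profiles[key] = [0] * len(vector)
        profiles[key] = [total + c for total, c in zip(profiles[key], vector)]
    return profiles

# Hypothetical nuclei: counts for two genes, with donor and annotated cell type.
nuclei_counts = [[1, 0], [2, 3], [0, 5]]
donors = ["donor1", "donor1", "donor2"]
annotations = ["neuron", "neuron", "astrocyte"]
print(pseudobulk(nuclei_counts, donors, annotations))
```

The resulting per-donor profiles can then be normalised and compared between cases and controls with standard bulk methods.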
Methods

Core technical capabilities.

Clinical proteomics

Targeted proteomics platforms, including Olink and SomaLogic, are used to define predictive signatures and mechanistic leads.

Representation learning

Latent feature learning across structured and multimodal patient data to capture pre-diagnostic patterns, comorbidity structure, and clinical trajectories.

Deep learning and digital biomarkers

Deep learning pipelines for smartwatch and remote-monitoring data support continuous behavioural and physiological phenotyping outside the clinic.
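Before any deep learning, wrist-worn accelerometer streams are commonly summarised into epoch-level features. A widely used one is the Euclidean norm minus one (ENMO), which subtracts the 1 g gravity component from the vector magnitude and truncates negative values. The sketch assumes calibrated tri-axial samples in units of g at a constant sampling rate; the function name and the 5-second default epoch are illustrative choices, not the project's settings.

```python
import math

def enmo_epochs(samples, sample_rate_hz, epoch_seconds=5):
    """Mean ENMO per complete epoch from (x, y, z) accelerometer samples in g."""
    per_epoch = int(sample_rate_hz * epoch_seconds)
    # ENMO per sample: vector magnitude minus 1 g, with negative values set to zero.
    enmo = [max(math.sqrt(x * x + y * y + z * z) - 1.0, 0.0) for x, y, z in samples]
    # Average within non-overlapping epochs; an incomplete trailing epoch is dropped.
    return [
        sum(enmo[i:i + per_epoch]) / per_epoch
        for i in range(0, len(enmo) - per_epoch + 1, per_epoch)
    ]

# Hypothetical 2 Hz recording split into 1-second epochs.
print(enmo_epochs([(0, 0, 1), (0, 0, 1), (0, 0, 2), (0, 0, 3)], sample_rate_hz=2, epoch_seconds=1))
```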