Plasma Proteomics and Machine Learning for Parkinson's Disease Prediction
Parkinson's disease is difficult to diagnose early, and there are no reliable blood-based biomarkers for population screening. This project applies machine learning to plasma protein measurements from more than 52,000 UK Biobank participants to predict Parkinson's disease before clinical diagnosis.
- Research question: can plasma proteomics support earlier Parkinson's disease prediction, and what do the model errors reveal about disease heterogeneity?
- Data: UK Biobank plasma proteomics from more than 52,000 participants, with external validation in PPMI.
- Method: supervised machine learning, SHAP-based feature attribution, age-stratified calibration analysis, and analysis of misclassified cases.
- Outcome: interpretable risk models, with explicit checks of how well they transfer from UK Biobank to PPMI.
- Insight: false-positive predictions systematically resemble one proteomic subtype more than another, indicating that model errors reflect biological overlap rather than random noise.
- Takeaway: current Parkinson's protein panels may not cleanly distinguish early neurodegeneration from metabolic comorbidity, so future diagnostic tools need to account for disease heterogeneity to be clinically reliable.
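To illustrate the age-stratified calibration step listed under Method, the following sketch compares the mean predicted risk with the observed event rate inside each age band and reports a per-band Brier score. It is a minimal sketch and not the project's code. The function name `age_stratified_calibration`, the half-open age bands, and the choice of calibration-in-the-large plus Brier score as the per-band summaries are all assumptions. It depends only on NumPy.

```python
import numpy as np


def age_stratified_calibration(age, y_true, y_prob, bands):
    """Hypothetical helper: per-age-band calibration summary.

    age:    participant ages at baseline
    y_true: 1 if the participant later developed the outcome, else 0
    y_prob: model-predicted risk in [0, 1]
    bands:  list of half-open (low, high) age intervals, i.e. low <= age < high
    Returns one dict per band that contains at least one participant.
    """
    age = np.asarray(age, dtype=float)
    y_true = np.asarray(y_true, dtype=float)
    y_prob = np.asarray(y_prob, dtype=float)

    rows = []
    for low, high in bands:
        mask = (age >= low) & (age < high)
        n = int(mask.sum())
        if n == 0:
            continue  # skip empty bands rather than report undefined rates
        mean_pred = float(y_prob[mask].mean())
        observed = float(y_true[mask].mean())
        rows.append({
            "band": f"{low}-{high}",
            "n": n,
            "mean_predicted": mean_pred,
            "observed_rate": observed,
            # Calibration-in-the-large: observed minus expected; 0 is ideal.
            "calibration_gap": observed - mean_pred,
            # Brier score: mean squared error of the predicted probabilities.
            "brier": float(np.mean((y_prob[mask] - y_true[mask]) ** 2)),
        })
    return rows
```

For example, you can pass ages, binary outcome labels, model probabilities, and bands such as `[(40, 50), (50, 60), (60, 70)]`. The function returns one summary row per non-empty band. A calibration gap that drifts away from zero in older bands would point to age-dependent miscalibration.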
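The misclassification insight can be made concrete with a nearest-centroid comparison. First, compute a protein-profile centroid for each proteomic subtype among true cases. Then count which centroid each false positive lies closest to. This is a hypothetical sketch, not the analysis used in the project. Several choices are purely illustrative: the function name `false_positive_subtype_profile`, the assumption that subtype labels for cases already exist (for example from clustering), and the use of Euclidean distance on standardised protein levels.

```python
import numpy as np


def false_positive_subtype_profile(X, y_true, y_pred, subtype_labels):
    """Hypothetical helper: which case subtype do false positives sit closest to?

    X:              (n_samples, n_proteins) standardised protein levels
    y_true:         1 for true cases, 0 for non-cases
    y_pred:         binary model predictions
    subtype_labels: subtype label per sample; only entries for true cases are used
    Returns {subtype: number of false positives whose nearest centroid is that subtype}.
    """
    X = np.asarray(X, dtype=float)
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    subtype_labels = np.asarray(subtype_labels)

    # Subtypes are defined on true cases only.
    subtypes = np.unique(subtype_labels[y_true == 1])
    centroids = np.stack([
        X[(y_true == 1) & (subtype_labels == s)].mean(axis=0) for s in subtypes
    ])

    # False positives: predicted case, but no diagnosis recorded.
    fp = (y_true == 0) & (y_pred == 1)

    # Euclidean distance from every false positive to every subtype centroid.
    dists = np.linalg.norm(X[fp][:, None, :] - centroids[None, :, :], axis=2)
    nearest = subtypes[np.argmin(dists, axis=1)]
    return {int(s): int((nearest == s).sum()) for s in subtypes}
```

A marked skew in the returned counts towards one subtype is the kind of pattern the Insight bullet describes. Judging whether such a skew is meaningful would still need an appropriate baseline.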