Stephen D. Coleman

PhD student

MRC Biostatistics Unit, Cambridge University


I am a PhD student in the Wallace Group based at the MRC Biostatistics Unit of the University of Cambridge. I am working on integrative clustering methods applied to ‘omics data. Clustering methods aim to make these high-dimensional data easier to interpret, and so to improve our understanding of human health and disease.

I completed an MSc in Bioinformatics at Wageningen University in 2019. My dissertation subject was an ensemble approach using MCMC chains, suitable for high-dimensional, multi-modal datasets. I used this method to perform inference on Multiple Dataset Integration (MDI), a Bayesian integrative clustering model, applied to gene expression data collected from various tissues and cell types.

Previously I worked with Paul Kirk, Laurent Gatto and Olly Crook to extend MDI, a Bayesian integrative clustering method, into a semi-supervised predictive tool for spatial proteomics. This is available online as tagm-mdi.

I also worked with Gerrit Gort, Elias Kaiser and Rachel Schipper to fit the Farquhar-von Caemmerer-Berry model of photosynthesis using mixed-effects models.

My research interests include Bayesian clustering applied to ‘omics data to extract biological information. In ‘omics we have huge quantities of data, often with several datasets describing the same objects of interest from several views (e.g. genomic, proteomic, methylation, metabolomic, etc.); interpreting these data and building a coherent story of disease from them is a non-trivial problem. During my PhD I hope to apply clustering methods to data on immune-mediated diseases to improve our understanding of them.


  • Clustering methods
  • Computational statistics
  • Immunology
  • Machine learning


  • MSc in Bioinformatics, 2019

    Wageningen University & Research

  • BA in Mathematics, 2016

    Trinity College, University of Dublin

Recent Posts

An introduction to MDI

Multiple Dataset Integration: if we have observed paired datasets \(X_1=(x_{1,1},\ldots, x_{n,1})\), \(X_2=(x_{1,2},\ldots, x_{n,2})\), …