Learning transition times in event sequences: the Event-Based Hidden Markov Model of disease progression


Progressive diseases worsen over time and are characterised by monotonic change in features that track disease progression. Here we connect ideas from two formerly separate methodologies, event-based modelling and hidden Markov modelling, to derive a new generative model of disease progression. Our model can uniquely infer the most likely group-level sequence and timing of events (natural history) from limited datasets. Moreover, it can infer and predict individual-level trajectories (prognosis) even when data are missing, giving it high clinical utility. We derive the model and provide an inference scheme based on the expectation-maximisation algorithm. We use clinical, imaging and biofluid data from the Alzheimer’s Disease Neuroimaging Initiative to demonstrate the validity and utility of our model. First, we train our model to uncover a new group-level sequence of feature changes in Alzheimer’s disease over a period of ∼17.3 years. Next, we demonstrate that our model provides improved utility over a continuous-time hidden Markov model, with an improvement in area under the receiver operating characteristic curve of ∼0.23. Finally, we demonstrate that our model maintains predictive accuracy with up to 50% missing data. These results support the clinical validity of our model and its broader utility in resource-limited medical applications.