Special Issue on Modelling and Inference for Infectious Diseases
Construction of semi-Markov genetic-space-time SEIR models and inference
[Construction de modèles stochastiques génético-spatio-temporels et inférence]
Journal de la société française de statistique, Tome 157 (2016) no. 1, pp. 129-152.

Identifying transmission links of an infectious disease through a host population is critical to understanding its epidemiology and informing measures for its control. Infected hosts that are close in space and time are often assumed to be linked, but spatial and temporal data alone are usually consistent with many different scenarios of who infected whom. To infer more reliably who infected whom over the course of an outbreak caused by a fast-evolving pathogen, pathogen genomic data have been combined with spatial and temporal data. However, how to combine these data remains a modelling and statistical challenge.

One recently proposed approach is based on an extension of stochastic Susceptible-Exposed-Infectious-Removed (SEIR) models. In this article, we present this extension, which combines (i) an individual-based, spatial, semi-Markov SEIR model for the spatio-temporal dynamics of the pathogen, and (ii) a Markovian evolutionary model for the temporal evolution of the pathogen's genetic sequences. The resulting model is a state-space model with high-dimensional latent vectors. We then describe a new algorithm for approximate Bayesian inference of the model parameters and latent variables. Finally, the ability of the estimation algorithm to reconstruct transmission trees (i.e. who infected whom) is assessed in a simulation study. We focus in particular on how the inference method performs when only a fraction of the pathogen genomic data is available.
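To make the two model components concrete, the following is a minimal simulation sketch and not the model specified in the article. It assumes gamma-distributed latent and infectious periods (one simple way to obtain semi-Markov sojourn times), an exponential distance kernel for transmission, and a Jukes-Cantor substitution process for the sequences; every function name, parameter and default value here is hypothetical. As a further simplification, each host's sequence at infection is derived from its infector's sequence at the infector's own infection time.

```python
import heapq
import math
import random

BASES = "ACGT"


def simulate_outbreak(coords, beta=0.5, scale=1.0, latent=(4.0, 0.5),
                      infectious=(4.0, 1.0), index_case=0, seed=1):
    """Event-driven, individual-based spatial SEIR simulation (illustrative).

    Returns (t_exposed, infector): the infection time of each host
    (math.inf if never infected) and the index of its infector
    (None for the index case and for hosts that escaped infection).
    """
    rng = random.Random(seed)
    n = len(coords)
    t_exposed = [math.inf] * n
    infector = [None] * n
    events = [(0.0, index_case, -1)]  # (time, host, source); -1 = no source
    while events:
        t, host, source = heapq.heappop(events)
        if t_exposed[host] < math.inf:
            continue  # host was already infected by an earlier contact
        t_exposed[host] = t
        infector[host] = source if source >= 0 else None
        # Gamma (shape, scale) sojourn times in E and I: assumed, non-exponential.
        t_infectious = t + rng.gammavariate(*latent)
        t_removed = t_infectious + rng.gammavariate(*infectious)
        for other in range(n):
            if t_exposed[other] < math.inf:
                continue
            # Assumed kernel: contact rate decays exponentially with distance.
            dist = math.dist(coords[host], coords[other])
            rate = beta * math.exp(-dist / scale)
            if rate <= 0.0:
                continue
            t_contact = t_infectious + rng.expovariate(rate)
            if t_contact < t_removed:
                heapq.heappush(events, (t_contact, other, host))
    return t_exposed, infector


def evolve(seq, elapsed, mu, rng):
    """Jukes-Cantor evolution of a sequence over a time span `elapsed`."""
    p_change = 0.75 * (1.0 - math.exp(-4.0 * mu * elapsed / 3.0))
    return "".join(
        rng.choice([b for b in BASES if b != base])
        if rng.random() < p_change else base
        for base in seq
    )


def simulate_sequences(t_exposed, infector, length=200, mu=1e-3, seed=2):
    """Assign one pathogen sequence per infected host along the tree."""
    rng = random.Random(seed)
    order = sorted((t, i) for i, t in enumerate(t_exposed) if t < math.inf)
    seqs = {}
    for t, host in order:
        src = infector[host]
        if src is None:
            seqs[host] = "".join(rng.choice(BASES) for _ in range(length))
        else:
            seqs[host] = evolve(seqs[src], t - t_exposed[src], mu, rng)
    return seqs


# Hypothetical usage: 25 hosts on a 5 x 5 grid.
grid = [(float(x), float(y)) for x in range(5) for y in range(5)]
times, who_infected = simulate_outbreak(grid)
sequences = simulate_sequences(times, who_infected)
```

The list `who_infected` is the transmission tree that an inference method would try to recover from the observed times, locations and sequences. Deleting entries from `sequences` before running such a method would mimic the partially observed genomic data mentioned above.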

Keywords: Bayesian estimation, Genomic data, Spatiotemporal data, State-space model, Susceptible-Exposed-Infectious-Removed model
@article{JSFS_2016__157_1_129_0,
     author = {Soubeyrand, Samuel},
     title = {Construction of {semi-Markov} genetic-space-time {SEIR} models and inference},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {129--152},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {157},
     number = {1},
     year = {2016},
     zbl = {1360.92117},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2016__157_1_129_0/}
}
Soubeyrand, Samuel. Construction of semi-Markov genetic-space-time SEIR models and inference. Journal de la société française de statistique, Tome 157 (2016) no. 1, pp. 129-152. http://archive.numdam.org/item/JSFS_2016__157_1_129_0/