Flexible modelling in statistics: past, present and future
Journal de la société française de statistique, Tome 156 (2015) no. 1, pp. 76-96.

At a time when more and more data are becoming available, and when those data exhibit rather complex structures (significant departures from symmetry, heavy or light tails), flexible modelling has become an essential task for statisticians as well as for researchers and practitioners in fields such as economics, finance and the environmental sciences. This is reflected in the wealth of existing proposals for flexible distributions; well-known examples include Azzalini’s skew-normal, Tukey’s g-and-h, mixture distributions and two-piece distributions, to name but a few. My aim in the present paper is to provide an introduction to this research field that is useful to both novices and experts. After describing the research stream itself, I narrate the gripping history of flexible modelling, starring emblematic heroes from the past such as Edgeworth and Pearson. I then describe three of the most widely used flexible families of distributions, and finally give an outlook on future research in flexible modelling by posing challenging open questions.

Keywords: heavy and light tails, skewness and kurtosis, skew-normal distribution, symmetry and normality tests, transformation approach, two-piece distributions
@article{JSFS_2015__156_1_76_0,
     author = {Ley, Christophe},
     title = {Flexible modelling in statistics: past, present and future},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {76--96},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {156},
     number = {1},
     year = {2015},
     mrnumber = {3338241},
     zbl = {1316.62023},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2015__156_1_76_0/}
}
Ley, Christophe. Flexible modelling in statistics: past, present and future. Journal de la société française de statistique, Tome 156 (2015) no. 1, pp. 76-96. http://archive.numdam.org/item/JSFS_2015__156_1_76_0/
