This thesis explores some contemporary aspects of information theory, from source coding to issues of model selection. We first consider the problem of coding memoryless sources on a countable, infinite alphabet. As it is impossible to provide a solution which is both efficient and general, two approaches are considered : we first establish conditions under which the entropic rate can be reached, and we consider restricted classes for which tail probabilities are controlled. The second approach does not set any condition on the sources but provides a partial solution by coding only a part of the information - the pattern - which captures the repetitions in the message.
In order to study more complex processes, we come back to the case of finite memory sources on a finite alphabet : it has given rise to many works and efficient algorithms like the Context Tree Weighting (CTW) Method. We show here that this method is also efficient on a non-parametric class of infinite memory sources: the renewal processes.
We show then that the ideas on which CTW is based lead to a consistent estimator of the memory structure of a process, when this structure is finite. In fact, we complete the study of the BIC context tree estimator for Variable Length Markov Chains. In the last part, it is shown how similar ideas can be generalized for more complex sources on a (countable or not) infinite alphabet. We obtain consistent estimators for the order of hidden Markov models with Poisson and Gaussian emission.
Ce travail de thèse explore quelques aspects contemporains de la théorie de l’information allant de la théorie du codage à certains problèmes de choix de modèles. Nous y considérons d’abord le problème du codage de sources sans mémoire émettant dans un alphabet infini dénombrable. Comme il est impossible d'y apporter une solution générale et efficace, deux approches sont utilisées: dans un premier temps nous établissons des conditions dans lesquelles le taux entropique peut être approché, et nous nous restreignons à des classes pour lesquelles les queues de probabilités sont contrôlées. Dans un second temps, il n’est posé aucune restriction sur la source, il est possible de fournir une solution partielle en codant seulement une partie de l’information - le motif - qui capture les répétitions contenues dans le message.
Pour arriver à l'étude de processus plus complexes, nous revenons sur le cas de sources à mémoire finie sur un alphabet fini, qui a donné lieu à beaucoup de travaux, ainsi qu'à des algorithmes efficaces comme la Context Tree Weighting (CTW) Method. Nous prouvons ici que cet algorithme est également efficace sur une classe non paramétrique de sources a mémoire infinie : les sources de renouvellement.
Nous montrons ensuite que les idées sous-jacentes à la méthode CTW permettent de construire un estimateur consistant de la structure de mémoire d’un processus quand celle-ci est finie : nous complètons l’étude de l’estimateur BIC pour les chaînes de Markov à longueur variable. Dans une dernière partie, il est montré qu’une telle approche est généralisable dans un cadre plus large de sources émettant dans un alphabet infini, dénombrable ou non. On obtient ainsi des estimateurs consitants de l’ordre de chaînes de Markov cachées à émission poissonienne et gaussienne.
Mots-clés : BIC, Context trees, CTW, HMM, infinite alphabet, MDL, minimax, pattern coding, renewal processes, source coding, VLMC
@phdthesis{BJHTUP11_2006__0708__P0_0, author = {Garivier, Aur\'elien}, title = {Mod\`eles contextuels et alphabets infinis en th\'eorie de l'information}, series = {Th\`eses d'Orsay}, publisher = {Universite Paris-Sud Facult\'e des Sciences d'Orsay}, number = {708}, year = {2006}, language = {fr}, url = {http://archive.numdam.org/item/BJHTUP11_2006__0708__P0_0/} }
TY - BOOK AU - Garivier, Aurélien TI - Modèles contextuels et alphabets infinis en théorie de l'information T3 - Thèses d'Orsay PY - 2006 IS - 708 PB - Universite Paris-Sud Faculté des Sciences d'Orsay UR - http://archive.numdam.org/item/BJHTUP11_2006__0708__P0_0/ LA - fr ID - BJHTUP11_2006__0708__P0_0 ER -
Garivier, Aurélien. Modèles contextuels et alphabets infinis en théorie de l'information. Thèses d'Orsay, no. 708 (2006), 150 p. http://numdam.org/item/BJHTUP11_2006__0708__P0_0/
[AGM04] Asymptotic distribution and power of the likelihood ratio test for mixtures : bounded and unbounded case. Submitted, 2004. | MR | Zbl
, , and .[Ans] Answers.com. Binary tree.
[Bar85] Logically Smooth Density Estimation. PhD thesis, Stanford University, Dept. of Engineering, 1985. | MR
.[BBLM05] Moment inequalities for functions of independent random variables. Annals of Probability, 33(2) :514-560, 2005. | MR | Zbl
, , , and .[BGG06] Infinite alphabets : redundancy rates for enveloppe classes. Work in progress, to be submitted, 2006.
, , and .[Bou00] Théorie de l'information, notes de cours. http://www.proba.jussieu.fr/pageperso/boucheron/mpriit.php, 2000.
.[BP66] Statistical inference for probabilistic functions of finite state Markov chains. Ann. Math. Statist., 37 :1554-1563, 1966. | MR | Zbl
and .[BRY98] The minimum description length principle in coding and modeling. IEEE Trans. Inform. Theory, 44(6) :2743-2760, 1998. | MR | Zbl | DOI
, , and .[BW99] Variable length Markov chains. Ann. Statist., 27(2) :480-513, 1999. | MR | Zbl
and .[Cat01] Statistical Learning Theory and Stochastic Optimization. Lecture Notes in Mathematics, Vol. 1851. Springer-Verlag, Berlin, 2001. | MR | Zbl
.[CB90] Information-theoretic asymptotics of Bayes methods. IEEE Trans. Inf. Theory, 36 :453-471, 1990. | MR | Zbl | DOI
and .[CB94] Jeffrey's prior is asymptotically least favorable under entropy risk. J. Stat. Planning and Inference, 41:37-60, 1994. | MR | Zbl | DOI
and .[CBL06] Prediction, Learning, and Games. Cambridge University Press, 2006. ISBN 0521841089. | MR | Zbl | DOI
and .[CGG05] A MDL approach to HMM with Poisson and Gaussian emissions. Application to order identification. JSPI, submitted, 2005. | Zbl
, , and .[Cha03] Testing the order of a model. Ann. Statist., Accepted, 2003. | MR | Zbl
.[CK81] Information theory. Akadémiai Kiadó (Publishing House of the Hungarian Academy of Sciences), Budapest, 1981. Coding theorems for discrete memoryless systems. | MR | Zbl
and .[CMR05] Inference in hidden Markov models. Springer Series in Statistics. Springer, New York, 2005. With Randal Douc's contributions to Chapter 9 and Christian P. Robert's to Chapters 6, 7 and 13, With Chapter 14 by Gersende Fort, Philippe Soulier and Moulines, and Chapter 15 by Stéphane Boucheron and Elisabeth Gassiat. | MR | Zbl
, , and .[Cox93] An analysis of Bayesian inference for nonparametric regression. Annals of Statistics, 21 :903-923, 1993. | MR | Zbl
.[CR05] Nonasymptotic bounds for Bayesian order identification with application to mixtures. Submitted., 2005. | MR | Zbl
and .[CS96] Redundancy rates for renewal and other processes. IEEE Transactions on Information Theory, 42, Nov. 1996. | Zbl
and .[CS00] The consistency of the BIC Markov order estimator. Ann. Statist., 28(6) : 1601-1619, 2000. | MR | Zbl
and .[Csi02] Large-scale typicality of Markov sample paths and consistency of MDL order estimators. IEEE Trans. Inform. Theory, 48(6) :1616-1628, 2002. Special issue on Shannon theory : perspective, trends, and applications. | MR | Zbl | DOI
.[CT91] Elements of information theory. Wiley Series in Telecommunications. John Wiley & Sons Inc., New York, 1991. A Wiley-Interscience Publication. | MR | Zbl
and .[CT06] Context tree estimation for not necessarily finite memory processes, via BIC and MDL. IEEE-IT, 52(3), march 2006. | MR | Zbl
and .[Dav73] Universal noiseless coding. IEEE Trans. Information Theory, IT-19 :783-795, 1973. | MR | Zbl | DOI
.[DCG97] The estimation of the order of a mixture model. Bernoulli, pages 279-299, 1997. | MR | Zbl
and .[DEKM98] Biological sequence analysis. Cambridge University Press, 1998. | Zbl
, , , and .[Del01] Distribution du maximum d'un champ aléatoire et applications statistiques. PhD thesis, Université Paul Sabatier, Toulouse, 2001.
.[DF93] Nonparametric binary regression : a Bayesian approach. Ann. Statist., 21(4) :2108-2137, 1993. | MR | Zbl | DOI
and .[DG90] No empirical probability measure can converge in the total variation sense for all distributions. Ann. Statist., 18(3) : 1496-1499, 1990. | MR | Zbl
and .[DGL96] A probabilistic theory of pattern recognition, volume 31 of Applications of Mathematics (New York). Springer-Verlag, New York, 1996. | MR | Zbl
, , and .[DLG80] A source matching approach to finding minimax codes. IEEE Trans. Inform. Theory, 26(2) :166-174, 1980. | MR | Zbl | DOI
and .[DMPW81] Efficient universal noiseless source codes. IEEE Trans. Inf. Theory, 27 :269-279, 1981. | MR | Zbl | DOI
, , , and .[DN90] A tribute to Paul Erdös, chapter Partitions sans petits sommants, pages 121-152. Cambridge University Press, 1990. | MR | Zbl
and .[DR98] Balls and bins : A study in negative dependence. Random Struct. & Algorithms, 13(2) :99-124, 1998. | MR | Zbl | DOI
and .[DZ98] Large deviation techniques and applications. Springer, 1998. | MR | Zbl | DOI
and .[Eli75] Universal codeword sets and representations of the integers. IEEE Trans. Information Theory, IT-21 :194-203, 1975. | MR | Zbl | DOI
.[EM02] Hidden markov processes. IEEE Trans. Inform. Theory, 48 :1518-1569, 2002. | MR | Zbl | DOI
and .[EVKV02] Universal lossless source coding with the burrows Wheeler transform. IEEE Trans. Inform. Theory, 48(5) :1061-1081, 2002. | MR | Zbl | DOI
, , , and .[Fan61] Transmission of information : A statistical theory of communications. The M.I.T. Press, Cambridge, Mass., 1961. | MR | Zbl
.[Fin91] Consistent estimation of the order for Markov and hidden Markov chains. PhD thesis, University of Maryland, 1991. | MR
.[FMG92] Universal prediction of individual sequences. IEEE Trans. Inform. Theory, 38(4) :1258-1270, 1992. | MR | Zbl | DOI
, , and .[Fre63] On the asymptotic behavior of Bayes' estimates in the discrete case. Ann. Math. Statist., 34 :1386-1403, 1963. | MR
.[Fre99] On the Bernsteinvon Mises theorem with infinite dimensional parameters. Annals of Statistics, 27 :1119-1140, 1999. | MR | Zbl | DOI
.[Gal68] Information theory and reliable communication. John Wiley & sons, 1968. | Zbl
.[Gal76] Source coding with side information and universal coding, Sept. 1976.
.[Gar03] Rapport de DEA : Parsing et codage universel. Master's thesis, Université Paris Sud Orsay, 2003.
.[Gar05a] Redundancy of the context tree weighting algorithm on renewal and other processes. IEEE-IT, accepted, 2005.
.[Gar05b] Consistency of the unlimited BIC context tree estimator. IEEE Trans. Inform. Theory, accepted, 2005. | MR | Zbl
.[Gar06] A new lower-bound for the pattern maximin redundancy. IEEE-IT, submitted, 2006.
.[Gas02] Likelihood ratio inequalities with applications to various mixtures. Ann. Inst. H. Poincaré Probab. Statist., 38 :897-906, 2002. | MR | Zbl | Numdam | DOI
.[GB03] Optimal error exponents in hidden Markov models order estimation. IEEE Trans. Inform. Theory, 49 :964-980, 2003. | MR | Zbl | DOI
and .[GK97] From Ukkonen to McCreight and Weiner : A unifying view of linear-time suffix tree construction. Algorithmica, 19(3) :331-353, 1997. | MR | Zbl | DOI
and .[GPvdM94] There is no universal source code for an infinite source alphabet. IEEE Trans. Inform. Theory, 40(1) :267-271, 1994. | MR | Zbl | DOI
, , and .[GW04] On the entropy rate of pattern processes. Technical report hpl-2004-159, HP Laboratories Palo Alto, San Antonio, Texas, USA, sept 2004. | MR | Zbl
and .[Hau97] A general minimax result for relative entropy. IEEE Trans. Inform. Theory, 43(4) : 1276-1280, 1997. | MR | Zbl | DOI
.[Hen85] On estimating of the number of constituents of a finite mixture of continuous distributions. Ann. Inst. Statist. Math., 37 :235-240, 1985. | MR | Zbl
.[HG94] A class of stochastic models for relating synoptic atmospheric patterns to regional hydrologic phenomena. Water Ressources Research, 30 :1535-1546, 1994. | DOI
and .[HTF01] The elements of statistical learning. Springer-Verlag, New-York, 2001. | MR | Zbl
, , and .[HU79] Introduction to automata theory, languages, and computation. Addison-Wesley Publishing Co., Reading, Mass., 1979. Addison-Wesley Series in Computer Science. | MR | Zbl
and .[HY01] Model selection and the principle of minimum description length. JASA, 96 :746-774, 2001. | MR | Zbl | DOI
and .[IJS01] Bayesian model selection in finite mixtures by marginal density decompositions. J. Amer. Statist. Assoc., 96 :1316-1332, 2001. | MR | Zbl | DOI
, , and .[JB92] Ockam's razor and bayesian analysis. American Scientist, 80 :64-72, 1992.
and .[JOS05] A lower bound on compression of unknown alphabets. Theoret. Comput. Sci., 332(1-3) :293-311, 2005. | MR | Zbl | DOI
, , and .[JPM01] Consistent estimation of mixture complexity. Ann. Statist., 29 :1281-1296, 2001. | MR | Zbl
, , and .[JST01] Average profile of the Lempel-Ziv parsing scheme for a Markovian source. Algorithmica, 31(3) :318-360, 2001. Mathematical analysis of algorithms. | MR | Zbl | DOI
, , and .[Ker00] Consistent estimation of the order of mixture models. Sankhyā Ser. A, 62(1) :49-66, 2000. | MR | Zbl
.[Kie78] A unified approach to weak universal source coding. IEEE Trans. Inform. Theory, 24(6) :674-682, 1978. | MR | Zbl | DOI
.[Kie93] Strongly consistent code-based identification and order estimation for constrained finite-state model classes. IEEE Trans. Inform. Theory, 39 :893-902, 1993. | MR | Zbl | DOI
.[Kon98] Asymptotic recurrence and waiting times for stationary processes. J. Theoret. Probab., 11(3) :795-811, 1998. | MR | Zbl | DOI
.[Kos01] Hidden Markov Models For Bioinformatics. Kluwer Academic Publishers Group., 2001. | MR | Zbl
.[Kra49] A device for quantizing, grouping, and coding amplitude modulated pulses. Master's thesis, Dept, of Electrical Engineering, M.I.T., Cambridge, MA, 1949.
.[KSY04] Problems on sequences : information theory and computer science interface. IEEE Trans. Inform. Theory, 50(7) :1385-1392, 2004. | Zbl | MR | DOI
, , and .[KT81] The performance of universal encoding. IEEE Trans. Inform. Theory, 27(2) :199-207, Mar 1981. | MR | Zbl | DOI
and .[KV94] Joint parameter estimation and symbol detection for linear or nonlinear unknown channels. IEEE Trans. Commun., 42 :2406-2413, 1994. | Zbl | DOI
and .[KY00] Grammar-based codes : a new class of universal lossless source codes. IEEE Trans. Inform. Theory, 46(3) :737-754, 2000. | MR | Zbl | DOI
and .[Ler92a] Consistent estimation of a mixing distribution. Ann. Statist., 20(3) :1350-1360, 1992. | MR | Zbl
.[Ler92b] Maximum-likelihood estimation for hidden markov models. Stochastic Processes Their Applic., 40 :127-143, 1992. | MR | Zbl | DOI
.[LN94] Order estimation and sequential universal data compression of a hidden markov source by the method of mixtures. IEEE Trans. Inf. Theory, 40(4) :1167-1180, 1994. | Zbl | DOI
and .[LRS83] An introduction to the application of the theory of probabilistic functions of a Markov process to automatic speech recognition. The Bell System Technical Journal, 62 :1035-1074, 1983. | MR | Zbl | DOI
, , and .[LS97] On the average redundancy rate of the Lempel-Ziv code. IEEE Trans. Inform. Theory, 43(1) :2-8, 1997. | MR | Zbl | DOI
and .[MAS06] Concentration inequalities and model selection. Ecole d'été de Probabilités de Saint-Flour 2003. Lecture Notes in Mathematics, Springer, 2006. | MR | Zbl
.[MF95] A strong version of the redundancy-capacity theorem of universal coding. IEEE Trans. Inform. Theory, 41(3) :714-722, May 1995. | Zbl | DOI
and .[ML03] A default Bayesian test for the number of components in a mixture. J. Statist. Plann. Inference, 111(1-2) :129-142, 2003. | MR | Zbl | DOI
and .[MP00] Finite mixture models. Wiley-Interscience, New York, 2000. | MR | Zbl
and .[MR96] Testing for mixtures : a Bayesian entropy approach. In J. O. Berger, J. M. Bernardo, and A. P. Dawid, editors, Bayesian Statistics 5, pages 255-276. Oxford Science Publications, 1996. | MR
and .[OS04] Speaking of infinity. IEEE Trans. Inform. Theory, 50(10) :2215-2230, 2004. | MR | Zbl | DOI
and .[OSVZ04] Limit results on pattern entropy of stationary processes. In Proceedings of the 2004 IEEE Information Theory workshop, San Antonio, Texas, USA, oct 2004.
, , , and .[OSZ04] Universal compression of memoryless sources over unknown alphabets. IEEE Trans. Inform. Theory, 50(7) : 1469-1481, 2004. | MR | Zbl | DOI
, , and .[PS98] Problems and theorems in analysis. II. Classics in Mathematics. Springer-Verlag, Berlin, 1998. Theory of functions, zeros, polynomials, determinants, number theory, geometry, Translated from the German by C. E. Billigheimer, Reprint of the 1976 English translation. | MR | Zbl
and .[Ris76] Generalized Kraft inequality and arithmetic coding. IBM J. Res. Dev., 20(3), 1976. | MR | Zbl
.[Ris78] Modelling by shortest data description. Automatica, 14 :465-471, 1978. | Zbl | DOI
.[Ris83] A universal data compression system. IEEE Transactions on Information Theory, 29, September 1983. | MR | Zbl
.[Ris84] Universal coding, information, prediction, and estimation. IEEE Trans. Inform. Theory, 30(4) :629-636, 1984. | MR | Zbl | DOI
.[Ris86] Stochastic complexity and modeling. The Annals of Statistics, 14(3) :1080-1100, 1986. | MR | Zbl
.[Ris99] Fast universal coding with context models. IEEE Trans. Inform. Theory, 45(4) : 1065-1071, 1999. | MR | Zbl | DOI
.[RL81] Universal modeling and coding. IEEE Trans. Inform. Theory, 27(1) :12-23, Jan 1981. | MR | Zbl | DOI
and .[Rya79] Coding of a source with unknown but ordered probabilities. Problems Inform. Transmission, 15(2) :71-77, 1979. | Zbl | MR
.[Rya81] Comments on : "A source matching approach to finding minimax codes" [IEEE Trans. Inform. Theory 26 (1980), no. 2, 166-174; MR 81c :94021] by L. D. Davisson and A. Leon-Garcia. | MR | Zbl | DOI
.[Rya81] Comments on : "A source matching approach to finding minimax codes" IEEE Trans. Inform. Theory, 27(6) :780-781, 1981. | MR | Zbl | DOI
.[Rya84] Twice-universal coding. Problemy Peredachi Informatsii, 20(3) :24-28, 1984. | MR | Zbl
.[Sch78] Estimating the dimension of a model. Ann. Statist., 6(2) : 461-464, 1978. | MR | Zbl
.[Sha48] A mathematical theory of communication. Bell System Tech. J., 27 : 379-423, 623-656, 1948. | MR | Zbl | DOI
.[Sha03a] On the entropy of patterns of i.i.d. sequences. In Proceedings of 41st Annual Allerton Conference on Communication, Control and Computing, Monticello, IL, USA, oct 2003.
.[Sha03b] Universal lossless compression with unknown alphabets - the average case. IEEE Trans. Inform. Theory, submitted in June 2003. | MR | Zbl
.[Sha04] A new redudancy bound for universal lossless compression of unknown alphabets. In Proceedings of the 38th Annual Conference on Information Sciences and Systems - CISS, pages 1175-1179, Princeton, New-Jersey, USA, march 2004.
.[Sha06] On the MDL principle for i.i.d. sources with large alphabets. IEEE Trans. Inform. Theory, 52 :1939 - 1955, may 2006. | MR | Zbl | DOI
.[Shi93] Universal redundancy rates do not exist. IEEE Trans. Inform. Theory, 39(2) :520-524, 1993. | MR | Zbl | DOI
.[Sht87] Universal sequential coding of individual messages. Problemy Peredachi Informatsii, 23(3) :3-17, 1987. | MR | Zbl
.[Sio58] On general minimax theorems. Pacific J. Math., 8 :171-176, 1958. | MR | Zbl | DOI
.[SW95] Universal redundancy rates for the class of B-processes do not exist. IEEE Trans. Inform. Theory, 41(2) :508-512, 1995. | MR | Zbl | DOI
and .[Sze51] An asymptotic formula in the theory of partitions. Quart. J. Math. Oxford, 2 :85-108, 1951. | MR | Zbl | DOI
.[Szp01] Average case analysis of algorithms on sequences. Wiley-Interscience Series in Discrete Mathematics and Optimization. Wiley-Interscience, New York, 2001. With a foreword by Philippe Flajolet. | MR | Zbl | DOI
.[TSM85] Statistical analysis of finite mixture distributions. John Wiley & Sons Ltd., Chichester., 1985. | MR | Zbl
, , and .[Tsy04] Introduction à l'estimation non-paramétrique, volume 41 of Mathématiques & Applications (Berlin) [Mathematics & Applications]. Springer-Verlag, Berlin, 2004. | MR | Zbl
.[Var82] Nonparametric estimation in renewal processes. Ann. Statist., 10(3) :772-785, 1982. | MR | Zbl
.[vdV98] Asymptotic statistics. Cambridge University Press., 1998. | MR | Zbl
.[Wel84] A technique for high performance data compression. IEEE Computer, 17(6) :8-19, 1984. | DOI
.[Wil94] The Context-tree Weighting Method : Extensions. In IEEE Int. Symp. Information Theory, Trondheim, Norway, 1994.
.[WMF94] Optimal sequential probability assignment for individual sequences. IEEE Transactions on Information Theory, 40, Mar 1994. | MR | Zbl
, , and .[WT95] The Context-tree Weighting Method : basic properties. IEEE Transactions on Information Theory, 41, May 1995. | Zbl
, and .[XB97] Minimax redundancy for the class of memoryless sources. IEEE Trans. Inform. Theory, 43(2) :646-657, Mar 1997. | Zbl | DOI
and .[XB00] Asymptotic minimax regret for data compression, gambling and prediction. IEEE Trans. Inform. Theory, 46 :431-445, 2000. | MR | Zbl | DOI
and .[ZL77] A universal algorithm for sequential data compression. IEEE Trans. Information Theory, IT-23(3) :337-343, 1977. | MR | Zbl | DOI
and .[ZL78] Compression of individual sequences via variable-rate coding. IEEE Trans. Inform. Theory, 24(5) :530-536, 1978. | MR | Zbl | DOI
and .[ÄSMS97] Multialphabet coding with separate alphabet description. In IEEE Computer Society Press, editor, Proceedings of Compression and Complexity of SEQUENCES, pages 56-65, 1997.
, , and .