Taxicab Correspondence Analysis of Ratings and Rankings
[Analyse des Correspondances du Taxi de Notes et de Rangs]
Journal de la société française de statistique, Tome 155 (2014) no. 4, pp. 1-23.

Let $Y$ be an $I \times Q$ table of ratings; $I$ is a set of individuals, and the $i$th row holds the ratings given by individual $i$ on $Q$ variables or attributes. In this paper we study two codings of the table $Y$ before analysing it by correspondence analysis (CA) or taxicab correspondence analysis (TCA), TCA being a robust variant of CA based on the $L_1$ norm: the doubled table $Y_D$ of size $I \times 2Q$, and the table $Y_{nega}$ of size $I \times (Q+1)$, in which a column named nega, representing the overall complementary rating, is added to $Y$. The interpretation of the CA or TCA maps of $Y_D$ is based on the lever principle. We use the law of contradiction to interpret the CA or TCA maps of $Y_{nega}$. A necessary and sufficient condition for the TCA of $Y_{nega}$ and the TCA of $Y_D$ to be equivalent is that the first factor be an affine function of the sum of ratings. If this condition holds then, following Cox, we use the first factor as a summary of the latent variable. This inference can be of two kinds, weak or strong. For rank data representing individual preferences, the method corresponds to the Borda rule or a modified version of it. Two examples of different kinds are presented.

Let $Y$ be an $I \times Q$ ratings data set, where $Q$ represents the number of items, and $I$ represents the number of rated objects or the number of individuals expressing their opinions on the $Q$ items. This paper considers two kinds of data codings before the application of correspondence analysis (CA) or taxicab correspondence analysis (TCA), where TCA is an $L_1$ variant of CA: the doubled data set $Y_D$ of size $I \times 2Q$, and the data set $Y_{nega}$ of size $I \times (Q+1)$, where a column named nega is added representing the cumulative complementary columns. The interpretation of maps in CA of $Y_D$ is based on the lever principle. We use the law of contradiction to interpret maps of CA and TCA of $Y_{nega}$. We provide necessary and sufficient conditions for TCA of $Y_{nega}$ or $Y_D$ so that the first factor score is an affine function of the sum score of the ratings; and, if this is true for a dataset, then following Cox we suggest the use of the sum score of ratings either to reduce the $Q$ ratings into a single index, or to summarize the underlying latent variable. This ordinal inference can be of two types: weak or strong. In the case of a rankings dataset, the proposed approach corresponds to the Borda count rule or a modified Borda count rule. Examples are provided.
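The two codings named in the abstract can be illustrated with a minimal Python sketch. It is not taken from the paper: the function names, the interleaved column order of the doubled table, and the assumption that every item is rated on a common scale from `low` to `high` are all illustrative choices. Under these assumptions the positive pole of an item is `y - low`, the negative pole is `high - y`, and the nega column is the sum of the negative poles over the items.

```python
def doubled_table(Y, low, high):
    """I x 2Q doubled coding (assumed layout): for each item,
    the positive pole (y - low) followed by the negative pole (high - y)."""
    return [[v for y in row for v in (y - low, high - y)] for row in Y]


def nega_table(Y, low, high):
    """I x (Q + 1) coding (assumed layout): shifted ratings (y - low)
    plus one 'nega' column holding the summed complements (high - y)."""
    return [[y - low for y in row] + [sum(high - y for y in row)] for row in Y]


def sum_scores(Y):
    """Simple sum score of the ratings for each row of Y."""
    return [sum(row) for row in Y]


# Hypothetical ratings of 3 individuals on 3 items, scale 1..5.
Y = [[1, 3, 5], [2, 2, 4], [5, 4, 5]]
YD = doubled_table(Y, 1, 5)
Yn = nega_table(Y, 1, 5)

# Both codings give every row the same total, Q * (high - low).
print([sum(r) for r in YD], [sum(r) for r in Yn])
print(sum_scores(Y))
```

In this sketch both coded tables have constant row totals, which is what makes the row profiles comparable across individuals who use the rating scale differently.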

Keywords: Sum score, nega, doubling, lever principle, law of contradiction, personal equation, response styles, rogue items, strategic voters, Borda count, Nishisato mapping, taxicab correspondence analysis, IRT
Keywords (from the French): sum of ratings, nega, doubling, lever principle, law of contradiction, personal equation, outlier, Borda rule, Nishisato coding, taxicab correspondence analysis, item analysis
@article{JSFS_2014__155_4_1_0,
     author = {Choulakian, Vartan},
     title = {Taxicab {Correspondence} {Analysis} of {Ratings} and {Rankings}},
     journal = {Journal de la soci\'et\'e fran\c{c}aise de statistique},
     pages = {1--23},
     publisher = {Soci\'et\'e fran\c{c}aise de statistique},
     volume = {155},
     number = {4},
     year = {2014},
     zbl = {1316.62080},
     language = {en},
     url = {http://archive.numdam.org/item/JSFS_2014__155_4_1_0/}
}
Choulakian, Vartan. Taxicab Correspondence Analysis of Ratings and Rankings. Journal de la société française de statistique, Tome 155 (2014) no. 4, pp. 1-23. http://archive.numdam.org/item/JSFS_2014__155_4_1_0/

[1] Adames, G. Duck liver and tradition: analysis of ratings of a competition, Les Cahiers de L’Analyse des Données, Volume XVIII (1993), pp. 389-398

[2] Alon, N.; Naor, A. Approximating the cut-norm via Grothendieck’s inequality, SIAM Journal on Computing, Volume 35 (2006), pp. 787-803 | Zbl

[3] Benzécri, J.P.; Benzécri, F. Codage linéaire par morceaux et équation personnelle., Les Cahiers de L’Analyse des Données, Volume XIV (1989), pp. 203-210

[4] Benzécri, J.P. L’Analyse des Données, L’Analyse des Correspondances, 2, Paris: Dunod, 1973 | Zbl

[5] Benzécri, J.P. On the analysis of a table with one heavyweight column (in french)., Les Cahiers de L’Analyse des Données, Volume IV (1979), pp. 413-416

[6] Benzécri, J.P. Correspondence Analysis Handbook, N.Y.: Marcel Dekker, 1992 | Zbl

[7] Burt, C. The Distribution and Relations of Educational Abilities., London:P.S. King and Son, 1917

[8] Choulakian, V.; Allard, J.; Simonetti, B. Multiple taxicab correspondence analysis of a survey related to health services., Journal of Data Science, Volume 11(2) (2013), pp. 205-229

[9] Cazes, P. Some comments on correspondence analysis, www.youtube.com/watch?v=cisfaltVBTI (2011)

[10] Cazes, P. Codage d’une variable continue en vue de l’analyse des correspondances, Revue de Statistique Appliquée, Volume 38(3) (1990), pp. 35-51

[11] Choulakian, V.; de Tibeiro, J. Graph partitioning by correspondence analysis and taxicab correspondence analysis., Journal of Classification, Volume 30 (2013), pp. 397-427 | Zbl

[12] Choulakian, V. The optimality of the centroid method, Psychometrika, Volume 68 (2003), pp. 473-475 | Zbl

[13] Choulakian, V. Transposition invariant principal component analysis in L1 for long tailed data., Statistics and Probability Letters, Volume 71 (2005), pp. 23-31 | Zbl

[14] Choulakian, V. L1 norm projection pursuit principal component analysis., Computational Statistics and Data Analysis, Volume 50 (2006), pp. 1441-1451 | Zbl

[15] Choulakian, V. Taxicab correspondence analysis, Psychometrika, Volume 71 (2006), pp. 333-345 | Zbl

[16] Choulakian, V. Multiple taxicab correspondence analysis., Advances in Data Analysis and Classification, Volume 2 (2008), pp. 177-206 | Zbl

[17] Choulakian, V. Taxicab correspondence analysis of contingency tables with one heavyweight column., Psychometrika, Volume 73 (2008), pp. 309-319 | Zbl

[18] Choulakian, V. The simple sum score statistic in taxicab correspondence analysis, Advances in Latent Variables, Vita e Pensiero, Milan, Italy (2013), 6 pages

[19] Choulakian, V.; Kasparian, S.; Miyake, M.; Akama, H.; Makoshi, N.; Nakagawa, M. A statistical analysis of synoptic gospels., JADT’2006 (2006), pp. 281-288

[20] Cox, D.R. In praise of the simple sum score., www.stat.unpg.it/forcina/shlav/.../Cox2.pdf (2006)

[21] Cox, D.R. On an internal method for deriving a summary measure., Biometrika, Volume 95 (2008), pp. 1002-1005 | Zbl

[22] Choulakian, V.; Simonetti, B.; Gia, T.P. Some new aspects of taxicab correspondence analysis., Statistical Methods and Applications, Volume 23 (2014), pp. 401-406 | Zbl

[23] Cox, D.R.; Wermuth, N. On some models for binary variables parallel in complexity with the multivariate Gaussian distribution., Biometrika, Volume 89 (2002), pp. 462-469 | Zbl

[24] de Borda, J.C. Mémoire sur les élections au scrutin., Histoire de l’Académie Royale des Sciences, Volume 102 (1781), pp. 657-665

[25] Deniau, C.; Oppenheim, G.; Benzécri, J.P. An effect of the refining of a partition on the eigenvalues arising from a correspondence table (in french), Les Cahiers de l’Analyse des Données, Volume IV(3) (1979), pp. 289-297

[26] Esmieu, D.; Gopalan, T.K.; Maiti, G.D. On the use of ratings in marketing studies for the introduction of a new product (in french), Les Cahiers de L’Analyse des Données, Volume XVIII (1993), pp. 399-426

[27] Eves, H. Foundations and Fundamental Concepts of Mathematics, N.Y. : Dover, 1997

[28] Fichet, B. Metrics of Lp-type and distributional equivalence principle., Advances in Data Analysis and Classification, Volume 3 (2009), pp. 305-314 | Zbl

[29] Gifi, A. Nonlinear Multivariate Analysis, N.Y:Wiley, 1990 | Zbl

[30] Greenacre, M.J. Theory and Applications of Correspondence Analysis., London:Academic Press., 1984 | Zbl

[31] Gabriel, K.R.; Zamir, S. Lower rank approximation of matrices by least squares with any choice of weights., Technometrics, Volume 21 (1979), pp. 489-498 | Zbl

[32] Harman, H.H. Modern Factor Analysis, Chicago:The University of Chicago Press., 1967 | Zbl

[33] Horst, P. Factor Analysis of Data Matrices, Holt Rinehart and Winston, 1965 | Zbl

[34] Johnson, C. A characterization of Borda’s rule via optimization., IMA preprint, Volume 41 (1983)

[35] Lavialle, O.; Qannari, E.M.; Vidal, C. Order aggregation under constraints: Ordering of products by sensory ratings (in french), Revue de Statistique Appliquée, Volume 38(4) (1990), pp. 61-73

[36] Le Roux, B.; Rouanet, H. Geometric Data Analysis. From Correspondence Analysis to Structured Data Analysis, Dordrecht: Kluwer-Springer, 2004 | Zbl

[37] Meijer, R.; Sijtsma, K.; Smid, N. Theoretical and Empirical Comparison of the Mokken and the Rasch Approach to IRT., Applied Psychological Measurement, Volume 14 (1990), pp. 283-298

[38] Murtagh, F. Correspondence Analysis and Data Coding with Java and R, London: Chapman & Hall/CRC, 2005 | Zbl

[39] Nishisato, S. Analysis of Categorical Data: Dual Scaling and Its Applications., Toronto: University of Toronto Press., 1980 | Zbl

[40] Nishisato, S. Forced classification: A simple application of quantification method., Psychometrika, Volume 49 (1984), pp. 25-36

[41] Nishisato, S. Elements of Dual Scaling: An Introduction to Practical Data Analysis., Hillsdale NJ: Lawrence Erlbaum, 1994

[42] Saari, D.G. The Borda dictionary, Social Choice and Welfare, Volume 7 (1990), pp. 279-317 | MR | Zbl

[43] Sijtsma, K.; Junker, B. Item response theory: Past performance, present developments, and future expectations., Behaviormetrika, Volume 33(1) (2006), pp. 75-102 | MR | Zbl

[44] Tatsuoka, C. Data-analytic methods for latent partially ordered classification models., Journal of the Royal Statistical Society Series C (Applied Statistics), Volume 51 (2002), pp. 337-350 | MR | Zbl

[45] Torres, A.; Greenacre, M. Dual scaling and correspondence analysis of preferences, paired comparisons and ratings, International Journal of Research in Marketing, Volume 19(4) (2002), pp. 401-405

[46] Thurstone, L.L. Multiple factor analysis., Psychological Review, Volume 31 (1931), pp. 406-427

[47] Thurstone, L.L. Multiple factor analysis., Chicago: The University of Chicago Press., 1947 | Zbl

[48] Wold, H. Estimation of principal components and related models by iterative least squares, Multivariate Analysis, N.Y:Academic Press (1966), pp. 391-420 | MR | Zbl