Extensions of the classification tree algorithm for the analysis of categorized data using multivariate dissimilarity coefficients and entropy

Cesar Augusto TACONELI[1]

Silvio Sandoval ZOCCHI[2]

Carlos Tadeu dos Santos DIAS[2]

ABSTRACT: The statistical analysis of large datasets requires flexible methodologies that can provide insight and understanding even in the presence of difficulties such as large numbers of variables with varying degrees of association among them, and missing data. Classification and regression trees model a categorical or numerical response variable as a function of a set of covariates while circumventing many of these difficulties. Multivariate trees extend classification and regression tree techniques to allow the joint analysis of two or more response variables. In recent studies, multivariate classification and regression trees have been applied most often to numerical response variables. In this work we propose alternatives for constructing multivariate classification trees for multiple categorized response variables. These alternatives are based on dissimilarity and entropy measures. A simulation study was used to examine how the correlations and entropies of the variables affect the performance of the proposed methodology; results were better for high correlations and high entropies. An analysis of data on alcohol consumption and smoking among residents of Botucatu (SP), Brazil, complements the study by showing that factors such as education level, daily occupation, and the possibility of sharing problems with friends influence alcohol consumption and smoking.

KEYWORDS: Classification trees; Dissimilarity; Entropy; Alcohol and smoking; Multivariate simulation
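To make the entropy idea concrete, here is a minimal Python sketch of one possible node-impurity criterion for two or more categorical responses. It assumes (an illustrative choice, not necessarily the authors' criterion) that each observation's vector of response categories is treated as a single joint category, that node impurity is the Shannon entropy of that joint distribution, and that a candidate split is scored by the reduction in weighted entropy. The function names `joint_entropy` and `split_gain` and the toy data are hypothetical.

```python
import math
from collections import Counter


def joint_entropy(responses):
    """Shannon entropy (bits) of the joint distribution of response vectors.

    `responses` is a list of tuples, one per observation, e.g.
    (alcohol_category, smoking_category). An empty node has entropy 0.
    """
    n = len(responses)
    counts = Counter(responses)  # frequency of each joint category
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def split_gain(responses, goes_left):
    """Entropy reduction obtained by splitting a node into two children.

    `goes_left` is a list of booleans (one per observation) produced by a
    hypothetical binary split on some covariate.
    """
    left = [r for r, g in zip(responses, goes_left) if g]
    right = [r for r, g in zip(responses, goes_left) if not g]
    n = len(responses)
    # Children's entropies weighted by their share of the parent node
    weighted = (len(left) / n) * joint_entropy(left) + (
        len(right) / n
    ) * joint_entropy(right)
    return joint_entropy(responses) - weighted


# Toy node: hypothetical (alcohol, smoking) responses for four individuals
node = [("yes", "yes"), ("yes", "yes"), ("no", "no"), ("no", "no")]
split = [True, True, False, False]  # hypothetical binary covariate split
print(joint_entropy(node))      # 1.0
print(split_gain(node, split))  # 1.0
```

A split that separates the two response profiles perfectly lowers the joint entropy from 1 bit to 0, giving a gain of 1.0. Scoring the joint distribution, rather than summing one entropy per response, lets association between the responses influence the choice of split; whether this matches the dissimilarity- and entropy-based alternatives proposed in the paper is not established here.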

 



[1] Departamento de Estatística, Universidade Federal do Paraná – UFPR, CEP 81531-990, Curitiba, PR, Brasil. E-mail: taconeli@ufpr.br

[2] Departamento de Ciências Exatas, Escola Superior de Agricultura Luiz de Queiroz – ESALQ, Universidade de São Paulo – USP, CEP 13418-900, Piracicaba, SP, Brasil. E-mail: sszocchi@esalq.usp.br / ctsdias@esalq.usp.br