The Importance of Diversity in Neural Network Ensembles: An Empirical Investigation
2007 (English). Conference paper, Published paper (Refereed)
Abstract [en]
When designing ensembles, it is almost an axiom that the base classifiers must be diverse in order for the ensemble to generalize well. Unfortunately, there is no clear definition of the key term diversity, which has led to several diversity measures and many, more or less ad hoc, methods for creating diversity in ensembles. In addition, no specific diversity measure has been shown to correlate strongly with test set accuracy. The purpose of this paper is to empirically evaluate ten different diversity measures, using neural network ensembles and 11 publicly available data sets. The main result is that, in this study too, all evaluated diversity measures show low or very low correlation with test set accuracy. Having said that, two measures, double fault and difficulty, show slightly higher correlations than the other measures. The study furthermore shows that the correlation between accuracy measured on training or validation data and test set accuracy is also rather low. These results challenge ensemble design techniques in which diversity is explicitly maximized or in which ensemble accuracy on a hold-out set is used for optimization.
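As an illustration only (not taken from the paper; function names and the toy data are hypothetical), the sketch below shows one common way to compute the pairwise double-fault measure mentioned in the abstract: for each pair of base classifiers, the fraction of examples that both misclassify, averaged over all pairs in the ensemble. Lower values are usually read as indicating more diversity.

```python
import numpy as np

def double_fault(pred_a, pred_b, y_true):
    """Fraction of examples misclassified by BOTH classifiers in a pair."""
    both_wrong = np.logical_and(pred_a != y_true, pred_b != y_true)
    return np.mean(both_wrong)

def ensemble_double_fault(predictions, y_true):
    """Average double-fault value over all classifier pairs.

    predictions: array of shape (n_classifiers, n_examples) holding
    predicted class labels for each base classifier.
    """
    n = len(predictions)
    pair_values = [double_fault(predictions[i], predictions[j], y_true)
                   for i in range(n) for j in range(i + 1, n)]
    return np.mean(pair_values)

# Toy example: three base classifiers evaluated on five test examples
y_true = np.array([0, 1, 1, 0, 1])
preds = np.array([
    [0, 1, 0, 0, 1],   # classifier 1
    [0, 0, 1, 0, 1],   # classifier 2
    [1, 1, 1, 0, 0],   # classifier 3
])
print(ensemble_double_fault(preds, y_true))
```

In a study like the one described, such a per-ensemble diversity value would then be correlated with the ensemble's test set accuracy across many ensembles and data sets.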
Place, publisher, year, edition, pages
IEEE Press, 2007.
Keywords [en]
diversity, ensembles, neural networks, data mining
National Category
Information Systems
Identifiers
URN: urn:nbn:se:hb:diva-5833
DOI: 10.1109/IJCNN.2007.4371035
Local ID: 2320/3154
ISBN: 1-4244-1380-X (print)
OAI: oai:DiVA.org:hb-5833
DiVA, id: diva2:886514
Conference
The International Joint Conference on Neural Networks