In this paper, Genetic Programming is used to evolve ordered rule sets (also called decision lists) for a number of benchmark classification problems, with evaluation of both predictive performance and comprehensibility. The main purpose is to compare this approach to the standard decision list algorithm JRip and also to evaluate the use of different length penalties and fitness functions for evolving this type of model. The results, using 25 data sets from the UCI repository, show that genetic decision lists with accuracy-based fitness functions outperform JRip regarding accuracy. Indeed, the best setup was significantly better than JRip. JRip, however, held a slight advantage over these models when evaluating AUC. Furthermore, all genetic decision list setups produced models that were more compact than JRip models, and thus more readily comprehensible. The effect of using different fitness functions was very clear; in essence, models performed best on the evaluation criterion that was used in the fitness function, with a worsening of the performance for other criteria. Brier score fitness provided a middle ground, with acceptable performance on both accuracy and AUC. The main conclusion is that genetic programming solves the task of evolving decision lists very well, but that different length penalties and fitness functions have immediate effects on the results. Thus, these parameters can be used to control the trade-off between different aspects of predictive performance and comprehensibility.