In this paper, we study a generalization of a recently developed strategy for generating conformal predictor ensembles: out-of-bag calibration. The ensemble strategy is evaluated, both theoretically and empirically, against a commonly used alternative ensemble strategy, bootstrap conformal prediction, as well as against common non-ensemble strategies. We provide a thorough analysis of out-of-bag calibration with respect to theoretical validity, empirical validity (error rate), efficiency (prediction region size), and p-value stability (the degree of variance observed over multiple predictions for the same object). Empirical results show that out-of-bag calibration displays favorable characteristics with regard to these criteria, and we propose that out-of-bag calibration be adopted as a standard method for constructing conformal predictor ensembles.
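
To make the idea concrete, the following is a minimal sketch of out-of-bag calibration for a conformal regressor, not the exact procedure studied in the paper. It assumes scikit-learn's RandomForestRegressor as the bagged ensemble and absolute residuals as the nonconformity measure; the key point is that the calibration scores are computed from out-of-bag predictions, so no separate calibration set is split off.

```python
# Hedged sketch: out-of-bag calibrated conformal regression intervals,
# assuming a random-forest ensemble and absolute-error nonconformity.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

X, y = make_regression(n_samples=500, n_features=10, noise=10.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Bagged ensemble; oob_score=True makes out-of-bag predictions available.
rf = RandomForestRegressor(n_estimators=500, oob_score=True, random_state=0)
rf.fit(X_train, y_train)

# Out-of-bag calibration: each training example is scored using only the
# trees that did not see it in their bootstrap sample.
oob_scores = np.abs(y_train - rf.oob_prediction_)

# At significance level epsilon, the interval half-width is the
# ceil((1 - epsilon) * (n + 1))-th smallest out-of-bag score.
epsilon = 0.1
n = len(oob_scores)
k = int(np.ceil((1 - epsilon) * (n + 1))) - 1
half_width = np.sort(oob_scores)[min(k, n - 1)]

y_hat = rf.predict(X_test)
lower, upper = y_hat - half_width, y_hat + half_width
print("empirical coverage:", np.mean((y_test >= lower) & (y_test <= upper)))
```

In this sketch the whole training set doubles as the calibration set, which is what makes the approach attractive compared with split (inductive) conformal prediction; the empirical coverage printed at the end should be close to 1 - epsilon if the calibration behaves as intended.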