Abstract: Many previous studies have assessed ecological niche model (ENM) performance using receiver operating characteristic (ROC) approaches, even though the literature has pointed out diverse problems with this metric. We explored different evaluation metrics based on independent testing data, using Darwin's fox (Lycalopex fulvipes) as a detailed case study. Six ENMs (generalized linear models, boosted regression trees, Maxent, GARP, multivariable kernel density estimation, and NicheA) were tested using six evaluation metrics: partial ROC, Akaike information criterion, omission rate, cumulative binomial probability, and two novel metrics, one quantifying model extrapolation versus interpolation (E-space index I) and one quantifying the extent of extrapolation versus Jaccard similarity (E-space index II). The ENMs showed diverse and mixed performance depending on the evaluation metric used; consequently, model selection should be based on the data available, the assumptions necessary, and the particular research question. The typical ROC AUC evaluation approach should be discontinued when only presence data are available, and evaluations in environmental dimensions should be adopted as part of the ENM researcher's toolkit. Our results suggest that selecting Maxent based solely on previous reports of its performance is a questionable practice. Instead, comparisons of models, including diverse algorithms and parameterizations, should be the sine qua non of every study using ecological niche modeling. ENM evaluations should use metrics that assess desired model characteristics rather than a single measure of fit between model and data. The metrics proposed here to assess model performance in environmental space (i.e., E-space indices I and II) may complement current methods for ENM evaluation.
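As a minimal sketch of two of the threshold-based metrics named above (omission rate and cumulative binomial probability), the code below assumes a binary (thresholded) prediction evaluated at independent test presences and, for the binomial test, uses the proportion of the study area predicted suitable as the per-point success probability. The function names are hypothetical and this is not the authors' implementation; E-space indices I and II are not defined in this abstract and are not reproduced here.

```python
from math import comb


def omission_rate(test_predictions):
    """Fraction of independent test presences that the thresholded model
    predicts as absent (0). Assumes each value is 0 or 1 (hypothetical input format)."""
    preds = list(test_predictions)
    if not preds:
        raise ValueError("need at least one test presence")
    if any(p not in (0, 1) for p in preds):
        raise ValueError("predictions must be binary (0 or 1)")
    return sum(1 for p in preds if p == 0) / len(preds)


def cumulative_binomial_p(n_test, n_correct, area_fraction):
    """One-tailed probability P(X >= n_correct) for X ~ Binomial(n_test, area_fraction).

    area_fraction is assumed to be the proportion of the study area predicted
    suitable, i.e. the chance a test point falls in predicted area at random.
    Small values suggest the model predicts test presences better than chance."""
    if not 0 <= n_correct <= n_test:
        raise ValueError("require 0 <= n_correct <= n_test")
    if not 0.0 <= area_fraction <= 1.0:
        raise ValueError("area_fraction must lie in [0, 1]")
    # Sum the upper tail of the binomial probability mass function.
    return sum(
        comb(n_test, k) * area_fraction**k * (1.0 - area_fraction) ** (n_test - k)
        for k in range(n_correct, n_test + 1)
    )


if __name__ == "__main__":
    # Hypothetical example: 10 test presences, 2 fall outside the predicted area.
    preds = [1, 1, 0, 1, 1, 1, 1, 0, 1, 1]
    print(omission_rate(preds))  # → 0.2
    # 8 of 10 points predicted present while 30% of the area is predicted suitable.
    print(cumulative_binomial_p(10, 8, 0.3))  # approximately 0.0016
```

The binomial test here treats test points as independent draws, which is the usual simplifying assumption; with spatially autocorrelated test data the resulting probability would likely be too optimistic.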