Geographic distributions are a basic component of a species’ ecology, and predicting distributions is a fundamental task of conservation and resource management. Reliable prediction depends on identifying appropriate scales of effect for environmental data, so scale-optimization techniques are needed to select scales for predictor variables. Recent statistical developments have also advanced methods of model selection based explicitly on predictive ability, which differ from commonly used methods that regulate model structure via anticipated predictive performance. Such methods are beginning to permeate species distribution models (SDMs), yet there remains no consensus methodology for developing optimally predictive multi-scale SDMs when covariate data are collected over a range of scales. We therefore compared common approaches to scale optimization and model selection in terms of their ability to produce optimally predictive multi-scale Bayesian occupancy models of a species’ distribution, using models of the breeding distribution of King Rails (Rallus elegans) as a case study. Our results demonstrate sizable gains in predictive performance for hierarchical occupancy models selected explicitly for their ability to predict out-of-sample data under the logarithmic scoring rule, compared to models selected using information criteria (DIC and WAIC). Information criteria commonly selected individual covariates, and scales of effect for those covariates, that had suboptimal predictive performance. Performance of models selected using the logarithmic scoring rule was robust to the method of scale optimization, which was not true of models selected using DIC and WAIC. Thus, we empirically demonstrate the benefits of study designs that enable covariate and scale selection based explicitly on predictive ability.
Our results also imply that many ecological studies warrant more careful consideration of what constitutes an optimal scale, because the meaning of “optimal” depends on the technique used for scale selection.