We compared modeled and observed streamflow trends from 1984 to 2016 using five statistical transfer models and one deterministic, distributed-parameter, process-based model, for 26 flow metrics at 502 basins in the United States that are minimally influenced by development. We also evaluated a measure of overall model fit and average bias. For all models, a higher percentage of basins had relatively low differences between modeled and observed trends for mean and median flows than for very high or very low flows, such as the annual 1-day high and 7-day low flows. Mean-flow metrics also had the largest percentage of basins with relatively good overall model fit and low bias. The five statistical transfer models performed better at more basins than the process-based model. For all models, overall model fit for mean and/or high flows was correlated with one or more measures of basin precipitation or aridity. Our study and previous studies generally observed good model performance for high flows up to the 90th or 95th percentile. However, we found that model performance was substantially worse for more extreme flows, including 99th percentile and annual 1-day high flows; this shows the importance of including more extreme high flows in analyses of model performance.