We recently published a paper in *Nature* in which we leveraged observations of the Earth’s radiative energy budget to statistically constrain 21st-century climate model projections of global warming. We found that observations of the Earth’s energy budget allow us to infer generally greater central estimates of future global warming and smaller spreads about those central estimates than the raw model simulations indicate. More background on the paper can be found in our blog post on the research.

Last week, Nic Lewis published a critique of our work on several blogs, titled “A closer look shows global warming will not be greater than we thought”. We welcome scientifically grounded critiques of our work, since this is the fundamental way in which science advances. In this spirit, we would like to thank Nic Lewis for his appraisal. However, we find Lewis’ central criticisms to lack merit. As we elaborate below, his arguments do not undermine the findings of the study.

**Brief background**

Under the ‘emergent constraint’ paradigm, statistical relationships between model-simulated features of the current climate system (predictor variables), along with observations of those features, are used to constrain a predictand. In our work, the predictand is the magnitude of future global warming simulated by climate models.

We chose predictor variables that were as fundamental and comprehensive as possible while still offering the potential for a straightforward physical connection to the magnitude of future warming. In particular, we chose the full global spatial distribution of fundamental components of Earth’s top-of-atmosphere energy budget—its outgoing (that is, reflected) shortwave radiation (OSR), outgoing longwave radiation (OLR) and net downward energy imbalance (N). We investigated three currently observable attributes of these variables—mean climatology, the magnitude of the seasonal cycle, and the magnitude of monthly variability. We chose these attributes because previous studies have indicated that behavior of the Earth’s radiative energy budget on each of these timescales can be used to infer information on fast feedbacks in the climate system. The combination of these three attributes and the three variables (OSR, OLR and N) results in a total of nine global “predictor fields”. See FAQ #3 of our previous blog post for more information on our choice of predictor variables.

We used Partial Least Squares Regression (PLSR) to relate our predictor fields to predictands of future global warming. In PLSR we can use each of the nine predictor fields individually, or we can use all nine predictor fields simultaneously (collectively). We quantified our main results with “Prediction Ratio” and “Spread Ratio” metrics. The Prediction Ratio is the ratio of our observationally-informed central estimate of warming to the previous raw model average and the Spread Ratio is the ratio of the magnitude of our constrained spread to the magnitude of the raw model spread. Prediction Ratios greater than 1 suggest greater future warming and Spread Ratios below 1 suggest a reduction in spread about the central estimate.

**Lewis’ criticism**

Lewis’ post expresses general skepticism of climate models and the ‘emergent constraint’ paradigm. There is much to say about both of these topics but we won’t go into them here. Instead, we will focus on Lewis’ criticism that applies specifically to our study.

We showed results associated with each of our nine predictor fields individually but we chose to emphasize the results associated with the influence of all of the predictor fields simultaneously. Lewis suggests that rather than focusing on the simultaneous predictor field, we should have focused on the results associated with the single predictor field that showed the most skill: the magnitude of the seasonal cycle in OLR. Lewis goes further to suggest that it would be useful to adjust our spatial domain in an attempt to search for an even stronger statistical relationship. Thus, Lewis is arguing that we actually *undersold* the strength of the constraints that we reported, not that we *oversold* their strength.

This is an unusual criticism for this type of analysis. Typically, criticisms in this vein would run in the opposite direction. Specifically, studies are often criticized for highlighting the single statistical relationship that appears to be the strongest while ignoring or downplaying weaker relationships that could have been discussed. Studies are correctly criticized for this tactic because the more relationships that are screened, the more likely it is that a researcher will be able to find a strong statistical association by chance, even if there is no true underlying relationship. Thus, we do not agree that it would have been more appropriate for us to highlight the results associated with the predictor field with the strongest statistical relationship (smallest Spread Ratio), rather than the results associated with the simultaneous predictor field. However, even if we were to follow this suggestion, it would not change our general conclusions regarding the magnitude of future warming.
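The screening problem can be illustrated with a small simulation of our own (not from the paper or from Lewis’ post): when enough noise-only candidate predictors are screened, the single best correlation will look impressive purely by chance.

```python
# Small simulation of the screening problem: many pure-noise predictors are
# correlated against a random predictand, and only the best is reported.
import numpy as np

rng = np.random.default_rng(1)
n_models, n_candidates = 36, 500
y = rng.standard_normal(n_models)                   # random "future warming"
X = rng.standard_normal((n_candidates, n_models))   # noise-only predictors

# Correlation of each candidate with the predictand
r = np.array([np.corrcoef(x, y)[0, 1] for x in X])
best = np.max(np.abs(r))

# For n = 36, |r| of about 0.33 is the nominal 5% significance threshold, yet
# the best of 500 noise-only predictors typically reaches |r| of roughly 0.5
print(f"best |r| among {n_candidates} noise predictors: {best:.2f}")
```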

We can use our full results, summarized in the table below (all utilizing 7 PLSR components), to look at how different choices regarding the selection of predictor fields would affect our conclusions.

Lewis’ post makes much of the fact that highlighting the results associated with the ‘magnitude of the seasonal cycle in OLR’, rather than the simultaneous predictor field, would reduce our central estimate of future warming in RCP8.5 from +14% to +6%. This is true but it is only one, very specific example. Asking more general questions gives a better sense of the big picture:

1) What is the mean Prediction Ratio across the end-of-century RCP predictands, if we use the OLR seasonal cycle predictor field exclusively? It is 1.15, implying a **15% increase** in the central estimate of warming.

2) What is the mean Prediction Ratio across the end-of-century RCP predictands, if we always use the individual predictor field that had the lowest Spread Ratio for that particular RCP (boxed values)? It is 1.13, implying a **13% increase** in the central estimate of warming.

3) What is the mean Prediction Ratio across the end-of-century RCP predictands, if we just average together the results from all the individual predictor fields? It is 1.16, implying a **16% increase** in the central estimate of warming.

4) What is the mean Prediction Ratio across the end-of-century RCP predictands, if we always use the simultaneous predictor field? It is 1.15, implying a **15% increase** in the central estimate of warming.

One point that is worth making here is that we do not use cross-validation in the multi-model average case (the denominator of the Spread Ratio). Each model’s own value *is included* in the multi-model average, which gives the multi-model average an inherent advantage over the cross-validated PLSR estimate. We made this choice to be extra conservative but it means that PLSR is able to provide meaningful Prediction Ratios even when the Spread Ratio is near or slightly above 1. We have shown that when we supply the PLSR procedure with random data, Spread Ratios tend to be in the range of 1.1 to 1.3 (see FAQ #7 of our previous blog post, and Extended Data Fig. 4c of the paper). Nevertheless, it may be useful to ask the following question:
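The baseline asymmetry can be made concrete with a toy calculation (a sketch using made-up random data, not results from the paper): even a simple leave-one-out mean, judged against the full multi-model mean, produces a ratio above 1.

```python
# Toy illustration of the conservative baseline: the numerator is
# cross-validated (leave-one-out) while the denominator's multi-model mean
# includes each model's own value.
import numpy as np

rng = np.random.default_rng(2)
n = 36                                    # number of models
y = rng.standard_normal(n)                # random "warming" values

# Leave-one-out baseline: each model predicted by the mean of the other 35
loo_mean = (y.sum() - y) / (n - 1)
cv_spread = np.std(y - loo_mean)

# Non-cross-validated baseline: each model's own value is in the mean
raw_spread = np.std(y - y.mean())

# Algebraically, y - loo_mean = n/(n-1) * (y - y.mean()), so this ratio is
# exactly n/(n-1) = 36/35, about 1.03, regardless of the data
ratio = cv_spread / raw_spread
```

This toy calculation isolates only the baseline effect; the larger 1.1-to-1.3 Spread Ratios seen when feeding PLSR random data additionally reflect the noise that the cross-validated regression itself introduces.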

5) What is the mean Prediction Ratio across the end-of-century RCP predictands, if we average together the results from only those individual predictor fields with Spread Ratios below 1? It is 1.15, implying a **15% increase** in the central estimate of warming.

So, all five of these general methods produce about a **15% increase** in the central estimate of future warming.

Lewis also suggests that our results may be sensitive to choices of standardization technique. We standardized the predictors at the level of the predictor field because we wanted to retain information on across-model differences in the spatial structure of the magnitude of predictor variables. However, we can rerun the results when everything is standardized at the grid-level and ask the same questions as above.

1b) What is the mean Prediction Ratio across the end-of-century RCPs if we use the OLR seasonal cycle predictor field exclusively? It is 1.15, implying a **15% increase** in the central estimate of warming.

2b) What is the mean Prediction Ratio across the end-of-century RCPs if we always use the single predictor field that had the lowest Spread Ratio (boxed values)? It is 1.12, implying a **12% increase** in the central estimate of warming.

3b) What is the mean Prediction Ratio across the end-of-century RCPs if we just average together the results from all the predictor fields? It is 1.14, implying a **14% increase** in the central estimate of warming.

4b) What is the mean Prediction Ratio across the end-of-century RCPs if we always use the simultaneous predictor field? It is 1.14, implying a **14% increase** in the central estimate of warming.

5b) What is the mean Prediction Ratio across the end-of-century RCP predictands if we average together the results from only those individual predictor fields with Spread Ratios below 1? It is 1.14, implying a **14% increase** in the central estimate of warming.

**Conclusion**

There are several reasonable ways to summarize our results and they all imply greater future global warming in line with the values we highlighted in the paper. The only way to argue otherwise is to search out specific examples that run counter to the general results.

**Appendix: Example using synthetic data**

Despite the fact that our results are robust to various methodological choices, it is useful to expand upon why we used the simultaneous predictor instead of the particular predictor that happened to produce the lowest Spread Ratio on any given predictand. The general idea can be illustrated with an example using synthetic data in which the precise nature of the predictor-predictand relationships is defined ahead of time. For this purpose, I have created synthetic data with the same dimensions as the data discussed in our study and in Lewis’ blog post:

1) A synthetic predictand vector of 36 “future warming” values corresponding to imaginary output from 36 climate models. In this case, the “future warming” values are just 36 random numbers pulled from a Gaussian distribution.

2) A synthetic set of nine predictor fields (37 latitudes by 72 longitudes) associated with each of the 36 models. Each model’s nine synthetic predictor fields start with that model’s *predictand* value entered at every grid location. Thus, at this preliminary stage, every location in every predictor field is a perfect predictor of future warming. That is, the across-model correlation between the predictor and the “future warming” predictand is 1 and the regression slope is also 1.

The next step in creating the synthetic predictor fields is to add noise in order to obscure the predictor-predictand relationship somewhat. The first level of noise that is added is a spatially correlated field of weighting factors for each of the nine predictor maps. These weighting-factor maps randomly enhance or damp the local magnitude of the map’s values (weighting factors can be positive or negative). After these weighting factors have been applied, every location for every predictor field still has a perfect across-model correlation (or perfect negative correlation) between the predictor and predictand but the regression slopes vary across space according to the magnitude of the weighting factors. The second level of noise consists of spatially correlated fields of random numbers, one specific to each of the 9×36 = 324 predictor maps. At this point, everything is standardized to unit variance.
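The recipe above can be sketched as follows. This is an illustrative reconstruction, not the actual code used: Gaussian smoothing is an assumed way of producing spatially correlated fields, and the names and noise amplitudes are chosen for illustration.

```python
# Sketch of the synthetic-data construction described above (assumed details:
# Gaussian smoothing for spatial correlation, unit-amplitude noise).
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(3)
n_models, n_fields, nlat, nlon = 36, 9, 37, 72

y = rng.standard_normal(n_models)             # "future warming" predictand

def smooth_noise(sigma=5.0):
    """A spatially correlated random field via Gaussian smoothing."""
    return gaussian_filter(rng.standard_normal((nlat, nlon)), sigma)

X = np.empty((n_models, n_fields, nlat, nlon))
for f in range(n_fields):
    w = smooth_noise()                        # one weighting-factor map per field
    for m in range(n_models):
        # Start from a perfect predictor (the predictand everywhere), scale it
        # by the weighting map, then add map-specific correlated noise
        X[m, f] = y[m] * w + smooth_noise()   # 9 x 36 = 324 noise maps in total

# Standardize to unit variance across models at every grid point
X = (X - X.mean(axis=0)) / X.std(axis=0)
```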

The synthetic data’s predictor-predictand relationship can be summarized in the plot below, which shows the local across-model correlation coefficient (between predictor and predictand) for each of the nine predictor fields. These plots are similar to what you would see with the real model data used in our study. Specifically, in both cases, there are swaths of relatively high correlations and anti-correlations with plenty of low-correlation area in between. All these predictor fields were produced the same way and the only differences arise from the two layers of random noise that were added. Thus, we know that any apparent differences between the predictor fields arose by random chance.

Next, we can feed this synthetic data into the same PLSR procedure that we used in our study to see what it produces. The Spread Ratios are shown in the bar graphs below. Spread Ratios are shown for each of the nine predictor fields individually as well as for the case where all nine predictor fields are used simultaneously. The top plot shows results without the use of cross-validation while the bottom plot shows results with the use of cross-validation.

In the case without cross-validation, there is no guard against over-fitting. Thus, PLSR is able to utilize the many degrees of freedom in the predictor fields to create coefficients that fit predictors to the predictand exceptionally well. This is why the Spread Ratios are so small in the top bar plot. The mean Spread Ratio for the nine predictor fields in the top bar plot is 0.042, implying that the PLSR procedure was able to reduce the spread of the predictand by about 96%. Notably, using all the predictor fields simultaneously results in a three-orders-of-magnitude smaller Spread Ratio than using any of the predictor fields individually. This indicates that when there is no guard against over-fitting, *much* stronger relationships can be achieved by providing the PLSR procedure with more information.

However, PLSR is more than capable of over-fitting predictors to predictands and thus these small Spread Ratios are not to be taken seriously. In our work, we guard against over-fitting by using cross-validation (see FAQ #1 of our blog post). The Spread Ratios for the synthetic data using cross-validation are shown in the lower bar graph in the figure above. It is apparent that cross-validation makes a big difference. With cross-validation, the mean Spread Ratio across the nine individual predictor fields is 0.8, meaning that the average predictor field could help reduce the spread in the predictand by about 20%. Notably, a lower Spread Ratio of 0.54 is achieved when all nine predictor maps are used collectively (a 46% reduction in spread). Since there is much redundancy across the nine predictor fields, the simultaneous predictor field doesn’t drastically increase skill, but it is still better than the average of the individual predictor fields (this is a very consistent result when the entire exercise is re-run many times).

Importantly, we can even see that one particular predictor field (predictor field 2) achieved a lower Spread Ratio than the simultaneous predictor field. This brings us to the central question: Is predictor field 2 particularly special or inherently more useful as a predictor than the simultaneous predictor field? We created these nine synthetic predictor fields specifically so that they all contained roughly the same amount of information, and any differences that arose came about simply by random chance. There is an element of luck at play because the number of models (36) is small. Thus, cross-validation can produce appreciable Spread Ratio variability from predictor to predictor simply by chance. Combining the predictors reduces the Spread Ratio, but only marginally due to large redundancies in the predictors.

We apply this same logic to the results from our paper. As we stated above, the simultaneous predictor field for the RCP 8.5 scenario has a Spread Ratio of 0.67. Similar to the synthetic data case, eight of the nine individual predictor fields yielded Spread Ratios above this value but a single predictor field (the OLR seasonal cycle) yielded a smaller Spread Ratio. Lewis’ post argues that we should focus entirely on the OLR seasonal cycle because of this. However, just as in the synthetic data case, our interpretation is that the OLR seasonal cycle predictor may have just gotten lucky and we should not take its superior skill too seriously.

I’ve posted the notice on the WUWT thread about your response. Thanks for responding without the usual hurling of invective we get from people like Dr. Mann. I’ve also alerted Nic Lewis. There will likely be a follow-up, which I’ll notify you about when/if it appears.


What rate of warming do you predict for the next two or three decades? That is all that matters. If your prediction turns out to be right, you will have established the credibility of the models. If not, the models will not be credible. Models, all the way down, is not a credible argument for validation.

Nice to see your post.

Not happy.

” our interpretation is that the OLR seasonal cycle predictor may have just gotten lucky and we should not take its superior skill too seriously.”

You find a gold nugget and dismiss it.

Science often is serendipity and running with it.

Others with this result would say see what we have found, a great indicator, and ask why.

Oh well.

The approach taken bears a bit of the data mining problems of meta searches.

You start out with a goal in mind and find a correlation.

Yes the correlation is there but the causality is not.

Sorry, that is a bit rough; maybe you got lucky, in which case I would apologise.

The sad fact is there are periods of cold in there which do not fit with any of the forecasts , but you would have to want to look for them to find them.

Dr. Brown, thank you for responding to Nic Lewis’s criticism, and I think regardless of the outcome your educational intentions are not in question. As you know, the validity of the CMIP5 as a scientific tool is a matter of active debate. Among the top suspicions are that clouds are wrongly a positive rather than a negative feedback to warming, and that aerosol forcings in the RCPs are overestimated to falsely cool the past and present in order to reconcile a higher ECS with observations. I understand that your paper does not deal with these, nor does your response. However, it may be impossible to make valid conclusions (even statistically) when built on false assumptions.

I want to point out that your characterization of Nic’s criticism is inaccurate. He did not suggest that using the sole predictor OLR Seasonal Cycle would provide a better or more accurate result. His claim was that having individual predictors outperform a group is a signal that the application of PLS in this instance is leading to inaccurate weighting of the predictors, which is an indication that the analysis is not achieving its intention of detecting a signal that could uplift model estimates of ECS. If PLS is producing inaccurate weighting of the predictors, I would propose that the likely explanation is that the models most highly tuned to the 2000-2015 observation interval are producing a higher ECS. This makes certain predictors artificially more accurate but tells nothing of true model skill in contrast to the null hypothesis of gradual warming (ECS 1-1.5).

A typical indication of skill in a chart is where a bump in model prediction coincides with a bump in observation. In your last post you presented a chart plotting the models against historic temperature, where the models had bumps coinciding with volcanic eruptions much larger than observation. Perhaps with the next major volcanic eruption your PLS analysis can be usefully applied to invalidate and discard some models (if that would be allowed).

Finally, you claimed your team constrained ECS to 3.7. How?


Nic Lewis has provided an extensive reply to this post here.

Thank you for the dialogue, and we look forward to your well-thought-out response.


Patrick: For a model to correctly predict climate sensitivity, it needs to correctly predict dOLR/dTs and dOSR/dTs from both clear and cloudy skies. Tsushima and Manabe (2013) clearly show us how poorly and mutually inconsistently models reproduce dOLR/dTs and dOSR/dTs during the seasonal cycle, when dTs is a massive 3.5 K and the signal-to-noise ratio is high. Monthly dOLR/dTs is exceptionally linear during seasonal warming, but dOSR/dTs from both clear and cloudy skies is not.

Do your other predictors have much to do with dOLR/dTs and dOSR/dTs from both clear and cloudy skies? I’m not sure that they do. Climatology is static and doesn’t reflect a change in Ts. Regional changes in Ts (and therefore OLR) reflect a model’s ability to simulate meridional transport of heat (and that could be tuned). Monthly variability could reflect the ability to reproduce ENSO, the biggest source of unforced variability.

If overturning of the atmosphere remained constant with global warming, latent heat would carry 7%/K more heat from the surface into the atmosphere. This is 5.6 W/m2/K, a value that is grossly incompatible with an ECS of 3 K/doubling or 1.2 W/m2/K. To get the ECS right, climate models must get the slowdown in overturning right, and therefore the change in precipitation with warming (typically 2%/K) right. So I think precipitation could be a very valuable predictor. Marine boundary layer clouds (which don’t produce precipitation) are independently important to climate sensitivity.
