## Energiewende

Germany’s energy revolution and its effect upon the world, in The New York Times.

Details here.

## “Expectations for a new climate agreement” (Jacoby, Chen, MIT, 2014)

Professors Jacoby and Chen have issued a report, part of the MIT Joint Program on the Science and Policy of Global Change, which handicaps the outcomes of the negotiations that, by international agreement, need to take place in 2015 to put the world on a course to contain global warming. They are not pessimistic, emphasizing that progress will probably be made, but they are realistic about what will be agreed and, most importantly, about how far short of what is necessary governments are likely to agree and then implement. Kyoto targets, for example, were not met.

(Click image for larger picture.)
They also introduce (new to me, at least) MAGICC, a simplified climate model, not as accurate as CMIP5 models, but more transparent in its projections.

(Click image for larger picture.)
MAGICC is more complicated than the pen-and-paper-and-simple-Python models which Professor Pierrehumbert offers in conjunction with his textbook and course, which are necessarily very transparent. But, in MAGICC’s developers’ words:

MAGICC consists of a suite of coupled gas-cycle, climate and ice-melt models integrated into a single software package. The software allows the user to determine changes in greenhouse-gas concentrations, global-mean surface air temperature, and sea level resulting from anthropogenic emissions.

From my perspective, the problem is that countries and people are talking about “progress” and “lowering emissions” when they actually need to be talking about a program and a path toward eliminating emissions. No doubt that will take a long time. Because it will, delay makes no sense. I suspect government leaders are waiting for major events tied to climate to give them political cover to do something severe. But, as the graphs show and as is well known, unless they also want to buy into capture and removal of carbon dioxide from the free atmosphere, those effects will keep on coming. Professor John Englander estimates we have already bought into 65 feet of sea level rise, even if that will take centuries to fully manifest.

See the related “A Zero Emissions Manifesto for the Climate Justice Movement”, by Tom Weis and Reverend Lennox Yearwood. I agree: Two degrees Celsius is the wrong target.

## “People are too insignificant to affect climate”

Setting aside outright fabrications (1), such as those promulgated by Representative Lamar Smith (R-Texas), laughably selected as Chair of the House Committee on Science, a common claim in the comment sections at The Hill and elsewhere is that claims of human interference with climate are evidence of human hubris, and that promulgating policies of mitigation is arrogance, because humans are just too insignificant to affect the climate, and, no doubt, in every true, religious person’s mind, Christian, Jew, or Muslim (2), too insignificant to affect the glorious plans of The Creator for human life on Earth.

Skeptical Science devotes a major article to this question.

My take is a tad different.

There was once a bird called a Passenger Pigeon.

Generally accepted estimates place their numbers at the time of European contact and colonization of North America at five to six billion individuals. They were easily the most numerous single type of bird in North America. It is entirely plausible, based upon contemporary estimates, that they may have been the most numerous single species in the world.

Today, they are extinct. Why?

These birds were so plentiful that an industry arose to mechanically and systematically slaughter them to provide food for the poor and for slaves. Still, the rates of killing must have been staggering. Assuming 6 billion individuals annihilated over 100 years requires killing about 6,800 birds per hour. Sure, it could have taken 200 years: about 3,400 birds per hour then. It could have been 5 billion birds over 200 years: roughly 2,850 birds per hour then. Big deal. What probably happened, as happens with many biological populations, is that a great number were killed in a short time, and then the natural sustaining mechanisms of the population, such as finding mates and nesting, collapsed, since their ecosystems were destroyed, and natural forces did the rest. The end result, extinction, was the same nevertheless.
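The back-of-the-envelope rates above can be checked in a few lines of Python. The population figures and time spans are the ones quoted in the text; the function name is mine:

```python
def kills_per_hour(population, years):
    """Average number of birds killed per hour to annihilate
    `population` birds over `years` years."""
    return population / (years * 365 * 24)

rate_a = kills_per_hour(6e9, 100)  # roughly 6,850 birds per hour
rate_b = kills_per_hour(6e9, 200)  # roughly 3,425 birds per hour
rate_c = kills_per_hour(5e9, 200)  # roughly 2,850 birds per hour
```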

Oh, so right: don’t you feel so much better that the last stroke was made by “natural forces” rather than people? Of course you don’t: anything to get off the hook.

Six billion to zero. That’s a pretty powerful human influence.

(1) The cited IPCC Report actually says (footnotes removed for clarity, since they can be found in the original report and are superfluous here):

Economic losses from weather- and climate-related disasters have increased, but with large spatial and interannual variability (high confidence, based on high agreement, medium evidence). Global weather- and climate-related disaster losses reported over the last few decades reflect mainly monetized direct damages to assets, and are unequally distributed. Estimates of annual losses have ranged since 1980 from a few US$ billion to above US$ 200 billion (in 2010 dollars), with the highest value for 2005 (the year of Hurricane Katrina). Loss estimates are lower-bound estimates because many impacts, such as loss of human lives, cultural heritage, and ecosystem services, are difficult to value and monetize, and thus they are poorly reflected in estimates of losses. Impacts on the informal or undocumented economy as well as indirect economic effects can be very important in some areas and sectors, but are generally not counted in reported estimates of losses. [4.5.1, 4.5.3, 4.5.4]

What Smith took out of context and reported in isolation is the highlighted section of the following quote:

Economic, including insured, disaster losses associated with weather, climate, and geophysical events are higher in developed countries. Fatality rates and economic losses expressed as a proportion of gross domestic product (GDP) are higher in developing countries (high confidence). During the period from 1970 to 2008, over 95% of deaths from natural disasters occurred in developing countries. Middle-income countries with rapidly expanding asset bases have borne the largest burden. During the period from 2001 to 2006, losses amounted to about 1% of GDP for middle-income countries, while this ratio has been about 0.3% of GDP for low-income countries and less than 0.1% of GDP for high-income countries, based on limited evidence.
In small exposed countries, particularly small island developing states, losses expressed as a percentage of GDP have been particularly high, exceeding 1% in many cases and 8% in the most extreme cases, averaged over both disaster and non-disaster years for the period from 1970 to 2010. [4.5.2, 4.5.4]

Increasing exposure of people and economic assets has been the major cause of long-term increases in economic losses from weather- and climate-related disasters (high confidence). Long-term trends in economic disaster losses adjusted for wealth and population increases have not been attributed to climate change, but a role for climate change has not been excluded (high agreement, medium evidence). These conclusions are subject to a number of limitations in studies to date. Vulnerability is a key factor in disaster losses, yet it is not well accounted for. Other limitations are: (i) data availability, as most data are available for standard economic sectors in developed countries; and (ii) type of hazards studied, as most studies focus on cyclones, where confidence in observed trends and attribution of changes to human influence is low. The second conclusion is subject to additional limitations: (iii) the processes used to adjust loss data over time, and (iv) record length. [4.5.3]

Note Smith conveniently skipped quoting “Economic, including insured, disaster losses associated with weather, climate, and geophysical events are higher in developed countries”, something which bears upon United States policy.

(2) This narrative, that humans are chosen by a Creator or Higher Power from among all creatures for a special role in creation, is one of the most damaging characteristics of developed and organized religion. No religious tradition, in my opinion, including Unitarian Universalism, can have a proper environmental sentiment until it condemns and distances itself from that idea.
The official UU take on the “interdependent web of all existence” in fact downplays its environmental connection, calling it “a profound mistake” if it is exclusively applied to the environment. Maybe so, but to expand it beyond the environment minimizes the deep connection we have to the natural world, the deep obligation we have as caretakers, and the deep collective burden and guilt we should bear for initiating the latest planetary extinction event, not only through climate change, but in our treatment of innocent co-partners on the planet, like the Passenger Pigeon. Humanism must extend beyond humans.

## The dp-means algorithm of Kulis and Jordan in R and Python

The dp-means algorithm: think k-means, but with the number of clusters calculated.

By John Myles White, in R. (Github link off that page.)

By Scott Hendrickson, in Python. (Github link off that page.)

## Speeding up your code

Some of the actual, as opposed to imagined, needs for concurrent computing. Originally posted on Dr Climate:

In today’s modern world of big data and high resolution numerical models, it’s pretty easy to write a data analysis script that would take days/weeks (or even longer) to run on your personal (or departmental) computer. With buzz words like high performance computing, cloud computing, vectorisation, supercomputing and parallel programming floating around, what’s not so easy is figuring out the best course of action for speeding up that code. This post is my attempt to make sense of all the options…

Step 1: Vectorisation. The first thing to do with any slow script is to use a profiling tool to locate exactly which part/s of the code are taking so long to run. All programming languages have profilers, and they’re normally pretty simple to use.
If your code is written in a high-level language like Python, R or Matlab, then the bottleneck is most likely a loop of some sort…

## Blind Bayesian recovery of components of residential solid waste tonnage from totals data

This is a sketch of how maths and statistics can do something called blind source separation, meaning to estimate the components of data given only their totals. Here, I use Bayesian techniques for the purpose, sometimes called Bayesian inversion, using standard tools like the JAGS Gibbs sampler and the programming language R. I’ll present the problem and the solution, and then talk about why it works.

Getting residents of a town to reduce their solid waste production can be an important money saver for towns, many of which are paying $80 per ton or more for solid waste disposal and hauling. Much effort is spent convincing them to recycle materials that are eligible, and to do other things, like household composting, which reduces weight a lot, due primarily to the latent water in household garbage scraps.

But outreach programs only work if residents can be successfully contacted, and this can be difficult in suburban settings. Moreover, blanket advertising and promotions can themselves be costly and may be ineffective. As in other marketing, it is better to identify target populations and focus messaging on them. In this case, identifying the households which produce the most tonnage is attractive for a town, since outreach can be limited to them. Unfortunately, tonnage by household is generally not available to towns. Even tonnage by street is typically not reported by haulers. What is reported is total tonnage by route, or something comparable.

While this sketch is simplified for ease of understanding, and an actual analysis of trash routes would require more effort and detail, it shows how such separation can be done. I also don’t have actual control tonnage for a set of houses. (I do have route tallies.) Thus, the technique is illustrated with simulated synthetic data, where the contribution of individual houses is known by design. The objective is to see if the Bayesian inversion can recover the components.

For the purpose, I posit that there are two streets in a neighborhood, one having 5 homes, the other having 7 homes. The hauler reports weekly solid waste tonnage for the combination of both streets, and there is a year’s worth of data. Generally speaking, larger homes produce more waste, where “larger” can mean more people, bigger houses, or higher income per home member. Also, generally speaking, the variability in tonnage each week is higher the more trash is produced. The same kind of model can be applied to recycling tonnage, but here I’m limiting the study to trash. In particular, imagine that the two streets are arranged as in the following figure. The numbers indicate the amount of solid waste produced by the nearby home per week, in units of tens of kilograms.

(Click on figure to see larger image.)
Mean values per week for the homes were chosen by modeling each street with a Poisson distribution, picking a different mean for each street. Specifically,
```r
SevenHomeMeans <- rpois(n=7, lambda=1.6*10)/10
FiveHomeMeans  <- rpois(n=5, lambda=2.3*10)/10
```

I model variability by positing that the coefficient of variation for each home is constant, so the standard deviation of each home’s tonnage is proportional to its mean. In particular, I’ve picked a somewhat arbitrary 1.2 for the coefficient of variation, intending to capture the variation across major holidays. The model assumes no correlation week to week per home, either in the simulation or in the recovery. There is a town-wide correlation easily seen across the November-December holidays, but how strong this is at a household level is not at all clear. The technique should work nevertheless, but these synthetic data do not demonstrate that.
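A constant coefficient of variation simply means the standard deviation scales with the mean. A quick sketch in Python (with an illustrative mean of my choosing; the R simulation below additionally takes absolute values to keep tonnages nonnegative, which this sketch omits) confirms that drawing with sd = 1.2 × mean recovers an empirical coefficient of variation near 1.2:

```python
import random
import statistics

cv = 1.2      # posited coefficient of variation
mean = 2.0    # hypothetical per-home weekly mean, in tens of kg

rng = random.Random(7)
# Draw many weeks with sd proportional to the mean, as in the model.
draws = [rng.gauss(mean, cv * mean) for _ in range(100000)]
est_cv = statistics.stdev(draws) / mean  # should be close to 1.2
```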

Then 52 observations are drawn for each of the houses. The realized means were:
```r
SevenHomeMeans <- c(1.7, 1.7, 1.4, 1.0, 2.7, 1.8, 1.5)
FiveHomeMeans  <- c(2.2, 3.4, 2.7, 2.5, 1.6)
```

```r
# Assume a constant coefficient of variation for all:
sdSevenHomes <- 1.2*SevenHomeMeans
sdFiveHomes  <- 1.2*FiveHomeMeans
# Draw N == 52 observations from each.
N <- 52
M <- 2
```

```r
ySevenHomes <- t(sapply(X=1:7, FUN=function(k) abs(rnorm(n=N, mean=SevenHomeMeans[k], sd=sdSevenHomes[k]))))
yFiveHomes  <- t(sapply(X=1:5, FUN=function(k) abs(rnorm(n=N, mean=FiveHomeMeans[k], sd=sdFiveHomes[k]))))
```

Then the results are added to get each street’s tonnage for the year, and the tonnage for the streets added to get the composite, confounded tonnage, in y.
```r
y7 <- colSums(ySevenHomes)
y5 <- colSums(yFiveHomes)
y  <- y5 + y7
```
The resulting dataset looks like:

(Click on figure to see larger image.)
If the data contributing to these totals from the individual streets are displayed on the same scatterplot, it looks like:

(Click on figure to see larger image.)
Of course, the Bayesian inversion does not know these … It’s trying to figure these out. And, in fact, it does that pretty well:

(Click on figure to see larger image.)
The maroon trace is the algorithm’s estimate for the street with the higher tonnage, the 5-home street. The aqua, or sky blue, trace is the estimate of the lower street’s tonnage. While these do not pass through all the corresponding dots, especially at their extremes, the estimates follow the trends pretty well. The estimates also are sometimes early when there’s an uptick or downtick, and sometimes late, by one or two weeks at most.

The point is to find out which street should receive more educational materials and tutoring regarding solid waste reduction. Detailed predictions by week aren’t really needed for that. All that’s really needed are estimates of the overall means for each street separately. The algorithm finds that, but the week by week breakouts are a bonus.

(Click on figure to see larger image.)
Here the pinkish red dashed line corresponds to the estimate of the mean of the higher tonnage street, and the navy blue dashed line to the estimate of the mean of the lower tonnage street.

Lastly, it’s good to see that the sum of the estimates of the individual streets for each week matches the data very closely.

(Click on figure to see larger image.)
Now, a bit of fun. We can watch the Gibbs sampler in action. First, upon startup, the sampler is initialized and that initialization is (typically) far from any solution. The movie shows how the algorithm explores and eventually finds the area of plausible solutions for breaking tonnage into components.

Once that region is found, the algorithm spends a lot of time trying various combinations that add up to the totals of the data, in order to estimate the best overall mix.

The code and data for these calculations are available in a gzip’d tarball. It includes the code for making the movies, but these require software not included, such as ImageMagick tools. Also, the code is not portable, offered as is from the version generating this example on my 64-bit Windows 7 system.

The technical details of the Bayesian hierarchical model are given below.

The value of the combined tonnage at each point is the sum of two components, both modeled as Gaussians having distinct means and standard deviations. Both standard deviations have a common hyperprior as their model, a Gamma distribution with shape 3 and rate 0.005. The means, however, are products of a $\lambda_{j}$, a random variable specific to component $j$, and another random variable, called $\text{rangeOfMasses}$ in the code, which represents how much total tonnage is produced at the time point. $\text{rangeOfMasses}$ is modeled as a uniform random variable having a value between 10 and 500 kilograms. The $\lambda_{j}$ random variables are taken from a partial stick-breaking process. That means $\lambda_{1}$ is drawn from a uniform on the unit interval, and $\lambda_{2}$ is drawn uniformly on the interval from zero to $1 - \lambda_{1}$. The remainder of the unit interval is ignored. This mechanism keeps $\lambda_{1}$ and $\lambda_{2}$ separated and, thereby, the means of the two components, identified in the code as $\mu_{1}$ and $\mu_{2}$. Finally,
the total at each time point is the sum of the components and the zero-crossings trick is used to constrain it close to the data $y_{i}$, by saying
```
data {
  for (i in 1:N) {
    z[i] <- 0
  }
}
model {
  . . .
  for (i in 1:N) {
    for (j in 1:M) {
      yc[i,j] ~ dnorm(mu[j], tau[j])
    }
    tally[i] <- sum(yc[i,1:M])
    z[i] ~ dnorm(y[i] - tally[i], 30)
  }
  . . .
}
```
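To see why observing a synthetic zero through `z[i] ~ dnorm(y[i] - tally[i], 30)` acts as a soft constraint, consider the log-likelihood term it contributes. This Python sketch (function name and the illustrative numbers are mine; the precision of 30 is from the JAGS model) shows that the term rewards component sums near the observed total:

```python
import math

def zero_crossing_loglik(y_i, tally_i, precision=30.0):
    """Log-density of observing z = 0 when z ~ Normal(y_i - tally_i, precision).
    Maximized when tally_i == y_i, so the sampler favors combinations
    of components whose sum reproduces the observed total."""
    var = 1.0 / precision
    resid = y_i - tally_i
    return -0.5 * math.log(2.0 * math.pi * var) - 0.5 * precision * resid ** 2

close = zero_crossing_loglik(10.0, 9.9)   # components nearly reproduce the total
far   = zero_crossing_loglik(10.0, 7.0)   # components miss the total badly
```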

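The partial stick-breaking draw described above is simple to sketch in Python (function name mine):

```python
import random

def partial_stick_breaking(rng):
    """Draw lambda_1 uniformly on (0, 1), then lambda_2 uniformly on
    (0, 1 - lambda_1); the remainder of the unit 'stick' is discarded."""
    lam1 = rng.random()
    lam2 = rng.uniform(0.0, 1.0 - lam1)
    return lam1, lam2

rng = random.Random(11)
samples = [partial_stick_breaking(rng) for _ in range(10000)]
```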
So, why does this work?

The statistics of the waste production of the homes on the 5-house street are markedly different from those of the 7-house street. Because of that, there is a clear way to cluster and separate the two streets. Were the statistics to overlap more, in both means and in variability, this separation would be more difficult. Note that if the means were the same, but the standard deviations of waste tonnage were markedly different between the two streets while consistent within each street, that alone would suffice to separate them.

There are ways of doing the separation using frequentist methods, notably using the SVMs of machine learning, but I think the Bayesian separation is more straightforward and cleaner.