This week, TCS+ hosted a talk by Greg Valiant via a Google+ hangout. Valiant gave a talk on his work with brother, Paul, on an efficient estimator for entropy and support size of an unknown probability distribution requiring only O(n/log n) samples, where n is a bound on the support size of the distribution. This work diverges from the existing literature by demonstrating that the estimate can be obtained with a concrete linear program; an algorithm which outputs a distribution very similar to the unknown distribution with respect to certain statistical properties.
A second problem tackled algorithmically by the Valiants is the lower bound on sample size. Namely, they show that no algorithm on o(n/log n) samples can estimate these properties up to a small additive constant. The lower bound relies on two multivariate central limit theorems; one with respect to the earthmover distance and which lies foundation for the second, under L1 distance. (Earthmover distance is also known as the Wasserstein distance. This work on estimating unknown distributions is my first encounter with earthmover distance, and I must say I hope this metric – not only for the nomenclature – catches on!) The earthmover distance CLT is proved with Stein’s method. Using Stein’s method for central limit theorems has gained recent popularity as a powerful tool for showing convergence to a particular distribution. As Valiant himself explains, it is a way to “deal with strange settings gracefully.” Stein’s method is noteworthy in itself, and I would like to devote a few words on its behalf.
The Stein continuity theorem states that a sequence of real random variables with uniformly bounded second moment, Xn, converges to the standard normal distribution N(0,1) exactly when E[f'(Xn) – Xnf(Xn)] converges to 0. Here, f : R → R is continuously differentiable and both f, f’ are bounded. In its bare form, this theorem can be used to give Berry-Esséen-type bounds, and in fact, as noted in the wonderful blog post of Terry Tao on the central limit theorem, it can be used to prove the full Berry-Esséen theorem. In his 1991 article, Götze uses Stein’s method to derive such a multivariate CLT bound for the class of Borel convex subsets of Rk. His basic strategy for estimating the “distance” in expectation of two probability measures (nicely outlined in a technical report by Rabi Bhattacharya and Susan Holmes) is:
- find an invertible map L which maps functions in the space to the null space of the mean of one of the distributions
- find a perturbation of L
- estimate the distance of distributions by the expected change of L to its perturbation
This algorithmic approach is immediately appealing to me as a computer scientist, and introduces some new ways to think about central limit theorems. Valiant, for example, uses an inspired process to show that a distribution is (nearly) Gaussian. The basic idea behind this approach is to transform unusual or unfamiliar random variables into the multivariate Gaussian by a process of adding noise and rescaling a well-behaved set of test functions. At a high level, Valiant presents this procedure for showing a distribution is close to Gaussian (or any other distribution):
- find a transformation for which the Gaussian is a fixed point
- apply this transformation to your test functions
- look at the change incurred: if no change, the distribution is close to Gaussian/if big change, the distribution is far from Gaussian
We avoid working with the strange distribution by simulating a transformation on carefully chosen test functions.
Valiant’s application of the approach illustrates the versatility of Stein’s method, and its beauty in an algorithmic form. In my opinion, this is a tool that will be of great use to the computer science community because of its elegance and simplicity. And the fact that it prevents fussing around with messy random variables cannot be ignored.
As a closing note, I wanted to comment on the format of the talk. Promptly at 10am PST on Wednesday morning, I and a few other students and faculty met in a conference room where a laptop was hooked up to the projector and lights were dimmed to watch Greg Valiant and a number of other faculty, groups, and perhaps just enthusiasts appropriately outfitted with headsets and coffee mugs, projected onto tiny cells on the screen before us. These cells surrounded the slides accompanying the talk so that the observer had an up-close view of the speaker and attendees. Those who were added to the hangout could interact with verbal questions, and those who missed out on G+ seats were able to contribute with comments. While the scenario was initially jarring, as the talk progressed it became evident that an air of casualness was cast. Something about being seated at your own desk, behind the security of a webcam and a screen, really encourages relaxation and interaction at a comfortable pace. The talk, now posted on YouTube, for instance, spanned a full hour and a half, while the G+ event was scheduled to last only an hour. The talk in no way felt dragged out or verbose, merely paced and relaxed. Further, the recorded stream is now immortalized (for the time being) on YouTube, so the viewer has the freedom to revisit any part of the talk later on. I think this is a trend that will gain some momentum in the future. Following in the footsteps of online open courseware, online venues such as these make important talks available to the general public, and yet maintains the caliber (sanctity) of the talk. I look forward to tuning into some TCS+ streams in the coming weeks!
I’m curious: from your post it’s not completely clear if your group joined from inside the hangout, or if you were watching the live stream from “outside”? If you were outside, did you feel you were “missing out” in any way? (Had you asked for a spot?) Was the video quality good enough? One of the main advantages of joining the hangout is that the quality tends to be much higher inside. The limit of 10 groups is unfortunate, and we’re trying to get it increased to 15, which Google will allow in some cases.
In any case, I’m glad that you enjoyed the experience and hope you can join us in the future! I’ve been both inside and outside the hangout, and when watching from outside, a part of a group, I had the same feeling as you describe: it’s very nice to be able to watch in a relaxed manner, chat a bit with the other watchers, pull up the speaker’s slides on another laptop, have a coffee, etc. It’s a different experience, but quite pleasant.
One thing we’re trying to encourage is interaction and participation. If you have any comments as to how to further that, they’d be welcome. Somehow “outside” watchers tend to be pretty silent; maybe the option of asking questions in the comments below the live stream is not good enough?
We were not in the hangout (though we tried!), and so watched from the “outside.” I think it would have been easier to interact from within the hangout, especially in a group setting. For example, if someone in the room has a comment or question there has to be some sort of moderation involved in getting the question from the participant to a typed comment. The laptop either has to be shuffled around, or the question dictated, etc. etc. In this case it is a lot easier to chime in verbally with a microphone and so actually being in the hangout would be a huge benefit. Despite this, at least for this talk, most of the audience interaction came from parties of one, so to speak, and not groups.
I think if I were to watch alone in the future it would be a simple matter to interact with comments. I see this as analogous to news casts which take live comments via Twitter or the like. Maybe it’s just a matter of getting the ball rolling.
I didn’t know where to find this info then kooabm it was here.
Hi! The end result looks great! I just tried to foollw this tutorial but it didn’t seem to work for me. I think GIMP has updated their menus since this was created so it was a little hard to foollw. For example the “copy visible “ is in the edit tab not the file tab. It seemed to stop working for me during the layering bit. Any chance you can update the steps.Thank you!Angela