Here are my comments on what I most enjoyed reading this week. And yes, a lot of this post is on educational attainment genomics paper number three (nickname EA3). That's point #1 below. But plenty of other good stuff too!
1. Genomic prediction of educational attainment. A new paper analyzes the DNA of 1.1M people of European descent and finds 1271 genetic variants associated with educational attainment. From these variants the authors built a polygenic score, which combines an individual's DNA variants into a single predictive number. The paper’s polygenic score explains 11% of the population-wide variation in years of schooling. I don’t have the chops to dissect the paper, but I read quite a few commentaries. So here are my bullet points:
- The predictor explains 11% of variation. 11% seems low. True! But as a comparison, that low 11% number is similar to the college entrance SAT test. In other words, the polygenic score predicts whether you finish college about as well as your SAT score does.
- Household income explains 7% of the variation in educational attainment, which is worse than the polygenic score.
- Parental educational level is a slightly better predictor than the polygenic score.
- Since the predictor was built from those of European descent, it does not work when tried against different ancestries (in particular African-Americans) as they may have different genetic variants.
- People scoring in top polygenic quintile are 5x more likely to complete college than those in lowest quintile
- quote: But for any given score, there are huge variations in years of schooling. “Should we use the score to put some people into more advanced classes and others into more remedial classes?” says Benjamin. “That’s a total nonstarter because of the low predictive power for any given individual.”
- Going to larger sample sizes beyond 1 million people will enable building better predictors. But won’t gain a lot more in terms of predictive power. Why? Because the max may be around 15% of variation explained (I took that max number from this podcast). Again why? Because that’s the max predictive value you get when using test scores and grade transcripts. That is to say, directly measuring how well people did in high school and taking the SAT test is better than looking at genes. This is more obvious if you think about height. What’s a better predictor of height? Your genes or simply measuring how tall you are? Life has a random edge. And within limits, free will. Genes have statistical predictive power, but are not fate.
- Many of the genetic variants involve neurons and synapses. Probably best understood by noting that if this were not the case, it would call the entire paper into question.
- Useful side point. The number of humans alive was on the order of 250k during most of Homo sapiens' existence as hunter gatherers. That number is too small for the methods in this paper to work. For the same reason, it’s not possible to do this analysis on other primates. For example, the total chimpanzee population is estimated at 170-300k. Just not enough of them. But nowadays we’ve got lots of humans. This gives some perspective on just how small the effect size is for any individual gene. Tiny tiny tiny. You need huge sample sizes to get the statistical power necessary to discover what’s going on.
- The idea of many genes of very small effect for complex traits is an expectation set in Fisher’s 1930 book. This is not a new idea.
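To make two of the bullets above concrete, here is a minimal Python sketch. It assumes the textbook additive form of a polygenic score (a weighted sum of allele counts) and a rough normal-approximation power calculation at the conventional genome-wide significance threshold of 5e-8. The function names, the weights, and the genotype are made up for illustration and are not taken from the paper.

```python
from statistics import NormalDist


def polygenic_score(allele_counts, weights):
    """Weighted sum of allele counts (0, 1, or 2 copies per variant).

    Assumption: a plain additive model; real pipelines add steps
    (quality control, ancestry adjustment) that are not shown here.
    """
    if len(allele_counts) != len(weights):
        raise ValueError("need exactly one weight per variant")
    return sum(count * weight for count, weight in zip(allele_counts, weights))


def approx_sample_size(variance_explained, alpha=5e-8, power=0.8):
    """Rough N needed to detect one variant explaining this share of variance.

    Assumption: normal approximation, N ~ (z_alpha + z_power)^2 / r2.
    """
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # two-sided significance threshold
    z_power = z.inv_cdf(power)
    return (z_alpha + z_power) ** 2 / variance_explained


# Hypothetical weights and genotype, for illustration only.
weights = [0.02, -0.01, 0.03]
person = [2, 1, 0]
print("score:", polygenic_score(person, weights))

# The smaller the share of variance a variant explains, the more people you need.
for r2 in (1e-3, 1e-4, 1e-5):
    print(f"variance explained {r2:g}: N of roughly {approx_sample_size(r2):,.0f}")
```

Under these assumptions the required sample size grows in inverse proportion to the variance a single variant explains, which is why tiny effects only show up once samples reach the hundreds of thousands or more.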
I think this study is both more and less than it seems. It’s less because the amount of variation explained is roughly what you get by taking an SAT test. And while polygenic scoring will continue to improve, it’s already relatively close to the expected max. Yet it’s more as well. Because the genomic data revolution in biology is continuing apace, and is starting to create tectonic shifts in how society thinks about genetic luck. The dystopia of Brave New World was and is wrong, but it’s less impossible than it appeared to be just a few decades back.
More reading.
- Ed Yong: “The team is essentially studying genes so they can more thoroughly ignore them.”
- Carl Zimmer: “Indeed, the latest study is just the newest in what promises to be a tide of huge genetic studies.”
- Steve Hsu: “Years ago I predicted….”
- Kathryn Paige Harden: “Why Progressives Should Embrace the Genetics of Education”
- Razib Khan and Spencer Wells podcast: Good podcast explainer on the study, interviewing one of the lead authors, James Lee.
2. Amazon, Google, Apple, Microsoft invest in their own proprietary tech, moving them farther and farther ahead. Nice write-up by Christopher Mims on the work of economist James Bessen. Sample sentences: “When new technologies were developed in the past, they would diffuse to other firms fast enough so that productivity rose across entire industries.” “But imagine instead of power looms, someone is trying to copy and reproduce Google’s cloud infrastructure itself.” And what we see now is “a slowdown in what we call the ‘diffusion machine.’”
Also see Robin Hanson’s post on whether compulsory licensing is the solution. I think not, but Hanson is always interesting and original.
3. Most revenue from web degrees goes not to their providers but to middlemen. I have a rule of thumb that the subtitle of an article tends to be less clickbaity and more informative than the title. The title of that Economist piece is Universities withstood MOOCs but risk being outwitted by OPMs. Yawn. So I used the subtitle in bold above for this point #3. The middlemen are OPMs (online program managers). Here’s my favorite paragraph:
When the web started to shake up higher education a decade or more ago, it was widely expected that the Massive Open Online Courses (MOOCs) it spawned would disrupt universities in the same way that digital media undermined newspapers and music firms. But that assumption rested on a misunderstanding of what students are paying for. They are not buying education for its own sake, but rather a certificate from a respected institution.
Credentials. Universities naively assumed the story they like to tell themselves is true: people go to college to learn, not to get credentials so employers know who is smart and willing to follow orders. So universities gave all their online profits to middlemen who now determine who gets a credential. This is not a stable equilibrium. People giving out credentials have market power. Eventually the universities should wise up and keep that money for themselves. But who knows how long that might take? Or perhaps the OPMs will innovate their online credential system fast enough to hit escape velocity, since they own the invaluable customer relationship. One last good quote: “a third of graduate education in America is now online.”
4. Housing Costs Reduce the Return to Education. Here’s my tweet summarizing Alex Tabarrok’s excellent post.
5. Great interactive map to drill down into voting in the 2016 US presidential election. What I liked is how this map lets you drill down level after level and see how the pattern remains fractal. Higher population density votes Democratic, lower density Republican. At all levels. Bill Bishop’s 2008 book, The Big Sort: Why the Clustering of Like-Minded America Is Tearing Us Apart, has unfortunately proven more prophetic than one might wish.
6. Underground water detected on Mars. Of course the subtitle (not the title) of Lee Billings’ piece captures it: Radar observations have revealed what appears to be a buried lake on Mars, the first-ever stable reservoir of liquid water found on the Red Planet. What’s not to like? Cool analysis of years’ worth of radar data from the European Space Agency spacecraft Mars Express reveals liquid water under the polar ice caps of Mars. Lots of good details in the Lee Billings piece, or try the shorter New York Times version.
And that’s all for this week.
another great blog entry.
>> It’s less because the amount of variation explained is roughly what you get by taking an SAT test.
Can you get 20 embryos to take an SAT test and implant the one with the highest score?
Yep. See point #4 in my January post. This post was already long enough, so I didn’t get to the embryo selection aspect.