A Young Tag Team Detects a Major Pipeline Leak


Last week, a Proceedings of the National Academy of Sciences (PNAS) article created quite a stir. The paper showed that elite male scientists do a significantly worse job than other men and elite women at hiring women as postdocs and graduate students.

Almost as striking as the article's key result is the makeup of its authorship team. Jason Sheltzer, the article’s first author, is a graduate student, studying cancer biology at the Massachusetts Institute of Technology (MIT) in Cambridge. Joan C. Smith, the article’s last author, works at Twitter. They are partners—a couple. Detecting in their partnership (and in its tangible result) a rich stew of career-related themes—workforce diversity, elitism, dual-career couples, data science, academia–IT industry crossover, and more—we asked Sheltzer and Smith to tell us about their partnership. The interview was conducted by e-mail. It has been edited for brevity and clarity.

Q: Please tell me about your backgrounds. Also, how did you meet?

Jason: I majored in molecular biology at Princeton University and then went to grad school in biology at MIT. Joan and I met on OKCupid. We started chatting about our mutual love of Richard Feynman and hydrogen atoms. We first met up at a local cafe and haven’t looked back since!

Joan: I have a Bachelor of Science in physics from MIT. From just a few dates, it was obvious that we valued the same things (work, science, feminism), had complementary personality quirks, and got along exceptionally well.

Q: Jason, tell me about your cancer research.

Jason: Normal cells have 46 chromosomes, but nearly all cancer cells have the wrong number of chromosomes. I'm studying how changes in chromosome number (a condition known as aneuploidy) affect cell physiology and cancer progression.

Q: Joan, what do you do at Twitter?

Joan: I'm a software engineer. I joined the company in May, working on the Crashlytics team (Crashlytics is a company that Twitter acquired about 18 months ago). I build tools for other developers; the things I build help other programmers do their jobs better and more efficiently. It's incredibly rewarding to have a charter to build things to make your own job easier.

Q: How did you first decide to work together, and what's the nature of the partnership?

Joan: My first job after graduation left me my fair share of free time. I started fishing around for a side project, but didn't come up with anything particularly compelling until Jason and I started thinking about collaborating. Once we figured out that it was possible to combine our skills, it was obvious that the best side project I could find would be one where we worked together.

Jason: I was analyzing some microarray data, and I reached the limit of what I knew how to do, in terms of data analysis. So I described the scientific question to Joan, and in about 30 minutes, she had set up a Python script to answer it. After that, I asked her more questions, and for each one, she was able to get back to me with an answer. It was terrific! I have some insight into biological topics worth pursuing, and collaborating with Joan really expands the range of questions that I’m able to address.

Q: Where did the idea for this PNAS article come from?

Jason: One of our very close friends is a graduate student in physics. She found out that she was going to be the first female graduate student that her professor had ever trained in a career lasting more than 20 years. We thought that was astounding. I didn't know very much about lab choice or career structure in physics, but I knew a little about these issues in biology, so we started looking into gender in biology labs.

Q: Where did you get the data? Was this a result of "scraping" websites, or did it involve more traditional data collection methods?

Jason: Joan and I started counting the grad students and postdocs from the websites of a few biology labs. We soon found a striking pattern (elite male faculty in the life sciences hire particularly few women), but we also found that it would be difficult to get a large enough sample size to make the results robust and representative of the life sciences as a whole. I had recently sold my car, so we ended up spending the money I had made to hire freelance data scrapers to collect more lab information than we could have gathered on our own.

Joan C. Smith and Jason Sheltzer

Courtesy of Jason Sheltzer

Q: What's buried in there that those who focus on the lede—elite male scientists hire fewer female grad students and postdocs than elite women and non-elite men—are likely to miss?

Jason: I think it's quite striking how a small number of "elite" labs function as gateways to the professoriate. We found that about 10% of all faculty members are members of the National Academy of Sciences, but about 60% of new faculty members did a postdoc with a member of the National Academy. I think that this says something about the insular nature of academic science: "If I don't know who your principal investigator (PI) is, you're not likely to get a job interview with me." It probably limits the scope of scientific questions that new PIs investigate. They're mostly coming from established labs working on established topics.
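To put those two figures side by side, here is a back-of-the-envelope sketch in Python. It uses only the approximate percentages Jason quotes above, not exact values from the paper, and the function and variable names are illustrative. It also treats every lab as contributing equally, which ignores differences in lab size, so it should be read as a rough illustration rather than a result.

```python
# Approximate figures quoted in the interview (not exact values from the paper).
share_of_faculty_in_nas = 0.10        # ~10% of faculty are Academy members
share_of_new_faculty_from_nas = 0.60  # ~60% of new faculty did a postdoc in an Academy lab


def representation_ratio(share_of_outcomes, share_of_population):
    """Ratio of a group's share of outcomes to its share of the population."""
    return share_of_outcomes / share_of_population


# Academy labs: share of new faculty they trained vs. their share of all faculty.
nas_ratio = representation_ratio(
    share_of_new_faculty_from_nas, share_of_faculty_in_nas
)

# Everyone else: the remaining new faculty vs. the remaining share of all faculty.
non_nas_ratio = representation_ratio(
    1 - share_of_new_faculty_from_nas, 1 - share_of_faculty_in_nas
)

print(f"Academy labs: {nas_ratio:.1f}x their share of faculty")
print(f"Other labs:   {non_nas_ratio:.2f}x their share of faculty")
print(f"Relative advantage: {nas_ratio / non_nas_ratio:.1f}x")
```

Under these assumptions, Academy labs supply about six times as many new faculty members as their share of the faculty alone would predict.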

Q: This isn't even your main field, for either of you. It seems to me that submitting to such a prestigious journal takes some audacity.

Joan: By the time we had a paper that we were ready to submit, we thought that the results were reasonably important. We decided that the best way forward was to present the data we found without too much speculation about the whys and hows. A scientific journal seemed the best bet. It was far more important to us to have biology professors read our study than anyone else, and so to make that happen, we wanted to put the paper in a place that was likely to be found by our target audience.

Jason: We found that membership in the National Academy of Sciences for male PIs was highly correlated with hiring fewer women. We thought that this fact might make the paper interesting to the readership of PNAS.

Q: In different ways, you've both done this work part-time: For Jason, it's a break from cancer research; for Joan, it comes after a workday at Twitter. How did this work logistically?

Joan: We worked on this during evenings, weekends, and spare minutes. Sometimes we sat together at home and worked, and sometimes I'd head into the biology lab after work. We'd work until midnight together—him switching between bench work and counting trainees, and me typing away on analysis.

Jason: Counting trainees on lab websites was something that I could do whenever I found a spare 10-minute block. We were also lucky to get some really helpful freelancers working with us. Once the data collection was done, I spent a bunch of nights and weekends writing the paper and making the figures.

Q: Joan, this one's just for you. We seem to live in an age in which a smart person with strong coding and data-analysis skills can do all sorts of things with the sea of data that's increasingly available. This facilitates things like having a moonlighting Twitter employee contribute important articles to leading scientific journals. Please offer a perspective on the opportunities—professional, scientific, or just in terms of gaining insight—available to people with the skills to extract them.

Joan: The amount of data that exists in the world and on the Internet is amazing and sometimes overwhelming. The real constraint to doing interesting things with that data is finding a productive question that can actually be answered with what's available. Since the data is effectively limitless, the primary bounds need to be imposed by your scientific question.

Q: Another one for you, Joan. If I'm interested in doing data-intensive research, what skills, techniques, or languages should I learn?

Joan: The most important "soft" skill is a sense for how data is organized. Figuring out which pieces of data need to go next to each other in a table is a good chunk of the hard part of data analysis. I think that once you can figure out how data connect in one or two datasets, it starts to make more sense overall. And once you have a feel for how the data fits, you can start building things to analyze that data.

There are a few "harder" skills too. You have to know at least the basics of programming. (I like Python, but language only matters a little.) You have to have enough of a background in math that you have the confidence to figure out some new statistical method or tool that you haven't seen before. But in the end, the only things you really need are a computer, Google, time, and the confidence it takes to figure stuff out.
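As a rough illustration of what "figuring out which pieces of data need to go next to each other" can look like, the Python sketch below joins a table of labs to a table of trainees and summarizes the result. The records are hypothetical toy data invented for this example, and the field names and the `fraction_women_by_group` helper are assumptions. None of it is the authors' actual code or data.

```python
from collections import defaultdict

# Hypothetical toy records, invented purely for illustration (not data from the study).
labs = [
    {"lab_id": 1, "pi_gender": "M", "pi_elite": True},
    {"lab_id": 2, "pi_gender": "F", "pi_elite": True},
    {"lab_id": 3, "pi_gender": "M", "pi_elite": False},
]
trainees = [
    {"lab_id": 1, "role": "postdoc", "gender": "M"},
    {"lab_id": 1, "role": "grad", "gender": "F"},
    {"lab_id": 1, "role": "postdoc", "gender": "M"},
    {"lab_id": 2, "role": "grad", "gender": "F"},
    {"lab_id": 2, "role": "postdoc", "gender": "M"},
    {"lab_id": 3, "role": "grad", "gender": "F"},
    {"lab_id": 3, "role": "grad", "gender": "M"},
]


def fraction_women_by_group(labs, trainees):
    """Join trainees to their lab, then compute the fraction of women per PI group."""
    # Index labs by ID so each trainee row can be placed next to its lab's attributes.
    labs_by_id = {lab["lab_id"]: lab for lab in labs}

    counts = defaultdict(lambda: [0, 0])  # group -> [women, total trainees]
    for trainee in trainees:
        lab = labs_by_id[trainee["lab_id"]]
        group = (lab["pi_gender"], lab["pi_elite"])
        if trainee["gender"] == "F":
            counts[group][0] += 1
        counts[group][1] += 1

    return {group: women / total for group, (women, total) in counts.items()}


for (pi_gender, pi_elite), fraction in fraction_women_by_group(labs, trainees).items():
    label = "elite" if pi_elite else "non-elite"
    print(f"{label} {pi_gender} PI labs: {fraction:.0%} women trainees")
```

The key step is the join on `lab_id`: once each trainee sits next to the attributes of their lab, grouping and counting becomes straightforward.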

Q: How do you envision the future of your partnership, scientifically and otherwise?

Jason: Professionally, we’re working on another paper together using Joan’s software and data-analysis ability to parse through gene-expression data from more than 20,000 cancer patients. Personally, I'm hoping that we get a cat soon.