Cloud-Based Data Science

Cloud-Based Data Science is a free, online educational program that aims to help anyone who can read, write, and use a computer move into the field of data science. We’ve developed course content and have built and run an in-person tutoring program so that we reach people who have historically lacked access to data science education. With each scholar who completes the program, we’re working to improve economic conditions for people locally (in Baltimore, MD) and, ultimately, around the world.

The recount project

RNA-seq data for ~70,000 human samples have been aligned using a single analytic pipeline called Rail-RNA, developed and implemented by Abhi Nellore. In an effort spearheaded by Leo Collado-Torres, with contributions from many in our group, these data have been processed and made available in a resource called recount. While the expression data are publicly available, critical phenotype information is missing for many of the samples in this resource. In addition to identifying technical artifacts to be removed across these data, I’m developing phenotype predictors (e.g., sex, tissue) from the gene expression data to make important sample information available across all samples within recount.