Sunday, June 15, 2014

How to get people to do boring stuff they don’t want to do in lab

Lately, we’ve been working on a lot of infrastructure and process-oriented aspects of our work in lab, like a complete overhaul of our RNA FISH analysis software (now in sufficiently good shape to be publicly available to everyone), a probe database, and thinking about how best to organize our growing RNA-seq datasets. Once we have established what we believe to be best practice, though, the next issue is compliance. It’s one thing to tell people what they should do, quite another to actually get them to do it. For instance, we can all say “if you’re going to analyze your RNA-seq data, you should use this data organization scheme”, but there’s a natural entropy at play when people actually do work in the lab, and non-compliance is an inevitable by-product. How can you enforce best practice?

Well, actually, before starting to think about enforcement, I think it’s worth making sure that whatever scheme you put in place has actual, real benefits to people in the lab. I’ve come to realize that process, while it can enable science, is not science in and of itself, and it’s not always worth the effort. It’s a fine line, and perhaps somewhat a matter of personal taste; I think some folks are just fussier about stuff than others.

So what are the benefits? For our lab, I feel like there are three main benefits to building process infrastructure:
  1. Error reduction: To me, the most useful benefit to having a standardized and robust data pipeline is that it can greatly reduce errors. The consequences of mixing up your datasets or applying the wrong algorithm can be absolutely devastating in a number of ways.
  2. Reproducibility/documentation: For data, I feel, as do many others, that it is imperative to be able to reliably (and understandably) reproduce the graphs and figures in your paper from your raw data. Frankly, in this day and age, there’s no excuse not to be able to do this. Documentation is just as important for other things we do in lab, whether it’s how we designed a particular probe or what the part number is for some kit we ordered 3 years ago and is about to run out.
  3. Saving people time and facilitating their work: Good infrastructure can save time in a number of ways. Firstly, it hopefully leads to less wheel-reinvention, which I’ve seen all the time in other labs. Another way it saves time is by (hopefully) leaving a data trail; i.e., “That data point looks funny, can you show me the image it came from?” Good infrastructure makes it easy to answer that question, and makes it much easier to explore your data in general. If getting answers is easier, you will ask more questions, which is always a good thing.
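The “data trail” idea in point 3 can be sketched in code. This is a hypothetical minimal illustration, not our actual software: the key design choice is simply that every extracted data point carries a reference to the raw image (and cell) it came from, so that “can you show me the image it came from?” becomes a one-line query. All names (`DataPoint`, `trace_back`, the paths) are invented for illustration.

```python
# Hypothetical sketch: attach provenance to each extracted data point so a
# "funny" value can be traced back to the raw image it came from.
from dataclasses import dataclass

@dataclass
class DataPoint:
    value: float      # e.g., an RNA spot count for one cell
    image_path: str   # raw image the measurement came from
    cell_id: int      # which segmented cell in that image

def trace_back(points, predicate):
    """Return the raw-image provenance for every point matching a query."""
    return [(p.image_path, p.cell_id) for p in points if predicate(p)]

points = [
    DataPoint(12.0, "data/exp042/img001.tif", 3),
    DataPoint(480.0, "data/exp042/img007.tif", 9),  # suspicious outlier
]

# "That data point looks funny -- which image did it come from?"
suspects = trace_back(points, lambda p: p.value > 100)
print(suspects)  # [('data/exp042/img007.tif', 9)]
```

The point of the sketch is that once provenance is built into the data structure itself rather than reconstructed from memory, exploring the data gets cheaper, and people ask more questions.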
So what’s the problem? Well, for points 1 and 2, the issue is that error reduction, reproducibility and documentation are just not that exciting, at least not to people who are more interested in doing science. That, and the payoff is typically a sigh of relief a couple years down the line. My experience thus far has been that most systems for documenting lab stuff, no matter how sound the rationale, just don’t stick without some serious effort. For instance, we have a probe “database” (i.e., spreadsheet) that is woefully out of date. We also have a number of protocols that are fairly out of date, and an orders spreadsheet that is out of date; you get the idea. Same for RNA-seq and RNA FISH datasets, at least in terms of high-level data organization. You know the feeling: “No, not that transcription inhibition dataset, that’s the one that came out funny because of the cells acting weird, use this one instead…” The only way to enforce compliance in these cases is to create a punitive rule, something like "no more orders placed until you update the ordering sheet". Sucks, but I guess that works.

But point 3, saving time and facilitating work, that’s something everyone can get behind without any prodding. And then there's never any issue of compliance. For instance, our software provides all the backend to make sure that our data is fully traceable from funny outlier data point to the raw images of a particular cell. But it also provides all the tools to analyze data and use all the latest tricks and tools for image analysis that we have developed in the lab. For this reason, it's essentially inconceivable that anyone would spend time writing their own software or doing anything else: the benefits are big and, importantly, immediately realizable.

So what I’m thinking is that we somehow have to structure all the boring lab documentation tasks so that there is some immediate gratification for doing them. What can that be? I’m not sure. But here’s an example from the lab. We’re working on having our probe database automatically generate identifiers and little labels that we can print out and stick on the tube. Not a huge deal, but it’s sort of fun and certainly convenient. And it’s something you can enjoy right away and only get if you access the probe database. So I’m hoping that will drive the use of the database. A more ambitious plan is to develop similar databases for experiments and consequent datasets that would enable automatic data loading. This would not only be important for reproducibility, but would also be enormously convenient, so I’m hoping people in the lab would be excited to give it a whirl.
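The probe-database carrot could look something like the following. This is a rough sketch, not our actual database: the ID scheme, field names, and the `ProbeDB` class are all invented for illustration. The idea is just that registering a probe immediately hands back a unique identifier and a printable tube label, so the only way to get the convenient thing is to go through the database.

```python
# Hypothetical sketch of the "immediate gratification" idea: the probe
# database hands out the next identifier and a ready-to-print tube label.
# The ID scheme ("PRB-0001") and fields are invented for illustration.
import itertools

class ProbeDB:
    def __init__(self, prefix="PRB"):
        self.prefix = prefix
        self._counter = itertools.count(1)  # monotonically increasing IDs
        self.probes = {}

    def register(self, gene, fluorophore):
        """Add a probe entry and return its auto-generated identifier."""
        probe_id = f"{self.prefix}-{next(self._counter):04d}"
        self.probes[probe_id] = {"gene": gene, "fluorophore": fluorophore}
        return probe_id

    def tube_label(self, probe_id):
        """Short text label suitable for printing and sticking on a tube."""
        p = self.probes[probe_id]
        return f"{probe_id} | {p['gene']} | {p['fluorophore']}"

db = ProbeDB()
pid = db.register("GAPDH", "Cy3")
print(db.tube_label(pid))  # PRB-0001 | GAPDH | Cy3
```

The same pattern would extend to the more ambitious experiment/dataset databases: register the experiment once, and automatic data loading comes along for free.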

Anybody else have any thoughts about how to encourage people to participate in lab best practices?
