Annotated high-throughput microscopy image sets for validation
Choosing among algorithms for analyzing biological images can be a daunting task, especially for nonexperts. Software toolboxes such as CellProfiler1,2 and ImageJ3 make it easy to try out algorithms on a researcher's own data, but it can still be difficult to assess whether an algorithm will be robust across an entire experiment based on the small subset of images that is practical to examine or annotate. Even if controls are available, a pilot high-throughput experiment may be insufficient to show that an algorithm will robustly identify rare phenotypes and handle the experimental artifacts that will invariably be present in a high-throughput experiment.

It is therefore useful to know that a particular algorithm has proven superior on several similar image sets. The performance comparisons presented in papers that introduce new algorithms are often not very helpful for assessing this because each study typically relies on a different test image set (often to the advantage of the proposed algorithm), the algorithms compared may not be the ones the researcher is most interested in, and the authors may not have implemented other algorithms as optimally as their own. Although biologists should always also validate algorithms on their own images, it would be useful if developers would quantitatively test new algorithms against a publicly available, established collection of image sets. In this way, objective comparisons can be made to other algorithms, as tested by the developers of those algorithms. We see a need for such a collection of image sets, together with ground truth and well-defined performance metrics.

Here we present the Broad Bioimage Benchmark Collection (BBBC), a publicly available collection of microscopy images intended as a resource for testing and validating automated image-analysis algorithms. The BBBC is particularly useful for high-throughput experiments and for providing biological ground truth for evaluating image-analysis algorithms.
If an algorithm is sufficiently robust across samples to handle high-throughput experiments, low-throughput applications also benefit because tolerance to variability in sample preparation and imaging makes the algorithm more likely to generalize to new image sets.

Each image set in the BBBC is accompanied by a brief description of its motivating biological application and a set of ground-truth data against which algorithms can be evaluated. The ground-truth sets can consist of cell or nucleus counts, foreground and background pixels, outlines of individual objects, or biological labels based on treatment conditions or orthogonal assays (such as a dose-response curve or positive- and negative-control images). We describe canonical ways to measure an algorithm's performance so that algorithms can be compared against each other fairly, and we provide an optional framework to do so conveniently within CellProfiler. For each image set, we list any published results of which we are aware.
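As an illustration of how pixel-level ground truth can be used, one common way to score a segmentation against annotated foreground/background pixels is a pixel-wise F1 (Dice) score. The sketch below is a generic example of such a metric, not the BBBC's prescribed evaluation framework; the function name and toy masks are our own.

```python
import numpy as np

def foreground_f1(predicted, truth):
    """Pixel-level F1 (Dice) score between a predicted binary
    foreground mask and a ground-truth mask of the same shape."""
    predicted = np.asarray(predicted, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    tp = np.logical_and(predicted, truth).sum()   # true positives
    fp = np.logical_and(predicted, ~truth).sum()  # false positives
    fn = np.logical_and(~predicted, truth).sum()  # false negatives
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)

# Toy example: a 4x4 image where the predicted mask slightly
# over-segments the ground-truth object.
pred = np.array([[1, 1, 0, 0],
                 [1, 1, 0, 0],
                 [0, 0, 0, 0],
                 [0, 0, 0, 0]])
gt = np.array([[1, 1, 0, 0],
               [1, 0, 0, 0],
               [0, 0, 0, 0],
               [0, 0, 0, 0]])
print(round(foreground_f1(pred, gt), 3))  # 0.857
```

A metric of this form is symmetric in its treatment of false positives and false negatives, which makes it a reasonable single summary when neither error type is known to be more costly for the downstream biological question.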
The BBBC is freely available from http://www.broadinstitute.org/bbbc/. The collection currently contains 18 image sets, including images of cells (Homo sapiens and Drosophila melanogaster) as well as of whole organisms (Caenorhabditis elegans) assayed in high throughput. We are continuing to extend the collection during the course of our research, and we encourage the submission of additional image sets, ground truth and published results of algorithms.
Carpenter et al., Broad Institute of Massachusetts Institute of Technology and Harvard, Cambridge, Massachusetts, USA. Nature Methods | VOL. 9 NO. 7 | JULY 2012 | p. 637