CCD is a Java software package for the selection of core collections of diverse taxa (e.g. from germplasm collections) that are intended to capture the genetic diversity of the input dataset.

The software takes as input a binary matrix which transcribes the different alleles (genetic forms) for each taxon. It then tries to find a most diverse collection of alleles under various constraints (e.g. sample size, geographic location of samples).
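As an illustration of this kind of input, the following minimal sketch stores a small binary matrix and counts how many alleles a candidate collection captures. It is not CCD's code or file format: the class name `AlleleMatrixSketch`, the method `allelesCovered`, and the layout (rows as taxa, columns as alleles, `true` meaning "allele present") are all assumptions made for the example.

```java
import java.util.BitSet;

/** Illustrative only: a hypothetical in-memory form of a binary allele matrix. */
final class AlleleMatrixSketch {

    /**
     * Counts the alleles (columns) present in at least one selected taxon (row).
     * Assumes rows are taxa and columns are alleles; this layout is an assumption.
     */
    static int allelesCovered(boolean[][] matrix, int[] selectedRows) {
        if (matrix.length == 0) {
            return 0;
        }
        BitSet seen = new BitSet(matrix[0].length);
        for (int row : selectedRows) {
            for (int col = 0; col < matrix[row].length; col++) {
                if (matrix[row][col]) {
                    seen.set(col);
                }
            }
        }
        return seen.cardinality();
    }

    public static void main(String[] args) {
        // Four made-up taxa scored for four made-up alleles.
        boolean[][] matrix = {
            {true,  false, true,  false},
            {true,  false, false, false},
            {false, true,  false, true },
            {true,  false, true,  false},
        };
        System.out.println(allelesCovered(matrix, new int[] {0, 1})); // prints 2
        System.out.println(allelesCovered(matrix, new int[] {0, 2})); // prints 4
    }
}
```

In this toy data, taxa 0 and 2 together carry all four alleles, whereas taxa 0 and 1 carry only two, so the first pair would be the more diverse collection of size two under this simple measure.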

The software implements many of the most popular methods for determining core collections. In addition, it includes novel greedy methods for selection based on polymorphism information content, Shannon entropy and phylogenetic trees/networks. In particular, for the latter approach we construct either a Neighbor-joining tree or a Neighbor-net network from a distance matrix that CCD computes in PHYLIP distance matrix format. The resulting structure (tree or network), in tabbed text or NEXUS format, can then be loaded into CCD to determine the core collection. The phylogenetic tree/network methods we implement are based on a greedy algorithm for computing high diversity collections that is known to be optimal for trees.
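To give a feel for what a greedy selection looks like, here is a minimal sketch that adds one taxon at a time, each time choosing the taxon that most increases a Shannon entropy score. It is not CCD's implementation. The scoring function (the sum over alleles of the binary entropy of each allele's frequency within the selection), the tie-breaking rule (lowest row index wins) and the names `GreedyEntropySketch`, `score` and `select` are assumptions made for the example. Note also that the optimality result mentioned above concerns the tree-based method; this entropy sketch is only a heuristic.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative only: greedy selection under an assumed entropy-based score. */
final class GreedyEntropySketch {

    /** Binary Shannon entropy, in bits, of a presence frequency p. */
    static double binaryEntropy(double p) {
        if (p <= 0.0 || p >= 1.0) {
            return 0.0;
        }
        return -(p * Math.log(p) + (1 - p) * Math.log(1 - p)) / Math.log(2);
    }

    /** Assumed score: sum over alleles (columns) of the entropy of each allele's frequency. */
    static double score(boolean[][] matrix, List<Integer> selected) {
        if (selected.isEmpty()) {
            return 0.0;
        }
        double total = 0.0;
        for (int col = 0; col < matrix[0].length; col++) {
            int present = 0;
            for (int row : selected) {
                if (matrix[row][col]) {
                    present++;
                }
            }
            total += binaryEntropy((double) present / selected.size());
        }
        return total;
    }

    /** Greedily builds a selection of up to coreSize taxa (row indices). */
    static List<Integer> select(boolean[][] matrix, int coreSize) {
        List<Integer> selected = new ArrayList<Integer>();
        boolean[] used = new boolean[matrix.length];
        int target = Math.min(coreSize, matrix.length);
        while (selected.size() < target) {
            int best = -1;
            double bestScore = -1.0;
            for (int row = 0; row < matrix.length; row++) {
                if (used[row]) {
                    continue;
                }
                selected.add(row);                    // tentatively add the candidate
                double s = score(matrix, selected);
                selected.remove(selected.size() - 1); // undo the tentative add
                if (s > bestScore) {                  // strict '>' keeps the lowest index on ties
                    bestScore = s;
                    best = row;
                }
            }
            used[best] = true;
            selected.add(best);
        }
        return selected;
    }

    public static void main(String[] args) {
        boolean[][] matrix = {
            {true,  false, true,  false},
            {true,  false, false, false},
            {false, true,  false, true },
            {true,  false, true,  false},
        };
        System.out.println(select(matrix, 2)); // prints [0, 2]
    }
}
```

With a single taxon every allele frequency is 0 or 1, so the first pick is decided purely by the tie-breaking rule. From the second pick onwards the score favours taxa whose allele patterns differ from those already chosen, which is why taxon 2 is preferred over taxon 3 (a duplicate of taxon 0) in the toy data.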


CCD is written in Java, and the most recent version, version 1.0, is compiled using J2SE 5.0. The application is standalone, but if you have problems running it, please ensure that you have a Java Virtual Machine (VM) of version 5.0 or above installed.

Once you have a suitable Java VM installed, download the ZIP file containing CCD (ZIP, 36 KB) and its dependencies. The CCD JAR file within this ZIP file includes a manifest enabling it to be run from the command line using java -jar ccd.jar.

CCD is Copyright © 2006-2010 Martin Lott, Jo Dicks & Vincent Moulton.


For technical questions or feature requests, e-mail Martin Lott.

Related papers

Jo Dicks, Germplasm collections: Gaining new knowledge from old data sets, UK Knowledge Discovery and Data Mining symposium, 2006.
M Steel, Phylogenetic diversity and the greedy algorithm, Systematic Biology, 2005.
Daniel J. Schoen, Anthony H. D. Brown, Conservation of allelic richness in wild crop relatives is aided by assessment of genetic markers, Proc. Natl. Acad. Sci. USA, 1993.
Chris Thachuk, José Crossa, Jorge Franco, Susanne Dreisigacker, Marilyn Warburton and Guy F Davenport, Core Hunter: an algorithm for sampling genetic resources based on multiple genetic measures, BMC Bioinformatics 10:243, 2009.


A manual for the software is available (pdf 43 KB).


The work was funded in part by BBSRC grant BB/E004105/1.


This software is supplied as-is, with no warranty of any kind expressed or implied. We have made every effort to avoid errors in design and execution of this software, but we will not be liable for its use or misuse. The user is solely responsible for the validity and consequences of any results generated.

Research Team

Martin Lott, Prof. Vincent Moulton, Dr. Jo Dicks