Can massively parallel DNA sequencing paired with ubiquitous utility computing save the world?
Location: D'Arcy Thompson Room (School of Computing Sciences)
Date: 13.00-14.00 9 Dec 2011
Speaker: Dr. William Spooner
Organiser: Dr. Katharina Huber
Institution: Eagle Genomics Ltd, Cambridge, UK
Abstract: In the decade following the publication of the human genome, DNA sequencing costs have fallen a million-fold. This has stimulated research into epidemiology, public health, drug discovery and plant breeding driven by next-generation sequencing (NGS). Speculation is rife on how these techniques will help "cure disease" or "feed the world". Even though utility computing (the "cloud") has made high performance computing (HPC) widely available, the bottleneck to research involving NGS has become data analysis. As sequencing costs continue to plummet at a rate faster than Moore's law, how will information technology rise to the challenge? Is this simply a big data problem, or a challenge to the very fabric of informatics research?

Diatom genomics: lessons from the cold
Location: D'Arcy Thompson room (School of Computing Sciences, UEA)
Date: 14.00-15.00 25 Nov 2011
Speaker: Dr. Thomas Mock
Organiser: Dr. Katharina Huber
Institution: School of Environmental Sciences, UEA
Abstract: Diatoms contribute 25% of global carbon fixation, but their contribution to the carbon cycle is even higher in polar systems, where they are the major primary producers in the ocean. No other phytoplankton group has been so successful in occupying the polar marine environment, which is characterized by overall low temperatures and strong seasonality of solar irradiance, in addition to nutrient limitations such as iron limitation in the Southern Ocean. However, our knowledge about the evolution and adaptation of polar diatoms is still in its infancy despite their significance for the whole polar ecosystem. Fragilariopsis cylindrus is regarded as a key diatom species found in Arctic and Southern Ocean seawater and sea ice. F. cylindrus is an obligate psychrophile, and the genus appears to have arisen about 25 million years ago, in agreement with the isolation of the Antarctic continent. The genome sequence of F. cylindrus recently became available, and this talk will present some of the most significant and unanticipated findings, based mainly on a comparative analysis with the genomes of the temperate diatoms Phaeodactylum tricornutum and Thalassiosira pseudonana. Metatranscriptome and metagenome sequences from Southern Ocean phytoplankton communities facilitated the identification of those F. cylindrus genes most likely responsible for adaptation to polar environmental conditions.

Individual-based models and the study of microbial systems
Location: D'Arcy Thompson room (School of Computing Sciences, UEA)
Date: 14.00-15.00 11 Nov 2011
Speaker: Prof. Marta Ginovart
Organiser: Dr. Katharina Huber
Institution: Departament de Matemàtica Aplicada III Universitat Politècnica de Catalunya
Abstract: The choice of basic modelling approach to study a microbial system, population-level (top-down) or individual-based (bottom-up), is an important decision that needs to be addressed at the beginning of a modelling project.
There are arguments for and against either approach, and the right choice depends on project-specific aspects, the characteristics of the system to be simulated and the questions to be asked of the model. In general, arguments for the population-level approach are its simplicity, its computational efficiency, the fact that it has been used and tested widely, the availability of established and peer-reviewed modelling frameworks, and the ready relation to ecological theory. Arguments for the individual-based approach include the ability to simulate intra-population variability (population heterogeneity), complete life cycles, behaviour adapted to internal and external conditions, the ability to link mechanism at the individual level to population-level behaviour, and the inapplicability of the continuum (the population response can be affected by the fate of a single individual).
The advances in molecular biology and biochemistry that have increased our knowledge of the inner workings of microbes and the better availability of individual-based observations at cell level have brought about an increase in the application of individual-based modelling to microbes. These discrete models can be a complementary tool to check the relations between theoretical assumptions and experimental observations.
INDISIM (INDividual DIScrete SIMulations) is an individual-based methodology developed by my research group to deal with microbial systems. INDISIM was initially designed specifically for simulating the behaviour and growth of bacterial communities. The cellular model was very schematic and might seem an extreme over-simplification of microorganisms, but even with such a simple model for each individual, various system-level behaviours were reproduced. Later, two adaptations of INDISIM were developed to study: i) yeast cultures and fermentations (INDISIM-YEAST) and ii) the dynamics and evolution of C and N associated with organic matter in soils and microbial activity (INDISIM-SOM). Some representative simulation results will be shown to illustrate how these models work and their possibilities in research.

Constructing an unbiased mapping ...
Location: D'Arcy Thompson room (School of Computing Sciences, UEA)
Date: 13.00-14.00 4 Nov 2011
Speaker: Leonard B Hearne
Organiser: Dr. Katharina Huber
Institution: Department of Statistics, University of Missouri, USA
Abstract: Constructing an unbiased mapping from a high dimensional space to a high dimensional space using multivariate density estimates based on geometric methods.
The scientific question motivating this research is to understand the relationship between the phenotypic and genotypic spaces in maize. The phenotypic space is visually measurable and has approximately 15 dimensions, one for each trait we are including in our study. The genotypic space has approximately 150 dimensions, one for each gene or exon sequence that we believe may be of interest. Each observation in the phenotypic space has a corresponding point in the genotypic space. We can use this to construct a mapping between a density estimate on the phenotypic space and a density estimate on the genotypic space. Given a point in one of the spaces, we can estimate the probability of seeing a point in the other space. This will allow us to understand the relationships between the genotypic and phenotypic spaces in a non-parametric and unbiased way.
In this talk I focus on a geometric density estimator. This estimator uses the Delaunay tessellation to partition the support for an estimator into tiles. The proportion of probabilistic mass from observations on each tile divided by the content of the tile is used to estimate the probability density on each tile. Under suitable regularity conditions, geometric density estimators can be shown to be unbiased and consistent multivariate estimators. The level of density specificity is proportional to the number of tiles in the tessellation. To refine a density estimate, re-sampling methods can be employed. This partitions the support for the estimator into successively smaller tiles. The resulting refined density estimate is biased. By allowing some probability mass to be allocated beyond the convex hull in a Delaunay tessellation, these refined density estimators can again be made unbiased and consistent. This work has application in a broad class of multivariate estimation settings like finance, genetics, and medicine where geometric methods can be employed.
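The tile-based estimate described above can be illustrated with a minimal one-dimensional sketch, where the Delaunay tessellation reduces to the intervals between consecutive sorted observations. This is not the speaker's implementation: the function name `delaunay_density_1d` is hypothetical, and the rule that each observation's mass of 1/n is shared equally among the tiles it touches is an assumption made for illustration.

```python
def delaunay_density_1d(points):
    """Piecewise-constant density estimate on the intervals (tiles)
    between consecutive sorted observations.

    Assumption: each observation carries mass 1/n, split equally
    among the tiles it is a vertex of (one tile for the two extreme
    points, two tiles for interior points).
    """
    xs = sorted(points)
    n = len(xs)
    if n < 2 or any(b <= a for a, b in zip(xs, xs[1:])):
        raise ValueError("need at least two distinct observations")

    # In 1D, tile j is the interval [xs[j], xs[j + 1]].
    tiles = list(zip(xs, xs[1:]))
    mass = [0.0] * len(tiles)

    for i in range(n):
        # Observation i is a vertex of tile i - 1 (left) and tile i (right).
        incident = [j for j in (i - 1, i) if 0 <= j < len(tiles)]
        for j in incident:
            mass[j] += (1.0 / n) / len(incident)

    # Density on a tile = probability mass on the tile / tile length.
    return [(a, b, m / (b - a)) for (a, b), m in zip(tiles, mass)]


estimate = delaunay_density_1d([0.0, 1.0, 3.0])
for left, right, density in estimate:
    print(f"[{left}, {right}]: density {density:.3f}")
```

The estimate integrates to one over the span of the observations, which is the one-dimensional analogue of the convex hull mentioned in the abstract. In higher dimensions the tiles would be simplices and the interval length would be replaced by the simplex content.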

The UEA sRNA Workbench Version 1.0
Location: D'Arcy Thompson room (School of Computing Sciences, UEA)
Date: 14.00-15.00 28 Oct 2011
Speaker: Dr. Matthew Stocks
Organiser: Dr. Katharina Huber
Institution: School of Computing Sciences, UEA
Abstract: RNA silencing is a complex, highly conserved, transcriptional and post-transcriptional mechanism for tuning gene expression. RNA silencing is mediated by small molecules known as small RNAs (sRNAs).
Recent advances in sequencing technologies have made it possible to capture a high-resolution snapshot of the complete sRNA content of an organism or tissue. Typically, this data is used for identifying known classes of sRNAs or detecting correlations between sRNA expression and gene expression in different tissues at different time points. This technology has also revealed new classes of sRNA and may yet allow the identification of more.
As sequencing technologies improve, they produce ever more sRNA reads in a shorter space of time. Many bioinformatics tools fail to scale with the size of the data. Therefore, the sRNA community needs user-friendly, robust tools capable of handling large datasets in a reasonable time. This talk focuses on Version 1.0 of the UEA sRNA Workbench, a suite of tools that addresses this need.

SimGenex: A System for Concisely Specifying Simulation of Biological Processes and Experimentation
Location: D'Arcy Thompson Room (School of Computing Sciences)
Date: 14.00-15.00 14 Oct 2011
Speaker: Dr. Anyela Valentina Camargo-Rodriguez
Organiser: Dr. Katharina Huber
Institution: School of Computing Sciences, UEA
Abstract: Computational models enable advances in understanding essential features of living systems. Such models can be used to simulate data that can also be measured empirically. Generating such simulated data is frequently a key step in developing and validating models. However, precisely specifying a complex procedure for simulating data is notoriously difficult. The SimGenex language we are reporting is designed to simplify this task. It is applicable in research scenarios where several candidate models are considered, the mathematical details of regulatory interactions are only partially known or described semiquantitatively, the majority of kinetic parameters are not empirically measured, and a gene expression matrix is available as a basis for identifying the best model.
SimGenex enables succinct and flexible descriptions of simulations of the biological processes and experimental procedures that are the building blocks of most current wet-lab experimental protocols.
It enables the specification of reproducibly executable workflows for validating computational models of biological systems, it facilitates the pre-processing and transformation of data frequently applied in gene expression data analysis, and it provides support for comparing and discriminating alternative candidate models based on their ability to approximate the empirical dataset. The result of applying a SimGenex program to a computational model is a simulated dataset that can be compared directly to empirically measured omic data through the specification of a distance measure, which can be used to identify the best model among a number of candidates.
This is joint work with Jan T. Kim.
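The model discrimination step described in the abstract, comparing each candidate's simulated expression matrix to the empirical one by a distance measure, might be sketched as follows. This is plain Python, not SimGenex syntax. The function names, the model names and the choice of Euclidean distance are all assumptions made for illustration.

```python
import math


def euclidean_distance(simulated, empirical):
    """Euclidean distance between two expression matrices (lists of rows).

    Assumption: both matrices have identical shape (genes x conditions).
    """
    if len(simulated) != len(empirical) or any(
        len(s) != len(e) for s, e in zip(simulated, empirical)
    ):
        raise ValueError("matrices must have the same shape")
    return math.sqrt(
        sum(
            (s - e) ** 2
            for s_row, e_row in zip(simulated, empirical)
            for s, e in zip(s_row, e_row)
        )
    )


def best_model(simulated_by_model, empirical, distance=euclidean_distance):
    """Return the candidate whose simulated data is closest to the
    empirical data, together with every candidate's distance."""
    scores = {
        name: distance(simulated, empirical)
        for name, simulated in simulated_by_model.items()
    }
    return min(scores, key=scores.get), scores


# Hypothetical toy data: 2 genes x 2 conditions.
empirical = [[1.0, 2.0], [3.0, 4.0]]
candidates = {
    "model_a": [[1.0, 2.0], [3.0, 5.0]],
    "model_b": [[0.0, 0.0], [0.0, 0.0]],
}
winner, scores = best_model(candidates, empirical)
print(winner, scores)
```

Passing the distance function as a parameter mirrors the abstract's point that the distance measure is itself part of the specification, so a correlation-based measure could be swapped in without changing the selection logic.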

EMBO Multi-level Modelling of Morphogenesis
Location: G34/35 Watson & Crick Rooms, JIC Conference Centre
Date: 11.00-15.00 hours each day 19 Jul 2011 – 29 Jul 2011
Organiser: John Innes Centre
Institution: John Innes Centre
Materials: Programme (pdf 34 KB)

Midsummer Phylogenetics at UEA 2011
Location: D'Arcy Thompson Room, School of Computing Sciences, UEA
Date: 13.00-16.30 17 Jun 2011
Organiser: Dr. Katharina Huber and Dr. Sven Herrmann
Institution: UEA
Materials: Programme (pdf 28 KB)

Short Reads Assemblies for Large Genomes Seminar
Date: 15.30-17.00 8 Feb 2011
Speaker: Dr. Mario Caccamo
Organiser: Dr. Jo Dicks
Institution: John Innes Centre
