The BioComputing group develops computational methods to explore the large volumes of data generated by our projects and disseminate the results to the wider scientific community.
For our data to be informative, it must be effectively assessed and managed before being made publicly available. Three main groups coordinate this work, focusing on the analysis of our mouse phenotyping data, applications of computational biology, and systems imaging. Together, they work to ensure that all data made available is of high quality and presented in a form that can be used by our own research programmes and external groups for further research.
We have also developed, and continue to maintain, a bespoke Laboratory Information Management System (LIMS), AnonyMus, to handle the large quantities of data we produce.
Mouse phenotyping data
Our two main large-scale projects, the Harwell Ageing Screen and IMPC, produce extremely large mouse phenotyping datasets which need to be integrated, analysed and compared. MRC Harwell is the IMPC data coordination centre and is therefore responsible for coordinating the acquisition, analysis and visualisation of data from all collaborators in the IMPC. All data is collected, annotated, validated and quality controlled before it undergoes rigorous statistical analysis. This requires extensive standardisation, and we have refined a set of standard operating procedures (SOPs) first developed in the pilot project EUMODIC to achieve this.
We continue to manage this process through the Mouse Phenotyping Informatics Infrastructure (MPI2) consortium, which consists of MRC Harwell, the European Bioinformatics Institute (EBI) and the Wellcome Trust Sanger Institute (WTSI). In addition, we manage the capture and analysis of data from the Harwell Ageing Screen via the Mary Lyon Centre’s AnonyMus system. The IMPC data is disseminated to the wider scientific community through the IMPC web portal, for which we have developed heatmap and phenoview tools to display interesting phenotypes for each mouse. All data from the Harwell Ageing Screen is made available on MouseBook.
We are now in the process of developing tools to integrate and compare our mouse phenotype data with human data, which could be used to generate hypotheses for genetic or clinical studies. This involves working closely with experts in specific diseases, including those in our MRC Mouse Networks, who provide a rich source of additional knowledge and information. In this manner, our data could provide unprecedented quantities of information that could be used to predict new therapeutic targets in patients.
The computational biology group analyse data from the Harwell Ageing Screen using a variety of computational approaches, including next generation sequencing (NGS) and molecular modelling techniques. We work alongside the research programmes to uncover key findings in their data.
ENU mutagenesis creates random mutations across the whole genome. This means that, after an unusual phenotype is discovered in the ageing screen, we are faced with the task of locating the causative mutation. Traditional techniques require extensive breeding programmes over multiple generations, and locating a single ENU mutation by positional cloning can take three or four years. By contrast, NGS enables 100% of single nucleotide polymorphism (SNP) mutations in the genome of a mouse to be located in just two months. We therefore use NGS to identify which genes or non-coding regions the mutations lie within, so that we can begin to link each mutation to the observed phenotype.
One of our major projects has been to compare the SNPs inherent in the mouse strains that we use, including C57BL/6N and C57BL/6J, in order to determine which mutations these mice already carry. This has allowed us to identify phenotypes which these mice are more at risk of developing, so that these are not mistakenly interpreted as the result of ENU mutations. Another element of our NGS work is transcriptomics, where we sequence mRNA products to analyse gene expression levels in the mutant mouse in comparison with wild-type mice lacking the mutation. We have, for example, worked with Pat Nolan’s Neurobehavioural genetics group to determine expression levels in the mutant mouse Short Circuit, and with Andy Greenfield’s Disorders of sex development group to determine levels in the knockout mouse Gabba45.
We use molecular modelling to further investigate how a specific mutation in a gene affects the function of the protein. This involves modelling the protein structure, its interactions and its ability to bind ligands, which can reveal the complexes and mechanisms it is involved in.
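The strain comparison described above amounts to subtracting known background variants from the variants called in a mutant. The following is a minimal sketch of that idea in Python, not the group's actual pipeline: the `Variant` type, the `candidate_mutations` function and the example coordinates are all hypothetical, and real variant calls would normally be read from a file rather than typed in.

```python
from typing import Iterable, List, NamedTuple


class Variant(NamedTuple):
    """A single nucleotide variant (hypothetical minimal representation)."""
    chrom: str
    pos: int   # 1-based position on the chromosome
    ref: str   # reference base
    alt: str   # observed base


def candidate_mutations(mutant_calls: Iterable[Variant],
                        background_variants: Iterable[Variant]) -> List[Variant]:
    """Return variants called in the mutant that are absent from the background strain."""
    background = set(background_variants)
    # Deduplicate the mutant calls, drop anything already present in the
    # background strain, and sort for a stable, readable result.
    return sorted(v for v in set(mutant_calls) if v not in background)


# Made-up example data: one variant is shared with the background strain,
# one is private to the mutant and so remains a candidate.
background = [Variant("chr1", 1000, "A", "G"), Variant("chr2", 500, "C", "T")]
mutant = [Variant("chr1", 1000, "A", "G"), Variant("chr3", 42, "T", "A")]

candidates = candidate_mutations(mutant, background)
print(candidates)
```

Matching on chromosome, position and both alleles is a deliberately strict choice in this sketch; a variant at the same position with a different alternative base is still reported as a candidate.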
In the future, we intend to create tools to explore epigenomics and CRISPR/Cas9 data. We have already begun this process, investigating the methylation patterns in one mutant, and are looking to expand on this. We intend to create tools for the CRISPR/Cas9 project to locate the CRISPR construct and reduce off-target effects, and are investigating the possibility of introducing mutations associated with human disease to study the effect of these mutations and the resultant traits.
In systems imaging, we analyse data from cutting-edge 2D and 3D imaging techniques to support MRC Harwell research programmes, external collaborations, and the IMPC. Our eventual aim is to incorporate imaging techniques (tracking, segmentation, registration, shape analysis) into our programmes and large-scale projects to cover all aspects of mouse phenotyping and developmental biology.
MRC Harwell generates large quantities of embryonic image data for the IMPC, with every embryonic lethal homozygous knockout mouse strain imaged at either stage E9.5, E14.5, or E18.5 to provide a record of any developmental defects in the embryo. We support the management and processing of 3D images taken by Micro CT (µCT) and Optical Projection Tomography as part of our role in MPI2. We aim to implement an automated pipeline for processing these images, using open source software and our own custom-made applications. Part of this will involve creating a Harwell E14.5 Atlas, an average embryo from a combined collection of embryo scans, which will then be segmented into an ‘atlas’ of specific organs that can be highlighted and their size measured. Knockout mouse embryos will be compared against this average E14.5 mouse, with the ability to overlay the two images and use a variety of displays to highlight different phenotypes.
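The two core calculations behind such an atlas, averaging registered scans and measuring a segmented organ, can be illustrated with a small sketch. It rests on several assumptions that are not from the text above: the scans are taken to be already registered to a common space (the registration step itself is not shown), each volume is stored as a flat list of voxel intensities, voxels are isotropic, and the function names and the liver label value are hypothetical. In practice an array library would be used for volumes of realistic size.

```python
from typing import List, Sequence


def average_volume(volumes: Sequence[Sequence[float]]) -> List[float]:
    """Voxel-wise mean of equally sized, already-registered volumes."""
    n = len(volumes)
    if n == 0:
        raise ValueError("need at least one volume")
    size = len(volumes[0])
    if any(len(v) != size for v in volumes):
        raise ValueError("all volumes must have the same number of voxels")
    # zip(*volumes) groups the intensities found at the same voxel index.
    return [sum(voxels) / n for voxels in zip(*volumes)]


def organ_volume_mm3(labels: Sequence[int], organ_label: int,
                     voxel_size_mm: float) -> float:
    """Volume of one labelled organ: voxel count times the volume of one voxel."""
    count = sum(1 for label in labels if label == organ_label)
    return count * voxel_size_mm ** 3


# Made-up four-voxel "scans", assumed to be registered to the same space.
scans = [[10, 20, 30, 40], [20, 40, 30, 0]]
average = average_volume(scans)

# Hypothetical label maps: 0 = background, 1 = liver.
LIVER = 1
atlas_labels = [0, 1, 1, 0]
knockout_labels = [0, 1, 0, 0]

atlas_liver = organ_volume_mm3(atlas_labels, LIVER, voxel_size_mm=0.5)
knockout_liver = organ_volume_mm3(knockout_labels, LIVER, voxel_size_mm=0.5)
ratio = knockout_liver / atlas_liver
print(average, atlas_liver, knockout_liver, ratio)
```

Expressing the knockout organ volume as a ratio of the atlas volume is one simple way to flag a size difference; a real comparison would also need to account for normal variation between wild-type embryos.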
We also assist with the analysis of images from both our own internal research programmes and external collaborators. We provide training and employ our own expertise to highlight aspects of particular interest to the researchers. For example, we have measured the volume of brain ventricles for the Neurobehavioural genetics group and developed methods to count cilia for the Cilia, development and disease group.
Left to right: A wild-type mouse embryo at fourteen and a half days, imaged with µCT (by Genome Engineering), showing the exterior of the embryo. The same embryo with transparency enabled in the rendering to reveal the internal organs and structure. An “average” fourteen-and-a-half-day-old embryo, where multiple specimens have been registered together to create a combined dataset. A false-coloured image highlighting gradients (changes in intensity) within the average embryo. An iso-surface rendering of the average embryo in which the liver has been segmented.