As part of my Biochemistry degree at Oxford, I had to spend a year focusing on a single research project. My obsession with bioinformatics was already firmly established when Iain Campbell, a leading NMR spectroscopist and structural biologist, took me under his wing. At the time, structural biology was definitely the most computational area of molecular biology, so I was looking forward to getting stuck into a computational project.
[Figure: Type III fibronectin determined by NMR, from I. Campbell’s group]
It was great to be immersed in this technical world – the COSY and NOESY spectra, triggered by a series of radio pulses to link up atoms; the rather impressive cooling process, with liquid helium being poured into huge superconducting magnets sunk into the ground; the somewhat scary signs warning people with pacemakers to turn back (the magnetic fields are insanely strong in and around an NMR machine)…
I learned a lot about NMR and structural biology, from the technical aspects of chemical shifts, coupling constants, distance restraints, disordered regions and hydrophobic cores to the elegance of protein structures that manage to fold perfectly to do something so absolutely specific.
Seeing is believing
But … I had already heard the siren call of simpler, linear protein and DNA sequences. A wonderful new institute, the Sanger Centre, was about to be set up, and I had a chance to work there on sequencing the human genome…
… fast forward 20 years, when I had the pleasure of sitting in the back of the room for the Protein Data Bank in Europe (PDBe) Scientific Advisory Board, now as Associate Director of the EBI. I was still just in awe of the incredible beauty and precision of protein structures, and the skills of structural biologists in uncovering their details.
[Figure: Cryo electron tomography of sensory cilia]
Some things had not changed in 20 years: dihedral angles are still important, transitions from ordered to disordered are still being explored and the methods are still extremely technically detailed. But other areas have progressed so much they are almost unrecognisable: the ability to look at larger complexes, with electron microscopy (EM) techniques – single-particle averaging and, even more impressive to me, electron tomography. Electron tomography allows you to reconstruct a single 3D sample to ~40 Å from images taken at a series of sample tilts – no crystal, no averaging, just for this particular sample – like a high-resolution 3D microscopy image. These are spine-tingling images.
So often we have to conceptualise and imagine what is going on in cells. Electron tomography is the closest thing I’ve seen to actually seeing molecular biology in action. One can see little ribosomes, microtubules and proteasomes and complex membrane-associated structures in a bacterial cell, in a single 3D volume.
The wealth of structural data has grown incredibly over the past decade. New techniques such as EM are constantly emerging, and structural biology’s workhorse, X-ray crystallography, is continually being refined with better protein production and crystallisation techniques and tuneable high-energy X-rays from synchrotrons. Light microscopy has also improved vastly, with techniques such as super-resolution imaging.
Integrating all the data being produced with these techniques to gain an overall view is an impressive task. It involves fitting X-ray structures into EM maps and then into tomograms, with NMR measurements to provide the dynamics at the atomic level and light microscopy to illuminate the dynamics at the complex and organelle level. There are still so many more protein structures to determine and integrate, and endless discoveries to be made.
Bringing structure to genomics?
All this progress is not just for the benefit of structural biologists. Gerard Kleywegt, who leads PDBe, has a passion for making this information accessible to the broader biology community. Molecular biologists, developmental biologists, geneticists and systems biologists can all make use (or more use…) of structural data.
All too often we forget that linear sequence shows only how information is encoded, not how it is used. Most of what happens outside the nucleus, and certainly the vast majority of the “doing” of life, is carried out by proteins or RNAs folded up into specific structures and collaborating in specific complexes. We know a lot about these structures and complexes: 4,717 proteins (23% of protein-coding genes) have at least one structure (many have far more than one), and this accounts for 42% of residues in these proteins (around 11% of protein residues overall). When we expand this to structures we can confidently model, the coverage goes up further.
I am sure that in my own research area, genomics, we’re not taking enough advantage of this information. We might think about structural biology as the final mechanistic determination of why one allele has an effect or not, but can we integrate structural information to make our statistical genetic tests more powerful? Can we use the collection of protein structures of transcription factors (often bound to DNA) to help interpret DNaseI footprinting results? Or use protein-complex information to inform epistasis models, potentially at the level of a residue or patch of a protein, not just at the gene level?
Many fields use structural information in all sorts of ways but I am sure the integration of different structural techniques, and the integration of that structural information with other experiments and knowledge – chemistry, pathways, gene expression, proteomics – is going to be amazing.
Part of me wonders why I chose to stick with the “boring” world of linear, four-base DNA sequence some 20 years ago. I guess there’s always time to learn some new tricks…