1) Only a handful of amino acids in an enzyme structure were highly conserved. (Out of hundreds, generally fewer than ten.)
2) Those were generally in the reaction center.
3) Almost all single sequence replacements had no measurable effect on protein structure and function.
4) Across species the "same" protein can diverge in sequence by up to 40%, while keeping the same structure. Sometimes this goes as far as 80%.
Given these basic facts, the findings in the paper aren't really surprising to anyone who studies proteins.
[Note: As with everything in biology, you can find counterexamples. The histone proteins involved in DNA packing have an incredibly conserved sequence.]
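To make the divergence figures in point 4 concrete, here is a minimal sketch of how percent identity between two already-aligned sequences could be computed. The function name and the two toy sequences are made up for illustration; real comparisons would first run a proper alignment tool.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent of compared columns where two pre-aligned sequences agree.

    Columns where either sequence has a gap ('-') are skipped.
    This is one common convention, not the only one.
    """
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to equal length")
    compared = 0
    matches = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            continue  # ignore gapped columns
        compared += 1
        if x == y:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical toy sequences (not real proteins), differing at 3 of 10 positions.
seq_a = "MKTAYIAKQR"
seq_b = "MKSAYLAKQK"
print(f"{percent_identity(seq_a, seq_b):.1f}% identical")
```

Two homologs that keep the same fold at 60% identity would differ at roughly 4 positions in every 10 by this measure.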
- that structure is as/more important than sequence ?
- that "reaction centers" are what matter, and the rest is just "protection" ?
What do you mean by "reaction center" - surely not physically central within the folded structure (isn't it the surface shape that determines reactivity) ?
Structure is determined by sequence, so they are equally important. Structure is more conserved than sequence, mainly due to the physicochemical constraints that govern protein folding.
> that "reaction centers" are what matter, and the rest is just "protection"?
Sometimes not even protection. Many enzymes can have plenty of their sequence/structure removed and still be functional. Natural proteins carry lots of evolutionary cruft.
> What do you mean by "reaction center" - surely not physically central within the folded structure
I think they borrowed the term from photosystems/photosynthesis. But, to be more precise, what they actually meant is the active site of an enzyme; the location where the catalyzed reaction takes place.
> (isn't it the surface shape that determines reactivity) ?
Shape is not enough; the chemical nature of the amino acid residues involved is also important. A single mutation in a key catalytic residue will shut down the enzyme even if the shape stays the same.
Two things are going on: 1. the number of residues actively involved in catalysis might be small, and 2. most other residues can be safely replaced, either with something similar if they are part of the structure, or with anything if the side chain is pointing out on the surface.
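As a rough illustration of "few conserved positions, most others replaceable", here is a sketch that scores each column of a multiple sequence alignment by how often its most common residue appears. The alignment is a made-up toy, and the scoring rule (most-common-residue fraction) is just one simple choice; real analyses typically use entropy-based or substitution-matrix-aware scores.

```python
from collections import Counter


def column_conservation(alignment: list[str]) -> list[float]:
    """Per column: fraction of sequences carrying the most common residue."""
    if not alignment:
        raise ValueError("alignment is empty")
    length = len(alignment[0])
    if any(len(seq) != length for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = []
    for column in zip(*alignment):
        # Note: a gap character, if present, is counted like any other symbol.
        top_count = Counter(column).most_common(1)[0][1]
        scores.append(top_count / len(alignment))
    return scores


# Hypothetical toy alignment: columns 2 and 5 play the role of catalytic residues.
toy_alignment = [
    "MKHAYD",
    "LRHGFD",
    "MQHSWD",
    "IKHTYD",
]

scores = column_conservation(toy_alignment)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(scores)
print("fully conserved columns:", fully_conserved)
```

In this toy case only the two "catalytic" columns come out fully conserved, while the remaining columns tolerate several different residues.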
However, the point the article is making is that for different functions the same basic folds seem to be used again and again.
Is that because the stable protein fold structural space is actually small (due to the limited secondary structure patterns used, etc.), or is that because evolution hasn't had time to search the enormous available structural space?
i.e. is it a sampling problem or an intrinsic property of protein space?
The fact that some of the ML approaches mentioned can now design completely novel folds suggests it is at least partially a sampling problem.
This to me isn't surprising - the idea that evolution is somehow complete and all possible solutions have already been explored seems to me to be unlikely - a lot of evolution happens via gene duplication and then gradual functional drift - which would favour reuse of existing folds over the generation of completely new ones.
That's a basic fact in bio. Check the Rossmann fold page for example: https://en.wikipedia.org/wiki/Rossmann_fold it's a template used for many functions.
Are there any folds and patterns that evolution has not discovered that are also useful? I think the Baker Group created a bunch of new folds. I'm not sure if they are as useful as the ones discovered by evolution. After all, evolution had more compute power than us.
The most famous is the prion protein, which can misfold in ways that cause a variety of contagious diseases, like mad cow disease, chronic wasting disease, scrapie and, in humans, CJD and vCJD, fatal familial insomnia, Kuru, GSS.
Perhaps because a misfolded prion protein can convert others, but why does it all affect that same protein? It has always baffled me why other proteins aren't susceptible to becoming prions.
There are others we call "prionoid" because they can show shades of the catastrophic misfolding that the prion protein can.
Our compute capacity isn't deployed to brute force Monte Carlo sims (mostly). So it's apples and oranges.
The thinking is that evolution created error correction for the critical proteins to account for mutations.
Fascinating stuff.
―François Jacob, “Evolution and Tinkering” (https://web.mit.edu/~tkonkle/www/BrainEvolution/Meeting9/Jac...)
https://pmc.ncbi.nlm.nih.gov/articles/PMC7072414/
Oh ok, I misremembered:
"This review has focused only on small fragments of fold space with examples given for folds generated from a single secondary structure string consisting of around ten SSEs. Even in this small corner, the number of possible folds, under the current constraints, is of the order of 1000"
Who knows what might be possible if you designed a cell from scratch - perhaps you could rework all the machinery to access other parts of fold space. After all, there are some weird and wonderful machines out there like the 'Vault' (https://en.wikipedia.org/wiki/Vault_(organelle)) that can fit whole proteins inside them. Possibly a different cage-like structure could help fold designed proteins into as-before unseen structures.
i did neuroscience for grad school, and i was always amazed by how often complex neural activity could be well represented by lower dimensional representations--clean manifolds, attractor dynamics, etc. i think, in general, biology (evolution) doesn't penalize against redundancy too hard (hence things like genetic drift, neutral theory of evolution, etc.).
anyway, super cool stuff. agree with you that probs more useful to explore the search space via 'less natural' structures, given how forgiving evolution is to redundancy. probs where the most information can be found
(note: there are bigger proteins, including ones so big you can see them with the naked eye (e.g. a hair), but they consist of multiple repeats of the same small building block. There are many such building blocks. And the very few exceptions to that are "not really" part of eukaryotic cells, but of cell organelles that have their own DNA)
But even if you just take the first 4 amino acids, there are 160,000 (20^4) possible combinations. Life uses fewer than 1000 of those.
In other words: DNA and evolution, even with billions of years to think about it, are really a bit of a beginner when it comes to protein design. Or at least, it is pretty obvious that it's possible to do A LOT better than natural selection.
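For anyone who wants to check the combinatorics, here is a quick sketch of how fast sequence space grows, assuming only the 20 standard amino acids (an assumption; it ignores selenocysteine, pyrrolysine and modified residues). The function name is just illustrative.

```python
ALPHABET_SIZE = 20  # assumption: the 20 standard amino acids only


def sequence_space_size(length: int) -> int:
    """Number of distinct sequences of the given length over the alphabet."""
    if length < 0:
        raise ValueError("length must be non-negative")
    return ALPHABET_SIZE ** length


for n in (4, 10, 100):
    size = sequence_space_size(n)
    # Print the digit count as well, since the raw numbers get unwieldy fast.
    print(f"length {n}: {len(str(size))} digits")

print(sequence_space_size(4))
```

Python integers are arbitrary precision, so the exact count for a 100-residue chain is computed without overflow.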
Thinking more about the question of protein _length_ - I'm also not convinced that longer proteins (more than say 750aa) would produce more novel folds. Larger proteins tend to be multi-domain; that is, a longer chain will fold into multiple compact domains, each one a separate fold.
I suppose there could be 'megafolds' out there in fold space, beyond 1000aa - like a 12-bladed beta propeller, or a beta-helix with alpha helices on the outside or some other wacky thing. Whether that would substantially increase the total number of folds, I doubt, but that is of course a guess.
(ref - https://pmc.ncbi.nlm.nih.gov/articles/PMC10251718/ for protein lengths)
And really? Just any random sequence gets you a new fold. I mean, it won't be very useful if you pick a random one, but it'll work and be a new one.
I think this is just an artifact of natural selection basing new proteins on existing ones, not an actual useful ("rational" if you can call natural selection rational) selection limit. I don't think that if you designed proteins from first principles you'd see this limitation in your results.
The nice thing about stable folds is that 'nearby' sequences in sequence space - as in, point mutations - are the same fold. If each sequence had a completely different fold, then mutation would be much more destructive. Surprisingly, however, sequences that are far apart in sequence space can also adopt the same fold (convergent evolution).
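To pin down what "nearby in sequence space" means, here is a small sketch that enumerates every single point mutant of a sequence (its Hamming-distance-1 neighbours). The peptide is a made-up example and the helper names are hypothetical; nothing here predicts whether a given neighbour actually keeps the fold.

```python
from collections.abc import Iterator

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard one-letter codes


def point_mutants(seq: str) -> Iterator[str]:
    """Yield every sequence that differs from seq at exactly one position."""
    for i, original in enumerate(seq):
        for aa in AMINO_ACIDS:
            if aa != original:
                yield seq[:i] + aa + seq[i + 1:]


def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


peptide = "MKTAY"  # hypothetical toy peptide
neighbours = list(point_mutants(peptide))
print(len(neighbours))  # 19 alternatives at each of the 5 positions
```

A sequence of length L over the standard alphabet has 19 * L such neighbours, which is tiny compared with the whole space, so a fold that tolerates most of them gives mutation a safe neighbourhood to explore.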
Note: the rest of the protein being so massive has the huge problem that it results in the chlorophyll protein being toxic (even to plants). Several angles of the protein reflect the light ... away from the energy collector (it has sections that are like putting a mirror above a solar panel). Also: it's extremely INefficient. Inefficiency gets solved "the DNA way" (or should I say the Zapp Brannigan way): its efficiency sucks, but if I just use extremely large armies of chloroplasts I can compensate for the inefficiency by stacking them ... This sounds totally insane but yes, it works. Oh and the exact right amount of inefficiency can warm up the plant, protecting it (a little bit) from ice ages.
Now I have my suspicions on why chlorophyll + chloroplasts won (it's not actually the only photosynthesis protein or system): it's because by tuning a few amino acids you can change the depth of the ion trap, and so switch to different metals to capture, changing the color (which plants do, even just to have a particular color). It's pretty easy to accidentally adapt to either different metals or different solar frequencies (i.e. using natural selection). Plus there was no need to design chlorophyll: plants "stole" the design from bacteria. So it was incredibly cheap in terms of how much computation (i.e. generations of plants) had to die to make it. Of course, for the place it was stolen from the length of the protein was a very important factor so the biggest of chlorophyll's advantages (1 big protein, 10 functions that would have required 5x more space in DNA with small proteins) doesn't actually matter to plants. So why did it win? It was on sale!
So it works. But there has got to be a simpler/better/non-toxic way to create an ion trap using proteins and make plants work better ... I get that part of the problem is that I'm an engineer, a scientist. If one needs a design to catch energy and warm up a plant, I'd expect to create one thing for catching energy, and one plant warmer, both efficient. So there's an expectation problem. But a single mechanism to mostly randomly warm plants and catch energy at the cost of absurd inefficiency (both in warming and in energy production) ... is just not a sane way to go about this problem.
> So it works. But there has got to be a simpler/better/non-toxic way to create an ion trap using proteins and make plants work better ...
I remember reading about designed minimal photosystem-like systems. I cannot find the actual paper now, though.
I like how you say evolution is able to think when in reality it's just a mysterious function of variation, selection, and time.
It's all so complex, and our verbs that more literally describe the billions of nanosecond operations going on in the cells feel inadequate. "When a protein molecule in an appropriate folded shape and orientation happens to be bounced by kinetic energy into the attractive region of a corresponding protease..." versus "The protease grabs the protein and cuts it into..."
But then, this thread is all about proteins incorporating structural cliches, isn't it?
Now it's the Unreasonable Effectiveness of "The Unreasonable Effectiveness of X".
It seems like "X is All You Need" is All You Need.
[1]: https://web.archive.org/web/20090320002214/http://www.ecn.pu...
It seems to have originated with Eugene Wigner's 1960 "The Unreasonable Effectiveness of Mathematics in the Natural Sciences".