一切都在代码中，蛋白质生产效率可以通过基因序列预测 It's all in the code: Protein production efficiency can be predicted by gene sequence

　

一切都在代码中：蛋白质生产效率可以通过基因序列预测

It's all in the code: Protein production efficiency can be predicted by gene sequence

　

　

科学日报 2019年2月4日

　

Instituto NacionaldeCiênciaeTecnologia de Biologia Estrutural e Bioimagem（INBEB）

摘要：

科学家探索了mRNA和蛋白质公共数据库，以揭示遗传密码的隐含意义。使用源自mRNA密码子组成的度量，他们发现基因序列选择如何预测蛋白质合成的不同方面，例如蛋白质生产效率。该研究有助于开发基因和蛋白质的新生物技术应用。

今天，数以千计的具有生物数据的数据库是公开的。它们包括基因和蛋白质序列的数据以及不同细胞参数的详细测量，例如在各种实验条件下由给定细胞产生和降解的所有蛋白质的确切量。巴西研究人员探索了mRNA和蛋白质公共数据库，并发现基因序列选择如何预测蛋白质合成的不同方面，如蛋白质生产效率。该研究发表在Nucleic Acids Research上，可以帮助开发基因和蛋白质的新生物技术应用。

以DNA形式包含在细胞核中的遗传信息被复制在信使RNA（mRNA）中。与DNA不同，mRNA是动态和不稳定的分子，离开细胞核并被核糖体翻译，这些分子机器能够将一系列核苷酸转化为能够将RNA（和DNA）转化为形成蛋白质的氨基酸序列。每个氨基酸对应于3个核苷酸 - 或密码子的一种或多种组合。因为相同的氨基酸可以从不同的密码子翻译，遗传密码被描述为简并（或冗余）。

科学家们已经知道，尽管可以从替代基因序列中产生相同的蛋白质，但某些组合可以产生更高的蛋白质产量。他们还知道最佳密码子和非最佳密码子可分别减少或增加mRNA降解。不同的研究组测量了mRNA的产生和降解速率，但令人惊讶的是，数据存在许多偏差。

巴西科学家合成了明显不同的数据，并扩展了我们对基因序列选择如何预测蛋白质合成的不同方面的知识，例如mRNA稳定性和生产效率。由里约热内卢联邦大学的Fernando Palhano和Tatiana Domitrovic领导的一个研究小组使用来自mRNA密码子组成的度量来比较现有数据与不同的细胞参数。他们发现该指标与蛋白质丰度和蛋白质生产效率相关，表明最连贯的mRNA衰变数据集。他们的研究重申，mRNA降解在某种程度上与蛋白质生产效率有关。 “甚至在特定条件下需要高水平的蛋白质，例如应激反应，其基因序列也被优化用于有效翻译，”Fernando Palhano说。

Fernando和Tatiana与Rodolfo Carneiro和其他同事合作，他们鉴定了一组由非最佳密码子编码的低丰度蛋白质。正如他们在Nucleic Acids Research发表的论文中所表明的那样，密码子的选择不仅对保证高蛋白质产量至关重要，而且对于调节应以最小量产生的蛋白质（如调节蛋白质）的产量至关重要。

细胞中产生的蛋白质含量对于维持机体功能至关重要 - “许多人类疾病是由蛋白质生成效率低下或不平衡引起的，如囊性纤维化和癌症，”塔蒂亚娜说。她补充说，“从实际角度来看，理解基因序列和蛋白质生产之间的关系可以对医学和生物工程产生深远的影响。”

作者指出，许多“沉默的”DNA突变，即改变密码子序列但不改变编码氨基酸的突变，可导致蛋白质产生率的显着改变，这可能导致疾病。通过仔细选择基因序列，可以精细调整蛋白质产量，促进基因和蛋白质的生物技术应用。

　

延伸阅读

地球上的所有现代生命都使用三种不同类型的生物分子，每种生物分子都在细胞中起关键作用。蛋白质是细胞的主力，具有多种催化和结构作用，而核酸，DNA和RNA携带的遗传信息可以从一代遗传到下一代。

RNA代表核糖核酸，是由一个或多个核苷酸组成的聚合分子。 RNA链可以被认为是在每个链节处具有核苷酸的链。每个核苷酸由碱（腺嘌呤，胞嘧啶，鸟嘌呤和尿嘧啶，通常缩写为A，C，G和U），核糖和磷酸盐组成。

RNA核苷酸的结构与DNA核苷酸的结构非常相似，主要区别在于RNA中的核糖主链具有DNA不具有的羟基（-OH）基团。这给DNA起了名字：DNA代表脱氧核糖核酸。另一个细微差别是DNA使用碱基胸腺嘧啶（T）代替尿嘧啶（U）。尽管结构相似，但DNA和RNA在现代细胞中的作用却截然不同。

RNA在从DNA到蛋白质的途径中起着重要作用，被称为分子生物学的“中心法则”。有机体的遗传信息被编码为细胞DNA中碱基的线性序列。在称为转录的过程中，制备DNA片段或信使RNA（mRNA）的RNA拷贝。然后可以通过核糖体读取该RNA链以形成蛋白质。 RNAs在蛋白质合成中也起重要作用，如核酶部分以及基因调控中所述。

DNA和RNA之间的另一个主要区别是DNA通常在细胞中以双链形式存在，而RNA通常以单链形式存在，如上图所示。缺乏配对链允许RNA折叠成复杂的三维结构。 RNA折叠通常由在DNA中发现的相同类型的碱基相互作用介导，不同之处在于在RNA的情况下在单链内形成键，而在DNA的情况下在两条链之间形成键。

　

It's all in the code: Protein production efficiency can be predicted by gene sequence

Date:

February 4, 2019

Source:

Instituto Nacional de Ciência e Tecnologia de Biologia Estrutural e Bioimagem (INBEB)

Summary:

Scientists explored mRNA and protein public databases to unravel hidden meanings of the genetic code. Using a metric derived from mRNA codon composition, they found out how gene sequence choice can predict different aspects of protein synthesis, such as protein production efficiency. The study could help the development of new biotechnological applications of genes and proteins.

Scientists explored mRNA and protein public databases to unravel hidden meanings of the genetic code. Using a metric derived from mRNA codon composition, they found out how gene sequence choice can predict different aspects of protein synthesis, such as protein production efficiency. The study could help the development of new biotechnological applications of genes and proteins.

Today, thousands of databases with biological data are publicly available. They include data on gene and protein sequences and detailed measurements of different cellular parameters, such as the exact quantities of all proteins produced and degraded by a given cell in various experimental conditions. Brazilian researchers explored mRNA and protein public databases and found out how gene sequence choice can predict different aspects of protein synthesis, such as protein production efficiency. The study, published in Nucleic Acids Research, could help the development of new biotechnological applications of genes and proteins.

The genetic information contained in the cell nucleus in the form of DNA is copied in messenger RNAs (mRNAs). Different from the DNA, mRNAs are dynamic and unstable molecules that leave the nucleus and are translated by the ribosomes, the molecular machines able to convert a sequence of nucleotides that make RNA (and DNA) into a sequence of amino acids that form proteins. Each amino acid corresponds to one or more combinations of 3 nucleotides -- or codon. Because the same amino acid can be translated from different codons, the genetic code is described as degenerate (or redundant).

Scientists already know that even though the same protein can be produced from alternative gene sequences, some combinations result in higher protein yields. They also know that optimal codons and non-optimal codons can decrease or enhance mRNA degradation, respectively. Different groups have measured mRNA production and degradation rates, but, surprisingly, there are many deviations in the data.

Brazilian scientists synthesized apparently disparate pieces of data and extended our knowledge of how gene sequence choice can predict different aspects of protein synthesis, such as mRNA stability and production efficiency. A research group led by Fernando Palhano and Tatiana Domitrovic at the Federal University of Rio de Janeiro used a metric derived from mRNA codon composition to compare the existing data to different cellular parameters. They found that this metric correlated well with protein abundance and protein production efficiency, indicating the most coherent mRNA decay datasets. Their work reiterated that mRNA degradation is somehow connected to protein production efficiency. "Even proteins needed in high levels under specific conditions, such as stress response, have their gene sequence optimized for efficient translation," says Fernando Palhano.

Fernando and Tatiana worked with Rodolfo Carneiro and other colleagues, who identified a group of low abundance proteins coded by a non-optimal subset of codons. As they show in their paper published in Nucleic Acids Research, codon choice is vital not only to guarantee high protein production but also to tune down the output of proteins that should be produced in minimum amounts, such as regulatory proteins.

The amount of protein produced in a cell is crucial to maintaining the organism function -- "Many human diseases are caused by inefficient or unbalanced protein production, such as cystic fibrosis and cancer," says Tatiana. She adds that "from a practical perspective, understanding the relationship between the genetic sequence and protein production can have a profound effect both on medicine and bioengineering."

The authors note that many "silent" DNA mutations, that is, mutations that alter the codon sequence, but not the coded amino acid, can lead to significant modifications on protein production rates, which could lead to disease. By carefully selecting the gene sequence one can finely tune the protein production and boost biotechnological applications of genes and proteins.

It's all in the code: Protein production efficiency can be predicted by gene sequence -- ScienceDaily https://www.sciencedaily.com/releases/2019/02/190204140624.htm

Journal Reference:

Rodolfo L Carneiro, Rodrigo D Requião, Silvana Rossetto, Tatiana Domitrovic, Fernando L Palhano. Codon stabilization coefficient as a metric to gain insights into mRNA stability and codon bias and their relationships with translation. Nucleic Acids Research, 2019; DOI: 10.1093/nar/gkz033

　

　

What is RNA?

　

All modern life on Earth uses three different types of biological molecules that each serve critical functions in the cell. Proteins are the workhorse of the cell and carry out diverse catalytic and structural roles, while the nucleic acids, DNA and RNA, carry the genetic information that can be inherited from one generation to the next.

RNA, which stands for ribonucleic acid, is a polymeric molecule made up of one or more nucleotides. A strand of RNA can be thought of as a chain with a nucleotide at each chain link. Each nucleotide is made up of a base (adenine, cytosine, guanine, and uracil, typically abbreviated as A, C, G and U), a ribose sugar, and a phosphate.

The structure of RNA nucleotides is very similar to that of DNA nucleotides, with the main difference being that the ribose sugar backbone in RNA has a hydroxyl (-OH) group that DNA does not. This gives DNA its name: DNA stands for deoxyribonucleic acid. Another minor difference is that DNA uses the base thymine (T) in place of uracil (U). Despite great structural similarities, DNA and RNA play very different roles from one another in modern cells.

RNA plays a central role in the pathway from DNA to proteins, known as the "Central Dogma" of molecular biology. An organism's genetic information is encoded as a linear sequence of bases in the cell's DNA. During the process known as transcription, a RNA copy of a segment of DNA, or messenger RNA (mRNA), is made. This strand of RNA can then be read by a ribosome to form a protein. RNAs also play important roles in protein synthesis, as will be discussed in the ribozyme section, as well as in gene regulation.

Another major difference between DNA and RNA is that DNA is usually found in a double-stranded form in cells, while RNA is typically found in a single-stranded form, as shown in the illustration above. The lack of a paired strand allows RNA to fold into complex, three-dimensional structures. RNA folding is typically mediated by the same type of base-base interactions that are found in DNA, with the difference being that bonds are formed within a single strand in the case of RNA, rather than between two strands, in the case of DNA.