An ambiguity principle for assigning protein structural domains

Ambiguity is the quality of being open to several interpretations. For an image, it arises when the contained elements can be delimited in two or more distinct ways, which may cause confusion. We postulate that it also applies to the analysis of protein three-dimensional structure, which consists in dividing the molecule into subunits called domains. Because different definitions of what constitutes a domain can be used to partition a given structure, the same protein may have different but equally valid domain annotations.

However, knowledge and experience generally displace our ability to accept more than one way to decompose the structure of an object, in this case a protein. This human bias in structure analysis is particularly harmful because it leads to ignoring potential avenues of research. We present an automated method capable of producing multiple alternative decompositions of protein structure (a web server and source code are available). Our algorithm assigns structural domains through the hierarchical merging of protein units, which are evolutionarily preserved substructures that describe protein architecture at an intermediate level, between domain and secondary structure. To validate the use of these protein units for decomposing protein structures into domains, we set up an extensive benchmark comprising expert annotations of structural domains and state-of-the-art domain parsing algorithms. The relevance of our "multipartitioning" approach is shown through numerous examples of applications covering protein function, evolution, folding, and structure prediction. Finally, we introduce a measure for the structural ambiguity of protein molecules.
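To make the idea of hierarchical merging concrete, the following is a minimal sketch, not the algorithm described in this work. It assumes that protein units are given as labels and that a pairwise contact count between units is available; the function name `hierarchical_partitions`, the unit labels, the contact values, and the merging criterion (largest summed contact count) are all hypothetical choices made for illustration. Each level of the resulting hierarchy is one candidate decomposition, which is how a single protein can yield several alternative domain assignments.

```python
from itertools import combinations


def hierarchical_partitions(units, contacts):
    """Greedily merge protein units and record the partition at every level.

    units    -- list of unit labels (hypothetical, e.g. "PU1")
    contacts -- dict mapping frozenset({a, b}) to a contact count between
                units a and b (an assumed score, not the authors' criterion)
    """
    clusters = [frozenset([u]) for u in units]
    levels = [list(clusters)]  # level 0: every unit is its own cluster

    def score(pair):
        # Summed contact count between all unit pairs across two clusters.
        a, b = pair
        return sum(contacts.get(frozenset((x, y)), 0) for x in a for y in b)

    while len(clusters) > 1:
        # Merge the two clusters sharing the most contacts.
        a, b = max(combinations(clusters, 2), key=score)
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
        levels.append(list(clusters))
    return levels


# Toy input: four units forming two tightly packed pairs.
units = ["PU1", "PU2", "PU3", "PU4"]
contacts = {
    frozenset(("PU1", "PU2")): 10,
    frozenset(("PU3", "PU4")): 8,
    frozenset(("PU2", "PU3")): 2,
}
levels = hierarchical_partitions(units, contacts)
for level in levels:
    print(sorted(sorted(cluster) for cluster in level))
```

In this toy case the intermediate level groups the units into two clusters, while the first and last levels correspond to four clusters and a single cluster. Keeping all levels, rather than selecting one, is the point of the sketch.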

Analysis is the process of separating a whole into its constituent parts to gain a better understanding of it. Applied to the three-dimensional (3D) structure of proteins, it often consists in dividing a macromolecule into simpler yet informative subunits, called domains, which can be studied independently. Thus, investigating protein function, folding, or evolution often starts by delineating structural domains. This strategy also helps overcome challenges associated with structural studies of full-length proteins by molecular dynamics or de novo predictions. In addition, the classifications of protein structural domains are at the basis of every protein structure prediction method relying on fold recognition.

The idea of dividing protein structure into domains was introduced more than four decades ago by Wetlaufer (1), who defined protein domains as structurally compact and separate regions of the macromolecule. After this geometrical definition, many manual and automated methods for assigning structural domains have been based on additional criteria, such as folding autonomy, function, thermodynamic stability, or domain motions (2). As a result, many proteins are annotated differently from one domain database to another, depending on the methods and criteria used for structure partitioning (3). Paradoxically, although protein structure partitioning is a multiple-criteria problem (which, by its definition, can often accept more than one solution), different domain decompositions of the same protein are still considered to be mutually exclusive, rather than compatible or complementary. This issue inherent to human perception has been previously raised (4, 5) and continues to be a challenge (6), because it biases the analysis of protein molecules and restricts the number of avenues to explore, by not allowing more than one way to decompose their 3D structure. A domain partitioning based on a particular criterion, for example, geometry, may be useful for studying certain properties of the protein, such as function or dynamics, while being irrelevant regarding other characteristics, such as evolution or folding. This is well illustrated by the actin structure, which is divided into either two functional and evolutionary domains in the Structural Classification of Proteins (SCOP) (7) and Evolutionary Classification of Protein Domains (ECOD) (8) databases, or four domains, based on secondary structure elements, in the CATH (Class, Architecture, Topology, Homology) database (Fig. 1A) (9). Moreover, the delineation into two domains made by the authors of the structure (10), who used spatial separation of the domains as a criterion, differs from the function-based partitioning in terms of boundaries.