The role of a protein depends heavily on its 3D shape, which is composed of semi-independent three-dimensional blocks called domains. Domains fold independently and constitute units of evolution. Most proteins contain multiple domains that are associated with a particular functions; moreover, the same domain can be found in different proteins. Automated recognition of domains can make prediction of proteins function easier and can support the analysis of proteins. Here, we propose a novel algorithm designed for domain recognition by identification of domain boundaries in the protein structure. The proposed algorithm uses a contact graph and an iterative approach to find meaningful clusters corresponding to the protein domains. The distinctive feature of the method is its effective complexity, that improves over other well-known methods, while holding a comparable level of correct domain assignments.
Accepté le :
DOI : 10.1051/ro/2015040
Mots-clés : Graph theory, computational biology, protein structure
@article{RO_2016__50_2_363_0, author = {Milostan, Maciej and Lukasiak, Piotr}, title = {DomGen-Graph based method for protein domain delineation}, journal = {RAIRO - Operations Research - Recherche Op\'erationnelle}, pages = {363--374}, publisher = {EDP-Sciences}, volume = {50}, number = {2}, year = {2016}, doi = {10.1051/ro/2015040}, mrnumber = {3479876}, language = {en}, url = {http://archive.numdam.org/articles/10.1051/ro/2015040/} }
TY - JOUR AU - Milostan, Maciej AU - Lukasiak, Piotr TI - DomGen-Graph based method for protein domain delineation JO - RAIRO - Operations Research - Recherche Opérationnelle PY - 2016 SP - 363 EP - 374 VL - 50 IS - 2 PB - EDP-Sciences UR - http://archive.numdam.org/articles/10.1051/ro/2015040/ DO - 10.1051/ro/2015040 LA - en ID - RO_2016__50_2_363_0 ER -
%0 Journal Article %A Milostan, Maciej %A Lukasiak, Piotr %T DomGen-Graph based method for protein domain delineation %J RAIRO - Operations Research - Recherche Opérationnelle %D 2016 %P 363-374 %V 50 %N 2 %I EDP-Sciences %U http://archive.numdam.org/articles/10.1051/ro/2015040/ %R 10.1051/ro/2015040 %G en %F RO_2016__50_2_363_0
Milostan, Maciej; Lukasiak, Piotr. DomGen-Graph based method for protein domain delineation. RAIRO - Operations Research - Recherche Opérationnelle, Special issue: Research on Optimization and Graph Theory dedicated to COSI 2013 / Special issue: Recent Advances in Operations Research in Computational Biology, Bioinformatics and Medicine, Tome 50 (2016) no. 2, pp. 363-374. doi : 10.1051/ro/2015040. http://archive.numdam.org/articles/10.1051/ro/2015040/
Scop database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res. 32 (2004) D226–D229. | DOI
, , , , and ,Domanspattern based method for protein domain boundaries prediction and analysis. Found. Comput. Decision Sci. 36 (2011) 99.
, , , , and ,The protein data bank. Nucleic Acids Res. 28 (2000) 235–242. | DOI
, , , , , , and ,Complexity issues in computational biology. Fund. Inform. 118 (2012) 385–401. | MR | Zbl
and ,Predicting secondary structures of proteins. IEEE Eng. Med. Biol. Mag. 24 (2005) 88–94. | DOI
, and ,Application of tabu search strategy for finding low energy structure of protein. Computational Intelligence Techniques in Bioinformatics. Artif. Intell. Med. 35 (2005) 135–145. | DOI
, and ,New machine learning methods for prediction of protein secondary structures. Control and Cybernet. 36 (2007) 183–201.
, and ,A threading-based method (findsite) for ligand-binding site prediction and functional annotation. Proc. Natl. Acad. Sci. USA 105 (2008) 129–34. | DOI
and ,A novel method to compare protein structures using local descriptors. BMC Bioinform. 12 (2011) 344. | DOI
and ,I. Dhillon, Y. Guan and B. Kulis, A fast kernel-based multilevel algorithm for graph clustering. In Proc. of the eleventh ACM SIGKDD international conference on Knowledge discovery in Data Mining. ACM (2005) 629–634.
An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30 (2002) 1575–1584. | DOI
, and ,Assessment of domain boundary predictions and the prediction of intramolecular contacts in casp8. Proteins: Structure, Function and Bioinform. 77 (2009) 196–209. | DOI
, , and ,L.R. Ford and D.R. Fulkerson, Flows in Networks, Vol. 1962. Princeton Princeton University Press (1962). | MR | Zbl
G-pas 2.0–an improved version of protein alignment tool with an efficient backtracking routine on multiple gpus. Bull. Pol. Acad. Sci.: Tech. Sci. 60 (2012) 491–494.
, , and ,G-dna–a highly efficient multi-gpu/mpi tool for aligning nucleotide reads. Bull. Pol. Acad. Sci.: Tech. Sci. 61 (2013) 989–992.
, , , and ,Improving the performance of domainparser for structural domain partition using neural network. Nucleic Acids Res. 31 (2003) 944–952. | DOI
, , and ,A systematic comparison of protein structure classifications: Scop, cath and fssp. Structure 7 (1999) 1099–1112. | DOI
and ,Protein structure comparison by alignment of distance matrices. J. Molec. Biol. 233 (1993) 123–138. | DOI
and ,The fssp database of structurally aligned protein fold families. Nucleic Acids Res. 22 (1994) 3600–9.
and ,Parser for protein folding units. Proteins: Structure, Function and Bioinform. 19 (1994) 256–268.
and ,Mapping the protein universe. Science 273 (1996) 595–602. | DOI
and ,A novel approach to fold recognition using sequence-derived properties from sets of structurally similar local fragments of proteins. Bioinform. 19 (2003) ii81–ii91. | DOI
, , and ,Domain assignment for protein structures using a consensus approach: characterization and analysis. Protein Sci. 7 (1998) 233–242. | DOI
, , , , and ,Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22 (1983) 2577–2637. | DOI
and ,Web and grid technologies in bioinformatics, computational and systems biology: A review. Curr. Bioinform. 3 (2008) 10–31. | DOI
, , , and ,Sequence-based prediction of protein domains. Nucleic Acids Res. 32 (2004) 3522–3530. | DOI
and ,Some operations research methods for analyzing protein sequences and structures. Ann. Oper. Res. 175 (2010) 9–35. | DOI | MR | Zbl
, and ,Scop: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–540. | DOI
, , and ,Spectral methods for graph clustering–a survey. Eur. J. Oper. Res. 211 (2011) 221–231. | DOI | MR | Zbl
and ,Protein-binding site prediction based on three-dimensional protein modeling. Proteins: Structure, Function, and Bioinform. 77 (2009) 152–156. | DOI
, and ,The cath domain structure database and related resources gene3d and dhs provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 33 (2004) D247–D251. | DOI
, , , , , , , , , , , , , , , , , , , , and ,The cath domain structure database and related resources gene3d and dhs provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 33 (2005) D247–D251. | DOI
, et al.Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10 (2006) 191–218. | DOI | MR | Zbl
and ,Graph clustering. Comput. Sci. Rev. 1 (2007) 27–64. | DOI | Zbl
,Assessment of ligand-binding residue predictions in casp9. Proteins: Structure, Function, and Bioinform. 79 (2011) 126–136. | DOI
, , , and ,Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci. 4 (1995) 872–884. | DOI
and ,A procedure for detecting structural domains in proteins. Protein science: a Publication of the Protein Society 4 (1995) 103. | DOI
,The orderly colored longest path problem – a survey of applications and new algorithms. RAIRO: RO 48 (2014) 25–51. | DOI | Numdam | MR | Zbl
, , and ,Protein structure: insights from graph theory. J. Theoret. Comput. Chem. 1 (2002) 187–211. | DOI
, and ,Multi-agent model of hepatitis c virus infection. Art. Intell. Med. 60 (2014) 123–131. | DOI
, , and ,Identification of structural domains in proteins by a graph heuristic. Proteins: Structure, Function, and Bioinform. 35 (1999) 338–352. | DOI
, and ,Protein domain decomposition using a graph-theoretic approach. Bioinform. 16 (2000) 1091–1104. | DOI
, and ,Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 100 (1971) 68–86. | DOI | Zbl
,I-tasser server for protein 3d structure prediction. BMC Bioinform. 9 (2008) 40. | DOI
,Protein structure prediction: when is it useful? Curr. Opin. Struct. Biol. 19 (2009) 145–155. | DOI
,Minimum spanning tree based split-and-merge: A hierarchical clustering method. Inform. Sci. 181 (2011) 3397–3410. | DOI
, and ,Cité par Sources :