DomGen-Graph based method for protein domain delineation

Milostan, Maciej; Lukasiak, Piotr

doi:10.1051/ro/2015040

Milostan, Maciej ¹ ; Lukasiak, Piotr ^1,²

¹ Poznan University of Technology, Institute of Computing Science, ul. Piotrowo 2, 60-965 Poznan, Poland
² Institute of Bioorganic Chemistry, Polish Academy of Sciences, ul. Noskowskiego 12/14, 61-704 Poznan, Poland

RAIRO - Operations Research - Recherche Opérationnelle, Tome 50 (2016) no. 2, pp. 363-374.

Résumé

The role of a protein depends heavily on its 3D shape, which is composed of semi-independent three-dimensional blocks called domains. Domains fold independently and constitute units of evolution. Most proteins contain multiple domains that are associated with a particular functions; moreover, the same domain can be found in different proteins. Automated recognition of domains can make prediction of proteins function easier and can support the analysis of proteins. Here, we propose a novel algorithm designed for domain recognition by identification of domain boundaries in the protein structure. The proposed algorithm uses a contact graph and an iterative approach to find meaningful clusters corresponding to the protein domains. The distinctive feature of the method is its effective complexity, that improves over other well-known methods, while holding a comparable level of correct domain assignments.

Reçu le : 2015-09-08
Accepté le : 2015-09-21

MR | 1 citation dans Numdam

DOI : 10.1051/ro/2015040

Classification : 68R10, 92-08
Mots clés : Graph theory, computational biology, protein structure

Affiliations des auteurs :

Milostan, Maciej ¹ ; Lukasiak, Piotr ^{1, 2}

@article{RO_2016__50_2_363_0,
     author = {Milostan, Maciej and Lukasiak, Piotr},
     title = {DomGen-Graph based method for protein domain delineation},
     journal = {RAIRO - Operations Research - Recherche Op\'erationnelle},
     pages = {363--374},
     publisher = {EDP-Sciences},
     volume = {50},
     number = {2},
     year = {2016},
     doi = {10.1051/ro/2015040},
     mrnumber = {3479876},
     language = {en},
     url = {http://archive.numdam.org/articles/10.1051/ro/2015040/}
}

TY  - JOUR
AU  - Milostan, Maciej
AU  - Lukasiak, Piotr
TI  - DomGen-Graph based method for protein domain delineation
JO  - RAIRO - Operations Research - Recherche Opérationnelle
PY  - 2016
SP  - 363
EP  - 374
VL  - 50
IS  - 2
PB  - EDP-Sciences
UR  - http://archive.numdam.org/articles/10.1051/ro/2015040/
DO  - 10.1051/ro/2015040
LA  - en
ID  - RO_2016__50_2_363_0
ER  -

%0 Journal Article
%A Milostan, Maciej
%A Lukasiak, Piotr
%T DomGen-Graph based method for protein domain delineation
%J RAIRO - Operations Research - Recherche Opérationnelle
%D 2016
%P 363-374
%V 50
%N 2
%I EDP-Sciences
%U http://archive.numdam.org/articles/10.1051/ro/2015040/
%R 10.1051/ro/2015040
%G en
%F RO_2016__50_2_363_0

Milostan, Maciej; Lukasiak, Piotr. DomGen-Graph based method for protein domain delineation. RAIRO - Operations Research - Recherche Opérationnelle, Tome 50 (2016) no. 2, pp. 363-374. doi : 10.1051/ro/2015040. http://archive.numdam.org/articles/10.1051/ro/2015040/

Bibliographie
Cité par

A. Andreeva, D. Howorth, S.E. Brenner, T.J.P. Hubbard, C. Chothia and A.G. Murzin, Scop database in 2004: refinements integrate structure and sequence family data. Nucleic Acids Res. 32 (2004) D226–D229. | DOI

M. Antczak, J. Blazewicz, P. Lukasiak, M. Milostan, N. Krasnogor and G. Palik, Domanspattern based method for protein domain boundaries prediction and analysis. Found. Comput. Decision Sci. 36 (2011) 99.

H.M. Berman, J. Westbrook, Z. Feng, G. Gilliland, T.N. Bhat, H. Weissig, I.N. Shindyalov and P.E. Bourne, The protein data bank. Nucleic Acids Res. 28 (2000) 235–242. | DOI

J. Blazewicz and M. Kasprzak, Complexity issues in computational biology. Fund. Inform. 118 (2012) 385–401. | MR | Zbl

J. Blazewicz, P.L. Hammer and P. Lukasiak, Predicting secondary structures of proteins. IEEE Eng. Med. Biol. Mag. 24 (2005) 88–94. | DOI

J. Blazewicz, P. Lukasiak and M. Milostan, Application of tabu search strategy for finding low energy structure of protein. Computational Intelligence Techniques in Bioinformatics. Artif. Intell. Med. 35 (2005) 135–145. | DOI

J. Blazewicz, P. Lukasiak and S. Wilk, New machine learning methods for prediction of protein secondary structures. Control and Cybernet. 36 (2007) 183–201.

M. Brylinski and J. Skolnick, A threading-based method (findsite) for ligand-binding site prediction and functional annotation. Proc. Natl. Acad. Sci. USA 105 (2008) 129–34. | DOI

P. Daniluk and B. Lesyng, A novel method to compare protein structures using local descriptors. BMC Bioinform. 12 (2011) 344. | DOI

I. Dhillon, Y. Guan and B. Kulis, A fast kernel-based multilevel algorithm for graph clustering. In Proc. of the eleventh ACM SIGKDD international conference on Knowledge discovery in Data Mining. ACM (2005) 629–634.

A.J. Enright, S. Van Dongen and C.A. Ouzounis, An efficient algorithm for large-scale detection of protein families. Nucleic Acids Res. 30 (2002) 1575–1584. | DOI

I. Ezkurdia, O. Grana, J.M.G. Izarzugaza and M.L. Tress, Assessment of domain boundary predictions and the prediction of intramolecular contacts in casp8. Proteins: Structure, Function and Bioinform. 77 (2009) 196–209. | DOI

L.R. Ford and D.R. Fulkerson, Flows in Networks, Vol. 1962. Princeton Princeton University Press (1962). | MR | Zbl

W. Frohmberg, M. Kierzynka, J. Blazewicz and P. Wojciechowski, G-pas 2.0–an improved version of protein alignment tool with an efficient backtracking routine on multiple gpus. Bull. Pol. Acad. Sci.: Tech. Sci. 60 (2012) 491–494.

W. Frohmberg, M. Kierzynka, J. Blazewicz, P. Gawron and P. Wojciechowski, G-dna–a highly efficient multi-gpu/mpi tool for aligning nucleotide reads. Bull. Pol. Acad. Sci.: Tech. Sci. 61 (2013) 989–992.

J. Guo, D. Xu, D. Kim and Y. Xu, Improving the performance of domainparser for structural domain partition using neural network. Nucleic Acids Res. 31 (2003) 944–952. | DOI

C. Hadley and D.T. Jones, A systematic comparison of protein structure classifications: Scop, cath and fssp. Structure 7 (1999) 1099–1112. | DOI

L. Holm and C. Sander, Protein structure comparison by alignment of distance matrices. J. Molec. Biol. 233 (1993) 123–138. | DOI

L. Holm and C. Sander, The fssp database of structurally aligned protein fold families. Nucleic Acids Res. 22 (1994) 3600–9.

L. Holm and C. Sander, Parser for protein folding units. Proteins: Structure, Function and Bioinform. 19 (1994) 256–268.

L. Holm and C. Sander, Mapping the protein universe. Science 273 (1996) 595–602. | DOI

T.R. Hvidsten, A. Kryshtafovych, J. Komorowski and K. Fidelis, A novel approach to fold recognition using sequence-derived properties from sets of structurally similar local fragments of proteins. Bioinform. 19 (2003) ii81–ii91. | DOI

S. Jones, M. Stewart, A. Michie, M.B. Swindells, C. Orengo and J.M. Thornton, Domain assignment for protein structures using a consensus approach: characterization and analysis. Protein Sci. 7 (1998) 233–242. | DOI

W. Kabsch and C. Sander, Dictionary of protein secondary structure: pattern recognition of hydrogen-bonded and geometrical features. Biopolymers 22 (1983) 2577–2637. | DOI

N. Krasnogor, A.A. Shah, D. Barthel, P. Lukasiak and J. Blazewicz, Web and grid technologies in bioinformatics, computational and systems biology: A review. Curr. Bioinform. 3 (2008) 10–31. | DOI

J. Liu and B. Rost, Sequence-based prediction of protein domains. Nucleic Acids Res. 32 (2004) 3522–3530. | DOI

P. Lukasiak, J. Blazewicz and M. Milostan, Some operations research methods for analyzing protein sequences and structures. Ann. Oper. Res. 175 (2010) 9–35. | DOI | MR | Zbl

A.G. Murzin, S.E. Brenner, T. Hubbard and C. Chothia, Scop: a structural classification of proteins database for the investigation of sequences and structures. J. Mol. Biol. 247 (1995) 536–540. | DOI

M.C.V. Nascimento and A.C.P.L.F. De Carvalho, Spectral methods for graph clustering–a survey. Eur. J. Oper. Res. 211 (2011) 221–231. | DOI | MR | Zbl

M. Oh, K. Joo and J. Lee, Protein-binding site prediction based on three-dimensional protein modeling. Proteins: Structure, Function, and Bioinform. 77 (2009) 152–156. | DOI

F. Pearl, A. Todd, I. Sillitoe, M. Dibley, O. Redfern, T. Lewis, C. Bennett, R. Marsden, A. Grant, D. Lee, A. Akpor, M. Maibaum, A. Harrison, T. Dallman, G. Reeves, I. Diboun, S. Addou, S. Lise, C. Johnston, A. Sillero, J. Thornton and C. Orengo, The cath domain structure database and related resources gene3d and dhs provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 33 (2004) D247–D251. | DOI

F. Pearl, et al. The cath domain structure database and related resources gene3d and dhs provide comprehensive domain family information for genome analysis. Nucleic Acids Res. 33 (2005) D247–D251. | DOI

P. Pons and M. Latapy, Computing communities in large networks using random walks. J. Graph Algorithms Appl. 10 (2006) 191–218. | DOI | MR | Zbl

S.E. Schaeffer, Graph clustering. Comput. Sci. Rev. 1 (2007) 27–64. | DOI | Zbl

T. Schmidt, J. Haas, T.G. Cassarino, and T. Schwede, Assessment of ligand-binding residue predictions in casp9. Proteins: Structure, Function, and Bioinform. 79 (2011) 126–136. | DOI

A.S. Siddiqui and G.J. Barton, Continuous and discontinuous domains: an algorithm for the automatic generation of reliable protein domain definitions. Protein Sci. 4 (1995) 872–884. | DOI

M.B. Swindells, A procedure for detecting structural domains in proteins. Protein science: a Publication of the Protein Society 4 (1995) 103. | DOI

M. Szachniuk, C.M. De Cola, G. Felici and J. Blazewicz, The orderly colored longest path problem – a survey of applications and new algorithms. RAIRO: RO 48 (2014) 25–51. | DOI | Numdam | MR | Zbl

S. Vishveshwara, K.V. Brinda and N. Kannan, Protein structure: insights from graph theory. J. Theoret. Comput. Chem. 1 (2002) 187–211. | DOI

S. Wasik, P. Jackowiak, M. Figlerowicz and J. Blazewicz, Multi-agent model of hepatitis c virus infection. Art. Intell. Med. 60 (2014) 123–131. | DOI

L. Wernisch, M. Hunting and S.J. Wodak, Identification of structural domains in proteins by a graph heuristic. Proteins: Structure, Function, and Bioinform. 35 (1999) 338–352. | DOI

Y. Xu, D. Xu and H.N. Gabow, Protein domain decomposition using a graph-theoretic approach. Bioinform. 16 (2000) 1091–1104. | DOI

C.T. Zahn, Graph-theoretical methods for detecting and describing gestalt clusters. IEEE Trans. Comput. 100 (1971) 68–86. | DOI | Zbl

Y. Zhang, I-tasser server for protein 3d structure prediction. BMC Bioinform. 9 (2008) 40. | DOI

Y. Zhang, Protein structure prediction: when is it useful? Curr. Opin. Struct. Biol. 19 (2009) 145–155. | DOI

C. Zhong, D. Miao and P. Franti, Minimum spanning tree based split-and-merge: A hierarchical clustering method. Inform. Sci. 181 (2011) 3397–3410. | DOI

Cité par Sources :