Updating the sequence-based classification of glycosyl hydrolases

This sequence then serves as a seed for the family that is gradually extended with sequences that share statistically significant similarity.
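The seed-and-extend idea can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the names `extend_family` and `toy_similarity` are hypothetical, and the shared 3-mer score with a fixed cutoff is only a stand-in for a real statistical significance test (for example, an alignment E-value). It is not the procedure used by CAZy.

```python
from typing import Callable, Iterable


def kmer_set(seq: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def toy_similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard index of shared k-mers (illustrative stand-in, not a statistical test)."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def extend_family(
    seed: str,
    candidates: Iterable[str],
    is_similar: Callable[[str, str], bool],
) -> list[str]:
    """Grow a family from a seed: a candidate joins as soon as it is
    similar to any current member; repeat until no candidate is added."""
    family = [seed]
    remaining = list(candidates)
    grew = True
    while grew:
        grew = False
        for seq in list(remaining):  # iterate over a copy while removing
            if any(is_similar(seq, member) for member in family):
                family.append(seq)
                remaining.remove(seq)
                grew = True
    return family


# Hypothetical toy sequences; the 0.5 cutoff is arbitrary.
seed = "MKTAYIAKQR"
close = "MKTAYIAKQL"      # similar to the seed
distant = "TAYIAKQLGG"    # similar to `close`, but below the cutoff against the seed
unrelated = "WWWWWWWWWW"

family = extend_family(
    seed,
    [distant, unrelated, close],
    lambda a, b: toy_similarity(a, b) >= 0.5,
)
print(family)
```

In this toy run the `distant` sequence is only recruited after `close` has joined, which mirrors how a family can grow gradually beyond the sequences that resemble the seed directly.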

At present, CAZy covers approximately 300 protein families across several classes of enzyme activities. In addition to the protein families that are well curated by the CAZy database, CAZymes are known to contain domains that do not act on carbohydrates, including other enzymes (such as proteases, myosin motors or phosphatases) and a variety of protein–protein or protein–cell wall binding domains (cohesins, SLHs, TPR, etc.).

Significantly, the CAZy families, originally created in the 1990s by hydrophobic cluster analysis of the very limited number of sequences then available (2–6) and later complemented by BLAST- and HMMER-based sequence similarity approaches, have on the whole withstood the test of time in spite of a hundred-fold increase in the number of sequences.

The CAZy database combines (i) sequence annotations from publicly available sources, namely the NCBI, including taxonomic, sequence and reference information, (ii) family classification and (iii) known functional information.

Because EC numbers are in short supply relative to the number of experimentally characterized functions, some incomplete EC numbers such as 3.2.1.-, 2.4.1.-, 2.4.2.- and 2.4.99.- are also included in the database.
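As an illustration of how incomplete EC numbers might be handled alongside complete ones, here is a minimal sketch. The helper names (`parse_ec`, `is_incomplete`, `ec_matches`) are hypothetical and do not reflect the CAZy implementation; the sketch assumes the plain four-field notation, with `-` allowed only in trailing positions.

```python
import re

# Four dot-separated fields; the first is numeric, the others numeric or "-".
EC_PATTERN = re.compile(r"^(\d+)\.(\d+|-)\.(\d+|-)\.(\d+|-)$")


def parse_ec(ec: str) -> tuple[str, ...]:
    """Split an EC number into its four fields, rejecting malformed input."""
    match = EC_PATTERN.match(ec.strip())
    if match is None:
        raise ValueError(f"not a valid EC number: {ec!r}")
    fields = match.groups()
    # Once a field is "-", every later field must also be "-".
    if "-" in fields:
        first_dash = fields.index("-")
        if any(f != "-" for f in fields[first_dash:]):
            raise ValueError(f"'-' must only appear in trailing fields: {ec!r}")
    return fields


def is_incomplete(ec: str) -> bool:
    """True if at least one field of the EC number is unspecified."""
    return "-" in parse_ec(ec)


def ec_matches(partial: str, full: str) -> bool:
    """True if `full` falls under `partial`, treating "-" as a wildcard."""
    return all(
        p == "-" or p == f
        for p, f in zip(parse_ec(partial), parse_ec(full))
    )


print(is_incomplete("3.2.1.-"))          # incomplete glycosidase entry
print(ec_matches("3.2.1.-", "3.2.1.4"))  # 3.2.1.4 is cellulase
print(ec_matches("2.4.1.-", "3.2.1.4"))
```

Treating `-` as a wildcard lets a partially characterized activity be grouped with its EC sub-subclass without claiming a specific fourth digit.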

In addition, because the functions described in CAZy are solely enzymatic, the complementary binding and inhibitory functions known to be associated with several CAZy proteins will be curated and explored in the near future.

The CAZy family classification system covers all taxonomic groups and provides the basis for a common nomenclature for CAZymes among glycobiologists (11,12), who generally specialize in only a few specific groups of organisms.

Day-to-day inspection of new enzyme characterizations reported in the literature has regularly led, and continues to lead, to the definition of new enzyme families. Because of this continuous effort of data addition, new families are frequently added and reflect the advances in experimental characterization of CAZymes. New families are created only when at least one biochemically characterized member is available for which a sequence is known and the information has been published in the peer-reviewed scientific literature.

Collectively designated as Carbohydrate-Active enZymes (CAZymes), these enzymes build and break down complex carbohydrates and glycoconjugates for a large body of biological roles (collectively studied under the term Glycobiology). CAZymes therefore usually have to perform their function with high specificity.

Presently, only genomes released through these GenBank releases are analyzed regularly, whereas protein predictions from other genomes are analyzed upon request as part of collaborative efforts.

