Cysteines Track

Description

This track shows the location of cysteines as blue blocks along the peptide chain.
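As a rough illustration of what the track displays, the positions of the blocks can be recovered from a peptide sequence by scanning for the one-letter code C. This is a minimal sketch, not the browser's own code; the function name cysteine_positions and the example sequence are made up for illustration.

```python
def cysteine_positions(peptide: str) -> list[int]:
    """Return the 1-based positions of cysteine (C) residues in a peptide."""
    return [
        position
        for position, residue in enumerate(peptide.upper(), start=1)
        if residue == "C"
    ]


# Hypothetical 10-residue peptide, used only as an example.
example_peptide = "MKCLLACGTC"
print(cysteine_positions(example_peptide))  # [3, 7, 10]
```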

Refer to the Number of Cysteines histogram to compare this protein's cysteine count to a genome-wide statistical distribution of cysteine abundance in all proteins found in this assembly.

Discussion

Cysteine is a relatively uncommon amino acid, with a genome-wide abundance of 2.3%, and is most often found paired in disulfides. Disulfides in human proteins are found almost exclusively in oxidative environments, such as on the cell surface or in lysosomes, but not in the cytoplasm or the nucleus. Therefore, a high cysteine content, after discounting iron-sulfur cluster proteins, argues against a cytoplasmic location. Similarly, the presence of glycosylation sites, which are cell-surface indicators, suggests that the cysteines in a protein will usually be paired.
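To put the 2.3% figure in context, a protein's own cysteine fraction can be compared against it. The sketch below assumes the sequence is given as a string of one-letter codes; the names cysteine_fraction and GENOME_WIDE_CYS_FRACTION are illustrative, and the ratio it prints is only a crude indicator, not a localization predictor.

```python
# Genome-wide cysteine abundance quoted in the text (2.3%).
GENOME_WIDE_CYS_FRACTION = 0.023


def cysteine_fraction(peptide: str) -> float:
    """Return the fraction of residues in the peptide that are cysteine."""
    if not peptide:
        return 0.0
    return peptide.upper().count("C") / len(peptide)


# Hypothetical peptide, used only as an example.
example_peptide = "MKCLLACGTC"
fraction = cysteine_fraction(example_peptide)
enrichment = fraction / GENOME_WIDE_CYS_FRACTION
print(f"cysteine fraction: {fraction:.3f}")
print(f"relative to genome-wide abundance: {enrichment:.1f}x")
```

A ratio well above 1 only flags the protein as cysteine-rich; iron-sulfur cluster proteins still have to be discounted by other means.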

The Cysteines track shows where cysteines occur along the peptide chain, but does not predict whether a given cysteine occurs in a disulfide or where its partner is located. The two cysteines linked in a disulfide are not necessarily located near one another in the amino acid sequence or even in the same chain. Rather, the correct cysteine residues are brought into proximity during protein folding, which may bring remote cysteine residues together. Mammals have extensive disulfide isomerase activity in the endoplasmic reticulum, which is believed to chaperone disulfides toward the correct pairing (and thus correct folding).

Cystine-pair information is available when the 3-D structure is known, when the selected protein can be threaded to a known structure, when a satisfactory ab initio model exists, or when homology to a protein with known disulfides can be established by sequence alignment. UniProtKB has also collected experimentally determined disulfides from the literature.

Various web tools can predict disulfide bonds with varying degrees of success [1-4]. Intermolecular disulfides are fairly common, yet the partners are often unknown and so are very difficult to take into account. Where reliably predictable, disulfides provide a strong geometrical constraint on ab initio structure prediction for applicable proteins.

Disulfide bonds are fairly well-conserved evolutionarily in many protein families, even as the percent identity drops below 35%. However, in some deeply diverged families such as sulfatases, new disulfides have emerged in subfamily lineages and no ancient disulfides are retained. Conserved cysteines that are not part of an active site are distinguishable from sporadic cysteines and are likely in a disulfide. If the family is large enough and the number of cysteines fairly small, the pairing pattern can sometimes be inferred, starting with the two best-conserved cysteines found in the deepest alignment. Other proteins, with complex disulfide knots, are intractable to homology methods.
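The first step of that inference, ranking cysteine-containing alignment columns by how well they are conserved, can be sketched as follows. This is an illustrative toy, assuming the alignment is supplied as equal-length strings with "-" for gaps; the function names and the three-sequence alignment are invented, and the code does not attempt to separate active-site cysteines or assign pairings.

```python
def cysteine_conservation(alignment: list[str]) -> dict[int, float]:
    """Map each 1-based alignment column containing a cysteine to the
    fraction of sequences that have a cysteine in that column."""
    if not alignment:
        return {}
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all aligned sequences must have the same length")
    scores = {}
    for column in range(length):
        n_cys = sum(1 for row in alignment if row[column].upper() == "C")
        if n_cys:
            scores[column + 1] = n_cys / len(alignment)
    return scores


def rank_cysteine_columns(scores: dict[int, float]) -> list[int]:
    """Order columns from best to least conserved (ties broken by position)."""
    return sorted(scores, key=lambda column: (-scores[column], column))


# Hypothetical toy alignment of three family members.
toy_alignment = [
    "MKCLA-CGTC",
    "MRCLACCG-C",
    "MKCIA-SGTC",
]
scores = cysteine_conservation(toy_alignment)
print(rank_cysteine_columns(scores))  # [3, 10, 7, 6]
```

In this toy case, columns 3 and 10 hold a cysteine in every sequence and would be the first candidates for a conserved pair, while the cysteine seen only once in column 6 looks sporadic.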

Certain protein domains have characteristic cysteine or disulfide motifs, for example, the CxxxCxxC motif that binds the 4Fe-4S cluster in radical SAM enzymes. These are often preserved even when the domain finds itself in a much larger protein with many additional cysteines. The domain tool Pfam provides 28 domain listings under disulfide.
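A motif such as CxxxCxxC, where x stands for any residue, can be located in a sequence with a regular expression. The following is a minimal sketch; the function name and example peptides are hypothetical, and a match is only a hint, since the pattern can also occur by chance in cysteine-rich proteins.

```python
import re

# CxxxCxxC: C, any three residues, C, any two residues, C.
# The lookahead lets overlapping occurrences be reported as well.
CXXXCXXC = re.compile(r"(?=(C.{3}C.{2}C))")


def find_cxxxcxxc(peptide: str) -> list[int]:
    """Return the 1-based start positions of CxxxCxxC motifs in a peptide."""
    return [match.start() + 1 for match in CXXXCXXC.finditer(peptide.upper())]


# Hypothetical peptides, used only as examples.
print(find_cxxxcxxc("MACAAACAACGG"))  # [3]
print(find_cxxxcxxc("MACAACAAC"))     # []
```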

Another special case occurs in transmembrane proteins. For example, ectodomain cysteines will not be paired with transmembrane or endodomain cysteines. Being external to the cell, they are likely in disulfides although the pairing is not always resolved and indeed may be intermolecular.

Disulfide bond prediction references:

  1. Fariselli P, Casadio R. Prediction of disulfide connectivity in proteins. Bioinformatics 2001 Oct;17(10):957-64.
  2. Fariselli P, Riccobelli P, Casadio R. Role of evolutionary information in predicting the disulfide-bonding state of cysteine in proteins. Proteins 1999 Aug 15;36(3):340-6.
  3. Martelli PL, Fariselli P, Malaguti L, Casadio R. Prediction of the disulfide bonding state of cysteines in proteins with hidden neural networks. Protein Eng 2002 Dec;15(12):951-3.
  4. Mucchielli-Giorgi MH, Hazout S, Tuffery P. Predicting the disulfide bonding state of cysteines using protein descriptors. Proteins 2002 Feb 15;46(3):243-9.