CpG islands are associated with genes, particularly housekeeping
genes, in vertebrates. CpG islands are common near
transcription start sites and may be associated with promoter
regions. Normally a C (cytosine) base followed immediately by a
G (guanine) base (a CpG) is rare in
vertebrate DNA because the Cs in such an arrangement tend to be
methylated. This methylation helps distinguish the newly synthesized
DNA strand from the parent strand, which aids in the final stages of
DNA proofreading after duplication. However, over evolutionary time,
methylated Cs tend to turn into Ts because of spontaneous
deamination. The result is that CpGs are relatively rare unless
there is selective pressure to keep them or a region is not methylated
for some other reason, perhaps having to do with the regulation of gene
expression. CpG islands are regions where CpGs are present at
significantly higher levels than is typical for the genome as a whole.
Methods
CpG islands were predicted by searching the sequence one base at a
time, scoring each dinucleotide (+17 for CG and -1 for others) and
identifying maximally scoring segments. Each segment was then
evaluated for the following criteria:
  GC content of 50% or greater
  length greater than 200 bp
  ratio of observed to expected CG dinucleotides greater than 0.6, where the
  expected number is based on the number of Gs and Cs in the segment
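The scan described above can be illustrated with a minimal sketch. This is not the program used to build the track, which is unpublished; it is a simplified reading of the description, assuming that a candidate segment starts at a CG, ends at the position where its running score peaked, and is closed once the running score drops below zero. The function name candidate_segments and its parameters are hypothetical.

```python
def candidate_segments(seq, cg_score=17, other_score=-1):
    """Return half-open (start, end) intervals of positively scoring runs.

    Simplified sketch: each dinucleotide adds cg_score if it is CG and
    other_score otherwise. A segment is closed at the position of its
    peak score once the running score falls below zero.
    """
    seq = seq.upper()
    segments = []
    score = 0      # running score of the open segment
    best = 0       # highest running score seen in the open segment
    start = 0      # start of the open segment
    best_end = 0   # end (exclusive) of the segment at its peak score
    for i in range(len(seq) - 1):
        step = cg_score if seq[i:i + 2] == "CG" else other_score
        if best == 0 and step > 0:
            start = i  # no segment is open, so this CG opens one
        score += step
        if score < 0:
            if best > 0:
                segments.append((start, best_end))
            score = best = 0
        elif score > best:
            best = score
            best_end = i + 2
    if best > 0:
        segments.append((start, best_end))
    return segments


if __name__ == "__main__":
    example = "AAAA" + "CG" * 5 + "AAAA"
    for s, e in candidate_segments(example):
        print(s, e, example[s:e])
```

Each interval returned would still need to be checked against the three criteria listed above before being reported as an island.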
The CpG count is the number of CG dinucleotides in the island.
The Percentage CpG is the ratio of CpG nucleotide bases
(twice the CpG count) to the length. The ratio of observed to expected
CpG is calculated according to the formula cited in
Gardiner-Garden and Frommer (1987) in the References section below:
Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
where N = length of sequence.
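As a worked illustration of these definitions, the following sketch computes the CpG count, Percentage CpG, GC percentage, and observed/expected ratio for a single segment, then applies the three criteria listed under Methods. The names cpg_stats and passes_criteria, and the dictionary keys, are hypothetical and chosen only for this example; returning 0.0 for the ratio when a segment has no C or no G is an assumption, since the formula is undefined in that case.

```python
def cpg_stats(segment):
    """Compute CpG statistics for one candidate segment."""
    seg = segment.upper()
    n = len(seg)
    cpg = seg.count("CG")   # number of CG dinucleotides
    c = seg.count("C")
    g = seg.count("G")
    return {
        "length": n,
        "cpg_count": cpg,
        # CpG bases (twice the CpG count) as a percentage of the length
        "per_cpg": 100.0 * 2 * cpg / n if n else 0.0,
        "per_gc": 100.0 * (c + g) / n if n else 0.0,
        # Obs/Exp CpG = Number of CpG * N / (Number of C * Number of G)
        # Assumption: report 0.0 when the denominator would be zero.
        "obs_exp": cpg * n / (c * g) if c and g else 0.0,
    }


def passes_criteria(stats):
    """Apply the GC content, length, and obs/exp thresholds."""
    return (
        stats["per_gc"] >= 50.0
        and stats["length"] > 200
        and stats["obs_exp"] > 0.6
    )


if __name__ == "__main__":
    stats = cpg_stats("ACGT" * 100)
    print(stats, passes_criteria(stats))
```

For the 400-base example, the segment contains 100 CG dinucleotides, 100 Cs, and 100 Gs, so the ratio is 100 * 400 / (100 * 100) = 4.0.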
Credits
This track was generated using a
modification of a program developed by G. Micklem and L. Hillier (unpublished).