Schema for Human Proteins - Human Proteins Mapped by Chained tBLASTn


Database: bosTau4
Primary Table: blastHg18KG
Row Count: 89,273
Format description: Summary info about a patSpace alignment

field	example	SQL type	info	description
`bin`	585	`smallint unsigned`	range	Indexing field to speed chromosome range queries.
`matches`	147	`int unsigned`	range	Number of matching bases that aren't repeats
`misMatches`	79	`int unsigned`	range	Number of bases that don't match
`repMatches`	0	`int unsigned`	range	Number of bases that match but are part of repeats
`nCount`	0	`int unsigned`	range	Number of 'N' bases
`qNumInsert`	0	`int unsigned`	range	Number of inserts in query
`qBaseInsert`	0	`int unsigned`	range	Number of bases inserted in query
`tNumInsert`	0	`int unsigned`	range	Number of inserts in target
`tBaseInsert`	0	`int unsigned`	range	Number of bases inserted in target
`strand`	+-	`char(2)`	values	+ or - for strand. First character is the query strand, second is the target strand (optional)
`qName`	NM_001288	`varchar(255)`	values	Query sequence name
`qSize`	240	`int unsigned`	range	Query sequence size
`qStart`	12	`int unsigned`	range	Alignment start position in query
`qEnd`	240	`int unsigned`	range	Alignment end position in query
`tName`	chr1	`varchar(255)`	values	Target sequence name
`tSize`	161106243	`int unsigned`	range	Target sequence size
`tStart`	103398	`int unsigned`	range	Alignment start position in target
`tEnd`	117528	`int unsigned`	range	Alignment end position in target
`blockCount`	5	`int unsigned`	range	Number of blocks in alignment
`blockSizes`	36,42,36,58,54,	`longblob`		Size of each block
`qStarts`	12,48,90,128,186,	`longblob`		Start of each block in query.
`tStarts`	160988715,160989738,1609904...	`longblob`		Start of each block in target.
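The three list columns (`blockSizes`, `qStarts`, `tStarts`) are stored as comma-separated integers with a trailing comma, one entry per block. As a minimal Python sketch, assuming a row has been exported as tab-separated text in the column order of the table above, it can be parsed and its blocks recovered as follows. `parse_row`, `parse_int_list` and `blocks` are hypothetical helper names, not part of any UCSC tool.

```python
# Column order as listed in the schema table (assumed export order).
COLUMNS = ["bin", "matches", "misMatches", "repMatches", "nCount",
           "qNumInsert", "qBaseInsert", "tNumInsert", "tBaseInsert",
           "strand", "qName", "qSize", "qStart", "qEnd",
           "tName", "tSize", "tStart", "tEnd",
           "blockCount", "blockSizes", "qStarts", "tStarts"]
TEXT_COLUMNS = {"strand", "qName", "tName"}
LIST_COLUMNS = {"blockSizes", "qStarts", "tStarts"}


def parse_int_list(blob):
    """Turn '36,42,36,' into [36, 42, 36]; empty pieces from the trailing comma are skipped."""
    return [int(x) for x in blob.split(",") if x]


def parse_row(line):
    """Parse one tab-separated row into a dict keyed by column name."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != len(COLUMNS):
        raise ValueError(f"expected {len(COLUMNS)} fields, got {len(fields)}")
    row = {}
    for name, value in zip(COLUMNS, fields):
        if name in TEXT_COLUMNS:
            row[name] = value
        elif name in LIST_COLUMNS:
            row[name] = parse_int_list(value)
        else:
            row[name] = int(value)
    # Each list column must hold exactly blockCount entries.
    for name in LIST_COLUMNS:
        if len(row[name]) != row["blockCount"]:
            raise ValueError(f"{name} length does not match blockCount")
    return row


def blocks(row):
    """Return (qStart, tStart, size) for each aligned block."""
    return list(zip(row["qStarts"], row["tStarts"], row["blockSizes"]))
```

For example, the first sample row below, with `repMatches` and `nCount` filled in as 0 (an assumption, since the sample rows omit those two columns), parses into five blocks, and the last block ends at 186 + 54 = 240, which equals `qEnd`.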

Sample Rows

bin	matches	misMatches	qNumInsert	qBaseInsert	tNumInsert	tBaseInsert	strand	qName	qSize	qStart	qEnd	tName	tSize	tStart	tEnd	blockCount	blockSizes	qStarts	tStarts
585	147	79	0	0	0	0	+-	NM_001288	240	12	240	chr1	161106243	103398	117528	5	36,42,36,58,54,	12,48,90,128,186,	160988715,160989738,160990434,160991704,161002683,
585	147	79	0	0	0	0	+-	X87689	240	12	240	chr1	161106243	103398	117528	5	36,42,36,58,54,	12,48,90,128,186,	160988715,160989738,160990434,160991704,161002683,
73	427	202	6	56	4	75	+-	AF448439	704	0	704	chr1	161106243	103398	158259	16	67,25,3,40,6,4,53,75,35,42,52,36,42,36,60,53,	0,67,93,101,149,174,191,254,329,364,406,476,512,554,591,651,	160947984,160948242,160948317,160948326,160948446,160948464,160948476,160948635,160948869,160948977,160949109,160988715,16098973 ...
73	427	202	6	56	4	75	+-	NM_053277	704	0	704	chr1	161106243	103398	158259	16	67,25,3,40,6,4,53,75,35,42,52,36,42,36,60,53,	0,67,93,101,149,174,191,254,329,364,406,476,512,554,591,651,	160947984,160948242,160948317,160948326,160948446,160948464,160948476,160948635,160948869,160948977,160949109,160988715,16098973 ...
585	172	50	0	0	0	0	+-	NM_016929	251	21	248	chr1	161106243	103401	117528	5	36,41,34,59,52,	21,57,100,137,196,	160988715,160989738,160990437,160991704,161002686,
585	176	49	0	0	0	0	+-	NM_013943	253	24	251	chr1	161106243	103401	117528	5	36,43,35,59,52,	24,60,103,140,199,	160988715,160989738,160990437,160991704,161002686,
585	150	74	0	0	0	0	+-	BC005367	247	19	245	chr1	161106243	103404	117528	5	35,42,36,60,51,	19,55,97,134,194,	160988715,160989738,160990434,160991701,161002686,
585	150	74	0	0	0	0	+-	NM_001289	247	19	245	chr1	161106243	103404	117528	5	35,42,36,60,51,	19,55,97,134,194,	160988715,160989738,160990434,160991701,161002686,
585	133	37	0	0	0	0	+-	AK075144	205	21	196	chr1	161106243	114362	117528	4	36,41,34,59,	21,57,100,137,	160988715,160989738,160990437,160991704,
586	62	66	1	3	14	165	++	BC111984	201	4	135	chr1	161106243	157288	157837	16	9,13,8,5,6,2,5,4,5,11,7,7,14,10,5,17,	4,13,26,34,39,45,47,52,56,61,75,82,89,103,113,118,	157288,157324,157384,157414,157432,157471,157507,157525,157540,157570,157603,157642,157672,157717,157765,157786,
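The `bin` column supports fast range queries. Assuming this table uses the standard UCSC binning scheme (five levels, smallest bins of 128 kb, each level eight times larger than the one below), a feature's bin is the smallest bin that fully contains it, and a range query only needs to look in the bins that could overlap the requested interval. The Python sketch below illustrates that scheme; the function names are hypothetical.

```python
# Standard UCSC binning scheme (assumed): bin offsets for each level,
# from the smallest bins (128 kb = 2**17 bp) up to one 512 Mb bin.
BIN_OFFSETS = [512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0]
FIRST_SHIFT = 17   # log2 of the smallest bin size
NEXT_SHIFT = 3     # each level is 2**3 = 8 times larger


def bin_from_range(start, end):
    """Smallest bin that fully contains the 0-based, half-open [start, end)."""
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise ValueError("range exceeds the standard binning scheme (512 Mb)")


def overlapping_bins(start, end):
    """Every bin that could hold a feature overlapping [start, end)."""
    bins = []
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins
```

Applied to the sample rows, `bin_from_range` reproduces the stored values: 585 for chr1 103398-117528, 73 for 103398-158259 (which crosses a 128 kb boundary and so moves up one level), and 586 for 157288-157837. A range query would then restrict `bin` to the values from `overlapping_bins` and also test `tStart < end AND tEnd > start`, since bins only narrow the search.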

Note: all start coordinates in our database are 0-based, not 1-based, and end coordinates are exclusive (half-open intervals).
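Because starts are 0-based and ends are exclusive, converting a row to the 1-based, fully closed position shown in a genome browser only shifts the start, and an interval's length is simply end minus start. A minimal Python sketch, with hypothetical helper names:

```python
def to_browser_position(t_name, t_start, t_end):
    """Convert 0-based, half-open [tStart, tEnd) to 1-based, closed 'chrom:start-end'."""
    return f"{t_name}:{t_start + 1}-{t_end}"


def interval_length(start, end):
    """With half-open coordinates, no +1 correction is needed."""
    return end - start
```

For the first sample row, `tStart` 103398 and `tEnd` 117528 become chr1:103399-117528, and the query span from `qStart` 12 to `qEnd` 240 covers 228 positions.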

Human Proteins (blastHg18KG) Track Description

Description

This track contains tBLASTn alignments, against the cow (bosTau4) genome, of the peptides from the predicted and known genes identified in the human (hg18) UCSC Genes track.

Methods

First, the predicted proteins from the human Known Genes track were aligned with the human genome using the blat program to discover exon boundaries. Next, the amino acid sequence of each exon was aligned with the cow genome sequence using the tBLASTn program. Finally, the putative cow exons were chained together using an organism-specific maximum gap size but no gap penalty. For each query protein, the single best exon chain was included if it extended over more than 60% of the protein. Additional exon chains were also included if they extended over more than 60% of the query and matched at least 60% of the protein's amino acids.
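The chaining and filtering steps can be illustrated with a simplified Python sketch. It is not the pipeline used to build this track: the `ExonHit` record, the per-hit score, the restriction to hits on a single target chromosome and strand, and the reading of the two 60% thresholds (coverage as the chain's query span divided by the protein length, identity as matched amino acids divided by the protein length) are all assumptions. Chaining is done with a standard dynamic program over hits sorted by target position, with a maximum gap and no gap penalty.

```python
from dataclasses import dataclass


@dataclass
class ExonHit:
    """Hypothetical record for one tBLASTn hit of a single human exon."""
    q_start: int    # start in the query protein, amino acids, 0-based
    q_end: int      # end in the query protein, exclusive
    t_start: int    # start on the cow chromosome, nucleotides, 0-based
    t_end: int      # end on the cow chromosome, exclusive
    score: float    # alignment score of the hit (e.g. a bit score)
    matches: int    # identical amino acids in the hit


def best_chain(hits, max_gap):
    """Return the highest-scoring chain of collinear hits.

    Hit j can precede hit i if j ends before i starts in both the query
    and the target and the target gap between them is at most max_gap.
    There is no gap penalty: a chain's score is the sum of its hit scores.
    """
    if not hits:
        return []
    hits = sorted(hits, key=lambda h: (h.t_start, h.q_start))
    best = [h.score for h in hits]   # best chain score ending at each hit
    prev = [None] * len(hits)        # back-pointers for traceback
    for i, hi in enumerate(hits):
        for j in range(i):
            hj = hits[j]
            compatible = (hj.q_end <= hi.q_start
                          and hj.t_end <= hi.t_start
                          and hi.t_start - hj.t_end <= max_gap)
            if compatible and best[j] + hi.score > best[i]:
                best[i] = best[j] + hi.score
                prev[i] = j
    i = max(range(len(hits)), key=lambda k: best[k])
    chain = []
    while i is not None:
        chain.append(hits[i])
        i = prev[i]
    return chain[::-1]


def keep_chain(chain, q_size, is_best, min_frac=0.6):
    """Apply the 60% coverage and identity thresholds (assumed reading)."""
    coverage = (chain[-1].q_end - chain[0].q_start) / q_size
    identity = sum(h.matches for h in chain) / q_size
    if is_best:
        return coverage > min_frac
    return coverage > min_frac and identity >= min_frac
```

For example, two hits covering amino acids 0-50 and 50-120 of a 150-residue protein, 1,850 bp apart on the same chromosome, chain together when `max_gap` is 10,000 but not when it is 1,000. The combined chain spans 80% of the protein, so it passes the coverage threshold.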

Credits

tBLASTn is part of the NCBI BLAST tool set. For more information on BLAST, see Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ. Basic local alignment search tool. J Mol Biol. 1990 Oct 5;215(3):403-410.

Blat was written by Jim Kent. The remaining utilities used to produce this track were written by Jim Kent or Brian Raney.