The bigChain format describes a pairwise alignment that allows gaps in both sequences simultaneously, just as chain files do; however, bigChain files are compressed and indexed as bigBeds. Chain files are first converted to tab-separated text with hgLoadChain and then to bigChain files with the program bedToBigBed, run with the -as option to pull in a special autoSql (.as) file that defines the fields of the bigChain.
The bigChain files are in an indexed binary format. The main advantage of this format is that only those portions of the file needed to display a particular region are transferred to the Genome Browser server. Because of this, bigChain files have considerably faster display performance than regular chain files when working with large data sets. The bigChain file remains on your local web-accessible server (http, https or ftp), not on the UCSC server, and only the portion needed for the currently displayed chromosomal position is locally cached as a "sparse file". If you do not have access to a web-accessible server and need hosting space for your bigChain files, please see the Hosting section of the Track Hub Help documentation.
The following autoSql definition is used to specify bigChain pairwise alignment files. This definition, contained in the file bigChain.as, will be pulled in when the bedToBigBed utility is run with the -as=bigChain.as option.
table bigChain
"bigChain pairwise alignment"
(
string chrom; "Reference sequence chromosome or scaffold"
uint chromStart; "Start position in chromosome"
uint chromEnd; "End position in chromosome"
string name; "Name or ID of item, ideally both human readable and unique"
uint score; "Score (0-1000)"
char[1] strand; "+ or - for strand"
uint tSize; "size of target sequence"
string qName; "name of query sequence"
uint qSize; "size of query sequence"
uint qStart; "start of alignment on query sequence"
uint qEnd; "end of alignment on query sequence"
uint chainScore; "score from chain"
)
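To see how one row lines up with the definition above, here is a minimal Python sketch that splits a tab-separated bigChain row (for instance, a line printed by bigBedToBed) into the twelve named fields of bigChain.as, in the same order. BigChainRecord and parse_big_chain_line are hypothetical helpers written for illustration; they are not UCSC tools.

```python
from dataclasses import dataclass


# Hypothetical helper (not a UCSC tool): one bigChain row, fields in bigChain.as order.
@dataclass
class BigChainRecord:
    chrom: str        # reference sequence chromosome or scaffold
    chromStart: int   # start position in chromosome
    chromEnd: int     # end position in chromosome
    name: str         # name or ID of item
    score: int        # score (0-1000)
    strand: str       # + or - for strand
    tSize: int        # size of target sequence
    qName: str        # name of query sequence
    qSize: int        # size of query sequence
    qStart: int       # start of alignment on query sequence
    qEnd: int         # end of alignment on query sequence
    chainScore: int   # score from chain


def parse_big_chain_line(line: str) -> BigChainRecord:
    """Split one tab-separated bigChain (BED6+6) line into typed fields."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 12:
        raise ValueError(f"expected 12 fields, got {len(fields)}")
    (chrom, start, end, name, score, strand,
     t_size, q_name, q_size, q_start, q_end, chain_score) = fields
    if strand not in ("+", "-"):
        raise ValueError(f"strand must be + or -, got {strand!r}")
    return BigChainRecord(chrom, int(start), int(end), name, int(score), strand,
                          int(t_size), q_name, int(q_size), int(q_start),
                          int(q_end), int(chain_score))
```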
Note that the bedToBigBed utility uses a substantial amount of memory: approximately 25% more RAM than the uncompressed BED input file.
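Because the 25% figure is a rule of thumb, a quick estimate is simply the input size multiplied by 1.25; for a 4 GB BED file, plan on roughly 5 GB of free memory. The sketch below does that arithmetic for a file on disk. estimate_bedtobigbed_ram is a hypothetical helper, and the 1.25 factor comes from the approximation above, not from measurement.

```python
import os

# Approximation from the text: about 25% more RAM than the uncompressed BED input.
RAM_OVERHEAD_FACTOR = 1.25


def estimate_bedtobigbed_ram(bed_path: str) -> int:
    """Return a rough RAM estimate, in bytes, for running bedToBigBed on bed_path."""
    return int(os.path.getsize(bed_path) * RAM_OVERHEAD_FACTOR)
```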
To create a bigChain track, follow these steps:
Step 1. If you already have a chain file you would like to convert to a bigChain, skip to Step 2. Otherwise download this example chain file for the human GRCh38 (hg38) assembly.
Step 2. Download these autoSql files needed by bedToBigBed: bigChain.as and bigLink.as.
Step 3. Download the bedToBigBed and hgLoadChain programs from the UCSC binary utilities directory.
Step 4. Use the fetchChromSizes script from the same directory to create a chrom.sizes file for the UCSC database with which you are working (e.g., hg38). Alternatively, you can download the chrom.sizes file for any assembly hosted at UCSC from our downloads page (click on "Full data set" for any assembly). For example, the hg38.chrom.sizes file for the hg38 database is located at http://hgdownload.soe.ucsc.edu/goldenPath/hg38/bigZips/hg38.chrom.sizes.
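A chrom.sizes file is plain text with two whitespace-separated columns: the sequence name and its length. If you want to inspect it or reuse it in your own scripts, a minimal Python sketch for loading it into a dictionary might look like this; read_chrom_sizes is a hypothetical helper, not part of the UCSC utilities.

```python
def read_chrom_sizes(path: str) -> dict[str, int]:
    """Load a two-column chrom.sizes file (name, length) into a dict."""
    sizes: dict[str, int] = {}
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if len(fields) < 2:
                continue  # skip blank or malformed lines
            sizes[fields[0]] = int(fields[1])
    return sizes
```

For example, read_chrom_sizes("hg38.chrom.sizes")["chr1"] returns the length of chr1 once the file has been downloaded.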
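Step 5. Use the hgLoadChain utility to generate the chain.tab and link.tab files needed to create the bigChain file. The -test option writes the tab files without loading them into a database, and -noBin leaves out the bin column:
hgLoadChain -noBin -test hg38 bigChain chr22_KI270731v1_random.hg38.mm10.rbest.chain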
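For orientation, each alignment in a chain file starts with a header line of the form "chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id". Judging from the awk column numbers used in Step 6, the -noBin chain.tab appears to carry these same fields in the same order minus tStrand. The Python sketch below pulls the header fields out of a chain file so you can compare them with chain.tab; read_chain_headers and CHAIN_HEADER_FIELDS are hypothetical names, and the sketch assumes every header includes the trailing id.

```python
# Field names of a chain header line, in order (assumes the id column is present).
CHAIN_HEADER_FIELDS = ("score", "tName", "tSize", "tStrand", "tStart", "tEnd",
                       "qName", "qSize", "qStrand", "qStart", "qEnd", "id")


def read_chain_headers(lines):
    """Return one dict per 'chain' header line, keyed by chain-format field name.

    Alignment block lines (size/dt/dq numbers), blank lines and comments are skipped.
    """
    headers = []
    for line in lines:
        if not line.startswith("chain "):
            continue
        values = line.split()[1:]
        if len(values) != len(CHAIN_HEADER_FIELDS):
            raise ValueError(f"malformed chain header: {line!r}")
        headers.append(dict(zip(CHAIN_HEADER_FIELDS, values)))
    return headers
```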
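Step 6. Create the bigChain file from the chain.tab file using a combination of sed, awk, sort and the bedToBigBed utility. The sed command strips the ".000000" suffix that follows the chain score in chain.tab so that it fits the uint chainScore field, and the sort command puts the rows in the chromosome and start order that bedToBigBed expects:
sed 's/\.000000//' chain.tab | awk 'BEGIN {OFS="\t"} {print $2, $4, $5, $11, 1000, $8, $3, $6, $7, $9, $10, $1}' | sort -k1,1 -k2,2n > chr22_KI270731v1_random.hg38.mm10.rbest.bigChain
bedToBigBed -type=bed6+6 -as=bigChain.as -tab chr22_KI270731v1_random.hg38.mm10.rbest.bigChain hg38.chrom.sizes bigChain.bb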
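The awk command is a column shuffle from chain.tab into the bigChain.as field order. The Python sketch below spells out that mapping with one comment per field, which may help when checking a converted file by hand. chain_tab_to_big_chain is a hypothetical helper; unlike the sed command, it truncates any fractional part of the chain score rather than only ".000000", and its sort is meant to mirror sort -k1,1 -k2,2n.

```python
def chain_tab_to_big_chain(lines):
    """Convert -noBin chain.tab rows into sorted bigChain (BED6+6) rows.

    Column comments use the 1-based awk field numbers from the shell command.
    """
    rows = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 11:
            continue  # skip blank or malformed lines
        # The chain score is written as a float (e.g. "3500.000000");
        # chainScore is a uint in bigChain.as, so keep only the integer part.
        chain_score = f[0].split(".")[0]
        rows.append([
            f[1],         # chrom       <- $2
            f[3],         # chromStart  <- $4
            f[4],         # chromEnd    <- $5
            f[10],        # name        <- $11 (chain ID)
            "1000",       # score       (constant, as in the awk command)
            f[7],         # strand      <- $8
            f[2],         # tSize       <- $3
            f[5],         # qName       <- $6
            f[6],         # qSize       <- $7
            f[8],         # qStart      <- $9
            f[9],         # qEnd        <- $10
            chain_score,  # chainScore  <- $1
        ])
    # Sort by chromosome name, then numerically by start position.
    rows.sort(key=lambda r: (r[0], int(r[1])))
    return ["\t".join(r) for r in rows]
```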
Step 7. To display your data in the Genome Browser, you must also create a binary indexed link file to accompany your bigChain file:
awk 'BEGIN {OFS="\t"} {print $1, $2, $3, $5, $4}' link.tab | sort -k1,1 -k2,2n > bigChain.bigLink
bedToBigBed -type=bed4+1 -as=bigLink.as -tab bigChain.bigLink hg38.chrom.sizes bigChain.link.bb
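Here the awk command keeps the first three link.tab columns and swaps the last two, producing chrom, chromStart, chromEnd, name (the chain ID that ties each link back to its chain) and qStart. A hypothetical Python equivalent, link_tab_to_big_link, is sketched below; its sort is roughly equivalent to sort -k1,1 -k2,2n, except that ties keep their input order.

```python
def link_tab_to_big_link(lines):
    """Convert -noBin link.tab rows into sorted bigLink (BED4+1) rows.

    Mirrors the awk command, which prints fields $1, $2, $3, $5, $4.
    """
    rows = []
    for line in lines:
        f = line.rstrip("\n").split("\t")
        if len(f) < 5:
            continue  # skip blank or malformed lines
        # chrom, chromStart, chromEnd, name (chain ID), qStart
        rows.append([f[0], f[1], f[2], f[4], f[3]])
    # Chromosome as text, start position as a number.
    rows.sort(key=lambda r: (r[0], int(r[1])))
    return ["\t".join(r) for r in rows]
```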
Step 8. Move the newly created bigChain (bigChain.bb) and bigLink (bigChain.link.bb) files to a web-accessible http, https or ftp location.
Step 9. Construct a custom track using a single track line. Note that any of the track attributes listed here are applicable to tracks of type bigBed. The most basic version of the track line will look something like this:
track type=bigChain name="My Big Chain" bigDataUrl=http://myorg.edu/mylab/bigChain.bb linkDataUrl=http://myorg.edu/mylab/bigChain.link.bb
Step 10. Paste the custom track line into the text box on the custom track management page.
The bedToBigBed program can be run with several additional options. For a full list of the available options, type bedToBigBed (with no arguments) on the command line to display the usage message.
In this example, you will create a bigChain custom track using an existing bigChain file, bigChain.bb, located on the UCSC Genome Browser http server. This file contains data for the hg38 assembly.
To create a custom track using this bigChain file:
track type=bigChain name="bigChain Example One" description="A bigChain file" bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb
Custom tracks can also be loaded via one URL line. This link loads the same bigChain.bb track and sets additional display parameters in the URL:
http://genome.ucsc.edu/cgi-bin/hgTracks?db=hg38&position=chr22_KI270731v1_random&hgct_customText=track%20type=bigChain%20name=Example%20bigDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.bb%20linkDataUrl=http://genome.ucsc.edu/goldenPath/help/examples/bigChain.link.bb%20visibility=pack
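The URL above is just the db, position and hgct_customText parameters, with the spaces in the track line encoded as %20. If you build such links for your own files, a small Python sketch using the standard urllib.parse module can do the encoding. custom_track_url is a hypothetical helper, and leaving ":", "/" and "=" unescaped is only a readability choice made to match the example URL.

```python
from urllib.parse import quote, urlencode


def custom_track_url(db: str, position: str, track_line: str,
                     server: str = "http://genome.ucsc.edu/cgi-bin/hgTracks") -> str:
    """Build an hgTracks URL that loads a one-line custom track via hgct_customText."""
    params = {"db": db, "position": position, "hgct_customText": track_line}
    # quote (instead of the default quote_plus) encodes spaces as %20;
    # ':', '/' and '=' stay unescaped so the embedded URLs remain readable.
    return server + "?" + urlencode(params, safe=":/=", quote_via=quote)
```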
After this example bigChain is loaded in the Genome Browser, click into a chain on the browser's track display. Note that the details page displays information about the individual chains, similar to that which is available for a standard chain track.
In this example, you will create your own bigChain file from an existing chain input file.
Download the bedToBigBed and hgLoadChain utilities (Step 3, above).
If you would like to share your bigChain data track with a colleague, learn how to create a URL by looking at Example 11 on this page.
Because the bigChain files are an extension of bigBed files, which are indexed binary files, it can be difficult to extract data from them. UCSC has developed the following programs to assist in working with bigBed formats, available from the binary utilities directory.
bigBedToBed: converts a bigBed file to ASCII BED format.
bigBedSummary: extracts summary information from a bigBed file.
bigBedInfo: prints out information about a bigBed file.
As with all UCSC Genome Browser programs, simply type the program name (with no parameters) at the command line to view the usage statement.
If you encounter an error when you run the bedToBigBed program, check your input file for data coordinates that extend past the end of the chromosome. If these are present, run the bedClip program (available here) to remove the problematic row(s) in your input file before running the bedToBigBed program.
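Before running bedClip, it can help to see which rows are affected. The Python sketch below only reports problems and does not modify the file, so bedClip remains the tool for removing rows. find_out_of_bounds is a hypothetical helper; it takes a dictionary of chromosome lengths (for example, one loaded from hg38.chrom.sizes) and flags rows whose chromEnd exceeds the chromosome length, plus rows on chromosomes missing from that dictionary.

```python
def find_out_of_bounds(bed_lines, chrom_sizes):
    """Return (line_number, reason) pairs for BED rows that do not fit
    within the lengths in chrom_sizes (a dict of chromosome name -> length)."""
    problems = []
    for number, line in enumerate(bed_lines, start=1):
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue  # skip blank, comment and header lines
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        size = chrom_sizes.get(chrom)
        if size is None:
            problems.append((number, f"{chrom} not in chrom.sizes"))
        elif int(end) > size:
            problems.append((number, f"chromEnd {end} > {chrom} size {size}"))
    return problems
```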