As vertebrate genome sequences near completion and research re-focuses on their analysis, the issue of effective sequence display becomes critical: it is not helpful to have 3 billion letters of genomic DNA shown as plain text! As an alternative, the UCSC Genome Browser provides a rapid and reliable display of any requested portion of genomes at any scale, together with dozens of aligned annotation tracks (known genes, predicted genes, ESTs, mRNAs, CpG islands, assembly gaps and coverage, chromosomal bands, mouse homologies, and more). Half of the annotation tracks are computed at UCSC from publicly available sequence data. The remaining tracks are provided by collaborators worldwide. Users can also add their own custom tracks to the browser for educational or research purposes.
The Genome Browser stacks annotation tracks beneath genome coordinate positions, allowing rapid visual correlation of different types of information. The user can look at a whole chromosome to get a feel for gene density, open a specific cytogenetic band to see a positionally mapped disease gene candidate, or zoom in to a particular gene to view its spliced ESTs and possible alternative splicing. The Genome Browser itself does not draw conclusions; rather, it collates all relevant information in one location, leaving the exploration and interpretation to the user.
The Genome Browser supports text- and sequence-based searches that provide quick, precise access to any region of specific interest. Secondary links from individual entries within annotation tracks lead to sequence details and supplementary off-site databases. To control information overload, tracks need not be displayed in full: they can be hidden, collapsed into a condensed or single-line display, or filtered according to the user's criteria. Zooming and scrolling controls narrow or broaden the displayed chromosomal range to focus on the exact region of interest. Clicking on an individual item within a track opens a details page containing a summary of properties and links to off-site repositories such as PubMed, GenBank, Entrez, and OMIM. Depending on the nature of the track, the page provides item-specific information on position, cytoband, strand, data source, and encoded protein, mRNA, genomic sequence, and alignment.
A blue navigation bar at the top of the browser provides links to several other tools and data sources. For instance, under the "View" menu, the "DNA" link displays the raw genomic DNA sequence for the coordinate range shown in the browser window. Extensive text formatting options can be applied to this DNA to mark the positions of track features. Other links tie the Genome Browser to the BLAT alignment tool, provide access to the underlying relational database via the Table Browser, convert coordinates between assembly releases, and open the corresponding region in the Ensembl or NCBI Genome Data Viewer annotation.
The browser data represents an immense collaborative effort involving thousands of people from the international biomedical research community. The UCSC Bioinformatics Group itself does no sequencing. Although it creates the majority of the annotation tracks in-house, the annotations are based on publicly available data contributed by many labs and research groups throughout the world. Several of the Genome Browser annotations are generated in collaboration with outside individuals or are contributed wholly by external research groups. UCSC's other major roles include building genome assemblies, creating the Genome Browser work environment, and serving it online. The majority of the sequence data, annotation tracks, and even software are in the public domain and are available for anyone to download.
In addition to the Genome Browser, the UCSC Genome Bioinformatics group provides several other tools for viewing and interpreting genome data:
The UCSC Genome Bioinformatics home page provides access to Genome Browsers on several different genome assemblies. To get started, click the Browser link on the blue sidebar. This will take you to a Gateway page where you can select which genome to display. Note that there are also official mirror sites in Europe and Asia for users who are geographically closer to those continents than to the western United States.
To get oriented in using the Genome Browser, try viewing a gene or region of the genome with which you are already familiar, or use the default position. To open the Genome Browser window:
Occasionally the Gateway page returns a list of several matches in response to a search, rather than immediately displaying the Genome Browser window. When this occurs, click on the item in which you're interested and the Genome Browser will open to that location.
The search mechanism is not a site-wide search engine. Instead, it primarily searches GenBank mRNA records whose text annotations can include gene names, gene symbols, journal title words, author names, and RefSeq mRNAs. Searches on other selected identifiers, such as NP and NM accession numbers, OMIM identifiers, and Entrez Gene IDs are supported. However, some types of queries will return an error, e.g. post-assembly GenBank entries, withdrawn gene names, and abandoned synonyms. If your initial query is unsuccessful, try entering a different related term that may produce the same location. For example, if a query on a gene symbol produces no results, try entering an mRNA accession, gene ID number, or descriptive words associated with the gene.
If you have genomic, mRNA, or protein sequence, but don't know the name or the location to which it maps in the genome, the BLAT tool will rapidly locate the position by homology alignment, provided that the region has been sequenced. This search will find closely related members of a gene family, as well as assembly duplication artifacts. An entire set of query sequences can be looked up simultaneously when provided in FASTA format.
A successful BLAT search returns a list of one or more genome locations that match the input sequence. To view one of the alignments in the Genome Browser, click the browser link for the match. The details link can be used to preview the alignment to determine if it is of sufficient match quality to merit viewing in the Genome Browser. If too many BLAT hits occur, try narrowing the search by filtering the sequence in slow mode with RepeatMasker, then rerunning the BLAT search.
For more information on conducting and fine-tuning BLAT searches, refer to the BLAT section of this document.
You can open the Genome Browser window with a custom annotation track displayed by using the Add Custom Tracks feature available from the gateway and annotation tracks pages. For more information on creating and using custom annotation tracks, refer to the Creating custom annotation tracks section.
Annotation track data can be entered in one of three ways:
Once you've entered the annotation information, click the submit button at the top of the Gateway page to open up the Genome Browser with the annotation track displayed.
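As a minimal sketch, assuming the common BED format with a leading track line (the authoritative syntax is in the Creating custom annotation tracks section), a small custom track can be generated as plain text and pasted into the data box. The track name, description, feature names, and the `custom_track_text` helper are hypothetical; BED start coordinates are 0-based and end coordinates are exclusive.

```python
from dataclasses import dataclass


@dataclass
class BedFeature:
    chrom: str
    start: int  # 0-based start (BED convention)
    end: int    # exclusive end
    name: str


def custom_track_text(name: str, description: str, features: list[BedFeature]) -> str:
    """Return a track line followed by one tab-separated BED line per feature."""
    # visibility=2 requests full display mode for the uploaded track.
    lines = [f'track name="{name}" description="{description}" visibility=2']
    for f in features:
        if not 0 <= f.start < f.end:
            raise ValueError(f"invalid interval for {f.name}: {f.start}-{f.end}")
        lines.append(f"{f.chrom}\t{f.start}\t{f.end}\t{f.name}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical example features on chr1.
    features = [
        BedFeature("chr1", 11873, 14409, "exampleFeature1"),
        BedFeature("chr1", 14361, 19759, "exampleFeature2"),
    ]
    print(custom_track_text("myTrack", "Example custom track", features), end="")
```

The printed text can be pasted directly into the Add Custom Tracks data box or saved to a file for upload.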
The Genome Browser also provides a collection of custom annotation tracks contributed by the UCSC Genome Bioinformatics group and the research community.
NOTE: If an annotation track does not display correctly when you attempt to upload it, you may need to reset the Genome Browser to its default settings, then reload the track. For information on troubleshooting display problems with custom annotation tracks, refer to the troubleshooting section in the Creating custom annotation tracks section.
The Table Browser, a portal to the underlying open source MySQL relational database driving the Genome Browser, displays genomic data as columns of text rather than as graphical tracks. For more information on using the Table Browser, see the section Getting started on the Table Browser.
Several external gateways provide direct links into the Genome Browser. Examples include: Entrez Gene, AceView, Ensembl, SuperFamily, and GeneCards. Journal articles can also link to the browser and provide custom tracks. Be sure to use the assembly date appropriate to the provided coordinates when using data from a journal source.
To facilitate your return to regions of interest within the Genome Browser, save the coordinate range or bookmark the page of displays that you plan to revisit or wish to share with others.
It is usually best to work with the most recent assembly even though a full set of tracks might not yet be ready. Be aware that the coordinates of a given feature on an unfinished chromosome may change from one assembly to the next as gaps are filled, artifactual duplications are reduced, and strand orientations are corrected. The Genome Browser offers multiple tools that can correctly convert coordinates between different assembly releases. For more information on conversion tools, see the section Converting data between assemblies.
To ensure uninterrupted browser services for your research during UCSC server maintenance and power outages, bookmark a mirror site that replicates the UCSC genome browser.
Bear in mind that the Genome Browser cannot be more accurate than the underlying draft genome. Assembly errors and sequence gaps may persist well into the sequencing process in regions that are intrinsically difficult to sequence. Artifactual duplications, an unavoidable compromise in some assembly builds, can cause alignments to report misleading matches at additional genome coordinates.
The Genome Browser annotation tracks page displays a genome location specified through a Gateway search, a BLAT search, or an uploaded custom annotation track. There are five main features on this page: a set of navigation controls, a chromosome ideogram, the annotation tracks image, display configuration buttons, and a set of track display controls.
The first time you open the Genome Browser, it will use the application default values to configure the annotation tracks display. By manipulating the navigation, configuration and display controls, you can customize the annotation tracks display to suit your needs. For a complete description of the annotation tracks available in all assembly versions supported by the Genome Browser, see the Annotation Track Descriptions section.
The Genome Browser retains user preferences from session to session within the same web browser, although it never monitors or records user activities or submitted data. To restore the default settings, click the "Click here to reset" link on the Genome Browser Gateway page. To return the display to the default set of tracks (but retain custom tracks and other configured Genome Browser settings), click the default tracks button on the Genome Browser page.
Annotation track descriptions: Each annotation track has an associated description page that contains a discussion of the track, the methods used to create the annotation, the data sources and credits for the track, and (in some cases) filter and configuration options to fine-tune the information displayed in the track. To view the description page, click on the mini-button to the left of a displayed track or on the label for the track in the Track Controls section.
Annotation track details pages: When an annotation track is displayed in full, pack, or squish mode, each line item within the track has an associated details page that can be displayed by clicking on the item or its label. The information contained in the details page varies by annotation track, but may include basic position information about the item, related links to outside sites and databases, links to genomic alignments, or links to corresponding mRNA, genomic, and protein sequences.
Gene prediction tracks: Coding exons are represented by blocks connected by horizontal lines representing introns. The 5' and 3' untranslated regions (UTRs) are displayed as thinner blocks on the leading and trailing ends of the aligning regions. In full display mode, arrowheads on the connecting intron lines indicate the direction of transcription. In situations where no intron is visible (e.g. single-exon genes, extremely zoomed-in displays), the arrowheads are displayed on the exon block itself.
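The split between thick coding blocks and thinner UTR blocks can be sketched as follows, assuming genePred-style records in which each exon is a 0-based, end-exclusive interval and `cds_start`/`cds_end` bound the coding region on the genome. This illustrates the geometry only; it is not the browser's drawing code, and `split_exon` is a hypothetical name.

```python
def split_exon(exon_start, exon_end, cds_start, cds_end):
    """Split one exon into ('utr' | 'cds', start, end) pieces, left to right.

    On a minus-strand gene the left-hand UTR piece is the 3' UTR; on a
    plus-strand gene it is the 5' UTR.
    """
    pieces = []
    # Portion of the exon to the left of the coding region (drawn thin).
    left_end = min(exon_end, cds_start)
    if exon_start < left_end:
        pieces.append(("utr", exon_start, left_end))
    # Overlap between the exon and the coding region (drawn thick).
    lo, hi = max(exon_start, cds_start), min(exon_end, cds_end)
    if lo < hi:
        pieces.append(("cds", lo, hi))
    # Portion of the exon to the right of the coding region (drawn thin).
    right_start = max(exon_start, cds_end)
    if right_start < exon_end:
        pieces.append(("utr", right_start, exon_end))
    return pieces


if __name__ == "__main__":
    # An exon that contains both the start and the end of the coding region.
    print(split_exon(100, 200, 150, 180))
```

A non-coding transcript, where the coding region is empty, yields only thin UTR-style pieces.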
Pattern Space Layout (PSL) alignment tracks: Aligning regions (usually exons when the query is cDNA) are shown as black blocks. In dense display mode, the darkness of the shading corresponds to the number of features aligning to the region or to the quality of the match. In pack or full display mode, the aligning regions are connected by lines representing gaps in the alignment (typically spliced-out introns), with arrowheads indicating the orientation of the alignment: they point right if the query sequence was aligned to the forward strand of the genome and left if it was aligned to the reverse strand. Two parallel lines are drawn over double-sided alignment gaps, which skip over unalignable sequence in both target and query. For EST alignments, the arrows may be reversed to show the apparent direction of transcription deduced from splice junction sequences. In situations where no gap lines are visible, the arrowheads are displayed on the block itself. To prevent display problems, the Genome Browser imposes an upper limit on the number of alignments that can be viewed simultaneously within the tracks image. When this limit is exceeded, the browser displays the best several hundred alignments in a condensed display mode and lists the number of undisplayed alignments in the last row of the track. In this situation, try zooming in to display more entries or to allow the track to return to full display mode. For some PSL tracks, extra coloring to indicate mismatching bases and query-only gaps may be available.
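To make the block and gap terminology concrete, the sketch below reads the last three columns of a nucleotide PSL line (`blockSizes`, `qStarts`, `tStarts`) and classifies each gap between consecutive blocks. It assumes the standard 21-column PSL layout and DNA (not protein) blocks; treating a gap as double-sided when both query and target skip bases follows the description above, but `psl_gaps` is a hypothetical helper, not part of any UCSC tool.

```python
def psl_gaps(psl_line: str):
    """Return (target_gap_start, target_gap_end, kind) for each gap in one PSL line.

    kind is 'double' when unaligned bases are skipped in both target and query,
    'target' when only target bases are skipped (e.g. an intron), and 'query'
    when only query bases are skipped (a zero-length interval on the target).
    """
    cols = psl_line.rstrip("\n").split("\t")
    if len(cols) < 21:
        raise ValueError("expected 21 PSL columns")
    # Columns 19-21 (0-based 18-20) are comma-separated lists with a trailing comma.
    block_sizes = [int(x) for x in cols[18].rstrip(",").split(",")]
    q_starts = [int(x) for x in cols[19].rstrip(",").split(",")]
    t_starts = [int(x) for x in cols[20].rstrip(",").split(",")]
    gaps = []
    for i in range(len(block_sizes) - 1):
        q_end = q_starts[i] + block_sizes[i]
        t_end = t_starts[i] + block_sizes[i]
        q_gap = q_starts[i + 1] - q_end
        t_gap = t_starts[i + 1] - t_end
        if q_gap > 0 and t_gap > 0:
            kind = "double"
        elif t_gap > 0:
            kind = "target"
        elif q_gap > 0:
            kind = "query"
        else:
            continue  # blocks abut in both sequences: no visible gap
        gaps.append((t_end, t_starts[i + 1], kind))
    return gaps
```

In this sketch, a typical spliced cDNA alignment produces only 'target' gaps, which the track would draw as single connecting lines.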
Chain tracks (2-species alignment): Chain tracks display boxes joined together by either single or double lines. The boxes represent aligning regions. Single lines indicate gaps that are largely due to a deletion in the genome of the first species or an insertion in the genome of the second species. Double lines represent more complex gaps that involve substantial sequence in both species. This may result from inversions, overlapping deletions, an abundance of local mutation, or an unsequenced gap in one species. In cases where there are multiple chains over a particular portion of the genome, chains with single-lined gaps are often due to processed pseudogenes, while chains with double-lined gaps are more often due to paralogs and unprocessed pseudogenes. In the fuller display modes, the individual feature names indicate the chromosome, strand, and location (in thousands) of the match for each matching alignment.
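The single versus double line distinction can be sketched from the chain file format, assuming its usual layout: a `chain` header line whose sixth field is the target start, followed by `size dt dq` lines and a final `size` line. Labeling a gap double when both `dt` and `dq` are nonzero, and single otherwise, is an assumption drawn from the description above; `chain_gap_styles` is hypothetical.

```python
def chain_gap_styles(chain_text: str):
    """Return (target_gap_start, target_gap_end, style) for each gap in one chain."""
    lines = [ln.split() for ln in chain_text.strip().splitlines() if ln.strip()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("first line must be a chain header")
    t_pos = int(header[5])  # tStart: target coordinate of the first aligned block
    gaps = []
    for fields in lines[1:]:
        t_pos += int(fields[0])  # advance past this ungapped aligned block
        if len(fields) == 3:  # the final line holds only a block size
            dt, dq = int(fields[1]), int(fields[2])
            # Sequence skipped in both species -> double line; otherwise single.
            style = "double" if dt > 0 and dq > 0 else "single"
            gaps.append((t_pos, t_pos + dt, style))
            t_pos += dt
    return gaps


if __name__ == "__main__":
    example = (
        "chain 1000 chr1 1000000 + 100 400 chr2 500000 + 0 240 1\n"
        "100 50 0\n"
        "50 30 20\n"
        "70\n"
    )
    print(chain_gap_styles(example))
```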
Net tracks (2-species alignment): Boxes represent ungapped alignments, while lines represent gaps. Clicking on a box displays detailed information about the chain as a whole, while clicking on a line shows information on the gap. The detailed information is useful in determining the cause of the gap or, for lower level chains, the genomic rearrangement. Individual items in the display are categorized as one of four types (other than gap):
Snake tracks: The snake alignment track (or snake track) shows the relationship between the chosen Browser genome (reference genome) and another genome (query genome). A snake is a way of viewing a set of pairwise gapless alignments that may overlap on both the reference and query genomes. Alignments are always represented as being on the positive strand of the reference species, but can be on either strand on the query sequence.
In full display mode, a snake track can be decomposed into two drawing elements: segments (colored rectangles) and adjacencies (lines connecting the segments). Segments represent subsequences of the query genome aligned to the given portion of the reference genome. Adjacencies represent the covalent bonds between the aligned subsequences of the query genome.
Red tick marks within segments represent substitutions with respect to the reference; by default, they are shown only when the displayed reference window is 50 kb or smaller. Zoomed in to the base level, these substitutions are labeled with the non-reference base.
An insertion in the reference relative to the query creates a gap between abutting segment sides that is connected by an adjacency. An insertion in the query relative to the reference is represented by an orange tick mark that splits a segment at the location where the extra bases would be inserted. Simultaneous independent insertions in both query and reference look like an insertion in the reference relative to the query, except that the adjacency connecting the two segments is colored orange. More complex structural rearrangements create adjacencies that connect the sides of non-abutting segments in a natural fashion.
Pack mode can be used to display a larger number of snake tracks within the limited vertical space of the browser. This mode eliminates the adjacencies from the display and packs the segments onto as few rows as possible, subject to the constraint that duplications in the query sequence are still shown.
Dense mode further eliminates these duplications so that each snake track is compactly represented along just one row.
Wiggle tracks: These tracks plot a continuous function along a chromosome. Data are displayed in windows of a set number of base pairs in width, and the score for each window is drawn as a "mountain range" profile. The display characteristics vary among the tracks in this group; see the individual track descriptions for more information on interpreting the display. If a value is higher or lower than the range that can be shown in the display, the peak is clipped and colored magenta.
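The windowing and clipping described above can be sketched as follows. Averaging the per-base values within each window and the `view_min`/`view_max` bounds are assumptions for illustration; individual wiggle tracks may summarize windows differently (for example, by maximum), and `summarize_windows` is a hypothetical name.

```python
def summarize_windows(values, window, view_min, view_max):
    """Return (window_start, displayed_value, was_clipped) for each window.

    values holds one number per base; a trailing partial window is averaged
    over the bases it actually contains.
    """
    if window <= 0:
        raise ValueError("window must be positive")
    out = []
    for start in range(0, len(values), window):
        chunk = values[start:start + window]
        mean = sum(chunk) / len(chunk)
        # Clamp into the viewing range; a clamped window would be drawn magenta.
        shown = min(max(mean, view_min), view_max)
        out.append((start, shown, shown != mean))
    return out


if __name__ == "__main__":
    print(summarize_windows([1, 1, 5, 5, 20, 20], window=2, view_min=0, view_max=10))
```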
Each annotation track within the window may have up to five display modes:
The track display controls are grouped into categories that reflect the type of data in the track, e.g., Gene Prediction Tracks, mRNA and EST tracks, etc. To change the display mode for a track, find the track's controller in the Track Controls section at the bottom of the Genome Browser page, select the desired mode from the control's display menu, and then click the refresh button. Alternatively, you can change the display mode by using the Genome Browser's right-click navigation feature, or can toggle between dense and full modes for a displayed track (or pack mode when available) by clicking on the optional center label for the track.
Track display modes may be set individually or as a group on the Genome Browser Track Configuration page. To access the configuration page, click the configure button on the annotation tracks page or the configure tracks and display button on the Gateway page. Exercise caution when using the show all buttons on track groups or assemblies that contain a large number of tracks; this may seriously impact the display performance of the Genome Browser or cause your Internet browser to time out.
The entire set of track display controls at the bottom of the annotation tracks page may be hidden from view by unchecking the Show track controls under main graphic option in the Configure Image section of the Track Configuration page.
Some tracks have additional filter and configuration capabilities, e.g., EST tracks, mRNA tracks, NCI60, etc. These options let the user modify the color or restrict the data displayed within an annotation track. Filters are useful for focusing attention on items relevant to the current task in tracks that contain large amounts of data. For example, to highlight ESTs expressed in the liver, set the EST track filter to display items in a different color when the associated tissue keyword is "liver". Configuration options let the user adjust the display to best show the data of interest. For example, the min vertical viewing range value on wiggle tracks can be used to establish a data threshold: setting the min value to "50" displays only data values greater than 50 percent.
To access filter and configuration options for a specific annotation track, open the track's description page by clicking the label for the track's control menu under the Track Controls section, the mini-button to the left of the displayed track, or the "Configure..." option from the Genome Browser's right-click popup menu. The filter and configuration section is located at the top of the description page. In most instances, more information about the configuration options is available within the description text or through a special help link located in the configuration section.
Filter and configuration settings are persistent from session to session on the same web browser. To return the Genome Browser display to the default set of tracks (but retain custom tracks and other configured Genome Browser settings), click the default tracks button on the Genome Browser tracks page. To remove all user configuration settings and custom tracks, and completely restore the defaults, click the "Click here to reset" link on the Genome Browser Gateway page.
At times you may want to adjust the amount of flanking region displayed in the annotation tracks window or adjust the scale of the display. At a scale of 1 pixel per base pair, the window accurately displays the width of exons and introns, and indicates the direction of transcription (using arrowheads) for multi-exon features. At coarser scales, certain features, such as thin exons, may disappear. Also, some exons may falsely appear to fall within RepeatMasker features at some scales.
Click the zoom in and zoom out buttons at the top of the Genome Browser page to zoom in or out on the center of the annotation tracks window by 1.5, 3 or 10-fold. Alternatively, you can zoom in 3-fold on the display by clicking anywhere on the Base Position track. In this case, the zoom is centered on the coordinate of the mouse click. To view the base composition of the sequence underlying the current annotation track display, click the base button.
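Both the zoom buttons and a click on the Base Position track amount to rescaling the window around a fixed point: the window center for the buttons, the clicked coordinate for the track. The sketch below illustrates that arithmetic; half-open coordinates, the rounding, and clamping at the chromosome ends are assumptions, and the browser's exact behavior may differ. `zoom` is a hypothetical name.

```python
def zoom(start, end, factor, center=None, chrom_size=None):
    """Rescale the window [start, end) by `factor` about `center`.

    factor > 1 zooms in (3 for a 3-fold zoom in); factor < 1 zooms out.
    """
    if factor <= 0:
        raise ValueError("factor must be positive")
    if center is None:
        center = (start + end) / 2  # zoom buttons act on the window center
    width = max(1, round((end - start) / factor))
    if chrom_size is not None:
        width = min(width, chrom_size)  # never wider than the chromosome
    new_start = max(0, round(center - width / 2))
    if chrom_size is not None:
        new_start = min(new_start, chrom_size - width)  # stay on the chromosome
    return new_start, new_start + width


if __name__ == "__main__":
    print(zoom(1000, 4000, 3))              # 3-fold zoom in on the center
    print(zoom(0, 3000, 3, center=600))     # 3-fold zoom in on a clicked base
```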
To scroll (pan) the view of the entire tracks image horizontally, click on the image and drag the cursor to the left or right, then release the mouse button, to shift the displayed region in the corresponding direction. The view may be scrolled by up to one image width. To scroll the annotation tracks horizontally by set increments of 10%, 50%, or 95% of the displayed size (as given in base pairs), click the corresponding move arrow. It is also possible to scroll the left or right side of the tracks by a specified number of vertical gridlines while keeping the position of the opposite side fixed. To do this, click the appropriate move start or move end arrow, located under the annotation tracks window. For example, to keep the left-hand display coordinate fixed but increase the right-hand coordinate, you would click the right-hand move end arrow. To increase or decrease the gridline scroll interval, edit the value in the move start or move end text box.
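The move arrows shift the window by a fixed fraction of its width without changing that width. A minimal sketch, assuming half-open coordinates and clamping at both chromosome ends (the move start/move end gridline controls are not modeled); `pan` is a hypothetical name.

```python
def pan(start, end, fraction, chrom_size):
    """Shift [start, end) by `fraction` of its width; negative moves left.

    fraction would be +/-0.10, 0.50, or 0.95 for the three move arrows.
    """
    width = end - start
    shift = round(width * fraction)
    # Clamp so the window keeps its width and stays on the chromosome.
    new_start = min(max(0, start + shift), max(0, chrom_size - width))
    return new_start, new_start + width


if __name__ == "__main__":
    print(pan(1000, 2000, 0.5, chrom_size=1_000_000))
```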
The browser's "drag-and-select" pop-up menu provides options to add single or multiple vertical highlights to selected regions, as described below:
Main features in drag-and-select menu:
In the genome browser, there are also options for right-clicking:
To display a completely different position in the genome, enter the new query in the position/search text box, then click the jump button. For more information on valid entries for this text box, refer to the Getting started section.
If a chromosome image (ideogram) is available above the track display, click anywhere on the chromosome to move to that position (the current window size will be maintained). Select a region of any size by clicking and dragging in the image. Finally, hold the "control" key while clicking on a chromosome band to select the entire band.
To vertically reposition a track in the annotation track window, click-and-hold the mouse button on the side label, then drag the highlighted track up or down within the image. Release the mouse button when the track is in the desired position. To move an entire group of associated tracks (such as all the displayed subtracks in a composite track), click-and-hold the gray mini-button to the left of the tracks, then drag.
To remove intronic or intergenic regions from the display or to view only custom specified regions, click the multi-region button under the track image. For human assemblies hg17 and later, you may also replace a section of the reference genome with an alternate haplotype chromosome in order to view annotations upstream and downstream of the sequence. For more information about the multi-region feature see the multi-region help page.
The first time the annotation track window is displayed, or after the Genome Browser has been reset, the size of the track window is set by default to the width that best fits your Internet browser window. If you horizontally resize the browser window, you can automatically adjust the annotation track image size to the new width by clicking the resize button under the track image. To manually override the default width, enter a new value in the image width text box on the Track Configuration page, then click the submit button. The maximum supported width is 5000 pixels.
The item labels (or track label, when viewed in dense mode) are displayed to the left of the annotation image. The width of this area is set to 17 characters by default. To change the width, edit the value in the label area width text box on the Track Configuration page, then click Submit.
The annotation track image may be adjusted to display text in a range of fonts from "tiny" to "huge". To change the size of the text, select an option from the text size pull-down menu on the Track Configuration page, then click Submit. The text size is set to "small" by default.
The track and element labels displayed above and to the left of the tracks in the annotation tracks image may be hidden from view by unchecking the Display track descriptions above each track and Display labels to the left of items in tracks boxes, respectively, on the Track Configuration page.
The light blue vertical guidelines on the annotation tracks image may be removed by unchecking the Show light blue vertical guidelines box on the Track Configuration page.
The chromosome ideogram, located just above the annotation tracks image, provides a graphical overview of the features on the selected chromosome, including its bands, the position of the centromere, and an indication of the region currently displayed in the annotation tracks image. To hide the ideogram, uncheck the Display chromosome ideogram above main graphic box on the Track Configuration page.
When the Next/previous item navigation option is enabled on the Track Configuration page, gray double-headed arrows are displayed in the Genome Browser tracks image on both sides of the track labels of gene, mRNA, and EST tracks (or any standard track based on BED, PSL, or genePred format). Clicking on a gray arrow shifts the image window toward that end of the chromosome so that the next item in the track is displayed. Similarly, the Next/previous exon navigation option displays white double-headed arrows at the end of any item that extends off the edge of the current image. Clicking on one of the white arrows shifts the image window to the next exon in the indicated direction, unless the window edge falls within an exon, in which case the window shifts to the edge of that exon. If the image window happens to lie within a 5' or 3' UTR, clicking the arrows shifts the window toward the start or end of the next coding region rather than the end of the exon.
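Next/previous item navigation can be sketched as a search over items sorted by start coordinate. Centering the target item in a window of unchanged width is a hypothetical placement policy, as are the function names; the browser's own placement may differ.

```python
import bisect


def next_item_window(items, win_start, win_end):
    """items: (start, end) pairs sorted by start. Return a same-width window
    centered on the first item starting at or after win_end, or None."""
    starts = [s for s, _ in items]
    i = bisect.bisect_left(starts, win_end)
    if i == len(items):
        return None  # no further items toward this chromosome end
    width = win_end - win_start
    item_start, item_end = items[i]
    new_start = max(0, (item_start + item_end) // 2 - width // 2)
    return new_start, new_start + width


def prev_item_window(items, win_start, win_end):
    """Return a same-width window centered on the item ending closest to,
    but not after, win_start, or None if there is no such item."""
    earlier = [(s, e) for s, e in items if e <= win_start]
    if not earlier:
        return None
    item_start, item_end = max(earlier, key=lambda it: it[1])
    width = win_end - win_start
    new_start = max(0, (item_start + item_end) // 2 - width // 2)
    return new_start, new_start + width


if __name__ == "__main__":
    genes = [(100, 200), (5000, 5200), (9000, 9100)]  # hypothetical items
    print(next_item_window(genes, 1000, 2000))
```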
Several of the common display and navigation operations offered on the Genome Browser tracks page may be quickly accessed by right-clicking on a feature on the tracks image and selecting an option from the displayed popup menu. Depending on context, the right-click feature allows the user to:
The Genome Browser provides a mechanism for saving a copy of the currently displayed annotation tracks image to a file that can be printed or edited. Images saved in PostScript format can be printed at high resolution and edited by drawing programs such as Adobe Illustrator. This is useful for generating figures intended for publication. Images can also be saved in PDF format for viewing by Adobe Acrobat Reader.
To print or save the image to a file:
NOTE: If you have configured your browser image to use one of the larger font sizes, the text in the resulting screen shot may not display correctly. If you encounter this problem, reduce the Genome Browser font size on the Track Configuration page, then repeat the save/print process.
BLAT (BLAST-Like Alignment Tool) is a very fast sequence alignment tool similar to BLAST. For more information on BLAT's internal scoring schemes and its overall n-mer alignment seed strategy, refer to Kent WJ (2002). BLAT - The BLAST-Like Alignment Tool. Genome Res. 12(4):656-664.
On DNA queries, BLAT is designed to quickly find sequences of 25 or more bases with 95% or greater similarity. It may miss genomic alignments that are more divergent or shorter than these minimums, although it will find perfect sequence matches of 32 bases and sometimes as few as 22 bases. The tool is capable of aligning sequences that contain large introns. On protein queries, BLAT rapidly locates genomic sequences of 20 or more amino acids with 80% or greater similarity. In general, gene family members that arose within the last 350 million years can be detected. More divergent sequences can be aligned to the human genome by using NCBI's BLAST and PSI-BLAST, then using BLAT to align the resulting match onto the UCSC genome assembly. In practice, DNA BLAT works well on primates, and protein BLAT works well on land vertebrates.
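The n-mer seed idea mentioned above can be illustrated with a toy index. This is not BLAT's implementation (which adds scoring, near-perfect seed matching, and alignment extension); it only shows the general approach of indexing non-overlapping k-mers of the genome and looking up every overlapping k-mer of the query. With that layout, an exact match of at least 2k-1 bases is guaranteed to contain one indexed k-mer. All names are hypothetical.

```python
from collections import defaultdict


def build_index(genome: str, k: int):
    """Map each non-overlapping k-mer of the genome to its start positions."""
    index = defaultdict(list)
    for pos in range(0, len(genome) - k + 1, k):
        index[genome[pos:pos + k]].append(pos)
    return index


def seed_hits(query: str, index, k: int):
    """Look up every overlapping k-mer of the query; return (query_pos, genome_pos)."""
    hits = []
    for qpos in range(len(query) - k + 1):
        for gpos in index.get(query[qpos:qpos + k], ()):
            hits.append((qpos, gpos))
    return hits


def diagonals(hits):
    """Count hits per diagonal (genome_pos - query_pos); a real aligner would
    extend clusters of hits on the same diagonal into alignments."""
    counts = defaultdict(int)
    for qpos, gpos in hits:
        counts[gpos - qpos] += 1
    return dict(counts)


if __name__ == "__main__":
    idx = build_index("AAAACCCCGGGGTTTT", 4)
    hits = seed_hits("CCCGGGGT", idx, 4)
    print(hits, diagonals(hits))  # the diagonal gives the query's genome offset
```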
Some common uses of BLAT include:
To locate a nucleotide or protein within a genome using BLAT:
Header lines may be included in the input text if they are preceded by > and contain unique names. Multiple sequences may be submitted at the same time if they are of the same type and are preceded by unique header lines. Numbers, spaces, and extraneous characters are ignored:
>sequence_1
ATGCAGAGCAAGGTGCTGCTGGCCGTCGCCCTGTGGCTCTGCGTGGAGAC
CCGGGCCGCCTCTGTGGGTTTGCCTAGTGTTTCTCTTGATCTGCCCAGGC
>sequence_2
ATGTTGTTTACCGTAAGCTGTAGTAAAATGAGCTCGATTGTTGACAGAGA
TGACAGTAGTATTTTTGATGGGTTGGTGGAAGAAGATGACAAGGACAAAG
>sequence_3
ATGCTGCGAACAGAGAGCTGCCGCCCCAGGTCGCCCGCCGGACAGGTGGC
CGCGGCGTCCCCGCTCCTGCTGCTGCTGCTGCTGCTCGCCTGGTGCGCGG
DNA input sequences are limited to a maximum length of 25,000 bases. Protein or translated input sequences must not exceed 10,000 letters. As many as 25 multiple sequences may be submitted at the same time. The maximum combined length of DNA input for multiple sequence submissions is 50,000 bases (with a 25,000 base limit per individual sequence). For protein or translated input, the maximum combined input length is 25,000 letters (with a 5000 letter limit per individual sequence).
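Before pasting a batch, the input can be checked locally against the DNA limits stated above. This is a minimal sketch under stated assumptions: header names are taken as the first word after `>`, a headerless sequence gets the hypothetical label "unnamed", and the function names are hypothetical. Protein input would need the different limits listed above.

```python
import re

MAX_SEQUENCES = 25
DNA_MAX_PER_SEQUENCE = 25_000
DNA_MAX_COMBINED = 50_000


def parse_fasta(text):
    """Return {name: sequence}; non-letter characters (digits, spaces) are dropped."""
    records = {}
    name = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith(">"):
            fields = line[1:].split()
            if not fields or fields[0] in records:
                raise ValueError(f"header names must be present and unique: {line!r}")
            name = fields[0]
            records[name] = []
        elif line:
            if name is None:
                name = "unnamed"  # hypothetical label for a headerless sequence
                records[name] = []
            records[name].append(re.sub(r"[^A-Za-z]", "", line))
    return {n: "".join(parts) for n, parts in records.items()}


def check_dna_limits(records):
    """Return a list of human-readable problems; an empty list means within limits."""
    problems = []
    if len(records) > MAX_SEQUENCES:
        problems.append(f"{len(records)} sequences exceeds {MAX_SEQUENCES}")
    for n, seq in records.items():
        if len(seq) > DNA_MAX_PER_SEQUENCE:
            problems.append(f"{n}: {len(seq)} bases exceeds {DNA_MAX_PER_SEQUENCE}")
    total = sum(len(s) for s in records.values())
    if total > DNA_MAX_COMBINED:
        problems.append(f"combined length {total} exceeds {DNA_MAX_COMBINED}")
    return problems
```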
NOTE: Program-driven BLAT use is limited to a maximum of one hit every 15 seconds and no more than 5000 hits per day.
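A script that submits BLAT queries can respect these limits with a simple client-side throttle. This is a sketch only: the class name is hypothetical, the clock and sleep functions are injectable so the logic can be tested, and treating a "day" as a rolling 24-hour window from the first query is an assumption (the server may count days differently).

```python
import time


class BlatThrottle:
    """Allow at most one query per `min_interval` seconds and `daily_limit` per day."""

    def __init__(self, min_interval=15.0, daily_limit=5000,
                 clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self.daily_limit = daily_limit
        self.clock = clock
        self.sleep = sleep
        self.last = None
        self.day_start = None
        self.count = 0

    def wait(self):
        """Block until another query is allowed; raise if the daily quota is used up."""
        now = self.clock()
        # Start a new 24-hour counting window (an assumption, see above).
        if self.day_start is None or now - self.day_start >= 86_400:
            self.day_start, self.count = now, 0
        if self.count >= self.daily_limit:
            raise RuntimeError("daily BLAT query limit reached")
        if self.last is not None and now - self.last < self.min_interval:
            self.sleep(self.min_interval - (now - self.last))
            now = self.clock()
        self.last = now
        self.count += 1
```

Calling `wait()` immediately before each submission keeps a batch job within the stated limits.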
If a query returns successfully, BLAT will display a flat database file that summarizes the alignments found. A BLAT query often generates multiple hits. This can happen when the genome contains multiple copies of a sequence, paralogs, pseudogenes, statistical coincidences, artifactual assembly duplications, or when the query itself contains repeats or common retrotransposons. When too many hits occur, try resubmitting the query sequence after filtering in slow mode with RepeatMasker.
Items in the search results list are ordered by the criteria specified in the Sort output menu. Each line item provides links to view the details of the sequence alignment or to open the corresponding view in the Genome Browser. The details link gives the letter-by-letter alignment of the sequence to the genome. It is recommended that you first examine the details of the alignment for match quality before viewing the sequence in the Genome Browser.
When several nearby BLAT matches occur on a single chromosome, a simple trick can be used to quickly adjust the Genome Browser track window to display all of them: open the Genome Browser with the match that has the lowest chromosome start coordinate, replace the end coordinate in the position box with the highest chromosome end coordinate from the list of matches, then click the jump button.
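The trick amounts to taking the minimum start and the maximum end over the matches on one chromosome. A small Python sketch, assuming the matches have been copied from the results list as (chromosome, start, end) tuples; `spanning_window` is a hypothetical helper, and the coordinates are used exactly as shown in the list.

```python
def spanning_window(hits, chrom):
    """Return a 'chrom:start-end' position string covering every hit on chrom.

    hits is an iterable of (chromosome, start, end) tuples.
    """
    starts_ends = [(start, end) for c, start, end in hits if c == chrom]
    if not starts_ends:
        raise ValueError(f"no hits on {chrom}")
    start = min(s for s, _ in starts_ends)  # lowest start coordinate
    end = max(e for _, e in starts_ends)    # highest end coordinate
    return f"{chrom}:{start}-{end}"
```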
To make a custom track directly from BLAT, select the PSL format output option. The resulting PSL track can be uploaded into the Genome Browser by pasting the data into the data text box on the Genome Browser Add Custom Tracks page, accessed via the "add custom tracks" button on the Browser gateway and annotation tracks pages. See the Creating custom annotation tracks section for more information.
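Before uploading, it can help to inspect the PSL output in a script. The sketch below relies on the standard 21-column, tab-separated PSL layout and skips anything that does not look like a data line (for example the psLayout header or a track line). `parse_psl` is a hypothetical helper, not a UCSC utility.

```python
PSL_FIELDS = [
    "matches", "misMatches", "repMatches", "nCount", "qNumInsert", "qBaseInsert",
    "tNumInsert", "tBaseInsert", "strand", "qName", "qSize", "qStart", "qEnd",
    "tName", "tSize", "tStart", "tEnd", "blockCount", "blockSizes", "qStarts",
    "tStarts",
]
TEXT_FIELDS = {"strand", "qName", "tName"}
LIST_FIELDS = {"blockSizes", "qStarts", "tStarts"}  # comma-separated, trailing comma


def parse_psl(text):
    """Return one dict per PSL data line, with numeric fields converted."""
    records = []
    for line in text.splitlines():
        cols = line.split("\t")
        # Data lines have 21 tab-separated columns and start with a number.
        if len(cols) != len(PSL_FIELDS) or not cols[0].strip().isdigit():
            continue
        rec = {}
        for key, value in zip(PSL_FIELDS, cols):
            if key in TEXT_FIELDS:
                rec[key] = value
            elif key in LIST_FIELDS:
                rec[key] = [int(x) for x in value.split(",") if x]
            else:
                rec[key] = int(value)
        records.append(rec)
    return records
```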
For large batch jobs or to change internal parameters, it is best to install command-line BLAT on your own Linux server. Sources and executables are free for academic, personal, and non-profit purposes. BLAT source may be downloaded from http://www.soe.ucsc.edu/~kent (look for the blatSrc*.zip file with the most recent date). For BLAT executables, go to http://genome-test.soe.ucsc.edu/~kent/exe/; binaries are sorted by platform. Non-exclusive commercial licenses are available from the Kent Informatics website.
For more information on the BLAT suite of programs, see the BLAT Program Specifications and the Blat section of the Genome Browser FAQ.
Detailed information about an individual annotation track, including display characteristics, configuration information, and associated database tables, may be obtained from the track description page accessed by clicking the mini-button to the left of the displayed track in the Genome Browser, or by selecting the "Open details..." or "Show details..." option from the Genome Browser's right-click menu. Click the "View table schema" link on the track description page to display additional information about the primary database table underlying the track. Table schema information may also be accessed via the "describe table schema" button in the Table Browser. For more information on configuring and using the tracks displayed in the Genome Browser track window, see the section Interpreting and Fine-tuning the Genome Browser display.
The Table Browser provides text-based access to the genome assemblies and annotation data stored in the Genome Browser database. As a flexible alternative to the graphical Genome Browser, this tool offers an enhanced level of query support that includes restrictions based on field values, free-form SQL queries, and combined queries on multiple tables. Output can be filtered to restrict the fields and lines returned, and may be organized into one of several formats, including a simple tab-delimited file that can be loaded into a spreadsheet or database as well as advanced formats that may be uploaded into the Genome Browser as custom annotation tracks. The Table Browser provides a convenient alternative to downloading and manipulating the entire genome and its massive data tracks (see the Downloading Genome Data section).
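Tab-delimited output is as easy to load in a script as in a spreadsheet. This Python sketch assumes the output's first line is a column header prefixed with # (as in the "all fields from selected table" output, to our understanding) and keeps every value as a string. `read_table_browser_tsv` is a hypothetical helper.

```python
def read_table_browser_tsv(text):
    """Turn '#'-headed, tab-delimited text into a list of {column: value} dicts."""
    lines = [line for line in text.splitlines() if line]
    if not lines or not lines[0].startswith("#"):
        raise ValueError("expected a header line starting with '#'")
    header = lines[0][1:].split("\t")  # drop the leading '#'
    rows = []
    for line in lines[1:]:
        values = line.split("\t")
        if len(values) != len(header):
            raise ValueError(f"expected {len(header)} columns: {line!r}")
        rows.append(dict(zip(header, values)))
    return rows
```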
For information on using the Table Browser features, refer to the Table Browser User Guide.
The Sessions tool allows users to configure their browsers with specific track combinations, including custom tracks, and save the configuration options. Multiple sessions may be saved for future reference, for comparison of scenarios or for sharing with colleagues. Saved sessions persist for four months after the last access, unless deleted. User-generated tracks can be saved within sessions.
This tool may be accessed by clicking the "My Data" pulldown in the top blue navigation bar in any assembly and then selecting Sessions. To ensure privacy and security, you must create an account and/or log in to use the Session tool. Individual sessions may be designated by the user as either "shared" or "non-shared" to protect the privacy of confidential data. To avoid having a new shared session from someone else override existing Genome Browser settings, users are encouraged to open a new web-browser instance or to save existing settings in a session before loading a new shared session.
For more detailed information on using the Session tool, see the Sessions User Guide.
The Genome Graphs tool can be used to display genome-wide data sets such as the results of genome-wide SNP association studies, linkage studies, and homozygosity mapping. This tool is not pre-loaded with any sample data; instead, you can upload your own data for display by the tool.
Once you have uploaded your data, you can view it in a variety of ways. You can view multiple sets of genome-wide data simultaneously either as superimposed graphs or side-by-side graphs. Once you see an area of interest in the Genome Graphs view, you can click on it to go directly to the Genome Browser at that position. You can also set a significance threshold for your data and view only regions or gene sets that meet that threshold.
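Genome Graphs applies the threshold interactively, but the idea is easy to reproduce offline, for example to list candidate regions before uploading. A hedged Python sketch, not the tool's own algorithm: it assumes hypothetical (chromosome, position, score) tuples in which a higher score is more significant (such as -log10 of a p-value), and merges passing markers on the same chromosome that lie within `max_gap` bases of each other. `significant_regions` is a hypothetical helper.

```python
def significant_regions(points, threshold, max_gap=0):
    """Merge markers with score >= threshold into (chrom, start, end) regions.

    points: iterable of (chromosome, position, score), higher = more significant.
    """
    passing = sorted((chrom, pos) for chrom, pos, score in points if score >= threshold)
    regions = []
    for chrom, pos in passing:
        last = regions[-1] if regions else None
        if last and last[0] == chrom and pos - last[2] <= max_gap:
            last[2] = pos  # close enough: extend the current region
        else:
            regions.append([chrom, pos, pos])  # start a new region
    return [tuple(region) for region in regions]
```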
For information on using the Genome Graphs features, refer to the Genome Graphs User Guide.
VisiGene is a browser for viewing in situ images. It enables the user to examine cell-by-cell as well as tissue-by-tissue expression patterns. The browser serves as a virtual microscope, allowing users to retrieve images that meet specific search criteria, then interactively zoom and scroll across the collection.
To start the VisiGene browser, click the VisiGene link in the left-hand sidebar menu on the Genome Browser home page.
The following image collections are currently available for browsing:
The image database may be searched by gene symbols, authors, years of publication, body parts, GenBank or UniProtKB accessions, organisms, Theiler stages (mice), and Nieuwkoop/Faber stages (frogs). The search returns only those images that match all the specified criteria. For a list of sample search strings, see the VisiGene Gateway page.
The wildcard characters * and ? are supported for gene name searches. For example, to view the images of all genes in the Hox A cluster, search for hoxa*. When searching on author names that include initials, use the format Smith AJ.
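The * wildcard (any run of characters) and the ? wildcard (exactly one character) behave like shell-style patterns. The Python sketch below uses the standard `fnmatch` module to show which symbols a pattern would select. It assumes matching is case-insensitive, as the lowercase hoxa* example suggests; note that `fnmatch` also accepts [...] character sets, which VisiGene may not. `match_gene_symbols` is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def match_gene_symbols(pattern, symbols):
    """Return the symbols matching a * / ? wildcard pattern, ignoring case."""
    pat = pattern.lower()
    # fnmatchcase avoids OS-dependent case handling; both sides are lower-cased.
    return [s for s in symbols if fnmatchcase(s.lower(), pat)]
```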
Following a successful search, VisiGene displays a list of thumbnails of images matching the search criteria in the left-hand pane of the browser. By default, the image corresponding to the first thumbnail in the list is displayed in the main image pane. If more than 25 images meet the search criteria, links at the bottom of the thumbnail pane allow the user to toggle among pages of search results. To display a different image in the main browser pane, click the thumbnail of the image you wish to view.
By default, an image is displayed at a resolution that provides optimal viewing of the overall image. This size varies among images. The image may be zoomed in or out, sized to match the resolution of the original image or best fit the image display window, and moved or scrolled in any direction to focus on areas of interest. The original full-sized image may also be downloaded.
Zooming in: To enlarge the image by 2X, click the Zoom in button above the image or click on the image using the left mouse button. Alternatively, the + key may be used to zoom in when the main image pane is the active window.
Zooming out: To reduce the image by 2X, click the Zoom out button above the image or click on the image using the right mouse button. Alternatively, the - key may be used to zoom out when the main image pane is the active window.
Sizing to full resolution: Click the Zoom full button above the image to resize the image such that each pixel on the screen corresponds to a pixel in the digitized image.
Sizing to best fit: Click the Zoom fit button above the image to zoom the image to the size that best fits the main image pane.
Moving the image: To move the image viewing area in any direction, click and drag the image using the mouse. Alternatively, the following keyboard shortcuts may be used after clicking on the image:
Downloading the original full-sized image: Most images may be viewed in their original full-sized format by clicking the "download" link at the bottom of the image caption. NOTE: due to the large size of some images, this action may take a long time and could potentially exceed the capabilities of some Internet browsers.
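The zoom controls can be modeled as a single scale factor (screen pixels per image pixel). In this rough Python sketch, which is an illustration and not VisiGene's code, Zoom in and Zoom out double or halve the scale, Zoom full sets it to 1, and Zoom fit picks the largest scale at which the whole image fits in the pane. All names are hypothetical.

```python
ZOOM_FULL = 1.0  # one screen pixel per digitized-image pixel


def zoom_in(scale: float) -> float:
    return scale * 2  # 2X enlargement


def zoom_out(scale: float) -> float:
    return scale / 2  # 2X reduction


def zoom_fit(img_w: int, img_h: int, pane_w: int, pane_h: int) -> float:
    """Largest scale at which the whole image fits inside the pane."""
    return min(pane_w / img_w, pane_h / img_h)


def displayed_size(img_w: int, img_h: int, scale: float) -> tuple[int, int]:
    """On-screen size in pixels of an image shown at the given scale."""
    return round(img_w * scale), round(img_h * scale)
```

For example, an 8000 x 6000 image in an 800 x 600 pane fits at a scale of 0.1; one Zoom in from there doubles it to 0.2.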
If you have an image set you would like to contribute for display in the VisiGene Browser, contact Jim Kent.
The Genome Browser provides a feature to configure the retrieval, formatting, and coloring of the text used to depict the DNA sequence underlying the features in the displayed annotation tracks window. Retrieval options allow the user to add a padding of extra bases to the upstream or downstream end of the sequence. Formatting options range from simply displaying exons in upper case to elaborately marking up a sequence according to multiple track data. The DNA sequence covered by various tracks can be highlighted by case, underlining, bold or italic fonts, and color.
The DNA display configuration feature can be useful to highlight features within a genomic sequence, point out overlaps between two types of features (for example, known genes vs. gene predictions), or mask out unwanted features.
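The simplest of these formats, upper-casing the bases covered by one track (such as exons) and lower-casing the rest, can be sketched in Python as below. This is an illustration rather than the browser's implementation: it assumes half-open, 0-based (start, end) intervals in genomic coordinates and a hypothetical `seq_start` giving the coordinate of the first base; `mark_case` is also a hypothetical name.

```python
def mark_case(seq, intervals, seq_start=0):
    """Lower-case seq, then upper-case every base covered by an interval."""
    chars = list(seq.lower())
    for start, end in intervals:
        # Convert genomic coordinates to string indices, clipped to the sequence.
        lo = max(start - seq_start, 0)
        hi = min(end - seq_start, len(chars))
        for i in range(lo, hi):
            chars[i] = chars[i].upper()
    return "".join(chars)
```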
To access the feature, click the "View" pulldown on the top blue menu bar on the Genome Browser page and select "DNA", or, depending on context, select the "Get DNA..." option from the Genome Browser's right-click menu. The "Get DNA in Window" page that appears contains sections for configuring the retrieval and output format.
To display extra bases upstream of the 5' end of your sequence or downstream of the 3' end of the sequence, enter the number of bases in the corresponding text box. This option is useful in looking for regulatory regions.
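Which coordinates receive the padding depends on strand: for a feature on the + strand, upstream of the 5' end means lower coordinates, while for a feature on the - strand it means higher coordinates. A minimal Python sketch, assuming a hypothetical `padded_range` helper and 0-based coordinates; it clips at 0 but not at the chromosome end, since the chromosome size is not passed in.

```python
def padded_range(start, end, strand, upstream=0, downstream=0):
    """Return (start, end) after adding strand-aware 5'/3' padding."""
    if strand == "+":
        # 5' end is at the low coordinate on the + strand.
        return max(start - upstream, 0), end + downstream
    if strand == "-":
        # 5' end is at the high coordinate on the - strand.
        return max(start - downstream, 0), end + upstream
    raise ValueError("strand must be '+' or '-'")
```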
The Sequence Formatting section lists several options for adjusting the case of all or part of the DNA sequence. To choose one of these formats, click the corresponding option button, then click the get DNA button. To access a table of extended formatting options, click the Extended case/color options button.
The Extended DNA Case/Color page presents a table with many more format options. The page provides instructions for using the formatting table, as well as examples of its use. The list of tracks in the Track Name column is automatically generated from the list of tracks available on the current genome.
A few caveats mentioned on the Extended DNA Case/Color page bear repeating. Keep the formatting simple at first: it is easy to make a display that is pretty to look at but also completely cryptic. Also, be careful when requesting complex formatting for a large chromosomal region: once all the HTML tags have been added to the output page, the file size may exceed what your Internet browser, clipboard, and other software can safely handle. The maximum region size that the tool can format is approximately 10 Mbp.
Coordinates of features frequently change from one assembly to the next as gaps are closed, strand orientations are corrected, and duplications are reduced. Occasionally, a chunk of sequence may be moved to an entirely different chromosome as the map is refined. There are three different methods available for migrating data from one assembly to another: BLAT alignment, coordinate conversion, and coordinate lifting. The BLAT alignment tool is described in the section Using BLAT alignments.
The Genome Browser Convert utility is useful for locating the position of a feature of interest in a different release of the same genome or (in some cases) in a genome assembly of another species. During the conversion process, portions of the genome in the coordinate range of the original assembly are aligned to the new assembly while preserving their order and orientation. In general, it is easier to achieve successful conversions with shorter sequences.
When coordinate conversion is available for an assembly, click on the "View" pulldown on the top blue menu bar on the Genome Browser page and select the "In Other Genomes (Convert)" link. You will be presented with a list of the genome/assembly conversion options available for the current assembly. Select the genome and assembly to which you'd like to convert the coordinates, then click the Submit button. If the conversion is successful, the browser will return a list of regions in the new assembly, along with the percent of bases and span covered by that region. Click on a region to display it in the browser. If the conversion is unsuccessful, the utility returns a failure message.
The liftOver tool is useful if you wish to convert a large number of coordinate ranges between assemblies. This tool is available in both web-based and command line forms, and supports forward/reverse conversions as well as conversions between species.
To access the graphical version of the liftOver tool, click the "Tools" pulldown in the top blue menu bar of the Genome Browser, then select LiftOver from the menu.
To convert one or more coordinate ranges using the default conversion settings:
Alternatively, you may load the coordinate ranges from an existing data file by entering the file name in the upload box at the bottom of the screen, then clicking the Submit File button.
The default parameter settings are recommended for general purpose use of the liftOver tool. However, you may want to customize settings if you have several very large regions to convert.
The command-line version of liftOver offers the increased flexibility and performance gained by running the tool on your local server. This utility requires access to a Linux platform. The executable file may be downloaded here. Command-line liftOver requires a UCSC-generated over.chain file as input. Pre-generated files for a given assembly can be accessed from the assembly's "LiftOver files" link on the Downloads page. If the desired conversion file is not listed, send a request to the genome mailing list and we may be able to generate one for you.
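To illustrate what an over.chain file encodes, the sketch below maps a single position through one chain. It follows the published chain format: a chain header line, then lines of "size dt dq" giving an aligned block followed by gaps in target and query, with a final line holding only "size". In UCSC over.chain files the target fields describe the assembly being converted from. This is a simplified illustration, not liftOver itself: it handles only one chain with both strands +, maps single positions rather than ranges, and ignores settings such as the minimum fraction of bases that must remap. `parse_chain` and `lift_position` are hypothetical names.

```python
def parse_chain(text):
    """Parse one chain: a header line plus its alignment-data lines."""
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    # chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    return {
        "tName": header[2], "tStrand": header[4], "tStart": int(header[5]),
        "qName": header[7], "qStrand": header[9], "qStart": int(header[10]),
        "blocks": [tuple(int(x) for x in fields) for fields in lines[1:]],
    }


def lift_position(chain, pos):
    """Map a 0-based position on tName to (qName, pos); None if in a gap."""
    if chain["tStrand"] != "+" or chain["qStrand"] != "+":
        raise NotImplementedError("this sketch only handles +/+ chains")
    t, q = chain["tStart"], chain["qStart"]
    for block in chain["blocks"]:
        size = block[0]
        if t <= pos < t + size:  # inside an ungapped aligned block
            return chain["qName"], q + (pos - t)
        if len(block) == 3:  # step over the gaps that follow this block
            t += size + block[1]
            q += size + block[2]
    return None
```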
Most of the underlying tables containing the genomic sequence and annotation data displayed in the Genome Browser can be downloaded. All of the tables are freely usable for any purpose except as indicated in the README.txt file in the download directories. This data was contributed by many researchers, as listed on the Genome Browser Credits page. Please acknowledge the contributor(s) of the data you use.
Genome data can be downloaded in different ways:
Using rsync, the command
rsync -a -P rsync://hgdownload.soe.ucsc.edu/path/file ./
can quickly and efficiently download large files to your current directory (./). To download an entire directory (note the trailing slash), you would use an expression such as
rsync -a -P rsync://hgdownload.soe.ucsc.edu/directory/ ./
For more information please click here.
Using FTP, the address
ftp://hgdownload.soe.ucsc.edu/goldenPath/
will take you to a directory that contains the genome download directories. Compared to rsync (see above), this method is not recommended if you plan to download a large file or multiple files from a single directory. You can, however, use the mget command to download multiple files: mget filename1 filename2, or mget -a (to download all the files in the directory).
There may be several download directories associated with each version of a genome assembly: the full data set (bigZips), the full data set by chromosome (chromosome), the annotation database tables (database), and one or more sets of comparative cross-species alignments.
BigZips contains the entire draft of the genome in chromosome and/or contig form. Depending on the genome, this directory may contain some or all of the following files:
Chromosomes contains the assembled sequence for the genome in separate files for each chromosome in a zipped fasta format. The main assembly can be found in the chrN.fa files, where N is the name of the chromosome. The chrN_random.fa files contain clones that are not yet finished or cannot be placed with certainty at a specific place on the chromosome. In some cases, the chrN_random.fa files also contain haplotypes that differ from the main assembly.
Database contains all of the positional and non-positional tables in the genome annotation database. Each table is represented by 2 files:
Schema descriptions for all tables in the genome annotation database may be viewed by using the "describe table schema" button in the Table Browser.
Cross-species alignments directories, such as the vsMm4 and humorMm3Rn3 directories in the hg16 assembly, contain pairwise and multiple species alignments and filtered alignment files used to produce cross-species annotations. For more information, refer to the READMEs in these directories and the description of the Multiple Alignment Format (MAF).
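As a starting point for working with these files, the Python sketch below reads MAF alignment blocks: each block begins with an "a" line, and each "s" line holds the source name, start, size, strand, source size, and aligned text. It ignores other line types (such as "i", "e", and "q"), and `parse_maf` is a hypothetical helper.

```python
def parse_maf(text):
    """Return a list of alignment blocks, each a list of dicts for its 's' lines."""
    blocks = []
    current = None
    for line in text.splitlines():
        fields = line.split()
        if not fields or fields[0].startswith("#"):
            continue  # blank lines and comments such as '##maf version=1'
        if fields[0] == "a":
            current = []  # a new alignment block starts
            blocks.append(current)
        elif fields[0] == "s" and current is not None:
            _, src, start, size, strand, src_size, aligned = fields
            current.append({
                "src": src,
                "start": int(start),  # 0-based; on '-' rows, relative to the reverse complement
                "size": int(size),    # number of non-gap characters in the row
                "strand": strand,
                "srcSize": int(src_size),
                "text": aligned,
            })
    return blocks
```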
Track hubs are web-accessible directories of genomic data that can be viewed on the UCSC Genome Browser alongside native annotation tracks. Hubs are a useful tool for visualizing a large number of genome-wide data sets. The Track Hub utility allows efficient access to data sets from around the world through the familiar Genome Browser interface. Browser users can display tracks from any public track hub that has been registered with UCSC. Additionally, users can import data from unlisted hubs or can set up, display, and share their own track hubs.
For information on using the Track Hub features, refer to the Genome Browser Track Hub User Guide. For specific information on configuring your trackDb.txt file, refer to the Track Database Definition Document. See also the Basic Hub Quick Start Guide and Quick Start Guide to Organizing Track Hubs into Groupings.
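A trackDb.txt file is a series of stanzas, one "setting value" pair per line, with a blank line between stanzas. The Python sketch below generates such stanzas using a few common settings (track, bigDataUrl, shortLabel, longLabel, type). The file and track names in any call are hypothetical, `trackdb_stanza` and `trackdb_file` are hypothetical helpers, and the Track Database Definition Document remains the authority on which settings each track type accepts.

```python
def trackdb_stanza(track, big_data_url, track_type, short_label, long_label, **extra):
    """Format one trackDb.txt stanza; extra keyword arguments become more settings."""
    settings = {
        "track": track,  # the 'track' line opens the stanza
        "bigDataUrl": big_data_url,
        "shortLabel": short_label,
        "longLabel": long_label,
        "type": track_type,
        **extra,
    }
    return "".join(f"{key} {value}\n" for key, value in settings.items())


def trackdb_file(stanzas):
    """Join stanzas with one blank line between them."""
    return "\n".join(stanzas)
```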