OriDB User Guide

This database summarizes our knowledge of replication origins in the budding yeast Saccharomyces cerevisiae. Each proposed origin site has been assigned a Status (Confirmed, Likely, or Dubious) expressing our confidence that the site genuinely corresponds to an origin. These assignments and the database represent the culmination of many studies and include data from a large number of labs.

The site includes a User Notes page for each origin and we ask yeast researchers with further information for particular sites to add annotations.

This 'About' page contains the following sections:

Origin Records: Brief Description
Origin Records: Full Description
Amalgamating Data
Definitions and Abbreviations

Origin Records: Brief Description

Each proposed or confirmed origin site appears as a record in OriDB, with each record comprising 7 dynamically generated pages, displaying information under the following headings:

Origin Summary Information - origin location and Status; links to download the DNA sequence or to access that site on external databases SGD (Yeast Genome Database) and UCSC (Genome Browser); time of origin replication; origin activity in hydroxyurea.
Origin Summary Graphics - three standardized graphical representations of available data for the origin site. Clicking on these graphics accesses a fully interactive Chromosome Viewer tool that allows the user to specify display characteristics.
Origin Location Assignments - origin locations assigned for this origin by the available genome-wide studies.
Origin Sequence Elements - sequences of proposed or experimentally confirmed origin sequence elements. The essential element (ACS or A element) is shown in comparison to the known A element consensus sequences. Identified B elements are also shown.
Phylogenetic Sequence Conservation - graphic illustrating phylogenetic sequence conservation of origin sequence elements amongst sensu stricto yeast species.
User Notes - manually curated notes entered by OriDB users.
References for this Origin - published studies that refer to this replication origin (under construction - please contact us for additions).

Origin Records: Full Description

This section gives more complete details of the information displayed for each origin.

Origin Record Header

Status and genomic location - whether this origin is Confirmed, Likely, or Dubious.
- Confirmed - origin has been cloned and tested by ARS assay, or has been detected by 2D gel analysis.
- Likely - origin identified by two (or more) microarray studies but not yet confirmed.
- Dubious - origin identified by only one microarray study.
Systematic Name - for confirmed origin sites systematic names have been assigned in consultation with the Saccharomyces Genome Database. Origins are named based upon the chromosome they lie on and the order of discovery. Generally these names are in agreement with the proARS names assigned by Wyrick et al. (2001). For origin sites that are Likely or Dubious (rather than confirmed) it is stated "No systematic name assigned".
Other names - including proARS names assigned by Wyrick et al. (2001).

Origin Summary information

This tab presents a summary of what is known about the origin site under the following headings:

Status - as above, giving the number of array studies that found this origin.
Genomic location - chromosome number and coordinate interval. For confirmed origins this interval specifies the cloned fragment (or else restriction sites used for 2D gel confirmation). For unconfirmed origins, this interval shows the location assigned by the highest-resolution microarray study that identified the origin, expanded by the estimated error for that study. The intergenic or genic location of the origin is given (see below for how this was assigned).
DNA sequence - link to display the 'Genomic Location' DNA sequence, using the genome sequence held at the UCSC genome browser.
Time of replication data - assigned by two different whole genome timing studies, Heavy:Light (Raghuraman et al. (2001)) and Copy Number (Yabuki et al. (2002)). For each study, timing data is presented in two formats, explained in the Definitions section below:
1. Absolute replication time.
2. Relative replication time (replication index).
Origin activity in HU - whether this origin site initiates replication in an S phase blocked using hydroxyurea, as determined by studies based on copy number (Yabuki et al. (2002)) and single-stranded DNA analysis (Feng et al. (2006)).
ID - unique origin reference number (mainly for database functionality; allows cross-referencing and page linking).

Origin Summary Graphics

This tab displays three standardized graphic representations of DNA replication data for the origin site. Clicking on any of these three standard graphics opens a more interactive graphic window that allows the user to specify display characteristics. Note: the graphic displays require Macromedia Flash. User-definable properties in these graphics include:

Image Window - defines the width of the displayed image (in pixels).
Chromosome - select from the drop down menu the chromosome of interest.
Coordinate - enter the 5' and 3' coordinates of the region to display.
Microarray data to display - select which studies to view data for (no timing data will be displayed when viewing windows less than 5 kb).
Display Origin loci - select which origins to display based upon their Status.

Other information about these graphics:

Transcription units - whether these are displayed depends on the chromosome window size being viewed and the image window size. Clicking on a transcription unit opens a new browser window showing the relevant SGD page.
Mouse over - moving the mouse over the graphic displays additional information.
Zooming in - clicking within the plot area zooms in 2x (i.e. halves the amount of chromosome displayed), and recenters around the point selected.
Recentering - clicking on the x-axis scale bar (below the plot area) recenters the plot on that chromosomal coordinate.
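
The zooming and recentering behaviour described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the viewer's actual code: the function names are hypothetical, integer coordinates are assumed, and no clamping to the chromosome ends is attempted because the guide does not describe how the viewer handles that case.

```python
def zoom_in(start, end, click):
    """Halve the displayed span and centre the new window on the clicked coordinate."""
    new_width = (end - start) // 2
    new_start = click - new_width // 2
    return new_start, new_start + new_width


def recenter(start, end, click):
    """Keep the displayed span unchanged but centre it on the clicked coordinate."""
    width = end - start
    new_start = click - width // 2
    return new_start, new_start + width


# Hypothetical 40 kb window; the user clicks at coordinate 130,000.
window = (100_000, 140_000)
zoomed = zoom_in(*window, 130_000)    # 20 kb window around the click
moved = recenter(*window, 130_000)    # 40 kb window around the click
print(zoomed, moved)
```

A click in the plot area corresponds to zoom_in, and a click on the x-axis scale bar corresponds to recenter.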

Origin Location Assignments

Origin locations as assigned by published genome-wide studies. The following studies are collated on this page:

ORC & Mcm Chromatin Immunoprecipitation (ChIP) study (Wyrick et al. (2001));
Heavy:Light Timing study (Raghuraman et al. (2001));
Copy Number Timing study (Yabuki et al. (2002));
single-stranded DNA in hydroxyurea study (Feng et al. (2006));
Comparative genomics-based origin identification study (Nieduszynski et al. (2006)).
(Other studies, where necessary).

For the ChIP study the location range displayed corresponds to the probe containing the proARS; for the other studies the peak location is given. For the Heavy:Light study the Confidence value is given, where 9 corresponds to the most likely origins and 1 to the least likely. These values have not been expanded to account for error values for the various studies.

Origin Sequence Elements

DNA sequence elements reported to be important for the function of the origin. At present this is limited to the following classes of element:

ARS Consensus Sequence (ACS or A element) - either proposed (proACS) or experimentally confirmed, shown aligned to the 11 bp ACS motif, the 17 bp expanded ACS and the motif generated from phylogenetically conserved ARS sequences (drawn pictorially as a LOGO). Note: this element is proposed to recruit the Origin Recognition Complex (ORC) and is generally essential for origin function.
B elements - including B1, B2, B3 and B4. Note: the B1 element is proposed to aid the ACS in recruiting ORC; the function of the remaining B elements is less clear. B elements contribute to origin function, but are not essential.

Phylogenetic Sequence Conservation

Origin sequence elements are often phylogenetically conserved amongst the closely related sensu stricto Saccharomyces species (Nieduszynski et al. (2006)). This tab indicates whether phylogenetic sequence conservation has been reported for any of the origin elements. 'Highly conserved' means that at least 12 out of 15 base pairs in the ACS are identical. Where conservation has been reported, appropriate sequence alignments are presented (including sequences from Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii and S. bayanus). Note: the alignment display requires Macromedia Flash. As of June 2006, alignments have been uploaded for the following origins: ARS305, ARS306, ARS307 and ARS309.

Features of the alignment display:

Sequences shown represent the T-rich strand of the ACS.
Positions of identity across all species are highlighted in yellow and marked with a * below the alignment.
Bases that show identity to the equivalent S. cerevisiae base are shown in black with others shown in blue.
The positions of sequence elements are marked above the alignment.
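
As an illustration of the identity markers and the 'Highly conserved' threshold, the following Python sketch builds the line of * characters for an alignment and counts the identical positions. It rests on assumptions: the guide does not say whether the 12-out-of-15 criterion is measured across all species or pairwise against S. cerevisiae (identity across all aligned sequences is assumed here), the function names are hypothetical, and the sequences are invented for the example and are not taken from any origin.

```python
def identity_line(aligned):
    """Return a string with '*' under every column that is identical in all sequences."""
    return "".join("*" if len(set(column)) == 1 else " " for column in zip(*aligned))


def is_highly_conserved(aligned, min_identical=12):
    """Assumed reading of the rule: at least 12 of the 15 ACS columns are identical."""
    return identity_line(aligned).count("*") >= min_identical


# Invented 15 bp sequences (T-rich strand), one per species, for illustration only.
alignment = [
    "TTTTTATGTTTAGTT",
    "TTTTTATGTTTAGTT",
    "TTTTCATGTTTTGTT",
    "TTTTTATATTTAGTT",
    "ATTTTATGTTTAGTT",
]

for sequence in alignment:
    print(sequence)
print(identity_line(alignment))
print("identical positions:", identity_line(alignment).count("*"))
print("highly conserved:", is_highly_conserved(alignment))
```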

User notes

This tab presents manually curated information about the origin site. OriDB users can add further information about the origin using the link on this tab.

References for this origin

This tab lists references relevant to the origin site, under the following headings:

Microarray studies that identify this origin;
Studies that cloned this origin;
Studies that proposed an essential ACS element;
Studies that confirmed an essential ACS element;
Studies that analyzed this origin by 2D gel;
Studies that determined origin activation (firing) time;
Other studies relevant to this origin.

Only references that have been curated by OriDB will be listed - these references can be viewed by selecting the Yeast Origin References button present at the top of every page. To correct or add additional references please contact us.

Amalgamating Data

The majority of the information in this database is collated from 5 origin identification studies: 4 microarray-based studies by Raghuraman et al. (2001), Wyrick et al. (2001), Yabuki et al. (2002), and Feng et al. (2006), each of which produced a list of proposed origin sites; and a fifth by Nieduszynski et al. (2006), which produced a list of confirmed origin sites. What are the criteria used by OriDB to decide whether closely spaced origin location assignments made by various studies correspond to the same origin?

The process of merging the datasets from various studies is automated according to rules described below. After merging of datasets, each resulting proposed origin site is then automatically assigned a Status (Confirmed, Likely, or Dubious) describing our confidence that the site is a replication origin.

Sources of origin location data are ranked as follows, best first, based upon their estimated resolutions given in brackets (as determined in Nieduszynski et al. (2006)):

origins cloned and assayed by Nieduszynski et al. (2006) or another study (+/- 0 bp);
2D gel-confirmed origins (+/- 0 bp);
proARS origins proposed by the Wyrick et al. (2001) ChIP study (+/- 500 bp);
origins proposed by the Yabuki et al. (2002) Copy Number timing study (+/- 3,500 bp);
origins proposed by the Feng et al. (2006) ssDNA/HU (wt or rad53) study (+/- 4,000 bp);
origins proposed by the Raghuraman et al. (2001) Heavy:Light timing study (+/- 7,500 bp).

To amalgamate the data, cloned origins are first added to the database and annotated according to whether each of the other studies identified the origin, taking into account the estimated resolution of the study concerned as explained below. Second, origins identified by 2D gel analysis (but not yet cloned) are added to the database and as before annotated with whether each of the four microarray-based studies identified the origin. Third, origins are added that were predicted by the highest-resolution microarray-based study (Wyrick et al. (2001)) but that have not yet been included in the database, again annotating each with whether the other microarray-based studies identified the site. Outstanding assignments from the remaining microarray-based datasets are then added in order of study resolution.
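
The order of operations in this merge can be sketched as a simple greedy loop in Python. This is a minimal sketch and not the OriDB implementation: the function names and record layout are hypothetical, a single chromosome is assumed, and the sites are assumed to have already been expanded by each study's estimated error, so that two sites match when their intervals overlap.

```python
def overlaps(a, b):
    """True if two closed intervals (start, end) share at least one base."""
    return a[0] <= b[1] and b[0] <= a[1]


def amalgamate(datasets):
    """Merge datasets given best resolution first.

    `datasets` is a list of (study_name, [(start, end), ...]) pairs. Each site is
    either used to annotate an origin already in the list or added as a new origin.
    """
    origins = []
    for study, sites in datasets:
        for site in sites:
            match = next((o for o in origins if overlaps(o["interval"], site)), None)
            if match is None:
                origins.append({"interval": site, "found_by": [study]})
            elif study not in match["found_by"]:
                match["found_by"].append(study)
    return origins


# Invented coordinates, listed in rank order: cloned, then ChIP, then a timing study.
merged = amalgamate([
    ("cloned", [(1_000, 1_200)]),
    ("ChIP", [(900, 2_100), (50_000, 51_500)]),
    ("heavy:light", [(45_000, 60_000)]),
])
for origin in merged:
    print(origin["interval"], origin["found_by"])
```

Because higher-resolution sources are processed first, each origin keeps the interval from the best study that found it, and lower-resolution studies only add annotations.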

To decide whether origins predicted by two different studies are the same, the estimated errors from the two studies are summed. For origin sites described as a range (e.g. chr1:147533-147596) the range is expanded by adding appropriate errors to both ends. For origin sites described as points (e.g. chr2:229342) a range is generated by adding and subtracting appropriate errors. For example, origin sites identified by Raghuraman et al. (2001) and Yabuki et al. (2002) have a summed error of 11,000 bp (7,500 bp + 3,500 bp), so they are assigned as the same if they lie 10,800 bp apart, but as distinct if they lie 11,100 bp apart.
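
The comparison can be written out as a short Python sketch. The error values are those listed in the ranking above; the dictionary keys, function names and point coordinates are hypothetical, and treating sites that lie exactly the summed error apart as the same origin is an assumption, since the guide does not state how that boundary case is handled.

```python
# Estimated errors in bp, as listed in the ranking of data sources.
ERROR_BP = {
    "cloned": 0,
    "2D gel": 0,
    "ChIP": 500,
    "copy number": 3_500,
    "ssDNA": 4_000,
    "heavy:light": 7_500,
}


def expand(site, study):
    """Expand a point (int) or a range (start, end) by the study's estimated error."""
    start, end = site if isinstance(site, tuple) else (site, site)
    error = ERROR_BP[study]
    return start - error, end + error


def same_origin(site_a, study_a, site_b, study_b):
    """True if the two error-expanded intervals overlap (boundary case assumed inclusive)."""
    a = expand(site_a, study_a)
    b = expand(site_b, study_b)
    return a[0] <= b[1] and b[0] <= a[1]


# The example from the text: peaks 10,800 bp and 11,100 bp apart.
print(same_origin(200_000, "heavy:light", 210_800, "copy number"))  # True
print(same_origin(200_000, "heavy:light", 211_100, "copy number"))  # False
# A range from the ChIP study is widened by 500 bp at each end.
print(expand((147_533, 147_596), "ChIP"))  # (147033, 148096)
```

Expanding each site by its own study's error and testing for overlap is equivalent to comparing the distance between two points with the sum of the two errors.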

The automated data-merging process inevitably results in the annotation of some sites as replication origins that are probably not origins. In most cases, these correspond to 'false positives' found by only one study and are therefore automatically assigned the Status 'Dubious'.

For all sites, we present as much information as possible to allow the user to make their own informed assessment.

Data from additional studies will be added to the database as they become available.

Definitions and Abbreviations

ACS - ARS Consensus Sequence (also called A element): most replication origins contain a single essential A element, although a number of replication origins have been found to contain multiple redundant A elements [see Theis & Newlon (2001)].

ARS - Autonomously Replicating Sequence: a sequence that supports DNA replication on a plasmid and has the potential to behave as a replication origin. ARS sequences are not necessarily chromosomally active for replication initiation.

Coordinate system - Genomic locations are based upon the sequence at the UCSC genome browser. This sequence dates from Oct 2003 and benefits from being a static coordinate system. Sequence coordinates from published studies are presented using the authors' coordinates and may therefore be slightly 'mis-aligned' relative to the UCSC sequence.

DUE - DNA Unwinding Element

Intergenic and Genic Location Assignment - Origin sites were assigned as intergenic or genic using the following rules:

for origin locations that overlap a single intergenic space (by at least 30 bp) the origin is assumed to be contained within that intergenic space;
for origin locations that lie completely within a single gene (overlapping either neighbouring intergenic space by less than 30 bp) the origin is assumed to lie within that gene.

The nature of the intergenic space and the names of the flanking genes are given. If neither situation is true then the intergenic or genic location of the origin cannot be specified, and the statement is shown: Resolution insufficient to assign intergenic space.
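
One possible reading of these rules is sketched below in Python. The interpretation is an assumption: an origin is called intergenic when exactly one intergenic space is overlapped by at least 30 bp, genic when no intergenic space reaches that threshold and exactly one gene is overlapped, and unassigned otherwise. The function names, the feature tuples and the gene names are hypothetical.

```python
def overlap_bp(a, b):
    """Number of bases shared by two closed intervals (start, end)."""
    return max(0, min(a[1], b[1]) - max(a[0], b[0]) + 1)


def classify_location(origin, features, min_overlap=30):
    """Classify an origin interval against (kind, name, start, end) features."""
    intergenic = [name for kind, name, start, end in features
                  if kind == "intergenic" and overlap_bp(origin, (start, end)) >= min_overlap]
    genes = [name for kind, name, start, end in features
             if kind == "gene" and overlap_bp(origin, (start, end)) > 0]
    if len(intergenic) == 1:
        return "intergenic", intergenic[0]
    if not intergenic and len(genes) == 1:
        return "genic", genes[0]
    return "unassigned", None  # "Resolution insufficient to assign intergenic space"


# Invented annotation for a short stretch of one chromosome.
features = [
    ("gene", "GENE_A", 1, 1_000),
    ("intergenic", "A-B", 1_001, 1_500),
    ("gene", "GENE_B", 1_501, 3_000),
    ("intergenic", "B-C", 3_001, 3_400),
]
print(classify_location((1_100, 1_300), features))  # inside one intergenic space
print(classify_location((1_490, 2_000), features))  # only 11 bp of A-B, so genic
print(classify_location((1_400, 3_100), features))  # two intergenic spaces, unassigned
```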

Origin Locations - In all tables, graphics and pages (except the Origin Location Assignment tab), origin locations used are derived as follows:

Cloned origins - smallest cloned fragment with confirmed ARS activity.
2D gel - restriction fragment with origin activity (bubble arc).
ChIP microarray - proARS location coordinates with 500 bp added to each side to include estimated error.
Other microarray studies - peak location with additional sequence added either side to include estimated errors: 3,500 bp for Copy Number timing study; 4,000 bp for ssDNA in HU study; 7,500 bp for Heavy:Light study.

Replication Index is a measure of the relative replication time. For each study (independently) the earliest replicating sequence is assigned the value 0 and the latest replicating sequence is assigned the value 1. Then each replication origin is assigned a value, based upon its replication time in minutes, representing the proportion of S phase to have elapsed. By definition the earliest replicating sequence in each study corresponds to a replication origin (ARS306 for the heavy:light study; ARSVII-888 for the copy number study).
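
Read literally, this definition amounts to a linear rescaling of the replication time between the earliest and latest times seen in a study. The Python sketch below assumes that linear form; the function name and the times used are invented for illustration and are not values from either timing study.

```python
def replication_index(time_min, earliest_min, latest_min):
    """Fraction of S phase elapsed, assuming linear scaling between the extremes."""
    return (time_min - earliest_min) / (latest_min - earliest_min)


# Invented times in minutes: earliest sequence at 10, latest at 50.
earliest, latest = 10.0, 50.0
for time_min in (10.0, 20.0, 50.0):
    print(time_min, replication_index(time_min, earliest, latest))
```

Because the scaling is done independently for each study, indices from the Heavy:Light and Copy Number studies can be compared even though their absolute times are measured from different release points.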

Replication Time (Absolute) is given in minutes. For the Heavy:Light genome-wide timing study this value is minutes from cdc7-1 release at 23 °C. In the case of the Copy Number genome-wide timing study this value is minutes from alpha factor release at 30 °C.

Status - Origin locations are divided into three classes:

Confirmed - origins that have been cloned and detected by ARS assay or detected by 2D gels;
Likely - class of probable origin sites based upon identification by two (or more) microarray studies;
Dubious - sites proposed as origins by only one microarray-based study, which are therefore suspected not to be genuine origins.

Origins that have been cloned and ARS-assayed, but on large fragments (>1.5 kb), are included in the database only if they have not been identified in any other way. They are given the Status 'Confirmed'.
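
The three Status classes can be summarized as a small decision function. This Python sketch follows the definitions above; the function and parameter names are hypothetical, and raising an error for a site reported by no study is a choice made for the example rather than documented behaviour.

```python
def assign_status(cloned_ars=False, detected_2d_gel=False, microarray_studies=0):
    """Return 'Confirmed', 'Likely' or 'Dubious' following the Status definitions."""
    if cloned_ars or detected_2d_gel:
        return "Confirmed"
    if microarray_studies >= 2:
        return "Likely"
    if microarray_studies == 1:
        return "Dubious"
    raise ValueError("site was not reported by any study")


print(assign_status(cloned_ars=True, microarray_studies=3))  # Confirmed
print(assign_status(microarray_studies=2))                   # Likely
print(assign_status(microarray_studies=1))                   # Dubious
```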
