OriDB User Guide

This database summarizes our knowledge of replication origins in the budding yeast Saccharomyces cerevisiae. Each proposed origin site has been assigned a Status (Confirmed, Likely, or Dubious) expressing our confidence that the site genuinely corresponds to an origin. These assignments and the database represent the culmination of many studies and include data from a large number of labs.

The site includes a User Notes page for each origin, and we ask yeast researchers with further information about particular sites to add their annotations there.

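This 'About' page contains the following sections:

  1. Origin Records: Brief Description
  2. Origin Records: Full Description
  3. Amalgamating Data
  4. Definitions and Abbreviations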


Origin Records: Brief Description

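Each proposed or confirmed origin site appears as a record in OriDB. Each record comprises seven dynamically generated pages (tabs), which display information under the following headings:

  1. Origin Summary information
  2. Origin Summary Graphics
  3. Origin Location Assignments
  4. Origin Sequence Elements
  5. Phylogenetic Sequence Conservation
  6. User notes
  7. References for this origin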



Origin Records: Full Description

This section gives more complete details of the information displayed for each origin.

Origin Record Header

Origin Summary information

This tab presents a summary of what is known about the origin site under the following headings:

Origin Summary Graphics

This tab opens a page displaying three standardized graphic representations of DNA replication data for the origin site. Clicking on any of these graphics opens a more interactive graphic window in which the user can specify display characteristics. Note: the graphic displays require Macromedia Flash. User-definable properties in these graphics include:

Other information about these graphics:

Origin Location Assignments

This tab lists origin locations as assigned by published genome-wide studies. The following studies are collated on this page:

For the ChIP study, the location range displayed corresponds to the probe containing the proARS; for the other studies, the peak location is given. For the Heavy:Light study, the Confidence value is also given, where 9 corresponds to the most likely origins and 1 to the least likely. The locations shown have not been expanded to account for the estimated errors of the various studies.

Origin Sequence Elements

This tab lists DNA sequence elements reported to be important for the function of the origin. At present, this is limited to the following classes of element:

Phylogenetic Sequence Conservation

Origin sequence elements are often phylogenetically conserved amongst the closely related sensu stricto Saccharomyces species (Nieduszynski et al. (2006)). This tab indicates whether phylogenetic sequence conservation has been reported for any of the origin elements. 'Highly conserved' means that at least 12 of the 15 base pairs in the ACS are identical. Where conservation has been reported, the appropriate sequence alignments are presented (including sequences from Saccharomyces cerevisiae, S. paradoxus, S. mikatae, S. kudriavzevii and S. bayanus). Note: the alignment display requires Macromedia Flash. As of June 2006, alignments have been uploaded for the following origins: ARS305, ARS306, ARS307 and ARS309.
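The 'highly conserved' rule amounts to a simple column count over an aligned ACS. The sketch below assumes the species' ACS sequences are already aligned to 15 bp and treats a column as identical only when every species carries the same base, with gaps counting as mismatches; those details, the function names, and the synthetic sequences are our assumptions rather than OriDB's documented procedure.

```python
def count_identical_positions(aligned_acs):
    """Count alignment columns where every species carries the same base.

    aligned_acs: list of already-aligned ACS strings, one per species
    (hypothetical input format; gaps written as '-').
    """
    if not aligned_acs:
        raise ValueError("need at least one aligned sequence")
    length = len(aligned_acs[0])
    if any(len(seq) != length for seq in aligned_acs):
        raise ValueError("all sequences must have the same aligned length")
    identical = 0
    for column in zip(*aligned_acs):
        bases = {base.upper() for base in column}
        # Assumption: a column counts only if all species share one base
        # and none of them has a gap there.
        if len(bases) == 1 and "-" not in bases:
            identical += 1
    return identical


def is_highly_conserved(aligned_acs, min_identical=12, acs_length=15):
    """'Highly conserved': at least 12 of the 15 ACS base pairs identical."""
    identical = count_identical_positions(aligned_acs)  # also validates input
    if len(aligned_acs[0]) != acs_length:
        raise ValueError(f"expected a {acs_length} bp ACS alignment")
    return identical >= min_identical


# Synthetic example (not real ACS sequences): one species differs at 3 columns.
species = ["A" * 15] * 4 + ["CCC" + "A" * 12]
print(count_identical_positions(species), is_highly_conserved(species))  # 12 True
```

With a fourth mismatching column the count drops to 11 and the site would no longer meet the threshold.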

Features of the alignment display:

User notes

This tab presents manually curated information about the origin site. OriDB users can add further information about the origin using the link on this tab.

References for this origin

This tab lists references relevant to the origin site, under the following headings:

Only references that have been curated by OriDB are listed; all curated references can be viewed by selecting the Yeast Origin References button at the top of every page. To correct or add references, please contact us.



Amalgamating Data

The majority of the information in this database is collated from five origin identification studies: four microarray-based studies by Raghuraman et al. (2001), Wyrick et al. (2001), Yabuki et al. (2002) and Feng et al. (2006), each of which produced a list of proposed origin sites; and a fifth by Nieduszynski et al. (2006), which produced a list of confirmed origin sites. A key question is therefore how OriDB decides whether closely spaced origin location assignments made by different studies correspond to the same origin.

Merging of the datasets from the various studies is automated according to the rules described below. After merging, each resulting proposed origin site is automatically assigned a Status (Confirmed, Likely, or Dubious) describing our confidence that the site is a replication origin.

Sources of origin location data are ranked as follows, best first, based upon their estimated resolutions given in brackets (as determined in Nieduszynski et al. (2006)):

  1. origins cloned and assayed by Nieduszynski et al. (2006) or another study (+/- 0 bp);
  2. 2D gel-confirmed origins (+/- 0 bp);
  3. proARS origins proposed by the Wyrick et al. (2001) ChIP study (+/- 500 bp);
  4. origins proposed by the Yabuki et al. (2002) Copy Number timing study (+/- 3,500 bp);
  5. origins proposed by the Feng et al. (2006) ssDNA/HU (wild-type or rad53) study (+/- 4,000 bp);
  6. origins proposed by the Raghuraman et al. (2001) Heavy:Light timing study (+/- 7,500 bp).

To amalgamate the data, cloned origins are first added to the database and annotated according to whether each of the other studies identified the origin, taking into account the estimated resolution of the study concerned as explained below. Second, origins identified by 2D gel analysis (but not yet cloned) are added and, as before, annotated with whether each of the four microarray-based studies identified the origin. Third, origins predicted by the highest-resolution microarray-based study (Wyrick et al. (2001)) that are not yet in the database are added, again annotated with whether the other microarray-based studies identified the site. Finally, outstanding assignments from the remaining microarray-based datasets are added in order of study resolution.
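The ordering above amounts to a greedy merge in rank order. This minimal sketch assumes each study's sites are supplied as a list, with the lists ordered best-resolution source first, and leaves the matching rule to a caller-supplied `same_origin` function (for example, one implementing the error-based rule described in this section). The `Site`, `OriginRecord` and `amalgamate` names are hypothetical, not part of OriDB.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Site:
    source: str  # study that proposed the site (hypothetical label)
    chrom: str
    start: int
    end: int  # equal to start for point locations


@dataclass
class OriginRecord:
    site: Site  # location from the first (best-ranked) source that reported it
    found_by: set[str] = field(default_factory=set)


def amalgamate(
    sources_in_rank_order: list[list[Site]],
    same_origin: Callable[[Site, Site], bool],
) -> list[OriginRecord]:
    """Add sources best-resolution first; each later site either annotates
    an existing record or, if unmatched, becomes a new record."""
    records: list[OriginRecord] = []
    for sites in sources_in_rank_order:
        for site in sites:
            # Assumption: if several records match, the first one wins.
            match = next((r for r in records if same_origin(r.site, site)), None)
            if match is not None:
                match.found_by.add(site.source)  # this study also found it
            else:
                records.append(OriginRecord(site=site, found_by={site.source}))
    return records
```

In this sketch each record keeps the coordinates of the highest-resolution source that reported it, while `found_by` accumulates the studies supporting the site; a site found by only one study is the kind of case the text goes on to flag as 'Dubious'.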

To decide whether origins predicted by two different studies are the same, the estimated errors from the two studies are summed. For origin sites described as a range (e.g. chr1:147533-147596), the range is expanded by adding the appropriate error to both ends. For origin sites described as points (e.g. chr2:229342), a range is generated by adding and subtracting the appropriate error. For example, the summed error for Raghuraman et al. (2001) and Yabuki et al. (2002) is 7,500 + 3,500 = 11,000 bp, so sites identified by these two studies are assigned as the same origin if they lie 10,800 bp apart, but as distinct origins if they lie 11,100 bp apart.
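A minimal sketch of this matching rule, assuming each site is a (chromosome, start, end, source) tuple with start equal to end for point locations. The source keys are hypothetical labels for the ranked studies listed above. Widening each site by its own study's error and testing the two ranges for overlap is equivalent to comparing their separation with the summed errors; whether ranges that exactly touch count as the same origin is not stated in the text, so treating them as a match is an assumption.

```python
# Estimated errors (+/- bp) from the ranked list of data sources above.
# The dictionary keys are hypothetical labels, not OriDB identifiers.
ESTIMATED_ERROR_BP = {
    "cloned": 0,
    "2d_gel": 0,
    "wyrick_2001_chip": 500,
    "yabuki_2002_copy_number": 3_500,
    "feng_2006_ssdna_hu": 4_000,
    "raghuraman_2001_heavy_light": 7_500,
}


def expanded_interval(start, end, error_bp):
    """Widen a site by its study's error; a point site has start == end."""
    return start - error_bp, end + error_bp


def same_origin(site_a, site_b):
    """Each site is (chrom, start, end, source). True if widened ranges overlap."""
    chrom_a, start_a, end_a, source_a = site_a
    chrom_b, start_b, end_b, source_b = site_b
    if chrom_a != chrom_b:
        return False
    lo_a, hi_a = expanded_interval(start_a, end_a, ESTIMATED_ERROR_BP[source_a])
    lo_b, hi_b = expanded_interval(start_b, end_b, ESTIMATED_ERROR_BP[source_b])
    # Assumption: ranges that just touch count as the same origin.
    return lo_a <= hi_b and lo_b <= hi_a


def point(chrom, pos, source):
    """Convenience constructor for a point location."""
    return (chrom, pos, pos, source)


# Worked example from the text: summed error 7,500 + 3,500 = 11,000 bp.
heavy_light = point("chr2", 229342, "raghuraman_2001_heavy_light")
print(same_origin(heavy_light, point("chr2", 229342 + 10_800, "yabuki_2002_copy_number")))  # True
print(same_origin(heavy_light, point("chr2", 229342 + 11_100, "yabuki_2002_copy_number")))  # False
```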

The automated data-merging process inevitably annotates as replication origins some sites that are probably not origins. In most cases, these correspond to 'false positives' found by only one study, which are therefore automatically assigned the Status 'Dubious'.

For all sites, we present as much information as possible to allow the user to make their own informed assessment.

Data from additional studies will be added to the database as they become available.



Definitions and Abbreviations

ACS - ARS Consensus Sequence (also called A element): most replication origins contain a single essential A element, although a number of replication origins have been found to contain multiple redundant A elements [see Theis & Newlon (2001)].

ARS - Autonomously Replicating Sequence: a sequence that supports DNA replication on a plasmid and has the potential to behave as a replication origin. ARS sequences are not necessarily chromosomally active for replication initiation.

Coordinate system - Genomic locations are based upon the sequence at the UCSC genome browser. This sequence dates from October 2003 and has the advantage of providing a static coordinate system. Sequence coordinates from published studies are presented using the authors' coordinates and may therefore be slightly 'mis-aligned' relative to the UCSC sequence.

DUE - DNA Unwinding Element

Intergenic and Genic Location Assignment - Origin sites were assigned as intergenic or genic using the following rules:

The nature of the intergenic space and the names of the flanking genes are given. If neither situation applies, the intergenic or genic location of the origin cannot be specified, and the following statement is shown: 'Resolution insufficient to assign intergenic space'.

Origin Locations - In all tables, graphics and pages (except the Origin Location Assignments tab), the origin locations used are derived as follows:

Replication Index - a measure of relative replication time. For each study (independently), the earliest replicating sequence is assigned the value 0 and the latest replicating sequence the value 1. Each replication origin is then assigned a value, based upon its replication time in minutes, representing the proportion of S phase that has elapsed. The earliest replicating sequence in each study necessarily corresponds to a replication origin (ARS306 for the Heavy:Light study; ARSVII-888 for the Copy Number study).
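Assuming the 0 to 1 scale is linear in minutes (consistent with 'the proportion of S phase that has elapsed'), the index for one study reduces to a min-max rescaling. The endpoints are the earliest and latest replicating sequences profiled in that study, not only the origins, so they are passed in explicitly. The function name and the example times are hypothetical.

```python
def replication_index(time_min, earliest_min, latest_min):
    """Rescale a replication time (minutes) to the 0-1 Replication Index.

    earliest_min and latest_min are the replication times of the earliest
    and latest replicating sequences in the same study (not just origins).
    Assumes the index is linear in time between those two endpoints.
    """
    if latest_min <= earliest_min:
        raise ValueError("latest_min must be greater than earliest_min")
    return (time_min - earliest_min) / (latest_min - earliest_min)


# Hypothetical study in which replication spans 20 to 40 minutes.
print(replication_index(25, 20, 40))  # 0.25
```

Because each study is rescaled independently, indices from the Heavy:Light and Copy Number studies can be compared even though their absolute times were measured under different conditions.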

Replication Time (Absolute) - given in minutes. For the Heavy:Light genome-wide timing study, this value is minutes from cdc7-1 release at 23 °C. For the Copy Number genome-wide timing study, this value is minutes from alpha-factor release at 30 °C.

Status - Origin locations are divided into three classes:

Origins that have been cloned and ARS-assayed, but on large fragments (>1.5 kb), are included in the database only if they have not been identified in any other way. They are given the Status 'Confirmed'.
