Architecture of Gscope

From Wikili
Revision as of 18:28, 8 January 2018 by Ripp (talk | contribs)


To understand how Gscope works today, we need a brief overview of its historical evolution.

Gscope from the beginning

Odile Lecompte, Olivier Poch and Raymond Ripp had to annotate the genome of Pyrococcus abyssi.

Starting with the DNA sequence of Pyrococcus abyssi (1765120 bases), we determined the genes and tried to find the function of each protein.

For that we needed an interactive visualization tool able to show the sequences, BLAST outputs, multiple alignments and many other things.



The Pabyssi gscope project handles DNA and protein sequences. Each one is represented as a rectangular box on the GscopeBoard.

We called each such box a PAB (from Pyrococcus AByssi) and were never able to find a more generic name (it could be Box or SeqEntity or ???).

Each one has an id: PAB0001, PAB0002, ... (the numbering may not be consecutive).

The procedure ListeDesPABs returns the list of all these ids. We very often use:

 foreach Nom [ListeDesPABs] {
     DoSomething $Nom
 }

I have not changed the name of this central procedure since the Pabyssi project.

To name each 'PAB' of a project we use a prefix (e.g. PAB, BOX or EHomsa) followed by a number of 1, 2, 3, 4, ... digits, for example PAB0001 or EHomsa12345.
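
The naming scheme above can be sketched in Tcl as follows. This is only an illustration: MakeBoxName and SplitBoxName are hypothetical helpers, not Gscope procedures, and the padding width is an assumption taken from the two examples (4 digits for PAB, 5 for EHomsa).

```tcl
# Hypothetical helper (not part of Gscope): build an id from a prefix
# and a number, zero-padded to 'width' digits (assumed default: 4).
proc MakeBoxName {prefix number {width 4}} {
    return [format "%s%0*d" $prefix $width $number]
}

# Hypothetical helper: split an id back into its prefix and its number.
proc SplitBoxName {name} {
    if {![regexp {^([A-Za-z]+)([0-9]+)$} $name -> prefix digits]} {
        error "not a valid box name: $name"
    }
    # scan with %d reads the digits as decimal, so leading zeros are harmless
    scan $digits %d number
    return [list $prefix $number]
}

puts [MakeBoxName PAB 1]
puts [MakeBoxName EHomsa 12345 5]
puts [SplitBoxName PAB0042]
```

Splitting on the first digit only works if the prefix itself contains no digits, which holds for the prefixes mentioned here.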

Gscope File Organisation

Each Gscope project (we call it MyProject) is located in a single directory tree, starting at RepertoireDuGenome (normally /genomics/link/MyProject).

In that directory you'll find the following directories:

  • nuctfa: a FASTA file for each nucleic sequence
  • nucembl: an EMBL file for each nucleic sequence
  • prottfa: a FASTA file for each protein PAB
  • protembl: an EMBL file for each protein PAB
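
The layout above can be illustrated with a short Tcl sketch. SequenceFile is a hypothetical helper, not a Gscope procedure, and it assumes that each file is simply named after the id of its PAB, which this page does not state.

```tcl
# Sketch only: SequenceFile and the one-file-per-id naming are assumptions.
proc SequenceFile {root kind name} {
    # The four sequence directories listed above
    set known {nuctfa nucembl prottfa protembl}
    if {[lsearch -exact $known $kind] < 0} {
        error "unknown sequence directory: $kind"
    }
    return [file join $root $kind $name]
}

# Assumed project root, following the usual /genomics/link/MyProject location
set root /genomics/link/MyProject
foreach kind {nuctfa nucembl prottfa protembl} {
    puts [SequenceFile $root $kind PAB0001]
}
```

Using file join rather than string concatenation keeps the path construction independent of how the root is written (with or without a trailing slash).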