Title: | Orthology vs Paralogy Relationships among Glutamine Synthetase from Plants |
---|---|
Description: | Tools to analyze and infer orthology and paralogy relationships between glutamine synthetase proteins in seed plants. |
Authors: | Elena Aledo [aut, cre, cph], Juan-Carlos Aledo [aut] |
Maintainer: | Elena Aledo <[email protected]> |
License: | GPL (>= 2) |
Version: | 0.1.5 |
Built: | 2024-11-24 06:30:23 UTC |
Source: | https://github.com/cranhaven/cranhaven.r-universe.dev |
155 x 155 square matrix (155 GS proteins from 45 seed plant species)
A_selected
A matrix with 155 rows and 155 columns
It was generated using the function orthGS::mapTrees() and the reconciliation output file 'selected'. For example: orthGS::mapTrees('./inst/extdata/selected'). The reconciliation was carried out using RANGER-DTL with duplication, transfer and loss costs D = 1, T = 10 and L = 1.
Angiosperms, Gymnosperms and Ferns
agf
A dataframe with 275 rows (GS proteins) and 23 columns:
Reference number
Unique identification label of the protein/gene
Species
Acrogymnospermae, Angiospermae, Polypodiopsida
CDS sequence
Protein sequence
Unique three letter identification of the species
GS2, GS1a, GS1b_Ang or GS1b_Gym
isoelectric point
Ferns, GS2, GS1a, GS1b_Ang, GS1b_Gym
number of residues
position signal
prediction
seq pep
mit
chl
thy
amino acid at position 2
core
db
acc
uniprot
note
It has been manually curated by the authors
Angiosperms and Gymnosperms
AngGym
A dataframe with 155 rows (GS proteins) and 10 columns:
Reference number
Unique identification label of the protein/gene
Species
Acrogymnospermae or Angiospermae
Angiosperms: Amborellopsida, Liliopsida, Magnoliopsida; Gymnosperms: Ginkgoopsida, Cycadopsida, Gnetopsida, Pinopsida
CDS sequence
Protein sequence
Unique three letter identification of the species
Either GS2, GS1a or GS1b
Primitive angiosperms, Modern angiosperms, Ginkgo-Cycadales, Gnetales, Pinaceae, Conifer II
It has been manually curated by the authors
Makes a color vector for coloring tree tips
coltips(phy)
phy |
tree as a phylo object |
Each tip is given a color according to the nature of the isoform: green (GS2), blue (GS1a), brown (GS1b Gym), salmon (GS1b Ang), purple (other).
a color vector as long as the number of tips
coltips(ape::read.tree(text = "((Bdi, Sly), (Pp, Ap));"))
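The color rule described above can be sketched in base R as a lookup from tip label to isoform class. The function name tip_colors and the named vector isoform_of (mapping tip labels to the class names GS2, GS1a, GS1b_Gym and GS1b_Ang used in the datasets) are hypothetical; how coltips() itself determines the isoform of each tip is not stated here.

```r
# Hypothetical sketch: color each tip by its isoform class; unknown tips are purple.
tip_colors <- function(tips, isoform_of) {
  palette <- c(GS2 = "green", GS1a = "blue", GS1b_Gym = "brown", GS1b_Ang = "salmon")
  iso <- isoform_of[tips]          # NA for tips missing from the lookup
  cols <- unname(palette[iso])     # NA for unknown isoform classes
  cols[is.na(cols)] <- "purple"    # the "other" category
  cols
}

tip_colors(c("x", "y", "z"), c(x = "GS2", y = "GS1b_Ang"))
```

With an ape tree, the tips would be tree$tip.label, so the result has one color per tip, matching the length of the vector coltips() returns.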
Removes gaps from a given MSA.
gapless_msa(msa, seqtype = 'AA', df = TRUE, sfile = FALSE)
msa |
input alignment. |
seqtype |
the nature of the sequences: 'DNA' or 'AA'. |
df |
logical. When TRUE, msa should be a matrix; when FALSE, msa should be a string giving the path to a FASTA file containing the alignment. |
sfile |
if not FALSE, a string giving the path where the alignment will be saved as a FASTA file. |
It should be noted that this function does not carry out the alignment itself.
an alignment without gaps, either as a matrix or as a FASTA file containing that alignment.
See also: msa
gapless_msa(msa(sequences = c("APGW", "AGWC", "CWGA"),ids = c("a", "b", "c"))$ali)
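A base-R sketch of the idea, assuming the alignment is a character matrix with one sequence per row, '-' as the gap symbol, and that "gapless" means dropping every column that contains a gap in any sequence (the exact rule used by gapless_msa() is not stated here). The function name drop_gap_columns is hypothetical.

```r
# Hypothetical sketch: remove every alignment column that contains at least one gap.
drop_gap_columns <- function(ali, gap = "-") {
  keep <- colSums(ali == gap) == 0   # TRUE for columns with no gap in any row
  ali[, keep, drop = FALSE]          # keep matrix shape even if one column is left
}

m <- rbind(c("A", "P", "-", "W"),
           c("A", "-", "G", "W"))
drop_gap_columns(m)
```

Dropping whole columns keeps all rows the same length, which is what downstream tree building needs.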
Provides the requested GS sequence
getseqGS(phylo_id, molecule = "Prot")
phylo_id |
the unique sequence identifier |
molecule |
either "Prot" or "CDS" |
The identifier should be one of the 'phylo_id' from data(agf).
The requested sequence as a character string.
getseqGS("Pp_GS1b_2")
Finds the root of an unrooted phylogenetic tree by minimizing the relative deviation from the molecular clock.
madRoot(tree, output_mode = 'phylo')
tree |
an unrooted tree, either as a Newick string or as an object of class 'phylo'. |
output_mode |
amount of information to return. If 'phylo' (default), only the rooted tree is returned. If 'stats', a structure with the ambiguity index, the clock CV (coefficient of variation), the minimum ancestor deviation and the number of roots is also returned. If 'full', an unrooted tree object, the index of the root branch, the branch ancestor deviations and a rooted tree object are also returned. |
This function is a slight modification of the code provided by Tria et al. at https://www.mikrobio.uni-kiel.de/de/ag-dagan/ressourcen.
a rooted tree, plus supplementary information if requested (see output_mode).
Tria, F. D. K., Landan, G. and Dagan, T. Nat. Ecol. Evol. 1, 0193 (2017).
a <- msa(sequences = c("RAPGT", "KMPGT", "ESGGT"), ids = letters[1:3])$ali
rownames(a) <- letters[1:3]
tr <- mltree(a)$tree
rtr <- madRoot(tr)
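The criterion madRoot() minimizes can be sketched as follows. As we read Tria et al. (2017), for a candidate root position i and each pair of tips b, c whose connecting path passes through i, the relative deviation from the molecular clock is r = 2 d(b, i) / d(b, c) - 1, and the ancestor deviation is the root-mean-square of r over those pairs; the root goes where this is smallest. The function names below are hypothetical, collecting the distances from a tree is assumed to be done elsewhere, and the real method also optimizes the root position along each branch and reports the ambiguity index, which this sketch omits.

```r
# Relative deviation from the clock for each tip pair (b, c) spanning candidate root i,
# then the root-mean-square over all pairs (the ancestor deviation).
ancestor_deviation <- function(d_bi, d_bc) {
  r <- 2 * d_bi / d_bc - 1
  sqrt(mean(r^2))
}

# Hypothetical helper: 'candidates' is a named list, one entry per candidate root,
# each holding the vectors d_bi and d_bc for its spanning tip pairs.
best_root <- function(candidates) {
  ad <- vapply(candidates, function(x) ancestor_deviation(x$d_bi, x$d_bc), numeric(1))
  names(candidates)[which.min(ad)]
}
```

A perfectly clock-like pair (the root exactly halfway between b and c) contributes r = 0, so a root that sits at the midpoint of every spanning path scores zero.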
Maps a gene/protein tree onto a species tree
mapTrees(path2rec)
path2rec |
path to the file containing the reconciliation output. |
Mapping the gene tree onto the species tree makes it possible to infer the evolutionary event (duplication, speciation or transfer) at each internal node of the gene tree.
A list with three elements. The first one is a 'phylo' object whose node labels indicate the event: D (duplication) or T (transfer); unlabeled nodes correspond to speciation events. The second element is a dataframe: the first column gives the labels of the internal nodes in the gene tree, the second column the labels of the corresponding internal nodes in the species tree, and the third and fourth columns label each internal node according to the inferred event. The third element is an adjacency matrix: 1 when two proteins are orthologous, 0 when they are paralogous.
mapTrees(fs::path_package("extdata", "representatives", package = "orthGS"))
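The adjacency matrix in the third element rests on the standard definition of orthology: two genes are orthologs when their most recent common ancestor in the gene tree is a speciation node, and paralogs when it is a duplication. The base-R sketch below applies that rule to a hypothetical parent-vector encoding of a reconciled gene tree (not the 'phylo' object mapTrees() returns). Treating transfer nodes ("T") as non-orthologous and leaving the diagonal at 0 are assumptions of the sketch.

```r
# Hypothetical gene-tree encoding: 'parent' maps each node name to its parent
# (NA for the root) and 'event' maps each internal node to "S" (speciation),
# "D" (duplication) or "T" (transfer).
path_to_root <- function(node, parent) {
  path <- node
  while (!is.na(parent[[node]])) {
    node <- parent[[node]]
    path <- c(path, node)
  }
  path
}

# Most recent common ancestor: first ancestor of 'a' that is also an ancestor of 'b'.
mrca <- function(a, b, parent) {
  pa <- path_to_root(a, parent)
  pb <- path_to_root(b, parent)
  pa[pa %in% pb][1]
}

# 1 = orthologous (MRCA is a speciation node), 0 otherwise; the diagonal stays 0.
orthology_matrix <- function(tips, parent, event) {
  n <- length(tips)
  m <- matrix(0L, n, n, dimnames = list(tips, tips))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i != j && event[[mrca(tips[i], tips[j], parent)]] == "S") m[i, j] <- 1L
    }
  }
  m
}

# Toy tree ((A, B)n2, C)n1, where n2 is a speciation and n1 a duplication.
parent <- c(A = "n2", B = "n2", C = "n1", n2 = "n1", n1 = NA)
event <- c(n1 = "D", n2 = "S")
orthology_matrix(c("A", "B", "C"), parent, event)
```

In the toy tree, A and B are orthologs, while C is paralogous to both because their common ancestor is the duplication node.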
Given an alignment, builds a maximum-likelihood (ML) tree.
mltree(msa, df = TRUE, gapl = TRUE, model = "WAG")
msa |
input alignment. |
df |
logical. When TRUE, msa should be a dataframe; when FALSE, msa should be a string giving the path to a FASTA file containing the alignment. |
gapl |
logical; when TRUE, a gapless alignment is used. |
model |
the amino acid substitution model to use (see the function phangorn::as.pml). |
The function builds a neighbor-joining (NJ) tree and then improves it using an optimization procedure based on ML.
an ML-optimized tree (and its parameters)
See also: gapless_msa
a <- msa(sequences = c("RAPGT", "KMPGT", "ESGGT"), ids = letters[1:3])$ali
rownames(a) <- letters[1:3]
tr <- mltree(a)$tree
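mltree() is described as building an NJ starting tree and then refining it by maximum likelihood. The base-R sketch below shows only the NJ step, applied to a plain distance matrix with row names; the function name nj_newick is hypothetical, the last two clusters are joined at the midpoint of their connecting edge for simplicity, and the ML refinement (which the package presumably delegates to phangorn) is not reproduced.

```r
# Hypothetical sketch of the neighbor-joining (NJ) step only.
nj_newick <- function(D) {
  D <- as.matrix(D)
  labels <- rownames(D)                       # current clusters as Newick strings
  while (nrow(D) > 2) {
    n <- nrow(D)
    r <- rowSums(D)                           # net divergence of each cluster
    Q <- (n - 2) * D - outer(r, r, "+")       # NJ selection criterion
    diag(Q) <- Inf                            # never join a cluster with itself
    idx <- which(Q == min(Q), arr.ind = TRUE)[1, ]
    i <- idx[1]
    j <- idx[2]
    # Branch lengths from the new internal node to clusters i and j
    li <- D[i, j] / 2 + (r[i] - r[j]) / (2 * (n - 2))
    lj <- D[i, j] - li
    new_label <- sprintf("(%s:%g,%s:%g)", labels[i], li, labels[j], lj)
    # Distances from the new node to every remaining cluster
    d_new <- (D[i, ] + D[j, ] - D[i, j]) / 2
    keep <- setdiff(seq_len(n), c(i, j))
    D <- rbind(cbind(D[keep, keep, drop = FALSE], d_new[keep]),
               c(d_new[keep], 0))
    labels <- c(labels[keep], new_label)
  }
  # Join the last two clusters, rooting at the midpoint of their edge
  sprintf("(%s:%g,%s:%g);", labels[1], D[1, 2] / 2, labels[2], D[1, 2] / 2)
}
```

On additive distances (distances that fit a tree exactly), NJ recovers that tree's topology and branch lengths, which makes it a reasonable starting point for the likelihood search.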
Aligns multiple protein, DNA or CDS sequences.
msa(sequences, ids = names(sequences), seqtype = "prot", sfile = FALSE, inhouse = FALSE)
sequences |
vector containing the sequences as strings. |
ids |
character vector containing the sequences' ids. |
seqtype |
one of "prot", "dna" or "cds" (see details). |
sfile |
if not FALSE, a string giving the path where the alignment will be saved as a FASTA file. |
inhouse |
logical; if TRUE, a local installation of MUSCLE is used. It must be installed on your system and available in the executable search path. |
If seqtype is set to "cds", the sequences must not contain stop codons; they are translated using the standard genetic code, and the resulting amino acid alignment is then used to guide the codon alignment.
Returns a list of four elements. The first one ($seq) provides the sequences analyzed, the second element ($id) returns the identifiers, the third element ($aln) provides the alignment in fasta format and the fourth element ($ali) gives the alignment in matrix format.
msa(sequences = c("APGW", "AGWC", "CWGA"),ids = c("a", "b", "c"))
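The last step of the "cds" mode, where the protein alignment guides the codon alignment, can be sketched in base R as a back-translation: each aligned residue is replaced by its original codon and each gap by "---". The function name is hypothetical, and the sketch assumes '-' as the gap symbol and a CDS whose length is exactly three times the number of residues (no stop codon), as the details above require.

```r
# Hypothetical sketch: thread a CDS onto its aligned protein sequence.
protein_to_codon_alignment <- function(aligned_prot, cds) {
  # Split the CDS into consecutive codons
  codons <- substring(cds, seq(1, nchar(cds) - 2, by = 3), seq(3, nchar(cds), by = 3))
  residues <- strsplit(aligned_prot, "")[[1]]
  out <- character(length(residues))
  k <- 0L                                   # index of the next unused codon
  for (p in seq_along(residues)) {
    if (residues[p] == "-") {
      out[p] <- "---"                       # a protein gap becomes a codon gap
    } else {
      k <- k + 1L
      out[p] <- codons[k]
    }
  }
  if (k != length(codons)) stop("protein and CDS lengths do not match")
  paste(out, collapse = "")
}

protein_to_codon_alignment("M-K", "ATGAAA")
```

Aligning at the protein level first keeps codons intact, so gaps in the nucleotide alignment never shift the reading frame.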
Infers GS orthogroups using tree reconciliation
orthG(set = "all")
set |
the species of interest, given as a character vector of binomial names or short species codes (see data(sdf)). |
When set = "all", all the species in the database will be included.
A list with two elements. The first one is the adjacency matrix (1 for orthologous, 0 for paralogous). The second element is an orthogroup graph.
orthG(set = c("Pp", "Psy", "Psm", "Ap"))
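orthG() returns an orthology adjacency matrix and an orthogroup graph. How orthogroups are delimited from that graph is not described here; one simple reading, assumed in the sketch below, treats each connected component of the orthology graph as an orthogroup, found by breadth-first search over the matrix. The function name is hypothetical.

```r
# Hypothetical sketch: orthogroups as connected components of an adjacency matrix
# (1 = orthologous, 0 = paralogous), found by breadth-first search.
orthogroups_from_adjacency <- function(A) {
  n <- nrow(A)
  group <- rep(NA_integer_, n)
  g <- 0L
  for (start in seq_len(n)) {
    if (!is.na(group[start])) next          # already assigned to a group
    g <- g + 1L
    group[start] <- g
    queue <- start
    while (length(queue) > 0) {
      v <- queue[1]
      queue <- queue[-1]
      nb <- which(A[v, ] == 1 & is.na(group))  # unvisited orthologs of v
      group[nb] <- g
      queue <- c(queue, nb)
    }
  }
  split(rownames(A), group)                 # one character vector per orthogroup
}
```

Connected components are transitive: two paralogs can land in the same group if both are orthologous to a third protein, so a stricter definition may be preferable for some analyses.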
Searches for orthologs of a given protein within a set of selected species
orthP(phylo_id, set = "all")
phylo_id |
phylo_id of the query protein |
set |
the species of interest, given as a character vector of binomial names or short species codes (see details). |
When set = "all", the search is carried out against all the species in the database.
A list with three elements: 1. the subtree of the relevant proteins; 2. a color vector; 3. the phylo_ids of the orthologs found.
orthP(phylo_id = "Pp_GS1a", set = c("Pp", "Psy", "Psm", "Ap"))
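The core of such a search can be sketched as reading the query's row of an orthology adjacency matrix and filtering by species. The function name is hypothetical, it accepts short species codes only, and it assumes the species code is the prefix of each phylo_id before the first underscore, as the identifiers in the examples ("Pp_GS1a", "Pp_GS1b_2") suggest.

```r
# Hypothetical sketch: orthologs of 'query' in an adjacency matrix, limited to a species set.
find_orthologs <- function(A, query, species_set = "all") {
  ids <- colnames(A)
  species_of <- sub("_.*$", "", ids)        # assumed: species code = prefix before "_"
  if (identical(species_set, "all")) species_set <- unique(species_of)
  hits <- ids[A[query, ] == 1 & species_of %in% species_set]
  setdiff(hits, query)                      # never report the query itself
}
```

Filtering after the lookup keeps the orthology calls themselves independent of the species set, which only narrows the reported hits.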
155 GS proteins from 25 seed plant species and 41 GS proteins from 11 fern species
sdf
A dataframe with 196 rows (GS proteins) and 7 columns:
Reference number
Unique identification label of the protein
Species
Acrogymnospermae, Angiospermae or Polypodiopsida
Unique three letter identification of the species
Either GS2, GS1a, GS1b_Gym or GS1b_Ang. Here, fern proteins have been forced to be either GS1a or GS2
Taxonomic group
It has been manually curated by the authors
155 GS proteins from 45 seed plant species, rooted using MAD (minimal ancestor deviation)
selected_tr
A 'phylo' object
It has been manually curated by the authors
Maps binomial species names to short species codes and vice versa
speciesGS(sp)
sp |
set of species of interest (either binomial or short code name) |
The species set should be given as a character vector (see example)
A dataframe containing the information for the requested species.
speciesGS(c("Pinus pinaster", "Ath"))
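speciesGS() accepts binomial names and short codes interchangeably, as the example shows by mixing both. A minimal base-R sketch of such a two-way lookup, assuming a hypothetical table with 'binomial' and 'code' columns; the package's own data and column names may differ, and the toy table uses placeholder names rather than real species.

```r
# Hypothetical sketch: return the rows whose binomial name or short code is in 'sp'.
lookup_species <- function(sp, table) {
  hit <- table$binomial %in% sp | table$code %in% sp   # match either naming scheme
  table[hit, , drop = FALSE]
}

# Toy table with placeholder names (not package data).
toy <- data.frame(binomial = c("Genus alpha", "Genus beta", "Genus gamma"),
                  code = c("Gal", "Gbe", "Gga"),
                  stringsAsFactors = FALSE)
lookup_species(c("Genus alpha", "Gga"), toy)
```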
Assembles a report regarding the GS proteins found in the indicated subset of species
subsetGS(sp)
sp |
set of species of interest (either binomial or short code name) |
This function returns the protein and DNA sequences of the different isoforms found in each species, along with other relevant data.
A dataframe with the information for the requested species.
subsetGS(c("Pinus pinaster", "Ath"))