Commit d11c8360 authored by Aurelien Brionne's avatar Aurelien Brionne
Browse files

vignette update

parent 663fd8ac
---
title: "An introduction to GenomeFeatures"
author:
- name: Aurelien Brionne
affiliation: Institut National de la Recherche Agronomique (INRA)
date: "`r format(Sys.time(), '%d %B, %Y')`"
output:
BiocStyle::html_document:
highlight: tango
vignette: >
%\VignetteIndexEntry{introduction}
%\VignetteEngine{knitr::rmarkdown}
%\usepackage[utf8]{inputenc}
bibliography: bibliography.bib
csl: bmc-genomics.csl
---
```{r setup,include=FALSE}
###################
# load library
library(GenomeFeatures)
###################
# knitr document options
knitr::opts_chunk$set(eval=T,echo=T, message=F,comment=NA,warning=F)
```
# Introduction
The purpose of the `GenomeFeatures` package is to produce a detailed genomic annotation from chromosome coordinates similar as `ChIPseeker::annotate` @ChIPseeker, but without prioritizing annotations.
The internal functions used in this package for a fast genomic annotation and data manipulations are based on the `AnnotationDbi`@AnnotationDbi,`GenomicRanges`,`IRanges`, `GenomicFeatures`@GenomicFeatures, and `data.table`@data.table R packages.
Plots of features distribution are based on `plotly`@plotly and `UpSetR`@UpSetR R packages.
In this vignette, we use only the chicken (galGal5) chromosome 1 (NC_006088.4) to illustrate `GenomeFeatures` results.
# Annotate
## Load the genomic coordinates targets
At the first step we read peaks files (also known bed files with genomic coordinates chr, start, end, strand) in a `base::list`, and convert it in GRanges object with `GenomeFeatures::readPeakFile`.
```{r peaks}
###################
# load peak files
peaks=base::list(
A=GenomeFeatures::readPeakFile(
base::system.file(
"extdata/data/input/A.bed",
package="GenomeFeatures"
)
),
B=GenomeFeatures::readPeakFile(
base::system.file(
"extdata/data/input/B.bed",
package="GenomeFeatures"
)
)
)
```
## Build a transcript database from a genome annotation file (GFF in this case)
For this vignette, we build the transcript database from the chicken gff with `GenomicFeatures::makeTxDbFromGFF` @GenomicFeatures.
```{r gff,eval=F}
###################
# create folders to store input and output results
base::dir.create("./data/input/",recursive = T)
base::dir.create("./data/output/")
###################
# import the GFF (galGal5 chicken GFF in this case)
utils::download.file("ftp://ftp.ncbi.nlm.nih.gov/genomes/Gallus_gallus/GFF/ref_Gallus_gallus-5.0_top_level.gff3.gz",
quiet=T,destfile = "./data/input/galGal5_gff.gz")
##################
# uncompress the file
R.utils::gunzip("./data/input/galGal5_gff.gz")
```
<u>Important note:</u> In this vignette, we use only a part of the chicken (galGal5) chromosome 1 (NC_006088.4) gff to illustrate `GenomeFeatures` results.
```{r TxDb_build}
###################
# build the database
toplevel=GenomicFeatures::makeTxDbFromGFF(
file= base::system.file(
"/extdata/data/input/galGal5.gff",
package="GenomeFeatures"
),
format="gff",
organism="Gallus gallus",
taxonomyId=9031,
dbxrefTag="ID"
)
```
Now, we build a convenient object with `GenomeFeatures::build_genome_features`, needed for find overlaps beetween all selected features and genomic coordinates targets into a single step (see below).
```{r features}
###################
# build genome features
features<-GenomeFeatures::build_genome_features(
toplevel,
features=base::c("promoter","UTR5","UTR3","exon","intron","cds","downstream"),
parameters = base::list(
promoter_ranges=base::list(
upstream=3000,
downstream=500
),
downstream_range=1000
)
)
```
We can display a short summary.
```{r features_display}
###################
# build genome features
features
```
## Find overlaps beetween genomic coordinates and features
Now we can find all overlaps beetween targets coordinates and the needed genomic features with `GenomeFeatures::genome_features_overlaps`.
```{r overlaps}
###################
# find overlapss
features_overlaps<-GenomeFeatures::genome_features_overlaps(
peaks,
features,
figs_path=base::system.file(
"/extdata/data/output/",
package="GenomeFeatures"
),
ignore.strand = T
)
```
We can display a short object summary.
```{r overlaps_display}
###################
# find overlaps
features_overlaps
```
# Features distribution
## Features plot
Bar and lines features plots are available with`GenomeFeatures::Plot`.
* bar
```{r bar,fig.width=6}
###################
# display bar
GenomeFeatures::Plot(features_overlaps,"bar")
```
* line
```{r lines,fig.width=6}
###################
# display lines
GenomeFeatures::Plot(features_overlaps,"line")
```
## Annotation table
The annotation table could be easily extract from the overlapped features object with `GenomeFeatures::Table`.
```{r table}
###################
# Extract the annotation table
annot<-GenomeFeatures::Table(features_overlaps)
###################
# display header
annot
```
## Features overlaps
We can also display the `UpSetR::upset` plot previously built by`GenomeFeatures::genome_features_overlaps`.
* A
<img src="`r base::system.file("extdata/data/output/upset_A.png",package="GenomeFeatures")`" alt="Drawing" style="width: 800px;"/>
* B
<img src="`r base::system.file("extdata/data/output/upset_B.png",package="GenomeFeatures")`" alt="Drawing" style="width: 800px;"/>
# References
Markdown is supported
0% or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment