MPOA - Documentation

Quick Start

Short overview of how to run MPOA. For more details, see the steps below.

Execute MPOA

nextflow run replikation/MPOA --fastq '*.fastq.gz' --fasta '*.fasta' -profile local,docker

Installation

Install Nextflow and either Docker or Singularity.

1. Install Nextflow and Java

To run MPOA, the workflow manager Nextflow and its dependency, a Java runtime (default-jre), need to be installed.

Install the Java runtime via:

sudo apt install -y default-jre

Install Nextflow (this creates a nextflow file at your current location):

curl -s https://get.nextflow.io | bash

Optional: Move the "nextflow" executable to a directory in your $PATH (so you can execute it from anywhere):

sudo mv nextflow /bin && sudo chmod 770 /bin/nextflow

2. Install Docker

Full Docker installation can be found here for Ubuntu:

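If you have never worked with Docker, we recommend installing it via apt (on Ubuntu, the Docker engine is packaged as docker.io):

sudo apt install -y docker.io

sudo usermod -a -G docker $USER

Log out and back in afterwards so that the new group membership takes effect.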

OR

2. Install Singularity

As an alternative to Docker, you can also install and use Singularity, e.g. on an HPC.

To use Singularity, follow the installation steps here.

Note that with Singularity the following environment variables are automatically passed to the container to ensure execution on HPCs: HTTPS_PROXY, HTTP_PROXY, http_proxy, https_proxy, FTP_PROXY and ftp_proxy.

3. Validate the installation

nextflow run replikation/MPOA --help

If everything is installed correctly, the MPOA help message should now be displayed in your terminal!
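
Before running the help command, it can be useful to confirm that the prerequisites named in the installation steps are on your PATH. The following is a minimal sketch, not part of MPOA: have and check_prereqs are hypothetical helper names, and the check only looks for the executables, it does not verify versions or that the Docker daemon is running.

```shell
# Sketch: report which MPOA prerequisites are found on the PATH.
# have() and check_prereqs() are hypothetical helpers, not part of MPOA.
have() {
    command -v "$1" >/dev/null 2>&1
}

check_prereqs() {
    missing=0
    for tool in java nextflow; do
        if have "$tool"; then
            echo "found:   $tool"
        else
            echo "missing: $tool"
            missing=1
        fi
    done
    # Only one of the two container engines is needed.
    if have docker || have singularity; then
        echo "found:   container engine"
    else
        echo "missing: docker or singularity"
        missing=1
    fi
    return "$missing"
}

check_prereqs || echo "Install the missing tools before running MPOA."
```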

Execution

Analyse your samples.

Prepare Read Data

We need one read file per sample.

Make sure that you have one read file (.fastq or .fastq.gz) per sample. You can combine multiple read files into one via:

cat *.fastq > sample1_all.fastq

or

cat *.fastq.gz > sample1_all.fastq.gz
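
If the reads of several samples need to be merged, the same cat command can be wrapped in a loop. The sketch below assumes a hypothetical layout with one sub-directory per sample (in_dir/SampleName/*.fastq.gz); combine_reads is an illustrative helper name, not an MPOA command.

```shell
# Sketch: merge all reads of each sample into one file per sample.
# Assumed (hypothetical) layout: <in_dir>/<sample>/*.fastq.gz
combine_reads() {
    in_dir=$1
    out_dir=$2
    mkdir -p "$out_dir"
    for sample_dir in "$in_dir"/*/; do
        [ -d "$sample_dir" ] || continue
        sample=$(basename "$sample_dir")
        # Concatenated gzip streams form a valid gzip file,
        # so no decompression is needed here.
        cat "$sample_dir"*.fastq.gz > "$out_dir/$sample.fastq.gz"
    done
}
```

For example, combine_reads reads merged would write merged/Sample1.fastq.gz from everything in reads/Sample1/. Sample directory names should not contain a dot, because the output file name is built from them.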

Rename files

So MPOA can auto-detect matching files.

Reads and genome files are matched based on the first word before the first dot in their filename.

YES: Sample1.clean.fasta & Sample1.fastq.gz
NO: clean.Sample1.fasta & Sample1.fastq.gz
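
To check file names before a run, the matching rule can be mimicked in the shell. This is a sketch of the rule as described above, not MPOA's own code: sample_key and unmatched_reads are hypothetical helpers, and the sketch assumes reads end in .fastq.gz and genomes in .fasta in the current directory.

```shell
# Sketch: mimic the "first word before the first dot" matching rule.
# sample_key and unmatched_reads are hypothetical helpers.
sample_key() {
    name=$(basename "$1")
    # Strip everything from the first dot onwards.
    printf '%s\n' "${name%%.*}"
}

# Print every read file that has no genome file with the same key.
unmatched_reads() {
    for reads in *.fastq.gz; do
        [ -e "$reads" ] || continue
        key=$(sample_key "$reads")
        found=no
        for genome in *.fasta; do
            [ -e "$genome" ] || continue
            if [ "$(sample_key "$genome")" = "$key" ]; then
                found=yes
                break
            fi
        done
        [ "$found" = yes ] || printf '%s\n' "$reads"
    done
}
```

Running unmatched_reads in the directory with your files prints nothing when every read file has a partner; any file name it prints needs renaming.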

Open terminal

In the current directory.

Open your terminal in a location where you have access to the fastq and fasta files.

Execute MPOA Now!

nextflow run replikation/MPOA --fasta '*.fasta' --fastq '*.fastq.gz' -profile local,docker -r '1.4.1'

Interpretation

Some explanations and interpretation help.

CSV file with summary of ambiguous/masked positions

The default output is a CSV file summarizing the masked positions per sample. The file masked_bases_summary.csv lists how many positions were masked by each IUPAC base (N, W, S, M, K, R, Y, B, D, H, V). Depth masking (N) is applied where the sequencing depth at a position is below ten. For each sample, the masked positions are reported for the entire genome and for the chromosome only.

For example:

In sample2, 3108 positions are masked by N across all contigs (genome), which means these positions have a sequencing depth below 10. Looking only at the chromosome contig, we see a massive reduction in masked positions, especially depth-masked ones (from 3108 to 19), so we used only the chromosomal contig for our analysis.

 
name,type,N(ATCG),W(AT),S(CG),M(AC),K(TG),R(AG),Y(TC),B(TCG),D(ATG),H(ATC),V(ACG)
sample1,genome,1,0,0,1,0,14,12,0,0,0,0,
sample1,chromosome,1,0,0,1,0,12,9,0,0,0,0,
sample2,genome,3108,100,88,109,110,396,405,0,0,0,0,
sample2,chromosome,19,0,2,0,4,200,181,0,0,0,0,
sample3,genome,9534,6,14,20,10,310,293,0,0,0,0,
sample3,chromosome,17,0,4,4,0,266,254,0,0,0,0,
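
The genome-versus-chromosome comparison described above can be pulled out of the CSV with awk. This is a minimal sketch that assumes the column layout of the example (name, type, then N(ATCG) in column 3) and no leading whitespace in the file; n_summary is a hypothetical helper name.

```shell
# Sketch: print the depth-masked (N) count per sample for the whole
# genome and for the chromosome only. Assumes column 3 is N(ATCG),
# as in the example header. n_summary is a hypothetical helper.
n_summary() {
    awk -F, '
        NR == 1 { next }    # skip the header line
        $2 == "genome"     { genome[$1] = $3; order[++n] = $1 }
        $2 == "chromosome" { chrom[$1] = $3 }
        END {
            # Report samples in the order they appear in the file.
            for (i = 1; i <= n; i++) {
                s = order[i]
                printf "%s genome_N=%s chromosome_N=%s\n", s, genome[s], chrom[s]
            }
        }
    ' "$1"
}
```

Called as n_summary masked_bases_summary.csv, it prints one line per sample, which makes large gaps between the genome and chromosome counts easy to spot.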

Sequence Logo

A graphical presentation of all ambiguous positions and their surrounding bases per analyzed sample. The overall height of each stack indicates the sequence conservation at a position (measured in bits), whereas the height of the symbols within the stack reflects the relative frequency of the corresponding base at that position. In default mode, the sequence logo shows five bases upstream and downstream of the ambiguous position masked by an IUPAC code. The length of the logo can be modified using the --motif flag.

For example:

For sample BK16641_k14, 233 positions are masked by R and 225 positions are masked by Y. Across all regions masked by R and Y, we see the roughly conserved motifs RACG and CGTY.

Frequency Plot

A violin chart is created when the --frequency flag is used. The plot is built from all analyzed samples of the run. For each ambiguous position found by aligning reads to their reference, the percentage occurrence of the bases within the reads is calculated. The orientation of the strand (forward or reverse) is determined and displayed as data points in two plots per position.

For example:

This plot comprises 6,556 ambiguous positions from 33 samples. Each dot represents the occurrence of a base within the respective base combination at an ambiguous position. Most errors arise when the basecaller has to decide between the purine bases (A and G), masked by IUPAC code R, or between the pyrimidine bases (T and C), masked by IUPAC code Y. 3,311 positions were masked by R. The plot differentiates between the forward and reverse strand. On the reverse strand, guanine is called far more frequently (95-100%) than adenine (0-5%), which leads to the conclusion that guanine is the correct base at this position. The forward strand is less clear: the basecaller mainly calls adenine at this position and guanine less frequently. Such a base ratio in the reads can result in erroneous assemblies. Because the error is strand-specific, it points to a base modification.

Citation

Accurate bacterial outbreak tracing with Oxford Nanopore sequencing and reduction of methylation-induced errors

Mara Lohde, Gabriel E. Wagner, Johanna Dabernig-Heinz, Adrian Viehweger, Sascha D. Braun, Stefan Monecke,
Celia Diezel, Claudia Stein, Mike Marquet, Ralf Ehricht, Mathias W. Pletz, Christian Brandt

https://genome.cshlp.org/content/34/11/2039.long

FAQ

Maybe we can help you.

MPOA fails with Exit 137

This is a RAM-related error: your computer ran out of memory. Try reducing the overall parallelization of MPOA by adding, for instance, the flags --cores 8 --max_cores 8. This executes one process after another and assigns only 8 threads to each, so fewer processes run simultaneously. MPOA was tested on a laptop with 16 GB RAM and 8 threads and ran fine.
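
As an illustration of the memory-saving flags, the sketch below prints the full command with a capped thread count so you can review it before running it. mpoa_cmd is a hypothetical helper, not part of MPOA; the input patterns and profile are copied from the execution example, and 8 is only the default used in this FAQ entry.

```shell
# Sketch: print the MPOA command with a capped thread count.
# mpoa_cmd is a hypothetical helper; --cores and --max_cores are the
# flags named in this FAQ entry. Both are set to the same value so
# that processes run one after another.
mpoa_cmd() {
    cores=${1:-8}
    printf '%s\n' "nextflow run replikation/MPOA --fasta '*.fasta' --fastq '*.fastq.gz' -profile local,docker --cores $cores --max_cores $cores"
}

mpoa_cmd 8    # review the printed command, then run it yourself
```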

How to run the latest release

MPOA Releases

If you want to run a specific release version of MPOA, use the -r flag with the version you want to execute. For example:

nextflow run replikation/MPOA --fasta '*.fasta' --fastq '*.fastq.gz' -profile local,docker -r '1.4.1'

My question is not listed

Head over to MPOA's issue section.
Use the search first; maybe someone has already had the same issue. If not, create a new issue.
