A Walkthrough: Oligotyping Interface on VAMPS

Sep 17, 2012. Posted under: Oligotyping

VAMPS now offers an oligotyping interface, which means that if you have a VAMPS account, you can run oligotyping on publicly available datasets, or on datasets you uploaded to VAMPS, without any hassle.

In this post you will find a step-by-step narration of running oligotyping on a selected genus of a dataset.

NOTE: The VAMPS interface keeps changing for the better, and it is pretty hard to keep this post up to date with the latest changes. This post will probably give you a good idea of what to expect, but please don’t be surprised if the interface on VAMPS looks different from the screenshots here.

***

VAMPS (Visualization and Analysis of Microbial Population Structures) is “an integrated collection of tools for researchers to visualize and analyze data for microbial population structures and distributions”. It has been used to analyze datasets generated at the MBL for a long time now, and it is used by almost 900 researchers. An overview of VAMPS is available here.

Andrew Voorhis, the current developer of VAMPS, recently implemented an interface so that VAMPS users can run oligotyping on datasets available on VAMPS without having to download reads to their own computers. While running oligotyping locally requires some familiarity with the UNIX terminal, the interface Voorhis has implemented requires no more than mouse clicks (well, and knowing the data).

***

I used a publicly available dataset composed of samples from Great Sippewissett Marsh, near West Falmouth, Massachusetts. The project ID on VAMPS is KCK_LSM_Bv6v4. In this study, water samples were collected from 7 stations for almost two years, and the V4-V6 region of the 16S rRNA gene was sequenced using 454. Here I will use only the samples from Station 1 for simplicity.

First, let’s have a look at what the composition of the samples from Station 1 looks like at the genus level.

To do that, I click the ‘Community Visualization’ link on the left bar at http://vamps.mbl.edu, find the project KCK_LSM_Bv6v4 in the list, select the relevant samples, specify on the right side that the resolution I want is ‘genus’, and choose “Comparison Bar Graphs” from the “Choose Visualization Method” pull-down menu:

And this is what VAMPS shows me:

One thing one can argue by looking at this genus-level composition is that the samples look fairly similar to each other. Most of the taxa are present throughout the two-year period. A relevant question may be this:

“Is there variation within any of these genera that we are missing at this level of resolution, and that oligotyping can explain better?”

When you mouse over those bars, VAMPS tells you which genus each color represents. The green stuff that appears to be very abundant in every sample is actually Pelagibacter.

I will oligotype Pelagibacter in these samples to see whether there is variation within Pelagibacter with respect to any ecological parameters that might be associated with these samples.

***

Now that I know which genus I am interested in, I navigate to the oligotyping page from the links on the left. A new page greets me, in which I can specify the samples and the genus I am interested in for oligotyping. After choosing the samples, I select Pelagibacter on the right, and click ‘next’.

When I click next, VAMPS pulls up all the reads that were classified as Pelagibacter in the samples I chose and creates a FASTA file on which to perform the alignment and entropy analysis. At this point the FASTA file VAMPS generates contains the reads that were represented by those green bars in the genus-level composition figure. This is what VAMPS shows me when it is ready to go:
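For readers who want a mental model of this step, here is a minimal sketch of filtering reads by genus and writing them out as FASTA. It is not how VAMPS does it internally: the record layout, the function name `reads_to_fasta`, the example reads, and the `sample_readid` defline convention are all assumptions made for illustration.

```python
def reads_to_fasta(reads, genus):
    """Return FASTA text for the reads assigned to `genus`.

    `reads` is an iterable of (sample, read_id, genus, sequence) tuples;
    this record layout is hypothetical, not the VAMPS schema.
    """
    lines = []
    for sample, read_id, assigned_genus, sequence in reads:
        if assigned_genus != genus:
            continue  # skip reads classified as some other genus
        # Assumed defline convention: sample name, underscore, read id.
        lines.append(f">{sample}_{read_id}")
        lines.append(sequence)
    return "\n".join(lines) + "\n" if lines else ""


# Made-up reads, for illustration only.
example_reads = [
    ("Station1-Jan", "read001", "Pelagibacter", "TGGGCTTAAAGAG"),
    ("Station1-Jan", "read002", "Polaribacter", "TGGGTTTAAAGGG"),
    ("Station1-Jul", "read003", "Pelagibacter", "TGGGCTTAAAGAG"),
]

fasta_text = reads_to_fasta(example_reads, "Pelagibacter")
print(fasta_text, end="")
```

Keeping the sample name in the defline is what later lets each read be attributed back to its sample.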

The next thing to do is to start the alignment and entropy analysis by clicking that button above. This step may take a while, and you can leave the page and come back when VAMPS sends you an e-mail to let you know that the entropy analysis is done.

When I get the e-mail and go back to VAMPS, it suggests that I examine the entropy analysis results and start the initial oligotyping analysis by choosing various parameters:

Examining the entropy graph is an important step in deciding how many initial components should be used. This is how the entropy graph looks:

These reads were sequenced from V6 towards V4 and then reverse complemented, so the beginning of every read is actually its low-quality end. That’s why there is more entropy noise at the beginning. You can actually see where the V4, V5 and V6 regions are: V5, an island towards the middle of the figure that is set apart from the rest by two conserved regions, separates the V6 and V4 regions. Seeing those two clean peaks on the left side of the entropy graph, I decide to start with 2 components (after the initial analysis I will most likely increase the number of components). I put in 3 for the -s parameter (the minimum number of samples in which an oligotype should appear) and 1.0 for -a (the minimum percent abundance an oligotype should reach in at least one sample), and click ‘start’ (please see the article Command Line Parameters Explained for more information on the parameters).
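To make the idea of "components" concrete, here is a small sketch of the entropy step: compute Shannon entropy for every column of an alignment, take the highest-entropy columns as components, and label each read by the bases it carries at those columns. The toy alignment and all function names are made up, and using log base 2 is my assumption; this is an illustration of the concept rather than the code VAMPS runs.

```python
from collections import Counter
from math import log2


def column_entropy(column):
    """Shannon entropy (in bits) of the characters in one alignment column."""
    counts = Counter(column)
    total = len(column)
    return sum(-(n / total) * log2(n / total) for n in counts.values())


def entropy_profile(aligned_reads):
    """Entropy of every column; all reads must have the same length."""
    return [column_entropy(col) for col in zip(*aligned_reads)]


def top_components(profile, n):
    """0-based positions of the n highest-entropy columns, left to right."""
    ranked = sorted(range(len(profile)), key=lambda i: profile[i], reverse=True)
    return sorted(ranked[:n])


def oligotype(read, components):
    """Concatenate the characters of `read` found at the chosen positions."""
    return "".join(read[i] for i in components)


# Toy alignment (made up): columns 1 and 4 carry real variation,
# column 3 has a single odd base that looks like sequencing noise.
toy_reads = [
    "ACGTA", "ACGTA", "ACGTA", "ACGAA",
    "ATGTC", "ATGTC", "ATGTC", "ATGTC",
]

profile = entropy_profile(toy_reads)
components = top_components(profile, 2)
oligotype_counts = Counter(oligotype(r, components) for r in toy_reads)

print([round(h, 2) for h in profile])
print(components)
print(dict(oligotype_counts))
```

In the toy data the noisy column has a small but non-zero entropy, which is the kind of low bump one learns to ignore when reading the entropy graph.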

In about ten minutes, VAMPS is done with the initial oligotyping. The page is updated with the results:

The “Oligotype Results Page” link goes to the HTML output generated to communicate the oligotyping results. Using that HTML output, I should decide which components are required to perform a better decomposition. But even now, with only two components, Pelagibacter reads are decomposed into two major groups, as you can see from the following stackbar figure, which I copied from the HTML output:

In this figure, datasets are sorted by month. If you look at the labels, you can see that the organism represented by purple becomes more abundant during the 7th and 8th months compared to the colder months of the year, which are mostly on the left side of the graph.
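A stackbar like this is essentially per-sample percent abundance, and the -s and -a values entered earlier act on those same numbers. The sketch below shows one way to compute the percentages and apply both criteria exactly as they are described in this post. The sample names, counts, and function names are hypothetical, and the real pipeline may apply its filters differently.

```python
def percent_abundance(counts):
    """Convert {sample: {oligotype: read count}} into per-sample percentages."""
    percents = {}
    for sample, oligo_counts in counts.items():
        total = sum(oligo_counts.values())
        percents[sample] = {o: 100.0 * c / total for o, c in oligo_counts.items()}
    return percents


def passing_oligotypes(counts, min_samples, min_percent):
    """Oligotypes seen in >= min_samples samples and reaching
    >= min_percent abundance in at least one sample."""
    percents = percent_abundance(counts)
    all_oligos = {o for per_sample in counts.values() for o in per_sample}
    kept = []
    for oligo in sorted(all_oligos):
        present_in = sum(1 for s in counts if counts[s].get(oligo, 0) > 0)
        best = max(percents[s].get(oligo, 0.0) for s in counts)
        if present_in >= min_samples and best >= min_percent:
            kept.append(oligo)
    return kept


# Made-up read counts per sample, for illustration only.
toy_counts = {
    "S1": {"GC": 90, "AT": 10},
    "S2": {"GC": 80, "AT": 19, "GT": 1},
    "S3": {"GC": 30, "AT": 70},
    "S4": {"GC": 20, "AT": 80},
}

percents = percent_abundance(toy_counts)
kept = passing_oligotypes(toy_counts, min_samples=3, min_percent=1.0)

print(percents["S3"])
print(kept)
```

Here the rare "GT" type reaches 1% in one sample but appears in only a single sample, so it fails the -s criterion and is dropped.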

These are the representative sequences of these two types aligned:

CLUSTAL 2.1 multiple sequence alignment

GC              TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTGGTGGTGAAATCCCAGAGCTTAACT 60
AT              TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTAGTGGTGAAATCCCAGAGCTTAACT 60
                *********************************** ************************

GC              CTGGAACTGCCATCAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT 120
AT              CTGGAACTGCCATTAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT 120
                ************* **********************************************

GC              GTAGAGGTGAAATTCGTAGATATTAGAAAGAATACCAATTGCGAAGGCAGCTTTCTGGAT 180
AT              GTAGAGGTGAAATTCGTAGATATTAGAAAGAATACCAATTGCGAAGGCAGCTTTCTGGAT 180
                ************************************************************

GC              CATTACTGACACTGAGGAACGAAAGCATGGGTAGCGAAGAGGATTAGATACCCTCGTAGT 240
AT              CATTACTGACACTGAGGAACGAAAGCATGGGTAGCGAAGAGGATTAGATACCCTCGTAGT 240
                ************************************************************

GC              CCATGCCGTAAACGATGTGTGTTAGACGTTGGAAATTTATTTTCAGTGTCGCAGGGAAAC 300
AT              CCATGCCGTAAACGATGTGTGTTAGACGTTGGAAATTTATTTTCAGTGTCGCAGGGAAAC 300
                ************************************************************

GC              CGATAAACACACCGCCTGGGGAGTACGACCGCAAGGTTAAAACTCAAATGAATTGACGGG 360
AT              CGATAAACACACCGCCTGGGGAGTACGACCGCAAGGTTAAAACTCAAATGAATTGACGGG 360
                ************************************************************

GC              GACCCGCACAAGTAGTGGAGCATGTGGTTTAATTCGAAGATACGCGCAGAACCTTACCAA 420
AT              GACCCGCACAAGTAGTGGAGCATGTGGTTTAATTCGAAGATACGCGCAGAACCTTACCAA 420
                ************************************************************

GC              CACTTGACATGTTCGTCGCGACTCTAAGAGATTAGAGTTTTCGGTTCGGCCGGACGAAAC 480
AT              CACTTGACATGTTCGTCGCGACTCTAAGAGATTAGAGTTTTCGGTTCGGCCGGACGAAAC 480
                ************************************************************

GC              AC 482
AT              AC 482
                **
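The alignment above can be checked with a few lines of code. This sketch finds the mismatching columns in the first 120 positions (the only part of the alignment where the two sequences differ) and then computes the percent identity over the full 482 columns. The function name is made up for this example.

```python
def mismatch_positions(seq_a, seq_b):
    """1-based positions at which two equal-length aligned sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must be aligned to the same length")
    return [i for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1) if a != b]


# First 120 columns of the GC and AT representative sequences shown above.
gc_prefix = (
    "TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTGGTGGTGAAATCCCAGAGCTTAACT"
    "CTGGAACTGCCATCAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT"
)
at_prefix = (
    "TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTAGTGGTGAAATCCCAGAGCTTAACT"
    "CTGGAACTGCCATTAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT"
)

differences = mismatch_positions(gc_prefix, at_prefix)

# The remaining columns (121 to 482) are identical in the alignment above,
# so the full-length identity only needs the total alignment length.
alignment_length = 482
identity = 100.0 * (alignment_length - len(differences)) / alignment_length

print(differences)
print(round(identity, 1))
```

The two mismatching columns line up with the two letters in the oligotype names: G versus A at the first, C versus T at the second.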

These two representative sequences differ from each other by only 2 nucleotides. In other words, the 16S rRNA genes of these two organisms are 99.6% identical to each other over the V4-V6 region, a difference far too small for clustering methods to resolve. So I am already seeing some previously unexplained variation within Pelagibacter, but I am not done. Now I need to examine the initial results to find the within-oligotype diversity that needs to be decomposed further, and which components should be included in the next round of oligotyping to achieve that. This is the most important step of oligotyping and requires careful supervision. I would like to talk about this in more detail, preferably in another post dedicated to this topic, so I will not go into it here. But simply put, my examination indicates that I need to add 3 more components. I put those in, and click ‘start’ again:

When the results are back, with the 2 highest-entropy components now supplemented by the 3 hand-picked components, the stackbar figure for the samples looks much better resolved:

Now, this is what the first three oligotypes, represented by green, brown and light blue, look like when they are aligned:

ATTCC           TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTAGTGGTGAAATCCCAGAGCTTAACT 60
ATTGC           TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTAGTGGTGAAATCCCAGAGCTTAACT 60
GTCCC           TGGGCTTAAAGAGTTCGTAGGTGGTTGAAAAAGTTGGTGGTGAAATCCCAGAGCTTAACT 60
                *********************************** ************************

ATTCC           CTGGAACTGCCATTAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT 120
ATTGC           CTGGAACTGCCATTAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT 120
GTCCC           CTGGAACTGCCATCAAAACTTTTCAGCTAGAGTATGATAGAGGAAAGCAGAATTTCTAGT 120
                ************* **********************************************

ATTCC           GTAGAGGTGAAATTCGTAGATATTAGAAAGAATACCAATTGCGAAGGCAGCTTTCTGGAT 180
ATTGC           GTAGAGGTGAAATTCGTAGATATTAGAAAGAATACCAATTGCGAAGGCAGCTTTCTGGAT 180
GTCCC           GTAGAGGTGAAATTCGTAGATATTAGAAAGAATACCAATTGCGAAGGCAGCTTTCTGGAT 180
                ************************************************************

ATTCC           CATTACTGACACTGAGGAACGAAAGCATGGGTAGCGAAGAGGATTAGATACCCTCGTAGT 240
ATTGC           CATTACTGACACTGAGGAACGAAAGCATGGGTAGCGAAGAGGATTAGATACCCTCGTAGT 240
GTCCC           CATTACTGACACTGAGGAACGAAAGCATGGGTAGCGAAGAGGATTAGATACCCTCGTAGT 240
                ************************************************************

ATTCC           CCATGCCGTAAACGATGTGTGTTAGACGTTGGAAATTTATTTTCAGTGTCGCAGGGAAAC 300
ATTGC           CCATGCCGTAAACGATGTGTGTTAGACGTTGGAAATTTATTTTCAGTGTCGCAGCGAAAG 300
GTCCC           CCATGCCGTAAACGATGTGTGTTAGACGTTGGAAATTTATTTTCAGTGTCGCAGGGAAAC 300
                ****************************************************** **** 

ATTCC           CGATAAACACACCGCCTGGGGAGTACGACCGCAAGGTTAAAACTCAAATGAATTGACGGG 360
ATTGC           CGATAAACACACCGCCTGGGGAGTACGACCGCAAGGTTAAAACTCAAATGAATTGACGGG 360
GTCCC           CGATAAACACACCGCCTGGGGAGTACGACCGCAAGGTTAAAACTCAAATGAATTGACGGG 360
                ************************************************************

ATTCC           GACCCGCACAAGTAGTGGAGCATGTGGTTTAATTCGAAGATACGCGCAGAACCTTACCAA 420
ATTGC           GACCCGCACAAGTAGTGGAGCATGTGGTTTAATTCGAAGATACGCGCAGAACCTTACCAA 420
GTCCC           GACCCGCACAAGTAGTGGAGCATGTGGTTTAATTCGAAGATACGCGCAGAACCTTACCAA 420
                ************************************************************

ATTCC           CACTTGACATGTTCGTCGCGACTCTAAGAGATTAGAGTTTTCGGTTCGGCCGGACGAAAC 480
ATTGC           CACTTGACATGTTCGTCGCGACTCTAAGAGATTAGAGTTTTCGGTTCGGCCGGACGAAAC 480
GTCCC           CACTTGACATGTTCGTCGCGACTCTAAGAGATTAGAGTTTTCGGTTCGGCCGGACGAAAC 480
                ************************************************************

ATTCC           AC 482
ATTGC           AC 482
GTCCC           AC 482
                **

They differ from each other only at 3 sites and they are in the same 97% cluster (actually they are even in the same 99% cluster).

***

From the oligotyping results page you can obtain all the necessary files in TAB-separated format to investigate the results with third-party software, such as R or Excel.
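If you prefer a script over R or Excel, a TAB-separated table is also easy to read with Python's standard library. The sketch below assumes a layout with one row per sample and one column per oligotype, with sample names in the first column. That layout, the column header `samples`, and the inline example data are assumptions for illustration, so check the actual files you download.

```python
import csv
import io

# Made-up table standing in for a downloaded TAB-separated file.
example_tsv = (
    "samples\tGC\tAT\n"
    "sample_A\t90\t10\n"
    "sample_B\t30\t70\n"
)


def read_count_table(handle):
    """Parse a TAB-separated count table into {sample: {oligotype: count}}."""
    reader = csv.DictReader(handle, delimiter="\t")
    sample_column = reader.fieldnames[0]  # first column holds sample names
    table = {}
    for row in reader:
        sample = row.pop(sample_column)
        table[sample] = {oligo: int(count) for oligo, count in row.items()}
    return table


counts = read_count_table(io.StringIO(example_tsv))
print(counts)
```

With a real file, the same function can be given a handle from `open(path, newline="")` instead of the `io.StringIO` stand-in.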

You can reach the oligotyping web interface on VAMPS via http://vamps.mbl.edu/oligotype/oligotype.php