OTU table 18S #22
Comments
Hi Ramon, by looking at the number of OTUs you mention, I assume you are talking about the LGC 18S data. Anyway, the following is true for all datasets processed with SILVAngs. In principle you have everything you need already, just not in the right form. I recommend you have a look at #14 before you go on, it might shed some light on the OTU mapping SILVAngs does and what I in this context call 'metaOTUs'. The files you would be interested in are:
What you need to do is:
Some toolkits like QIIME provide scripts to do that (e.g. filter_fasta.py).
Regarding your chimera question: SILVAngs does not perform a chimera check. We had a very quick look at the sequence you mentioned but couldn't confirm your observation right away. Can you please send some more details on how you identified the chimera (looking at a tree, BLASTing against a custom-curated in-house database, etc.)?
Hope this helps.
Best,
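For readers without QIIME at hand, the kind of ID-based filtering that filter_fasta.py is mentioned for can be sketched in plain Python. This is only a minimal illustration, assuming the task is to keep the FASTA records whose read ID appears in a list of wanted IDs; the function names `read_fasta` and `filter_fasta_records` and the toy records are hypothetical and are not part of SILVAngs or QIIME.

```python
from typing import Iterable, Iterator, List, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new record starts: emit the previous one, if any.
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_records(lines: Iterable[str],
                         wanted_ids: Iterable[str]) -> List[Tuple[str, str]]:
    """Keep records whose ID (first whitespace-delimited header token) is wanted."""
    wanted = set(wanted_ids)
    kept = []
    for header, seq in read_fasta(lines):
        tokens = header.split()
        seq_id = tokens[0] if tokens else ""
        if seq_id in wanted:
            kept.append((header, seq))
    return kept


if __name__ == "__main__":
    # Toy input; real use would pass an open FASTA file handle instead.
    toy = ">read1 sample=A\nACGT\nACGT\n>read2\nGGGG\n>read3\nTTTT\n".splitlines()
    for header, seq in filter_fasta_records(toy, ["read1", "read3"]):
        print(f">{header}\n{seq}")
```

The wanted IDs would come from the OTU table's reference column; how that column is named in the actual result package is not shown here.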
Hi Ivo,
Yes, I am talking about the LGC 18S rDNA OTU table. In most routines, you generate an OTU table together with a list of the representative sequences of each OTU. There are different ways to select the "representative sequence" from the pool of very similar sequences included in each OTU; I guess the most widely used is to select the most common sequence. So you should provide this list of reference sequences to make people's lives much easier. I understand that we have the data to build it ourselves, but it would be much easier for you, and everybody would end up with the same list of reference sequences.
Second, it is fundamental to run your reference sequences through a chimera check routine. Chimeras do occur, are very frequent, and account for a large number of OTUs (generally at low abundance). They can be easily removed with several programs. So why not clean them out?
Regarding the chimera from my previous message, HWI-M02024:112:000000000-ACJ3F:1:1101:10006:19946 (corresponding to the sixth OTU in the OTU table), I know that it is a chimera just from the results of a BLAST search:
Here, Ramiro processed your reads to produce an OTU table with reference sequences and without chimeras. After removing singletons, there are 18,503 OTUs, a number much lower than in your table (50,058).
Best regards,
Ramon
On 21/09/2015, at 19:03, Ivo wrote:
Ramon Massana i Molera
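The "most common sequence" convention mentioned above can be sketched as follows. This is a minimal illustration and not the way SILVAngs selects its reference sequences; the function name `pick_representatives`, the input shape (pairs of OTU ID and read sequence) and the alphabetical tie-break are all assumptions made for the example.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, Tuple


def pick_representatives(reads: Iterable[Tuple[str, str]]) -> Dict[str, str]:
    """Return {otu_id: most frequent sequence among the reads of that OTU}.

    `reads` is an iterable of (otu_id, sequence) pairs (an assumed input shape).
    Ties are broken alphabetically so the result is deterministic.
    """
    counts: Dict[str, Counter] = defaultdict(Counter)
    for otu_id, seq in reads:
        counts[otu_id][seq] += 1

    representatives = {}
    for otu_id, counter in counts.items():
        # Highest count first; among equal counts, the smallest sequence string.
        representatives[otu_id] = min(counter, key=lambda s: (-counter[s], s))
    return representatives


if __name__ == "__main__":
    toy_reads = [
        ("otu1", "ACGT"), ("otu1", "ACGT"), ("otu1", "ACGA"),
        ("otu2", "TTTT"), ("otu2", "GGGG"),
    ]
    for otu_id, seq in sorted(pick_representatives(toy_reads).items()):
        print(otu_id, seq)
```

Other conventions exist (longest read, cluster seed, centroid), so two groups only end up with identical reference lists if they agree on the rule and on the tie-break.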
Hi Ramon,
thank you for your feedback. The SILVAngs pipeline does not produce an OTU-by-sample table by default. The OTU table was requested at the OSD Analysis Workshop in March. Both the OSD Team and the SILVA Team have invested a considerable amount of time in providing these tables as part of the result packages.
We will consider providing a FASTA file of the reference sequences from the OTU tables in the future, as per your request. At the moment our efforts are focused on the OSD 2015 datasets. In the meantime, the method I outlined above does guarantee that everyone ends up with the same FASTA file (the reference sequence is already selected by the pipeline). Minor complementary information: in step 2, please extract the sequences from the FASTA files and not from the CSV file!
SILVAngs does not check for chimeras. The reason is that, after extensive testing and comparisons, no tool seems to deliver reliable results. However, the coverage of a query sequence is considered during its classification. If a sequence is chimeric, it's likely that the alignment coverage will cause the sequence to be classified as 'No Relative'. This is only an indication and chimeras may well be classified like
Your analysis is a valuable contribution to the OSD community. If you wish to discuss the chimera topic at length and compare approaches with the OSD analysis team, I would be happy to put you in touch.
Best,
Dear Ramon,
Hi Ivo,
Yes, I have not replied to your last message. I thought it was from a couple of weeks ago, but now I realize that it was from a month ago! Sorry for the delay, I have been very busy these days.
Coming to the OSD data, I still think it would be better for you to create the reference sequences and, most importantly, to run standard chimera check programs. As it is, the output available from the OSD project on eukaryotes is very hard to work with. And, as I said before, Ramiro processed your reads to produce an OTU table with reference sequences and without chimeras. After removing singletons, there are 18,503 OTUs, a number much lower than in your table (50,058). We are glad to share this OTU table with the eukaryotic consortium.
Best regards,
Ramon
On 22/10/2015, at 15:47, Ivo wrote:
Ramon Massana i Molera
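The singleton-removal step mentioned above can be sketched as below. It is a minimal illustration, assuming a singleton is an OTU whose read count summed over all samples is exactly 1 (some pipelines define it differently) and assuming the table is held as a nested dictionary; the name `drop_singletons` is hypothetical and this is not necessarily the procedure Ramiro used.

```python
from typing import Dict


def drop_singletons(otu_table: Dict[str, Dict[str, int]]) -> Dict[str, Dict[str, int]]:
    """Return a copy of the table without OTUs whose total read count is <= 1.

    `otu_table` maps otu_id -> {sample_name: read_count} (an assumed layout).
    """
    return {
        otu_id: dict(sample_counts)
        for otu_id, sample_counts in otu_table.items()
        if sum(sample_counts.values()) > 1
    }


if __name__ == "__main__":
    toy_table = {
        "otu1": {"s1": 5, "s2": 0},
        "otu2": {"s1": 1, "s2": 0},  # singleton: one read in total
        "otu3": {"s1": 1, "s2": 1},  # two reads in total, kept
    }
    filtered = drop_singletons(toy_table)
    print(f"{len(toy_table)} OTUs before, {len(filtered)} after")
```

OTU counts reported after such filtering depend on the clustering and chimera-removal steps that precede it, so the numbers from two pipelines are only comparable when those steps are stated alongside them.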
Hi all,
I need to have the reference sequences from the OTU table. The OTU table has 96,764 OTUs. So I need to have 96,764 reference sequences.
Also, I am wondering whether a chimera check was run on this dataset. From a very superficial look at the sequences I could easily spot very obvious chimeras. For instance, sequence HWI-M02024:112:000000000-ACJ3F:1:1101:10006:19946 (corresponding to the sixth OTU in the OTU table) is a chimera between a copepod and an ascomycete.
Thanks for your help
Ramon Massana