Hello gifrop team,
I installed gifrop manually. When I run gifrop --get_islands, I get the following output:
This is gifrop 0.0.9
command issued:
/gss1/App_os7/miniconda3/envs/gifrop/bin/gifrop --get_islands
===== Dependencies check =====
parallel .... good
abricate .... good
Rscript .... good
find .... good
[1] "All required R packages were detected"
/gss2/home_new/xuefeng01/gff/gene_presence_absence.csv exist
found 3299 .gff files
WRANGLING SEQUENCE DATA...
making shortened gffs...
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
To silence this citation notice: run 'parallel --citation' once.
found 3299 .gff files
extracting fastas from prokka gffs...
Academic tradition requires you to cite works you base your article on.
If you use programs that use GNU Parallel to process data for an article in a
scientific publication, please cite:
To silence this citation notice: run 'parallel --citation' once.
DONE WRANGLING SEQUENCE DATA
EXECUTING Rscript 'gifrop_id.R'
[1] "loading packages"
Warning message:
package ‘dplyr’ was built under R version 4.2.3
Warning message:
package ‘tidyr’ was built under R version 4.2.3
Warning message:
package ‘readr’ was built under R version 4.2.3
Warning message:
package ‘purrr’ was built under R version 4.2.3
[1] "done loading packages"
Warning message:
One or more parsing issues, call problems() on your data frame for details,
e.g.:
dat <- vroom(...)
problems(dat)
[1] "reading in gffs..."
Joining with by = join_by(seqid, locus_tag)
Error in left_join():
! This join would result in more rows than dplyr can handle.
5723840911 rows would be returned. 2147483647 rows is the maximum number
allowed.
Double check your join keys. This error commonly occurs due to a missing join
key, or an improperly specified join condition.
Backtrace:
▆
Execution halted
DONE EXECUTING 'gifrop_id.R'
RUNNING ABRICATE ON THE ISLANDS
Using nucl database ncbi: 5386 sequences - 2023-Nov-4
Processing: All_islands.fasta
ERROR: 'All_islands.fasta' does not exist, or is unreadable
It looks like you are working with a fairly large pangenome (3299 genomes). Unfortunately, gifrop isn't designed for use on very large pangenomes.
This portion of the error message is the real issue:
Error in left_join():
! This join would result in more rows than dplyr can handle.
5723840911 rows would be returned. 2147483647 rows is the maximum number
allowed.
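To make the numbers in that message concrete: 2147483647 is 2^31 - 1, and a left join returns one row for every matching pair of rows, so repeated key values on both sides multiply. The sketch below is a plain-Python illustration of that arithmetic, not gifrop's or dplyr's actual code; the (seqid, locus_tag) pairs are made-up placeholders and the function names are hypothetical.

```python
from collections import Counter

# Row limit quoted in the dplyr error message (2^31 - 1).
DPLYR_MAX_ROWS = 2_147_483_647


def left_join_row_count(left_keys, right_keys):
    """Number of rows a left join on these key columns would return."""
    right_counts = Counter(right_keys)
    # Each left row yields one output row per matching right row,
    # or a single row when nothing on the right matches.
    return sum(max(right_counts[key], 1) for key in left_keys)


def duplicated_keys(keys):
    """Keys that occur more than once, with their counts."""
    return {key: n for key, n in Counter(keys).items() if n > 1}


# Hypothetical (seqid, locus_tag) pairs; real values would come from the gff files.
left = [("contig1", "tagA"), ("contig1", "tagA"), ("contig2", "tagB")]
right = [("contig1", "tagA"), ("contig1", "tagA"), ("contig1", "tagA")]

rows = left_join_row_count(left, right)
print(rows, rows > DPLYR_MAX_ROWS)
print(duplicated_keys(right))
```

Running the same kind of count on the real key columns before joining would show whether repeated (seqid, locus_tag) values are what pushes the result past the limit.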
My recommendation is to reduce the size of the pangenome you are working with, for example by focusing on a subset of genomes you are interested in. Otherwise, you may need to consider a different tool that has been designed for very large datasets. I've had good luck with PPanGGOLiN, though you would need to carry out some of the classification steps that gifrop performs yourself.
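As a rough sketch of the subsetting suggestion, the snippet below copies the .gff files for a chosen list of genomes into a separate working directory. The function name, directory names, and genome names are hypothetical. It also assumes that gene_presence_absence.csv would need to be regenerated for the subset before rerunning gifrop, which is not shown here.

```python
import shutil
from pathlib import Path


def copy_selected_gffs(src_dir, dest_dir, genome_names):
    """Copy <name>.gff for each requested genome from src_dir into dest_dir.

    Returns the names that had no matching .gff file in src_dir.
    """
    src = Path(src_dir)
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    missing = []
    for name in genome_names:
        gff = src / f"{name}.gff"
        if gff.is_file():
            shutil.copy2(gff, dest / gff.name)
        else:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Hypothetical paths and genome names, for illustration only.
    not_found = copy_selected_gffs("gff", "gff_subset", ["genome_001", "genome_002"])
    print("missing:", not_found)
```

Checking the returned list of missing names before building the smaller pangenome helps catch typos in the selection.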