Warning: NVBLAS_CONFIG_FILE environment variable is NOT set #23

Open
tomchor opened this issue Oct 29, 2022 · 6 comments

Comments


tomchor commented Oct 29, 2022

Every time AMGX gets called I see the following errors:

julia> using AMGX
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is NOT set : relying on default config filename 'nvblas.conf'
[NVBLAS] Cannot open default config file 'nvblas.conf'
[NVBLAS] Config parsed
[NVBLAS] CPU Blas library need to be provided

I googled it but couldn't make much sense of what this means. Does this require any action on my end or can I just safely ignore it?

Contributor

navidcy commented Oct 29, 2022

On a similar note, is there a way for a developer to hide this [NVBLAS]-related output from all users?

@glwagner

I suspect the first three errors can be dealt with by

  1. Checking if the environment variable NVBLAS_CONFIG_FILE is set
  2. If not, looking somewhere for nvblas.conf (like the current working directory)
  3. If nvblas.conf is not found, invoking some reasonable default like the one in the docs (pasted below) by copying the default to file and then setting NVBLAS_CONFIG_FILE to point there

The final puzzle is how to find the CPU Blas library, and point to that in the default config file. I don't know how to do that.
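The three steps above can be sketched in a user's shell, outside of AMGX.jl. This is only an illustration: the function name `ensure_nvblas_config`, the fallback location under `$HOME`, and the default library name `libopenblas.so` are assumptions, and the directive names are taken from the docs' example config. It does not solve the problem of finding the CPU BLAS library; that name is simply passed in as an argument.

```shell
#!/bin/sh
# Hypothetical helper (not part of AMGX.jl or NVBLAS).
# Usage: ensure_nvblas_config [cpu_blas_lib] [fallback_conf_path]
ensure_nvblas_config() {
    cpu_blas_lib="${1:-libopenblas.so}"        # assumption: adjust for your system
    fallback_conf="${2:-$HOME/.nvblas.conf}"   # assumption: any writable path works

    # Step 1: respect an existing setting.
    if [ -n "${NVBLAS_CONFIG_FILE:-}" ]; then
        return 0
    fi

    # Step 2: look for nvblas.conf in the current working directory.
    if [ -f "$PWD/nvblas.conf" ]; then
        NVBLAS_CONFIG_FILE="$PWD/nvblas.conf"
        export NVBLAS_CONFIG_FILE
        return 0
    fi

    # Step 3: write a small default config and point the variable at it.
    # NVBLAS_LOGFILE is left out on purpose, so logging stays on the default.
    if [ ! -f "$fallback_conf" ]; then
        cat > "$fallback_conf" <<EOF
# Minimal NVBLAS config written by ensure_nvblas_config (illustrative)
NVBLAS_CPU_BLAS_LIB $cpu_blas_lib
NVBLAS_GPU_LIST ALL0
NVBLAS_TILE_DIM 2048
NVBLAS_AUTOPIN_MEM_ENABLED
EOF
    fi
    NVBLAS_CONFIG_FILE="$fallback_conf"
    export NVBLAS_CONFIG_FILE
}
```

After sourcing the function, something like `ensure_nvblas_config libopenblas.so && julia --project` would run it before Julia starts. The sketch exports an absolute path so the setting does not depend on the directory Julia is launched from.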

The "typical" config file from the docs (seems like it needs interpretation):

#Copyright 2013 NVIDIA Corporation. All rights reserved.
# This is the configuration file to use NVBLAS Library
# Setup the environment variable NVBLAS_CONFIG_FILE to specify your own config file.
# By default, if NVBLAS_CONFIG_FILE is not defined,
# NVBLAS Library will try to open the file "nvblas.conf" in its current directory
# Example : NVBLAS_CONFIG_FILE /home/cuda_user/my_nvblas.conf
# Specify which output log file (default is stderr)
NVBLAS_LOGFILE nvblas.log
#Put here the CPU BLAS fallback Library of your choice
NVBLAS_CPU_BLAS_LIB libopenblas.so
#NVBLAS_CPU_BLAS_LIB libmkl_rt.so
# List of GPU devices Id to participate to the computation
# Use ALL if you want all your GPUs to contribute
# Use ALL0, if you want all your GPUs of the same type as device 0 to contribute
# However, NVBLAS consider that all GPU have the same performance and PCI bandwidth
# By default if no GPU are listed, only device 0 will be used
#NVBLAS_GPU_LIST 0 2 4
#NVBLAS_GPU_LIST ALL
NVBLAS_GPU_LIST ALL0
# Tile Dimension
NVBLAS_TILE_DIM 2048
# Autopin Memory
NVBLAS_AUTOPIN_MEM_ENABLED
#List of BLAS routines that are prevented from running on GPU (use for debugging purpose
# The current list of BLAS routines supported by NVBLAS are
# GEMM, SYRK, HERK, TRSM, SYMM, HEMM, SYR2K, HER2K,
#NVBLAS_GPU_DISABLED_SGEMM
#NVBLAS_GPU_DISABLED_DGEMM
#NVBLAS_GPU_DISABLED_CGEMM
#NVBLAS_GPU_DISABLED_ZGEMM
# Computation can be optionally hybridized between CPU and GPU
# By default, GPU-supported BLAS routines are ran fully on GPU
# The option NVBLAS_CPU_RATIO_<BLAS_ROUTINE> give the ratio [0,1]
# of the amount of computation that should be done on CPU
# CAUTION : this option should be used wisely because it can actually
# significantly reduced the overall performance if too much work is given to CPU
#NVBLAS_CPU_RATIO_CGEMM 0.07

Contributor

navidcy commented Jan 26, 2023

Indeed, with a default.conf containing the config above:

$ export NVBLAS_CONFIG_FILE=default.conf

$ julia --project
               _
   _       _ _(_)_     |  Documentation: https://docs.julialang.org
  (_)     | (_) (_)    |
   _ _   _| |_  __ _   |  Type "?" for help, "]?" for Pkg help.
  | | | | | | |/ _` |  |
  | | |_| | | | (_| |  |  Version 1.8.4 (2022-12-23)
 _/ |\__'_|_|_|\__'_|  |
|__/                   |

julia> using AMGX
[NVBLAS] NVBLAS_CONFIG_FILE environment variable is set to 'default.conf'

However, when I exit Julia I get flooded with:

julia> exit()

signal (11): Segmentation fault
in expression starting at REPL[2]:1
fflush at /lib64/libc.so.6 (unknown line)
unknown function (ip: 0x7f6b811090bb)
unknown function (ip: 0x7f6b8110246e)
unknown function (ip: 0x7f6b8116e3f0)
__run_exit_handlers at /lib64/libc.so.6 (unknown line)
exit at /lib64/libc.so.6 (unknown line)
ijl_exit at /g/data/v45/nc3020/julia-1.8/src/jl_uv.c:641
exit at ./initdefs.jl:28 [inlined]
exit at ./initdefs.jl:29
jfptr_exit_48098 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
jl_apply at /g/data/v45/nc3020/julia-1.8/src/julia.h:1843 [inlined]
do_call at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:126
eval_value at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:215
eval_stmt_value at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:166 [inlined]
eval_body at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:594
jl_interpret_toplevel_thunk at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:750
jl_toplevel_eval_flex at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:906
jl_toplevel_eval_flex at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:850
eval_body at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:556
eval_body at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:522
jl_interpret_toplevel_thunk at /g/data/v45/nc3020/julia-1.8/src/interpreter.c:750
jl_toplevel_eval_flex at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:906
ijl_toplevel_eval_in at /g/data/v45/nc3020/julia-1.8/src/toplevel.c:965
eval at ./boot.jl:368 [inlined]
eval_user_input at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:151
repl_backend_loop at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:247
start_repl_backend at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:232
#run_repl#47 at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:369
run_repl at /g/data/v45/nc3020/julia-1.8/usr/share/julia/stdlib/v1.8/REPL/src/REPL.jl:355
jfptr_run_repl_64854 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
#967 at ./client.jl:419
jfptr_YY.967_49733 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
jl_apply at /g/data/v45/nc3020/julia-1.8/src/julia.h:1843 [inlined]
jl_f__call_latest at /g/data/v45/nc3020/julia-1.8/src/builtins.c:774
#invokelatest#2 at ./essentials.jl:729 [inlined]
invokelatest at ./essentials.jl:726 [inlined]
run_main_repl at ./client.jl:404
exec_options at ./client.jl:318
_start at ./client.jl:522
jfptr__start_49949 at /g/data/v45/nc3020/julia-1.8/usr/lib/julia/sys.so (unknown line)
_jl_invoke at /g/data/v45/nc3020/julia-1.8/src/gf.c:2377 [inlined]
ijl_apply_generic at /g/data/v45/nc3020/julia-1.8/src/gf.c:2559
jl_apply at /g/data/v45/nc3020/julia-1.8/src/julia.h:1843 [inlined]
true_main at /g/data/v45/nc3020/julia-1.8/src/jlapi.c:575
jl_repl_entrypoint at /g/data/v45/nc3020/julia-1.8/src/jlapi.c:719
main at /g/data/v45/nc3020/julia-1.8/cli/loader_exe.c:59
__libc_start_main at /lib64/libc.so.6 (unknown line)
_start at /g/data/v45/nc3020/julia-1.8/julia (unknown line)
Allocations: 11859354 (Pool: 11854492; Big: 4862); GC: 4
Segmentation fault

Contributor

navidcy commented Jan 26, 2023

But I am not sure what all these defaults do. Is it safe to enforce them just to avoid seeing the warnings?

@glwagner

One key line is

NVBLAS_CPU_BLAS_LIB libopenblas.so

which might need to be corrected to match the CPU BLAS library on your system.

Also this line

NVBLAS_LOGFILE nvblas.log

The comment in the config says the default is stderr (not nvblas.log). Maybe it is better to keep stderr?
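To check what a given config actually sets for these two lines, the active (uncommented) directives can be read back with a small shell helper. This is a rough sketch: `nvblas_setting` and `nvblas_report` are hypothetical names, and it assumes each directive is a keyword followed by whitespace and a value, as in the example config.

```shell
# Hypothetical helper: print the value of an uncommented directive in an NVBLAS config.
# Usage: nvblas_setting <config_file> <directive>
nvblas_setting() {
    # Commented lines start with '#', so their first field never equals the bare keyword.
    awk -v key="$2" '$1 == key { print $2; exit }' "$1"
}

# Hypothetical helper: summarise the two settings discussed above.
# Usage: nvblas_report [config_file]
nvblas_report() {
    conf="${1:-${NVBLAS_CONFIG_FILE:-nvblas.conf}}"
    lib=$(nvblas_setting "$conf" NVBLAS_CPU_BLAS_LIB)
    log=$(nvblas_setting "$conf" NVBLAS_LOGFILE)
    echo "CPU BLAS library: ${lib:-not set}"
    echo "Log destination: ${log:-stderr (default)}"
}
```

Running `nvblas_report default.conf` would show which library name the config points to, which can then be compared with what is installed on the system.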
