error: Unable to fetch any events from nvidia-smi: Error read |0: file already closed #24

Closed
NAshwinKumar opened this issue Sep 4, 2019 · 6 comments

Comments

@NAshwinKumar

Can someone help me solve this issue?

@deepujain
Contributor

  1. The full stack trace and logs from running the nvidiagpubeat executable. (Enclose them in three backticks; otherwise the logs will be unreadable.)
  2. Which branch did you build?
  3. Details of your environment.
  4. The output of the nvidia-smi command.
  5. The output of ls /dev | grep nvidia | grep -v nvidia-uvm | grep -v nvidiactl | wc -l
  6. The output of nvidiagpubeat --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate --format=csv
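
For reference, the device-node count requested in item 5 can also be sketched in Go. This is only an illustration that mirrors the grep pipeline above; the function name countGPUDevices is hypothetical and is not part of nvidiagpubeat.

```go
package main

import (
	"fmt"
	"os"
	"strings"
)

// countGPUDevices is a hypothetical helper that mirrors the shell pipeline
//   grep nvidia | grep -v nvidia-uvm | grep -v nvidiactl | wc -l
// applied to a list of file names (for example, the entries of /dev).
func countGPUDevices(names []string) int {
	count := 0
	for _, n := range names {
		if strings.Contains(n, "nvidia") &&
			!strings.Contains(n, "nvidia-uvm") &&
			!strings.Contains(n, "nvidiactl") {
			count++
		}
	}
	return count
}

func main() {
	// List /dev and collect the entry names.
	entries, err := os.ReadDir("/dev")
	if err != nil {
		fmt.Fprintln(os.Stderr, "cannot list /dev:", err)
		os.Exit(1)
	}
	names := make([]string, 0, len(entries))
	for _, e := range entries {
		names = append(names, e.Name())
	}
	// Print the number of matching device nodes.
	fmt.Println(countGPUDevices(names))
}
```

A result of 0 suggests that no NVIDIA device nodes are present, which usually means the driver is not installed or not loaded.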

@NAshwinKumar
Author

  1. Full stack trace
2019-09-04T21:08:38.188+0530    INFO    instance/beat.go:607    Home path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat] Config path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat] Data path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/data] Logs path: [/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/logs]
2019-09-04T21:08:38.188+0530    DEBUG   [beat]  instance/beat.go:659    Beat metadata path: /home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/data/meta.json
2019-09-04T21:08:38.188+0530    INFO    instance/beat.go:615    Beat ID: 68386e1f-0080-4249-ae78-5278a46d79ac
2019-09-04T21:08:38.189+0530    INFO    [beat]  instance/beat.go:903    Beat info       {"system_info": {"beat": {"path": {"config": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat", "data": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/data", "home": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat", "logs": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/logs"}, "type": "nvidiagpubeat", "uuid": "68386e1f-0080-4249-ae78-5278a46d79ac"}}}
2019-09-04T21:08:38.189+0530    INFO    [beat]  instance/beat.go:912    Build info      {"system_info": {"build": {"commit": "unknown", "libbeat": "7.3.2", "time": "1754-08-30T22:43:41.128Z", "version": "7.3.2"}}}
2019-09-04T21:08:38.189+0530    INFO    [beat]  instance/beat.go:915    Go runtime info {"system_info": {"go": {"os":"linux","arch":"amd64","max_procs":4,"version":"go1.12.9"}}}
2019-09-04T21:08:38.192+0530    INFO    [beat]  instance/beat.go:919    Host info       {"system_info": {"host": {"architecture":"x86_64","boot_time":"2019-09-03T20:40:11+05:30","containerized":false,"name":"linux-d4hc","ip":["127.0.0.1/8","::1/128","192.168.29.221/24","2405:201:e806:9f60:29d1:864b:af2b:f9f0/64","2405:201:e806:9f60:7a45:61ff:fec0:c319/64","fe80::7a45:61ff:fec0:c319/64"],"kernel_version":"4.12.14-lp151.27-default","mac":["c8:5b:76:68:99:f7","78:45:61:c0:c3:19"],"os":{"family":"","platform":"opensuse-leap","name":"openSUSE Leap","version":"15.1","major":15,"minor":1,"patch":0},"timezone":"IST","timezone_offset_sec":19800,"id":"1ae32b0454884a1cac7ab936ce597373"}}}
2019-09-04T21:08:38.193+0530    INFO    [beat]  instance/beat.go:948    Process info    {"system_info": {"process": {"capabilities": {"inheritable":null,"permitted":null,"effective":null,"bounding":["chown","dac_override","dac_read_search","fowner","fsetid","kill","setgid","setuid","setpcap","linux_immutable","net_bind_service","net_broadcast","net_admin","net_raw","ipc_lock","ipc_owner","sys_module","sys_rawio","sys_chroot","sys_ptrace","sys_pacct","sys_admin","sys_boot","sys_nice","sys_resource","sys_time","sys_tty_config","mknod","lease","audit_write","audit_control","setfcap","mac_override","mac_admin","syslog","wake_alarm","block_suspend","audit_read"],"ambient":null}, "cwd": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat", "exe": "/home/ashwin/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat/nvidiagpubeat", "name": "nvidiagpubeat", "pid": 2913, "ppid": 27508, "seccomp": {"mode":"disabled","no_new_privs":false}, "start_time": "2019-09-04T21:08:37.370+0530"}}}
2019-09-04T21:08:38.193+0530    INFO    instance/beat.go:292    Setup Beat: nvidiagpubeat; Version: 7.3.2
2019-09-04T21:08:38.194+0530    DEBUG   [beat]  instance/beat.go:318    Initializing output plugins
2019-09-04T21:08:38.194+0530    INFO    [index-management]      idxmgmt/std.go:178      Set output.elasticsearch.index to 'nvidiagpubeat-7.3.2' as ILM is enabled.
2019-09-04T21:08:38.194+0530    INFO    elasticsearch/client.go:170     Elasticsearch url: http://localhost:9200
2019-09-04T21:08:38.195+0530    DEBUG   [publisher]     pipeline/consumer.go:137        start pipeline event consumer
2019-09-04T21:08:38.195+0530    INFO    [publisher]     pipeline/module.go:97   Beat name: linux-d4hc
2019-09-04T21:08:38.196+0530    INFO    [monitoring]    log/log.go:118  Starting metrics logging every 30s
2019-09-04T21:08:38.196+0530    INFO    instance/beat.go:422    nvidiagpubeat start running.
2019-09-04T21:08:38.196+0530    INFO    beater/nvidiagpubeat.go:57      nvidiagpubeat is running for ** test ** environment. ! Hit CTRL-C to stop it.
2019-09-04T21:08:39.205+0530    ERROR   beater/nvidiagpubeat.go:75      Event not generated, error: Unable to fetch any events from nvidia-smi: Error read |0: file already closed
2019-09-04T21:08:40.207+0530    ERROR   beater/nvidiagpubeat.go:75      Event not generated, error: Unable to fetch any events from nvidia-smi: Error read |0: file already closed
^C2019-09-04T21:08:41.178+0530  DEBUG   [service]       service/service.go:53   Received sigterm/sigint, stopping
2019-09-04T21:08:41.178+0530    DEBUG   [publisher]     pipeline/client.go:149  client: closing acker
2019-09-04T21:08:41.178+0530    DEBUG   [publisher]     pipeline/client.go:151  client: done closing acker
2019-09-04T21:08:41.178+0530    DEBUG   [publisher]     pipeline/client.go:155  client: cancelled 0 events
2019-09-04T21:08:41.184+0530    INFO    [monitoring]    log/log.go:153  Total non-zero metrics  {"monitoring": {"metrics": {"beat":{"cpu":{"system":{"ticks":20,"time":{"ms":24}},"total":{"ticks":60,"time":{"ms":64},"value":60},"user":{"ticks":40,"time":{"ms":40}}},"handles":{"limit":{"hard":4096,"soft":1024},"open":5},"info":{"ephemeral_id":"f413aa9b-3ef6-4b77-998e-6e1f39166bc3","uptime":{"ms":3010}},"memstats":{"gc_next":4194304,"memory_alloc":1289056,"memory_total":3070424,"rss":23531520},"runtime":{"goroutines":8}},"libbeat":{"config":{"module":{"running":0}},"output":{"type":"elasticsearch"},"pipeline":{"clients":0,"events":{"active":0}}},"system":{"cpu":{"cores":4},"load":{"1":1.65,"15":2.23,"5":2.1,"norm":{"1":0.4125,"15":0.5575,"5":0.525}}}}}}
2019-09-04T21:08:41.185+0530    INFO    [monitoring]    log/log.go:154  Uptime: 3.016770154s
2019-09-04T21:08:41.185+0530    INFO    [monitoring]    log/log.go:131  Stopping metrics logging.
2019-09-04T21:08:41.185+0530    INFO    instance/beat.go:432    nvidiagpubeat stopped.
  • Beats branch : 7.3
  • nvidiagpubeat branch : withBeats7.3

OS: openSUSE Leap 15.1

ashwin@linux-d4hc:~> nvidia-smi
If 'nvidia-smi' is not a typo you can use command-not-found to lookup the package that contains it, like this:
    cnf nvidia-smi
ashwin@linux-d4hc:~> ls /dev | grep nvidia | grep -v nvidia-uvm | grep -v nvidiactl | wc -l
0
ashwin@linux-d4hc:~/Downloads/beats_dev/src/github.com/ebay/nvidiagpubeat> nvidiagpubeat --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate --format=csv
Error: unknown flag: --query-gpu
Usage:
  nvidiagpubeat [flags]
  nvidiagpubeat [command]

Available Commands:
  export      Export current config or index template
  help        Help about any command
  keystore    Manage secrets keystore
  run         Run nvidiagpubeat
  setup       Setup index template, dashboards and ML jobs
  test        Test config
  version     Show current version info

Flags:
  -E, --E setting=value      Configuration overwrite
  -N, --N                    Disable actual publishing for testing
  -c, --c string             Configuration file, relative to path.config (default "nvidiagpubeat.yml")
      --cpuprofile string    Write cpu profile to file
  -d, --d string             Enable certain debug selectors
  -e, --e                    Log to stderr and disable syslog/file output
  -h, --help                 help for nvidiagpubeat
      --httpprof string      Start pprof http server
      --memprofile string    Write memory profile to this file
      --path.config string   Configuration path
      --path.data string     Data path
      --path.home string     Home path
      --path.logs string     Logs path
      --plugin pluginList    Load additional plugins
      --strict.perms         Strict permission checking on config files (default true)
  -v, --v                    Log at INFO level

Use "nvidiagpubeat [command] --help" for more information about a command.

@deepujain
Contributor

Item 4 indicates that nvidia-smi is either not in PATH or not installed at all. nvidia-smi is the command-line utility shipped with the NVIDIA GPU driver; it collects metrics from GPU cards.
If this is the root cause of the issue, I can add checks and return an appropriate error message.

Item 5 indicates 0 GPU cards on the current machine.

Item 6 contained a typo on my part: the command should be nvidia-smi, not nvidiagpubeat, which is why you got "Error: unknown flag: --query-gpu". Can you run this instead?

nvidia-smi --query-gpu=utilization.gpu,utilization.memory,memory.total,memory.free,memory.used,temperature.gpu,pstate --format=csv

@deepujain
Contributor

I noticed this in your logs:

2019-09-04T21:08:38.196+0530    INFO    instance/beat.go:422    nvidiagpubeat start running.
2019-09-04T21:08:38.196+0530    INFO    beater/nvidiagpubeat.go:57      nvidiagpubeat is running for ** test ** environment. ! Hit CTRL-C to stop it.

And you are running on SUSE Linux.

https://github.com/eBay/nvidiagpubeat#run-in-test-environment-macos indicates that "test" mode is supported on macOS. Test mode uses localnvidiasmi, an executable built on and for macOS, so it is not expected to work on Linux.

@deepujain
Contributor

Do you want to work on #25? It will fix the current issue.

@NAshwinKumar
Author

Thanks, deepujain. Installing nvidia-smi solved the issue.
