IO Engine Cannot set affinity #1458

Open
sfxworks opened this issue Jul 19, 2023 · 11 comments
@sfxworks

Describe the bug
On my first install of Mayastor, I'm getting a "Cannot set affinity" error from the io-engine.

To Reproduce
Steps to reproduce the behavior:

  1. Install Mayastor with the values below (an example install command is sketched after the values)
USER-SUPPLIED VALUES:
agents:
  core:
    capacity:
      thin:
        poolCommitment: 250%
        snapshotCommitment: 40%
        volumeCommitment: 40%
        volumeCommitmentInitial: 40%
    logLevel: info
    partialRebuildWaitPeriod: ""
    priorityClassName: ""
    resources:
      limits:
        cpu: 1000m
        memory: 128Mi
      requests:
        cpu: 500m
        memory: 32Mi
    tolerations: []
  ha:
    cluster:
      logLevel: info
      resources:
        limits:
          cpu: 100m
          memory: 64Mi
        requests:
          cpu: 100m
          memory: 16Mi
    enabled: true
    node:
      logLevel: info
      priorityClassName: ""
      resources:
        limits:
          cpu: 100m
          memory: 64Mi
        requests:
          cpu: 100m
          memory: 64Mi
      tolerations: []
apis:
  rest:
    logLevel: info
    priorityClassName: ""
    replicaCount: 1
    resources:
      limits:
        cpu: 100m
        memory: 64Mi
      requests:
        cpu: 50m
        memory: 32Mi
    service:
      nodePorts:
        http: 30011
        https: 30010
      type: ClusterIP
    tolerations: []
base:
  cache_poll_period: 30s
  default_req_timeout: 5s
  imagePullSecrets:
    enabled: false
    secrets:
    - name: login
  initContainers:
    containers:
    - command:
      - sh
      - -c
      - trap "exit 1" TERM; until nc -vzw 5 {{ .Release.Name }}-agent-core 50051;
        do date; echo "Waiting for agent-core-grpc services..."; sleep 1; done;
      image: busybox:latest
      name: agent-core-grpc-probe
    - command:
      - sh
      - -c
      - trap "exit 1" TERM; until nc -vzw 5 {{ .Release.Name }}-etcd {{.Values.etcd.service.port}};
        do date; echo "Waiting for etcd..."; sleep 1; done;
      image: busybox:latest
      name: etcd-probe
    enabled: true
  initCoreContainers:
    containers:
    - command:
      - sh
      - -c
      - trap "exit 1" TERM; until nc -vzw 5 {{ .Release.Name }}-etcd {{.Values.etcd.service.port}};
        do date; echo "Waiting for etcd..."; sleep 1; done;
      image: busybox:latest
      name: etcd-probe
    enabled: true
  initHaNodeContainers:
    containers:
    - command:
      - sh
      - -c
      - trap "exit 1" TERM; until nc -vzw 5 {{ .Release.Name }}-agent-core 50052;
        do date; echo "Waiting for agent-cluster-grpc services..."; sleep 1; done;
      image: busybox:latest
      name: agent-cluster-grpc-probe
    enabled: true
  initRestContainer:
    enabled: true
    initContainer:
    - command:
      - sh
      - -c
      - trap "exit 1" TERM; until nc -vzw 5 {{ .Release.Name }}-api-rest 8081; do
        date; echo "Waiting for REST API endpoint to become available"; sleep 1; done;
      image: busybox:latest
      name: api-rest-probe
  jaeger:
    agent:
      initContainer:
      - command:
        - sh
        - -c
        - trap "exit 1" TERM; until nc -vzw 5 -u {{.Values.base.jaeger.agent.name}}
          {{.Values.base.jaeger.agent.port}}; do date; echo "Waiting for jaeger...";
          sleep 1; done;
        image: busybox:latest
        name: jaeger-probe
      name: jaeger-agent
      port: 6831
    enabled: false
    initContainer: true
  logSilenceLevel: null
  metrics:
    enabled: true
    pollingInterval: 5m
csi:
  controller:
    logLevel: info
    priorityClassName: ""
    resources:
      limits:
        cpu: 32m
        memory: 128Mi
      requests:
        cpu: 16m
        memory: 64Mi
    tolerations: []
  image:
    attacherTag: v4.3.0
    provisionerTag: v3.5.0
    pullPolicy: IfNotPresent
    registrarTag: v2.8.0
    registry: registry.k8s.io
    repo: sig-storage
    snapshotControllerTag: v6.2.1
    snapshotterTag: v6.2.1
  node:
    kubeletDir: /var/lib/kubelet
    logLevel: info
    nvme:
      ctrl_loss_tmo: "1980"
      io_timeout: "30"
      keep_alive_tmo: ""
    pluginMounthPath: /csi
    priorityClassName: ""
    resources:
      limits:
        cpu: 100m
        memory: 128Mi
      requests:
        cpu: 100m
        memory: 64Mi
    socketPath: csi.sock
    tolerations: []
    topology:
      nodeSelector: false
      segments:
        openebs.io/csi-node: mayastor
earlyEvictionTolerations:
- effect: NoExecute
  key: node.kubernetes.io/unreachable
  operator: Exists
  tolerationSeconds: 5
- effect: NoExecute
  key: node.kubernetes.io/not-ready
  operator: Exists
  tolerationSeconds: 5
etcd:
  auth:
    rbac:
      allowNoneAuthentication: true
      create: false
      enabled: false
  autoCompactionMode: revision
  autoCompactionRetention: 100
  client:
    secureTransport: false
  clusterDomain: k8s.sfxworks
  debug: false
  extraEnvVars:
  - name: ETCD_QUOTA_BACKEND_BYTES
    value: "8589934592"
  initialClusterState: new
  nodeSelector: {}
  peer:
    secureTransport: false
  persistence:
    enabled: true
    reclaimPolicy: Delete
    size: 2Gi
    storageClass: nvme-replicated
  podAntiAffinityPreset: hard
  podLabels:
    app: etcd
    openebs.io/logging: "true"
  priorityClassName: ""
  removeMemberOnContainerTermination: true
  replicaCount: 3
  service:
    nodePorts:
      clientPort: 31379
      peerPort: ""
    port: 2379
    type: ClusterIP
  tolerations: []
  volumePermissions:
    enabled: true
eventing:
  enabled: true
image:
  pullPolicy: Always
  registry: harbor.home.sfxworks.net/docker
  repo: openebs
  repoTags:
    controlPlane: ""
    dataPlane: ""
    extensions: ""
  tag: release-2.2
io_engine:
  api: v1
  coreList: []
  cpuCount: "2"
  envcontext: ""
  logLevel: info
  nodeSelector:
    kubernetes.io/arch: amd64
    openebs.io/engine: mayastor
  priorityClassName: ""
  reactorFreezeDetection:
    enabled: false
  resources:
    limits:
      cpu: "2"
      hugepages2Mi: 2Gi
      memory: 1Gi
    requests:
      cpu: "2"
      hugepages2Mi: 2Gi
      memory: 1Gi
  target:
    nvmf:
      iface: ""
      ptpl: true
  tolerations: []
jaeger-operator:
  crd:
    install: false
  jaeger:
    create: false
  name: '{{ .Release.Name }}'
  priorityClassName: ""
  rbac:
    clusterRole: true
  tolerations: []
loki-stack:
  enabled: true
  loki:
    config:
      compactor:
        compaction_interval: 20m
        retention_delete_delay: 1h
        retention_delete_worker_count: 50
        retention_enabled: true
      limits_config:
        retention_period: 168h
    enabled: true
    initContainers:
    - command:
      - /bin/bash
      - -ec
      - chown -R 1001:1001 /data
      image: docker.io/bitnami/bitnami-shell:10
      imagePullPolicy: IfNotPresent
      name: volume-permissions
      securityContext:
        runAsUser: 0
      terminationMessagePath: /dev/termination-log
      terminationMessagePolicy: File
      volumeMounts:
      - mountPath: /data
        name: storage
    persistence:
      enabled: true
      reclaimPolicy: Delete
      size: 10Gi
      storageClassName: ""
    priorityClassName: ""
    rbac:
      create: true
      pspEnabled: false
    securityContext:
      fsGroup: 1001
      runAsGroup: 1001
      runAsNonRoot: false
      runAsUser: 1001
    service:
      nodePort: 31001
      port: 3100
      type: ClusterIP
    tolerations: []
  promtail:
    config:
      lokiAddress: http://{{ .Release.Name }}-loki:3100/loki/api/v1/push
      snippets:
        scrapeConfigs: |
          - job_name: {{ .Release.Name }}-pods-name
            pipeline_stages:
              - docker: {}
            kubernetes_sd_configs:
            - role: pod
            relabel_configs:
            - source_labels:
              - __meta_kubernetes_pod_node_name
              target_label: hostname
              action: replace
            - action: labelmap
              regex: __meta_kubernetes_pod_label_(.+)
            - action: keep
              source_labels:
              - __meta_kubernetes_pod_label_openebs_io_logging
              regex: true
              target_label: {{ .Release.Name }}_component
            - action: replace
              replacement: $1
              separator: /
              source_labels:
              - __meta_kubernetes_namespace
              target_label: job
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_name
              target_label: pod
            - action: replace
              source_labels:
              - __meta_kubernetes_pod_container_name
              target_label: container
            - replacement: /var/log/pods/*$1/*.log
              separator: /
              source_labels:
              - __meta_kubernetes_pod_uid
              - __meta_kubernetes_pod_container_name
              target_label: __path__
    enabled: true
    priorityClassName: ""
    rbac:
      create: true
      pspEnabled: false
    tolerations: []
nats:
  cluster:
    enabled: true
    replicas: 3
  nats:
    image:
      pullPolicy: IfNotPresent
      registry: ""
    jetstream:
      enabled: true
      fileStorage:
        enabled: false
      memStorage:
        enabled: true
        size: 5Mi
  natsbox:
    enabled: false
nodeSelector:
  kubernetes.io/arch: amd64
obs:
  callhome:
    enabled: true
    logLevel: info
    priorityClassName: ""
    resources:
      limits:
        cpu: 100m
        memory: 32Mi
      requests:
        cpu: 50m
        memory: 16Mi
    sendReport: true
    tolerations: []
  stats:
    logLevel: info
    resources:
      limits:
        cpu: 100m
        memory: 32Mi
      requests:
        cpu: 50m
        memory: 16Mi
    service:
      nodePorts:
        http: 90011
        https: 90010
      type: ClusterIP
operators:
  pool:
    logLevel: info
    priorityClassName: ""
    resources:
      limits:
        cpu: 100m
        memory: 32Mi
      requests:
        cpu: 50m
        memory: 16Mi
    tolerations: []
priorityClassName: ""
tolerations: []
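
For context, an install with these values might look roughly like the following; the chart repository URL, release name, namespace, and values file name are assumptions rather than details from this report:

# Hypothetical install sketch; adjust the repo URL, release name, and namespace to your environment.
helm repo add mayastor https://openebs.github.io/mayastor-extensions/
helm repo update
helm install mayastor mayastor/mayastor \
  --namespace mayastor --create-namespace \
  -f values.yaml   # the USER-SUPPLIED VALUES shown above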

Expected behavior

  1. Mayastor installs and the io-engine runs

OS info:

  • Distro: Arch
  • Kernel version: 6.3.9-arch1-1
  • MayaStor revision or container image: release-2.2

Additional context
One of the io-engine pods runs fine, but the others fail (see the pod listing below).
I also tried giving Mayastor dedicated CPUs and running helm upgrade. This led to an etcd issue though.

NAME                                          READY   STATUS             RESTARTS       AGE     IP              NODE                NOMINATED NODE   READINESS GATES
mayastor-agent-core-cdd744cf7-b2skc           2/2     Running            0              10h     10.0.2.233      epyc7713            <none>           <none>
mayastor-agent-ha-node-cs7qf                  1/1     Running            0              10h     192.168.0.100   home-2cf05d8a44a0   <none>           <none>
mayastor-agent-ha-node-hhq9k                  1/1     Running            0              10h     192.168.0.245   home-2cf05d8a449c   <none>           <none>
mayastor-agent-ha-node-v468b                  1/1     Running            1              10h     192.168.0.119   epyc-gigabyte       <none>           <none>
mayastor-agent-ha-node-xl25j                  1/1     Running            0              10h     192.168.0.149   epyc7713            <none>           <none>
mayastor-api-rest-69d59fcd7d-j5p5t            1/1     Running            0              10h     10.0.2.105      epyc7713            <none>           <none>
mayastor-csi-controller-884d9f8d8-x7hsc       3/3     Running            0              10h     192.168.0.149   epyc7713            <none>           <none>
mayastor-csi-node-dn5gp                       2/2     Running            0              10h     192.168.0.245   home-2cf05d8a449c   <none>           <none>
mayastor-csi-node-sr7rd                       2/2     Running            0              10h     192.168.0.149   epyc7713            <none>           <none>
mayastor-csi-node-x2pvp                       2/2     Running            0              10h     192.168.0.100   home-2cf05d8a44a0   <none>           <none>
mayastor-csi-node-x95dn                       2/2     Running            2              10h     192.168.0.119   epyc-gigabyte       <none>           <none>
mayastor-etcd-0                               1/1     Running            0              10h     10.0.2.166      epyc7713            <none>           <none>
mayastor-etcd-1                               1/1     Running            0              10h     10.0.0.15       home-2cf05d8a449c   <none>           <none>
mayastor-etcd-2                               0/1     CrashLoopBackOff   6 (81s ago)    8m26s   10.0.1.224      epyc-gigabyte       <none>           <none>
mayastor-io-engine-64ktf                      1/2     Error              5 (89s ago)    3m10s   192.168.0.149   epyc7713            <none>           <none>
mayastor-io-engine-ptt7w                      2/2     Running            0              10h     192.168.0.245   home-2cf05d8a449c   <none>           <none>
mayastor-io-engine-r4skq                      1/2     Error              5 (94s ago)    3m10s   192.168.0.100   home-2cf05d8a44a0   <none>           <none>
mayastor-io-engine-t274w                      1/2     Error              5 (110s ago)   3m10s   192.168.0.119   epyc-gigabyte       <none>           <none>
mayastor-loki-0                               1/1     Running            0              10h     10.0.2.20       epyc7713            <none>           <none>
mayastor-obs-callhome-6b66c87b45-tqzvj        1/1     Running            0              10h     10.0.2.72       epyc7713            <none>           <none>
mayastor-operator-diskpool-7cd4c6594f-2glmz   1/1     Running            0              10h     10.0.2.22       epyc7713            <none>           <none>
mayastor-promtail-24zg7                       1/1     Running            0              10h     10.0.8.206      home-2cf05d8a44a0   <none>           <none>
mayastor-promtail-gwngd                       0/1     Running            0              10h     10.0.5.2        soquartz-1          <none>           <none>
mayastor-promtail-mr52b                       0/1     Running            0              10h     10.0.1.9        soquartz-4          <none>           <none>
mayastor-promtail-nwcf5                       0/1     Running            0              10h     10.0.4.149      soquartz-2          <none>           <none>
mayastor-promtail-rmgdf                       0/1     Running            0              10h     10.0.0.77       soquartz-3          <none>           <none>
mayastor-promtail-wpn7z                       1/1     Running            0              10h     10.0.0.147      home-2cf05d8a449c   <none>           <none>
mayastor-promtail-xjpfc                       1/1     Running            1              10h     10.0.1.101      epyc-gigabyte       <none>           <none>
mayastor-promtail-zx7h9                       1/1     Running            0              10h     10.0.2.180      epyc7713            <none>           <none>
[2023-07-19T10:33:53.528339830+00:00  INFO io_engine:io-engine.rs:200] Engine responsible for managing I/Os version 1.0.0, revision 36b73467bd2a (v2.2.0)
[2023-07-19T10:33:53.528420989+00:00  INFO io_engine:io-engine.rs:179] free_pages 2MB: 2048 nr_pages 2MB: 2048
[2023-07-19T10:33:53.528425619+00:00  INFO io_engine:io-engine.rs:180] free_pages 1GB: 0 nr_pages 1GB: 0
[2023-07-19T10:33:53.528495798+00:00  INFO io_engine:io-engine.rs:232] kernel io_uring support: yes
[2023-07-19T10:33:53.528500798+00:00  INFO io_engine:io-engine.rs:236] kernel nvme initiator multipath support: yes
[2023-07-19T10:33:53.528519138+00:00  INFO io_engine::core::env:env.rs:786] loading mayastor config YAML file /var/local/io-engine/config.yaml
[2023-07-19T10:33:53.528526938+00:00  INFO io_engine::subsys::config:mod.rs:168] Config file /var/local/io-engine/config.yaml is empty, reverting to default config
[2023-07-19T10:33:53.528532698+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-07-19T10:33:53.528539548+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVME_QPAIR_CONNECT_ASYNC value to 'true'
[2023-07-19T10:33:53.528543038+00:00  INFO io_engine::subsys::config:mod.rs:216] Applying Mayastor configuration settings
EAL: FATAL: Cannot set affinity
EAL: Cannot set affinity
thread 'main' panicked at 'Failed to init EAL', io-engine/src/core/env.rs:627:13
stack backtrace:
   0: std::panicking::begin_panic
   1: io_engine::core::env::MayastorEnvironment::init
   2: io_engine::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.
sfxworks added the NEW (New issue) label Jul 19, 2023
@sfxworks (Author)

It looks like it doesn't respect the kubelet static CPU manager policy.

@tiagolobocastro (Contributor) commented Jul 25, 2023

Hmm, I'm not too familiar with CPU policies, but it seems this may be true. @Abhinandan-Purkait?
The io-engine tries to affinitize to the core list configured in the helm chart (the default from your chart would be derived from the core count, so cores 1 and 2, I think).
Did you isolate cores 1 and 2? I wonder if that would sidestep the policy.
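
For reference, pinning an explicit core list through the chart would look something like the sketch below; the chart reference, release name, namespace, and core numbers are placeholders, and io_engine.coreList is the values key shown in the values dump above.

# Sketch only: pin the io-engine to explicit cores instead of deriving them from cpuCount.
helm upgrade mayastor mayastor/mayastor -n mayastor \
  --reuse-values \
  --set 'io_engine.coreList={1,2}'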

@mike-pisman

Hi, I am getting the same error on 2 servers, while the third one managed to start the pod:

[2023-08-10T02:11:01.133568146+00:00  INFO io_engine:io-engine.rs:179] Engine responsible for managing I/Os version 1.0.0, revision b0734db654d8 (v2.0.0)
[2023-08-10T02:11:01.133812452+00:00  INFO io_engine:io-engine.rs:158] free_pages 2MB: 1024 nr_pages 2MB: 1024
[2023-08-10T02:11:01.133829859+00:00  INFO io_engine:io-engine.rs:159] free_pages 1GB: 0 nr_pages 1GB: 0
[2023-08-10T02:11:01.134049851+00:00  INFO io_engine:io-engine.rs:211] kernel io_uring support: yes
[2023-08-10T02:11:01.134079945+00:00  INFO io_engine:io-engine.rs:215] kernel nvme initiator multipath support: yes
[2023-08-10T02:11:01.134165623+00:00  INFO io_engine::core::env:env.rs:791] loading mayastor config YAML file /var/local/io-engine/config.yaml
[2023-08-10T02:11:01.134191763+00:00  INFO io_engine::subsys::config:mod.rs:168] Config file /var/local/io-engine/config.yaml is empty, reverting to default config
[2023-08-10T02:11:01.134213488+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-08-10T02:11:01.134239781+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVME_QPAIR_CONNECT_ASYNC value to 'true'
[2023-08-10T02:11:01.134251732+00:00  INFO io_engine::subsys::config:mod.rs:216] Applying Mayastor configuration settings
EAL: FATAL: Cannot set affinity
EAL: Cannot set affinity
thread 'main' panicked at 'Failed to init EAL', io-engine/src/core/env.rs:628:13
stack backtrace:
   0: std::panicking::begin_panic
   1: io_engine::core::env::MayastorEnvironment::init
   2: io_engine::main
note: Some details are omitted, run with `RUST_BACKTRACE=full` for a verbose backtrace.

I'm using microk8s and installed Mayastor via the add-on. The Kubernetes version is 1.27 with Mayastor 2.0.0.
The resources after creation:

~ ❯ kubectl get pod -n mayastor
NAME                                          READY   STATUS             RESTARTS      AGE
mayastor-csi-node-qwt5m                       2/2     Running            0             59m
mayastor-csi-node-l64bd                       2/2     Running            0             59m
etcd-wcckw7dkcs                               1/1     Running            0             58m
etcd-pcf79w5kxn                               1/1     Running            0             58m
mayastor-agent-core-f7ccf485-tzszv            1/1     Running            2 (57m ago)   59m
mayastor-operator-diskpool-5b4cfb555b-pht6l   1/1     Running            0             59m
mayastor-api-rest-bcb58d479-v7jm9             1/1     Running            0             59m
etcd-operator-mayastor-8574f998bc-q2z8z       1/1     Running            1 (55m ago)   59m
mayastor-csi-controller-6b867dd474-grwcw      3/3     Running            0             59m
mayastor-csi-node-m6ksd                       2/2     Running            4 (19m ago)   59m
etcd-s86jdxw5v8                               1/1     Running            2 (19m ago)   57m
mayastor-io-engine-9h6bg                      1/1     Running            2 (19m ago)   59m
mayastor-io-engine-bd8zz                      0/1     CrashLoopBackOff   5 (73s ago)   4m19s
mayastor-io-engine-szvcv                      0/1     CrashLoopBackOff   5 (50s ago)   4m6s

As you can see, two mayastor-io-engine pods are failing.

If it's not the core count, could the CPU frequency be too low? The server that managed to start mayastor-io-engine runs at 3.0 GHz, while the two servers that failed have lower-spec CPUs running at 1.7 GHz. I would rather not change the CPUs right now, so is there another way?

@tiagolobocastro (Contributor)

How many CPU cores do these 2 servers have?

@mike-pisman commented Aug 10, 2023

I have allocated 8 cores, 16 GB of RAM, and 64 GB of disk on all 3 servers. I will try adding more cores (32) and will get back with the results.


Update

Added 32 cores to the LXC container running microk8s, rebooted the container, and added RUST_BACKTRACE=full to the mayastor-io-engine daemon set. I'm still getting the same error:

[2023-08-10T18:58:35.477169774+00:00  INFO io_engine:io-engine.rs:179] Engine responsible for managing I/Os version 1.0.0, revision b0734db654d8 (v2.0.0)
[2023-08-10T18:58:35.477449869+00:00  INFO io_engine:io-engine.rs:158] free_pages 2MB: 1024 nr_pages 2MB: 1024
[2023-08-10T18:58:35.477467622+00:00  INFO io_engine:io-engine.rs:159] free_pages 1GB: 0 nr_pages 1GB: 0
[2023-08-10T18:58:35.477682164+00:00  INFO io_engine:io-engine.rs:211] kernel io_uring support: yes
[2023-08-10T18:58:35.477713263+00:00  INFO io_engine:io-engine.rs:215] kernel nvme initiator multipath support: yes
[2023-08-10T18:58:35.477806753+00:00  INFO io_engine::core::env:env.rs:791] loading mayastor config YAML file /var/local/io-engine/config.yaml
[2023-08-10T18:58:35.477831688+00:00  INFO io_engine::subsys::config:mod.rs:168] Config file /var/local/io-engine/config.yaml is empty, reverting to default config
[2023-08-10T18:58:35.477856564+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-08-10T18:58:35.477875581+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVME_QPAIR_CONNECT_ASYNC value to 'true'
[2023-08-10T18:58:35.477896816+00:00  INFO io_engine::subsys::config:mod.rs:216] Applying Mayastor configuration settings
EAL: FATAL: Cannot set affinity
EAL: Cannot set affinity
thread 'main' panicked at 'Failed to init EAL', io-engine/src/core/env.rs:628:13
stack backtrace:
   0:     0x563edae8c63c - std::backtrace_rs::backtrace::libunwind::trace::h3fea1eb2e0ba2ac9
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/../../backtrace/src/backtrace/libunwind.rs:90:5
   1:     0x563edae8c63c - std::backtrace_rs::backtrace::trace_unsynchronized::h849d83492cbc0d59
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x563edae8c63c - std::sys_common::backtrace::_print_fmt::he3179d37290f23d3
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:67:5
   3:     0x563edae8c63c - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::h140f6925cad14324
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:46:22
   4:     0x563edaeb3a8c - core::fmt::write::h31b9cd1bedd7ea38
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/fmt/mod.rs:1150:17
   5:     0x563edae85485 - std::io::Write::write_fmt::h1fdf66f83f70913e
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/io/mod.rs:1667:15
   6:     0x563edae8e670 - std::sys_common::backtrace::_print::he7ac492cd19c3189
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:49:5
   7:     0x563edae8e670 - std::sys_common::backtrace::print::hba20f8920229d8e8
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/sys_common/backtrace.rs:36:9
   8:     0x563edae8e670 - std::panicking::default_hook::{{closure}}::h714d63979ae18678
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:210:50
   9:     0x563edae8e227 - std::panicking::default_hook::hf1afb64e69563ca8
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:227:9
  10:     0x563edae8ed24 - std::panicking::rust_panic_with_hook::h02231a501e274a13
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:624:17
  11:     0x563edaa4c865 - std::panicking::begin_panic::{{closure}}::h7a63bfeb662f20ad
  12:     0x563edaa4a5e4 - std::sys_common::backtrace::__rust_end_short_backtrace::h4247f61ed8ce89f4
  13:     0x563eda2db9fc - std::panicking::begin_panic::h2a5b2d5b2df0b927
  14:     0x563eda63ed57 - io_engine::core::env::MayastorEnvironment::init::h00d4823a049822b2
  15:     0x563eda5313ec - io_engine::main::hf80554fcb427d3c4
  16:     0x563eda568183 - std::sys_common::backtrace::__rust_begin_short_backtrace::h4ead7c1f369eb43e
  17:     0x563eda53ebed - std::rt::lang_start::{{closure}}::h58a35d1e00786750
  18:     0x563edae8f32a - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h2790017aba790142
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/core/src/ops/function.rs:259:13
  19:     0x563edae8f32a - std::panicking::try::do_call::hd5d0fbb7d2d2d85d
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:403:40
  20:     0x563edae8f32a - std::panicking::try::h675520ee37b0fdf7
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:367:19
  21:     0x563edae8f32a - std::panic::catch_unwind::h803430ea0284ce79
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panic.rs:129:14
  22:     0x563edae8f32a - std::rt::lang_start_internal::{{closure}}::h3a398a8154de3106
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/rt.rs:45:48
  23:     0x563edae8f32a - std::panicking::try::do_call::hf60f106700df94b2
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:403:40
  24:     0x563edae8f32a - std::panicking::try::hb2022d2bc87a9867
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panicking.rs:367:19
  25:     0x563edae8f32a - std::panic::catch_unwind::hbf801c9d61f0c2fb
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/panic.rs:129:14
  26:     0x563edae8f32a - std::rt::lang_start_internal::hdd488b91dc742b96
                               at /rustc/59eed8a2aac0230a8b53e89d4e99d55912ba6b35/library/std/src/rt.rs:45:20
  27:     0x563eda532e42 - main
  28:     0x7f3dac00eded - __libc_start_main
  29:     0x563eda2fdf2a - _start
                               at /build/glibc-2.32/csu/../sysdeps/x86_64/start.S:120
  30:                0x0 - <unknown>

On the other server, which still has 8 cores, I get slightly different output:

[2023-08-10T19:04:14.476441862+00:00  INFO io_engine:io-engine.rs:179] Engine responsible for managing I/Os version 1.0.0, revision b0734db654d8 (v2.0.0)
[2023-08-10T19:04:14.476619998+00:00  INFO io_engine:io-engine.rs:158] free_pages 2MB: 1024 nr_pages 2MB: 1024
[2023-08-10T19:04:14.476630074+00:00  INFO io_engine:io-engine.rs:159] free_pages 1GB: 0 nr_pages 1GB: 0
[2023-08-10T19:04:14.476755343+00:00  INFO io_engine:io-engine.rs:211] kernel io_uring support: yes
[2023-08-10T19:04:14.476788992+00:00  INFO io_engine:io-engine.rs:215] kernel nvme initiator multipath support: yes
[2023-08-10T19:04:14.476839572+00:00  INFO io_engine::core::env:env.rs:791] loading mayastor config YAML file /var/local/io-engine/config.yaml
[2023-08-10T19:04:14.476854233+00:00  INFO io_engine::subsys::config:mod.rs:168] Config file /var/local/io-engine/config.yaml is empty, reverting to default config
[2023-08-10T19:04:14.476863175+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVMF_TCP_MAX_QUEUE_DEPTH value to '32'
[2023-08-10T19:04:14.476872222+00:00  INFO io_engine::subsys::config::opts:opts.rs:151] Overriding NVME_QPAIR_CONNECT_ASYNC value to 'true'
[2023-08-10T19:04:14.476878751+00:00  INFO io_engine::subsys::config:mod.rs:216] Applying Mayastor configuration settings
PANIC in rte_eal_init():
Cannot set affinity
11: [io-engine(+0x13af2a) [0x563c69519f2a]]
10: [/nix/store/sbbifs2ykc05inws26203h0xwcadnf0l-glibc-2.32-46/lib/libc.so.6(__libc_start_main+0xed) [0x7f802e1d1ded]]
9: [io-engine(+0x36fe42) [0x563c6974ee42]]
8: [io-engine(+0xccc32a) [0x563c6a0ab32a]]
7: [io-engine(+0x37bbed) [0x563c6975abed]]
6: [io-engine(+0x3a5183) [0x563c69784183]]
5: [io-engine(+0x36e3ec) [0x563c6974d3ec]]
4: [io-engine(+0x47ae78) [0x563c69859e78]]
3: [/nix/store/8lijpmw0rwja558780llanxmmvr572zi-io-engine/lib/libspdk-bundle.so(+0x915ee) [0x7f802e58c5ee]]
2: [/nix/store/8lijpmw0rwja558780llanxmmvr572zi-io-engine/lib/libspdk-bundle.so(__rte_panic+0xb6) [0x7f802e5880b9]]
1: [/nix/store/8lijpmw0rwja558780llanxmmvr572zi-io-engine/lib/libspdk-bundle.so(rte_dump_stack+0x1b) [0x7f80310abfab]]

@mike-pisman

@tiagolobocastro Any ideas?

@tiagolobocastro (Contributor)

Is there some kind of limit on your LXC container restricting it to a subset of your CPUs?
Also, I noticed you're running v2.0.0; you could move to v2.3.0, though I suspect that won't help in this case.

@mike-pisman

I tried installing v2.3.0 from the chart and it did not help. There are no limits on the LXC container. I decided to upgrade the CPUs; if that helps I will post an update.

@tiagolobocastro (Contributor) commented Aug 13, 2023

If it doesn't help, would you be able to change the io-engine container image to something else that would allow you to run this from the container:

grep Cpus_allowed_list /proc/self/status

Also, do you have a CPU manager policy of static?
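
A couple of ways to check (the paths assume a stock kubelet layout under /var/lib/kubelet; microk8s may keep its kubelet flags elsewhere):

grep -i cpuManagerPolicy /var/lib/kubelet/config.yaml   # kubelet config file, if one is used
cat /var/lib/kubelet/cpu_manager_state                  # records the active CPU manager policy and pinned sets
grep Cpus_allowed_list /proc/self/status                # run inside the io-engine container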

@tiagolobocastro (Contributor)

I've tested this with LXD, and when we limit LXC containers to specific CPUs, I do indeed start to see the allowed CPU list being set up by LXC, for example:

root@ksnode-2:~# grep Cpus_allowed_list /proc/self/status
Cpus_allowed_list: 2,9,12

In this case, to get the io-engine to run I had to change the cpu-list to those...
I think we may need to tweak the io-engine data-plane CPU affinity to make it more compatible with LXD and similar configurations.
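
As a rough sketch of that (the container name and core range below are placeholders): check which CPUs the container is actually allowed to use, then either widen the LXD CPU limit or point io_engine.coreList at cores inside that set, as in the helm upgrade sketch earlier in this thread.

grep Cpus_allowed_list /proc/self/status   # inside the container: the CPUs we are allowed to pin to
lxc config get ksnode-2 limits.cpu         # on the LXD host: the container's current CPU limit
lxc config set ksnode-2 limits.cpu 0-3     # e.g. allow cores 0-3 so the io-engine's default cores are available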

@mike-pisman commented Nov 28, 2023

@tiagolobocastro, sorry, I forgot to update. I have replaced the CPUs, but that did not resolve the issue.

I think one of the issues I have experienced with Kubernetes in LXC and various storage solutions, including EBS, Ceph (CSI driver), and others, was the inability to mount a new drive inside the LXC container (even though it was privileged). I can't remember exactly why, but it seems like a limitation of LXD altogether. I did find a post regarding this...

I ultimately just installed bare-bones Kubernetes directly on the server and most of those issues disappeared. I'm sure that if I tried to run OpenEBS there, it would work. So the issue is most likely related to running Kubernetes inside LXC.

tiagolobocastro removed the NEW (New issue) label Jan 20, 2024