Monitoring GPU

GPU GR Engine Usage Trend

GET /v1/monitoring-gpu/metric/gr-engine-utilize/result/transition

Retrieve the GR Engine usage trend per minute for each GPU.

Query parameters
clusterIds (string, required): Cluster ID
start (string, required): Start date for the query
end (string, required): End date for the query

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/gr-engine-utilize/result/transition?clusterIds=text&start=text&end=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "gpu": "0",
      "model": "",
      "node": "cocktail-gpu-node-1"
    },
    "samples": [
      0,
      0,
      0
    ],
    "timestamps": [
      1727156220,
      1727156280,
      1727156340
    ],
    "values": null
  }
]
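
As a rough illustration of how a client might call this endpoint and read the result, the Python sketch below builds the request URL and pairs each timestamp with its sample. The host `https://cocktail.example.com` is a placeholder, and the helper names are hypothetical. Treating `timestamps` as Unix epoch seconds that line up index by index with `samples` is an assumption drawn from the example response, not something this reference states. The `start` and `end` values are passed through as opaque strings because their expected format is not documented here.

```python
from urllib.parse import urlencode

BASE_URL = "https://cocktail.example.com"  # placeholder host (assumption)
PATH = "/v1/monitoring-gpu/metric/gr-engine-utilize/result/transition"


def build_trend_url(cluster_ids: str, start: str, end: str) -> str:
    """Return the full URL with the three required query parameters."""
    query = urlencode({"clusterIds": cluster_ids, "start": start, "end": end})
    return f"{BASE_URL}{PATH}?{query}"


def to_points(series: dict) -> list:
    """Pair each timestamp with its sample (assumes index-aligned arrays)."""
    timestamps = series.get("timestamps") or []
    samples = series.get("samples") or []
    return list(zip(timestamps, samples))


# Data shaped like the example response for this endpoint.
example = [{
    "metrics": {"cluster": "gpu-cluster", "gpu": "0", "model": "",
                "node": "cocktail-gpu-node-1"},
    "samples": [0, 0, 0],
    "timestamps": [1727156220, 1727156280, 1727156340],
    "values": None,
}]

for series in example:
    labels = series["metrics"]
    print(labels["node"], labels["gpu"], to_points(series))
```

The same two helpers should apply to the other `/result/transition` endpoints by swapping `PATH`, since their example responses share this shape.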

GPU Instances

GET /v1/monitoring-gpu/metric/model/result/current

Retrieve GPU instance information (GPU, MIG).

Query parameters
clusterIds (string, required): Cluster ID

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/model/result/current?clusterIds=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "UUID": "GPU-79e36614...",
      "cluster": "gpu-cluster",
      "gpu": "0",
      "measure": "allocate",
      "modelName": "NVIDIA A30",
      "node": "cocktail-gpu-node-1"
    },
    "timestamp": 1727159760,
    "value": 1
  }
]
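
To show one way the instance list could be consumed, the sketch below groups entries by their `node` label and keeps the GPU index, model name, and UUID. The function name and the output field names are hypothetical, and the input is simply the example response; real responses may carry additional labels (for MIG instances, for example) that are not shown in this reference.

```python
from collections import defaultdict


def instances_by_node(entries: list) -> dict:
    """Group instance entries by node, keeping a few identifying labels."""
    grouped = defaultdict(list)
    for entry in entries:
        labels = entry["metrics"]
        grouped[labels["node"]].append({
            "gpu": labels.get("gpu"),
            "model": labels.get("modelName"),
            "uuid": labels.get("UUID"),  # truncated in the example, kept as-is
        })
    return dict(grouped)


# Data shaped like the example response for this endpoint.
example = [{
    "metrics": {"UUID": "GPU-79e36614...", "cluster": "gpu-cluster", "gpu": "0",
                "measure": "allocate", "modelName": "NVIDIA A30",
                "node": "cocktail-gpu-node-1"},
    "timestamp": 1727159760,
    "value": 1,
}]

print(instances_by_node(example))
```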

GPU Temperature Trend

GET /v1/monitoring-gpu/metric/temperature/result/transition

Retrieve the temperature trend per minute for each GPU.

Query parameters
clusterIds (string, required): Cluster ID
start (string, required): Start date for the query
end (string, required): End date for the query

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/temperature/result/transition?clusterIds=text&start=text&end=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "gpu": "0",
      "model": "",
      "node": "cocktail-gpu-node-1"
    },
    "samples": [
      40,
      39,
      39
    ],
    "timestamps": [
      1727156220,
      1727156280,
      1727156340
    ],
    "values": null
  }
]
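
A common use of a temperature trend is finding the hottest reading in the window. The sketch below returns the peak sample of one series and converts its timestamp to a UTC datetime. It assumes the timestamps are Unix epoch seconds; the temperature unit is not stated in this reference, so the value is reported as a bare number. Both function names are hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional, Tuple


def peak_sample(series: dict) -> Optional[Tuple[int, float]]:
    """Return (timestamp, value) of the highest sample, or None if empty."""
    points = list(zip(series.get("timestamps") or [], series.get("samples") or []))
    if not points:
        return None
    # max() keeps the first occurrence when several samples tie.
    return max(points, key=lambda point: point[1])


def to_utc(timestamp: int) -> datetime:
    """Interpret a timestamp as Unix epoch seconds (assumption) in UTC."""
    return datetime.fromtimestamp(timestamp, tz=timezone.utc)


# One series shaped like the example response for this endpoint.
example = {
    "metrics": {"cluster": "gpu-cluster", "gpu": "0", "model": "",
                "node": "cocktail-gpu-node-1"},
    "samples": [40, 39, 39],
    "timestamps": [1727156220, 1727156280, 1727156340],
    "values": None,
}

peak = peak_sample(example)
if peak is not None:
    print(to_utc(peak[0]).isoformat(), peak[1])
```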

GPU Power Consumption Trend

GET /v1/monitoring-gpu/metric/power/result/transition

Retrieve the power consumption trend per minute for each GPU.

Query parameters
clusterIds (string, required): Cluster ID
start (string, required): Start date for the query
end (string, required): End date for the query

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/power/result/transition?clusterIds=text&start=text&end=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "gpu": "0",
      "model": "",
      "node": "cocktail-gpu-node-1"
    },
    "samples": [
      29.909,
      29.703,
      29.698
    ],
    "timestamps": [
      1727156220,
      1727156280,
      1727156340
    ],
    "values": null
  }
]
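
If the power samples are in watts and the timestamps are Unix epoch seconds (both assumptions, since this reference states neither), the energy used over the window can be approximated by integrating the trend. The sketch below applies the trapezoidal rule to one series; `energy_joules` is a hypothetical helper, not part of the API.

```python
def energy_joules(series: dict) -> float:
    """Approximate energy by trapezoidal integration of power over time.

    Assumes samples are watts and timestamps are seconds, so the result
    is in joules. Divide by 3600 for watt-hours.
    """
    timestamps = series.get("timestamps") or []
    samples = series.get("samples") or []
    total = 0.0
    for i in range(1, min(len(timestamps), len(samples))):
        mean_power = (samples[i - 1] + samples[i]) / 2
        total += mean_power * (timestamps[i] - timestamps[i - 1])
    return total


# One series shaped like the example response for this endpoint.
example = {
    "metrics": {"cluster": "gpu-cluster", "gpu": "0", "model": "",
                "node": "cocktail-gpu-node-1"},
    "samples": [29.909, 29.703, 29.698],
    "timestamps": [1727156220, 1727156280, 1727156340],
    "values": None,
}

print(f"{energy_joules(example):.2f} J over the sampled window")
```

Using the timestamp differences rather than a fixed 60-second step keeps the estimate reasonable when a sample is missing.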

GPU Framebuffer Memory Usage Trend

GET /v1/monitoring-gpu/metric/memory/result/transition

Retrieve the framebuffer memory usage trend per minute for each GPU.

Query parameters
clusterIds (string, required): Cluster ID
start (string, required): Start date for the query
end (string, required): End date for the query

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/memory/result/transition?clusterIds=text&start=text&end=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "gpu": "0",
      "model": "",
      "node": "cocktail-gpu-node-1"
    },
    "samples": [
      0,
      17,
      0
    ],
    "timestamps": [
      1727156220,
      1727156280,
      1727156340
    ],
    "values": null
  }
]
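
Because the trend is described as per-minute, a client may want to confirm that no minute is missing before charting it. The sketch below flags consecutive timestamps that are more than 60 seconds apart. The 60-second step is inferred from the example response, and `find_gaps` is a hypothetical helper.

```python
def find_gaps(timestamps: list, step: int = 60) -> list:
    """Return (previous, next) timestamp pairs spaced more than `step` seconds apart."""
    return [
        (earlier, later)
        for earlier, later in zip(timestamps, timestamps[1:])
        if later - earlier > step
    ]


# Timestamps from the example response: evenly spaced, so no gaps are reported.
complete = [1727156220, 1727156280, 1727156340]
print(find_gaps(complete))

# The same window with the middle minute missing (made-up input for illustration).
with_gap = [1727156220, 1727156340]
print(find_gaps(with_gap))
```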

GPU Tensor Core Usage Trend

GET /v1/monitoring-gpu/metric/tensor/result/transition

Retrieve the Tensor Core usage trend per minute for each GPU.

Query parameters
clusterIds (string, required): Cluster ID
start (string, required): Start date for the query
end (string, required): End date for the query

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/tensor/result/transition?clusterIds=text&start=text&end=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "gpu": "0",
      "model": "",
      "node": "cocktail-gpu-node-1"
    },
    "samples": [
      0,
      0,
      0
    ],
    "timestamps": [
      1727156220,
      1727156280,
      1727156340
    ],
    "values": null
  }
]

GPU GR Engine Activation Ratio

GET /v1/monitoring-gpu/metric/gr-engine-rate/result/current

Retrieve the average activation ratio of the GR Engine for each GPU.

Query parameters
clusterIds (string, required): Cluster ID

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/gr-engine-rate/result/current?clusterIds=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "node": "cocktail-gpu-node-1u"
    },
    "timestamp": 1727158980,
    "value": 0
  }
]
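
The `/result/current` endpoints return one `timestamp` and one `value` per entry instead of arrays. The example response here carries only `cluster` and `node` labels, so the sketch below indexes entries by their whole label set rather than assuming a `gpu` label is present. `index_current` and `lookup` are hypothetical helpers.

```python
def index_current(entries: list) -> dict:
    """Map each entry's full label set to its current value."""
    return {frozenset(entry["metrics"].items()): entry["value"] for entry in entries}


def lookup(index: dict, **labels: str) -> list:
    """Return the values of all entries whose labels include the given ones."""
    wanted = set(labels.items())
    return [value for key, value in index.items() if wanted <= key]


# Data shaped like the example response for this endpoint.
example = [{
    "metrics": {"cluster": "gpu-cluster", "node": "cocktail-gpu-node-1u"},
    "timestamp": 1727158980,
    "value": 0,
}]

index = index_current(example)
print(lookup(index, cluster="gpu-cluster"))
```

Keying on the full label set avoids silently merging entries if a response does include per-GPU labels.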

GPU Framebuffer Memory Usage

GET /v1/monitoring-gpu/metric/memory-rate/result/current

Retrieve the framebuffer memory usage rate for each GPU.

Query parameters
clusterIds (string, required): Cluster ID

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/memory-rate/result/current?clusterIds=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "cluster": "gpu-cluster",
      "node": "cocktail-gpu-node-1u"
    },
    "timestamp": 1727158980,
    "value": 0
  }
]

GPU Timeslicing Status

GET /v1/monitoring-gpu/metric/timeslicing/result/current

Retrieve the capacity and used status of GPUs running in time-slicing mode.

Query parameters
clusterIds (string, required): Cluster ID

Responses
200: OK (*/*)

Example request
GET /v1/monitoring-gpu/metric/timeslicing/result/current?clusterIds=text HTTP/1.1
Accept: */*

Example response
[
  {
    "metrics": {
      "UUID": "GPU-79e36614...",
      "cluster": "gpu-cluster",
      "gpu": "0",
      "measure": "allocate",
      "modelName": "",
      "node": "cocktail-gpu-node-1"
    },
    "timestamp": 1727159760,
    "value": 1
  }
]
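
Each entry in this response carries a `measure` label, so a client will probably want one row per GPU with the measures side by side. The sketch below pivots entries into that shape. Only the `allocate` measure appears in the example response; which other measure names the API returns for capacity and used is not shown in this reference, so the code makes no assumption about them. `pivot_by_measure` is a hypothetical helper.

```python
from collections import defaultdict


def pivot_by_measure(entries: list) -> dict:
    """Collect values per (node, gpu), keyed by the entry's measure label."""
    table = defaultdict(dict)
    for entry in entries:
        labels = entry["metrics"]
        table[(labels["node"], labels["gpu"])][labels["measure"]] = entry["value"]
    return dict(table)


# Data shaped like the example response for this endpoint.
example = [{
    "metrics": {"UUID": "GPU-79e36614...", "cluster": "gpu-cluster", "gpu": "0",
                "measure": "allocate", "modelName": "",
                "node": "cocktail-gpu-node-1"},
    "timestamp": 1727159760,
    "value": 1,
}]

for (node, gpu), measures in pivot_by_measure(example).items():
    print(node, gpu, measures)
```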