ⓒ2023. Acornsoft Corp. All rights reserved.


Integrated Monitoring


Last updated 1 year ago


1. What types of resources can be monitored?

Cocktail Cloud collects over 200 metrics on resources and their states across a multi-cluster environment and presents them in more than 100 monitoring panels.

The panels are organized into views for clusters, ingresses, ETCD, nodes, GPUs, and namespaces. A separate alarm/event page lists alarms and events chronologically, so the state of the platform can be reviewed at a glance.

2. Where can monitoring information be accessed?

Monitoring information in Cocktail Cloud is available from the [Monitoring] menu on the left. Its sub-menus are clusters, ingresses, ETCD, nodes, GPUs, namespaces, and alarms/events.

3. How to check the cluster status?

The platform provides up-to-date status information at the cluster level. Key status information in the cluster view includes:

  • Number of API server calls per second

  • CPU usage

  • Disk usage

  • Disk I/O speed

  • Memory usage

  • Restarted Pod tracking

  • Average request time over the last 10 minutes

  • Pod executions by status

  • Top 5 Pods with high CPU usage

  • Top 5 Pods with high memory usage
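As an illustration of the "Top 5 Pods" panels, the ranking can be sketched in a few lines of Python. This is only a sketch of the idea, not how Cocktail Cloud computes it: the function name `top_n_pods`, the pod names, and the sample values are all hypothetical.

```python
import heapq


def top_n_pods(usage_by_pod, n=5):
    """Return the n (pod, usage) pairs with the highest usage, largest first."""
    return heapq.nlargest(n, usage_by_pod.items(), key=lambda item: item[1])


# Hypothetical CPU usage samples in cores, keyed by pod name (illustrative only).
cpu_cores = {
    "web-1": 0.42,
    "web-2": 0.37,
    "db-0": 1.25,
    "cache-0": 0.08,
    "batch-7": 2.10,
    "log-agent": 0.15,
}

for pod, cores in top_n_pods(cpu_cores):
    print(f"{pod}: {cores:.2f} cores")
```

The same function applies to memory usage if the dictionary holds bytes instead of cores.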

4. Checking Ingress Status

Ingress exposes HTTP and HTTPS routes from outside the cluster to services inside it. It can provide externally reachable URLs, load balancing, SSL/TLS termination, and name-based virtual hosting. Because Ingress sits on the network path of every exposed service, it needs to be monitored from several angles.

The status information provided in the integrated dashboard's Ingress view includes:

  • Ingress controller requests

  • Ingress controller connections

  • Ingress controller request success rate

  • Recent Ingress configuration reload success and failure

  • Ingress controller request trends

  • Ingress controller success rate trends

  • Network I/O trends

  • Average memory usage trends

  • Average CPU usage trends
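To make the "request success rate" item concrete, here is a minimal sketch of how such a rate could be derived from per-status-code request counts. The definition used (responses with a status code below 400 count as successful) is an assumption for illustration; the source does not state how Cocktail Cloud defines success, and `request_success_rate` is a hypothetical helper.

```python
def request_success_rate(status_counts):
    """Return the percentage of requests with a status code below 400.

    status_counts maps an HTTP status code (int) to a request count.
    Returns None when there were no requests, to avoid dividing by zero.
    """
    total = sum(status_counts.values())
    if total == 0:
        return None
    successful = sum(count for status, count in status_counts.items() if status < 400)
    return 100.0 * successful / total


# Hypothetical counts observed over one interval (illustrative only).
sample = {200: 90, 404: 5, 503: 5}
print(request_success_rate(sample))
```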

5. Checking ETCD Status

The status information provided in the integrated dashboard's ETCD view includes:

  • Presence of ETCD leader

  • Number of recent leader changes

  • Number of recent leader change proposal failures

  • RPC ratio

  • Database usage

  • Node disk processing speed

  • Overall disk processing speed

  • Client traffic In/Out

  • ETCD server-specific processing status

  • Network usage

  • Snapshot processing speed

6. Checking Node Status

The status information provided in the integrated dashboard's Node view includes:

  • Cluster CPU usage frequency

  • Cluster memory usage

  • Cluster disk usage

  • Cluster network usage

  • Recent changes and current values of the file system's free space ratio

  • List of file systems and their usage
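The "free space ratio" item is a simple quotient of free bytes over total bytes. The sketch below shows that arithmetic using Python's standard `shutil.disk_usage`; it runs against the local machine rather than a cluster node, and both helper names are hypothetical.

```python
import shutil


def free_space_ratio(total_bytes, free_bytes):
    """Return free space as a fraction of total capacity (0.0 to 1.0)."""
    if total_bytes <= 0:
        raise ValueError("total_bytes must be positive")
    return free_bytes / total_bytes


def local_free_space_ratio(path="."):
    """Compute the free space ratio of the file system that contains path."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return free_space_ratio(usage.total, usage.free)


print(f"{local_free_space_ratio('.'):.1%} free")
```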

7. Checking GPU Status

The status information provided in the integrated dashboard's GPU view includes:

  • Average GPU utilization

  • GPU usage trends

  • Average GPU memory utilization

  • GPU memory usage trends

  • GPU temperature and power

  • GPUs/MIGs

  • Timeslicing

8. Checking Namespace Status

The status information provided in the integrated dashboard's Namespace view includes:

  • Number of containers

  • Namespace creation time

  • Total number of Pods in the namespace

  • Namespace PVC status

  • Namespace CPU allocation

  • Namespace memory allocation

  • Number of Pods running in the namespace

9. Checking Notification/Event History

Notifications based on the monitoring metrics shown in the integrated dashboard are delivered through the dashboard, SMS, and E-mail, according to user configuration. Users can filter them by cluster, namespace, and major resource group.

The dashboard shows events from the past hour, accumulated per minute, with a detailed description for each event, so the cause can often be identified from the event content alone.

Each event is categorized into five levels of importance, and real-time notifications are sent through SMS or E-mail (or both) according to user preferences. Users can filter and view recently occurring events and notifications, with the option to retain data for up to one year based on user settings.
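The filtering and notification behaviour described above can be sketched as follows. Everything here is an assumption made for illustration: the five level names, the `Event` fields, and the rule that a channel fires when an event meets a per-channel minimum level are hypothetical and are not taken from Cocktail Cloud.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from enum import IntEnum


class Severity(IntEnum):
    """Five importance levels; the names are hypothetical placeholders."""
    INFO = 1
    NOTICE = 2
    WARNING = 3
    ERROR = 4
    CRITICAL = 5


@dataclass(frozen=True)
class Event:
    cluster: str
    namespace: str
    severity: Severity
    message: str
    occurred_at: datetime


def recent_events(events, now, window=timedelta(hours=1), cluster=None, namespace=None):
    """Return events inside the time window, newest first, optionally filtered."""
    cutoff = now - window
    selected = [
        e for e in events
        if cutoff <= e.occurred_at <= now
        and (cluster is None or e.cluster == cluster)
        and (namespace is None or e.namespace == namespace)
    ]
    return sorted(selected, key=lambda e: e.occurred_at, reverse=True)


def channels_for(event, min_severity_by_channel):
    """Return the channels whose minimum severity the event meets (assumed rule)."""
    return sorted(
        channel
        for channel, minimum in min_severity_by_channel.items()
        if event.severity >= minimum
    )


now = datetime(2024, 1, 1, 12, 0)
events = [
    Event("c1", "ns1", Severity.CRITICAL, "node not ready", now - timedelta(minutes=5)),
    Event("c1", "ns2", Severity.INFO, "pod scheduled", now - timedelta(minutes=30)),
    Event("c2", "ns1", Severity.WARNING, "disk pressure", now - timedelta(hours=2)),
]
preferences = {"sms": Severity.ERROR, "email": Severity.WARNING}

for event in recent_events(events, now):
    print(event.message, channels_for(event, preferences))
```

Using an `IntEnum` keeps the levels ordered, so a threshold comparison is enough to decide which channels apply.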

[Screen] Integrated Monitoring Dashboard Main
[Screen] Cluster Monitoring
[Screen] Ingress Monitoring
[Screen] ETCD Monitoring
[Screen] Node Monitoring
[Screen] GPU Monitoring
[Screen] Namespace Monitoring
[Screen] Alerts