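Install the Elasticsearch OpenTelemetry integration

To monitor your Elasticsearch clusters with an industry-standard protocol, install the New Relic Elasticsearch OpenTelemetry integration. This guide walks you through configuring the OpenTelemetry Collector to collect metrics and logs from your Elasticsearch infrastructure and send them to New Relic.

To install the integration, complete the following steps:

Before you begin - Verify requirements and prerequisites
Configure the OpenTelemetry Collector - Set up data collection
Set environment variables - Configure authentication
Find and use your data - View your Elasticsearch data in New Relic
Set up alerts - Configure proactive monitoring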

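Step 1: Before you begin

Make sure you have the following:

Required access - Administrator privileges on the Elasticsearch cluster and a New Relic account you can access
Elasticsearch version 7.16 or later - The integration requires a recent Elasticsearch cluster.
Monitor or manage cluster privilege - If security is enabled, you need the monitor or manage cluster privilege. For details, see the Elasticsearch security privileges documentation.
Network connectivity - Outbound HTTPS connectivity (port 443) to New Relic's OTLP ingest endpoint
OpenTelemetry Collector - To monitor Elasticsearch, the OpenTelemetry Collector must be installed and running on the host. Two distributions are supported:
- NRDOT (recommended): Follow the official NRDOT installation guide to install the collector on your host.
- OTel Contrib: OpenTelemetry Collector Contrib installed and running on your host. Install it from the official packages (.deb or .rpm) so that the systemd service unit is created correctly.
Configuration values - You need two key values for the configuration:
- Elasticsearch endpoint - Your actual Elasticsearch URL (use it in place of https://localhost:9200)
- Cluster name - A unique name that identifies your cluster in New Relic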
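
A quick way to check the version requirement above is to compare the version your cluster reports against 7.16. The helper below is a minimal sketch, not part of the integration: `version_at_least` is a hypothetical function name, it relies on `sort -V` (available in GNU coreutils), and the commented usage assumes the cluster answers on http://localhost:9200 without authentication.

```shell
# Succeeds (exit 0) when version $1 is greater than or equal to version $2.
# Hypothetical helper for illustration; relies on version sort (sort -V).
version_at_least() {
  actual="$1"
  minimum="$2"
  # If the minimum sorts first (or both are equal), the requirement is met.
  lowest="$(printf '%s\n%s\n' "$minimum" "$actual" | sort -V | head -n 1)"
  [ "$lowest" = "$minimum" ]
}

# Example usage (assumes an unauthenticated cluster on localhost:9200):
#   es_version="$(curl -s http://localhost:9200 | grep -o '"number" *: *"[^"]*"' | cut -d '"' -f 4)"
#   if version_at_least "$es_version" "7.16"; then
#     echo "Elasticsearch $es_version meets the minimum version"
#   else
#     echo "Elasticsearch $es_version is older than 7.16"
#   fi
```

Version sort is used instead of a plain string comparison so that 7.9 is correctly treated as older than 7.16.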

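Tip

We strongly recommend NRDOT (New Relic's distribution of OpenTelemetry) over the standard community distribution. As a New Relic-owned component, it is:

Optimized: Preconfigured for best performance with the New Relic backend.
Reliable: Extensively tested for stability and security in enterprise environments.
Supported: Backed by New Relic support for faster troubleshooting and resolution.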

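Step 2: Configure the OpenTelemetry Collector

To configure metric and log collection from your Elasticsearch cluster, create or update the configuration file at /etc/nrdot-collector/config.yaml for NRDOT or /etc/otelcol-contrib/config.yaml for Collector Contrib.

The configuration depends on your Elasticsearch setup and monitoring requirements. Choose the appropriate configuration below.

Basic metrics configuration

Start here if: you have an unsecured Elasticsearch cluster with no authentication or SSL.

This configuration collects comprehensive metrics from Elasticsearch and the host system without authentication.

Important

Replace the endpoint value with your Elasticsearch cluster endpoint, and set elasticsearch.cluster.name in the processors block to a unique name that identifies your cluster in New Relic.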

# =================================================================================================
# OpenTelemetry Collector Configuration for Elasticsearch and Host
# This configuration collects metrics and logs for a complete observability solution.
# =================================================================================================
# -------------------------------------------------------------------------------------------------
# Receivers
# Receivers define how data gets into the Collector. This config uses two receivers:
# - elasticsearch: to scrape metrics from the Elasticsearch API
# - hostmetrics: to collect system-level metrics from the host itself
# An optional filelog receiver for tailing Elasticsearch log files is added separately.
# -------------------------------------------------------------------------------------------------
receivers:
  elasticsearch:
    endpoint: "http://localhost:9200"
    collection_interval: 15s
    metrics:
      elasticsearch.os.cpu.usage:
        enabled: true
      elasticsearch.cluster.data_nodes:
        enabled: true
      elasticsearch.cluster.health:
        enabled: true
      elasticsearch.cluster.in_flight_fetch:
        enabled: true
      elasticsearch.cluster.nodes:
        enabled: true
      elasticsearch.cluster.pending_tasks:
        enabled: true
      elasticsearch.cluster.shards:
        enabled: true
      elasticsearch.cluster.state_update.time:
        enabled: true
      elasticsearch.index.documents:
        enabled: true
      elasticsearch.index.operations.merge.current:
        enabled: true
      elasticsearch.index.operations.time:
        enabled: true
      elasticsearch.node.cache.count:
        enabled: true
      elasticsearch.node.cache.evictions:
        enabled: true
      elasticsearch.node.cache.memory.usage:
        enabled: true
      elasticsearch.node.shards.size:
        enabled: true
      elasticsearch.node.cluster.io:
        enabled: true
      elasticsearch.node.documents:
        enabled: true
      elasticsearch.node.disk.io.read:
        enabled: true
      elasticsearch.node.disk.io.write:
        enabled: true
      elasticsearch.node.fs.disk.available:
        enabled: true
      elasticsearch.node.fs.disk.total:
        enabled: true
      elasticsearch.node.http.connections:
        enabled: true
      elasticsearch.node.ingest.documents.current:
        enabled: true
      elasticsearch.node.ingest.operations.failed:
        enabled: true
      elasticsearch.node.open_files:
        enabled: true
      elasticsearch.node.operations.completed:
        enabled: true
      elasticsearch.node.operations.current:
        enabled: true
      elasticsearch.node.operations.get.completed:
        enabled: true
      elasticsearch.node.operations.time:
        enabled: true
      elasticsearch.node.shards.reserved.size:
        enabled: true
      elasticsearch.index.shards.size:
        enabled: true
      elasticsearch.os.cpu.load_avg.1m:
        enabled: true
      elasticsearch.os.cpu.load_avg.5m:
        enabled: true
      elasticsearch.os.cpu.load_avg.15m:
        enabled: true
      elasticsearch.os.memory:
        enabled: true
      jvm.gc.collections.count:
        enabled: true
      jvm.gc.collections.elapsed:
        enabled: true
      jvm.memory.heap.max:
        enabled: true
      jvm.memory.heap.used:
        enabled: true
      jvm.memory.heap.utilization:
        enabled: true
      jvm.threads.count:
        enabled: true
      elasticsearch.index.segments.count:
        enabled: true
      elasticsearch.index.operations.completed:
        enabled: true
      elasticsearch.node.script.cache_evictions:
        enabled: false
      elasticsearch.node.cluster.connections:
        enabled: false
      elasticsearch.node.pipeline.ingest.documents.preprocessed:
        enabled: false
      elasticsearch.node.thread_pool.tasks.queued:
        enabled: false
      elasticsearch.cluster.published_states.full:
        enabled: false
      jvm.memory.pool.max:
        enabled: false
      elasticsearch.node.script.compilation_limit_triggered:
        enabled: false
      elasticsearch.node.shards.data_set.size:
        enabled: false
      elasticsearch.node.pipeline.ingest.documents.current:
        enabled: false
      elasticsearch.cluster.state_update.count:
        enabled: false
      elasticsearch.node.fs.disk.free:
        enabled: false
      jvm.memory.nonheap.used:
        enabled: false
      jvm.memory.pool.used:
        enabled: false
      elasticsearch.node.translog.size:
        enabled: false
      elasticsearch.node.thread_pool.threads:
        enabled: false
      elasticsearch.cluster.state_queue:
        enabled: false
      elasticsearch.node.translog.operations:
        enabled: false
      elasticsearch.memory.indexing_pressure:
        enabled: false
      elasticsearch.node.ingest.documents:
        enabled: false
      jvm.classes.loaded:
        enabled: false
      jvm.memory.heap.committed:
        enabled: false
      elasticsearch.breaker.memory.limit:
        enabled: false
      elasticsearch.indexing_pressure.memory.total.replica_rejections:
        enabled: false
      elasticsearch.breaker.memory.estimated:
        enabled: false
      elasticsearch.cluster.published_states.differences:
        enabled: false
      jvm.memory.nonheap.committed:
        enabled: false
      elasticsearch.node.translog.uncommitted.size:
        enabled: false
      elasticsearch.node.script.compilations:
        enabled: false
      elasticsearch.node.pipeline.ingest.operations.failed:
        enabled: false
      elasticsearch.indexing_pressure.memory.limit:
        enabled: false
      elasticsearch.breaker.tripped:
        enabled: false
      elasticsearch.indexing_pressure.memory.total.primary_rejections:
        enabled: false
      elasticsearch.node.thread_pool.tasks.finished:
        enabled: false
  hostmetrics:
    collection_interval: 60s # Recommended for cost savings and stability
    scrapers:
      cpu:
        metrics:
          # CPU Utilization and Time are the core metrics
          system.cpu.utilization: {enabled: true}
          system.cpu.time: {enabled: true}
      load:
        metrics:
          # Load Averages (used for system health dashboards)
          system.cpu.load_average.1m: {enabled: true}
          system.cpu.load_average.5m: {enabled: true}
          system.cpu.load_average.15m: {enabled: true}
      memory:
        metrics:
          # Memory Usage and Utilization
          system.memory.usage: {enabled: true}
          system.memory.utilization: {enabled: true}
      disk:
        metrics:
          # Disk I/O operations (throughput)
          system.disk.io: {enabled: true}
          system.disk.operations: {enabled: true}
      filesystem:
        metrics:
          # Filesystem usage (disk space capacity)
          system.filesystem.usage: {enabled: true}
          system.filesystem.utilization: {enabled: true} 
      network:
        metrics:
          # Network throughput and packet counts
          system.network.io: {enabled: true}
          system.network.packets: {enabled: true}
      process:
        metrics:
          process.cpu.utilization:
            enabled: true
# -------------------------------------------------------------------------------------------------
# Processors
# -------------------------------------------------------------------------------------------------
processors:
  # used to prevent out of memory situations on the collector
  memory_limiter:
    check_interval: 60s
    limit_mib: ${env:NEW_RELIC_MEMORY_LIMIT_MIB:-100}
  cumulativetodelta: {}
  resource/cluster_name_override:
    attributes:
      # Use the actual cluster name defined in your Elasticsearch config
      - key: elasticsearch.cluster.name
        value: "<elasticsearch-cluster-name>" # <-- REPLACE THIS WITH A UNIQUE CLUSTER NAME TO UNIQUELY IDENTIFY YOUR CLUSTER IN NEW RELIC 
        action: upsert
  # This processor detects host-level resource attributes (host.name, host.id, os.type)
  # and adds them to all telemetry data.
  resourcedetection:
    detectors: [ system ]
    system:
      resource_attributes:
        host.name:
          enabled: true
        host.id:
          enabled: true
        os.type:
          enabled: true 
  # This processor batches data for more efficient sending.
  batch:
    timeout: 10s
    send_batch_size: 1024
  # 1. CARDINALITY REDUCTION: Drops volatile or redundant attributes
  attributes/cardinality_reduction:
    actions:
      # Filter out VOLATILE PROCESS IDS (High churn)
      - key: process.pid
        action: delete
      - key: process.parent_pid
        action: delete
  transform/metadata_nullify:
    # We use 'metric_statements' to run OTTL logic on the metric signal
    metric_statements:
      - context: metric  # <-- Targets the high-level Metric structure itself
        statements:
          # Sets the 'description' field to an empty string ("")
          - set(description, "")
          # Sets the 'unit' field to an empty string ("")
          - set(unit, "")      
exporters:
  # This exporter sends all data to New Relic via OTLP/HTTP.
  otlphttp:
    # The variable names below match the Environment= entries set in step 3.
    endpoint: ${env:OTEL_EXPORTER_OTLP_ENDPOINT}
    headers:
      api-key: ${env:NEW_RELIC_LICENSE_KEY}
# -------------------------------------------------------------------------------------------------
# Service
# The service block defines the pipelines.
# -------------------------------------------------------------------------------------------------
service:
  pipelines:
    metrics/elasticsearch:
      receivers: [elasticsearch]
      processors: [memory_limiter, resourcedetection, resource/cluster_name_override, attributes/cardinality_reduction, cumulativetodelta, transform/metadata_nullify, batch]
      exporters: [otlphttp]
    metrics/host:
      receivers: [hostmetrics]
      processors: [memory_limiter, resourcedetection, batch]
      exporters: [otlphttp]
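
Before restarting the collector with this configuration, it can help to confirm that the placeholder cluster name was replaced. The sketch below is illustrative only: `has_unreplaced_placeholder` is a hypothetical helper, and the commented `validate` call assumes your Collector binary is named otelcol-contrib and provides the `validate` subcommand, as recent OpenTelemetry Collector builds do.

```shell
# Succeeds (exit 0) when the config file still contains the
# "<elasticsearch-cluster-name>" placeholder from the sample configuration.
has_unreplaced_placeholder() {
  grep -q '<elasticsearch-cluster-name>' "$1"
}

# Example usage:
#   config=/etc/otelcol-contrib/config.yaml
#   if has_unreplaced_placeholder "$config"; then
#     echo "Set a unique elasticsearch.cluster.name in $config" >&2
#   fi
#   # Optionally let the Collector check the file (assumes the validate subcommand):
#   otelcol-contrib validate --config="$config"
```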

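Authentication and SSL configuration

Use this if: you have a secured Elasticsearch cluster with authentication and/or SSL certificates.

Add authentication credentials and SSL settings to the basic configuration above.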

receivers:
  elasticsearch:
    endpoint: "https://localhost:9200"
    username: "elastic"
    password: "your_password"
    tls:
      ca_file: "/etc/elasticsearch/certs/http_ca.crt"
      insecure_skip_verify: false
    collection_interval: 15s
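
To confirm that the credentials and CA certificate work before pointing the collector at the cluster, you can query the cluster health API directly. This is a sketch under assumptions: `es_cluster_health` is a hypothetical helper, it reads the password from an `ES_PASSWORD` environment variable (a name chosen here for illustration), and it assumes `curl` is installed.

```shell
# Queries GET /_cluster/health using basic auth and a custom CA certificate.
# Arguments: endpoint URL, username, path to the CA certificate file.
# Returns 2 without contacting the cluster when ES_PASSWORD is not set.
es_cluster_health() {
  endpoint="$1"
  username="$2"
  ca_file="$3"
  if [ -z "${ES_PASSWORD:-}" ]; then
    echo "ES_PASSWORD is not set" >&2
    return 2
  fi
  # --fail makes curl exit non-zero on HTTP errors such as 401 or 403.
  curl --silent --show-error --fail \
    --cacert "$ca_file" \
    --user "${username}:${ES_PASSWORD}" \
    "${endpoint%/}/_cluster/health"
}

# Example usage:
#   export ES_PASSWORD='your_password'
#   es_cluster_health "https://localhost:9200" "elastic" "/etc/elasticsearch/certs/http_ca.crt"
```

A JSON response indicates that the endpoint, credentials, and certificate are usable. An HTTP 401 or 403 error points to the credentials or to missing privileges.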

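Enable logs (filelog receiver)

Optional: include this if you want to send Elasticsearch log files to New Relic in addition to metrics.

To collect and forward Elasticsearch logs, add the filelog receiver configuration. Make sure the user running the collector service (for example, nrdot-collector or otelcol-contrib) has read access to the Elasticsearch log files.

If you run Elasticsearch on Linux (host):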

receivers:
  filelog:
    include:
      - /var/log/elasticsearch/elasticsearch.log # Replace with the path to your Elasticsearch log file.
      - /var/log/elasticsearch/*.log             # A glob pattern can match multiple log files.

If you run Elasticsearch in Docker:

receivers:
  filelog:
    include:
      - /var/lib/docker/containers/*/*.log       # Replace with the container log file path. 
    operators:
      - type: move
        from: attributes.log
        to: body

Add the filelog receiver to the service pipelines:

service:
  pipelines:
    logs:
      receivers: [filelog]
      processors: [resource/cluster_name_override]
      exporters: [otlphttp]
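
Because the filelog receiver can only tail files that the collector's service user can read, a quick permission check can save a restart cycle. The sketch below is illustrative: `unreadable_logs` is a hypothetical helper, and the usage comment assumes the service user is named otelcol-contrib, which may differ on your system.

```shell
# Prints every path given as an argument that is not a readable regular file
# for the user running this function.
unreadable_logs() {
  for path in "$@"; do
    if [ ! -f "$path" ] || [ ! -r "$path" ]; then
      echo "$path"
    fi
  done
}

# Example usage, run in a shell owned by the collector's service user:
#   unreadable_logs /var/log/elasticsearch/*.log
#
# One-off check for a single file from another account
# (the user name otelcol-contrib is an assumption):
#   sudo -u otelcol-contrib test -r /var/log/elasticsearch/elasticsearch.log && echo readable
```

No output means every listed file is readable. If the glob matches nothing, the unexpanded pattern itself is printed, which also signals a wrong path.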

Add custom metadata

Optional: include this if you want to tag your data with custom attributes such as environment, team, or region.

Use the resource/static_override processor to add custom metadata tags to all metrics.

processors:
  resource/static_override:
    attributes:
      - key: env
        value: "production"
        action: upsert
service:
  pipelines:
    metrics/elasticsearch:
      receivers: [elasticsearch]
      processors: [resourcedetection, resource/cluster_name_override, resource/static_override, attributes/cardinality_reduction, cumulativetodelta, transform/metadata_nullify, batch]
      exporters: [otlphttp]
    metrics/host:
      receivers: [hostmetrics]
      processors: [resourcedetection, resource/static_override, batch]
      exporters: [otlphttp]

Tip

Correlate APM with Elasticsearch: To connect your APM application to your Elasticsearch cluster, include the resource attribute es.cluster.name="your-cluster-name" in your APM metrics. This enables cross-service visibility and faster troubleshooting in New Relic.

Step 3: Set environment variables

For NRDOT, add the following settings to a drop-in configuration file in the /etc/systemd/system/nrdot-collector.service.d directory. Edit the file so that the environment variables sit under the [Service] section.

[Service]
Environment="NEW_RELIC_LICENSE_KEY=YOUR_LICENSE_KEY_HERE"
Environment="OTEL_EXPORTER_OTLP_ENDPOINT=YOUR_OTLP_ENDPOINT"
Environment="NEW_RELIC_MEMORY_LIMIT_MIB=COLLECTOR_MEMORY_LIMIT"

To apply these changes, reload the systemd manager configuration and restart the collector:

sudo systemctl daemon-reload
sudo systemctl restart nrdot-collector.service

For Collector Contrib, add the following settings to a drop-in configuration file in the /etc/systemd/system/otelcol-contrib.service.d directory. Edit the file so that the environment variables sit under the [Service] section.

[Service]
Environment="NEW_RELIC_LICENSE_KEY=YOUR_LICENSE_KEY_HERE"
Environment="OTEL_EXPORTER_OTLP_ENDPOINT=YOUR_OTLP_ENDPOINT"
Environment="NEW_RELIC_MEMORY_LIMIT_MIB=COLLECTOR_MEMORY_LIMIT"

To apply these changes, reload the systemd manager configuration and restart the collector:

sudo systemctl daemon-reload
sudo systemctl restart otelcol-contrib.service
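
The collector resolves each `${env:NAME}` reference in config.yaml from the service environment, so every name used there needs a matching `Environment=` entry unless the reference carries a default such as `:-100`. The following sketch lists referenced names that a drop-in file does not set. `missing_env_vars` is a hypothetical helper, and the drop-in file name in the usage comment is a placeholder.

```shell
# Prints each variable referenced as ${env:NAME} in the config file that has
# no Environment="NAME=..." line in the drop-in file.
# Names that carry a default (${env:NAME:-value}) are reported too; treat
# those as optional.
missing_env_vars() {
  config_file="$1"
  dropin_file="$2"
  grep -o '\${env:[A-Za-z_][A-Za-z0-9_]*' "$config_file" \
    | sed 's/^\${env://' \
    | sort -u \
    | while read -r name; do
        grep -q "^Environment=\"\{0,1\}${name}=" "$dropin_file" || echo "$name"
      done
}

# Example usage (replace the drop-in file name with your own):
#   missing_env_vars /etc/otelcol-contrib/config.yaml \
#     /etc/systemd/system/otelcol-contrib.service.d/<your-drop-in>.conf
```

No output means every referenced variable is set in the drop-in file.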

Step 4: View your Elasticsearch data

Once the collector is running and sending data, you can explore your Elasticsearch metrics in New Relic:

Go to one.newrelic.com > Integrations & Agents.
Search for Elasticsearch (OpenTelemetry).
Under Dashboards, click Elasticsearch OpenTelemetry Dashboard.
Select your account and click View dashboard.

You should see a dashboard showing cluster health, performance metrics, and resource usage.

Tip

Not seeing data? It can take a few minutes for data to appear. If no metrics show up after 10 minutes, check the troubleshooting guide.

Next steps with your data:

Explore metrics: All Elasticsearch metrics are stored as the Metric event type.
Create custom queries: Use NRQL to build custom charts and dashboards.
Set up alerts: Continue to step 5 to configure proactive monitoring.

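Step 5: Set up alerts

Proactive monitoring and alerting help you catch problems before they affect your users. To create alert conditions in New Relic:

Go to one.newrelic.com > Alerts > Alert Conditions.
Click the button to create a new condition.
Use Guided mode or the NRQL query builder to configure the alert.

For robust Elasticsearch monitoring, we recommend the alert configurations below.

Critical alerts (high priority)

These alerts monitor critical cluster health issues that can lead to data loss or service outages.

Alert name	Threshold and rationale (example condition)
Unassigned shards alert	`elasticsearch.cluster.shards` where `state = 'unassigned'` is greater than 0 for at least 5 minutes.
Healthy data nodes alert	Metric `elasticsearch.cluster.data_nodes` is below the required minimum number of nodes for at least 5 minutes.
Heap usage too high alert	Heap utilization (used/max) is above 90% for at least 5 minutes.
Pending tasks alert	Metric `elasticsearch.cluster.pending_tasks` is above 5 for at least 5 minutes.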
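
As a worked example of the heap condition, utilization is the used heap divided by the maximum heap, expressed as a percentage. The helper below is a sketch for checking sample values by hand; `heap_exceeds` is a hypothetical name, and in the collected data the inputs would correspond to `jvm.memory.heap.used` and `jvm.memory.heap.max`.

```shell
# Succeeds (exit 0) when used/max, as a percentage, is strictly above the
# threshold. Arguments: used heap, max heap (same unit), threshold percentage.
# Returns 2 when max is not positive.
heap_exceeds() {
  awk -v used="$1" -v max="$2" -v pct="$3" \
    'BEGIN {
       if (max <= 0) exit 2
       # Compare used * 100 against pct * max to avoid dividing first.
       exit !((used * 100) > (pct * max))
     }'
}

# Example: 925 MiB used out of a 1000 MiB heap is 92.5%, which is above 90%.
#   heap_exceeds 925 1000 90 && echo "heap utilization above 90%"
```

The alert condition additionally requires the value to stay above the threshold for at least 5 minutes, which this single-sample check does not model.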

Additional monitoring alerts

These alerts help you monitor performance and operational issues.

Alert name	Threshold and rationale (example condition)
Slow query time alert	The 95th percentile of `elasticsearch.node.operations.time` exceeds 5 ms for at least 2 minutes.
Shards initializing too long	`elasticsearch.cluster.shards` where `state = 'initializing'` is greater than 0 for at least 5 minutes.
Shards relocating too long	`elasticsearch.cluster.shards` where `state = 'relocating'` is greater than 0 for at least 5 minutes.

Troubleshooting

If you run into problems during installation or don't see data in New Relic, see the troubleshooting guide for step-by-step solutions to common issues.

