Document

BackEnd/Elasticsearch API 2025. 3. 9. 10:00

Create

# Create an index.
# number_of_shards sets the number of primary shards; set number_of_replicas to 0 for no replicas.
PUT /my_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,
      "number_of_replicas": 1
    }
  }
}

# Create a document with an explicit ID
POST /my_index/_doc/100
{
  "title": "Elasticsearch Advanced",
  "author": "Anthony K",
  "publish_date": "2024-05-20",
  "tags": ["search", "analytics"]
}

Retrieve

GET /my_index/_doc/100

Update

Updating an existing field

POST /my_index/_update/100
{
  "doc": {
    "author": "Anthony K2"
  }
}

Adding a new field

POST /my_index/_update/100
{
  "doc": {
    "price": 100
  }
}

Updating a document with a script

POST /my_index/_update/100
{
  "script": {
    "source": "ctx._source.price += 10"
  }
}

POST /my_index/_update/100
{
  "script": {
    "source": "ctx._source.price += params.amount",
    "params": {
      "amount": 10
    }
  }
}

POST /my_index/_update/100
{
  "script": {
    "source": """
        if (ctx._source.price > 100) {
            ctx._source.price += 10;
        }
    """
  }
}

Elasticsearch documents are immutable: once a document has been indexed, its contents cannot be modified in place. An update therefore goes through the following lifecycle.

Fetch the existing document
Apply the changes
Index the new document
Mark the old document as deleted
Segment merging (documents marked as deleted are physically removed when segments are merged)
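
The lifecycle above can be sketched with a small in-memory model. This is only an illustration under simplifying assumptions, not how Lucene is actually implemented; `Segment`, `ToyShard`, and all of their methods are hypothetical names invented for this example.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """An append-only batch of documents; `deleted` holds tombstoned ids."""
    docs: dict
    deleted: set = field(default_factory=set)


class ToyShard:
    """Hypothetical model of a shard built from immutable segments."""

    def __init__(self):
        self.segments = []

    def index(self, doc_id, source):
        # Mark any older live copy as deleted, then write a brand-new copy.
        for seg in self.segments:
            if doc_id in seg.docs and doc_id not in seg.deleted:
                seg.deleted.add(doc_id)
        self.segments.append(Segment(docs={doc_id: dict(source)}))

    def get(self, doc_id):
        # The newest copy that is not tombstoned is the current document.
        for seg in reversed(self.segments):
            if doc_id in seg.docs and doc_id not in seg.deleted:
                return dict(seg.docs[doc_id])
        return None

    def update(self, doc_id, partial):
        current = self.get(doc_id)    # 1. fetch the existing document
        if current is None:
            raise KeyError(doc_id)
        current.update(partial)       # 2. apply the changes
        self.index(doc_id, current)   # 3-4. index new copy, tombstone old one

    def merge(self):
        # 5. segment merge: copy live documents, drop tombstoned ones.
        live = {}
        for seg in self.segments:
            for doc_id, source in seg.docs.items():
                if doc_id not in seg.deleted:
                    live[doc_id] = source
        self.segments = [Segment(docs=live)]
```

In this model the old copy keeps occupying space until `merge()` runs, which mirrors why frequent updates leave deleted documents behind until segments are merged.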

Delete

DELETE /my_index/_doc/100

Upsert

An upsert combines the update and insert operations: the document is created if it does not exist and updated if it does.

# If the document does not exist, it is created from the "upsert" body; if it exists, author is updated to "John Doe".
POST /my_index/_update/5
{
  "doc": {
    "author": "John Doe"
  },
  "upsert": {
    "title": "Elasticsearch Basics",
    "author": "Jane Doe",
    "publish_date": "2024-05-09",
    "tags": ["search", "analytics"]
  }
}

# If the document does not exist, it is created from the "upsert" body; if it exists, the script runs and increments price by 1.
POST /my_index/_update/7
{
  "script": {
    "source": "ctx._source.price++"
  },
  "upsert": {
    "title": "Elasticsearch Basics",
    "author": "Jane Doe",
    "publish_date": "2024-05-09",
    "price": 100,
    "tags": ["search", "analytics"]
  }
}
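
The branching behind the two upsert requests above can be sketched in plain Python. This is a toy model on an ordinary dict, not the Elasticsearch client API: `apply_update` and its parameters are hypothetical, the script is represented by a Python callable, and the partial-document merge is a shallow `dict.update`, which is a simplification.

```python
def apply_update(store, doc_id, doc=None, script=None, upsert=None):
    """Toy version of update-with-upsert semantics on a plain dict `store`."""
    if doc_id not in store:
        if upsert is None:
            raise KeyError(f"document {doc_id} is missing")
        # Missing document: the upsert body is inserted as-is.
        # Neither `doc` nor `script` is applied in this branch.
        store[doc_id] = dict(upsert)
        return "created"
    if script is not None:
        script(store[doc_id])          # callable that mutates the source
    elif doc is not None:
        store[doc_id].update(doc)      # shallow merge of the partial document
    return "updated"
```

Calling it twice with the same arguments shows the difference: the first call stores the upsert body unchanged, and only the second call runs the script or merges `doc`.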

Replace

PUT /my_index/_doc/20
{
  "title": "Elasticsearch Basics",
  "author": "John Doe",
  "publish_date": "2024-05-09",
  "tags": ["search", "analytics"]
}

# Replace: indexing with the same ID overwrites the whole document
PUT /my_index/_doc/20
{
  "title": "Updated Elasticsearch Basics",
  "author": "Jane Doe",
  "publish_date": "2024-06-01",
  "tags": ["search", "analytics", "updated"]
}

Flush

POST /my_index/_flush

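Elasticsearch keeps recent writes in a memory buffer, and records them in the transaction log (translog), before they are committed to disk as segments. The flush command forces that commit, writing the buffered changes to disk. Elasticsearch also flushes automatically by default.

Time-based: flushes happen periodically in the background (the 30-second interval often quoted should be verified against the documentation for your version).
Size-based: a flush is triggered automatically when buffered data grows too large, for example when the translog reaches index.translog.flush_threshold_size.

A manual flush guarantees that every change made so far is committed to disk, which shortens recovery after a node failure, but flushing too often hurts performance. Note that making newly indexed documents visible to search is the job of refresh (POST /my_index/_refresh), not flush.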
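
The relationship between the memory buffer, the transaction log (translog), and a flush can be sketched as a toy model. Everything here is hypothetical (`ToyEngine` and its attributes are invented names), and the model ignores refresh, replicas, and real file I/O; it only shows why a flush lets the log be trimmed and how unflushed writes are recovered.

```python
class ToyEngine:
    """Hypothetical model: memory buffer + translog + committed store."""

    def __init__(self):
        self.buffer = {}      # recent writes held in memory, lost on a crash
        self.translog = []    # durable log of operations since the last commit
        self.committed = {}   # stands in for segments committed to disk

    def index(self, doc_id, source):
        # Every write is logged and buffered.
        self.translog.append((doc_id, dict(source)))
        self.buffer[doc_id] = dict(source)

    def flush(self):
        # Commit the buffered changes, then start a fresh translog.
        self.committed.update(self.buffer)
        self.buffer.clear()
        self.translog.clear()

    def crash_and_recover(self):
        # Memory is lost; replay the translog on top of the last commit.
        self.buffer = {}
        for doc_id, source in self.translog:
            self.buffer[doc_id] = dict(source)
```

In this sketch a longer translog means more operations to replay in `crash_and_recover()`, which is the cost a flush removes.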

Routing

POST /my_index/_doc/1?routing=user123
{
  "title": "Elasticsearch Basics",
  "author": "John Doe",
  "publish_date": "2024-05-09",
  "tags": ["search", "analytics"]
}

GET /my_index/_search?routing=user123
{ 
  "query": {
    "match": {
      "author": "John Doe"
    }
  }
}

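Routing in Elasticsearch controls which shard a document is stored on. Because a request that carries a routing value targets only the matching shard instead of every shard, searches get faster, and related data can be kept together. However, if too much data is concentrated under one routing value, that shard becomes a "hot shard" and performance suffers. Also, like the document ID, the routing value cannot be changed once the document is indexed, and the same value must be supplied when the document is later retrieved, updated, or deleted.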
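
As a rough sketch of how a routing value maps to a shard, the simplified rule is hash(routing) modulo the number of primary shards, with the document ID used as the routing value when none is given. The function below is hypothetical and uses `zlib.crc32` as a stand-in hash; Elasticsearch itself uses a Murmur3 hash and additional routing settings, so the shard numbers here will not match a real cluster.

```python
import zlib


def pick_shard(routing: str, num_primary_shards: int) -> int:
    """Simplified shard selection: hash(routing) % number of primary shards.

    zlib.crc32 is only a stand-in for the hash Elasticsearch uses.
    """
    if num_primary_shards < 1:
        raise ValueError("num_primary_shards must be >= 1")
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards
```

Because the result depends only on the routing value and the shard count, every document indexed with routing=user123 lands on the same shard, and a search with the same routing value only needs to visit that shard.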

Concurrency

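Elasticsearch resolves concurrency issues with the primary term and the sequence number. These two values are used to implement optimistic concurrency control.

Primary Term: a counter that increases each time the primary shard changes (for example, when a replica is promoted after a failure). It is used to track changes of the primary shard.
Sequence Number: a counter that increases with every write operation. It is used to order the changes made to documents.

How to resolve concurrency issues

Read the document: when reading a document, also take its _seq_no and _primary_term values.
Pass them on update: when updating the document, send the previously read values as the if_seq_no and if_primary_term parameters.
Conflict detection: Elasticsearch compares the document's current _seq_no and _primary_term with the values passed in. If they differ, the update fails with a version_conflict_engine_exception (HTTP 409).

Note that reading and passing _seq_no and _primary_term on every write adds a small performance overhead. When a conflict does occur, handle the error appropriately, for example by re-reading the document and retrying, or by notifying the user.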

POST /my_index/_update/1?if_primary_term=1&if_seq_no=5
{
  "doc": {
    "stock": 9
  }
}

# Read the document
curl -X GET "localhost:9200/my_index/_doc/1"

# Check _seq_no and _primary_term in the response
# e.g. _seq_no = 5, _primary_term = 1

# Pass them along with the update
curl -X PUT "localhost:9200/my_index/_doc/1?if_seq_no=5&if_primary_term=1" -H 'Content-Type: application/json' -d '{
  "content": "Updated Content"
}'
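
The read, conditional write, and retry steps above can be sketched without a cluster. This is a toy in-memory model, not the Elasticsearch client: `ToyIndex`, `VersionConflict`, and `update_with_retry` are hypothetical names, and the sequence number here is a single counter for the whole index.

```python
class VersionConflict(Exception):
    """Stand-in for Elasticsearch's version_conflict_engine_exception."""


class ToyIndex:
    def __init__(self):
        self.primary_term = 1
        self._next_seq_no = 0
        self._docs = {}  # doc_id -> (source, seq_no, primary_term)

    def get(self, doc_id):
        source, seq_no, term = self._docs[doc_id]
        return dict(source), seq_no, term

    def put(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        if if_seq_no is not None:
            # Reject the write if the document changed since it was read.
            _, seq_no, term = self._docs[doc_id]
            if (seq_no, term) != (if_seq_no, if_primary_term):
                raise VersionConflict(doc_id)
        self._docs[doc_id] = (dict(source), self._next_seq_no, self.primary_term)
        self._next_seq_no += 1


def update_with_retry(index, doc_id, change, max_attempts=3):
    for _ in range(max_attempts):
        source, seq_no, term = index.get(doc_id)   # read and remember version
        change(source)
        try:
            index.put(doc_id, source, if_seq_no=seq_no, if_primary_term=term)
            return source
        except VersionConflict:
            continue   # another writer got there first: re-read and retry
    raise VersionConflict(f"gave up on {doc_id} after {max_attempts} attempts")
```

The key design point is that each retry re-reads the document, so the change is always applied to the latest version rather than to a stale copy.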

