Document

hanseom 2025. 3. 9. 10:00

Create

# create an index
PUT /my_index
{
  "settings": {
    "index": {
      "number_of_shards": 1,   // You can specify the number of shards here
      "number_of_replicas": 1  // Set to 0 for no replicas
    }
  }
}

# create a document
POST /my_index/_doc/100
{
  "title": "Elasticsearch Advanced",
  "author": "Anthony K",
  "publish_date": "2024-05-20",
  "tags": ["search", "analytics"]
}

Retrieve

GET /my_index/_doc/100

Update

Updating an existing field

POST /my_index/_update/100
{
  "doc": {
    "author": "Anthony K2"
  }
}

Adding a new field

POST /my_index/_update/100
{
  "doc": {
    "price": 100
  }
}

Updating a document with a script

POST /my_index/_update/100
{
  "script": {
    "source": "ctx._source.price += 10"
  }
}

POST /my_index/_update/100
{
  "script": {
    "source": "ctx._source.price += params.amount",
    "params": {
      "amount": 10
    }
  }
}

POST /my_index/_update/100
{
  "script": {
    "source": """
        if (ctx._source.price > 100) {
            ctx._source.price += 10;
        }
    """
  }
}

Elasticsearch documents are immutable. Once a document has been created, its contents cannot be modified in place. An update therefore goes through the following lifecycle:

Fetch the existing document
Apply the changes
Index the new document
Mark the old document as deleted
Segment merging (the deleted copy is physically removed when segments are merged)
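
The lifecycle above can be sketched with a small in-memory model. This is only an illustration: `ToyIndex` and its methods are hypothetical names invented for this sketch, and the append-only list is a stand-in for Lucene segments, not how Elasticsearch is actually implemented.

```python
from dataclasses import dataclass, field


@dataclass
class ToyIndex:
    """Toy append-only store (hypothetical; a stand-in for segments)."""
    entries: list = field(default_factory=list)  # {"id", "source", "deleted"}

    def get(self, doc_id):
        # Return a copy of the live (not deleted) version, if any.
        for entry in self.entries:
            if entry["id"] == doc_id and not entry["deleted"]:
                return dict(entry["source"])
        return None

    def index(self, doc_id, source):
        old = [e for e in self.entries
               if e["id"] == doc_id and not e["deleted"]]
        # 3. index the new document as a fresh entry
        self.entries.append(
            {"id": doc_id, "source": dict(source), "deleted": False})
        # 4. mark the old document as deleted (it is not removed yet)
        for entry in old:
            entry["deleted"] = True

    def update(self, doc_id, partial):
        current = self.get(doc_id)        # 1. fetch the existing document
        if current is None:
            raise KeyError(doc_id)
        current.update(partial)           # 2. apply the changes to a copy
        self.index(doc_id, current)       # 3 and 4 happen inside index()

    def merge(self):
        # 5. segment merging: physically drop entries marked as deleted
        self.entries = [e for e in self.entries if not e["deleted"]]


idx = ToyIndex()
idx.index("100", {"title": "Elasticsearch Advanced", "author": "Anthony K"})
idx.update("100", {"author": "Anthony K2"})
entries_before_merge = len(idx.entries)   # old and new copies both exist
idx.merge()
entries_after_merge = len(idx.entries)    # only the live copy remains
```

The point of the sketch is that an update never edits an entry in place: the old copy stays until `merge()` runs, which is why deleted documents still occupy space until segments are merged.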

Delete

DELETE /my_index/_doc/100

Upsert

An upsert combines the update and insert operations in a single request.

# If the document does not exist, it is created from the "upsert" body; if it exists, author is updated to "John Doe".
POST /my_index/_update/5
{
  "doc": {
    "author": "John Doe"
  },
  "upsert": {
    "title": "Elasticsearch Basics",
    "author": "Jane Doe",
    "publish_date": "2024-05-09",
    "tags": ["search", "analytics"]
  }
}

# If the document does not exist, it is created from the "upsert" body; if it exists, the script runs and increments price by 1.
POST /my_index/_update/7
{
  "script": {
    "source": "ctx._source.price++"
  },
  "upsert": {
    "title": "Elasticsearch Basics",
    "author": "Jane Doe",
    "publish_date": "2024-05-09",
    "price": 100,
    "tags": ["search", "analytics"]
  }
}
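
The decision an upsert makes can be sketched as a plain function over a dictionary. This is a minimal model, not the Elasticsearch implementation: `apply_update` is a hypothetical helper, the script is represented by an ordinary Python callable, and the merge is shallow (Elasticsearch merges nested objects recursively).

```python
def apply_update(store, doc_id, doc=None, script=None, upsert=None):
    """Toy model of the _update choice between insert and update."""
    if doc_id not in store:
        if upsert is None:
            raise KeyError(f"document {doc_id} is missing and no upsert was given")
        # Insert path: the upsert body becomes the document as-is;
        # neither "doc" nor the script is applied.
        store[doc_id] = dict(upsert)
        return "created"
    if script is not None:
        script(store[doc_id])          # update path: run the script
    elif doc is not None:
        store[doc_id].update(doc)      # update path: merge the partial doc
    return "updated"


def increment_price(source):
    # Python stand-in for the script "ctx._source.price++"
    source["price"] += 1


store = {}
body = {"title": "Elasticsearch Basics", "author": "Jane Doe"}

first = apply_update(store, "5", doc={"author": "John Doe"}, upsert=body)
author_after_first = store["5"]["author"]     # still the upsert value
second = apply_update(store, "5", doc={"author": "John Doe"}, upsert=body)
author_after_second = store["5"]["author"]    # now overwritten by "doc"

apply_update(store, "7", script=increment_price, upsert={"price": 100})
price_after_first = store["7"]["price"]
apply_update(store, "7", script=increment_price, upsert={"price": 100})
price_after_second = store["7"]["price"]
```

Running the same request twice shows both branches: the first call inserts the `upsert` body untouched, and only the second call applies the `doc` or the script.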

Replace

PUT /my_index/_doc/20
{
  "title": "Elasticsearch Basics",
  "author": "John Doe",
  "publish_date": "2024-05-09",
  "tags": ["search", "analytics"]
}

# Replace: a PUT to an existing ID overwrites the whole document
PUT /my_index/_doc/20
{
  "title": "Updated Elasticsearch Basics",
  "author": "Jane Doe",
  "publish_date": "2024-06-01",
  "tags": ["search", "analytics", "updated"]
}

Flush

POST /my_index/_flush

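Elasticsearch holds recent writes in an in-memory buffer, backed by the transaction log (translog), before they are committed to disk. The flush command forces that commit, persisting the buffered data to disk. By default, Elasticsearch performs flushes automatically.

Time-based: a flush is triggered automatically at an interval (stated as every 30 seconds by default; the exact trigger varies by version, so check the translog settings for your release).
Memory-based: a flush is triggered automatically when the memory buffer fills up.

A manual flush can help ensure data consistency, make changes visible to searches, and protect data against node failure, but flushing too often can hurt performance.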
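
What a flush achieves can be pictured with a toy model. This sketch is a deliberate simplification under stated assumptions: `ToyShard` and its attributes are hypothetical names, the lists stand in for the in-memory buffer, the on-disk segments, and the transaction log (translog) that Elasticsearch uses to replay operations not yet committed.

```python
class ToyShard:
    """Toy model of buffer + translog + flush (illustrative names only)."""

    def __init__(self):
        self.buffer = []      # in-memory operations, lost on a crash
        self.translog = []    # durable log of operations not yet committed
        self.committed = []   # stand-in for data committed to disk

    def write(self, op):
        # Every write lands in memory and is also appended to the log.
        self.buffer.append(op)
        self.translog.append(op)

    def flush(self):
        # Commit the buffered operations and start a fresh translog.
        self.committed.extend(self.buffer)
        self.buffer.clear()
        self.translog.clear()

    def crash_and_recover(self):
        # Memory is gone; uncommitted operations are replayed from the log.
        self.buffer = list(self.translog)


shard = ToyShard()
shard.write("doc-1")
shard.write("doc-2")
shard.flush()                 # doc-1 and doc-2 are now committed
shard.write("doc-3")          # not yet committed
shard.crash_and_recover()     # doc-3 comes back from the translog
```

In this model a flush does not change what data survives a crash; it shortens the log that has to be replayed afterwards, which is one reason flushing very frequently buys little while costing extra I/O.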

Routing

POST /my_index/_doc/1?routing=user123
{
  "title": "Elasticsearch Basics",
  "author": "John Doe",
  "publish_date": "2024-05-09",
  "tags": ["search", "analytics"]
}

GET /my_index/_search?routing=user123
{ 
  "query": {
    "match": {
      "author": "John Doe"
    }
  }
}

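In Elasticsearch, routing controls which shard a document is sent to. Because a request that carries a routing value targets only the matching shard, performance improves, and data can be kept separated (for example, per user). However, if data concentrates on one shard, that shard becomes a "hot shard" and performance can suffer. Once set, a document's routing key cannot be changed, just like its ID, and a document indexed with a routing value must be fetched, updated, and deleted with the same routing value.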
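
The shard choice can be sketched as a hash of the routing value modulo the number of primary shards, which is the simplest form of the documented formula. The sketch below makes two assumptions that are not in the original: it uses CRC32 as a stand-in hash (Elasticsearch uses Murmur3 internally, so real shard numbers will differ), and it assumes a hypothetical index with 3 primary shards rather than the single shard created earlier.

```python
import zlib
from collections import Counter


def shard_for(routing, num_primary_shards):
    """Pick a shard from a routing value (stand-in hash, not Murmur3)."""
    return zlib.crc32(routing.encode("utf-8")) % num_primary_shards


NUM_SHARDS = 3  # hypothetical shard count for illustration

# (document id, routing value) pairs; most traffic belongs to one user
docs = [("1", "user123"), ("2", "user123"), ("3", "user456"), ("4", "user123")]

# Count how many documents land on each shard.
placement = Counter(shard_for(routing, NUM_SHARDS) for _, routing in docs)
hottest_shard, hottest_count = placement.most_common(1)[0]
```

All documents that share a routing value land on the same shard, so a search with `routing=user123` only has to visit that shard. The same property explains hot shards: a skewed routing key sends most of the data, and most of the load, to one place.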

Concurrency

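Elasticsearch handles concurrency issues with the primary term and the sequence number. These two values are used to implement optimistic concurrency control.

Primary Term: a number that increases whenever the primary shard changes (for example, when a replica is promoted to primary). It is used to track changes of the primary shard.
Sequence Number: a number that increases with each write operation. It is used to track the order of changes to documents.

How to resolve concurrency issues

Read the document: when reading a document, also fetch its _seq_no and _primary_term values.
Pass them on update: when updating the document, pass the previously read _seq_no and _primary_term values as the if_seq_no and if_primary_term parameters.
Detect conflicts: Elasticsearch compares the document's current _seq_no and _primary_term with the values passed in. If they differ, the update fails with a version_conflict_engine_exception (HTTP 409).

Note that reading and passing _seq_no and _primary_term on every write can have a small performance cost. When a conflict occurs, handle the error appropriately, for example by retrying or by notifying the user.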

POST /my_index/_update/1?if_primary_term=1&if_seq_no=5
{
  "doc": {
    "stock": 9
  }
}

# Read the document
curl -X GET "localhost:9200/my_index/_doc/1"

# Check _seq_no and _primary_term in the response
# e.g. _seq_no = 5, _primary_term = 1

# Pass them along with the update
curl -X PUT "localhost:9200/my_index/_doc/1?if_seq_no=5&if_primary_term=1" -H 'Content-Type: application/json' -d '{
  "content": "Updated Content"
}'
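
The read, compare, and retry cycle described in this section can be sketched without a cluster. This is a toy in-memory model, not client code: `ToyDocStore`, `VersionConflict`, and `decrement_stock` are hypothetical names, and the exception stands in for the 409 conflict response that a real request would return.

```python
class VersionConflict(Exception):
    """Stand-in for the 409 version conflict returned by Elasticsearch."""


class ToyDocStore:
    def __init__(self):
        self.primary_term = 1
        self.next_seq_no = 0
        self.docs = {}  # id -> {"_source", "_seq_no", "_primary_term"}

    def get(self, doc_id):
        d = self.docs[doc_id]
        return {"_source": dict(d["_source"]),
                "_seq_no": d["_seq_no"],
                "_primary_term": d["_primary_term"]}

    def index(self, doc_id, source, if_seq_no=None, if_primary_term=None):
        current = self.docs.get(doc_id)
        if if_seq_no is not None or if_primary_term is not None:
            # Conditional write: reject it if the document changed since the read.
            if (current is None
                    or current["_seq_no"] != if_seq_no
                    or current["_primary_term"] != if_primary_term):
                raise VersionConflict(doc_id)
        self.docs[doc_id] = {"_source": dict(source),
                             "_seq_no": self.next_seq_no,
                             "_primary_term": self.primary_term}
        self.next_seq_no += 1  # every successful write gets a new sequence number
        return self.get(doc_id)


def decrement_stock(store, doc_id, max_retries=3):
    """Read, modify, and write back; re-read and retry on a conflict."""
    for _ in range(max_retries):
        doc = store.get(doc_id)
        new_source = dict(doc["_source"], stock=doc["_source"]["stock"] - 1)
        try:
            return store.index(doc_id, new_source,
                               if_seq_no=doc["_seq_no"],
                               if_primary_term=doc["_primary_term"])
        except VersionConflict:
            continue
    raise VersionConflict(f"gave up on {doc_id} after {max_retries} attempts")


store = ToyDocStore()
store.index("1", {"stock": 10})
stale = store.get("1")            # a second client reads the same version
decrement_stock(store, "1")       # the first client wins: stock becomes 9
try:
    store.index("1", {"stock": 5},
                if_seq_no=stale["_seq_no"],
                if_primary_term=stale["_primary_term"])
    conflict = False
except VersionConflict:
    conflict = True               # the stale write is rejected
```

The stale write fails instead of silently overwriting the newer value, and `decrement_stock` shows the usual remedy: read the document again and retry with the fresh `_seq_no` and `_primary_term`, with a bounded number of attempts.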

[References]

실리콘밸리 엔지니어와 함께하는 Elasticsearch (Elasticsearch with a Silicon Valley Engineer)