Full Text Search

hanseom 2025. 3. 22. 16:00

Unlike Term-Level Queries, which look for exact matches, Full Text Search analyzes the query text and finds relevant documents in text fields quickly and efficiently.

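Feature	Term-Level Queries	Full Text Search
Search method	Exact match	Finds similar (relevant) documents
Analysis	Query is not analyzed	Query is analyzed (tokenization, lowercasing, etc.)
Case sensitivity	Case-sensitive	Case-insensitive
Typical use	Unique identifiers, status codes	Searching free text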

How Full-Text Search Works

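Text Analysis: The text is tokenized, lowercased, and (depending on the analyzer) stripped of stop words so that it can be searched efficiently.
Inverted Index: Each token is mapped to the IDs of the documents that contain it, which makes lookups fast.
Search Query: The search text entered by the user is analyzed the same way, and the resulting tokens are looked up in the Inverted Index to find matching documents.
Relevance Scoring: Each matching document is given a relevance score, and the most relevant results are returned first.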
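
The first three steps above can be sketched in a few lines of Python. This is only a toy model under stated assumptions: the analyze, build_inverted_index, and search functions are hypothetical names for illustration, the regex tokenizer is a rough stand-in for a real analyzer, and the sample texts are the article titles indexed later in this post.

```python
import re
from collections import defaultdict

def analyze(text):
    """Lowercase the text and split it into word tokens (rough stand-in for an analyzer)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_inverted_index(docs):
    """Map each token to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index

def search(index, query):
    """Return the IDs of documents that contain at least one query token."""
    hits = set()
    for token in analyze(query):
        hits |= index.get(token, set())
    return hits

docs = {
    1: "Introduction to Machine Learning",
    2: "Deep Learning with Neural Networks",
    3: "Understanding Artificial Intelligence",
    4: "Getting Started with Data Science",
}
index = build_inverted_index(docs)
print(sorted(index["learning"]))              # [1, 2]
print(sorted(search(index, "DATA science")))  # [4]
```

Because both the documents and the query pass through the same analyze step, the uppercase query "DATA science" still finds document 4.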

Relevance Scoring

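In Elasticsearch, Relevance Scoring assigns each matching document a score so that the most relevant results are returned first. The scoring builds on the ideas of TF-IDF (Term Frequency-Inverse Document Frequency); since version 5.0 the default similarity is BM25, a refinement of TF-IDF.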

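Term Frequency (TF): How often the term appears within a document.
Inverse Document Frequency (IDF): Based on the inverse of the number of documents in the whole collection that contain the term, so rarer terms weigh more.
TF-IDF: TF x IDF. The higher the TF-IDF score, the more relevant the document is to the search term.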
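
As a rough illustration of the three definitions above, the sketch below computes a textbook TF-IDF score. It is an assumption-laden toy: tf_idf and analyze are hypothetical helper names, the logarithmic IDF is one common textbook variant, the three sample texts are made up for the example, and Elasticsearch's actual BM25 scoring uses a different formula.

```python
import math
import re

def analyze(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tf_idf(term, doc_tokens, all_docs_tokens):
    """Textbook TF-IDF: relative term frequency times log-scaled inverse document frequency."""
    tf = doc_tokens.count(term) / len(doc_tokens)
    df = sum(1 for tokens in all_docs_tokens if term in tokens)
    if df == 0:
        return 0.0  # the term appears in no document
    idf = math.log(len(all_docs_tokens) / df)
    return tf * idf

# Made-up sample texts, used only for this illustration.
docs = [
    "machine learning and neural networks",
    "deep learning and its applications",
    "a guide to artificial intelligence",
]
docs_tokens = [analyze(d) for d in docs]

# "neural" appears in one document, "learning" in two, so "neural" is rarer.
neural = tf_idf("neural", docs_tokens[0], docs_tokens)
learning = tf_idf("learning", docs_tokens[0], docs_tokens)
print(neural > learning)  # True
```

Both terms occur once in the first document, so their TF is identical; the difference in score comes entirely from IDF.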

PUT /articles
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text"
      },
      "content": {
        "type": "text"
      },
      "summary": {
        "type": "text"
      }
    }
  }
}

POST /articles/_doc/1
{
  "title": "Introduction to Machine Learning",
  "content": "Machine learning is a field of artificial intelligence that uses statistical techniques to give computers the ability to learn from data. Neural networks are a key component of many machine learning algorithms.",
  "summary": "An introduction to machine learning and neural networks."
}

POST /articles/_doc/2
{
  "title": "Deep Learning with Neural Networks",
  "content": "Deep learning is a subset of machine learning that involves training large neural networks on vast amounts of data. It is used in applications like image and speech recognition.",
  "summary": "A deep dive into deep learning and its applications."
}

POST /articles/_doc/3
{
  "title": "Understanding Artificial Intelligence",
  "content": "Artificial intelligence encompasses machine learning, deep learning, and other techniques that enable machines to mimic human intelligence. Neural networks play a crucial role in AI development.",
  "summary": "A comprehensive guide to artificial intelligence."
}

POST /articles/_doc/4
{
  "title": "Getting Started with Data Science",
  "content": "Data science involves extracting insights from large datasets using various techniques, including machine learning. However, neural networks are not typically a primary focus in introductory data science.",
  "summary": "An overview of data science and its key concepts."
}

Full Text Search

# 1. By default, this finds every document whose content contains Data or Science.
GET /articles/_search
{
  "query": {
    "match": {
      "content": "Data Science"
    }
  }
}

# Same as 1: "or" is the default operator.
GET /articles/_search
{
  "query": {
    "match": {
      "content": {
        "query": "Data Science",
        "operator": "or"
      }
    }
  }
}

# 2. With operator "and", only documents whose content contains both Data and Science match.
GET /articles/_search
{
  "query": {
    "match": {
      "content": {
        "query": "Data Science",
        "operator": "and"
      }
    }
  }
}
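
The difference between the two operators can be mimicked with plain Python sets. This is a minimal sketch, not how Elasticsearch is implemented: the match and analyze functions are hypothetical, scoring is ignored, and each text below is only the first sentence of the corresponding document's content field.

```python
import re

def analyze(text):
    """Lowercase the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def match(docs, query, operator="or"):
    """Return IDs of docs containing any ("or") or all ("and") of the query tokens."""
    query_tokens = analyze(query)
    combine = all if operator == "and" else any
    return [
        doc_id
        for doc_id, text in docs.items()
        if combine(token in analyze(text) for token in query_tokens)
    ]

# First sentence of each document's content field from the sample data.
docs = {
    1: "Machine learning is a field of artificial intelligence that uses statistical techniques to give computers the ability to learn from data.",
    2: "Deep learning is a subset of machine learning that involves training large neural networks on vast amounts of data.",
    3: "Artificial intelligence encompasses machine learning, deep learning, and other techniques that enable machines to mimic human intelligence.",
    4: "Data science involves extracting insights from large datasets using various techniques, including machine learning.",
}

print(match(docs, "Data Science"))                  # [1, 2, 4]
print(match(docs, "Data Science", operator="and"))  # [4]
```

Documents 1 and 2 contain only "data", so they match with "or" but drop out with "and"; document 3 contains neither token.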

Multi Match Search

Multi Match Search runs the same search text against several fields at once. You can assign a weight (boost) to each field to make matches in a particular field count more.

# Multi Match Search
GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "Data Science",
      "fields": ["title", "content", "summary"]
    }
  }
}

# boost relevance score (title field score * 2)
GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "Data Science",
      "fields": ["title^2", "content", "summary"]
    }
  }
}

# tie breaker: the final score is the best-matching field's score plus tie_breaker times the scores of the other matching fields
GET /articles/_search
{
  "query": {
    "multi_match": {
      "query": "Data Science",
      "fields": ["title^2", "content", "summary"],
      "tie_breaker": 0.3
    }
  }
}
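
To make the effect of the field boost and tie_breaker more concrete, here is a simplified model of how a multi_match query of the default best_fields type combines per-field scores. Everything in it is an assumption for illustration: best_fields_score is a hypothetical helper, and the per-field scores are made-up numbers rather than values Elasticsearch would return for the sample data.

```python
def best_fields_score(field_scores, boosts=None, tie_breaker=0.0):
    """Best boosted field score plus tie_breaker times the other matching fields' scores."""
    boosts = boosts or {}
    boosted = [
        score * boosts.get(field, 1.0)
        for field, score in field_scores.items()
        if score > 0  # fields that did not match contribute nothing
    ]
    if not boosted:
        return 0.0
    best = max(boosted)
    return best + tie_breaker * (sum(boosted) - best)

# Hypothetical per-field scores for a single document.
scores = {"title": 1.0, "content": 1.5, "summary": 0.5}

# No boost, no tie breaker: only the best field (content) counts.
plain = best_fields_score(scores)

# title^2: the title score doubles and becomes the best field.
boosted = best_fields_score(scores, boosts={"title": 2.0})

# tie_breaker 0.3: 2.0 + 0.3 * (1.5 + 0.5)
with_tie = best_fields_score(scores, boosts={"title": 2.0}, tie_breaker=0.3)

print(plain, boosted, with_tie)
```

With tie_breaker left at 0, a document matching in three fields scores the same as one matching only in its best field; a value between 0 and 1 gives partial credit for the additional matches.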

[References]

실리콘밸리 엔지니어와 함께하는 Elasticsearch