23 Useful Elasticsearch Query Examples (6)

Contents

1 Function Score: Field Value Factor
2 Function Score: Decay Functions
3 Function Score: Script Scoring

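This series presents 23 useful ways of querying Elasticsearch. For reasons of length the series is split into six posts, and this is the sixth. You are welcome to follow the big data tech blog's WeChat public account: iteblog_hadoop.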

23 Useful Elasticsearch Query Examples (1)
23 Useful Elasticsearch Query Examples (2)
23 Useful Elasticsearch Query Examples (3)
23 Useful Elasticsearch Query Examples (4)
23 Useful Elasticsearch Query Examples (5)
23 Useful Elasticsearch Query Examples (6)

Function Score: Field Value Factor

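In some scenarios you may want to apply a factor to a specific field and use it when computing a document's relevance score. This is a typical way to boost a document's relevance based on its importance. In the example below we want more popular books (judged by their number of reviews) to rank higher, which can be done with field_value_factor: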

/////////////////////////////////////////////////////////////////////
 User: 过往记忆
 Date: 2016-10-02
 Time: 22:57
 blog: https://www.iteblog.com
 Article URL: https://www.iteblog.com/archives/1768.html
/////////////////////////////////////////////////////////////////////
curl -XPOST 'localhost:9200/iteblog_book_index/book/_search'
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "field_value_factor": {
                "field" : "num_reviews",
                "modifier": "log1p",
                "factor" : 2
            }
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]

{
    "took": 26, 
    "timed_out": false, 
    "_shards": {
        "total": 3, 
        "successful": 3, 
        "failed": 0
    }, 
    "hits": [
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "1", 
            "_score": 0.44831306, 
            "_source": {
                "summary": "A distibuted real-time search and analytics engine", 
                "num_reviews": 20, 
                "title": "Elasticsearch: The Definitive Guide", 
                "publish_date": "2015-02-07"
            }
        }, 
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "4", 
            "_score": 0.3718407, 
            "_source": {
                "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", 
                "num_reviews": 23, 
                "title": "Solr in Action", 
                "publish_date": "2014-04-05"
            }
        }, 
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "3", 
            "_score": 0.046479136, 
            "_source": {
                "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", 
                "num_reviews": 18, 
                "title": "Elasticsearch in Action", 
                "publish_date": "2015-12-03"
            }
        }, 
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "2", 
            "_score": 0.041432835, 
            "_source": {
                "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", 
                "num_reviews": 12, 
                "title": "Taming Text: How to Find, Organize, and Manipulate It", 
                "publish_date": "2013-01-24"
            }
        }
    ]
}
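
To make the effect of field_value_factor in the query above more concrete, here is a small Python sketch of the value it contributes. It assumes the behaviour described in the Elasticsearch reference: the field value is multiplied by factor first, and the log1p modifier then takes the common (base 10) logarithm of 1 plus that product. The function name field_value_factor and the num_reviews dictionary are only illustrative and are not part of any Elasticsearch client.

```python
import math


def field_value_factor(value, factor=1.0, modifier="none"):
    """Sketch of the number field_value_factor produces for one document."""
    scaled = factor * value  # factor is applied before the modifier
    if modifier == "none":
        return scaled
    if modifier == "log1p":
        return math.log10(scaled + 1)  # common logarithm of (1 + factor * value)
    if modifier == "ln1p":
        return math.log1p(scaled)  # natural logarithm of (1 + factor * value)
    if modifier == "sqrt":
        return math.sqrt(scaled)
    raise ValueError("modifier not covered by this sketch: " + modifier)


# num_reviews of the four books in the response above, keyed by _id
num_reviews = {"1": 20, "4": 23, "3": 18, "2": 12}

boosts = {
    book_id: field_value_factor(n, factor=2, modifier="log1p")
    for book_id, n in num_reviews.items()
}

for book_id, boost in boosts.items():
    print(book_id, round(boost, 4))
```

Since the query does not set boost_mode, the default (multiply) should apply, so each returned _score is the multi_match score multiplied by the value computed here. That is why "Solr in Action" has the largest boost (23 reviews) yet still ranks second: its text match score is lower.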

Function Score: Decay Functions

Before using Decay Functions, it helps to understand the basics. There are three Decay Functions: linear, exp, and gauss. Each of them can be applied to numeric fields, date fields, and latitude/longitude geo points. All three accept the following four parameters:

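1. origin: the central point, or the best possible value for the field. Every document whose value falls exactly on the origin gets a _score of 1.0;

2. scale: the rate of decay, given as the distance from the origin (beyond the offset) at which the _score has dropped to the decay value;

3. decay: the _score a document receives when it is scale away from the origin; the default is 0.5;

4. offset: every value in the range origin - offset <= value <= origin + offset gets a _score of 1.0; the default is 0.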
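
How the four parameters interact is easier to see in code. The following Python sketch follows the formulas given in the Elasticsearch function_score documentation, as far as I understand them; the function names are my own, and plain numbers (for example days) stand in for dates or geo distances.

```python
import math


def _distance(value, origin, offset):
    # Distance from the origin that is left after the offset "plateau"
    return max(0.0, abs(value - origin) - offset)


def linear_decay(value, origin, scale, offset=0.0, decay=0.5):
    s = scale / (1.0 - decay)  # distance at which the line reaches 0
    return max(0.0, (s - _distance(value, origin, offset)) / s)


def exp_decay(value, origin, scale, offset=0.0, decay=0.5):
    lam = math.log(decay) / scale  # negative rate constant
    return math.exp(lam * _distance(value, origin, offset))


def gauss_decay(value, origin, scale, offset=0.0, decay=0.5):
    sigma_sq = -scale ** 2 / (2.0 * math.log(decay))
    return math.exp(-_distance(value, origin, offset) ** 2 / (2.0 * sigma_sq))


# Distances (in days) from the origin, with offset=7 and scale=30
for days in (0, 7, 37, 67):
    print(
        days,
        round(linear_decay(days, 0, 30, offset=7), 4),
        round(exp_decay(days, 0, 30, offset=7), 4),
        round(gauss_decay(days, 0, 30, offset=7), 4),
    )
```

All three functions return 1.0 anywhere inside the offset and return exactly decay (0.5 by default) at a distance of offset + scale from the origin. They only differ in the shape of the curve between and beyond those points.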

The figure below shows how the three Decay Functions differ:

[Figure: decay curves of linear, exp, and gauss]

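gauss decays slowly at first, then quickly, then slowly again; exp decays quickly at first and then slowly; linear decays along a straight line, and every value beyond the point where the line reaches 0 scores 0. Which one to choose depends on how quickly you want the score to fall off. In the example below we search for books whose title or summary contains "search engine", and we prefer books published close to the central date 2014-06-15, as follows: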

curl -XPOST 'localhost:9200/iteblog_book_index/book/_search'
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "functions": [
                {
                    "exp": {
                        "publish_date" : {
                            "origin": "2014-06-15",
                            "offset": "7d",
                            "scale" : "30d"
                        }
                    }
                }
            ],
            "boost_mode" : "replace"
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]

{
    "took": 26, 
    "timed_out": false, 
    "_shards": {
        "total": 3, 
        "successful": 3, 
        "failed": 0
    }, 
    "hits": [
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "4", 
            "_score": 0.27420625, 
            "_source": {
                "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", 
                "num_reviews": 23, 
                "title": "Solr in Action", 
                "publish_date": "2014-04-05"
            }
        }, 
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "1", 
            "_score": 0.005920768, 
            "_source": {
                "summary": "A distibuted real-time search and analytics engine", 
                "num_reviews": 20, 
                "title": "Elasticsearch: The Definitive Guide", 
                "publish_date": "2015-02-07"
            }
        }, 
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "2", 
            "_score": 0.000011564, 
            "_source": {
                "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", 
                "num_reviews": 12, 
                "title": "Taming Text: How to Find, Organize, and Manipulate It", 
                "publish_date": "2013-01-24"
            }
        }, 
        {
            "_index": "bookdb_index", 
            "_type": "book", 
            "_id": "3", 
            "_score": 0.0000059171475, 
            "_source": {
                "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", 
                "num_reviews": 18, 
                "title": "Elasticsearch in Action", 
                "publish_date": "2015-12-03"
            }
        }
    ]
}

Function Score: Script Scoring

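If the built-in scoring functions do not meet your needs, you can use Script Scoring and compute the score with a Groovy script of your own. In the example below, the script looks at publish_date first and at the number of reviews second: recently published books may not have many reviews yet, but we do not want to ignore them. The scoring script looks like this: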

publish_date = doc['publish_date'].value
num_reviews = doc['num_reviews'].value
if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) {
  my_score = Math.log(2.5 + num_reviews)
} else {
  my_score = Math.log(1 + num_reviews)
}
return my_score

Then pass the script in the script_score parameter at query time:

curl -XPOST 'https://www.iteblog.com:9200/iteblog_book_index/book/_search'
{
    "query": {
        "function_score": {
            "query": {
                "multi_match" : {
                    "query" : "search engine",
                    "fields": ["title", "summary"]
                }
            },
            "functions": [
                {
                    "script_score": {
                        "params" : {
                            "threshold": "2015-07-30"
                        },
                        "script": "publish_date = doc['publish_date'].value; num_reviews = doc['num_reviews'].value; if (publish_date > Date.parse('yyyy-MM-dd', threshold).getTime()) { return log(2.5 + num_reviews) }; return log(1 + num_reviews);"
                    }
                }
            ]
        }
    },
    "_source": ["title", "summary", "publish_date", "num_reviews"]
}

[Results]

{
    "took": 26, 
    "timed_out": false, 
    "_shards": {
        "total": 3, 
        "successful": 3, 
        "failed": 0
    }, 
    "hits": {
        "total": 4, 
        "max_score": 0.8463001, 
        "hits": [
            {
                "_index": "bookdb_index", 
                "_type": "book", 
                "_id": "1", 
                "_score": 0.8463001, 
                "_source": {
                    "summary": "A distibuted real-time search and analytics engine", 
                    "num_reviews": 20, 
                    "title": "Elasticsearch: The Definitive Guide", 
                    "publish_date": "2015-02-07"
                }
            }, 
            {
                "_index": "bookdb_index", 
                "_type": "book", 
                "_id": "4", 
                "_score": 0.7067348, 
                "_source": {
                    "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", 
                    "num_reviews": 23, 
                    "title": "Solr in Action", 
                    "publish_date": "2014-04-05"
                }
            }, 
            {
                "_index": "bookdb_index", 
                "_type": "book", 
                "_id": "3", 
                "_score": 0.08952084, 
                "_source": {
                    "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", 
                    "num_reviews": 18, 
                    "title": "Elasticsearch in Action", 
                    "publish_date": "2015-12-03"
                }
            }, 
            {
                "_index": "bookdb_index", 
                "_type": "book", 
                "_id": "2", 
                "_score": 0.07602123, 
                "_source": {
                    "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", 
                    "num_reviews": 12, 
                    "title": "Taming Text: How to Find, Organize, and Manipulate It", 
                    "publish_date": "2013-01-24"
                }
            }
        ]
    }
}
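
To see what the script contributes to these scores, here is a Python rendering of the same logic applied to the four books in the response above. It is only a sketch: script_score here is an ordinary Python function named after the query parameter, the books dictionary is copied by hand from the response, and I assume that both Math.log and log in the Groovy script are the natural logarithm.

```python
import math
from datetime import date


def script_score(publish_date, num_reviews, threshold):
    """Python rendering of the Groovy scoring script."""
    if publish_date > threshold:
        # Recent books get a small head start over their review count
        return math.log(2.5 + num_reviews)
    return math.log(1 + num_reviews)


threshold = date(2015, 7, 30)

# _id -> (publish_date, num_reviews, returned _score), copied from the response
books = {
    "1": (date(2015, 2, 7), 20, 0.8463001),
    "4": (date(2014, 4, 5), 23, 0.7067348),
    "3": (date(2015, 12, 3), 18, 0.08952084),
    "2": (date(2013, 1, 24), 12, 0.07602123),
}

for book_id, (published, reviews, returned) in books.items():
    factor = script_score(published, reviews, threshold)
    # With the default boost_mode (multiply): returned = query score * factor
    print(book_id, round(factor, 4), round(returned / factor, 4))
```

Only "Elasticsearch in Action" (published 2015-12-03) is newer than the threshold, so it is the only book that takes the 2.5 branch. Dividing each returned _score by the script value gives the underlying multi_match score, which shows that text relevance still dominates the final ranking.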

Note: to use dynamic scripts, you first have to enable them in the config/elasticsearch.yml file. For details see: https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-scripting.html.
