23 Very Useful Elasticsearch Query Examples (3)

  This series presents 23 very useful ways to query Elasticsearch. Because of its length, the series is split into six parts; this is the third.

Match Phrase Query

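  A match phrase query requires all of the terms in the query string to appear in the document, in the same order as in the input. By default the terms must also be adjacent, otherwise the document is not matched. The slop parameter relaxes this: it sets how far apart, in word positions, the terms may be and still match. The example below uses a multi_match query with "type": "phrase", which runs a match_phrase query against each of the listed fields: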

curl -XGET -H 'Content-Type: application/json' 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
    "query": {
        "multi_match": {
            "query": "search engine", 
            "fields": [
                "title", 
                "summary"
            ], 
            "type": "phrase", 
            "slop": 3
        }
    }, 
    "_source": [
        "title", 
        "summary", 
        "publish_date"
    ]
}'

[Response]

{
    "took": 17, 
    "timed_out": false, 
    "_shards": {
        "total": 1, 
        "successful": 1, 
        "failed": 0
    }, 
    "hits": {
        "total": 2, 
        "max_score": 0.22327082, 
        "hits": [
            {
                "_index": "iteblog_book_index", 
                "_type": "book", 
                "_id": "4", 
                "_score": 0.22327082, 
                "_source": {
                    "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", 
                    "title": "Solr in Action", 
                    "publish_date": "2014-04-05"
                }
            }, 
            {
                "_index": "iteblog_book_index", 
                "_type": "book", 
                "_id": "1", 
                "_score": 0.16113183, 
                "_source": {
                    "summary": "A distibuted real-time search and analytics engine", 
                    "title": "Elasticsearch: The Definitive Guide", 
                    "publish_date": "2015-02-07"
                }
            }
        ]
    }
} 

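The results show that document 4 matches with the higher score, because its summary contains the exact phrase "search engine". Document 1 also matches: in its summary the words search and engine are not adjacent ("search and analytics engine"), but they are close enough for the "slop": 3 setting. If we remove "slop": 3, document 1 is no longer returned.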
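To make the role of slop more concrete, here is a small Python sketch that computes the smallest slop at which a phrase would match a piece of text. It is a simplified model, loosely based on how Lucene measures sloppy-phrase distance: each term's position is shifted by its index in the phrase, and the distance is the spread of the shifted positions. The tokenizer is a crude stand-in for a real analyzer, and tokenize, min_slop and phrase_matches are hypothetical helpers, not part of any Elasticsearch client.

```python
from __future__ import annotations

import itertools
import re


def tokenize(text: str) -> list[str]:
    """Crude analyzer stand-in: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def min_slop(phrase: str, text: str) -> int | None:
    """Smallest slop at which `phrase` matches `text`, or None if it cannot match.

    Simplified model: shift each term's position by its index in the phrase
    (p_i - i); the distance is max - min of the shifted positions, so an
    exact, adjacent, in-order match has distance 0.
    """
    terms = tokenize(phrase)
    tokens = tokenize(text)
    if not terms:
        return None
    # All positions at which each phrase term occurs in the text.
    positions = [[pos for pos, tok in enumerate(tokens) if tok == term] for term in terms]
    best = None
    for combo in itertools.product(*positions):
        if len(set(combo)) < len(combo):
            continue  # one token cannot stand in for two phrase terms
        shifted = [pos - i for i, pos in enumerate(combo)]
        distance = max(shifted) - min(shifted)
        if best is None or distance < best:
            best = distance
    return best


def phrase_matches(phrase: str, text: str, slop: int = 0) -> bool:
    """True if `phrase` matches `text` within the given slop."""
    needed = min_slop(phrase, text)
    return needed is not None and needed <= slop


summary_4 = "Comprehensive guide to implementing a scalable search engine using Apache Solr"
summary_1 = "A distibuted real-time search and analytics engine"

print(min_slop("search engine", summary_4))                # 0
print(min_slop("search engine", summary_1))                # 2
print(phrase_matches("search engine", summary_1))          # False
print(phrase_matches("search engine", summary_1, slop=3))  # True
```

Under this model document 1 needs a slop of only 2, so "slop": 3 leaves some margin. The same formula gives a distance of 2 for two adjacent terms in reversed order ("engine search"), which illustrates why a large enough slop can also match transposed terms.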


Match Phrase Prefix Query

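  The match phrase prefix query works like match phrase, except that the last term in the query string is treated as a prefix, so you only need to type the first few characters of that word. Like match phrase, it accepts a slop parameter; it also supports max_expansions, which limits how many terms the prefix may expand to and so reduces resource usage. For example: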

/////////////////////////////////////////////////////////////////////
 User: 过往记忆
 Date: 2016-08-17
 Time: 23:36
 blog: https://www.iteblog.com
 Article URL: https://www.iteblog.com/archives/1747.html
 过往记忆 blog: a technical blog focused on Hadoop, Hive, Spark, Shark and Flume, with plenty of hands-on material
 过往记忆 blog WeChat official account: iteblog_hadoop
/////////////////////////////////////////////////////////////////////

curl -XGET -H 'Content-Type: application/json' 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
    "query": {
        "match_phrase_prefix": {
            "summary": {
                "query": "search en", 
                "slop": 3, 
                "max_expansions": 10
            }
        }
    }, 
    "_source": [
        "title", 
        "summary", 
        "publish_date"
    ]
}'

[Response]

{
    "took": 9, 
    "timed_out": false, 
    "_shards": {
        "total": 1, 
        "successful": 1, 
        "failed": 0
    }, 
    "hits": {
        "total": 2, 
        "max_score": 0.5161346, 
        "hits": [
            {
                "_index": "iteblog_book_index", 
                "_type": "book", 
                "_id": "4", 
                "_score": 0.5161346, 
                "_source": {
                    "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr", 
                    "title": "Solr in Action", 
                    "publish_date": "2014-04-05"
                }
            }, 
            {
                "_index": "iteblog_book_index", 
                "_type": "book", 
                "_id": "1", 
                "_score": 0.37248808, 
                "_source": {
                    "summary": "A distibuted real-time search and analytics engine", 
                    "title": "Elasticsearch: The Definitive Guide", 
                    "publish_date": "2015-02-07"
                }
            }
        ]
    }
}

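Note that match phrase prefix queries can be expensive: the prefix must first be expanded into matching terms from the index before the phrase can be checked, so use them with care. max_expansions caps this expansion (it defaults to 50).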
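The expansion step that makes this query costly can be pictured with the Python sketch below. It assumes, as the Elasticsearch documentation describes for match_phrase_prefix, that the last term is looked up in the sorted term dictionary and only the first max_expansions terms starting with it are kept. The vocabulary is built from the two summaries returned above plus a small hypothetical word list; expand_prefix is an illustrative helper, not an Elasticsearch API.

```python
from __future__ import annotations

import bisect
import re


def tokenize(text: str) -> list[str]:
    """Crude analyzer stand-in: lowercase and split on non-alphanumerics."""
    return re.findall(r"[a-z0-9]+", text.lower())


def expand_prefix(prefix: str, vocabulary: list[str], max_expansions: int) -> list[str]:
    """First `max_expansions` terms of the sorted `vocabulary` that start with `prefix`."""
    start = bisect.bisect_left(vocabulary, prefix)  # first term >= prefix
    expansions = []
    for term in vocabulary[start:]:
        if len(expansions) >= max_expansions or not term.startswith(prefix):
            break  # budget used up, or we have walked past the prefix range
        expansions.append(term)
    return expansions


summaries = [
    "Comprehensive guide to implementing a scalable search engine using Apache Solr",
    "A distibuted real-time search and analytics engine",
]
vocabulary = sorted({tok for s in summaries for tok in tokenize(s)})
print(expand_prefix("en", vocabulary, max_expansions=10))  # ['engine']

# Hypothetical, already sorted term dictionary with more "en" words.
hypothetical = ["engine", "engineering", "english", "enterprise", "search"]
print(expand_prefix("en", hypothetical, max_expansions=2))  # ['engine', 'engineering']
```

With max_expansions set to 2, english and enterprise are never considered, so a phrase ending in one of them would be missed: a lower max_expansions trades recall for less work.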

Query String

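  The query_string query provides a compact syntax for combining multi_match queries, bool queries, boosting, fuzzy matching, wildcards, regexp, and range queries in a single query. In the example below we run a fuzzy search for search algorithm (note the misspelling saerch; ~1 lets each term match index terms within one edit, such as search), together with the author names grant ingersoll or tom morton. The query searches all fields through _all and gives the summary field a boost of 2 (summary^2). Note that the _all field was deprecated in Elasticsearch 6.0 and removed in 7.0, so on newer versions you would list the fields explicitly or use a pattern such as "*":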

curl -XGET -H 'Content-Type: application/json' 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
    "query": {
        "query_string" : {
            "query": "(saerch~1 algorithm~1) AND (grant ingersoll)  OR (tom morton)",
            "fields": ["_all", "summary^2"]
        }
    },
    "_source": [ "title", "summary", "authors" ],
    "highlight": {
        "fields" : {
            "summary" : {}
        }
    }
}'

[Response]

{
    "took": 25, 
    "timed_out": false, 
    "_shards": {
        "total": 1, 
        "successful": 1, 
        "failed": 0
    }, 
    "hits": {
        "total": 1, 
        "max_score": 0.12186339, 
        "hits": [
            {
                "_index": "iteblog_book_index", 
                "_type": "book", 
                "_id": "2", 
                "_score": 0.12186339, 
                "_source": {
                    "summary": "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", 
                    "authors": [
                        "grant ingersoll", 
                        "thomas morton", 
                        "drew farris"
                    ], 
                    "title": "Taming Text: How to Find, Organize, and Manipulate It"
                }, 
                "highlight": {
                    "summary": [
                        "organize text using approaches such as full-text <em>search</em>, proper name recognition, clustering, tagging, information extraction, and summarization"
                    ]
                }
            }
        ]
    }
}
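The saerch~1 term above relies on fuzzy matching: an index term matches if it is within one edit of saerch. By default, Elasticsearch's fuzzy matching counts swapping two adjacent characters as a single edit, which is what lets saerch reach search; plain Levenshtein distance would count that swap as two edits. The Python sketch below models this with the optimal string alignment variant of Damerau-Levenshtein distance; osa_distance and fuzzy_terms are hypothetical helpers, and the vocabulary is simply the tokenized summary of document 2.

```python
from __future__ import annotations

import re


def osa_distance(a: str, b: str) -> int:
    """Edit distance with insertions, deletions, substitutions and
    adjacent transpositions (optimal string alignment variant)."""
    d = [[0] * (len(b) + 1) for _ in range(len(a) + 1)]
    for i in range(len(a) + 1):
        d[i][0] = i  # delete all of a[:i]
    for j in range(len(b) + 1):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, len(a) + 1):
        for j in range(1, len(b) + 1):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # substitution (or match)
            )
            if i > 1 and j > 1 and a[i - 1] == b[j - 2] and a[i - 2] == b[j - 1]:
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)  # adjacent swap
    return d[len(a)][len(b)]


def fuzzy_terms(term: str, vocabulary: set[str], max_edits: int) -> list[str]:
    """Vocabulary terms within `max_edits` edits of `term`, sorted."""
    return sorted(t for t in vocabulary if osa_distance(term, t) <= max_edits)


summary_2 = ("organize text using approaches such as full-text search, proper name "
             "recognition, clustering, tagging, information extraction, and summarization")
vocabulary = set(re.findall(r"[a-z0-9]+", summary_2.lower()))

print(osa_distance("saerch", "search"))         # 1
print(fuzzy_terms("saerch", vocabulary, 1))     # ['search']
print(fuzzy_terms("algorithm", vocabulary, 1))  # []
```

Under this sketch only the search half of the fuzzy group is satisfied by document 2's summary, which agrees with the highlight in the response marking just <em>search</em>.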

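Unless otherwise stated, all articles on this blog are original.
Copyright of original articles belongs to 过往记忆大数据 (过往记忆); do not reproduce without permission.
Article link: 【23 Very Useful Elasticsearch Query Examples (3)】(https://www.iteblog.com/archives/1747.html)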