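This series presents 23 useful Elasticsearch query examples. For reasons of length the series is split into six articles, and this is the first. You are welcome to follow the big data technology blog's WeChat public account: iteblog_hadoop.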
23 Useful Elasticsearch Query Examples (2)
23 Useful Elasticsearch Query Examples (3)
23 Useful Elasticsearch Query Examples (4)
23 Useful Elasticsearch Query Examples (5)
23 Useful Elasticsearch Query Examples (6)
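To demonstrate the different queries, I first created a set of book documents in Elasticsearch. Each book has the following fields: title, authors, summary, publish_date, publisher, and num_reviews (the number of reviews). The setup is as follows: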
curl -XPUT 'https://www.iteblog.com:9200/iteblog_book_index' -d '{ "settings": { "number_of_shards": 1 }}'

[Response]
{"acknowledged":true}

curl -XPOST 'https://www.iteblog.com:9200/iteblog_book_index/book/_bulk' -d '
{ "index": { "_id": 1 }}
{ "title": "Elasticsearch: The Definitive Guide", "authors": ["clinton gormley", "zachary tong"], "summary" : "A distibuted real-time search and analytics engine", "publish_date" : "2015-02-07", "num_reviews": 20, "publisher": "oreilly" }
{ "index": { "_id": 2 }}
{ "title": "Taming Text: How to Find, Organize, and Manipulate It", "authors": ["grant ingersoll", "thomas morton", "drew farris"], "summary" : "organize text using approaches such as full-text search, proper name recognition, clustering, tagging, information extraction, and summarization", "publish_date" : "2013-01-24", "num_reviews": 12, "publisher": "manning" }
{ "index": { "_id": 3 }}
{ "title": "Elasticsearch in Action", "authors": ["radu gheorge", "matthew lee hinman", "roy russo"], "summary" : "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms", "publish_date" : "2015-12-03", "num_reviews": 18, "publisher": "manning" }
{ "index": { "_id": 4 }}
{ "title": "Solr in Action", "authors": ["trey grainger", "timothy potter"], "summary" : "Comprehensive guide to implementing a scalable search engine using Apache Solr", "publish_date" : "2014-04-05", "num_reviews": 23, "publisher": "manning" }
'

[Response]
{
  "took": 33,
  "errors": false,
  "items": [
    { "index": { "_index": "iteblog_book_index", "_type": "book", "_id": "1", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201 } },
    { "index": { "_index": "iteblog_book_index", "_type": "book", "_id": "2", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201 } },
    { "index": { "_index": "iteblog_book_index", "_type": "book", "_id": "3", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201 } },
    { "index": { "_index": "iteblog_book_index", "_type": "book", "_id": "4", "_version": 1, "_shards": { "total": 2, "successful": 1, "failed": 0 }, "status": 201 } }
  ]
}
With the data in place, we can now query it.
Basic Match Query
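There are two ways to run a basic match query: (1) the Search Lite API, which passes all search parameters in the URL; (2) the Elasticsearch DSL, which sends the query as a JSON request body. The following searches all fields for "guide":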
/////////////////////////////////////////////////////////////////////
User: 过往记忆
Date: 2016-08-15
Time: 23:54
blog: https://www.iteblog.com
Article URL: https://www.iteblog.com/archives/1741.html
过往记忆 blog, a technical blog focused on hadoop, hive, spark, shark, and flume
过往记忆 blog WeChat public account: iteblog_hadoop
/////////////////////////////////////////////////////////////////////

:9200/iteblog_book_index/book/_search?q=guide

[Response]
{
  "took": 20,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.24144039,
    "hits": [
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "1", "_score": 0.24144039,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [ "clinton gormley", "zachary tong" ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "4", "_score": 0.24144039,
        "_source": {
          "title": "Solr in Action",
          "authors": [ "trey grainger", "timothy potter" ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      }
    ]
  }
}
The same result can be obtained with the Query DSL as follows:
curl -XGET ':9200/iteblog_book_index/book/_search' -d '
{
  "query": {
    "multi_match" : {
      "query" : "guide",
      "fields" : ["_all"]
    }
  }
}'
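Its output is the same as that of the /iteblog_book_index/book/_search?q=guide request above. The multi_match keyword is typically used as shorthand for running a match query against multiple fields. The fields property specifies which fields to search; to search all fields, use the _all keyword, as in the example above.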
Both approaches also let us specify which fields to search. For example, to find books whose title contains "in Action", we can query:
:9200/iteblog_book_index/book/_search?q=title:in%20action

[Response]
{
  "took": 27,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.6259885,
    "hits": [
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "4", "_score": 0.6259885,
        "_source": {
          "title": "Solr in Action",
          "authors": [ "trey grainger", "timothy potter" ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "3", "_score": 0.5975345,
        "_source": {
          "title": "Elasticsearch in Action",
          "authors": [ "radu gheorge", "matthew lee hinman", "roy russo" ],
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03",
          "num_reviews": 18,
          "publisher": "manning"
        }
      }
    ]
  }
}
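The two request styles can also be assembled in code. The following is a minimal Python sketch, not part of the original article: BASE_URL (a local node at http://localhost:9200) and the helper names uri_search_url and dsl_search_body are assumptions made for illustration. It only builds the URL and the JSON body; it does not send anything to a cluster.

```python
import json
from urllib.parse import urlencode

# Assumption: a local Elasticsearch node; replace with your own host.
BASE_URL = "http://localhost:9200"
SEARCH_PATH = "/iteblog_book_index/book/_search"


def uri_search_url(text, field=None):
    """Build a Search Lite URL; `field` restricts the search to one field."""
    q = f"{field}:{text}" if field else text
    # urlencode percent-encodes ':' and turns spaces into '+'.
    return f"{BASE_URL}{SEARCH_PATH}?{urlencode({'q': q})}"


def dsl_search_body(text, fields=("_all",)):
    """Build the equivalent multi_match request body as a dict."""
    return {"query": {"multi_match": {"query": text, "fields": list(fields)}}}


print(uri_search_url("guide"))
print(uri_search_url("in action", field="title"))
print(json.dumps(dsl_search_body("guide")))
```

Building the body as a dict and serializing it with json.dumps avoids the quoting mistakes that are easy to make when hand-writing JSON inside a shell string.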
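The DSL, however, offers a more flexible way to build complex queries (as we will see later) and even to control what is returned. The following example specifies the number of results to return, the starting offset (useful for pagination), which document fields to return, and keyword highlighting: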
curl -XGET 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
  "query": {
    "match" : {
      "title" : "in action"
    }
  },
  "size": 2,
  "from": 0,
  "_source": [ "title", "summary", "publish_date" ],
  "highlight": {
    "fields" : {
      "title" : {}
    }
  }
}'

[Response]
{
  "took": 100,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 2,
    "max_score": 0.9105287,
    "hits": [
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "3", "_score": 0.9105287,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        },
        "highlight": {
          "title": [ "Elasticsearch <em>in</em> <em>Action</em>" ]
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "4", "_score": 0.9105287,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        },
        "highlight": {
          "title": [ "Solr <em>in</em> <em>Action</em>" ]
        }
      }
    ]
  }
}
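For pagination, the from offset is usually derived from a page number. Here is a small Python sketch of that arithmetic; the helper name paged_search_body and its 1-based page convention are assumptions for illustration, not something the original article defines.

```python
import json


def paged_search_body(query, page, page_size, source_fields=None):
    """Wrap `query` with from/size for a 1-based page number."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    body = {
        "query": query,
        "from": (page - 1) * page_size,  # number of hits to skip
        "size": page_size,               # number of hits to return
    }
    if source_fields:
        # Limit the returned _source to the listed fields.
        body["_source"] = list(source_fields)
    return body


title_query = {"match": {"title": "in action"}}
first_page = paged_search_body(title_query, page=1, page_size=2,
                               source_fields=["title", "summary", "publish_date"])
print(json.dumps(first_page, indent=2))
```

With page=1 and page_size=2 this yields from 0 and size 2, the same values used in the request above; page 3 would start at offset 4.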
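Note: for queries with several terms, match lets us use the and operator in place of the default or. You can also set the minimum_should_match parameter to tweak the relevance of the results. This article does not go into detail; see the official Elasticsearch documentation for more.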
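As a rough illustration of those two options, the sketch below builds a match query in its expanded form, where the field maps to an object with query, operator, and minimum_should_match keys. The helper name match_query is an assumption; the example values ("and", "75%") are only illustrative.

```python
import json


def match_query(field, text, operator=None, minimum_should_match=None):
    """Build a match request body, optionally with operator / minimum_should_match."""
    options = {"query": text}
    if operator is not None:
        options["operator"] = operator  # "and" requires every term to match
    if minimum_should_match is not None:
        options["minimum_should_match"] = minimum_should_match
    return {"query": {"match": {field: options}}}


# Require both "in" and "action" to appear in the title.
strict = match_query("title", "in action", operator="and")

# Require at least 75% of the terms to match (illustrative value).
partial = match_query("summary", "scalable search engine guide",
                      minimum_should_match="75%")

print(json.dumps(strict))
print(json.dumps(partial))
```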
Multi-field Search
As we saw earlier, to search several document fields in one request (for example, the same keywords in both title and summary), you can use the multi_match query:
curl -XGET 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
  "query": {
    "multi_match" : {
      "query" : "elasticsearch guide",
      "fields": ["title", "summary"]
    }
  }
}'

[Response]
{
  "took": 13,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.9448582,
    "hits": [
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "1", "_score": 0.9448582,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [ "clinton gormley", "zachary tong" ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "3", "_score": 0.17312013,
        "_source": {
          "title": "Elasticsearch in Action",
          "authors": [ "radu gheorge", "matthew lee hinman", "roy russo" ],
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "publish_date": "2015-12-03",
          "num_reviews": 18,
          "publisher": "manning"
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "4", "_score": 0.14965448,
        "_source": {
          "title": "Solr in Action",
          "authors": [ "trey grainger", "timothy potter" ],
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "publish_date": "2014-04-05",
          "num_reviews": 23,
          "publisher": "manning"
        }
      }
    ]
  }
}
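The query above returns three results. Document 1 matches both terms in its title, while documents 3 and 4 each match only one of the terms.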
Boosting
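When one request searches several fields, you may want to give more weight to matches in a particular field. In the example below we boost the summary field by a factor of 3, which increases its influence on the score and markedly raises the relevance of the document with _id=4: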
curl -XGET 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
  "query": {
    "multi_match" : {
      "query" : "elasticsearch guide",
      "fields": ["title", "summary^3"]
    }
  },
  "_source": ["title", "summary", "publish_date"]
}'

[Response]
{
  "took": 8,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 3,
    "max_score": 0.31495273,
    "hits": [
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "1", "_score": 0.31495273,
        "_source": {
          "summary": "A distibuted real-time search and analytics engine",
          "title": "Elasticsearch: The Definitive Guide",
          "publish_date": "2015-02-07"
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "4", "_score": 0.14965448,
        "_source": {
          "summary": "Comprehensive guide to implementing a scalable search engine using Apache Solr",
          "title": "Solr in Action",
          "publish_date": "2014-04-05"
        }
      },
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "3", "_score": 0.13094766,
        "_source": {
          "summary": "build scalable search applications using Elasticsearch without having to do complex low-level programming or understand advanced data science algorithms",
          "title": "Elasticsearch in Action",
          "publish_date": "2015-12-03"
        }
      }
    ]
  }
}
Compare this result with the previous one: document 4 now ranks above document 3.
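Note: boosting does not simply multiply the calculated score by the boost factor. The boost value that is actually applied goes through normalization and some internal optimizations; see the official documentation for details.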
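The field^boost notation is plain string syntax, so it is easy to generate from a mapping of field names to weights. The Python sketch below shows one way to do that; the helper names boosted_fields and multi_match_body are assumptions for illustration, and the code only builds the request body.

```python
import json


def boosted_fields(boosts):
    """Turn {"summary": 3} into ["summary^3"]; a boost of 1 is left unmarked."""
    return [name if boost == 1 else f"{name}^{boost}"
            for name, boost in boosts.items()]


def multi_match_body(text, boosts, source_fields=None):
    """Build a multi_match request body with per-field boosts."""
    body = {"query": {"multi_match": {"query": text,
                                      "fields": boosted_fields(boosts)}}}
    if source_fields:
        body["_source"] = list(source_fields)
    return body


body = multi_match_body("elasticsearch guide",
                        {"title": 1, "summary": 3},
                        source_fields=["title", "summary", "publish_date"])
print(json.dumps(body, indent=2))
```

Keeping the weights in a dict makes it simple to experiment with different boosts and compare the resulting rankings.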
Bool Query
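We can combine conditions with AND/OR/NOT operators; this is the bool query. A bool query accepts a must parameter (equivalent to AND), a must_not parameter (equivalent to NOT), and a should parameter (equivalent to OR). For example, to find books whose title contains Elasticsearch or Solr and whose authors include clinton gormley but not radu gheorge, we can query: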
curl -XGET 'https://www.iteblog.com:9200/iteblog_book_index/book/_search' -d '
{
  "query": {
    "bool": {
      "must": [
        {
          "bool" : {
            "should": [
              { "match": { "title": "Elasticsearch" }},
              { "match": { "title": "Solr" }}
            ]
          }
        },
        { "match": { "authors": "clinton gormley" }}
      ],
      "must_not": { "match": {"authors": "radu gheorge" }}
    }
  }
}'

[Response]
{
  "took": 26,
  "timed_out": false,
  "_shards": { "total": 1, "successful": 1, "failed": 0 },
  "hits": {
    "total": 1,
    "max_score": 0.31271058,
    "hits": [
      {
        "_index": "iteblog_book_index", "_type": "book", "_id": "1", "_score": 0.31271058,
        "_source": {
          "title": "Elasticsearch: The Definitive Guide",
          "authors": [ "clinton gormley", "zachary tong" ],
          "summary": "A distibuted real-time search and analytics engine",
          "publish_date": "2015-02-07",
          "num_reviews": 20,
          "publisher": "oreilly"
        }
      }
    ]
  }
}
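Each bool clause accepts a list of sub-queries, which is a convenient way to combine several must conditions without repeating a key inside one JSON object. The sketch below composes the query described above in Python; the helper name bool_query is an assumption for illustration, and the code only builds the request body.

```python
import json


def bool_query(must=(), should=(), must_not=()):
    """Build a bool query; empty clause lists are omitted from the result."""
    clauses = {}
    for name, items in (("must", must), ("should", should), ("must_not", must_not)):
        if items:
            clauses[name] = list(items)
    return {"bool": clauses}


# Title contains Elasticsearch OR Solr. A bool query that has only should
# clauses needs at least one of them to match.
title_either = bool_query(should=[
    {"match": {"title": "Elasticsearch"}},
    {"match": {"title": "Solr"}},
])

body = {
    "query": bool_query(
        must=[title_either, {"match": {"authors": "clinton gormley"}}],
        must_not=[{"match": {"authors": "radu gheorge"}}],
    )
}
print(json.dumps(body, indent=2))
```

Because bool queries nest, the OR group is simply one more entry in the outer must list.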
Original article; copyright belongs to 过往记忆大数据 (过往记忆). Reproduction without permission is prohibited.
Article link: 23 Useful Elasticsearch Query Examples (1) (https://www.iteblog.com/archives/1741.html)