In Elasticsearch, removed stop words continue to have a small effect on scoring

Base Match Query: Billy Sue

Test Match Query #1: Billy Sue and

Test Match Query #2: Billy and Sue

Base and #1 receive identical scores, but Base and #2 receive close yet not identical scores.

According to the analyze API, the stop word "and" is removed from both test queries, but the start_offset and end_offset token properties for "Sue" differ between the Base query and Test Query #2.

Essentially, the pre-stop-word-removal distance between the remaining tokens is recorded and has a small yet finite impact on scoring.
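To make the observation concrete, here is a toy model of a tokenizer followed by a stop filter. It is only a sketch and not Elasticsearch's actual analyzer: the function name `analyze`, the one-word stop list, and the `keep_position_gaps` flag are all made up for illustration. It shows that offsets are character indexes into the original text, so they still reflect the removed word, while the token position may or may not keep a gap depending on how the filter is configured.

```python
import re

# Assumption: a one-word stop list, just enough for the queries in this question.
STOP_WORDS = {"and"}


def analyze(text, keep_position_gaps=True):
    """Split on word characters, lowercase, and drop stop words.

    Returns one dict per surviving token with its character offsets
    in the original text and its token position.
    """
    tokens = []
    position = -1
    for match in re.finditer(r"\w+", text):
        term = match.group().lower()
        if term in STOP_WORDS:
            if keep_position_gaps:
                # The removed word still consumes a position slot.
                position += 1
            continue
        position += 1
        tokens.append({
            "token": term,
            "start_offset": match.start(),
            "end_offset": match.end(),
            "position": position,
        })
    return tokens


if __name__ == "__main__":
    for query in ("Billy Sue", "Billy Sue and", "Billy and Sue"):
        print(query, "->", analyze(query))
```

In this model "Billy Sue" and "Billy Sue and" produce identical tokens, while "Billy and Sue" moves "sue" to offsets 10-13 and, with gaps kept, to position 2 instead of 1. Passing `keep_position_gaps=False` closes the position gap but leaves the offsets unchanged.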

The Question

Is there a way to delay the calculation of the start_offset and end_offset properties of tokens until after stop-words are removed, or otherwise prevent removed stop-words from influencing scoring in any fashion?


Perhaps disable position increments on the stop word filter and see if that helps? If your mapping applies another filter after the stop word filter, the position increments can produce strange artifacts.

E.g. something like this:

"analyzer": {
      "filter":["standard", "lowercase", "filter_stop"]
"filter": { 
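The fragment above is cut off, so what follows is only a guess at the kind of settings it was heading toward, written as a Python dict that can be dumped to JSON. The filter name `filter_stop` and the filter chain come from the fragment; the analyzer name `my_analyzer`, the `standard` tokenizer, and the stop list are assumptions. The `enable_position_increments` option on the stop token filter was accepted by older Elasticsearch releases; check whether your version still supports it before relying on it.

```python
import json

settings = {
    "analysis": {
        "analyzer": {
            # Hypothetical analyzer name, not from the original answer.
            "my_analyzer": {
                "type": "custom",
                "tokenizer": "standard",  # assumption: tokenizer not shown in the fragment
                "filter": ["standard", "lowercase", "filter_stop"],
            }
        },
        "filter": {
            "filter_stop": {
                "type": "stop",
                "stopwords": ["and"],  # assumption: illustrative stop list
                # Assumption: only older versions accept this option.
                # False asks the filter not to leave position gaps
                # where stop words were removed.
                "enable_position_increments": False,
            }
        },
    }
}

# Print the body that would be sent when creating the index.
print(json.dumps({"settings": settings}, indent=2))
```

After creating an index with settings along these lines, rerunning the analyze API on "Billy and Sue" should show whether the position of "Sue" now matches the Base query.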
