ES: implement real substring matching

...by splitting input into characters instead of whitespace-delimited
words. This means you can now match partial words, real substrings from
anywhere: "foo ba" will match "Foo Bar Baz", while previously you had to
have full words ("foo bar") to match anything.
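
To illustrate the difference, here is a minimal Python sketch that imitates both tokenizers outside Elasticsearch. It assumes the exact search matches the query tokens as a consecutive run in the indexed tokens (phrase-style matching), which this commit message does not state. The function names are hypothetical and exist only for this example.

```python
import re


def whitespace_tokens(text):
    # Old analyzer: lowercase, then split on whitespace (one token per word).
    return text.lower().split()


def char_tokens(text):
    # New analyzer: lowercase, then emit one token per character,
    # mirroring the pattern "(.)" with capture group 1.
    return re.findall(r"(.)", text.lower())


def phrase_match(query_tokens, doc_tokens):
    # Assumption: a hit means the query tokens appear consecutively,
    # in order, somewhere in the document tokens.
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


title = "Foo Bar Baz"
query = "foo ba"

# Word tokens: ["foo", "ba"] is not a run inside ["foo", "bar", "baz"].
print(phrase_match(whitespace_tokens(query), whitespace_tokens(title)))  # False

# Character tokens: "foo ba" is a run of characters inside "foo bar baz".
print(phrase_match(char_tokens(query), char_tokens(title)))  # True
```

With word tokens, the partial word "ba" never equals the token "bar", so the query fails. With character tokens, every substring of the title is a consecutive run of tokens, so partial words match.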

My dev setup incurred an 8% increase in storage usage, from ~13MB to
~14MB (for ~40k torrents).
Small change, big improvement. Wonder why I didn't do this at first.
TheAMM 2018-06-07 21:36:41 +03:00
parent d407f09cab
commit 9c3ac4dc67
2 changed files with 10 additions and 3 deletions


@@ -24,9 +24,9 @@ settings:
- my_ngram
- trim_zero
- unique
-# For exact matching - simple lowercase + whitespace delimiter
+# For exact matching - separate each character for substring matching + lowercase
exact_analyzer:
-tokenizer: whitespace
+tokenizer: exact_tokenizer
filter:
- lowercase
# For matching full words longer than the ngram limit (15 chars)
@@ -43,6 +43,13 @@ settings:
- fullword_min
- unique
+tokenizer:
+# Splits input into characters, for exact substring matching
+exact_tokenizer:
+type: pattern
+pattern: "(.)"
+group: 1
filter:
my_ngram:
type: edgeNGram


@@ -46,7 +46,7 @@
name, but not those which have <em>bar</em> in the name as well.
</div>
<div>
-If you want to search for a several-word expression in its entirety, you can
+If you want to search for a several-word expression (substring) in its entirety, you can
surround searches with <kbd>"</kbd> (double quotes), such as
<kbd>"foo bar"</kbd>, which would match torrents named <em>foo bar</em> but not
those named <em>bar foo</em>. You may also use the aforementioned <kbd>|</kbd> to group