mirror of
https://gitlab.com/SIGBUS/nyaa.git
synced 2024-12-22 08:20:01 +00:00
ES: implement real substring matching (#500)
...by splitting input into characters instead of whitespace-delimited words. This means you can now match partial words and real substrings from anywhere: "foo ba" will match "Foo Bar Baz", while previously you had to have full words ("foo bar") to match anything. My dev setup incurred an 8% increase in storage usage, from ~13MB to ~14MB (for ~40k torrents). Small change, big improvement. Wonder why I didn't do this at first.
This commit is contained in:
parent
d407f09cab
commit
bc1901baa5
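
The effect described in the commit message can be sketched outside Elasticsearch. This is a rough Python model, not the project's code: whitespace_tokens, char_tokens and phrase_match are hypothetical helpers, and it assumes a quoted search is evaluated as an in-order, contiguous token match, which this commit does not show.

```python
import re


def whitespace_tokens(text):
    """Model of the old exact_analyzer: whitespace tokenizer + lowercase filter."""
    return text.lower().split()


def char_tokens(text):
    """Model of the new exact_analyzer: pattern "(.)" with group 1, then lowercase.

    Every character, spaces included, becomes its own token.
    """
    return [m.group(1).lower() for m in re.finditer(r"(.)", text)]


def phrase_match(query_tokens, doc_tokens):
    """Assumed query behaviour: query tokens must appear as one contiguous run."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


title = "Foo Bar Baz"

# Old behaviour: only whole words line up, so "foo ba" fails and "foo bar" works.
old_partial = phrase_match(whitespace_tokens("foo ba"), whitespace_tokens(title))
old_full = phrase_match(whitespace_tokens("foo bar"), whitespace_tokens(title))

# New behaviour: character tokens make any substring a valid run of tokens.
new_partial = phrase_match(char_tokens("foo ba"), char_tokens(title))
new_reversed = phrase_match(char_tokens("bar foo"), char_tokens(title))

print(old_partial, old_full, new_partial, new_reversed)
# prints: False True True False
```

Under this model the storage increase mentioned above is plausible: each title is indexed as one token per character rather than one per word.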
@@ -24,9 +24,9 @@ settings:
           - my_ngram
           - trim_zero
           - unique
-      # For exact matching - simple lowercase + whitespace delimiter
+      # For exact matching - separate each character for substring matching + lowercase
       exact_analyzer:
-        tokenizer: whitespace
+        tokenizer: exact_tokenizer
         filter:
           - lowercase
       # For matching full words longer than the ngram limit (15 chars)
@@ -43,6 +43,13 @@ settings:
           - fullword_min
           - unique
 
+    tokenizer:
+      # Splits input into characters, for exact substring matching
+      exact_tokenizer:
+        type: pattern
+        pattern: "(.)"
+        group: 1
+
     filter:
       my_ngram:
         type: edgeNGram

@@ -46,7 +46,7 @@
 name, but not those which have <em>bar</em> in the name as well.
 </div>
 <div>
-If you want to search for a several-word expression in its entirety, you can
+If you want to search for a several-word expression (substring) in its entirety, you can
 surround searches with <kbd>"</kbd> (double quotes), such as
 <kbd>"foo bar"</kbd>, which would match torrents named <em>foo bar</em> but not
 those named <em>bar foo</em>. You may also use the aforementioned <kbd>|</kbd> to group