mirror of https://gitlab.com/SIGBUS/nyaa.git
synced 2024-12-22 13:39:59 +00:00
ES: implement real substring matching (#500)
...by splitting input into characters instead of whitespace-delimited words. This means you can now match partial words and real substrings from anywhere: "foo ba" will match "Foo Bar Baz", while previously you needed full words ("foo bar") to match anything. On my dev setup this cost about 8% more storage, from ~13 MB to ~14 MB (for ~40k torrents). Small change, big improvement. Wonder why I didn't do this at first.
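The effect described above can be sketched outside Elasticsearch. The snippet below is only an illustration, not the project's code: the function names (`exact_tokens`, `whitespace_tokens`, `phrase_match`) are hypothetical, and it assumes a quoted search behaves like a phrase match, meaning the query's tokens must appear consecutively and in order among the document's tokens.

```python
import re


def exact_tokens(text):
    """Approximate the new analyzer: one token per character
    (pattern "(.)", group 1), followed by lowercasing."""
    return [m.group(1).lower() for m in re.finditer(r"(.)", text)]


def whitespace_tokens(text):
    """Approximate the old analyzer: whitespace-delimited tokens, lowercased."""
    return text.lower().split()


def phrase_match(query_tokens, doc_tokens):
    """Return True if query_tokens occur as a contiguous run in doc_tokens.

    This is an assumed stand-in for phrase-style matching, not an
    Elasticsearch API.
    """
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


title = "Foo Bar Baz"

# Old behaviour: "foo ba" becomes ["foo", "ba"], and "ba" is not a full word.
old_partial = phrase_match(whitespace_tokens("foo ba"), whitespace_tokens(title))

# New behaviour: the characters of "foo ba" appear consecutively in the title.
new_partial = phrase_match(exact_tokens("foo ba"), exact_tokens(title))

# Order still matters: "bar foo" is not a substring of the title.
new_reversed = phrase_match(exact_tokens("bar foo"), exact_tokens(title))

print(old_partial, new_partial, new_reversed)
```

With per-character tokens, a phrase match reduces to a case-insensitive substring test, which is why partial words start matching. It also hints at where the extra storage comes from: every character is indexed as its own token with a position.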
parent d407f09cab
commit bc1901baa5
@@ -24,9 +24,9 @@ settings:
           - my_ngram
           - trim_zero
           - unique
-      # For exact matching - simple lowercase + whitespace delimiter
+      # For exact matching - separate each character for substring matching + lowercase
       exact_analyzer:
-        tokenizer: whitespace
+        tokenizer: exact_tokenizer
         filter:
           - lowercase
       # For matching full words longer than the ngram limit (15 chars)
@@ -43,6 +43,13 @@ settings:
           - fullword_min
           - unique
 
+    tokenizer:
+      # Splits input into characters, for exact substring matching
+      exact_tokenizer:
+        type: pattern
+        pattern: "(.)"
+        group: 1
+
     filter:
       my_ngram:
         type: edgeNGram
@@ -46,7 +46,7 @@
 name, but not those which have <em>bar</em> in the name as well.
 </div>
 <div>
-If you want to search for a several-word expression in its entirety, you can
+If you want to search for a several-word expression (substring) in its entirety, you can
 surround searches with <kbd>"</kbd> (double quotes), such as
 <kbd>"foo bar"</kbd>, which would match torrents named <em>foo bar</em> but not
 those named <em>bar foo</em>. You may also use the aforementioned <kbd>|</kbd> to group