mirror of https://gitlab.com/SIGBUS/nyaa.git synced 2024-12-22 13:39:59 +00:00

ES: implement real substring matching (#500)

...by splitting input into characters, instead of whitespace delimited
words. This means you can now match partial words, real substrings from
anywhere: "foo ba" will match "Foo Bar Baz", while previously you had to
have full words ("foo bar") to match anything.

My dev setup incurred an 8% increase in storage usage, from ~13MB to
~14MB (for ~40k torrents).
Small change, big improvement. Wonder why I didn't do this at first.
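
A minimal Python sketch of why per-character tokens allow substring matching while whitespace tokens do not. It only simulates the two analyzers; it does not call Elasticsearch. It assumes the search side runs a phrase-style query (tokens must appear contiguously and in order), which this commit does not show. The helper names `whitespace_tokens`, `char_tokens` and `phrase_match` are hypothetical.

```python
import re


def whitespace_tokens(text):
    """Simulate the old exact_analyzer: whitespace tokenizer + lowercase filter."""
    return text.lower().split()


def char_tokens(text):
    """Simulate the new exact_analyzer: pattern "(.)" with group 1 emits one
    token per character (spaces included), then the lowercase filter runs."""
    return [m.group(1).lower() for m in re.finditer(r"(.)", text)]


def phrase_match(query_tokens, doc_tokens):
    """Simplified phrase query (an assumption about the search side):
    the query tokens must appear in the document as one contiguous run."""
    n = len(query_tokens)
    if n == 0:
        return False
    return any(
        doc_tokens[i:i + n] == query_tokens
        for i in range(len(doc_tokens) - n + 1)
    )


title = "Foo Bar Baz"

# Old behaviour: "ba" is not a whole word in the title, so nothing matches.
print(phrase_match(whitespace_tokens("foo ba"), whitespace_tokens(title)))  # False

# New behaviour: f, o, o, space, b, a occur as a contiguous run of characters.
print(phrase_match(char_tokens("foo ba"), char_tokens(title)))  # True
```

The trade-off is the one noted above: every character becomes its own indexed token, so the index grows a little in exchange for matching anywhere in the name.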
Anna-Maria Meriniemi 2018-06-08 10:59:19 +03:00 committed by Arylide
parent d407f09cab
commit bc1901baa5
2 changed files with 10 additions and 3 deletions


@@ -24,9 +24,9 @@ settings:
           - my_ngram
           - trim_zero
           - unique
-      # For exact matching - simple lowercase + whitespace delimiter
+      # For exact matching - separate each character for substring matching + lowercase
       exact_analyzer:
-        tokenizer: whitespace
+        tokenizer: exact_tokenizer
         filter:
           - lowercase
       # For matching full words longer than the ngram limit (15 chars)
@@ -43,6 +43,13 @@ settings:
           - fullword_min
           - unique
 
+    tokenizer:
+      # Splits input into characters, for exact substring matching
+      exact_tokenizer:
+        type: pattern
+        pattern: "(.)"
+        group: 1
+
     filter:
       my_ngram:
         type: edgeNGram


@@ -46,7 +46,7 @@
 name, but not those which have <em>bar</em> in the name as well.
 </div>
 <div>
-If you want to search for a several-word expression in its entirety, you can
+If you want to search for a several-word expression (substring) in its entirety, you can
 surround searches with <kbd>"</kbd> (double quotes), such as
 <kbd>"foo bar"</kbd>, which would match torrents named <em>foo bar</em> but not
 those named <em>bar foo</em>. You may also use the aforementioned <kbd>|</kbd> to group