a3x/nyaa

mirror of https://gitlab.com/SIGBUS/nyaa.git synced 2024-12-22 23:40:00 +00:00

Author	SHA1	Message	Date
Anna-Maria Meriniemi	bc1901baa5	ES: implement real substring matching (#500) by splitting input into characters instead of whitespace-delimited words. This means you can now match partial words and real substrings from anywhere: "foo ba" will match "Foo Bar Baz", while previously you had to have full words ("foo bar") to match anything. My dev setup incurred an 8% increase in storage usage, from ~13MB to ~14MB (for ~40k torrents). Small change, big improvement. Wonder why I didn't do this at first.	2018-06-08 00:59:19 -07:00
Anna-Maria Meriniemi	59db958977	ES: delimit words before ngram, optimize tokens (#487). Before, long.tokens.with.dots.or.dashes would get edge-ngrammed up to the ngram limit, so we'd get to long.tokens.wit, which would then be split, discarding "with.dots.or.dashes" completely. The fullword index would keep the complete large token, but without any ngramming, so incomplete searches (like "tokens") would not match it, only the full token. Now we split words before ngramming them, so the main index will properly handle words up to the ngram limit. The fullword index will still handle the longer words for non-ngram matching. Also optimized away duplicate tokens from the indices (since we rely on boolean matching, not scoring) to save a couple of megabytes of space.	2018-04-28 18:09:40 -07:00
Anna-Maria Meriniemi	0b78428abc	[ES Change] Improve Elasticsearch term quoting (#473). (1) Optimize Elasticsearch fullword field: since the main display_name field ngrams words up to 15 characters, anything up to and including that length will already be indexed, so the fullword field (which we have for words longer than 15 characters) needs to index only words longer than that. (2) Preprocess ES terms for better literal matching: this commit adds a new .exact subfield to display_name, which holds a barely-filtered version of the original title we can do "literal" matching against. This is not real substring matching, but quoting terms now actually does something! Implements a simple preprocessor that extracts quoted parts from the search terms, optionally prefixed with - to negate them. The preprocessor will create a query that joins all three query types: the simple_query_string, must-phrases and must-not-phrases.	2018-04-13 17:06:25 -07:00
TheAMM	2d0cf7cbb4	[ES Schema change] Multi-field search display_name to match words over ngram limit. This fixes searching for "Machiavellianism" (16 chars); "Machiavellianis" (15 chars) worked previously. Does not (seem to!) break anything, but requires a re-indexing of ES.	2017-06-05 17:29:00 +03:00
aldacron	535be9c8bd	Fixes #227	2017-06-04 23:03:32 -07:00
TheAMM	9cd6c506ae	Update Elasticsearch index and scripts for comment_count	2017-05-26 16:12:47 +03:00
aldacron	142dd5359c	Resolves #129 and refactored create magnet es naming	2017-05-24 23:19:08 -07:00
aldacron	6b4d487314	updated indices	2017-05-18 01:58:08 -07:00
aldacron	6ad43bbcaa	Reverted previous commit for mapping	2017-05-16 22:53:03 -07:00
aldacron	b2a7b49757	changed es mapping to disable fields that don't need querying	2017-05-16 22:12:58 -07:00
aldacron	c2c547e786	some more elasticsearch work, including index mapping and analyzer	2017-05-15 11:14:01 -07:00
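
The search-term preprocessor described in 0b78428abc could look roughly like the sketch below. It is not the repository's code: the function names `split_quoted_terms` and `build_query` and the regular expression are hypothetical, and only the field names `display_name` and `display_name.exact` and the three query types come from the commit message. The query is built as a plain dict using standard Elasticsearch query DSL keys.

```python
import re

# Optional "-" prefix followed by a double-quoted phrase, at the start of a term.
# Hypothetical pattern; the real preprocessor may tokenize differently.
QUOTED_RE = re.compile(r'(?<!\S)(-?)"([^"]+)"')


def split_quoted_terms(query):
    """Return (remainder, must_phrases, must_not_phrases) for a raw search string."""
    must, must_not = [], []

    def collect(match):
        negated, phrase = match.group(1), match.group(2).strip()
        if phrase:
            (must_not if negated else must).append(phrase)
        return " "  # remove the quoted part from the remaining query text

    remainder = " ".join(QUOTED_RE.sub(collect, query).split())
    return remainder, must, must_not


def build_query(query):
    """Join simple_query_string, must-phrases and must-not-phrases in one bool query."""
    remainder, must, must_not = split_quoted_terms(query)
    bool_query = {"must": [], "must_not": []}
    if remainder:
        bool_query["must"].append({
            "simple_query_string": {
                "query": remainder,
                "fields": ["display_name"],
                "default_operator": "and",
            }
        })
    for phrase in must:
        bool_query["must"].append({"match_phrase": {"display_name.exact": phrase}})
    for phrase in must_not:
        bool_query["must_not"].append({"match_phrase": {"display_name.exact": phrase}})
    return {"query": {"bool": bool_query}}


example = build_query('foo "bar baz" -"qux" quux')
```

Keeping the unquoted remainder in a single `simple_query_string` clause leaves the existing ngram matching untouched, while the quoted parts only ever hit the `.exact` subfield.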

11 commits
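
The ordering problem fixed in 59db958977 can be illustrated with a small pure-Python model of the two tokenization orders. This only simulates the behaviour the commit message describes; it is not the Elasticsearch analyzer configuration, and the helper names are hypothetical. The limit of 15 characters is taken from the log above.

```python
import re

NGRAM_LIMIT = 15  # per the log: display_name ngrams words up to 15 characters


def split_words(text):
    """Split on anything that is not a letter or digit (dots, dashes, underscores)."""
    return re.findall(r"[^\W_]+", text)


def edge_ngrams(word, limit=NGRAM_LIMIT):
    """All prefixes of a word, capped at the ngram limit."""
    return [word[:n] for n in range(1, min(len(word), limit) + 1)]


def index_ngram_then_split(text):
    """Old order: edge-ngram the whitespace token, then split the ngrams.

    Splitting every prefix of the token is equivalent to splitting its
    longest prefix and taking the prefixes of each piece, so that is what
    this model does.
    """
    tokens = set()
    for raw in text.lower().split():
        longest = raw[:NGRAM_LIMIT]  # text past the limit is never seen
        for word in split_words(longest):
            tokens.update(edge_ngrams(word))
    return tokens


def index_split_then_ngram(text):
    """New order: split into words first, then edge-ngram each word."""
    tokens = set()  # a set also drops duplicate tokens
    for word in split_words(text.lower()):
        tokens.update(edge_ngrams(word))
    return tokens


title = "long.tokens.with.dots.or.dashes"
old_tokens = index_ngram_then_split(title)
new_tokens = index_split_then_ngram(title)
lost_before = sorted(new_tokens - old_tokens)  # tokens only the new order indexes
```

In this model the old order stops at "long.tokens.wit", so nothing after "wit" is indexed, while the new order indexes every delimited word up to the limit.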