Optimize Elasticsearch fullword field

Since the main display_name field n-grams words up to 15 characters, any word of 15 characters or fewer is already indexed there. The fullword field (which exists to cover words longer than 15 characters) therefore only needs to index words longer than that.
2025-05-03 03:21:00 +00:00 · 2018-04-13 14:37:01 +03:00 · f31af836d9
parent 81806d7bc9
commit f31af836d9
1 changed file with 7 additions and 1 deletion
--- a/es_mapping.yml
+++ b/es_mapping.yml
@@ -32,13 +32,19 @@ settings:
        filter:
          - lowercase
          - word_delimit
-          # These should be enough, as my_index_analyzer will match the rest
+          # Skip tokens shorter than N characters,
+          # since they're already indexed in the main field
+          - fullword_min

    filter:
      my_ngram:
        type: edgeNGram
        min_gram: 1
        max_gram: 15
+      fullword_min:
+        type: length
+        # Keep this at max_gram + 1: update it if you change my_ngram's max_gram above!
+        min: 16
      resolution:
        type: pattern_capture
        patterns: ["(\\d+)[xX](\\d+)"]
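
The split between the two fields can be illustrated with a small Python simulation. This is only a sketch of the idea, not how Elasticsearch implements its filters: the function names `edge_ngrams` and `fullword_tokens` are hypothetical, the constants simply mirror `max_gram: 15` and `min: 16` from the mapping, and the lowercase and word-delimiter steps are ignored.

```python
# Rough simulation of the two token filters in the mapping (an illustration,
# not Elasticsearch code). Constants mirror the values in es_mapping.yml.
MAX_GRAM = 15                # my_ngram.max_gram
FULLWORD_MIN = MAX_GRAM + 1  # fullword_min.min


def edge_ngrams(token, min_gram=1, max_gram=MAX_GRAM):
    """Return the prefixes of `token` that are min_gram..max_gram characters long."""
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def fullword_tokens(tokens, min_len=FULLWORD_MIN):
    """Keep only tokens of at least `min_len` characters, like a length filter."""
    return [t for t in tokens if len(t) >= min_len]


if __name__ == "__main__":
    words = ["wallpaper", "acknowledgement", "incomprehensible"]
    for word in words:
        in_main = word in edge_ngrams(word)          # whole word is one of the grams
        in_fullword = bool(fullword_tokens([word]))  # whole word survives the filter
        print(f"{word} ({len(word)} chars): main={in_main}, fullword={in_fullword}")
```

Under these assumptions, every word is indexed whole by exactly one of the two fields: words of up to 15 characters appear as their own longest edge n-gram, and longer words pass the length filter. If the two constants drift apart, words of some lengths would be indexed twice or not at all, which is why the mapping comment asks for the two values to be changed together.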