Before, long.tokens.with.dots.or.dashes would get edgengrammed up to the
ngram limit, so we'd get to long.tokens.wit which would then be split -
discarding "with.dots.or.dashes" completely. The fullword index would
keep the complete large token, but without any ngramming, so incomplete
searches (like "tokens") would not match it, only the full token.
Now, we split words before ngramming them, so the main index will
properly handle words up to the ngram limit. The fullword index will
still handle the longer words for non-ngram matching.
Also optimized away duplicate tokens from the indices (since we rely on
boolean matching, not scoring) to save a couple megabytes of space.
* added es_sync_config.json and minified js to .gitignore
* reordered README.md to reflect that MySQL Binlogging must be enabled before running import_to_es.py
* Extend ES term preprocessing for OR groups
Implements handling "foo"|"bar" literal OR groups in the Elasticsearch
term preprocessor. Groups can be negated with -, but don't mesh with
precedence (like plain literals).
This is a partial hack, the real solution would be to parse the entire
search terms ourselves, with AND and OR groups, negations etc. But
having that work neatly with the simple_query_string would be bit of a
hassle.
* Update help.html search tips
since search (quoting strings) has changed a bit.
* Optimize Elasticsearch fullword field
Since the main display_name field ngrams words up to 15 characters,
anything to and under that will already be indexed - the fullword field
(which we have for words longer than 15 characters) needs to index only
words longer than that.
* Preprocess ES terms for better literal matching
This commit adds a new .exact subfield to display_name, which holds a
barely-filtered version of the original title we can do "literal"
matching against. This is not real substring matching, but quoting
terms now actually does something!
Implements a simple preprocessor for the search terms to extract quoted
parts from the search terms, optionally prefixed with - to negate them.
The preprocessor will create a query that'll join all three query-types:
the simple_query_string, must-phrases and must-not-phrases.
Hitting the cancel button does not return "", but null. Therefore
the toLowerCase() fails, and throwing an exception means "sure go
ahead submitting this" to JS for some godforsaken reason.
Just remove the toLowerCase for now, have people type the names
properly.
* Use Flask-Assets to minify self-hosted JS files
By having Flask-Assets minify the two JS files we ship, namely
main.js and bootstrap-select.js, we can shave off 28406 bytes.
The minified files are generated on startup. If one wishes to
manually clean them up or build them, they can use the
"flask assets" management command, e.g. "flask assets clean".
* Workaround to fix tests
State carries over in tests, which is the dumbest shit ever. Fix it
by clearing the bundles before setting them.
* Implement comment locking
This adds a new flags to torrents, which is only editable by
moderators and admins. If checked, it does not allow unprivileged
users to post, edit or delete comments on that torrent.
* Rename "locked" to "comment_locked".
* Shorter button and additional words on alt text
* Admin log: Change comment locking message
dude I love bikeshedding xd
* Bikeshedding over admin log messages
* >&
Also some bikeshedding
I currently don't differentiate between "trusted" markdown and
untrusted, but this should be good enough. Basically tells the
browser not to send a referrer, and (not sure if relevant here)
not to expose a window opener object. Also tells search engines
that the link is not endorsed with "nofollow".
This started out as a simple rebase, but then I rebased the wrong
branches and it all got confusing, so here it is as a new dank
commit.
We now have an @admin_only decorator, and we ask for confirmation
before we nuke. We can also see the nuke button when users are
banned, and nuking is a separate endpoint with a separate form.
Additionally, it now uses the new tracker API.
python-mysql-replication (or PyMySQL) would return less than 20 bytes
for info-hashes that had null bytes near the end, leaving incomplete
hashes in the ES index. Without delving too deep into the real issue
(be it lack of understanding MySQL storing binary data or a bug in
the libraries), thankfully we can just pad the fixed-size info-hashes
to be 20 bytes.
Padding in import_to_es.py may be erring on the side of caution, but
safe is established to be better than sorry.
(SQLAlchemy is unaffected by this bug)
Fixes#456
If users kept their page open for a while before reporting a
torrent, and mods got it in the meantime, users could still
submit reports for that torrent. This is silly and really doesn't
need to happen.
* Clean up PR #349
- Rely on os.makedirs(..., exist_ok=True) for "thread"-safety
- Remove the previous info_dict when we know the transaction went through.
- bytes.hex() will always be lowercase (unless we go off CPython):
c3d9508ff2/Python/pystrhex.c (L5-L49)c3d9508ff2/Python/codecs.c (L16)
- Reintroduce comments and meaningful creation dates in generated torrents:
Also make create_default_metadata_base set the correct metadata now
Because reading warnings is overrated.
This does not fix people using custom domains, but it's more likely
they'll know what's up when their email is thrown into the void.
Fixes#437.
Used TruePNG and zopflipng to optimise the images even more,
saving a whopping 4073 bytes.
The optimisation is lossless, i.e. the decoded pixel values do not
change at all.
With all trackers.txt trackers being included in generated .torrents,
we can now be certain the magnet (which use trackers.txt) and the .torrent
uses will not be split up in different swarms in case the main announce dies.
(That is, if uploaders add enough of their own trackers and additional trackers
were deemed unnecessary (at least 5 already), the magnet and .torrent would only
share the main site announce)