Commit Graph

14 Commits

Author SHA1 Message Date
nyaadev 8f9400bb5f Revert "[Schema change] Torrents flags bitflag column to indexed columns (#471)"
This reverts commit 41a2a32f66.

Performs worse in some cases than what we had before.
2018-04-08 08:36:42 +02:00
A nyaa developer 41a2a32f66 [Schema change] Torrents flags bitflag column to indexed columns (#471)
* convert torrent table flags column from bitflag to independent indexed columns

* elasticsearch integration (untested)

* improve performance
2018-04-07 22:44:53 -07:00
TheAMM 81806d7bc9 Pad info_hash in ElasticSearch sync scripts
python-mysql-replication (or PyMySQL) would return less than 20 bytes
for info-hashes that had null bytes near the end, leaving incomplete
hashes in the ES index. Without delving too deep into the real issue
(be it lack of understanding MySQL storing binary data or a bug in
the libraries), thankfully we can just pad the fixed-size info-hashes
to be 20 bytes.

Padding in import_to_es.py may be erring on the side of caution, but
safe is established to be better than sorry.

(SQLAlchemy is unaffected by this bug)

Fixes #456
2018-02-25 15:12:35 +02:00
queue eceb8824dc sync_es: fix flush_interval behavior during slow times
instead of flushing every N seconds, it flushed N seconds after
the last change, which could drag out to N seconds * M batch size
if there are few updates. Practically this doesn't change anything
since stuff is always happening.

Also fix not writing a save point if nothing is happening. Also
practically does nothing, but for correctness.
2017-05-28 20:14:14 -06:00
queue 33852a55bf sync_es: die when killed 2017-05-28 20:02:20 -06:00
TheAMM 9cd6c506ae Update ElasticSeach index and scripts for comment_count 2017-05-26 16:12:47 +03:00
nyaadev 152e547ac5 Add flask-Migrate + alembic for automated database migrations.
Update some dependencies to their latest version.
Make executable scripts executable (chmod +x).
2017-05-21 17:47:16 +02:00
queue ea2160a49d sync_es: move io to separate threads, config json
throughput is definitely massively improved, testing locally.
hopefully it'll be enough.

config moved a separate file by ops request. lazy lazy
2017-05-21 00:55:19 -06:00
queue 6a4ad827c1 sync_es: instrument with statsd, improve logging
also fixed the save time loop and spaced it out
to 10k events instead of 100.

Notably, the event no. of rows caps out at around 5 by default
because of default -binlog-row-event-max-size=8192 in mysql; that's
how many (torrent) rows fit into a single event.

We could increase that, but instead I think it's finally time to finally
multithread this thing; both the binlog read and the ES POST shouldn't
use the GIL so it'll actually work.
2017-05-20 23:19:35 -06:00
aldacron f27cf17478 added timeout to import and sync es 2017-05-16 23:15:48 -07:00
queue e38fe2575a sync_es.py: bulk actions per binlog event
mainly helps with the stat updates, that come in
a single INSERT VALUES (...) ON CONFLICT UPDATE event,
which helpfully translates to a bulk index event.

It seems like elasticsearch should still be buffering that up
internally, so maybe the refresh_interval: 30s change will help
more than this.
2017-05-16 22:47:34 -06:00
aldacron 40c34e7df0 add stats as upsert in case of binlog not being sequential 2017-05-16 03:20:38 -07:00
aldacron 899aa01473 hooked up ES... 90% done, need to figure out how to generate magnet URIs 2017-05-15 23:51:58 -07:00
queue 32b9170a81 es: add sync_es script for binlog maintenance
lightly documented.
2017-05-15 01:32:56 -06:00