16 Commits

Author SHA1 Message Date
queue 4fcef92b94
elasticsearch 7.x compatibility (#576)
* es_mapping: update turning off dynamic mappings

they changed it in 6.x

https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic.html
https://github.com/elastic/elasticsearch/pull/25734

* es_mapping: remove _all field

deprecated in 6.0 anyway

* es_mapping.yml: fix deprecated mapping type

https://www.elastic.co/guide/en/elasticsearch/reference/6.7/removal-of-types.html#_schedule_for_removal_of_mapping_types

it gives a really unhelpful error otherwise, oof.

* es: fix remaining 7.xisms

the enabled: false apparently only applies to
"object" fields now, need index: false

and the _type got removed everywhere. Seems to work now.

* Fix weird offset error with word_delimiter_graph

yet another es7-ism i guess

* Fix warning and some app stuff for ES 7.x

Co-authored-by: Arylide <Arylide@users.noreply.github.com>
2020-07-12 00:10:47 -07:00
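
A minimal sketch of what a 7.x-style index body reflecting the changes above might look like, written as a Python dict. The field names and field types are assumptions for illustration; this is not the project's actual es_mapping.yml.

```python
# Hypothetical index body for Elasticsearch 7.x; field names are illustrative only.
index_body = {
    "mappings": {
        # No mapping type name wraps "properties" in 7.x.
        # Unknown fields stay in _source but get no mapping and are not indexed.
        "dynamic": False,
        "properties": {
            "display_name": {"type": "text"},
            # "enabled: false" is only valid on object fields, so a leaf field
            # that should not be searchable uses "index: false" instead.
            "info_hash": {"type": "keyword", "index": False},
        },
    }
}
```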
Anna-Maria Meriniemi 5c8b119611 config: Add Elasticsearch hosts (#492) 2018-07-09 22:26:23 -07:00
nyaadev 8f9400bb5f Revert "[Schema change] Torrents flags bitflag column to indexed columns (#471)"
This reverts commit 41a2a32f66.

Performs worse in some cases than what we had before.
2018-04-08 08:36:42 +02:00
A nyaa developer 41a2a32f66 [Schema change] Torrents flags bitflag column to indexed columns (#471)
* convert torrent table flags column from bitflag to independent indexed columns

* elasticsearch integration (untested)

* improve performance
2018-04-07 22:44:53 -07:00
TheAMM 81806d7bc9 Pad info_hash in ElasticSearch sync scripts
python-mysql-replication (or PyMySQL) would return less than 20 bytes
for info-hashes that had null bytes near the end, leaving incomplete
hashes in the ES index. Without delving too deep into the real issue
(be it lack of understanding MySQL storing binary data or a bug in
the libraries), thankfully we can just pad the fixed-size info-hashes
to be 20 bytes.

Padding in import_to_es.py may be erring on the side of caution, but
safe is established to be better than sorry.

(SQLAlchemy is unaffected by this bug)

Fixes #456
2018-02-25 15:12:35 +02:00
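
The padding described above can be sketched roughly as follows. The function name and the choice to right-pad with null bytes are assumptions based on the description (trailing null bytes going missing), not the code from the commit.

```python
INFO_HASH_LENGTH = 20  # a SHA-1 info-hash is always 20 bytes


def pad_info_hash(raw: bytes, length: int = INFO_HASH_LENGTH) -> bytes:
    """Right-pad a truncated info-hash with null bytes up to `length`.

    Hypothetical helper: assumes the missing bytes are trailing nulls.
    """
    if len(raw) > length:
        raise ValueError(f"info-hash longer than {length} bytes")
    return raw.ljust(length, b"\x00")
```

A hash that already has 20 bytes passes through unchanged, so applying the helper everywhere is harmless.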
queue eceb8824dc sync_es: fix flush_interval behavior during slow times
instead of flushing every N seconds, it flushed N seconds after
the last change, which could drag out to N seconds * M batch size
if there are few updates. Practically this doesn't change anything
since stuff is always happening.

Also fix not writing a save point if nothing is happening. Also
practically does nothing, but for correctness.
2017-05-28 20:14:14 -06:00
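
The difference between "N seconds after the last change" and "every N seconds" can be sketched as below. The class, its parameters, and the injectable clock are all hypothetical and not taken from sync_es; the point is only that adding an item does not reset the timer.

```python
import time


class FlushBuffer:
    """Hypothetical buffer that flushes at most flush_interval seconds apart."""

    def __init__(self, flush_interval, max_batch, clock=time.monotonic):
        self.flush_interval = flush_interval
        self.max_batch = max_batch
        self.clock = clock
        self.items = []
        self.last_flush = clock()

    def add(self, item):
        # Deliberately does NOT touch last_flush: resetting the timer here
        # is what would delay the flush until N seconds after the last change.
        self.items.append(item)

    def should_flush(self):
        if not self.items:
            return False
        if len(self.items) >= self.max_batch:
            return True
        return self.clock() - self.last_flush >= self.flush_interval

    def flush(self):
        batch, self.items = self.items, []
        self.last_flush = self.clock()
        return batch
```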
queue 33852a55bf sync_es: die when killed 2017-05-28 20:02:20 -06:00
TheAMM 9cd6c506ae Update ElasticSearch index and scripts for comment_count 2017-05-26 16:12:47 +03:00
nyaadev 152e547ac5 Add flask-Migrate + alembic for automated database migrations.
Update some dependencies to their latest version.
Make executable scripts executable (chmod +x).
2017-05-21 17:47:16 +02:00
queue ea2160a49d sync_es: move io to separate threads, config json
throughput is definitely massively improved, testing locally.
hopefully it'll be enough.

config moved to a separate file by ops request. lazy lazy
2017-05-21 00:55:19 -06:00
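
A rough sketch of the reader/writer split, assuming one thread that reads events and one that posts batches, connected by a bounded queue. All names here are hypothetical, and `post` stands in for whatever sends a batch to Elasticsearch.

```python
import queue
import threading

_DONE = object()  # sentinel telling the writer that the reader has finished


def reader(events, out_q):
    """Read events (e.g. from the binlog) and hand them to the writer."""
    for event in events:
        out_q.put(event)  # blocks when the queue is full, giving backpressure
    out_q.put(_DONE)


def writer(in_q, post, batch_size):
    """Collect events into batches and post each batch."""
    batch = []
    while True:
        event = in_q.get()
        if event is _DONE:
            break
        batch.append(event)
        if len(batch) >= batch_size:
            post(batch)
            batch = []
    if batch:
        post(batch)  # post whatever is left over


def run_pipeline(events, post, batch_size=2, max_queued=100):
    q = queue.Queue(maxsize=max_queued)
    threads = [
        threading.Thread(target=reader, args=(events, q)),
        threading.Thread(target=writer, args=(q, post, batch_size)),
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
```

Both sides spend most of their time waiting on network I/O, so the two threads can overlap that waiting.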
queue 6a4ad827c1 sync_es: instrument with statsd, improve logging
also fixed the save time loop and spaced it out
to 10k events instead of 100.

Notably, the number of rows per event caps out at around 5 by default
because of the default --binlog-row-event-max-size=8192 in mysql; that's
how many (torrent) rows fit into a single event.

We could increase that, but instead I think it's finally time to
multithread this thing; both the binlog read and the ES POST shouldn't
use the GIL so it'll actually work.
2017-05-20 23:19:35 -06:00
aldacron f27cf17478 added timeout to import and sync es 2017-05-16 23:15:48 -07:00
queue e38fe2575a sync_es.py: bulk actions per binlog event
mainly helps with the stat updates, which come in
a single INSERT ... VALUES (...) ON DUPLICATE KEY UPDATE event,
which helpfully translates to a bulk index event.

It seems like elasticsearch should still be buffering that up
internally, so maybe the refresh_interval: 30s change will help
more than this.
2017-05-16 22:47:34 -06:00
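
As an illustration of one bulk request per binlog event, the sketch below builds a newline-delimited bulk body from the rows of a single event. The row shape (`id`, `doc`) and the index name are hypothetical, and the action lines use the typeless form accepted by Elasticsearch 7.x rather than whatever the script sent at the time.

```python
import json


def bulk_body(index, rows):
    """Build a bulk API body: one action line plus one source line per row."""
    lines = []
    for row in rows:
        lines.append(json.dumps({"index": {"_index": index, "_id": row["id"]}}))
        lines.append(json.dumps(row["doc"]))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"
```

For example, `bulk_body("nyaa", [{"id": 1, "doc": {"seed_count": 5}}])` yields a two-line body that could be posted to the `_bulk` endpoint in one request.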
aldacron 40c34e7df0 add stats as upsert in case of binlog not being sequential 2017-05-16 03:20:38 -07:00
aldacron 899aa01473 hooked up ES... 90% done, need to figure out how to generate magnet URIs 2017-05-15 23:51:58 -07:00
queue 32b9170a81 es: add sync_es script for binlog maintenance
lightly documented.
2017-05-15 01:32:56 -06:00