a3x/nyaa - nyaa - Gitea: Git with a cup of tea

Commit Graph

Author	SHA1	Message	Date
queue	4fcef92b94	elasticsearch 7.x compatibility (#576) * es_mapping: update turning off dynamic mappings they changed it in 6.x https://www.elastic.co/guide/en/elasticsearch/reference/current/dynamic.html https://github.com/elastic/elasticsearch/pull/25734 * es_mapping: remove _all field deprecated in 6.0 anyway * es_mapping.yml: fix deprecated mapping type https://www.elastic.co/guide/en/elasticsearch/reference/6.7/removal-of-types.html#_schedule_for_removal_of_mapping_types it gives a really unhelpful error otherwise, oof. * es: fix remaining 7.xisms the enabled: false apparently only applies to "object" fields now, need index: false and the _type got removed everywhere. Seems to work now. * Fix weird offset error with word_delimiter_graph yet another es7-ism i guess * Fix warning and some app stuff for ES 7.x Co-authored-by: Arylide <Arylide@users.noreply.github.com>	2020-07-12 00:10:47 -07:00
Anna-Maria Meriniemi	5c8b119611	config: Add Elasticsearch hosts (#492)	2018-07-09 22:26:23 -07:00
nyaadev	8f9400bb5f	Revert "[Schema change] Torrents flags bitflag column to indexed columns (#471)" This reverts commit `41a2a32f66`. Performs worse in some cases than what we had before.	2018-04-08 08:36:42 +02:00
A nyaa developer	41a2a32f66	[Schema change] Torrents flags bitflag column to indexed columns (#471) * convert torrent table flags column from bitflag to independent indexed columns * elasticsearch integration (untested) * improve performance	2018-04-07 22:44:53 -07:00
TheAMM	81806d7bc9	Pad info_hash in ElasticSearch sync scripts python-mysql-replication (or PyMySQL) would return less than 20 bytes for info-hashes that had null bytes near the end, leaving incomplete hashes in the ES index. Without delving too deep into the real issue (be it lack of understanding MySQL storing binary data or a bug in the libraries), thankfully we can just pad the fixed-size info-hashes to be 20 bytes. Padding in import_to_es.py may be erring on the side of caution, but safe is established to be better than sorry. (SQLAlchemy is unaffected by this bug) Fixes #456	2018-02-25 15:12:35 +02:00
queue	eceb8824dc	sync_es: fix flush_interval behavior during slow times instead of flushing every N seconds, it flushed N seconds after the last change, which could drag out to N seconds * M batch size if there are few updates. Practically this doesn't change anything since stuff is always happening. Also fix not writing a save point if nothing is happening. Also practically does nothing, but for correctness.	2017-05-28 20:14:14 -06:00
queue	33852a55bf	sync_es: die when killed	2017-05-28 20:02:20 -06:00
TheAMM	9cd6c506ae	Update ElasticSearch index and scripts for comment_count	2017-05-26 16:12:47 +03:00
nyaadev	152e547ac5	Add flask-Migrate + alembic for automated database migrations. Update some dependencies to their latest version. Make executable scripts executable (chmod +x).	2017-05-21 17:47:16 +02:00
queue	ea2160a49d	sync_es: move io to separate threads, config json throughput is definitely massively improved, testing locally. hopefully it'll be enough. config moved to a separate file by ops request. lazy lazy	2017-05-21 00:55:19 -06:00
queue	6a4ad827c1	sync_es: instrument with statsd, improve logging also fixed the save time loop and spaced it out to 10k events instead of 100. Notably, the event no. of rows caps out at around 5 by default because of default -binlog-row-event-max-size=8192 in mysql; that's how many (torrent) rows fit into a single event. We could increase that, but instead I think it's finally time to multithread this thing; both the binlog read and the ES POST shouldn't use the GIL so it'll actually work.	2017-05-20 23:19:35 -06:00
aldacron	f27cf17478	added timeout to import and sync es	2017-05-16 23:15:48 -07:00
queue	e38fe2575a	sync_es.py: bulk actions per binlog event mainly helps with the stat updates, that come in a single INSERT VALUES (...) ON CONFLICT UPDATE event, which helpfully translates to a bulk index event. It seems like elasticsearch should still be buffering that up internally, so maybe the refresh_interval: 30s change will help more than this.	2017-05-16 22:47:34 -06:00
aldacron	40c34e7df0	add stats as upsert in case of binlog not being sequential	2017-05-16 03:20:38 -07:00
aldacron	899aa01473	hooked up ES... 90% done, need to figure out how to generate magnet URIs	2017-05-15 23:51:58 -07:00
queue	32b9170a81	es: add sync_es script for binlog maintenance lightly documented.	2017-05-15 01:32:56 -06:00
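
Commit eceb8824dc describes changing the flush timer so that it fires every N seconds, rather than N seconds after the most recent change. The sketch below illustrates that distinction only; it is not the code from sync_es. The class name `IntervalFlusher`, its parameters, and the injectable `clock` are assumptions made for the example.

```python
import time


class IntervalFlusher:
    """Hypothetical buffer that flushes on a fixed interval or when full.

    The timer is measured from the last flush, not from the last change,
    so a slow trickle of updates cannot postpone a flush indefinitely.
    """

    def __init__(self, flush_interval, max_batch, clock=time.monotonic):
        self.flush_interval = flush_interval  # seconds between flushes
        self.max_batch = max_batch            # flush early once this many items wait
        self.clock = clock                    # injectable so the timing can be tested
        self.pending = []
        self.last_flush = clock()

    def add(self, item):
        """Queue an item; return the flushed batch if a flush is due, else None."""
        self.pending.append(item)
        return self.maybe_flush()

    def maybe_flush(self):
        due = self.clock() - self.last_flush >= self.flush_interval
        full = len(self.pending) >= self.max_batch
        if not (due or full):
            return None
        batch, self.pending = self.pending, []
        # Reset the timer at flush time only; adding items never touches it.
        self.last_flush = self.clock()
        return batch
```

With the earlier behavior described in the commit message, each new change would have reset the timer, so changes arriving slightly less than N seconds apart could keep a batch waiting until it filled up.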

16 Commits
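
Commit 81806d7bc9 pads info-hashes that come back shorter than 20 bytes. A minimal sketch of that idea follows, assuming the missing bytes are trailing null bytes (as the commit message suggests) and are restored by right-padding. The helper name `pad_info_hash` is hypothetical and is not taken from the repository's scripts.

```python
INFO_HASH_LENGTH = 20  # a SHA-1 digest is always 20 bytes


def pad_info_hash(raw: bytes, length: int = INFO_HASH_LENGTH) -> bytes:
    """Right-pad a truncated info-hash with null bytes up to `length`.

    Assumption: the bytes that went missing were trailing nulls, so
    appending b"\\x00" restores the original value.
    """
    if len(raw) > length:
        raise ValueError(f"info-hash is {len(raw)} bytes, expected at most {length}")
    return raw.ljust(length, b"\x00")
```

A hash that is already 20 bytes long passes through unchanged, so applying the helper in both the import and the sync path is harmless.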