queue 6a4ad827c1 sync_es: instrument with statsd, improve logging
also fixed the save time loop and spaced it out
to 10k events instead of 100.

Notably, the number of rows per event caps out at around 5 by default
because of the default --binlog-row-event-max-size=8192 in mysql; that's
how many (torrent) rows fit into a single event.

We could increase that, but instead I think it's finally time to
multithread this thing; neither the binlog read nor the ES POST should
hold the GIL, so it'll actually work.
2017-05-20 23:19:35 -06:00
.github Update issue_template.md 2017-05-17 13:02:09 +03:00
configs moved some files 2017-05-17 00:53:54 -07:00
nyaa Clean up models.User.level helpers 2017-05-20 21:59:24 +03:00
torrent_cache Initial commit. 2017-05-12 20:51:49 +02:00
utils Point v2 upload script to v2 endpoint (oops) 2017-05-18 22:37:38 +03:00
.gitignore Make sure torrent backup directory exists before writing torrent 2017-05-13 02:41:52 +03:00
LICENSE Add license (GPLv3) 2017-05-13 01:03:42 +03:00
README.md README: Fix typos 2017-05-17 19:58:00 -07:00
WSGI.py Initial commit. 2017-05-12 20:51:49 +02:00
config.example.py hooked up ES... 90% done, need to figure out how to generate magnet URIs 2017-05-15 23:51:58 -07:00
create_es.sh hooked up ES... 90% done, need to figure out how to generate magnet URIs 2017-05-15 23:51:58 -07:00
db_create.py Ghetto migrate solution, in case you re-run db_create, it won't add categories again 2017-05-13 23:51:29 -07:00
es_mapping.yml updated indicies 2017-05-18 01:58:08 -07:00
import_to_es.py added timeout to import and sync es 2017-05-16 23:15:48 -07:00
lint.sh Initial commit. 2017-05-12 20:51:49 +02:00
requirements.txt hooked up ES... 90% done, need to figure out how to generate magnet URIs 2017-05-15 23:51:58 -07:00
run.py Initial commit. 2017-05-12 20:51:49 +02:00
sync_es.py sync_es: instrument with statsd, improve logging 2017-05-20 23:19:35 -06:00
trackers.txt removed explodie as suggested 2017-05-15 02:29:25 +02:00
uwsgi.ini Initial commit. 2017-05-12 20:51:49 +02:00

README.md

NyaaV2

Setup:

  • Create your virtualenv, for example with pyvenv venv
  • Enter your virtualenv with source venv/bin/activate
  • Install dependencies with pip install -r requirements.txt
  • Run python db_create.py to create the database
  • Start the dev server with python run.py

Updated Setup (python 3.6.1):

Setting up MySQL/MariaDB database for advanced functionality

  • Enable USE_MYSQL flag in config.py
  • Install latest mariadb by following instructions here https://downloads.mariadb.org/mariadb/repositories/
    • Tested versions: mysql Ver 15.1 Distrib 10.0.30-MariaDB, for debian-linux-gnu (x86_64) using readline 5.2
  • Run the following commands logged in as your root db user:
    • CREATE USER 'test'@'localhost' IDENTIFIED BY 'test123';
    • GRANT ALL PRIVILEGES ON *.* TO 'test'@'localhost';
    • FLUSH PRIVILEGES;
    • CREATE DATABASE nyaav2 DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
  • To set up the database and import nyaa_maria_vx.sql:
    • mysql -u <user> -p nyaav2
    • DROP DATABASE nyaav2;
    • CREATE DATABASE nyaav2 DEFAULT CHARACTER SET utf8 COLLATE utf8_bin;
    • SOURCE ~/path/to/database/nyaa_maria_vx.sql

Finishing up

  • Run python db_create.py to create the database
  • Load the .sql file
    • mysql -u user -p nyaav2
    • SOURCE cocks.sql
    • Remember to change the default user password to an empty string to disable logging in
  • Start the dev server with python run.py
  • Deactivate the virtualenv with source deactivate

Enabling ElasticSearch

Basics

Enable MySQL Binlogging

  • Add the [mariadb] bin-log section to my.cnf and reload mysql server
  • Connect to mysql
  • SHOW VARIABLES LIKE 'binlog_format';
    • Make sure it shows ROW
  • Connect as the root db user
  • GRANT REPLICATION SLAVE ON *.* TO 'test'@'localhost'; where test is the user you will be running sync_es.py with
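
The binlog_format check above can also be scripted. This is only a minimal sketch: it assumes the rows returned by SHOW VARIABLES LIKE 'binlog_format'; have already been fetched as (name, value) pairs with whatever MySQL client you use, and binlog_format_is_row is a hypothetical helper name, not something shipped in this repo.

```python
def binlog_format_is_row(rows):
    """Return True if the SHOW VARIABLES rows report binlog_format = ROW.

    `rows` is assumed to be an iterable of (variable_name, value) pairs,
    e.g. what a DB-API cursor's fetchall() would return for the query.
    """
    for name, value in rows:
        if name.lower() == "binlog_format":
            # Compare case-insensitively so 'row' and 'ROW' both pass.
            return value.upper() == "ROW"
    # The variable was not in the result set at all.
    return False
```

If this returns False, sync_es.py has no row events to read, so fix my.cnf and reload the server before continuing.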

Setting up ES

  • Run ./create_es.sh, which creates two indices: nyaa and sukebei
  • The output should show acknowledged: true twice
  • The safest bet is to disable the webapp at this point to ensure there are no database writes
  • Run python import_to_es.py with SITE_FLAVOR set to nyaa
  • Run python import_to_es.py with SITE_FLAVOR set to sukebei
  • These will take some time to run while they index the data

Setting up sync_es.py

  • sync_es.py keeps the ElasticSearch index updated by reading the binlog
  • Configure the MySQL options with the user to whom you granted the REPLICATION permissions
  • Connect to MySQL and run SHOW MASTER STATUS;
  • Create /var/lib/sync_es_position.json with the contents {"log_file": "FILENAME", "log_pos": POSITION}, replacing FILENAME with the File value from the SQL output (something like master1-bin.000002) and POSITION with the Position value (something like 892528513)
  • Set up sync_es.py as a service and run it, preferably as the system/root user
  • Make sure sync_es.py runs within venv with the right dependencies
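
The position file from the steps above can be written by hand or with a few lines of Python. The following is a minimal sketch, assuming you copy the File and Position values from SHOW MASTER STATUS; yourself; write_binlog_position is a hypothetical helper name and is not part of sync_es.py.

```python
import json


def write_binlog_position(path, log_file, log_pos):
    """Write the starting binlog position as {"log_file": ..., "log_pos": ...}.

    log_file: the File column from SHOW MASTER STATUS (a string).
    log_pos:  the Position column; stored as an integer, not a string.
    """
    position = {"log_file": log_file, "log_pos": int(log_pos)}
    with open(path, "w") as f:
        json.dump(position, f)
    return position
```

For example, write_binlog_position("/var/lib/sync_es_position.json", "master1-bin.000002", 892528513) produces the file described above; writing under /var/lib usually requires root.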

Good to go!

  • After that, enable the USE_ELASTIC_SEARCH flag and restart the webapp and you're good to go

Code Quality:

  • Remember to follow PEP8 style guidelines and run ./lint.sh before committing.