Author Topic: Some thoughts on possible ways to speed up set generation in the future  (Read 7128 times)
thorwak

« Reply #50 on: May 01, 2010, 17:57:14 »

A single huge archive per group wouldn't be enough, as it turns out. The 50M headers that took over 48 hours to generate sets for have another problem. Browsing is (mostly) fine: it takes a few seconds to return search results. Pretty good. But when I select a download from that group, it takes about an HOUR just to collect all the needed parts before it can even start downloading. I guess it has to go through everything again somehow.

This can be ignored. I'm currently testing on tables with 30M headers and there is not the slightest hint of this problem. Like you suggested, Spearhead, it was most likely an index problem; my guess is that keys were disabled because set re-generation was in progress. This will never be a problem in 1.0.5, and it works fine in SVN.

Another problem down :)
thorwak

« Reply #51 on: May 24, 2010, 22:58:41 »

Bumping this thread a bit. I committed yet another updated experimental gensets to the testing branch (r1582).

5.4M headers: 5 mins 6 secs
50M headers: 3 hours 5 mins.

Also, I managed to get rid of the initial slow count query and still keep the status bar / ETA :)
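How r1582 actually does this is not described in this post. As a rough illustration only, one way to keep a progress bar and ETA without an exact up-front count is to start from a cheap approximate total and derive the ETA from the observed processing rate. Everything in this Python sketch (the `ProgressEta` class, its methods, and the idea of an approximate total) is a hypothetical assumption, not the gensets code:

```python
class ProgressEta:
    """Hypothetical progress/ETA tracker that works from an approximate total."""

    def __init__(self, approx_total):
        # approx_total is an assumption: any cheap estimate of the row count,
        # used instead of running an exact (slow) count query first.
        self.approx_total = approx_total
        self.done = 0
        self.elapsed = 0.0

    def update(self, rows, seconds):
        """Record that `rows` rows were processed in `seconds` seconds."""
        self.done += rows
        self.elapsed += seconds
        # If the estimate turned out too low, grow it so progress never exceeds 100%.
        self.approx_total = max(self.approx_total, self.done)

    def fraction(self):
        """Fraction of the (estimated) work finished, between 0.0 and 1.0."""
        if self.approx_total == 0:
            return 1.0
        return self.done / self.approx_total

    def eta_seconds(self):
        """Estimated seconds remaining, or None before any timing data exists."""
        if self.done == 0 or self.elapsed == 0:
            return None
        rate = self.done / self.elapsed  # rows per second so far
        return (self.approx_total - self.done) / rate


progress = ProgressEta(approx_total=1000)
progress.update(rows=100, seconds=4.0)
print(progress.fraction(), progress.eta_seconds())
```

The ETA from such a tracker is only as good as the estimate and assumes a roughly constant rate, so it would drift if the per-chunk time changes partway through a run.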

It's still not completely linear, but it FINALLY seems faster than 1.0.4 even on small groups; being slower there has been an annoyance with the new approach so far. If you have the option, try comparing numbers with the current normal SVN and perhaps also 1.0.4 stable.
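To put a number on "not completely linear", the two timings above can be turned into throughput figures. This is just arithmetic on the posted numbers; the helper name `headers_per_second` is made up for the example:

```python
def headers_per_second(headers, hours=0, minutes=0, seconds=0):
    """Average throughput for a run, from header count and wall-clock time."""
    elapsed = hours * 3600 + minutes * 60 + seconds
    return headers / elapsed


# Timings posted for r1582 of the testing branch.
small = headers_per_second(5_400_000, minutes=5, seconds=6)   # 5.4M headers
large = headers_per_second(50_000_000, hours=3, minutes=5)    # 50M headers

# How long 50M headers would take if the small-group rate held all the way.
linear_estimate_minutes = 50_000_000 / small / 60

# How much slower the big run is per header compared to the small one.
slowdown = small / large

print(f"small group: {small:.0f} headers/s")
print(f"large group: {large:.0f} headers/s")
print(f"linear estimate for 50M: {linear_estimate_minutes:.1f} min")
print(f"slowdown factor: {slowdown:.2f}x")
```

So the small group runs at roughly 17,600 headers/s and the 50M group at roughly 4,500 headers/s on average, i.e. the big run is a bit under 4x slower per header than a perfectly linear scale-up (about 47 minutes) would give.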

Something happens after 5-10M headers have been processed: chunk processing time jumps from about 3-5 secs/chunk to 15-20 secs/chunk. I am now convinced this has to do with MySQL itself, or perhaps the underlying FS (ext4 in my case). It may be as simple as the indexes being too big to be completely cached on my system.

Still, it's now actually a viable option to DL ALL headers in a group at once, with full retention, without having to wait many weeks or even months for indexing to finish (though it does take quite a few hours). Also, when the drop in speed happens, it's quite sudden, and then it doesn't really get any worse than that. I would have to test with 500M headers or so to be completely sure, but this seems to be the case as far as I can tell from the little testing I have done.

Things are moving forward :)

Lots of testing is still needed, but it's looking good so far. Note that this is in the testing branch, not the normal SVN.

Users looking for "danger", or wanting to help out in general, are encouraged to check out the testing branch and post some numbers after testing various amounts of headers, checking set integrity and so on; that would be a big help to me. But please don't use your production DB. This is alpha code, at best, with all that that brings.

« Last Edit: May 25, 2010, 00:12:45 by thorwak »