performance: async worker threads #6417

flux · 2024-03-08T01:07:08Z

flux commented

2024-03-08 01:07:08 +00:00

i've got an idea. i think the reason why server performance has been bad when there's lots of players and spawnit is running, is that "The engine will scale the amount of worker threads automatically". there's no way to specify a maximum number of threads. i think this eventually causes the async worker threads to starve the main server process - lag gets bad not because the main thread is trying to do too much, but because there's just not enough available CPU time given the competing processes. i'm not sure how to adapt to that, will try talking to the core devs.

flux added the

1. kind/other

3. source/lag

labels 2024-03-08 01:07:08 +00:00

flux commented

2024-03-08 01:10:48 +00:00

yeah, verified that the max value is the number of "processors" (i.e. the number listed in /proc/cpuinfo). particularly, with cpu-bound tasks on a processor with hyperthreading, this over-allocates resources even without accounting for the main processing thread. then there's also the mapgen threads, and the database, and whatever else the system is trying to do...
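a rough way to check the over-allocation claim is to compare the logical processor count against the physical core count. the sketch below parses `/proc/cpuinfo`-style text; the function names, the sample text, and the "one extra busy thread for the main server loop" figure are all made up for illustration and are not taken from the engine.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    physical_id = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between processor blocks
        key, value = key.strip(), value.strip()
        if key == "processor":
            logical += 1
            physical_id = None
        elif key == "physical id":
            physical_id = value
        elif key == "core id":
            # hyperthread siblings share the same (physical id, core id) pair
            cores.add((physical_id, value))
    # fall back to the logical count when no core ids are reported
    return logical, (len(cores) or logical)


def oversubscription(worker_threads, other_busy_threads, physical_cores):
    """CPU-bound threads per physical core; above 1.0 means threads compete."""
    return (worker_threads + other_busy_threads) / physical_cores


# hypothetical 2-core, 4-thread machine
SAMPLE = """\
processor : 0
physical id : 0
core id : 0

processor : 1
physical id : 0
core id : 0

processor : 2
physical id : 0
core id : 1

processor : 3
physical id : 0
core id : 1
"""

logical, physical = count_cpus(SAMPLE)
# workers scaled up to the logical count, plus one assumed main thread
print(logical, physical, oversubscription(logical, 1, physical))
```

on a real machine the same function can be fed the contents of `/proc/cpuinfo` directly. it only counts what the kernel reports and says nothing about how busy each core actually is.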

flux added the

3. source/engine

label 2024-03-08 01:11:02 +00:00

flux added this to the flux's TODO list project 2024-03-08 01:54:12 +00:00

flux commented

2024-03-08 02:22:59 +00:00

thinking more, this is still weird, as spawnit limits the number of jobs queued for async processing. it still shouldn't depend on the number of players...

note to self: try tweaking `max_queue_size` from 300 to 30 when the server is busy, and see if that does anything.
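to illustrate why a queue cap should decouple async load from player count, here is a minimal sketch of a bounded job queue. this is not spawnit's actual code: `BoundedJobQueue`, `simulate`, and the per-step numbers are hypothetical, and only the name `max_queue_size` and the values 300 and 30 come from the note above.

```python
from collections import deque


class BoundedJobQueue:
    """FIFO queue that refuses new jobs once max_queue_size is reached."""

    def __init__(self, max_queue_size):
        self.max_queue_size = max_queue_size
        self._jobs = deque()
        self.rejected = 0

    def try_push(self, job):
        if len(self._jobs) >= self.max_queue_size:
            self.rejected += 1
            return False
        self._jobs.append(job)
        return True

    def pop(self):
        return self._jobs.popleft() if self._jobs else None

    def __len__(self):
        return len(self._jobs)


def simulate(max_queue_size, offered_per_step, completed_per_step, steps):
    """Offer jobs each step, let workers finish some, return (depth, rejected)."""
    queue = BoundedJobQueue(max_queue_size)
    for _ in range(steps):
        for job in range(offered_per_step):
            queue.try_push(job)
        for _ in range(completed_per_step):
            queue.pop()
    return len(queue), queue.rejected


# hypothetical load: 50 jobs offered and 10 completed per step, for 5 steps
print(simulate(300, 50, 10, 5))
print(simulate(30, 50, 10, 5))
```

with the cap in place, the backlog the workers see never exceeds `max_queue_size`, however many jobs are offered. that is why a dependence on player count looks odd if the queue is the only path to the async workers.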

flux referenced this issue

2024-03-08 02:25:14 +00:00

whosit reports: led_marquee: seems to ignore m ... #6351

AliasAlreadyTaken commented

2024-03-09 00:14:17 +00:00

Feel free to add any logging or ask me for readouts of whatever linux has in the toolbox to see what's happening. Looking at htop, yes, sometimes all processes are a bit busy, but the machine is far from its limit.

![grafik](/attachments/96cb88ed-a879-4b6f-a2b2-9dee4f41ce91)

grafik.png

214 KiB

flux commented

2024-03-09 23:35:58 +00:00

> try tweaking `max_queue_size`

i tried tweaking that and a couple other parameters while the server was running, and it had no effect on average dtime. not every setting can currently be changed on the fly, though.

flux commented

2024-03-09 23:40:57 +00:00

> the machine is far from its limit.

yeah, that seems to indicate this theory is wrong.

AliasAlreadyTaken commented

2024-03-10 14:29:02 +00:00

Is there a command that would stop the entire spawnit process and empty the queue? With this command we could 100% prove it's spawnit - or not.

flux commented

2024-03-11 19:03:23 +00:00

> Is there a command that would stop the entire spawnit process and empty the queue? With this command we could 100% prove it's spawnit - or not.

it doesn't empty the queue, but w/ the next update, spawnit will have a command to totally disable all its major actions.

👍 1

flux commented

2024-03-31 18:17:33 +00:00

disabling spawnit completely and letting the queue run out doesn't result in any significant effect on lag. i didn't run statistics, and it varies a lot from moment to moment, but it generally varied within the same bounds, just watching the numbers. i ran the tests several times in both moderate (ratio of ~5) and high (ratio of 20 or more) lag.

so i'm closing this as "can't reproduce".
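if someone wants to turn the eyeball comparison into numbers later, a small summary per condition is probably enough. the sketch below assumes the lag ratio can be logged as a list of samples with spawnit enabled and disabled. the sample values are invented placeholders, not measurements from the server, and `summarize` is a hypothetical helper.

```python
import statistics


def summarize(samples):
    """Return mean, median and nearest-rank 95th percentile of lag samples."""
    ordered = sorted(samples)
    # nearest-rank percentile: ceil(0.95 * n), done in integer arithmetic
    rank = max(1, (95 * len(ordered) + 99) // 100)
    return {
        "mean": statistics.fmean(ordered),
        "median": statistics.median(ordered),
        "p95": ordered[rank - 1],
    }


# placeholder samples only; real values would come from server logs
spawnit_on = [5.0, 4.0, 6.0, 5.5, 4.5]
spawnit_off = [5.2, 4.1, 5.8, 5.4, 4.6]

on_stats = summarize(spawnit_on)
off_stats = summarize(spawnit_off)
print(on_stats)
print(off_stats)
print("median difference:", on_stats["median"] - off_stats["median"])
```

if the two summaries overlap as much as the moment-to-moment variation, that backs up the "no significant effect" reading. a clear gap in median or p95 would be a reason to reopen.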

👍 1

flux closed this issue

2024-03-31 18:17:33 +00:00

flux added the

5. result/cannot reproduce

label 2024-03-31 18:17:42 +00:00

flux removed this from the flux's TODO list project 2024-03-31 18:17:46 +00:00
