performance: async worker threads #6417

Closed
opened 2024-03-08 01:07:08 +00:00 by flux · 8 comments
Member

i've got an idea. i think the reason why server performance has been bad when there's lots of players and spawnit is running, is that "The engine will scale the amount of worker threads automatically". there's no way to specify a maximum number of threads. i think this eventually causes the async worker threads to starve the main server process - lag gets bad not because the main thread is trying to do too much, but because there's just not enough available CPU time given the competing processes. i'm not sure how to adapt to that, will try talking to the core devs.

flux added the
1. kind/other
3. source/lag
labels 2024-03-08 01:07:08 +00:00
Author
Member

yeah, verified that the max value is the number of "processors" (i.e. the number listed in /proc/cpuinfo). particularly, with cpu-bound tasks on a processor with hyperthreading, this over-allocates resources even without accounting for the main processing thread. then there's also the mapgen threads, and the database, and whatever else the system is trying to do...
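To make the over-allocation point concrete, here is a minimal sketch of how the "processors" count in `/proc/cpuinfo` can differ from the number of physical cores when hyperthreading is on. It only parses text in the usual x86 `/proc/cpuinfo` layout. The helper name `count_cpus` and the sample text are made up for illustration, and nothing here reflects how the engine itself picks its thread count.

```python
def count_cpus(cpuinfo_text):
    """Return (logical_processors, physical_cores) from /proc/cpuinfo-style text."""
    logical = 0
    cores = set()
    # Each logical processor is described by one blank-line-separated block.
    for block in cpuinfo_text.strip().split("\n\n"):
        fields = {}
        for line in block.splitlines():
            key, sep, value = line.partition(":")
            if sep:
                fields[key.strip()] = value.strip()
        if "processor" not in fields:
            continue
        logical += 1
        # Hyperthread siblings share the same (physical id, core id) pair.
        if "core id" in fields:
            cores.add((fields.get("physical id", "0"), fields["core id"]))
    # Some kernels/VMs do not report core ids; fall back to the logical count.
    return logical, (len(cores) or logical)


# Hypothetical sample: one socket, two cores, two hardware threads per core.
SAMPLE = """\
processor   : 0
physical id : 0
core id     : 0

processor   : 1
physical id : 0
core id     : 1

processor   : 2
physical id : 0
core id     : 0

processor   : 3
physical id : 0
core id     : 1
"""

print(count_cpus(SAMPLE))  # prints (4, 2)
```

On a real Linux host the same function can be fed `open("/proc/cpuinfo").read()`. If the worker count tracks the first number, CPU-bound jobs can end up with about twice as many threads as there are physical cores, before counting the main thread.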

flux added the
3. source/engine
label 2024-03-08 01:11:02 +00:00
flux added this to the flux's TODO list project 2024-03-08 01:54:12 +00:00
Author
Member

thinking more, this is still weird, as spawnit limits the number of jobs queued for async processing. it still shouldn't depend on the number of players...

note to self: try tweaking `max_queue_size` from 300 to 30 when the server is busy, and see if that does anything.
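For readers unfamiliar with the idea, this is a rough sketch of what a queue-size cap on async jobs looks like in general. It is not spawnit's actual code. The class `BoundedJobQueue` and its methods are hypothetical, and only the setting name `max_queue_size` and the values 300 and 30 come from the comment above.

```python
from collections import deque


class BoundedJobQueue:
    """Hypothetical job queue that refuses new jobs once a cap is reached."""

    def __init__(self, max_queue_size=300):
        # The cap is a plain attribute so it can be changed while running.
        self.max_queue_size = max_queue_size
        self._jobs = deque()

    def try_enqueue(self, job):
        """Queue the job and return True, or return False if the queue is full."""
        if len(self._jobs) >= self.max_queue_size:
            return False
        self._jobs.append(job)
        return True

    def pop(self):
        """Return the oldest queued job, or None if the queue is empty."""
        return self._jobs.popleft() if self._jobs else None

    def __len__(self):
        return len(self._jobs)


queue = BoundedJobQueue(max_queue_size=300)
accepted = sum(queue.try_enqueue(n) for n in range(500))
print(accepted, len(queue))  # prints 300 300

# Lowering the cap at runtime only blocks new jobs; queued ones still drain.
queue.max_queue_size = 30
print(queue.try_enqueue("extra"), len(queue))  # prints False 300
```

With a cap like this, the backlog waiting for worker threads is bounded no matter how many players are online, which is why a player-count dependence would be surprising.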


Feel free to add any logging or ask me for readouts of whatever linux has in the toolbox to see what's happening. Looking at htop, yes, sometimes all processes are a bit busy, but the machine is far from its limit.

![grafik](/attachments/96cb88ed-a879-4b6f-a2b2-9dee4f41ce91)
Author
Member

> try tweaking `max_queue_size`

i tried tweaking that and a couple other parameters while the server was running, and it had no effect on average dtime. not every setting can currently be changed on the fly, though.

Author
Member

> the machine is far from its limit.

yeah, that seems to indicate this theory is wrong.


Is there a command that would stop the entire spawnit process and empty the queue? With this command we could 100% prove it's spawnit - or not.

Author
Member

> Is there a command that would stop the entire spawnit process and empty the queue? With this command we could 100% prove it's spawnit - or not.

it doesn't empty the queue, but w/ the next update, spawnit will have a command to totally disable all its major actions.

Author
Member

disabling spawnit completely and letting the queue run out doesn't result in any significant effect on lag. i didn't run statistics, and it varies a lot from moment to moment, but it generally varied within the same bounds, just watching the numbers. i ran the tests several times in both moderate (ratio of ~5) and high (ratio of 20 or more) lag.

so i'm closing this as "can't reproduce".
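If someone wants to repeat the comparison with numbers instead of eyeballing, a minimal sketch could look like the following. It assumes dtime (or lag ratio) samples have been logged separately for the "spawnit enabled" and "spawnit disabled" runs. The function names are hypothetical and the values in the usage lines are placeholders, not measurements from the server.

```python
import statistics


def summarize(samples):
    """Return mean, median and sample standard deviation of a list of readings."""
    return {
        "mean": statistics.mean(samples),
        "median": statistics.median(samples),
        "stdev": statistics.stdev(samples),  # needs at least two samples
    }


def relative_change(baseline, other):
    """Fractional change of the mean from baseline to other (0.1 means +10%)."""
    return (other["mean"] - baseline["mean"]) / baseline["mean"]


# Placeholder readings only, to show the call shape.
enabled = summarize([1.0, 2.0, 3.0])
disabled = summarize([2.0, 4.0, 6.0])
print(enabled["mean"], disabled["mean"], relative_change(enabled, disabled))
# prints 2.0 4.0 1.0
```

Because the readings vary a lot from moment to moment, a difference in means only suggests a real effect when it is clearly larger than the spread (`stdev`) within each run.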

flux closed this issue 2024-03-31 18:17:33 +00:00
flux added the
5. result/cannot reproduce
label 2024-03-31 18:17:42 +00:00
flux removed this from the flux's TODO list project 2024-03-31 18:17:46 +00:00
Reference: your-land/bugtracker#6417