OOM #4943
[711040.808986] oom-kill:constraint=CONSTRAINT_NONE,nodemask=(null),cpuset=/,mems_allowed=0,global_oom,task_memcg=/system.slice/cron.service,task=minetestserver,pid=1894647,uid=1001
[711040.809254] Out of memory: Killed process 1894647 (minetestserver) total-vm:54905192kB, anon-rss:38028856kB, file-rss:2304kB, shmem-rss:0kB, UID:1001 pgtables:106184kB oom_score_adj:0
[711044.436208] oom_reaper: reaped process 1894647 (minetestserver), now anon-rss:416kB, file-rss:792kB, shmem-rss:0kB
could turning on or tweaking (transparent) huge pages help here?
I assume this is because of the voice attack earlier?
Before the server crashed I saw Ravise fighting with three Voicers. I ran there and saw they were all frozen. I was able to run around there and out of the danger zone for at least twenty seconds, because I knew what would happen soon, before the server went down.
most minetest mods are pretty memory conscious, because luajit used to have a 1GiB limit. it no longer has that limit, but it'd be useful to see how much memory lua was using before the crash, before blaming a mod for this issue. we log how much memory lua is using every 30 seconds or something.
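For reference, the periodic logging mentioned here could look roughly like the minimal sketch below. It is illustrative only, not the actual implementation: the 30 second interval and the "lua is using ... MB" wording are taken from this thread, everything else is assumed.

```lua
-- illustrative sketch: log the Lua heap size every 30 seconds
-- collectgarbage("count") returns the memory currently used by Lua, in KiB
local INTERVAL = 30

local function log_lua_memory()
    local kib = collectgarbage("count")
    minetest.log("action", ("lua is using %.1f MB"):format(kib / 1024))
    minetest.after(INTERVAL, log_lua_memory)
end

minetest.after(INTERVAL, log_lua_memory)
```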
Most likely yes, that's what flux added, which logs "lua is using ... MB". But ...
Is there a way to see the memory allocation of a single mod?
What are lj_alloc_free and release_unused_segments? What's gc_sweep?
we can see what lua has allocated. i doubt we have a major lua memory leak. we log how much memory the lua environment is using already, see #4249. if you've got privs, you can also see how much memory lua is using w/ the /memory command. i don't remember ever seeing YL getting above 1GiB.

while implementing the spawnit mod, i've created a tool to approximate how much memory a particular data structure is using. i can't guarantee how accurate it is, and it's very expensive to use on complicated data structures.
see https://github.com/fluxionary/minetest-futil/blob/main/util/memory.lua
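A rough sketch of the general approach such a tool can take (not the actual futil implementation linked above): walk the data structure and sum up estimated per-value costs. The byte counts are guesses; real sizes depend on Lua/LuaJIT internals.

```lua
-- illustrative only: approximate the memory footprint of a Lua data structure
local function approximate_size(value, seen)
    seen = seen or {}
    local t = type(value)
    if t == "number" or t == "boolean" or t == "nil" then
        return 16
    elseif t == "string" then
        return 24 + #value
    elseif t == "table" then
        if seen[value] then
            return 0  -- already counted (shared or recursive reference)
        end
        seen[value] = true
        local total = 40  -- rough guess for table header overhead
        for k, v in pairs(value) do
            total = total + approximate_size(k, seen) + approximate_size(v, seen)
        end
        return total
    end
    return 0  -- functions, userdata etc. not accounted for
end
```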
i don't know what any of these terms mean exactly rn. the terms seem like things i should get to know better. gc_sweep makes me think of java and lisp, though shrug
gc_sweep would probably be part of the garbage collector in lua
Those 3 are luajit functions:
As the comment says, it seems to come from the "incremental mark and sweep garbage collection strategy".
No comment, but my guess: it marks an allocated memory chunk as unused and consolidates it with surrounding unused chunks.
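For context, the incremental collector can also be observed and nudged from plain Lua via the standard collectgarbage API; a small sketch (nothing Minetest-specific, purely illustrative):

```lua
-- standard Lua API: observe and drive the incremental mark-and-sweep collector
print(("lua heap before: %.1f KiB"):format(collectgarbage("count")))

collectgarbage("step", 100)   -- run one incremental GC step (argument is a size hint)
collectgarbage("collect")     -- force a full mark-and-sweep cycle

print(("lua heap after:  %.1f KiB"):format(collectgarbage("count")))
```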
Pure speculation:

If we are sure (are we?) that it's not lua memory usage, maybe memory is leaking on the C side of things, but caused by some lua code creating massive amounts of objects (that are allocated/freed lua-side?). Flux mentioned entities trying to spawn in inactive mapblocks and disappearing, can that be the cause? Maybe some mod spamming add_entity() on each globalstep if it fails? Or something similar?

This would be pure coincidence, but I learned that moremesecons_entity_detector does not have any limit on range and people use it for mob farms and such. Since it just allocates a list of all entities it found, and people use it with ranges like 5000 and in the nether mob farms, could that be the culprit? Or am I missing something?
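To make the suspected pattern concrete, here is a hypothetical sketch of an unbounded detector-style scan (illustrative only, not the actual moremesecons_entity_detector code; the globalstep usage, position, and range are assumptions). get_objects_inside_radius has to build a fresh table of every matching object each time it runs:

```lua
-- hypothetical illustration of the suspected pattern
local RANGE = 5000  -- player-configured range, no upper bound

minetest.register_globalstep(function(dtime)
    local pos = vector.new(0, 0, 0)  -- detector position, assumed for illustration
    -- this allocates a new Lua table of ObjectRefs on every call; with a
    -- 5000 node radius and a mob farm nearby, that list can get very large
    local objects = minetest.get_objects_inside_radius(pos, RANGE)
    for _, obj in ipairs(objects) do
        -- inspect entities here
    end
end)
```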
Since we discussed it in chat, Chache reported it:
#4988
UPD: tried creating massive amounts of detectors on the test server; it lags the server, but I can't see any other effects just by looking at dtime. Will need Alias to run perf and top or something X)
Not sure about the specific implementation of the lua GC, but garbage collectors often run in background threads or are always running intermittently. So it could just be a coincidence, although one symptom of getting close to a memory consumption limit is that the GC runs often and sometimes takes relatively long, trying hard to free some memory.
We spent a whole day running an array of almost 2000 entity detectors on the test server. It caused dtime to depend heavily on the number of active entities (as expected), but it did not seem to cause any leaks - memory usage also just corresponded to the number of mobs.
Kinda obvious results, but at least that was checked.
I'd like to add "the release of unused mapblocks" to the list of suspects.
The release of unused blocks is governed by this setting:
Our setting is server_unload_unused_data_timeout = 900, that's 15 minutes. So (after a restart) I teleported two accounts to random locations with this script:

This resulted in a growing number of blocks loaded into memory (and mapgen running wild) and also a growing memory usage, both in a linear fashion. Maximum memory was 21.5GB virtual and 19.4GB reserved memory for Minetest_test, and the maximum number of loaded blocks was around 1 million.
While the number of loaded mapblocks stayed pretty much constant after the initial "filling phase", the memory did not return to anything below those maximum values, even though I logged off both accounts.
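The teleport script itself is not quoted in the issue; purely as an illustration of the kind of test described, a hypothetical random-teleport loop might look like this (the account name, interval and radius are assumptions):

```lua
-- hypothetical sketch of a random-teleport loop, not the actual test script
local INTERVAL = 10      -- seconds between teleports, assumed
local RADIUS   = 30000   -- random target range, assumed

local function hop(name)
    local player = minetest.get_player_by_name(name)
    if not player then
        return  -- player logged off, stop hopping
    end
    player:set_pos({
        x = math.random(-RADIUS, RADIUS),
        y = 100,
        z = math.random(-RADIUS, RADIUS),
    })
    minetest.after(INTERVAL, hop, name)
end

minetest.after(INTERVAL, hop, "testaccount1")
```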
perf top:
It is possible that the Testserver is still busy with generating mapblocks and storing them in the database. So until MapSector::getBlocks goes down to a more reasonable value, I'll wait and see what the memory does at that point.
Edit: 3 hours later, with no one on the server, the virtual memory is still at 21.3GB, the reserved memory at 6.8GB. For comparison, the Main server requires 14.5GB virtual and 10.9GB reserved memory.
such large ranges shouldn't have a meaningful effect on long-term memory usage, but the performance considerations aren't great. see #3723 for relevant discussion.
perhaps we should ask the engine folks to add a maximum memory usage limit for unused mapblock storage?
There is client_mapblock_limit with a default of 7500, regulating how many blocks the client can keep in memory. Implementing a similar measure on the server should be straightforward ... or maybe not. The client can afford to just discard a block, as the server will simply send it again. But the server must save 'dirty' blocks to the database properly, so block discarding may cause a lot of disk writes ...

the # of mapblocks is not the same thing as the memory they use, which can vary wildly. w/ all the metadata that gets stored, single mapblocks can become performance bombs.
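For reference, the client-side setting mentioned above lives in minetest.conf; a minimal snippet with the default value stated in this thread:

```
# minetest.conf (client side): how many mapblocks the client may keep in memory
client_mapblock_limit = 7500
```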