Please separate compound fields and add modification date to the database #4526

Open
opened 2023-05-18 16:38:09 +00:00 by AliasAlreadyTaken · 8 comments

Problem

Currently a stored mapblock is an opaque amalgamation of blocks, entities, metadata and more in the database. Without in-depth knowledge of the format it is hard to select, debug or - when issues arise - modify. Avoiding compound fields is a widely accepted principle when it comes to databases.

Solutions

Please separate block data, entity data and metadata (like modification_date and creation_date). Maybe even node meta and everything else you could find that allows for reasonable separation.
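
To make the request a bit more concrete, here is a minimal sketch of what a separated layout could look like in SQLite. The table name `mapblocks` and all column names are hypothetical, picked only for illustration; nothing here reflects an existing Minetest schema.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative assumptions.
SCHEMA = """
CREATE TABLE mapblocks (
    pos               INTEGER PRIMARY KEY,  -- encoded block position
    node_data         BLOB NOT NULL,        -- the nodes themselves
    node_meta         BLOB,                 -- node metadata, if any
    entities          BLOB,                 -- stored entities, if any
    creation_date     INTEGER NOT NULL,     -- Unix timestamp
    modification_date INTEGER NOT NULL      -- Unix timestamp
);
"""

def open_world(path=":memory:"):
    """Open a database and create the hypothetical table."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn

def blocks_with_entities(conn):
    """Return positions of blocks that store any entity data."""
    rows = conn.execute(
        "SELECT pos FROM mapblocks WHERE entities IS NOT NULL ORDER BY pos"
    )
    return [pos for (pos,) in rows]

conn = open_world()
conn.executemany(
    "INSERT INTO mapblocks VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, b"nodes-a", None, None, 1684427889, 1684427889),
        (2, b"nodes-b", b"meta-b", b"ents-b", 1684427889, 1684432131),
    ],
)
print(blocks_with_entities(conn))
```

With separate columns, a question like "which blocks contain entities at all" becomes a plain `SELECT` instead of a parse of every blob.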

Alternatives

A similar smaller approach was discussed here: https://github.com/minetest/minetest/issues/10671

Not sure if only mapblocks are affected, but those are the most mysterious to me ;)

A modification_date and creation_date already exist on the player table, I'd like those added to the others as well. Sure, I could do that myself from database POV, but IMO everyone who wants to maintain their DB needs those values on most, if not every table.
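
For reference, the "do it myself from database POV" variant could be sketched roughly like this for SQLite. It assumes a `blocks(pos, data)` table, which is an assumption about the layout, and the trigger names are made up. Both an insert and an update trigger are used because a `REPLACE INTO` write is a delete plus insert and would not fire an update trigger.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Assumed table layout, with one added timestamp column.
CREATE TABLE blocks (
    pos  INTEGER PRIMARY KEY,
    data BLOB,
    modification_date INTEGER NOT NULL DEFAULT 0
);

-- Hypothetical triggers that stamp a row whenever its data is written.
CREATE TRIGGER blocks_touch_insert
AFTER INSERT ON blocks
BEGIN
    UPDATE blocks
    SET modification_date = CAST(strftime('%s', 'now') AS INTEGER)
    WHERE pos = NEW.pos;
END;

CREATE TRIGGER blocks_touch_update
AFTER UPDATE OF data ON blocks
BEGIN
    UPDATE blocks
    SET modification_date = CAST(strftime('%s', 'now') AS INTEGER)
    WHERE pos = NEW.pos;
END;
""")

def modification_date(conn, pos):
    """Return the stored Unix timestamp for one block."""
    row = conn.execute(
        "SELECT modification_date FROM blocks WHERE pos = ?", (pos,)
    ).fetchone()
    return row[0]

# The writer does not need to know about the extra column.
conn.execute("REPLACE INTO blocks (pos, data) VALUES (?, ?)", (42, b"v1"))
first_stamp = modification_date(conn, 42)
print(first_stamp)
```

This works without touching the engine, but every admin has to set it up per table and per backend, which is the reason for asking to have it built in.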

Additional context

This would break compat with older maps yet again, so - if implemented - needs much more thought than I most likely put into this request, by smarter people than me, in an effort to only break compat once and then hopefully live with it for a while.

Adding more fields might make the database use more storage for the same amount of world.

A default modification date could defend against all those "my hard disk was full, now my sqlite database is broken" cases and similar. Throwing away entities for /clearobjects would amount to deleting the entity field, instead of having to unpack the mapblock, do the cleanup and repack it.
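
As a rough illustration of the /clearobjects point, and assuming a hypothetical `mapblocks` table with a separate `entities` column, the storage side could shrink to a single statement. The real command presumably also has to deal with blocks and objects currently loaded in memory, which this sketch ignores.

```python
import sqlite3

def clear_objects(conn):
    """Drop all stored entity data; return the number of blocks touched."""
    cur = conn.execute(
        "UPDATE mapblocks SET entities = NULL WHERE entities IS NOT NULL"
    )
    conn.commit()
    return cur.rowcount

# Hypothetical table and sample rows, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapblocks (pos INTEGER PRIMARY KEY, node_data BLOB, entities BLOB)"
)
conn.executemany(
    "INSERT INTO mapblocks VALUES (?, ?, ?)",
    [(1, b"nodes-a", b"cow"), (2, b"nodes-b", None), (3, b"nodes-c", b"sheep")],
)
cleared = clear_objects(conn)
print(cleared)
```

The node data is never read or rewritten, so no unpack/repack step is needed.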

flux added the 2. prio/interesting label 2023-05-18 17:48:51 +00:00
Member

while i'm theoretically in support of this, there are some technical issues that need to be addressed. the data in a mapblock is deliberately chunked together so that it can be compressed more efficiently.

but there's certainly an argument to be made that heterogeneous data like node IDs, node metadata, and entity data is not actually a good target for being compressed together

Author
Owner

When it's a question of compression, I doubt entities and metadata and blocks have any overlap in their compression dictionaries (assuming the modern compression algos still use such a thing)

Member

I don't know enough about MT or modern DBs, so take this with a grain (or lots) of salt:

If we assume that mapblocks are loaded only when someone is near that block and saved when the player leaves - then in this scenario both nodes and entities inside a mapblock always load/unload together.
And if nodes/ents are always loaded/unloaded at the same time, then it makes sense to keep them close together "on disk" too. And you can't get much closer than just putting them in one blob: you get both with just one `seek()` and one `read()`.
There is probably a way to force an RDBMS to optimize for this, but it's beyond my knowledge. ~~(I tried searching "interleaving" or something, I don't really know)~~

Just my assumption about why the blob may make sense.


> Alias: When it's a question of compression, I doubt entities and metadata and blocks have any overlap in their compression dictionaries (assuming the modern compression algos still use such a thing)

They do overlap, because the compression algorithm doesn't care what kind of data it gets or what it is used for, as long as it can find patterns in that data that get repeated.
That's why some algorithms have a static dictionary of common patterns.
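
A quick way to check this on real data would be to compress the parts separately and together and compare sizes. Below is a minimal sketch using Python's `zlib`; the byte strings are made-up stand-ins that happen to share some strings, not real mapblock contents, and the helper name `compressed_sizes` is hypothetical.

```python
import zlib

def compressed_sizes(parts):
    """Return (separate, combined) zlib-compressed sizes in bytes."""
    separate = sum(len(zlib.compress(part)) for part in parts)
    combined = len(zlib.compress(b"".join(parts)))
    return separate, combined

# Made-up stand-ins for node data and node metadata that share some strings.
nodes = b"default:stone\x00default:dirt\x00default:chest\x00" * 50
meta = b'{"node":"default:chest","owner":"singleplayer"}' * 10

separate, combined = compressed_sizes([nodes, meta])
print(f"separate: {separate} bytes, combined: {combined} bytes")
```

How big the difference is depends entirely on the data: for tiny inputs the per-stream overhead dominates, and parts that share no patterns gain little from being compressed together.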

> whosit: If we assume that mapblocks are loaded only when someone is near that block and saved when player leaves - then in this scenario both nodes and entities inside a mapblock always load/unload together.

Yep, in that case it wouldn't make sense to separate them, and it would increase the size, because now you got 2 dictionaries etc to save.
Minetest already caches changes in memory and only sends the whole mapblock to the DB at a fixed interval, to reduce load and to make sure changes that happened between loading and unloading aren't completely lost if something goes wrong.

In general it is a complex thing to balance, with tradeoffs.
For example, a mob walking through a mapblock:

  • Only entities changed, so there is no need to save node data.
    Yeah, but most mapblocks contain plants that grow or nodes that change based on time of day, etc.
  • OK, let's check, and if so save the node data too.
    Do we really need to save a whole mapblock (4096 nodes) because 1 plant node changed?
  • ....

In the end you might come up with an acceptable balance between all those cases, just to notice that it works great with backend "A" but has poor performance on "B", etc.
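
The "only save what changed" idea from the list above could be sketched as per-part dirty tracking. Everything here is hypothetical (the `CachedBlock` class, the `mapblocks` table and its columns), and it says nothing about how well this would perform on any particular backend.

```python
import sqlite3

# Hypothetical column names; used as a whitelist when building the UPDATE.
COLUMNS = ("node_data", "node_meta", "entities")

class CachedBlock:
    """In-memory copy of one mapblock that remembers which parts changed."""

    def __init__(self, pos):
        self.pos = pos
        self.parts = {}
        self.dirty = set()

    def set_part(self, column, blob):
        if column not in COLUMNS:
            raise ValueError(f"unknown column: {column}")
        if self.parts.get(column) != blob:
            self.parts[column] = blob
            self.dirty.add(column)

    def flush(self, conn):
        """Write only the changed columns; return their names, sorted."""
        written = sorted(self.dirty)
        if written:
            assignments = ", ".join(f"{name} = ?" for name in written)
            values = [self.parts[name] for name in written] + [self.pos]
            conn.execute(
                f"UPDATE mapblocks SET {assignments} WHERE pos = ?", values
            )
            self.dirty.clear()
        return written

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mapblocks (pos INTEGER PRIMARY KEY,"
    " node_data BLOB, node_meta BLOB, entities BLOB)"
)
conn.execute("INSERT INTO mapblocks VALUES (7, ?, NULL, NULL)", (b"nodes",))

# A mob walks through the block: only the entity part changes.
block = CachedBlock(7)
block.set_part("entities", b"mob at x=3")
written = block.flush(conn)
print(written)
```

This covers the "mob walks through" case, but a growing plant would still mark the whole node part dirty, which is the tradeoff described above.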
Member

some links

current mapblock storage format documentation: https://github.com/minetest/minetest/blob/180ec92ef982d9fb5c6bdc789f381335f77823c1/doc/world_format.txt#L230-L475

map parser in go: https://pkg.go.dev/github.com/minetest-go/mapparser

rust library: https://lib.rs/crates/minetestworld

i thought there was a python library somewhere, but i can't currently find it.

postgres can have compressed [tablespaces](https://www.postgresql.org/docs/current/manage-ag-tablespaces.html), obviating the need to store compressed data in the db: https://postgrespro.com/docs/enterprise/12/cfs-usage

there's also [citus](https://docs.citusdata.com/en/stable/), which seems capable of creating compressed *columns*, though it seems like it's aimed at professional DBAs and might be hard to set up.
Member

i think trying to make major structural changes to the minetest database would require a huge effort, particularly if you want to create a way to migrate old databases to the new format. you'd have to write new code for every possible map backend as well.

> now you got 2 dictionaries etc to save.

i dug into the mapblock format again, and it's not the whole thing that's compressed, it just compresses the node data and node metadata, and it compresses those separately.


And here we go again

> > now you got 2 dictionaries etc to save.
>
> i dug into the mapblock format again, and it's not the whole thing that's compressed, it just compresses the node data and node metadata, and it compresses those separately.

You might also want to dig into my comment again and hopefully notice this time that it was a reply to whosit's assumption and that I never said minetest does that.
But yes, with that carefully quoted snippet of my sentence/text it might appear so.

Member

> You might also want to dig into my comment again and hopefully notice this time that it was a reply to whosit's assumption and that I never said minetest does that.
> But yes, with that carefully quoted snippet of my sentence/text it might appear so.

apologies, i wasn't meaning to correct you, i was just trying to make it clear what was currently happening. i didn't fully remember myself.

Reference: your-land/bugtracker#4526