01:05 -- Eragon joined #luanti
01:57 -- Verticen joined #luanti
01:59 -- Verticen_ joined #luanti
02:05 -- amfl2 joined #luanti
02:14 -- dabbill joined #luanti
02:20 -- lemonzest joined #luanti
03:10 <repetitivestrain> tx_miner: why is `local ipairs = ipairs` the first line... because this copies the global ipairs into an upvalue and removes a specialized reference to the global environment from hot paths, which yields a measurable performance improvement in loops that execute millions, if not tens of millions, of times per chunk
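A minimal sketch of the localization pattern being discussed; the function and values below are illustrative, not taken from the actual mod:

```lua
-- Cache the global ipairs in a local (an upvalue for inner functions),
-- so hot loop setup does not have to look it up in the global table.
local ipairs = ipairs

-- Hypothetical hot function: sums the values of an array-like table.
local function sum(values)
    local total = 0
    for _, v in ipairs(values) do
        total = total + v
    end
    return total
end

print(sum({1, 2, 3, 4}))  --> 10
```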
03:11 <[MatrxMT]> <Blockhead256> yes, "localizing" variables was discussed
03:11 <repetitivestrain> luajit will not localize `ipairs' by itself, for doing so would interfere with lua's semantics, but it performs table reference specialization on the global environment
03:12 <repetitivestrain> however an HREFK is still six to seven instructions and every one of the functions there is liable to be invoked millions of times per chunk, and localizing all such cfuncs into upvalues yields a very appreciable performance increase
03:43 <MTDiscord> <tx_miner> that was my question
03:43 <MTDiscord> <tx_miner> but it's not my code
03:43 <repetitivestrain> i'm answering your question, yes
03:43 <repetitivestrain> it's for performance
04:00 -- MTDiscord joined #luanti
04:43 <pgimeno> ipairs does not run on every loop iteration though
04:43 <pgimeno> in my tests, numeric loops are usually faster than ipairs
04:44 <pgimeno> I'll have to retest, though, as the last time I tried was with LuaJIT 2.0
04:47 <repetitivestrain> pgimeno: the loops initialized with ipairs run on every iteration
04:48 <repetitivestrain> of the terrain sampling loop
04:48 <repetitivestrain> also, ipairs and numeric loops generate identical or nearly identical assembly
04:48 <repetitivestrain> they cannot be faster or slower than each other, provided that the tables passed in are indeed arrays
04:50 <repetitivestrain> https://paste.debian.net/1401901/
04:50 <repetitivestrain> e.g., running this with luajit foo.lua yields:
04:51 <repetitivestrain> running this four separate times with luajit yields: 36.521, 37.824; 37.187, 36.856; 36.775, 36.745; 37.202, 37.001
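The pasted benchmark itself is not reproduced in the log; a minimal sketch of such a comparison (assumed structure, not the actual pasted code) that times a numeric for loop against an ipairs loop over the same array:

```lua
-- Hypothetical benchmark: numeric loop vs. ipairs loop over one array.
-- Sizes and repetition counts are arbitrary.
local N, REPS = 64, 5e6
local t = {}
for i = 1, N do t[i] = i * 0.5 end

local function bench(f)
    local start = os.clock()
    local acc = 0
    for _ = 1, REPS do acc = acc + f(t) end
    return os.clock() - start, acc
end

local numeric = bench(function(a)
    local s = 0
    for i = 1, #a do s = s + a[i] end
    return s
end)

local iter = bench(function(a)
    local s = 0
    for _, v in ipairs(a) do s = s + v end
    return s
end)

print(("numeric: %.3f s, ipairs: %.3f s"):format(numeric, iter))
```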
05:02 -- SFENCE_arch joined #luanti
05:13 -- FeXoR joined #luanti
05:55 <pgimeno> the loops run on every iteration (of course, that's what loops do), but the ipairs function itself is called just once per loop, not per iteration, so it makes little sense to localize it
05:57 <repetitivestrain> pgimeno: the loops themselves are _initialized_ on every iteration of the terrain sampling loop
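In other words, the claim is that the ipairs loops sit inside an outer sampling loop, so each inner loop is set up anew, and the global ipairs looked up anew unless localized, on every outer iteration. A hypothetical illustration of that shape (all names are made up, not from the mod):

```lua
-- Assumed shape of the hot path: an outer sampling loop that re-enters
-- short ipairs loops on every iteration. Without "local ipairs = ipairs",
-- the global lookup happens each time an inner loop is entered.
local ipairs = ipairs

local function sample_column(samplers, x, z)
    local total = 0
    for _, sampler in ipairs(samplers) do  -- re-initialized per (x, z)
        total = total + sampler(x, z)
    end
    return total
end

local function generate(samplers, size)
    local out = {}
    for z = 1, size do
        for x = 1, size do
            out[#out + 1] = sample_column(samplers, x, z)
        end
    end
    return out
end

print(#generate({ function(x, z) return x + z end }, 4))  --> 16
```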
05:57 <cheapie> I tested with repetitivestrain's benchmark code, localization was only like a 5% speed boost
05:57 <repetitivestrain> 5% or 1.3 seconds is already 65 ms
05:58 <repetitivestrain> of 1.3 seconds*
05:58 <repetitivestrain> and the loops in question are short enough that luajit unrolls most of them, so that there is no loop at all, but an unnecessary HREFK and comparison against ipairs_aux before they are entered
06:00 <repetitivestrain> https://codeberg.org/mineclonia/mineclonia/src/branch/main/mods/MAPGEN/mcl_levelgen/terrain.lua#L974
06:00 <repetitivestrain> here, for example
06:03 <repetitivestrain> on a completely idle system, for instance, generating 60 mapchunks in a standalone testing environment requires 44 seconds with all instances of ipairs localized, and 47 without
06:05 -- nekobit joined #luanti
06:11 -- nekobit joined #luanti
06:13 <pgimeno> I consistently get a 10% slowdown when using ipairs here: http://www.formauri.es/personal/pgimeno/pastes/benchmark-ipairs.lua
06:15 <pgimeno> and the localization of ipairs does not influence the outcome
06:16 <repetitivestrain> obviously, since your loop only executes once
06:17 <repetitivestrain> but your benchmark is far too fast to yield any meaningful results. over five runs, i received: 7.1e-05, 7e-05, 7.1e-05, 7.2e-05, 7.2e-05, 7e-05, 6.9e-05, 7e-05
06:18 <pgimeno> in my system it's pretty consistently 10% (linux)
06:19 <repetitivestrain> my system is fedora 42 on an amd zen 4 cpu
06:19 <pgimeno> 0.000104 vs 0.000121; 0.000111 vs 0.00012; 0.000102 vs 0.000132; 0.000102 vs 0.000119; 0.000109 vs 0.00012; and so on
06:21 <pgimeno> "<repetitivestrain> and the loops in question are short enough that luajit unrolls most of them" - I think you got this wrong, short loops *hurt* performance as they can't be compiled (unless things have changed pretty radically lately)
06:22 <repetitivestrain> how do you mean? luajit will unroll short loops completely, and in fact this is how it treats all loops it encounters during the execution of a root trace
06:22 -- nekobit joined #luanti
06:23 <repetitivestrain> because the only alternative is to abort the trace and wait for the loop's hotcount to trigger and begin recording there
06:23 -- nekobit joined #luanti
06:27 <pgimeno> this has been retired and is six years old, but many of the principles still apply: https://web.archive.org/web/20190309163035/http://wiki.luajit.org/Numerical-Computing-Performance-Guide
06:27 <repetitivestrain> i've read this, and i don't see how it contradicts anything i've said
06:28 <pgimeno> - Avoid inner loops with low iteration count (< 10).
06:28 <pgimeno> - Use plain 'for i=start,stop,step do ... end' loops.
06:28 <pgimeno> - Prefer plain array indexing, e.g. 'a[i+2]'.
06:30 <repetitivestrain> the first instance does not apply when the iteration count is static, and the second and the third are meant to be an injunction against pointer arithmetic, not ipairs
06:33 <repetitivestrain> https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_record.c#L625
06:34 <repetitivestrain> all of my low-trip-count unrolled loops either hit this branch (as luajit was designed to do) or originate in side traces, where loops must be unrolled anyway
06:43 <repetitivestrain> here, for example, is an instance of interpolator_update_y (an ipairs loop with a tripcount of 8) being unrolled in a side trace occasioned by interpolator_update_z returning (which is manually unrolled, as carefully studying the jit compiler's traces revealed it to be necessary): https://paste.debian.net/1401911/
06:47 <pgimeno> you may be right about inner loops with a low static iteration count, however I don't see anything that suggests that "Use plain 'for i=start,stop,step do ... end' loops" applies to pointer arithmetic, and in fact my benchmarks suggest otherwise
06:48 <repetitivestrain> well perhaps because, on my system, both types of loops generate almost identical assembly with indistinguishable performance?
06:50 <repetitivestrain> the only difference between the IR and assembly generated from ipairs is an additional ABC, because it is perfectly legitimate for the length of an array to be altered if it is accessed within an iterator, which is completely predictable for the cpu and will probably be eliminated if no array is written to within the loop body
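ABC here is LuaJIT's array-bounds-check IR instruction; it stays in the ipairs trace because the array can legitimately be shortened while it is being iterated. A contrived sketch of why the check cannot simply be dropped:

```lua
-- Contrived example: shrinking the array from inside the loop body.
-- ipairs stops at the first nil, so the per-access bounds check is what
-- keeps this well-defined instead of reading past the end.
local t = {10, 20, 30, 40, 50}
for i, v in ipairs(t) do
    print(i, v)
    if i == 2 then
        t[4] = nil  -- the loop now terminates after index 3
    end
end
```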
06:53 <pgimeno> I modified your code like this: http://www.formauri.es/personal/pgimeno/pastes/1401901.lua
06:53 <pgimeno> I got this result over 100 runs: http://www.formauri.es/personal/pgimeno/pastes/ipairs-vs-numeric-benchmark.txt
06:56 <MinetestBot> [git] sfan5 -> luanti-org/luanti: Restore BlendOperation in shadow rendering 0f943e5 https://github.com/luanti-org/luanti/commit/0f943e5810e408444ce08023de090bdbaab29705 (2025-10-21T06:56:05Z)
06:56 <pgimeno> (the only difference between your code and mine is the formatting of the results)
06:56 <repetitivestrain> 37.148 vs 37.279, 0.3514% difference
06:56 <repetitivestrain> 36.84 vs 37.153, 0.8425% difference
06:56 <repetitivestrain> 43.214 vs 37.727, -14.54% difference
06:56 <repetitivestrain> 39.077 vs 37.707, -3.633% difference
06:57 <repetitivestrain> 36.698 vs 37.008, 0.8377% difference
06:57 <repetitivestrain> 36.642 vs 37.033, 1.056% difference
06:57 <repetitivestrain> 37.082 vs 39.76, 6.735% difference
06:57 <repetitivestrain> 36.489 vs 36.808, 0.8667% difference
06:57 <repetitivestrain> 36.659 vs 38.337, 4.377% difference
06:57 <repetitivestrain> 36.396 vs 37.51, 2.97% difference
06:57 <repetitivestrain> on my laptop, which is currently quite idle, there are outliers in both directions
06:57 <repetitivestrain> but they come within a hair's breadth of each other
06:58 <repetitivestrain> in the instances that are not obviously anomalous
07:00 <pgimeno> I'll fetch current head and recompile luajit to retest, this is a version from last year
07:00 <repetitivestrain> i don't think anything has changed much in luajit since
07:01 <repetitivestrain> neither the ipairs_aux ffunc recorder nor lj_record_idx & company have changed since last year
07:01 <pgimeno> then what do you think makes the difference? the CPU?
07:01 <repetitivestrain> it's possible, yeah
07:02 <pgimeno> model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
07:02 <repetitivestrain> model name: AMD Ryzen 9 7940HX with Radeon Graphics
07:04 <repetitivestrain> as i read the assembler, the abc_invar fold rule cannot fold the extra abc operation that is emitted in the ipairs case, but it does in the purely numeric loops that exist in my terrain generator (and obviously there are no bounds checks when they are unrolled completely)
07:08 <repetitivestrain> the assembly generated is functionally equivalent, but register allocation is poorer in ipairs's case: the array value is moved into rbx before the loop body is executed, then into r13, and from r13 into (%rsi)
07:09 <repetitivestrain> my cpu probably is just better at register renaming
07:09 <repetitivestrain> on sparc (a risc architecture, to which i ported luajit a year ago for professional reasons) there is no difference between the generated assembly at all
07:11 <repetitivestrain> rsi holding the address of z's array part + 1
07:12 -- YuGiOhJCJ joined #luanti
07:54 -- fluxionary joined #luanti
09:29 -- jluc joined #luanti
10:01 -- mrkubax10 joined #luanti
10:05 -- mrkubax10 joined #luanti
10:44 <MTDiscord> <weaselwells> What is this?
11:01 <FeXoR> weaselwells: This is a chat, Luanti is a voxel game.
11:02 <SwissalpS> *game engine ;)
11:03 <[MatrxMT]> <Blockhead256> and this is the IRC channel, which has the most people but the fewest features, and is the only one you can view on the world wide web https://irc.luanti.org
11:27 -- erle joined #luanti
11:32 <MTDiscord> <bastrabun> What might be the cause that after 16 days of uptime a server eats up so much memory? https://ibb.co/wNkPMXvL
11:34 -- whosita joined #luanti
11:34 <MTDiscord> <bastrabun> Especially comparing "main" to the way less busy "test" and "build" servers.
11:42 <MTDiscord> <bastrabun> collectgarbage("count") says between 400 MB and 800 MB are used.
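For reference, collectgarbage("count") reports the Lua-side heap in kilobytes, so a figure like the one quoted typically comes from a small conversion such as the sketch below (the chat command name is made up):

```lua
-- Report the memory currently held by the Lua VM, in megabytes.
-- collectgarbage("count") returns kilobytes used by Lua.
core.register_chatcommand("luamem", {  -- command name is hypothetical
    description = "Show Lua heap usage",
    privs = {server = true},
    func = function()
        local mb = collectgarbage("count") / 1024
        return true, ("Lua heap: %.1f MB"):format(mb)
    end,
})
```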
11:43 <MTDiscord> <luatic> then there probably is a memory leak (or gross inefficiency) on the C++ side
11:44 <MTDiscord> <luatic> i suppose such a leak should, to a lesser extent, also apply to the test server.
11:45 <MTDiscord> <luatic> so you could try to compile luanti with leaksan, mess around a bit on the test server, and then shut down the test server to get the leaksan report.
11:45 <MTDiscord> <luatic> you can then send us that report and we can try to fix the leaks.
11:47 <MTDiscord> <luatic> if the issue persists, then there might be a trickier-to-find memory leak that is cleaned up at shutdown, but not while the server is still running. that would probably require heap profiling to resolve. (last i checked these tools were still a bit nasty to use, so i hope this isn't the case.)
11:48 <whosita> what if it's not lost pointers, but some other kind of leak? I tried running luanti under valgrind: there were a couple of leaks, but those were graphics related...
11:49 <MTDiscord> <luatic> whosita: as i said, in that case we would need to do heap profiling.
11:49 <whosita> maybe there's some easier hacky way than a heap profiler...
11:52 <MTDiscord> <bastrabun> We can throw any kind of profiling or other ways at the server as long as it does not interfere with the main server operation. Read: lag.
11:53 <erle> be aware that there are ways the lua side can *also* be leaky. cora once made a patch that created a burn timer for every fire node. running it for like 45 min or so on a machine with 2GB RAM revealed that, yes, fire replicating itself and creating more timers means it is allocating more and more memory.
11:54 <erle> cora's solution was to extinguish fire probabilistically, as that is constant-memory and essentially stateless
11:55 <erle> granted, it's a pathological case
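A rough sketch of the stateless, probabilistic approach erle describes, using an ABM instead of a per-node timer; the node name and numbers are illustrative, not cora's actual patch:

```lua
-- Instead of attaching a timer to every burning node (one piece of
-- state per node), an ABM visits fire nodes periodically and puts each
-- one out with some probability; no per-node state is kept at all.
core.register_abm({
    label = "probabilistic fire extinguish (sketch)",
    nodenames = {"fire:basic_flame"},  -- assumed node name
    interval = 5,   -- seconds between ABM passes
    chance = 10,    -- each pass, roughly 1 in 10 flames is handled
    action = function(pos)
        core.remove_node(pos)
    end,
})
```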
11:56 <MTDiscord> <luatic> Bastrabun: I think a heap profiler will be expensive unfortunately
11:56 <MTDiscord> <luatic> If you want to play around with it on the test server, you could follow the instructions on https://postbits.de/heap-profiling-with-tcmalloc.html
11:57 <whosita> I suspect it's not something we can easily reproduce on the test server, since it takes weeks of ~20-30 players actively doing all sorts of things
11:57 <erle> luatic do you have any knowledge about the death loop that the garbage collection gets into when it can't free memory? like, yes, you can avoid it by not filling RAM with crap, but it seems to me like turning a lack of RAM into lag is kinda weird.
11:57 <whosita> I wonder if there's a good way to guess what it might be by just directly examining memory of the running server
11:58 <MTDiscord> <luatic> erle: Of course, the problem is that there are many plausible culprits in a large game. So I'd rather collect some data.
11:58 <erle> whosita if you find it, tell me! most live debugging i have done with unmodified binaries was basically just scanmem.
11:58 <MTDiscord> <luatic> whosita: I'd hope that the phenomenon scales and can be observed to a lesser extent on the test server.
11:59 <MTDiscord> <bastrabun> We could log in a couple of accounts to the testserver and teleport them around, to load mapblocks.
11:59 <MTDiscord> <bastrabun> Accounts just standing at spawn do not create the same amount of memory allocation
11:59 <MTDiscord> <luatic> erle: As for GC, the alternative is terminating the application entirely?
12:00 <whosita> if it's some obscure mod doing something stupid which allocates memory C-side, that won't reproduce it
12:00 <erle> luatic or halting it, idk. i am a fan of productive crashes (i.e. crash if you *really* can't get out of it), as you may have noticed. not a fan of crashing when something is recoverable.
12:01 <MTDiscord> <luatic> well yes, it is recoverable: it is a classic time-memory tradeoff
12:01 <MTDiscord> <luatic> the more memory you have left over, the less frequently you need to do GC
12:01 <MTDiscord> <luatic> the problem is that the time a mark-and-sweep GC takes is proportional to the live memory
12:02 <MTDiscord> <luatic> so if you have very little memory left, you need to GC much more often, and each time you spend a lot of time to free a little memory
12:04 <MTDiscord> <luatic> generally, if you want to run GC'd languages performantly, as a rule of thumb, you should give the program at least twice as much memory as it really needs. the more you can give it, the better.
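In Lua 5.1/LuaJIT this time-memory tradeoff is exposed through the collector's pause and step multiplier; a hedged sketch of how a script might nudge those knobs (the values shown are the defaults, not a recommendation):

```lua
-- "setpause" controls how much the heap may grow, relative to live data,
-- before a new GC cycle starts; "setstepmul" controls how aggressively
-- each incremental step works. A larger pause means more memory is used
-- between cycles but less time is spent collecting.
print("Lua heap before:", collectgarbage("count"), "KB")

collectgarbage("setpause", 200)    -- start a cycle at 2x live data
collectgarbage("setstepmul", 200)  -- default step multiplier

-- Allocate and drop some garbage, then force a full collection.
for _ = 1, 1e5 do local _ = {math.random()} end
collectgarbage("collect")

print("Lua heap after:", collectgarbage("count"), "KB")
```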
12:13 -- heavygale joined #luanti
12:26 <whosita> I wonder if most servers crash more often than once a week, have fewer players, or are missing some mod we have
12:27 <[MatrxMT]> <Blockhead256> a lot of servers can run for a long time, but a lot of them will restart for mod updates and so on
12:45 <MTDiscord> <bastrabun> All of that can be measured. The log even knows how many dig or place operations the players do. The logfile for the past 16 days is 4.6 GB, but the rollback database is 51 GB. We don't store that in memory, right?
12:47 -- SFENCE_arch joined #luanti
12:52 -- mrkubax10 joined #luanti
13:12 -- PoochInquisitor joined #luanti
13:26 <SwissalpS> bastrabun considering luanti runs on RasPi and most servers don't have much more RAM, it's very unlikely that the log or rollback db is in RAM
13:28 <MTDiscord> <luatic> logs certainly aren't stored in memory (a small buffer will be, but that is negligible)
13:28 <MTDiscord> <bastrabun> Servers that run on a raspi most likely don't have that amount of data
13:29 <MTDiscord> <luatic> rollback db, would have to look into it to be sure, but it most probably isn't either. though it's always possible that there is some cache that isn't being evicted properly.
13:31 <MTDiscord> <bastrabun> That's what I'm asking. 30 GB of resident memory in 16 days can't really come from a chat cache, dig or place cache or similar. One way or another, full mapblocks must be involved. Either the network cache, the mapblock list or similar
13:31 <MTDiscord> <bastrabun> Looking at prometheus, I can see how many mapblocks are in memory currently
13:33 <MTDiscord> <bastrabun> Could be "active blocks" too. With ~15 players we currently have 40k loaded blocks and 3k active blocks
13:41 -- SFENCE joined #luanti
14:13 -- jaca122 joined #luanti
14:22 -- jaca122 joined #luanti
14:57 <[MatrxMT]> <birdlover32767> just found out you can lock yourself out of a singleplayer world by doing `/setpassword singleplayer abcdef`
14:57 <[MatrxMT]> <Blockhead256> heh
14:58 <[MatrxMT]> <Blockhead256> I remember that time we fixed banning yourself in singleplayer
14:58 <[MatrxMT]> <birdlover32767> at least it was a test world
14:58 <[MatrxMT]> <Blockhead256> umm maybe you can hack a Lua authentication handler together that resets the password, in pre-join
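A rough sketch of that idea as a tiny one-off rescue mod; whether a pre-join hook runs early enough for a locked-out world is untested here, so this simpler variant just overwrites the stored password as soon as the server starts (the new password is arbitrary):

```lua
-- One-off rescue mod: reset the stored password for "singleplayer" to a
-- known value on server start. Remove the mod after logging back in.
local name = "singleplayer"
local new_password = "letmein"  -- arbitrary; change as desired

core.after(0, function()
    core.set_player_password(name, core.get_password_hash(name, new_password))
    core.log("action", "Password for " .. name .. " has been reset")
end)
```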
14:59 <[MatrxMT]> <Blockhead256> or, you could launch the world in multiplayer
14:59 <[MatrxMT]> <birdlover32767> i was gonna say that
15:07 <[MatrxMT]> <birdlover32767> now while thinking about it... ban messages could be fancier if the disconnect menu was `hypertext[` instead of `label[`
15:29 <user333_> or just try not to ban yourself from your own worlds
15:29 <user333_> it can't be that hard
15:30 <[MatrxMT]> <Blockhead256> idk, what if your brother logged in and griefed you, you'd have to ban him
15:31 <user333_> don't let your brother use your PC ig
15:31 <[MatrxMT]> <Blockhead256> speaking of things to definitely not do, https://github.com/luanti-org/luanti/issues/16346
15:32 <user333_> yeah that freezes/crashes luanti when i try it in singleplayer...
15:37 <user333_> another thing that causes a ton of lag is when someone tries to lavacast on a server in generic MTG
15:38 <[MatrxMT]> <Blockhead256> it can certainly bring some of the potatoes that people run Luanti on to their knees
15:38 -- cx384 joined #luanti
15:38 <user333_> i ran a server on an RPi1, that's 512 MB of RAM and a 700MHz CPU
15:38 <[MatrxMT]> <birdlover32767> another one is abusing +100% bouncy nodes to speed up and launch, generating many blocks in the process
15:39 <[MatrxMT]> <Blockhead256> yeah I was playing on one of the few australian servers, pretty low end hardware.. then I came across a load of lava cooling interactions, and the server never recovered. Dropped off the list not long after I think...
15:40 <[MatrxMT]> <Blockhead256> with the bouncy nodes you can bounce your head and feet quite rapidly...
15:40 <[MatrxMT]> <birdlover32767> with enough bounciness you can make your velocity go to infinity
15:40 <[MatrxMT]> <Blockhead256> launch yourself so high you bonk your head on unloaded blocks
15:40 <[MatrxMT]> <birdlover32767> and this breaks your player
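For context, bounciness is just a node group; a minimal sketch of the kind of definition being abused here, assuming a bouncy value above 100 returns more speed than it receives (the mod name, node name, and texture are made up):

```lua
-- A node in the "bouncy" group with a value above 100 is the sort of
-- over-bouncy block described above: each bounce adds speed, so the
-- player keeps accelerating.
core.register_node("demo:superbouncy", {  -- mod/node name is hypothetical
    description = "Over-bouncy block (sketch)",
    tiles = {"default_wood.png"},          -- assumed texture name
    groups = {bouncy = 150, oddly_breakable_by_hand = 1},
})
```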
15:41 <user333_> you can obtain a ton of downwards velocity with https://github.com/luanti-org/luanti/issues/16197 , that gives a similar result
15:42 <[MatrxMT]> <birdlover32767> yahoo got a segfault
15:43 <[MatrxMT]> <birdlover32767> i got it while calling a nil function in on_deactivate
15:43 -- Coolyo joined #luanti
15:43 <user333_> you sound happy about it :P
15:44 <[MatrxMT]> <birdlover32767> at least that was an easy fix
15:44 -- Coolyo joined #luanti
15:46 -- cx384 joined #luanti
15:58 <MTDiscord> <luatic> huh? lua errors normally shouldn't be segfaults.
15:58 <[MatrxMT]> <birdlover32767> it probably had to do with being in an entity function
15:59 <[MatrxMT]> <birdlover32767> chatbridge: test
15:59 <[MatrxMT]> <Blockhead256> congrats, you pinged.. a bot account
16:03 -- SFENCE joined #luanti
16:06 -- SFENCE joined #luanti
16:39 -- SFENCE joined #luanti
16:50 -- ___nick___ joined #luanti
17:01 -- Talkless joined #luanti
17:19 -- mrkubax10 joined #luanti
17:29 -- mrkubax10 joined #luanti
18:27 -- SFENCE joined #luanti
18:35 <MTDiscord> <the4spaceconstants2181> that's just a case of faulty agent distinction
18:36 -- SFENCE joined #luanti
18:53 -- SFENCE joined #luanti
19:03 -- SFENCE joined #luanti
19:13 -- mrcheese joined #luanti
19:23 -- SFENCE joined #luanti
19:45 -- SFENCE joined #luanti
19:59 -- SFENCE joined #luanti
19:59 -- bwarden joined #luanti
20:10 -- SFENCE joined #luanti
20:16 -- silverwolf73827 joined #luanti
20:21 -- bgstack15 joined #luanti
20:30 -- SFENCE joined #luanti
22:32 -- panwolfram joined #luanti
22:40 -- j4n joined #luanti
23:16 -- stg-developer joined #luanti
23:40 -- SFENCE joined #luanti