Time Nick Message
03:10 repetitivestrain tx_miner: why is local ipairs = ipairs the first line... because this copies the global ipairs into an upvalue and removes a specialized reference to the global environment from hot paths, which yields a measurable performance improvement in loops that execute millions, if not tens of millions, of times per chunk
03:11 [MatrxMT] yes, "localizing" variables was discussed
03:11 repetitivestrain luajit will not localize `ipairs' by itself, for doing so would interfere with lua's semantics, but it performs table reference specialization on the global environment
03:12 repetitivestrain however an HREFK is still six to seven instructions, and every one of the functions there is liable to be invoked millions of times per chunk, and localizing all such cfuncs into upvalues yields a very appreciable performance increase
03:43 MTDiscord that was my question
03:43 MTDiscord but it's not my code
03:43 repetitivestrain i'm answering your question, yes
03:43 repetitivestrain it's for performance
04:43 pgimeno ipairs does not run on every loop iteration though
04:43 pgimeno in my tests, numeric loops are usually faster than ipairs
04:44 pgimeno I'll have to retest, though, as the last time I tried was with LuaJIT 2.0
04:47 repetitivestrain pgimeno: the loops initialized with ipairs run on every iteration
04:48 repetitivestrain of the terrain sampling loop
04:48 repetitivestrain also, ipairs and numeric loops generate identical or nearly identical assembly
04:48 repetitivestrain they cannot be faster or slower than each other, provided that the tables provided are indeed arrays
04:50 repetitivestrain https://paste.debian.net/1401901/
04:50 repetitivestrain e.g., running this with luajit foo.lua yields:
04:51 repetitivestrain running this four separate times with luajit yields: 36.521, 37.824; then 37.187, 36.856; then 36.775, 36.745; then 37.202, 37.001
05:55 pgimeno the loops run on every iteration (of course, that's what loops do), but the ipairs function itself is called just once per loop, not per iteration, so it makes little sense to localize it
05:57 repetitivestrain pgimeno: the loops themselves are _initialized_ on every iteration of the terrain sampling loop
05:57 cheapie I tested with repetitivestrain's benchmark code, localization was only like a 5% speed boost
05:57 repetitivestrain 5% of 1.3 seconds is already 65 ms
05:58 repetitivestrain and the loops in question are short enough that luajit unrolls most of them, so that there is no loop at all, but an unnecessary HREFK and comparison against ipairs_aux before they are entered
06:00 repetitivestrain https://codeberg.org/mineclonia/mineclonia/src/branch/main/mods/MAPGEN/mcl_levelgen/terrain.lua#L974
06:00 repetitivestrain here, for example
06:03 repetitivestrain on a completely idle system, for instance, generating 60 mapchunks in a standalone testing environment requires 44 seconds with all instances of ipairs localized, and 47 without
06:13 pgimeno I consistently get a 10% slowdown when using ipairs here: http://www.formauri.es/personal/pgimeno/pastes/benchmark-ipairs.lua
06:15 pgimeno and the localization of ipairs does not influence the outcome
06:16 repetitivestrain obviously, since your loop only executes once
06:17 repetitivestrain but your benchmark is far too fast to yield any meaningful results. over five runs, i received: 7.1e-05, 7e-05, then 7.1e-05, and 7.2e-05, and 7.2e-05, 7e-05, and 6.9e-05, 7e-05
06:18 pgimeno in my system it's pretty consistently 10% (linux)
06:19 repetitivestrain my system is fedora 42 on an amd zen 4 cpu
06:19 pgimeno 0.000104 vs 0.000121; then 0.000111 vs 0.00012; then 0.000102 vs 0.000132; then 0.000102 vs 0.000119; then 0.000109 vs 0.00012, and so on
06:21 pgimeno "and the loops in question are short enough that luajit unrolls most of them" - I think you got this wrong, short loops *hurt* performance as they can't be compiled (unless things have changed pretty radically lately)
06:22 repetitivestrain how do you mean? luajit will unroll short loops completely, and in fact this is how it treats all loops it encounters during the execution of a root trace
06:23 repetitivestrain because the only alternative is to abort the trace and wait for the loop's hotcount to trigger and begin recording there
06:27 pgimeno this has been retired and is six years old, but many of the principles still apply: https://web.archive.org/web/20190309163035/http://wiki.luajit.org/Numerical-Computing-Performance-Guide
06:27 repetitivestrain i've read this, and i don't see how it contradicts anything i've said
06:28 pgimeno - Avoid inner loops with low iteration count (< 10).
06:28 pgimeno - Use plain 'for i=start,stop,step do ... end' loops.
06:28 pgimeno - Prefer plain array indexing, e.g. 'a[i+2]'.
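The localization being debated above can be sketched as follows; the `sum` function and the table contents are illustrative, not taken from the mod's actual code:

```lua
-- Copy the global into a local. Inside nested functions it becomes an
-- upvalue, so hot loops resolve ipairs through a cheap upvalue slot instead
-- of a lookup in the global environment (an HREFK plus a guard, in LuaJIT
-- IR terms).
local ipairs = ipairs

local function sum(t)
  local acc = 0
  for _, v in ipairs(t) do  -- resolved via the upvalue, not _G
    acc = acc + v
  end
  return acc
end

print(sum({1, 2, 3}))  --> 6
```

The semantics are unchanged unless some other code later reassigns the global `ipairs`, which is exactly why LuaJIT cannot perform this rewrite automatically.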
06:30 repetitivestrain the first instance does not apply when the iteration count is static, and the second and the third are meant to be an injunction against pointer arithmetic, not ipairs
06:33 repetitivestrain https://github.com/LuaJIT/LuaJIT/blob/v2.1/src/lj_record.c#L625
06:34 repetitivestrain all of my low-trip-count unrolled loops either hit this branch (as luajit was designed to do) or originate in side traces, where loops must be unrolled anyway
06:43 repetitivestrain here, for example, is an instance of interpolator_update_y (an ipairs loop with a tripcount of 8) being unrolled in a side trace occasioned by interpolator_update_z returning (which is manually unrolled, as carefully studying the jit compiler's traces revealed it to be necessary): https://paste.debian.net/1401911/
06:47 pgimeno you may be right about inner loops with a low static iteration count, however I don't see anything that suggests that "Use plain 'for i=start,stop,step do ... end' loops" applies to pointer arithmetic, and in fact my benchmarks suggest otherwise
06:48 repetitivestrain well, perhaps because, on my system, both types of loops generate almost identical assembly with indistinguishable performance?
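A minimal timing harness in the spirit of the benchmarks traded above (the pasted scripts themselves are not reproduced here); the table size, repeat count, and use of os.clock are illustrative choices:

```lua
-- Compare a plain numeric for-loop against an ipairs loop over the same
-- array. On LuaJIT both should compile to near-identical traces when the
-- table is a proper array.
local t = {}
for i = 1, 1000 do t[i] = i end

local function numeric_sum(a)
  local acc = 0
  for i = 1, #a do acc = acc + a[i] end
  return acc
end

local function ipairs_sum(a)
  local acc = 0
  for _, v in ipairs(a) do acc = acc + v end
  return acc
end

local function bench(f)
  local start = os.clock()
  local r
  for _ = 1, 10000 do r = f(t) end
  return os.clock() - start, r
end

local tn, rn = bench(numeric_sum)
local ti, ri = bench(ipairs_sum)
assert(rn == ri)  -- both loop styles must agree on the result
print(("numeric: %.4fs  ipairs: %.4fs"):format(tn, ti))
```

As the wildly divergent wall-clock numbers in the channel suggest, a harness this small is dominated by noise; runs should be repeated many times on an idle machine before drawing conclusions.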
06:50 repetitivestrain the only difference between the IR and assembly generated from ipairs is an additional ABC (array bounds check), because it is perfectly legitimate for the length of an array to be altered if it is accessed within an iterator; this is completely predictable for the cpu and will probably be eliminated if no array is written to within the loop body
06:53 pgimeno I modified your code like this: http://www.formauri.es/personal/pgimeno/pastes/1401901.lua
06:53 pgimeno I got this result over 100 runs: http://www.formauri.es/personal/pgimeno/pastes/ipairs-vs-numeric-benchmark.txt
06:56 MinetestBot [git] sfan5 -> luanti-org/luanti: Restore BlendOperation in shadow rendering 0f943e5 https://github.com/luanti-org/luanti/commit/0f943e5810e408444ce08023de090bdbaab29705 (2025-10-21T06:56:05Z)
06:56 pgimeno (the only difference between your code and mine is the formatting of the results)
06:56 repetitivestrain 37.148 vs 37.279, 0.3514% difference
06:56 repetitivestrain 36.84 vs 37.153, 0.8425% difference
06:56 repetitivestrain 43.214 vs 37.727, -14.54% difference
06:56 repetitivestrain 39.077 vs 37.707, -3.633% difference
06:57 repetitivestrain 36.698 vs 37.008, 0.8377% difference
06:57 repetitivestrain 36.642 vs 37.033, 1.056% difference
06:57 repetitivestrain 37.082 vs 39.76, 6.735% difference
06:57 repetitivestrain 36.489 vs 36.808, 0.8667% difference
06:57 repetitivestrain 36.659 vs 38.337, 4.377% difference
06:57 repetitivestrain 36.396 vs 37.51, 2.97% difference
06:57 repetitivestrain on my laptop, which is currently quite idle, there are outliers in both directions
06:57 repetitivestrain but they come within a hair's breadth of each other
06:58 repetitivestrain in the instances that are not obviously anomalous
07:00 pgimeno I'll fetch current head and recompile luajit to retest, this is a version from last year
07:00 repetitivestrain i don't think anything has changed much in luajit since
07:01 repetitivestrain neither the ipairs_aux ffunc recorder nor lj_record_idx & company have changed since last year
07:01 pgimeno then what do you think makes the difference? the CPU?
07:01 repetitivestrain it's possible, yeah
07:02 pgimeno model name : Intel(R) Core(TM) i7-4790 CPU @ 3.60GHz
07:02 repetitivestrain model name : AMD Ryzen 9 7940HX with Radeon Graphics
07:04 repetitivestrain as i read the assembler, the abc_invar fold rule cannot fold the extra abc operation that is emitted in the ipairs case, but it does in the purely numeric loops that exist in my terrain generator (and obviously there are no bounds checks when they are unrolled completely)
07:08 repetitivestrain the assembly generated is functionally equivalent, but register allocation is poorer in ipairs's case: the array value is moved into rbx before the loop body is executed, then into r13, and from r13 into (%rsi)
07:09 repetitivestrain my cpu probably is just better at register renaming
07:09 repetitivestrain on sparc (a risc architecture, to which i ported luajit a year ago for professional reasons) there is no difference between the generated assembly at all
07:11 repetitivestrain (rsi holding the address of z's array part + 1)
10:44 MTDiscord What is this?
11:01 FeXoR weaselwells: This is a chat, Luanti is a voxel game.
11:02 SwissalpS *game engine ;)
11:03 [MatrxMT] and this is the IRC channel, which has the most people but the fewest features, and is the only one you can view on the world wide web https://irc.luanti.org
11:32 MTDiscord What might be the cause that after 16 days of uptime a server eats up so much memory? https://ibb.co/wNkPMXvL
11:34 MTDiscord Especially comparing "main" to the way less busy "test" and "build" servers.
11:42 MTDiscord collectgarbage("count") says between 400 MB and 800 MB are used.
11:43 MTDiscord then there probably is a memory leak (or gross inefficiency) on the C++ side
11:44 MTDiscord i suppose such a leak should, to a lesser extent, also apply to the test server.
11:45 MTDiscord so you could try to compile luanti with leaksan, mess around a bit on the test server, and then shut down the test server to get the leaksan report.
11:45 MTDiscord you can then send us that report and we can try to fix the leaks.
11:47 MTDiscord if the issue persists, then there might be a trickier-to-find memory leak that is cleaned up at shutdown, but not while the server is still running. that would probably require heap profiling to resolve. (last i checked, these tools were still a bit nasty to use, so i hope this isn't the case.)
11:48 whosita what if it's not lost pointers, but another kind of leak? I tried running luanti under valgrind: there were a couple of leaks, but those were graphics related...
11:49 MTDiscord whosita: as i said, in that case we would need to do heap profiling.
11:49 whosita maybe there's some easier hacky way than a heap profiler...
11:52 MTDiscord We can throw any kind of profiling or other ways at the server as long as it does not interfere with the main server operation. Read: lag.
11:53 erle be aware that there are ways the lua side can *also* be leaky. cora once made a patch that created a burn timer for every fire node. running it for like 45 min or so on a machine with 2GB RAM revealed that, yes, fire replicating itself and creating more timers means it is allocating more and more memory.
11:54 erle cora's solution was to extinguish fire probabilistically, as that is constant-memory and essentially stateless
11:55 erle granted, it's a pathological case
11:56 MTDiscord Bastrabun: I think a heap profiler will be expensive, unfortunately
11:56 MTDiscord If you want to play around with it on the test server, you could follow the instructions on https://postbits.de/heap-profiling-with-tcmalloc.html
11:57 whosita I suspect it's not something we can easily reproduce on the test server, since it takes weeks of ~20-30 players actively doing all sorts of things
11:57 erle luatic do you have any knowledge about the death loop that the garbage collection gets into when it can't free memory? like, yes, you can avoid it by not filling RAM with crap, but it seems to me like turning lack of RAM into lag is kinda weird.
11:57 whosita I wonder if there's a good way to guess what it might be by just directly examining memory of the running server
11:58 MTDiscord erle: Of course, the problem is that there are many plausible culprits in a large game. So I'd rather collect some data.
11:58 erle whosita if you find it, tell me! most live debugging i have done with unmodified binaries was basically just scanmem.
11:58 MTDiscord whosita: I'd hope that the phenomenon scales and can be observed to a lesser extent on the test server.
11:59 MTDiscord We could log in a couple of accounts to the test server and teleport them around, to load mapblocks.
11:59 MTDiscord Just accounts standing at spawn do not create the same amount of memory allocation
11:59 MTDiscord erle: As for GC, the alternative is terminating the application entirely?
12:00 whosita if it's some obscure mod doing something stupid which allocates memory C-side, that won't reproduce it
12:00 erle luatic or halting it, idk. i am a fan of productive crashes (i.e. crash if you *really* can't get out of it), as you may have noticed. not a fan of crashing when something is recoverable.
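The constant-memory idea described above (per-node chance instead of per-node timers) can be illustrated with a pure-Lua simulation; this is a sketch of the principle, not cora's actual patch, and the node count, chance, and scan count are made up:

```lua
-- Instead of keeping one timer object per burning node (memory grows with
-- the fire), each periodic scan gives every burning node a fixed chance to
-- go out. The only state is the set of burning nodes itself.
math.randomseed(42)

local burning = {}
for i = 1, 1000 do burning[i] = true end

local function scan(chance)
  -- each node is extinguished with probability 1/chance per scan
  for pos in pairs(burning) do
    if math.random(chance) == 1 then
      burning[pos] = nil
    end
  end
end

local function count(t)
  local n = 0
  for _ in pairs(t) do n = n + 1 end
  return n
end

for _ = 1, 50 do scan(8) end
print("still burning:", count(burning))
```

In the engine this would be an ABM with `interval` and `chance` fields rather than a loop, which is why it stays stateless: nothing is remembered between scans.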
12:01 MTDiscord well yes, it is recoverable: it is a classic time-memory tradeoff
12:01 MTDiscord the more memory you have left over, the less frequently you need to run GC
12:01 MTDiscord the problem is that the time a mark-and-sweep GC takes is proportional to the live memory
12:02 MTDiscord so if you have very little memory left, you need to GC much more often, and each time you spend a lot of time to free a little memory
12:04 MTDiscord generally, if you want to run GC'd languages performantly, as a rule of thumb, you should give the program at least twice as much memory as it really needs. the more you can give it, the better.
12:26 whosita I wonder if most servers restart more often than once a week, have fewer players, or lack some mod we have
12:27 [MatrxMT] a lot of servers can run for a long time, but a lot of them will restart for mod updates and so on
12:45 MTDiscord All of that can be measured. The log even knows how many dig or place operations the players do. The logfile for the past 16 days is 4.6 GB, but the rollback database is 51 GB. We don't store that in memory, right?
13:26 SwissalpS bastrabun considering luanti runs on a RasPi and most servers don't have much more RAM, it's very unlikely that the log or rollback db is in RAM
13:28 MTDiscord logs certainly aren't stored in memory (a small buffer will be, but that is negligible)
13:28 MTDiscord Servers that run on a raspi most likely don't have that amount of data
13:29 MTDiscord rollback db, I would have to look into it to be sure, but it most probably isn't either. though it's always possible that there is some cache that isn't being evicted properly.
13:31 MTDiscord That's what I'm asking. 30 GB of resident memory in 16 days can't really come from a chat cache, dig or place cache, or similar. One way or another, full mapblocks must be involved. Either a network cache, a mapblock list, or similar
13:31 MTDiscord Looking at prometheus, I can see how many mapblocks are in memory currently
13:33 MTDiscord Could be "active blocks" too. With ~15 players we currently have 40k loaded blocks and 3k active blocks
14:57 [MatrxMT] just found out you can lock yourself out of a singleplayer world by doing `/setpassword singleplayer abcdef`
14:57 [MatrxMT] heh
14:58 [MatrxMT] I remember that time we fixed banning yourself in singleplayer
14:58 [MatrxMT] at least it was a test world
14:58 [MatrxMT] umm, maybe you can hack together a Lua authentication handler that resets the password in pre-join
14:59 [MatrxMT] or, you could launch the world in multiplayer
14:59 [MatrxMT] i was gonna say that
15:07 [MatrxMT] now while thinking about it... ban messages could be fancier if the disconnect menu was `hypertext[` instead of `label[`
15:29 user333_ or just try not to ban yourself from your own worlds
15:29 user333_ it can't be that hard
15:30 [MatrxMT] idk, what if your brother logged in and griefed you, you'd have to ban him
15:31 user333_ don't let your brother use your PC ig
15:31 [MatrxMT] speaking of things to definitely not do, https://github.com/luanti-org/luanti/issues/16346
15:32 user333_ yeah, that freezes/crashes luanti when i try it in singleplayer...
15:37 user333_ another thing that causes a ton of lag is when someone tries to lavacast on a server in generic MTG
15:38 [MatrxMT] it can certainly bring some of the potatoes that people run Luanti on to their knees
15:38 user333_ i ran a server on an RPi1, that's 512 MB of RAM and a 700 MHz CPU
15:38 [MatrxMT] another one is abusing +100% bouncy nodes to speed up and launch, generating many blocks in the process
15:39 [MatrxMT] yeah, I was playing on one of the few australian servers, pretty low-end hardware.. then I came across a load of lava cooling interactions, and the server never recovered. Dropped off the list not long after, I think...
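For reference, the GC time-memory tradeoff described in the 12:01-12:04 messages maps directly onto Lua's standard collector knobs; the values shown below are the stock defaults and are for illustration only, not a recommendation for any particular server:

```lua
-- Query live Lua-side memory (reported in KiB) -- this is the number the
-- 11:42 message above is quoting.
local kb = collectgarbage("count")
print(("Lua heap: %.1f MiB"):format(kb / 1024))

-- A higher pause lets the heap grow further between collection cycles:
-- less CPU spent collecting, more memory held. 200 (percent) is the
-- stock default.
collectgarbage("setpause", 200)

-- stepmul controls how much work each incremental GC step does relative
-- to allocation; again, 200 is the stock default.
collectgarbage("setstepmul", 200)

-- Force a full cycle, e.g. to get a stable baseline before measuring a
-- suspected leak.
collectgarbage("collect")
```

The "death loop" erle describes is visible in these terms: as free memory shrinks, the collector must run full cycles ever more often, and each cycle's cost is proportional to the live heap, so throughput collapses before the process actually runs out of memory.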
15:40 [MatrxMT] with the bouncy nodes you can bounce your head and feet quite rapidly...
15:40 [MatrxMT] with enough bounciness you can make your velocity go to infinity
15:40 [MatrxMT] launch yourself so high you bonk your head on unloaded blocks
15:40 [MatrxMT] and this breaks your player
15:41 user333_ you can obtain a ton of downwards velocity with https://github.com/luanti-org/luanti/issues/16197 , that gives a similar result
15:42 [MatrxMT] yahoo, got a segfault
15:43 [MatrxMT] i got it while calling a nil function in on_deactivate
15:43 user333_ you sound happy about it :P
15:44 [MatrxMT] at least that was an easy fix
15:58 MTDiscord huh? lua errors normally shouldn't be segfaults.
15:58 [MatrxMT] it probably had to do with being in an entity function
15:59 [MatrxMT] chatbridge: test
15:59 [MatrxMT] congrats, you pinged.. a bot account
18:35 MTDiscord that's just a case of faulty agent distinction