Memory Leak - Production Mud

So we’re having an issue on a production mud with a clear memory leak in the current version of FluffOS. I’ll be honest - I’m not sure where to begin trying to track it down, so I’m coming here for a bit of guidance in figuring it out.

Reproducing is very, very easy: with this config/setup, take any large directory, load everything in it, then destruct it all - the memory doesn’t seem to be freed. mud_status(0) and mud_status(1) show the in-game usage clearing up, but the server-side memory usage keeps climbing slowly.
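For reference, something along these lines is roughly what we run to reproduce it - a minimal sketch with a placeholder path, not our actual lib code, assuming only the standard get_dir() / load_object() / destruct() efuns:

// Rough repro sketch: load every .c file under a directory,
// then destruct the resulting blueprints again.
void stress_load( string dir )
{
    foreach( string file in get_dir( dir + "/" ) )
    {
        object ob;
        if( strlen( file ) < 3 || file[<2..] != ".c" ) continue;       // only LPC source files
        if( catch( ob = load_object( dir + "/" + file ) ) ) continue;  // skip compile failures
        if( ob ) destruct( ob );
    }
}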

After about 15 days, the process hits roughly 16G of RAM and the kernel kills it off.

Boot-time memory usage is usually in the 20-100 MB range. After roughly two days of uptime, here’s where we stand:

system memory info:

     PID USER      PRI  NI  VIRT   RES   SHR S CPU% MEM%   TIME+  Command
    1205 bing       20   0 1742M  997M 10828 S  0.7 12.7 43:03.81 /OBFUSCATED/bin/driver /OBFUSCATED/bin/config

mud_status(0)

Sentences:                         15156   606240
Objects:                            4197  3711152
Prog blocks:                        1833  3027160
Arrays:                            44942  4134560
Classes:                            6086   576848
Mappings:                          14201  3437144
Mappings(nodes):                   41641
Interactives:                          4     9376
Memory used(bytes):     29504
All strings:                       94381  2254559 + 2330680 overhead
call out:                             10      640 (load_factor 0.078740)
                                         --------
Total:                                   20117863

mud_status(1)

current working directory: /OBFUSCATED/lib

add_message statistics
------------------------------
Calls to add_message:  1136568   Packets:  1136568   Average packet size: 96.62 bytes

Hash table of living objects:
-----------------------------
242 living named objects, average search length: 0.54

Apply lookup cache information
-------------------------------
% cache hits:         28.48
total lookup:     1492552388
cache hits:       425044657
cache size (bytes w/o overhead):     8319168

Object name hash table status:
------------------------------
Elements:        3899
Memory(bytes):     31192
Bucket count:    10273
Load factor:     0.379539

Heart beat information:
-----------------------
Number of objects with heart beat: 582.

All strings:
-------------------------        Strings    Bytes
All strings:                      102190  2513647 + 2518096 overhead
Total asked for                   338039  6182490
Space actually required/total string bytes 81%
Searches: 1823321741    Average search length: -1.046

Call out information:
---------------------
Number of allocated call outs:       11,      704 bytes.
Current handle map bucket: 127
Current handle map load_factor: 0.086614
Current object map bucket: 5087
Current object map load_factor: 0.082563
Number of garbage entry in object map: 409

Boot log:

========================================================================
Full Command Line: /OBFUSCATED/bin/driver /OBFUSCATED/bin/config
Boot Time: Sat Feb 13 08:03:59 2021
Version: fluffos v2019.20201121-27-g1aaafdcc (Linux/x86-64)
jemalloc Version: 5.2.1-0-gea6b3e973b477b8061e0076bb257dbd7f3faa756
ICU Version: 67.1
Backtrace support: libdw.
Core Dump: No, Max FD: 65535.
========================================================================
Final Debug Level: 0
Processing config file: /OBFUSCATED/config
maximum local variables: invalid new value, resetting to default.
New Debug log location: "log/debug.log".
Initializing internal stuff ....
Event backend in use: epoll
==== Runtime Config Table ====
time to clean up : 3600 # default: 600
time to reset : 1800 # default: 900
time to swap : 2700 # default: 300
evaluator stack size : 65536
inherit chain size : 30
maximum evaluation cost : 300000000 # default: 30000000
maximum local variables : 64
maximum call depth : 150
maximum array size : 15000
maximum buffer size : 400000
maximum mapping size : 15000 # default: 150000
maximum string length : 200000
maximum bits in a bitfield : 1200 # default: 12000
maximum byte transfer : 200000
maximum read file size : 200000
hash table size : 7001
object table size : 1501
living hash table size : 256
gametick msec : 100 # default: 1000
heartbeat interval msec : 2000 # default: 1000
sane explode string : 0 # default: 1
reversible explode string : 0
sane sorting : 1
warn tab : 0
wombles : 1 # default: 0
call other type check : 0
call other warn : 1 # default: 0
mudlib error handler : 1
no resets : 0
lazy resets : 1 # default: 0
randomized resets : 0 # default: 1
no ansi : 1
strip before process input : 1
this_player in call_out : 1
trace : 0 # default: 1
trace code : 0
interactive catch tell : 0
receive snoop : 0 # default: 1
snoop shadowed : 0
reverse defer : 0
has console : 1
noninteractive stderr write : 1 # default: 0
trap crashes : 1
old type behavior : 0
old range behavior : 0
warn old range behavior : 1
suppress argument warnings : 1
enable_commands call init : 0 # default: 1
sprintf add_justified ignore ANSI colors : 1
call_out(0) nest level : 10000 # default: 1000
trace lpc execution context : 0
trace lpc instructions : 0
enable mxp : 0
enable gmcp : 0
enable zmp : 0
enable mssp : 1
==============================
==== LPC Predefines ====
#define FLUFFOS
#define HAS_DEBUG_LEVEL
#define HAS_ED
#define HAS_PRINTF
#define MAX_FLOAT 179769313486231570814527423731704356798070567525844996598917476803157260780028538760589558632766878171540458953514382464234321326889464182768467546703537516986049910576551282076245490090389328944075868508455133942304583236903222948165808559332123348274797826204144723168738177180919299881250404026184124858368.000000
#define MAX_INT 9223372036854775807
#define MIN_FLOAT 0.000000
#define MIN_INT -9223372036854775808
#define MUDOS
#define MUD_NAME "Dawn"
#define SIZEOFINT 8
#define __ARCH__ "Linux/x86-64"
#define __ARGUMENTS_IN_TRACEBACK__
#define __ARRAY_STATS__
#define __AUTO_SETEUID__
#define __AUTO_TRUST_BACKBONE__
#define __CACHE_STATS__
#define __CALLOUT_HANDLES__
#define __CALL_OTHER_WARN__
#define __CFG_COMPILER_STACK_SIZE__ 600
#define __CFG_EVALUATOR_STACK_SIZE__ 65536
#define __CFG_LIVING_HASH_SIZE__ 256
#define __CFG_MAX_CALL_DEPTH__ 150
#define __CFG_MAX_GLOBAL_VARIABLES__ 65536
#define __CLASS_STATS__
#define __COMMAND_BUF_SIZE__ 2000
#define __COMPILER__ "/usr/bin/c++"
#define __CXXFLAGS__ "Broken"
#define __DEBUG_MACRO__
#define __DEFAULT_DB__ 1
#define __DEFAULT_PRAGMAS__ PRAGMA_WARNINGS + PRAGMA_ERROR_CONTEXT + PRAGMA_OPTIMIZE
#define __DSLIB__
#define __ED_INDENT_SPACES__ 4
#define __ED_TAB_WIDTH__ 8
#define __GET_CHAR_IS_BUFFERED__
#define __HAS_CONSOLE__
#define __HAVE_DIRENT_H__ 1
#define __HAVE_JEMALLOC__ 1
#define __HAVE_SIGNAL_H__ 1
#define __HAVE_SYS_RESOURCE_H__ 1
#define __HAVE_SYS_STAT_H__ 1
#define __HAVE_SYS_TIME_H__ 1
#define __HAVE_TIME_H__ 1
#define __LARGEST_PRINTABLE_STRING__ 65535
#define __LARGE_STRING_SIZE__ 1000
#define __LAZY_RESETS__
#define __LOCALS_IN_TRACEBACK__
#define __MAX_SAVE_SVALUE_DEPTH__ 100
#define __MUDLIB_ERROR_HANDLER__
#define __NONINTERACTIVE_STDERR_WRITE__
#define __NO_LIGHT__
#define __OLD_ED__
#define __PACKAGES_PACKAGES_H__
#define __PACKAGE_ASYNC__
#define __PACKAGE_COMPRESS__
#define __PACKAGE_CONTRIB__
#define __PACKAGE_CORE__
#define __PACKAGE_CRYPTO__
#define __PACKAGE_DB__
#define __PACKAGE_DEVELOP__
#define __PACKAGE_MATH__
#define __PACKAGE_MATRIX__
#define __PACKAGE_MUDLIB_STATS__
#define __PACKAGE_OPS__
#define __PACKAGE_PARSER__
#define __PACKAGE_PCRE__
#define __PACKAGE_SHA1__
#define __PACKAGE_SOCKETS__
#define __PACKAGE_TRIM__
#define __PACKAGE_UIDS__
#define __PARSE_DEBUG__
#define __PORT__ 3000
#define __PROJECT_VERSION__ "fluffos v2019.20201121-27-g1aaafdcc"
#define __REF_RESERVED_WORD__
#define __RESTRICTED_ED__
#define __SANE_SORTING__
#define __SAVE_EXTENSION__ ".o"
#define __SAVE_GZ_EXTENSION__ ".o.gz"
#define __SMALL_STRING_SIZE__ 100
#define __STRING_STATS__
#define __STRUCT_CLASS__
#define __STRUCT_STRUCT__
#define __SUPPRESS_ARGUMENT_WARNINGS__
#define __THIS_PLAYER_IN_CALL_OUT__
#define __TIME_WITH_SYS_TIME__ 1
#define __TRAP_CRASHES__
#define __USE_32BIT_ADDRESSES__
#define __USE_MYSQL__ 1
#define __VERSION__ "fluffos v2019.20201121-27-g1aaafdcc"
#define __WARN_OLD_RANGE_BEHAVIOR__
#define __WOMBLES__
========================

Config File:

name : Dawn
port number : 3000
mudlib directory : /OBFUSCATED/lib
log directory : /log
include directories : /include
save binaries directory : /binaries
master file : /secure/master
simulated efun file : /secure/simul_efun
debug log file : debug.log
global include file : <standard.h>
time to clean up : 3600
time to swap : 2700
time to reset : 1800
maximum bits in a bitfield : 1200
maximum local variables : 30
maximum evaluation cost : 300000000
maximum array size : 15000
maximum buffer size : 400000
maximum mapping size : 15000
inherit chain size : 30
maximum string length : 200000
maximum read file size : 200000
maximum byte transfer : 200000
hash table size : 7001
object table size : 1501
default fail message : What?
default error message :  A terrible breach in the fabric of space has occurred.
gametick msec : 100
heartbeat interval msec : 2000
sane explode string : 0
reversible explode string : 0
sane sorting : 1
warn tab : 0
wombles : 1
call other type check : 0
call other warn : 1
mudlib error handler : 1
no resets : 0
lazy resets : 1
randomized resets : 0
no ansi : 1
strip before process input: 1
this_player in call_out : 1
trace : 0
trace code : 0
interactive catch tell : 0
receive snoop : 0
snoop shadowed : 0
reverse defer : 0
has console : 1
noninteractive stderr write : 1
trap crashes : 1
old type behavior : 0
old range behavior : 0
warn old range behavior : 1
suppress argument warnings : 1
enable_commands call init : 0
sprintf add_justified ignore ANSI colors : 1
call_out(0) nest level : 10000
# maximum users : 40
# evaluator stack size : 1000
# compiler stack size : 200
# maximum call depth : 30
# living hash table size : 100

local_options:

#define _LOCAL_OPTIONS_H_

#undef NO_ADD_ACTION
#undef NO_SNOOP
#undef NO_ENVIRONMENT
#undef NO_WIZARDS
#define NO_LIGHT
#define OLD_ED
#undef ED_INDENT_CASE
#define ED_INDENT_SPACES 4
#undef ED_USE_TABS
#define ED_TAB_WIDTH 8
#undef RECEIVE_ED
#define RESTRICTED_ED
#undef SENSIBLE_MODIFIERS

#undef COMPAT_32
#define DEFAULT_PRAGMAS PRAGMA_WARNINGS + PRAGMA_ERROR_CONTEXT
#define SAVE_EXTENSION ".o"
#undef PRIVS
#undef NO_SHADOWS
#undef USE_ICONV
#undef IPV6

#define SAVE_GZ_EXTENSION ".o.gz"

#define AUTO_SETEUID
#define AUTO_TRUST_BACKBONE

#define PARSE_DEBUG /* Only take effect in DEBUG build */

#endif /* _LOCAL_OPTIONS_H_ */

Thanks! One thing I noticed is that your apply cache hit rate is remarkably low - any idea why that is? Does it stay low all the time?

Not really sure, to be honest. I’m fairly certain it’s usually low. I’ll keep an eye on it and let you know if there’s an increase as we get a few more people logging in and moving around - it has been a bit quiet the past few days.

We’ve done a bit of testing - we’re on Ubuntu 20.04, but I replicated it under 18.04 and 20.10 just to see if it was somehow distro-related, and I’m really at a loss as to where to go from here. Whatever you want or need me to provide, we’ll get it to you.

We recently moved from MudOS to FluffOS, so there are some growing pains, and we really appreciate the speedy responses. :smile:

Yeah, don’t worry - I am working on a memory accounting module that will be released this week; we will have much better information on where those allocations are.

The apply cache hit rate is supposed to be around 90%+ in a stable mud.

Hello, thank you very much for your quick responses. I just wanted to quickly chime in with a little more information from testing on our development site.

A typical repeatable example would be me forcibly loading and destructing (looping via exec) one of our core objects (be it an npc, player, weapon, room, armour, etc.) until both the server-side memory usage and the game-side memory usage are above 2 GB. After 5 minutes the memory reported by memory_info() will drop back to its original value (say 10 MB) as expected, while the server-side usage will only drop to about 1.5 GB.

I was curious how this would look if we isolated it to single variable types by, say, running the following:

int *garray_int;
void populate_int( int flag )
{
    if( !flag )
    {
        int *tmp_int = allocate( 10000 , (: $1 :) );
    }
    else garray_int = allocate( 10000 , (: $1 :) );
    destruct( TO );
}

via something like: exec for( int i = 0 ; i < N ; i++ ) load_object( __DIR__ "populate_int()" )->populate_int(X);

I did this for float, int, string, mapping, and mixed (see below for the full code), sized so that memory usage would be ~2 GB for the global case (generally N ~= 5000) and ~0.2 GB for the local case (N ~= 100000).

In all the “local” cases, system memory would only drop about 30% after memory_info() returned to baseline (start at 80 MB, go to 280 MB, lowest return to 150 MB).

In the “global” cases, memory was restored to about normal (start at 80 MB, go to 2080 MB, return to 81 MB), except for mappings, where server-side memory would only drop about 10% (start at 80 MB, go to 2080 MB, return to 1800 MB).

mixed variables behaved as you’d expect from the above, depending on whether or not they happened to contain mappings.

For all cases memory_info() returned to baseline as expected.

Full test code:

// Globals
mapping gmap;
int *garray_int;
string *garray_str;
float *garray_float;
mixed *garray_mixed;
object *garray_object;

// flag != 0 -> populate the global; flag == 0 -> populate a local
void populate_map( int flag )
{
    if( !flag )
    {
        mapping tmp_map = ([]);
        for( int i = 0 ; i < 10000 ; i ++ ) tmp_map[i]=i;
    }
    else
    {
        gmap = ([]);
        for( int i = 0 ; i < 10000 ; i ++ ) gmap[i]=i;
    }

    destruct( TO );
}

void populate_int( int flag )
{
    if( !flag )
    {
        int *tmp_int = allocate( 10000 , (: $1 :) );
    }
    else garray_int = allocate( 10000 , (: $1 :) );
    destruct( TO );
}

void populate_string( int flag )
{
    if( !flag )
    {
        string *tmp_string = allocate( 10000 , (: $1+"" :) );
    }
    else garray_str = allocate( 10000 , (: $1+"" :) );
    destruct( TO );
}

void populate_float( int flag )
{
    if( !flag )
    {
        float *tmp_float = allocate( 10000 , (: $1*1.0 :) );
    }
    else garray_float = allocate( 10000 , (: $1*1.0 :) );
    destruct( TO );
}

void populate_mixed( int flag )
{
    if( !flag )
    {
        mixed *tmp_mixed=({});
        for( int i = 0 ; i < 10000 ; i++ )
        {
            switch( random(6) )
            {
            case 0:
                tmp_mixed += ({ i });
                break;
            case 1:
                tmp_mixed += ({ i*1.0 });
                break;
            case 2:
                tmp_mixed += ({ i+"" });
                break;
            case 3:
                tmp_mixed += ({ TO });
                break;
            case 4:
                tmp_mixed += ({ i });
                break;
            case 5:
                tmp_mixed += ({ ([ i : i ]) });
                break;
            }
        }
    }
    else
    {
        garray_mixed = ({});
        for( int i = 0 ; i < 10000 ; i++ )
        {
            switch( random(6) )
            {
            case 0:
                garray_mixed += ({ i });
                break;
            case 1:
                garray_mixed += ({ i*1.0 });
                break;
            case 2:
                garray_mixed += ({ i+"" });
                break;
            case 3:
                garray_mixed += ({ TO });
                break;
            case 4:
                garray_mixed += ({ i });
                break;
            case 5:
                garray_mixed += ({ ([ i : i ]) });
                break;
            }
        }
    }
    destruct( TO );
}

Here is data from two examples:
A) using N=15000 and populate_int(1), where memory usage goes from ~80 MB -> 2500 MB -> ~88 MB
B) using N=5000 and populate_map(1), where memory usage goes from ~88 MB -> 3100 MB -> ~2435 MB

N=15000 and populate_int(1)
Before:
        Sentences:                          2996   119840
        Objects:                             652   556224
        Prog blocks:                         375   781864
        Arrays:                            10045  1293960
        Classes:                            1976   162064
        Mappings:                           2461   824032
        Mappings(nodes):                   11830
        Interactives:                          1     2344
        Memory used(bytes):     4712
        All strings:                       29081   680512 + 763480 overhead
        call out:                              4      256 (load_factor 0.031496)
                                                 --------
        Total:                                    5189288


During:
        Sentences:                          2996   119840
        Objects:                           15967  5162432
        Prog blocks:                       15381 14827328
        Arrays:                            26514 2402190816
        Classes:                            2267   183048
        Mappings:                           3815  1192768
        Mappings(nodes):                   17158
        Interactives:                          1     2344
        Memory used(bytes):     4760
        All strings:                       34907   935544 + 903304 overhead
        call out:                             12      768 (load_factor 0.094488)
                                                 --------
        Total:                                   2425522952


After:
        Sentences:                          2996   119840
        Objects:                             606   513296
        Prog blocks:                         379   789232
        Arrays:                             9861  1202680
        Classes:                            1921   158024
        Mappings:                           2467   829008
        Mappings(nodes):                   11848
        Interactives:                          1     2344
        Memory used(bytes):     4816
        All strings:                       28359   643882 + 746152 overhead
        call out:                              8      512 (load_factor 0.062992)
                                                 --------
        Total:                                    5009786

  
          USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Before:   XXXX       25538  3.9  1.0 2541648 80980 ?       S    11:23   0:21  \_ /XXX/driver /XXX/config
During:   XXXX       25538  4.7 31.2 3016784 2515024 ?     S    11:23   0:23  \_ /XXX/driver /XXX/config
After:    XXXX       25538  3.8  1.0 3016784 88424 ?       S    11:23   0:25  \_ /XXX/driver /XXX/config

N=5000 populate_map(1)

Before:
        Sentences:                          2996   119840
        Objects:                             606   513296
        Prog blocks:                         379   789232
        Arrays:                             9861  1202680
        Classes:                            1921   158024
        Mappings:                           2467   829008
        Mappings(nodes):                   11848
        Interactives:                          1     2344
        Memory used(bytes):     4816
        All strings:                       28359   643882 + 746152 overhead
        call out:                              8      512 (load_factor 0.062992)
                                                 --------
        Total:                                    5009786

During:
        Sentences:                          2996   119840
        Objects:                            5881  2217344
        Prog blocks:                        5381  5469872
        Arrays:                            11256  1725952
        Classes:                            2194   177728
        Mappings:                           8563 2656725648
        Mappings(nodes):                50016184
        Interactives:                          1     2344
        Memory used(bytes):     4832
        All strings:                       33765   882113 + 875896 overhead
        call out:                              6      384 (load_factor 0.047244)
                                                 --------
        Total:                                   2668201953

After:
        Sentences:                          2996   119840
        Objects:                             619   525760
        Prog blocks:                         379   789232
        Arrays:                             9934  1228016
        Classes:                            1933   158856
        Mappings:                           2517   842552
        Mappings(nodes):                   12045
        Interactives:                          1     2344
        Memory used(bytes):     4832
        All strings:                       28611   656001 + 752200 overhead
        call out:                              9      576 (load_factor 0.070866)
                                                 --------
        Total:                                    5080209    
    
    
           USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
Before:   XXXX       25538  3.8  1.0 3016784 88424 ?       S    11:23   0:25  \_ /XXX/driver /XXX/config 
During:   XXXX       25538  3.8 38.5 3541072 3102424 ?     S    11:23   0:30  \_ /XXX/driver /XXX/config
After:    XXXX       25538  3.2 30.2 3541072 2435260 ?     S    11:23   0:32  \_ /XXX/driver /XXX/config

I apologize for being spammy, but I was curious whether the difference in the above examples was just from allocate vs. manual assignment via a loop… so I did (15000x):

void populate_int_for( int flag )
{
    if( !flag )
    {
        int *tmp_int = ({});
        for( int i = 0 ; i < 10000 ; i ++ ) 
            tmp_int += ({ i }); 
    }
    else
    {
        garray_int = ({});
        for( int i = 0 ; i < 10000 ; i ++ ) 
            garray_int += ({ i }); 
    }
    destruct( TO );
}

and am happy to report it performed just like the allocate example - that is, memory was freed on both the server and game sides.

I want to quickly check something:

I assume you are using jemalloc, right? (And the latest version?) In any program, the memory allocator doesn’t necessarily guarantee to return memory to the OS when things are freed, so we need to consider whether things are dangling inside the FluffOS VM (i.e. leaking) or just sitting in the allocator’s freelist. One way to tell is to keep running the program and see whether the process reaches a stable state or keeps growing until it fails - so that would be my suggestion for the next experiment.
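Something like this would do for that experiment - a rough sketch with hypothetical names, assuming the mudlib allows write_file() into /log - logging the in-game total periodically so it can be lined up against the RSS reported by ps/top afterwards:

// Hypothetical periodic logger: record the in-game memory total every
// 10 minutes so it can be compared against the process RSS over time.
void mem_watch()
{
    write_file( "/log/mem_watch.log",
        sprintf( "%s  in-game bytes: %d\n", ctime( time() ), memory_info() ) );
    call_out( "mem_watch", 600 );   // check again in 10 minutes
}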

From the results, it seems the mapping operations are probably the culprit; my guess is that there is a missed reference check somewhere, but let me run your example and check the code.

Up to date on jemalloc / current version.

I did a test recompile with jemalloc off to see if maybe that was the culprit - the same thing happened with jemalloc disabled.

This is what initiated all this digging in the first place, so to speak:

Feb 13 07:58:56 server kernel: [1110820.898375] Out of memory: Killed process 1289 (driver) total-vm:16635976kB, anon-rss:7476792kB, file-rss:2676kB, shmem-rss:0kB, UID:1000 pgtables:31624kB oom_score_adj:0
Jan 31 11:21:14 server kernel: [1225333.378420] Out of memory: Killed process 1238 (driver) total-vm:16492616kB, anon-rss:7476384kB, file-rss:1912kB, shmem-rss:0kB, UID:1000 pgtables:29980kB oom_score_adj:0

So I have looked into the example you posted earlier.

Basically, what happens is that after the destruct() call, the driver puts the object into a “dangling” state (it disappears from objects(), but is not yet freed).

The driver then has to go through reclaim_objects() (every 1 minute), as well as remove_destructed_objects() (every 5 minutes).

What reclaim_objects() does is scan all the objects and remove any references to the destructed objects.

remove_destructed_objects() is what actually frees up the memory.

I’ve verified that if you wait for 5 minutes, the memory is properly cleaned up - so, did your test last longer than 5 minutes?
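So when testing, it’s worth taking the “after” reading only once that cycle has run; a minimal sketch of a timing-aware check, with a hypothetical path and log file:

// Hypothetical timing-aware check: run one test case, strip references
// to the destructed object right away, then wait past the driver's
// 5-minute remove_destructed_objects() cycle before measuring again.
void run_case()
{
    load_object( "/path/test_scenario" )->populate_map( 1 );   // the test object destructs itself
    reclaim_objects();                    // drop references to destructed objects now
    call_out( "measure_after", 360 );     // take the "after" reading ~6 minutes later
}

void measure_after()
{
    write_file( "/log/mem_watch.log",
        sprintf( "after cleanup: %d bytes in-game\n", memory_info() ) );
}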

Yes, unfortunately our testing lasted longer than five minutes. The results reported above were all for:

  1. Before executing the exec loop
  2. Immediately after the exec loop (via a return mud_status() in the exec)
  3. After (now I know) “remove_destructed_objects()” is called by the driver

I had noticed the driver was freeing memory on the VM side roughly every 5 minutes of uptime. I was in fact running the execs at roughly 2.5 minutes and 7.5 minutes of uptime, to split the difference, when doing the above tests. We’ve run the int vs. mapping test on: (A) multiple versions of Ubuntu, (B) recompiles with and without jemalloc, and (C) with and without swap files.

To be very clear: in the mapping case above we saw the memory reported in-game (via memory_info or mud_status) go from (corresponding to 1/2/3 above):

  1. 5009786
  2. 2668201953
  3. 5080209

However, on the server side (using, say, ps) we saw the RSS memory usage go from (in kB):

  1. 88424
  2. 3192424
  3. 2435260

Over 24 hours later, the RSS memory usage has only climbed another 250 MB for that instance (there are no players on the development site, so I assume it is just from NPCs wandering around and rooms being cleaned up). The live site seems to be rising at roughly twice that rate.

As Mordred-Dawn posted above, the driver’s memory usage will climb until the process is killed, on the scale of 15 days of uptime.

Thanks again for your help!

Thanks for reporting back. So I added a bunch of memory accounting changes in the latest origin/v2019 branch - can you clone that branch and re-run your tests?

Do a debug build; the things you will notice include:

In the testsuite, I added a “stress_memory” command that uses your code to fill_map(1) on “/clone/memory_holder”, and the mud_status(0) output before/after the clean-up shows that after multiple runs the memory is properly cleaned up (mappings / mapping nodes drop to 0, as before).

  1. I cleaned up the mud_status(0) and mud_status(1) output to be more accurate; you should see whether the memory is being accounted for. It will also show how many “dangling objects” are waiting to be cleaned up by the driver.

  2. debugmalloc("/DUMP", 0) will output every single resident malloc, with its time of creation, type, and the function that created it, to a file called “DUMP” (a usage sketch follows after this list).

  3. I am continuing to investigate, of course, and will add different styles of tests to see what is keeping things from being properly cleaned up.
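To make that debugmalloc() dump from item 2 easier to compare, a small sketch of one way to use it - hypothetical helper and file names, assuming a debug build with the dump support compiled in:

// Hypothetical before/after snapshots: write the resident-allocation dump
// twice and diff the two files outside the mud to spot allocations that
// survive the cleanup cycle.
void snapshot( string tag )
{
    debugmalloc( "/DUMP_" + tag, 0 );   // every resident malloc, with creation info
}

// e.g. snapshot("before"); run the stress test; wait for the cleanup
// cycle; snapshot("after"); then diff DUMP_before against DUMP_after.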

OK, I think I’ve found the issue: the bug corrupts objects’ ref counts, which then causes them never to be deallocated. Please re-test with this and see if things improve.

The testsuite won’t load due to our unusual (read: legacy) config, but it appears you’ve made progress. I’ll get to testing in a day or so.

Will report back.

Also: I can’t tell you how appreciative I am of your super duper quick responses. <3

The changes pushed to the v2019 branch appear to have resolved the issue. Going to continue testing for the rest of the day, but I think it’s a safe bet this is fixed.

Letting:

for(int i=0; i<1000; i++) { load_object("/path/test_secnario")->populate_map(1); reclaim_objects(); }

run every 30 seconds for the rest of the day, but after 30 minutes memory is holding steady.

Doing that on the old build resulted in a constant, steady increase (and would usually bring the driver to 6 GB+ within 10-15 minutes).


I think you missed the ”destruct()” part.

I had destruct() in the populate_map() function.

Ah, the self-destructing object :smiley: Classic.