28 Jul 2007

Fixed maximum space sizes
A few days ago I started to look into what it would take to get the Jikes RVM running as a server with a large number of threads. Oh, and by large I mean somewhere in the vicinity of 80,000 threads on a modern desktop machine with 4GB of memory available. Almost every thread would maintain an open network connection (with an 8KB kernel buffer) and intermittently transmit a message.
At the time I did not think it was an unreasonable demand. Erlang can reportedly create a software isolated process in 300-400 bytes and Haskell can create a thread in about 1KB. I have not investigated these claims, but let’s assume they include the runtime structures required to implement scheduling and garbage collection.
I wanted to get each Jikes RVM thread down to about 1KB for runtime structures, 4KB for the call stack, 4KB for the thread-local allocator and another 8KB for each network connection. This would mean that I was looking at about 1.3GB “overhead” to create the threads before I added in any application logic. Not ideal but not too bad. I expected that the big challenge I was going to face was implementing a scheduler that had a decent response time and reasonable throughput. Unfortunately I hit a few problems before even getting to that step.
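For concreteness, here is the back-of-the-envelope arithmetic behind that 1.3GB figure, using only the numbers above:

```java
// Back-of-the-envelope check of the ~1.3GB figure, using only the
// numbers quoted above.
long perThread = (1 + 4 + 4 + 8) << 10;  // 1+4+4+8 = 17KB per thread
long total = 80000L * perThread;         // 1,392,640,000 bytes
System.out.printf("%.2fGB%n", total / (double) (1 << 30)); // prints 1.30GB
```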
When creating stacks, the Jikes RVM allocates them into the Large Object Space (or LOS). The LOS comes into play because the runtime attempts to allocate the stack as a byte array in the default space, and the allocation request first checks whether the size exceeds a threshold; if it does, the object is allocated in the LOS instead.
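A minimal sketch of that routing decision might look like the following; LOS_THRESHOLD and the method names are my own placeholders, not MMTk’s actual identifiers:

```java
// Hypothetical sketch of the size-based routing described above.
// LOS_THRESHOLD and the method names are illustrative only.
final class AllocSketch {
    static final int LOS_THRESHOLD = 8 << 10; // assumed cut-off

    long alloc(int bytes, int align, int offset) {
        if (bytes > LOS_THRESHOLD) {
            // Oversized requests (such as thread stacks) bypass the
            // default space and go straight to the Large Object Space.
            return allocLarge(bytes, align, offset);
        }
        return allocDefault(bytes, align, offset);
    }

    long allocLarge(int bytes, int align, int offset)   { return 0; /* stub */ }
    long allocDefault(int bytes, int align, int offset) { return 0; /* stub */ }
}
```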
Currently spaces in MMTk are assigned an address range at build time. The address range is either based on a specific size (e.g. 32MB for the Immortal Space) or on a proportion of the available space (e.g. 3% for the LOS). The available space runs from the end of the code in the bootimage (0x57000000 + 24<<20) to the maximum mappable address (0xB0000000 on ia32-linux by default). Why it is not from the end of the reference map in the bootimage I am not entirely sure. Based on this, the maximum size of the LOS is around 40MB. Nowhere near enough for my purposes.
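The 40MB figure falls straight out of those constants:

```java
// LOS sizing on the default ia32-linux layout quoted above.
long heapStart = 0x57000000L + (24L << 20); // end of bootimage code
long heapEnd   = 0xB0000000L;               // max mappable address
long available = heapEnd - heapStart;       // ~1.37GB
long losMax    = (long) (available * 0.03); // 3% of that is ~42MB
```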
Luckily, MMTk looks set to undergo some evolution in the near future; RVM-157 describes the proposed changes. The management of address ranges and the management of memory chunks will be separated. Address ranges for spaces may still be defined at build time, but it will be possible for spaces to have overlapping address ranges, so the set of memory chunks allocated to a space can grow or shrink based on actual usage rather than being restricted by artificial limits.
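To make that concrete, here is one way demand-driven chunk management could look. This is purely my own sketch in the spirit of RVM-157, not its actual design; every name here is hypothetical:

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical sketch of demand-driven chunk management: spaces share
// a pool of chunks instead of being pinned to fixed-size ranges.
final class ChunkPool {
    private final Deque<Long> free = new ArrayDeque<Long>();

    ChunkPool(long base, int chunks, long chunkSize) {
        for (int i = 0; i < chunks; i++) free.push(base + i * chunkSize);
    }

    // A space acquires a chunk only when its current chunks fill up...
    synchronized Long acquire() { return free.poll(); }

    // ...and hands it back when usage drops, so its footprint shrinks.
    synchronized void release(long chunk) { free.push(chunk); }
}
```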
Some Spaces may still need exclusive ownership of an address range for performance reasons. If a Space exclusively owns an address range, checking whether an address lies within it costs just two comparisons, one against the start address and one against the end. With non-exclusive ownership, MMTk must maintain a map between memory chunks and the owning Space, and the membership check becomes much more expensive. In a generational collector the (frequently executed) write barrier must check whether an address is within the nursery, so the nursery space should have exclusive ownership of an address range to maintain an acceptable performance level.
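The difference between the two membership tests looks roughly like this (a sketch with made-up names, using plain longs rather than MMTk’s unboxed Address type):

```java
// Sketch of the two membership tests discussed above; all names
// are illustrative.
final class SpaceSketch {
    static final int LOG_CHUNK_SIZE = 22;          // assume 4MB chunks
    static final SpaceSketch[] CHUNK_OWNER =       // chunk-to-Space map,
        new SpaceSketch[1 << (32 - LOG_CHUNK_SIZE)]; // one slot per chunk

    long start, end; // only meaningful for exclusively owned ranges

    // Exclusive ownership: the write-barrier-friendly fast path is
    // just two comparisons.
    boolean containsExclusive(long addr) {
        return addr >= start && addr < end;
    }

    // Non-exclusive ownership: consult the chunk-to-Space map.
    boolean containsShared(long addr) {
        return CHUNK_OWNER[(int) (addr >>> LOG_CHUNK_SIZE)] == this;
    }
}
```

A shift, an array load and a comparison instead of two comparisons may not sound like much, but on a write barrier executed at every reference store it adds up.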
It was my first time looking at the MMTk code and in many ways it seems much nicer and more consistent than the rest of the Jikes RVM codebase. There are still a few uglies there and a bunch of undocumented assumptions, but I guess that is going to be the case in any large codebase. There is also some strange terminology, though that could just be because I am unfamiliar with gc literature. For example, a Space defines: (1) an address range, (2) a set of allocated memory chunks and (3) a management policy. Personally I would have used the term “area”, as “space” seems too easy to confuse with other aspects. Oh well!
I had planned to do most of the work to enable this myself, but my first commit, which I believed was largely performance neutral and just a first step in refactoring, seemed to raise the hackles of the ANU contingent. Not surprising given they didn’t really know where I was going with it and I am a self-confessed ignoramus regarding garbage collection. They seem to have come up with a much more thought-out plan than I had, and Daniel may even tackle it in a few weeks. If not, I will have to start hacking at it at a later stage.
In the meantime I attempted to change the RVM so that stacks are allocated in the Primitive Large Object Space (PLOS), as stacks are basically large byte arrays and the PLOS can grow to 7% of the available space. Unfortunately that caused complete failure on the 64-bit build target, most likely due to alignment issues. I don’t have any hardware available to test on, so I reverted to allocating in the LOS. Ugh.
For the last week I have been out of action due to illness, and next week it looks like other parts of my PhD will need to be tackled. But hopefully I will get back to trying to enable large numbers of threads soon!