
Narrow bounds on kernel malloc allocations. #2261

Open · wants to merge 9 commits into base: dev

Conversation

@qwattash (Contributor) commented Dec 5, 2024

This is required to prevent memory leaks through the extra space in the allocations.
In general, kernel malloc uses a number of fixed-size UMA zones to back small allocations (<= 64K) and a dedicated malloc_large path, which grabs memory directly from kmem, for large allocations.
In both cases, the requested allocation size may be smaller than the actual committed size if the requested size is not representable. Representability is platform-dependent and we cannot make assumptions about the minimum representable size (although I think it will always be >= PAGE_SIZE).
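
A quick illustration (not code from the patch): a CHERI-aware compiler exposes builtins to query how a length is rounded for representability; how much rounding occurs, if any, is architecture-specific.

#include <stddef.h>

/*
 * Sketch only: the smallest length >= size that compressed capability
 * bounds can encode exactly. Small sizes are always representable;
 * sufficiently large sizes are rounded up by an architecture-dependent
 * amount. Requires a CHERI-aware compiler for the builtin.
 */
static inline size_t
representable_length(size_t size)
{
	return (__builtin_cheri_round_representable_length(size));
}

static inline int
is_representable(size_t size)
{
	return (representable_length(size) == size);
}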

With CHERI this is problematic because the caller receives a capability that may be larger than the requested size, and some of the additional committed space is reachable but uninitialized (unless M_ZERO is given).
This patch does the following:

  1. Add an interface function to UMA to recover the original item bounds (provisionally named uma_zgrow_bounds, I'm open to better names).
  2. Narrow bounds to the closest representable size in kernel malloc and realloc variants.
  3. Recover the original bounds for both small and large allocations on free, so that we maintain the invariant that the lower-level allocator always receives on free() the same capability it returned on alloc. In order to do this, we (continue to) abuse the vtoslab infrastructure for malloc_large to stash the original pointer in place of the zone pointer in the vm_page_t descriptor. I really think we should have a cleaner interface for this, but it is beyond the scope of this patch.
  4. Explicitly zero the committed allocation space that lies between the requested allocation size and the actual capability top. In other words, we zero the representability padding of the allocation (see the sketch after this list).
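
Roughly, steps 2 and 4 combine as in the sketch below. This is a hypothetical helper, not the patch code: it uses the CHERI builtins directly and assumes the usual kernel headers for M_ZERO and memset.

#include <sys/param.h>
#include <sys/systm.h>
#include <sys/malloc.h>

#ifdef __CHERI_PURE_CAPABILITY__
/*
 * Hypothetical helper (not the patch code): `item` is the pointer returned
 * by the backing UMA zone or kmem, `size` is the caller's requested size,
 * `flags` are the malloc(9) M_* flags.
 */
static void *
kmalloc_narrow_and_zero_pad(void *item, size_t size, int flags)
{
	size_t rsize;

	/* Step 2: narrow bounds to the closest representable size >= size. */
	rsize = __builtin_cheri_round_representable_length(size);
	item = __builtin_cheri_bounds_set(item, rsize);

	/* Step 4: zero the representability padding unless M_ZERO already did. */
	if ((flags & M_ZERO) == 0 && rsize > size)
		memset((char *)item + size, 0, rsize - size);

	return (item);
}
#endif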

qwattash and others added 9 commits December 5, 2024 13:44
The uma_zgrow_bounds function allows nested allocators that rely on UMA to recover
the full bounds of an item.
This prevents possible overflows into the slab header.
 - Narrow bounds on kmalloc allocations backed by UMA zones; this prevents the
 extra space in items from being accessible.
 - When the requested size is not representable, imply M_ZERO for UMA
 allocations. This ensures that no data is accidentally leaked through extra
 space in the item. Note that the current malloc KPI does not promise to protect
 against memory leaks due to improper initialization if M_ZERO is not set;
 therefore it might be sensible to only zero-initialize the padding space beyond
 the requested allocation size, which will likely not be initialized by the
 client code.

Patch co-authored by: Alfredo Mazzinghi <[email protected]>
This handles malloc_large allocations in the same way as UMA-backed allocations.
When the capability would not be representable, imply M_ZERO to ensure no data
leaks through the padding.

Patch co-authored by: Alfredo Mazzinghi <[email protected]>
This only affects memstat, which uses the private UMA structure
definitions.
Only zero the padding due to representable length rounding
for the requested allocation size. This should generally be cheaper
than zeroing the whole allocation.
Only zero the padding due to representable length rounding
for the requested allocation size. This should generally be cheaper
than zeroing the whole allocation.
When growing an existing allocation with an unrepresentable length,
we need to initialize any padding space to ensure there are no data leaks.
When copying data to a new allocation, we must ensure that no data
past the original capability length is copied.
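
The copy clamp described in the last commit message can be pictured as follows; the names and signature here are assumptions for illustration, not the patch's identifiers.

/*
 * Illustrative only: clamp the number of bytes copied into a new allocation
 * so that we never read past the original capability's length.
 */
static size_t
realloc_copy_size(const void *old, size_t osize, size_t newsize)
{
	size_t copysize = (osize < newsize) ? osize : newsize;
#ifdef __CHERI_PURE_CAPABILITY__
	size_t caplen = __builtin_cheri_length_get(old) -
	    __builtin_cheri_offset_get(old);

	if (copysize > caplen)
		copysize = caplen;
#endif
	return (copysize);
}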
@@ -680,7 +680,7 @@ void *
 	int indx;
 	caddr_t va;
 	uma_zone_t zone;
-#if defined(DEBUG_REDZONE) || defined(KASAN)
+#if defined(DEBUG_REDZONE) || defined(KASAN) || defined(__CHERI_PURE_CAPABILITY__)
 	unsigned long osize = size;
A Member commented on this change:
I admit a temptation to slap a __diagused on at least the local variables and make them unconditional.

@bsdjhb (Collaborator) commented Dec 6, 2024

Hmm, note that in userspace we have recently switched to not forcing tight bounds but in effect setting the bounds to what malloc_usable_size() would return for non-CHERI, that is, whatever the backing allocation is. This gives better performance for realloc() loops that grow an allocation. I think we still have a knob to enforce tight bounds for demo purposes, but for practical purposes the wider bounds are safe (there is no danger of memory reuse). From the description it sounds like you are using exact bounds when e.g. a 17-byte allocation is served from a 32-byte UMA bucket rather than just leaving the bounds at 32? If that is the case, I'd be inclined to just leave the bounds at 32.

@qwattash (Contributor, Author) commented Dec 10, 2024

> [quoting @bsdjhb's comment of Dec 6, 2024, above]

This was my original reasoning as well; however, there is an interesting interaction with the caller's responsibility for initialization. This patch is not really about bounds, it is about preventing memory leaks through uninitialized memory.

Let me explain in more detail:
In general it is never a spatial-safety violation to give out more space than requested, but the caller will not know (nor should it be required to know) how much extra memory is committed to the allocation.
In the absence of M_ZERO, this creates a situation in which the caller can be expected to take responsibility for initializing the memory it requests. If there are leaks through padding within the requested region, for instance, that is not the allocator's fault.

However, in the CHERI world (and arguably also in the non-CHERI world), the caller can access memory outside of the requested bounds if the allocator committed a larger chunk (e.g. a 64-byte UMA block for a 48-byte requested size).
Again, in the absence of M_ZERO, the allocator will not initialize this memory, so it is practically possible to leak data and capabilities from a previous kmalloc allocation that used the whole 64-byte block. Interestingly, this is picked up by KASAN.

This becomes increasingly tricky with unrepresentable sizes, because you cannot guarantee that bounds can be narrowed exactly in all cases, so you still need zero-initialization to protect against data/capability leaks across kmalloc callers.

This patch's use of bounds is really about reducing the need for zero-initialization. I think there are essentially 3 options (a small worked comparison follows the list):

  1. Imply M_ZERO for every allocation where real_size > requested_size.
  2. When not M_ZERO, clear the extra committed space between alloc + requested_size and alloc + real_size. Note that this can end up zeroing a significant chunk of memory (e.g. requested_size = 32K + 1 will use a 64K block and clear 32K - 1 bytes).
  3. When not M_ZERO, set the bounds of the capability up to the next representable boundary and only clear the region [alloc + requested_size, alloc + representable_size). For small allocations < PAGE_SIZE we never have to zero anything, and for larger ones we may have to clear some bytes, but generally far fewer than the whole padding region.
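
To make the trade-off concrete, here is a small userspace demo of the three options for the 32K + 1 example (hypothetical names, compiles only with a CHERI-aware compiler; the exact amount of padding in option 3 is architecture-dependent).

#include <stdio.h>

int
main(void)
{
	size_t req  = 32 * 1024 + 1;	/* requested size */
	size_t item = 64 * 1024;	/* committed UMA item size */
	size_t rep  = __builtin_cheri_round_representable_length(req);

	printf("option 1 zeroes %zu bytes (the whole committed item)\n", item);
	printf("option 2 zeroes %zu bytes (item size - requested size)\n",
	    item - req);
	printf("option 3 zeroes %zu bytes (representability padding only)\n",
	    rep - req);
	return (0);
}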

The root cause is using shared UMA zones across different C types, but this is a deeper issue.

Anyway, I agree that having a knob to turn it off is useful, and I will add one.

@bsdjhb (Collaborator) commented Dec 10, 2024

Hmmm, I guess part of what threw me off is that in the original PR comment, bullet point 4 seems to be option 2, so it read like we were doing both 1 and 2. Of the available options, I think I prefer something along the lines of option 2 for allocations <= PAGE_SIZE (which avoids the extra complication of re-deriving bounds for allocations using the small UMA buckets, which I think are the most likely potential hot paths) and option 3 for allocations > PAGE_SIZE. In the case of option 3, one thing you might consider is seeing if you can arrange for the layer that allocates VM pages to know about the padding area and try to allocate zeroed pages for the padding in the !M_ZERO case, to avoid having to zero when the pages are already zeroed.
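
One way to picture that suggestion (a sketch under the assumption that the large-allocation path can see the vm_page_t backing the padding; the helper name is hypothetical, not patch code):

#include <sys/param.h>
#include <sys/systm.h>
#include <vm/vm.h>
#include <vm/vm_page.h>

/*
 * Hypothetical helper: if the page backing the padding was handed out
 * pre-zeroed (PG_ZERO, e.g. obtained with VM_ALLOC_ZERO), skip the
 * explicit zeroing of the padding region.
 */
static void
zero_padding(vm_page_t m, void *pad, size_t padlen)
{
	if ((m->flags & PG_ZERO) == 0)
		memset(pad, 0, padlen);
}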

@qwattash (Contributor, Author) commented Dec 10, 2024

> [quoting @bsdjhb's comment of Dec 10, 2024, above]

I could have phrased it a bit better, probably. I think allocating zeroed pages through malloc_large is feasible, although I wonder whether it is just better to imply M_ZERO at that point and let/teach the underlying allocator pick zeroed pages. Plumbing it through UMA zones up to 64K is probably complicated because items never return to the VM allocator (I need to re-check what UMA actually does, but I think it at least tries to cache small buckets of items larger than PAGE_SIZE). I would need to teach UMA to swap the physical pages underneath the allocation for zeroed ones, which would have implications for ctors/dtors.
