Mergesort Investigation 7 - node size
I checked the effect of list element size on my implementation of mergesort of a linked list.
My linked list, which I piously believe is a typical linked list, comprises a series of Golang structures:
type Node struct {
Data uint
Next *Node
}
A quick Go program reveals that struct to take up 16 bytes in memory,
apparently 8 bytes for the uint
type .Data
field,
and 8 bytes for the *Node
.Next
field.
Extrapolating from a superstitious belief that virtual memory,
paging, page sizes and page tables, can make a difference,
I ran a series of mergesort benchmarks with different Node
structures:
type Node struct {
Data uint
Next *Node
Padding [16]byte
}
The .Padding
element, at 16 bytes, changes the struct size to 32 bytes.
I recompiled the mergesort program with 16, 112, 240, 368 and 496 byte Padding
array element sizes to get 32, 128, 256, 384 and 512 byte struct Node
sizes.
You do have to use a Go array, not a slice.
The storage backing a slice is not usually contiguous with the slice header,
and so doesn’t directly contribute to the size of the Node
.
I did all the node size benchmarking on a Macbook M2 ARM. I used the iterative mergesort algorithm, the algorithm with the abrupt performance drops at certain list lengths. I used the “idiomatic” in-memory list layout.
Something really goes wrong at nodes 384 bytes and up. The abrupt drops in performance happen at different list lengths. Abrupt increases in performance happen at some list lengths.
Above, a y-axis zoom-in on the first graph. The 512-byte node size causes long benchmark times, which compresses the graph vertically if all points are shown. Looks like the performance drops at 4, 8 and 16 million node list happen when nodes are 16 or 32 bytes. Performance drops happen at 4 and 8 million node list for 128 and 256 byte nodes, but list lengths about 4 million nodes cause garbage results for 256 byte nodes, and list lengths about 8 million nodes do the same for 128 byte nodes.
I’m calling this some kind of paging related issues complicating the iterative
mergesort performance question.
Just like address space layout
(via .Next
pointer) issues,
this is interesting in some larger context, in that real linked lists might have nodes
much larger than 256-bytes,
or have very poor locality of reference when following .Next
pointers.
In the context of figuring out what’s wrong with my iterative mergesort,
I think this is a distraction.