Of course vMotion is NUMA aware.
At least for how I’d assume the question is most commonly interpreted, that is:
Assuming the source and destination host have the same NUMA topology and each node has enough available memory, does vMotion precopy the memory to the same NUMA client(s) on the destination and is the VM 100 % NUMA local when resumed, esp. for wide VMs?
I’m not even sure what an alternative interpretation could be, maybe whether the stream worlds are local to the NUMA clients during precopy? I guess that depends more on how the scheduler places those worlds and I’d think they would end up closer to the NIC. But worth checking … at some future, undefined point in time, maybe.
The pearl-clutch-worthy question in the title was asked by one of our customers, who wanted to verify documentation from a 3rd party that stated it wasn’t. While I was pretty sure that this was wrong (or at least severely outdated), testing it is fairly straightforward, so I’m saving my luck for gambling on answers that are harder to verify. Let’s walk through the verification step by step, starting by establishing the state of the source host:
[root@esxi_source:~] for numaOption in $(sched-stats -h |
> sed -n 's/^[ \t]\+: \{4\}\(n.*\)$/\1/p');
> do echo;
> sched-stats -t ${numaOption};
> done

 24 PCPUs
 12 cores
 2 LLCs
 2 packages
 2 NUMA nodes

groupName    groupID  clientID  homeNode  affinity  nWorlds  vmmWorlds  localMem  remoteMem  currLocal  cummLocal
vm.1466269   2913542         0         0         3       10         10  20905984          0        100        100
vm.1466269   2913542         1         1         3       10         10  20967424          0        100        100

groupName    groupID  clientID  balanceMig  loadMig  localityMig  longTermMig  monitorMig  loadSwap  localitySwap  pageMigRate
vm.1466269   2913542         0           0        0            0            0           0         0             0            0
vm.1466269   2913542         1           0        0            0            0           0         0             0            0

groupName    groupID  clientID  nodeID    time  timePct    memory  memoryPct  anonMem  anonMemPct  avgEpochs  memMigHere
vm.1466269   2913542         0       0  128131      100  20905984        100    43524          50          9           0
vm.1466269   2913542         0       1       0        0         0          0    42812          49          0           0
vm.1466269   2913542         1       0       0        0         0          0    43524          50          0           0
vm.1466269   2913542         1       1  128131      100  20967424        100    42812          49          9           0

nodeID  used   idle  entitled  owed  loadAvgPct  nVcpu   freeMem  totalMem
     0   122  11878         0     0           0     10  10495200  33456872
     1   149  11851         0     0           0     10  11182620  33554432

The format for the stats is corrected/hop/slit
NodeId       0          1
     0   212/ 0/ 0  297/ 0/ 0
     1   297/ 0/ 0  212/ 0/ 0
I like this one-liner because it gives me most of the information about a host and the current state of the VMs from a NUMA perspective. In ESXi 7.0, the sched-stats option numa-global (the sum of all past NUMA migrations on the host) was dropped in favor of numa-latency, so instead of hard-coding a list of options that differs between current and previous builds, I decided to match against what’s available in the sched-stats help (-h) output.
[root@esxi_source:~] sched-stats -h | sed -n 's/^[ \t]\+: \{4\}\(n.*\)$/\1/p'
ncpus
numa-clients
numa-migration
numa-cnode
numa-pnode
numa-latency
Some of this has been and will be repeated in other articles, but let’s dissect the sed (a tiny demo follows after the list):

- -n basically means “don’t print what isn’t explicitly meant to be printed” but if you care, it is better explained e.g. here.
- the beginning of the line (^) is followed by a bunch (escaped +) of whitespaces ([ \t]), a character class of SPACEs and TABs
- followed by a : and more whiteSPACEs, exactly four of them (\{4\})
- followed by anything starting with an n (n.*) until the end of the line ($), with the (escaped) parentheses enclosing the substring to be printed later
- p prints the \1st remembered part of the match
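To make the capture group concrete, here is a minimal demo on a fabricated input line (the string below is made up, not actual sched-stats -h output); only the part inside the escaped parentheses survives:

echo '     :    numa-clients' | sed -n 's/^[ \t]\+: \{4\}\(n.*\)$/\1/p'
# prints: numa-clients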
Without covering too much of the actual sched-stats output: there is currently one VM running on the host (no vCLS VM since the hosts aren’t in a cluster). It is wide, i.e. it has two NUMA clients (PPDs / Physical Proximity Domains, two clientIDs for the same groupID). Both clients are 100% NUMA local (currLocal), which says nothing about actual guest OS or application level locality, only that our scheduling abstraction (a range of vCPUs and their memory with regard to each other) is local. The amount of free physical memory (freeMem) on the two nodes is approximately the same. Don’t forget to scroll to the right for all the output. While some other stuff in there is interesting too, we’ll cover that another time; for now, let’s look at that VM:
[root@esxi_source:~] vmdumper -l | cut -d \/ -f 2-5 | while read path;
> do egrep -oi "DICT.*(displayname.*|numa.*|cores.*|vcpu.*|memsize.*|affinity.*)= .*|
> numa:.*|numaHost:.*|Log for VMware ESX.*" "/$path/vmware.log";
> echo -e;
> done

Log for VMware ESX pid=1466269 version=7.0.3 build=build-20036589 option=Release
DICT numvcpus = "20"
DICT memSize = "40960"
DICT displayName = "test-vm"
DICT numa.autosize.cookie = "200102"
DICT numa.autosize.vcpu.maxPerVirtualNode = "10"
DICT cpuid.coresPerSocket = "10"
numaHost: NUMA config: consolidation= 1 preferHT= 1 partitionByMemory = 0
numa: Resuming from checkpoint using VPD = 10
numaHost: 20 VCPUs 2 VPDs 2 PPDs
numaHost: VCPU 0 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 1 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 2 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 3 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 4 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 5 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 6 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 7 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 8 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 9 VPD 0 PPD 0 NodeMask ffffffffffffffff
numaHost: VCPU 10 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 11 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 12 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 13 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 14 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 15 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 16 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 17 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 18 VPD 1 PPD 1 NodeMask ffffffffffffffff
numaHost: VCPU 19 VPD 1 PPD 1 NodeMask ffffffffffffffff
This one-liner is sometimes called the “vmdumper command”, which is a personal pet peeve of mine since vmdumper -l really just lists information for the running (as opposed to e.g. merely registered) VMs, from which we can then deduce the working directory:
[root@esxi_source:~] vmdumper -l
wid=1466270 pid=-1 cfgFile="/vmfs/volumes/5cded272-3c5304fc-2308-109836041d9b/test-vm/test-vm.vmx" uuid="56 4d d6 9a 24 57 89 38-b8 cb 3f 21 b7 db 55 82" displayName="test-vm" vmxCartelID=1466269

[root@esxi_source:~] vmdumper -l | cut -d \/ -f 2-5
vmfs/volumes/5cded272-3c5304fc-2308-109836041d9b/test-vm
cut on the (escaped) delimiter / and print the 2nd until the 5th field (-f)
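The same field selection on a made-up path, just to illustrate (field 1 is the empty string before the leading slash, which is why the output starts at vmfs):

echo "/vmfs/volumes/datastore1/some-vm/some-vm.vmx" | cut -d \/ -f 2-5
# prints: vmfs/volumes/datastore1/some-vm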
The name of the binary implies the actual major use case: dumping of vmm / vmx memory and associated debugging tasks like sending NMIs (Non-Maskable Interrupts) to the VM, enabling IP (Instruction Pointer) logging, or printing relevant information from VMs whose memory can be dumped, i.e. from those that are running …
[root@esxi_source:~] vmdumper -h
vmdumper: [options] <unsync|sync|vmx|vmx_force|samples_on|samples_off|nmi|backtrace>
        -f: ignore vsi version check
        -h: print friendly help message
        -l: print information about running VMs
        -g: log specified text to the vmkernel log
You could replace the part before the while loop with something else that lists the .vmx file or working directory of running VMs:
[root@esxi_source:~] esxcli vm process list | grep "Config" | cut -d \/ -f 2-5
vmfs/volumes/5cded272-3c5304fc-2308-109836041d9b/test-vm
The rest is really just an egrep against all running VMs’ vmware.log (which is why over-zealous disabling or reducing of VM logging can cause that instrumental information to not be available, see KB 8182749 to check whether you are doing that). egrep is the same as grep -E (extended RegExp) and, for me, mostly a matter of muscle memory; -i means case insensitive, -o only prints the match, nothing before or after, instead of the whole line.
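A minimal illustration of -o and -i together, on a fabricated line rather than a real vmware.log entry; the lowercase pattern still matches the uppercase DICT and only the matching part is printed:

echo 'some prefix DICT numvcpus = "20" some suffix' | grep -Eoi 'dict.*numvcpus.*= .*"'
# prints: DICT numvcpus = "20"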
We want to match a couple of options in the DICTionary, i.e. the non-default vmx options the VM was started with. That is why a grep against the vmware.log is more accurate than against the .vmx, which might have changed since the power on (although the circumstances for that are limited, another long story). You can graph regular expressions in web apps like Regexper or Debuggex btw.
Anyhow, the key facts in the above output for this one VM are:
- more vCPU than cores in a single physical NUMA node
  - refer back to the sched-stats -t ncpus output
- more memory than available in a single physical NUMA node
  - not that it matters for autosizing
- no advanced settings besides setting coresPerSocket to the VPD / PPD size
  - which is a good thing in 95% of cases, long story
- according to the log, preferHT is true, despite not fulfilling any of the conditions:
  - numa.vcpu.preferHT = TRUE (vmx option) / /Numa/PreferHT = 1 (host advanced option)
    - not visible in vmware.log, I checked on the host (see the sketch after this list)
  - vCPUs > cores per host
  - (some other internal advanced settings that aren’t relevant)

I’m pretty sure that is a bug and I’ll update this another day
- edit: if you paid attention, unlike me who thought this was on a different, bigger host, you’ll have noticed that the VM indeed has more vCPUs than the host has cores. Why leave this in before even initially publishing? To keep you on your feet; let this be a reminder that you shouldn’t believe everything you read on the internet.
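For reference, this is roughly how one can check the two explicit settings (a sketch; the .vmx path is the working directory we deduced earlier, and on this host the advanced option was still at its default of 0):

# host-wide advanced option; the Int Value should still be the default (0)
esxcli system settings advanced list -o /Numa/PreferHT

# per-VM option; no output here means numa.vcpu.preferHT is not set
grep -i "preferht" /vmfs/volumes/5cded272-3c5304fc-2308-109836041d9b/test-vm/test-vm.vmx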
What did we want to do again? Ah yes, vMotion the VM to another host. Let’s check the state there too; we just care about some of the options though:
[root@esxi_destination:~] for numaOption in ncpus numa-clients numa-pnode;
> do echo;
> sched-stats -t ${numaOption};
> done

 24 PCPUs
 12 cores
 2 LLCs
 2 packages
 2 NUMA nodes

groupName  groupID  clientID  homeNode  affinity  nWorlds  vmmWorlds  localMem  remoteMem  currLocal  cummLocal

nodeID  used   idle  entitled  owed  loadAvgPct  nVcpu   freeMem  totalMem
     0    18  11982         0     0           0      0  31990120  33456872
     1     6  11993         0     0           0      0  32561460  33554432
Same topology, no VM running and hence more than enough free memory on each node. Before we kick off the vMotion and check the NUMA locality, let’s make sure the VM’s memory isn’t just a bunch of zeros, or even unaccessed from the guest’s point of view, by filling it with random data. There are of course many ways, some more precise than others; here we don’t need to worry too much, so let’s look at the first Google result and use the first solution, mostly because I really like stress-ng (although a custom program filling memory from e.g. /dev/urandom would be more “surgical”, esp. if you create two instances, one for each vNUMA node, see the sketch below).
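Purely as a sketch of that “surgical” variant, and not what I actually ran below: assuming numactl is installed in the guest and the guest really exposes two NUMA nodes, you could pin one writer per node (the 18g sizes are placeholders; pick a bit less than half of MemAvailable each):

# one stress-ng instance per guest NUMA node, bound to that node's CPUs and memory
numactl --cpunodebind=0 --membind=0 stress-ng --vm 1 --vm-bytes 18g --vm-keep --vm-method rand-set &
numactl --cpunodebind=1 --membind=1 stress-ng --vm 1 --vm-bytes 18g --vm-keep --vm-method rand-set &
wait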
The link does explain the MemAvailable metric in /proc/meminfo but if you want to know the details, check out the actual commit (via StackExchange). The initial question asked for 90% (of free) but I’d say, given that we aren’t planning to run anything else, we should upgrade that to 95% (of available).
root@test-vm:~# cat /proc/meminfo
MemTotal:       41186192 kB
MemFree:        40797100 kB
MemAvailable:   40619916 kB
Buffers:               0 kB
Cached:            39608 kB
SwapCached:            0 kB
(...)

root@test-vm:~# echo $((40619916 / 1024))
39667

root@test-vm:~# free -m
               total        used        free      shared  buff/cache   available
Mem:           40220         342       39839           0          38       39666
Swap:              0           0           0

root@test-vm:~# awk '/MemAvailable/{printf "%d\n", $2 * 0.95;}' < /proc/meminfo
38585135

root@test-vm:~# echo $((38585135 / 1024))
37680
We could go higher but I want to avoid constant swapping / IO at all cost. Well, not at all cost apparently, 5% of available memory would be the precise figure here.
Before we kick it off, let’s look at the stress-ng man page for the test we are planning to run. You’ll see that if you don’t also specify --vm-method, it will cycle through all available ones, i.e. what workload is running at a given point isn’t exactly predictable. That level of determinism might not be necessary here but it could be some other time, so let’s specify --vm-method rand-set out of an abundance of caution. And just to avoid any confusion, lower case vm here means virtual memory.
root@test-vm:~# stress-ng --vm-bytes \
> $(awk '/MemAvailable/{printf "%d\n", $2 * 0.95;}' < /proc/meminfo)k \
> --vm-keep --vm-method rand-set --vm 1
stress-ng: info: [739] defaulting to a 86400 second (1 day, 0.00 secs) run per stressor
stress-ng: info: [739] dispatching hogs: 1 vm
You can kind of tell from the logging that this isn’t just going to fill the memory and stop; it will continuously stress the memory. We specified 1 worker thread, so it will be some time until all memory is touched / filled. And would you believe it, I actually prepared something on the ESXi host before I started stress-ng to showcase this:
[root@esxi_source:~] memstats -r vm-stats -s name:touched -u mb
 VIRTUAL MACHINE STATS: Sun Aug 14 13:50:08 2022
 -----------------------------------------------
   Start Group ID   : 0
   No. of levels    : 12
   Unit             : MB
   Selected columns : name:touched
   --------------------------
          name     touched
   --------------------------
    vm.1466269         410
   --------------------------
         Total         410
   --------------------------

[root@esxi_source:~] for i in $(seq 1000);
> do memstats -r vm-stats -s name:touched -u mb |
> awk -v date=$(date -Iseconds) '$1 ~ /vm.[0-9]+/ {print date","$2}';
> sleep 1;
> done
2022-08-14T14:01:36+0000,410
2022-08-14T14:01:37+0000,410
2022-08-14T14:01:38+0000,410
2022-08-14T14:01:39+0000,410
2022-08-14T14:01:40+0000,820
2022-08-14T14:01:41+0000,1229
2022-08-14T14:01:42+0000,1639
2022-08-14T14:01:43+0000,1639
2022-08-14T14:01:44+0000,2458
2022-08-14T14:01:45+0000,3687
2022-08-14T14:01:46+0000,3687
2022-08-14T14:01:47+0000,4506
2022-08-14T14:01:48+0000,4916
2022-08-14T14:01:49+0000,6144
2022-08-14T14:01:51+0000,6554
2022-08-14T14:01:52+0000,6554
2022-08-14T14:01:53+0000,7373
2022-08-14T14:01:54+0000,7373
2022-08-14T14:01:55+0000,7783
2022-08-14T14:01:56+0000,8192
2022-08-14T14:01:57+0000,8602
2022-08-14T14:01:58+0000,9831
2022-08-14T14:01:59+0000,10650
2022-08-14T14:02:00+0000,12288
(...)
2022-08-14T14:03:15+0000,33588
2022-08-14T14:03:16+0000,33588
(...)
2022-08-14T14:11:52+0000,36864
2022-08-14T14:11:53+0000,36864
2022-08-14T14:11:54+0000,36864
2022-08-14T14:11:55+0000,37274
2022-08-14T14:11:56+0000,37274
2022-08-14T14:11:57+0000,37274
2022-08-14T14:11:58+0000,37274
CTRL-C
- -s name:touched will select the “name” (vm.cartel ID) and the one stat we are interested in
- -u will change the default unit from KB
  - to MB here
  - gb is available too since 7.0
- the output of date (ISO8601’ish) is passed as a variable to awk via -v
- we match (~ /.*/) the first column ($1) for everything starting with vm followed by … one instance of any character (.) instead of the literal “.” character, since I forgot to escape it with a backslash, followed by a bunch (+) of numbers ([0-9]) (a minimal demo follows right after this list)
- while true loops can sometimes be hard to exit, so I opted to do “more than enough” iterations with a 1 second sleep
  - seq will print a sequence from 1 to 1000, at the default increment of 1
  - we don’t use $i in the loop, we just want it to run (a maximum of) 1000 times
- I do like printing the date and values comma separated; if there is a chance I might want to graph it later, I’ll already have it as .csv
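Here is the awk part in isolation, fed with a single fabricated memstats-style row (name and value made up); it prints the current ISO8601 timestamp, a comma and the second column:

echo " vm.1466269        410" | awk -v date=$(date -Iseconds) '$1 ~ /vm.[0-9]+/ {print date","$2}'
# prints something like: 2022-08-14T14:01:36+0000,410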
touched, the metric formerly known as active, is a sample-based (100 random small pages / min) heuristic to assess how, well, active a VM is, based on the % of those pages being touched (read or written) or dirtied (written) over each iteration. Touched by default includes dirtied, but the latter is available separately in esxtop as TCHD_W and in the vSphere Client’s Advanced Performance Charts as Active Write.
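As a quick sanity check on the numbers above: the VM has 40960 MB configured and the loop plateaus around 37274 MB touched, i.e. roughly 91 % of the configured memory, which lines up with stressing ~95 % of MemAvailable inside the guest. A throwaway calculation with just the figures from this article:

echo $((37274 * 100 / 40960))
# prints: 91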
Running it for ~12 minutes shows that pretty much all mapped memory has recently been touched, statistically speaking. This means that the VM’s memory should be full of random data, and it is actually continuing to be filled at whatever rate a single worker thread manages.
[root@esxi_source:~] esxtop -u
      ID      GID NAME     NWLD   %USED    %RUN  %SYS   %WAIT %VMWAIT  %RDY   %IDLE %OVRLP %CSTP %MLMTD %SWPWT
 2913542  2913542 test-vm    36  109.16   99.54  0.01 3450.18    3.59  0.15 1869.14   0.08  0.00   0.00   0.00
When I looked at the VM’s performance charts in the UI, I did notice an interesting difference between Active and Active Write. Since the latter isn’t available in memstats, I had to get a .csv export from the vSphere Client. Why no screenshot? Meh, hard to get right for light and dark mode. For anyone who likes a good ASCII graph (and who doesn’t?), here are the plotted, vSphere Client exported metrics:
root@foo:~# plot -y 9:0 -d 30:73 -b 0:40000000 -s ascii \
> -i <(cut -d , -f 3 /tmp/exported.csv) \
> -i <(cut -d , -f 2 /tmp/exported.csv)

[ASCII plot, y-axis 0 to 40000000 (KB): both metrics ramp up from 0 at the start; Active then stays fairly flat around 37-39 million KB, while Active Write oscillates roughly between 22 and 34 million KB.]
Active Write is the lower, more volatile one; it seems the workload does a good bit of just reading and can’t dirty the whole VM’s memory over a one minute sample … this didn’t change with two worker threads either, so the bottleneck is somewhere else. Hmmm … but no, let’s not make this article ADHD incarnate, I mean more than it already is.
Whether with one or two worker threads, the continued changing of memory might affect the switchover time in our little lab, and we already got what we wanted from stress-ng (which is filling the memory with random values) …
root@test-vm:~# ps | grep stress
 1300 root     stress-ng --vm-bytes 38586385k --vm-keep --vm-method rand-set --vm 1
 1301 root     {stress-ng-vm} stress-ng --vm-bytes 38586385k --vm-keep --vm-method rand-set --vm 1
 1302 root     {stress-ng-vm} stress-ng --vm-bytes 38586385k --vm-keep --vm-method rand-set --vm 1
 1324 root     grep stress

root@test-vm:~# kill -TSTP 1300 1301 1302

[root@esxi_source:~] esxtop -u
      ID      GID NAME     NWLD   %USED    %RUN  %SYS   %WAIT %VMWAIT  %RDY   %IDLE %OVRLP %CSTP %MLMTD %SWPWT
 2913542  2913542 test-vm    36    0.71    0.67  0.01 3546.22   39.83  0.10 1930.25   0.01  0.00   0.00   0.00
In the past I would have used kill -STOP to suspend the processes, but I checked and learned that there is a “politer” method. And it feels good to be nice, even if it is just about suspending a couple of threads gently.
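For completeness, the difference in one line each: SIGTSTP can be caught or ignored by the process while SIGSTOP cannot, and SIGCONT resumes either way (PIDs as in the ps output above):

kill -TSTP 1300 1301 1302   # polite: the process may catch or ignore it
kill -STOP 1300 1301 1302   # blunt: cannot be caught or ignored
kill -CONT 1300 1301 1302   # resume, should we want the stress to continue later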
Time to prepare the migration in the vSphere Client, but just before we hit “Finish” in the dialog, let’s kick off some monitoring on the destination:
[root@esxi_destination:~] for i in $(seq 200);
> do echo -en "$(date -Iseconds),";
> sched-stats -t numa-pnode |
> awk '$8 ~ /[0-9]+/ {print int($8 / 1024)}' |
> sed 'N;s/\n/,/';
> sleep 1;
> done
2022-08-14T20:56:36+0000,30774,31336
2022-08-14T20:56:37+0000,30774,31336
2022-08-14T20:56:38+0000,30772,31334
2022-08-14T20:56:39+0000,30772,31334
2022-08-14T20:56:40+0000,30773,31334
2022-08-14T20:56:41+0000,30773,31334
2022-08-14T20:56:42+0000,30768,31330
2022-08-14T20:56:43+0000,30768,31330 <--- vMotion start
2022-08-14T20:56:45+0000,10257,10818 <--- brief allocation check?
2022-08-14T20:56:46+0000,10257,10818
2022-08-14T20:56:47+0000,28783,31276 <--- free space on node 0 is reducing
2022-08-14T20:56:48+0000,28783,31276
2022-08-14T20:56:49+0000,26523,31284
2022-08-14T20:56:50+0000,26523,31284
2022-08-14T20:56:51+0000,24250,31268
2022-08-14T20:56:52+0000,24250,31268
2022-08-14T20:56:53+0000,22031,31268
2022-08-14T20:56:54+0000,22031,31268
2022-08-14T20:56:55+0000,19821,31274
2022-08-14T20:56:56+0000,19821,31274
2022-08-14T20:56:57+0000,17608,31278
2022-08-14T20:56:58+0000,17608,31278
2022-08-14T20:56:59+0000,15390,31283
2022-08-14T20:57:00+0000,15390,31283
2022-08-14T20:57:01+0000,13172,31281
2022-08-14T20:57:02+0000,13172,31281
2022-08-14T20:57:03+0000,10952,31281
2022-08-14T20:57:04+0000,10952,31281
2022-08-14T20:57:05+0000,10257,29680 <--- free space on node 1 is reducing
2022-08-14T20:57:06+0000,10257,29680
2022-08-14T20:57:08+0000,10255,27456
2022-08-14T20:57:09+0000,10253,25234
2022-08-14T20:57:10+0000,10253,25234
2022-08-14T20:57:11+0000,10250,23014
2022-08-14T20:57:12+0000,10250,23014
2022-08-14T20:57:13+0000,10249,20793
2022-08-14T20:57:14+0000,10249,20793
2022-08-14T20:57:15+0000,10246,18572
2022-08-14T20:57:16+0000,10246,18572
2022-08-14T20:57:17+0000,10243,16350
2022-08-14T20:57:18+0000,10243,16350
2022-08-14T20:57:19+0000,10241,14131
2022-08-14T20:57:20+0000,10241,14131
2022-08-14T20:57:21+0000,10239,11907
2022-08-14T20:57:22+0000,10239,11907
2022-08-14T20:57:23+0000,10191,10733 <--- vMotion end
2022-08-14T20:57:24+0000,10191,10733
2022-08-14T20:57:25+0000,10194,10737
2022-08-14T20:57:26+0000,10194,10737
2022-08-14T20:57:27+0000,10201,10745
2022-08-14T20:57:28+0000,10201,10745
2022-08-14T20:57:29+0000,10206,10750
2022-08-14T20:57:31+0000,10206,10750
2022-08-14T20:57:32+0000,10210,10753
2022-08-14T20:57:33+0000,10210,10753
2022-08-14T20:57:34+0000,10210,10754
CTRL-C
- the one-liner is similar enough to the other ones here; the KB to MB calculation results in a remainder and the “cast” to int gets rid of that
- also, no printing of the date via awk; I fought with formatting the two rows but gave up and just echoed it first, removed the newline (-n) and then replaced the remaining one between the two sched-stats rows via sed (a small demo follows after this list)
  - I’m fully aware of my utter loss of shell street cred and the sneers of awk purists (or really just anyone half-competent) that will rightfully be drawn my way; the shame of cutting corners just because “it does what I need” and not looking as smart as I could on the Internet will haunt me for the foreseeable future
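For the curious, the sed trick in isolation on two fabricated value lines; N appends the next line to the pattern space and the embedded newline is then replaced with a comma:

printf '30774\n31336\n' | sed 'N;s/\n/,/'
# prints: 30774,31336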
You might ask, why look at the NUMA node’s freeMem (column $8 in numa-pnode) instead of the local memory of the VM’s NUMA clients? That is what I did first, but it turns out those values aren’t populated until the VM resumes, i.e. it basically looks like this the whole time during the precopy:
[root@esxi_destination:~] sched-stats -t numa-clients
groupName    groupID  clientID  homeNode  affinity  nWorlds  vmmWorlds  localMem  remoteMem  currLocal  cummLocal
vm.1177915    943151         0         0         3        0          0         0          0          0          0
vm.1177915    943151         1         1         3        0          0         0          0          0          0
But the moment the VM resumes on the destination:
[root@esxi_destination:~] sched-stats -t numa-clients
groupName    groupID  clientID  homeNode  affinity  nWorlds  vmmWorlds  localMem  remoteMem  currLocal  cummLocal
vm.1177915    943151         0         0         3       10         10  20905984          0        100        100
vm.1177915    943151         1         1         3       10         10  20967424          0        100        100
freeMem has no such hang-ups, besides the little blip which is probably an allocation check; here are the plotted values for both nodes:
root@foo:~# plot -y 6:0 -m -d 39 -b 0:32000 -s ascii \
> -i <(cut -d , -f 3 /tmp/plot.txt) \
> -i <(cut -d , -f 2 /tmp/plot.txt)

[ASCII plot, y-axis 0 to 32000 (MB): both nodes start at roughly 31000 MB free; node 0 drops in steps to ~10200 MB first, then node 1 follows down to ~10700 MB, matching the staggered precopy visible in the raw numbers above.]
Now that the vMotion is done, we could check how many of the pages were zero / unaccessed via /vmkModules/migrate/migID/*/worldID/*/stats, but I’m afraid I’d never finish if I did. So let’s end it here.
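In case you want to poke at that yourself, that path is a vsish node, so something along these lines should get you there (a hedged sketch; I haven’t spelled out the exact node layout and the IDs are placeholders you would take from the ls output):

vsish -e ls /vmkModules/migrate/migID/
vsish -e ls /vmkModules/migrate/migID/<migID>/worldID/
vsish -e get /vmkModules/migrate/migID/<migID>/worldID/<worldID>/stats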
P.S.
I think the source of the confusion is this 10+ year old KB, and of course not re-validating claims in documentation on a semi-regular basis.
P.P.S.
If you are wondering whether it should be precopy, pre-copy, preCopy, PreCopy or other variations, I did too, and given that these seem to be used interchangeably, I checked how often each is used in strings / log messages and comments. The clear leader is precopy, followed by pre-copy at maybe a 10:1 ratio; the remainder are one-offs.