
Simultaneous MultiThreading - SMT - how do you measure it?

Many installations are asking for information on how to manage performance for SMT-enabled LPARs. I will talk about the four parts of performance management: capacity planning, performance analysis, processor cache reporting, and user chargeback/accounting.

In an SMT-enabled LPAR, it is important to understand that we now talk in threads and IFLs. From a z/VM perspective, we now see threads. Wherever the traditional z/VM term "CPU" is used, one now has to think "thread".

Capacity Planning

At a high level, our ESALPARS report shows how much IFL time is allocated to the LPAR. When an IFL is assigned to an LPAR under SMT, it is very important to know that both threads on that IFL are assigned to the LPAR as well. So while an IFL is assigned, it is not shared. But both threads on that IFL might not be used concurrently: there is some time when both threads are assigned to the LPAR but only one is being used, and that is extra capacity that is available. This time shows up on the ESALPARS report as thread idle time. "Thread idle" is the time when one IFL (two threads) is assigned to an LPAR but only one of the threads is actually doing work.

From a high level, this is the LPAR summary showing the LPARs and their allocations. Starting from the far right, the "entitled" CPU is the share of the shared engines that this LPAR is guaranteed by the given LPAR weights. The LPAR from which this data comes is the LXB5 LPAR, which will get 10.9 cores allocated as its guarantee. In this case, this LPAR was assigned a physical core 828% of the time, meaning 8.28 cores were assigned on average during this one-minute reporting interval.

When a core is assigned to an LPAR in an SMT-2 environment, both threads are part of that assignment. Even though both threads are assigned, that does not mean they are utilized. The CP monitor provides another metric, "thread idle". If this LPAR is assigned 828%, and the (non-SMT) LPAR overhead is subtracted, that leaves 816% "core assignment" for real work, which equates to 1632% "thread assignment". Of that 1632%, in this case 594% of thread idle existed. During thread idle, one thread was being utilized and one thread was idle, but the core was assigned.
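The core-to-thread arithmetic above can be sketched as follows. This is illustrative arithmetic only, using the ESALPARS numbers quoted in the text; the variable names are my own.

```python
# Values taken from the ESALPARS report discussed above.
core_assigned_pct = 828.6   # % of time a physical core was assigned (8.28 cores)
lpar_overhead_pct = 12.6    # non-SMT LPAR management overhead
thread_idle_pct   = 594.5   # one thread idle while the core was assigned
threads_per_core  = 2       # SMT-2

# Core assignment available for real work:
work_core_pct = core_assigned_pct - lpar_overhead_pct        # ~816%

# Every assigned core brings both of its threads with it:
thread_assigned_pct = work_core_pct * threads_per_core       # ~1632%

# Threads actually doing work while assigned:
thread_busy_pct = thread_assigned_pct - thread_idle_pct

print(round(work_core_pct, 1), round(thread_assigned_pct, 1),
      round(thread_busy_pct, 1))
```

The leftover thread-idle time is the extra capacity the text describes: assigned to the LPAR, but not doing work.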

Report: ESALPARS     Logical Partition Summary
Monitor initialized: 07/07/15 at 13:03
---------------------------------------------------------------------------
         <--------Logical Partition------->  <-Assigned               Entitled
                      Virt CPU  <%Assigned>  <---LPAR--> <-Thread->   CPU Cnt
Time     Name     Nbr CPUs Type Total  Ovhd  Weight  Pct Idle   cnt
-------- -------- --- ---- ---- -----  ----  ------ ---- ------ ---  ------
13:05:00 Totals:   00   71 IFL   1055  18.7    1001  100
         LXB5      05   20 IFL  828.6  12.6     475 47.5 594.5    2  10.91
         LXBX      0F    1 IFL    0.5   0.1      50  5.0     0    1   1.15
         LXB2      02   12 IFL   1201   0.1     Ded 21.8     0    1      0
         LXB3      03   20 IFL   2000   0.1     Ded 36.4     0    1      0
         LXB8      08   10 IFL  224.7   5.7     475 47.5     0    1  10.91
         TS02      0E    8 IFL    1.3   0.3       1  0.1     0    1   0.02
 
Totals by Processor type:
<---------CPU-------> <-Shared Processor busy->
Type Count Ded shared  Total  Logical Ovhd Mgmt
---- ----- --- ------ ------ -------- ---- ----
IFL     55  32     23 1073.7   1036.5 18.7 18.4
 

From a capacity planning perspective, there is unused capacity. This happens to be a z13, which had inherent bottlenecks in the TLB that were corrected on the z14 and z15. Understanding this requires hardware knowledge and use of the hardware metrics from the PRCMFC data. Many z13 installations saw a "less than zero" increase in capacity when enabling SMT on the z13. This is only detectable using the MFC data when evaluating production data.

The objective of SMT is to better utilize the processor core. The "z" processors have a very sophisticated caching hierarchy to increase the amount of time a core can actually execute an instruction. Any instruction execution must have the instruction and all related data in the level 1 cache. Any time there is a cache miss, the core sits idle while the data comes from level 2, level 3, level 4, level 4 on a remote book, or from memory. Each of these sources requires an increasing number of cycles to load the data into the level 1 cache. During these cache loads, if the core can process instructions from another thread, then core utilization should go up.

There are then two measures for evaluating increased capacity when SMT is enabled.

  1. Instructions executed per second: if SMT is enabled and the instruction rate increases from, say, 2.5M to 3.0M instructions per second per core, then capacity has increased.

  2. Cycles per instruction: traditionally, mainframes required between 4 and 8 cycles to execute one instruction. Because of cache, pipelining, and other hardware enhancements, the z13 is closer to 1.5 cycles per instruction (without SMT). Traditionally, capacity planning focuses on core utilization. If more cycles are required per instruction, then core utilization goes up - not a good thing. If cycles per instruction drop for a given workload, that workload is using less of the core to drive more work. Understanding this measurement invalidates a lot of traditional measurements of capacity.
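The two measures can be expressed as simple formulas. A minimal sketch, with hypothetical sample rates (the function and variable names are mine, not zVPS metrics):

```python
def capacity_increased(instr_per_sec_before, instr_per_sec_after):
    """Measure 1: did instructions completed per second per core go up?"""
    return instr_per_sec_after > instr_per_sec_before

def cpi(cycles_per_sec, instr_per_sec):
    """Measure 2: cycles per instruction (lower = more work per cycle)."""
    return cycles_per_sec / instr_per_sec

# Hypothetical before/after comparison for one core:
print(capacity_increased(2.5e9, 3.0e9))   # capacity grew
print(round(cpi(4.5e9, 3.0e9), 2))        # cycles per instruction
```

The point is that neither measure is "percent busy": both require the hardware cycle and instruction counters.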

Without proper measurement capability it is very difficult to know whether capacity has increased. One installation said its Linux admins think their performance is better - a method of analysis that was not scientific. From a capacity planning perspective, look at instructions per second per core and cycles per instruction to know whether more work is being processed. If IFL utilization is low, enabling SMT changes very little; SMT is useful from a capacity perspective when IFL utilization is high and more capacity is desired.

Capacity planning becomes more difficult with SMT: capacity growth is no longer a straight line, multiple metrics are needed, and as CPU utilization grows there will be more contention for cache and TLB, and thus less work done per cycle allocated.

Performance

IBM states in many places that when running in SMT mode, workloads WILL run slower. With SMT, two workloads share the same processor core, so at times there will be cycles where both workloads are ready to run but one has to wait. The performance question, then, is the impact of this "core contention" on performance.

The IBM monitor provides metrics at the system level and the user level to help understand how SMT impacts the system. There are also the PRCMFC data (mainframe cache statistics) that show the impact of two threads on the hardware cache. zVPS has been enhanced to utilize and expose these new metrics for every processor from the z196 to the current z15.

For system-level performance reporting, it is important to understand that with SMT enabled there are two sets of counters for most metrics. This means that from a z/VM perspective there are twice as many CPUs, all of which have the traditional measurements. But there is still the physical hardware utilization. As in any performance analysis, utilization of hardware has an impact on performance and throughput.

In the above case in LPAR LXB5, there are 20 physical cores available to the LPAR - and in SMT-2 mode, z/VM will see 40 threads. From an LPAR perspective we see the 20 cores, the assigned percentages, and the idle thread time per core.

Report: ESALPAR      Logical Partition Analysis                    Velocity Software    ZMAP 5.1.1 10/29/20   Pg   1257
--------------------------------------------------------------------------------------------------------------
         CEC  <-Logical Partition-> <----------Logical Processor--- <------(percentages)-------> 
         Phys              Pool     VCPU <%Assigned> VCPU Weight/   Total User    Sys  Idle  Stl  Idle  cp1/cp2
Time     CPUs Name     No  Name     Addr Total  Ovhd TYPE Polar      util ovrhd ovrhd  time  Pct  Time
-------- ---- -------- --- -------- ---- -----  ---- ---- --- ---   ----- ----- ----- ----- ---- ------ --- ---
13:05:00   55 LXB5      05        .    0  41.2   0.8 IFL  475 Hor    52.9   1.4   3.0 144.8 2.32  27.81   0 / 0
                                       1  39.6   0.6 IFL  475 Hor    50.3   1.3   2.1 147.9 1.75  26.65   2 / 3
                                       2  34.1   0.6 IFL  475 Hor    41.1   1.1   2.1 157.3 1.63  25.18   4 / 5
                                       3  34.8   0.5 IFL  475 Hor    41.1   0.9   1.7 157.5 1.39  26.68   6 / 7
                                       4  38.4   0.6 IFL  475 Hor    47.3   1.1   1.9 151.0 1.64  27.57   8 / 9
                                       5  43.5   0.6 IFL  475 Hor    55.0   1.2   2.3 143.3 1.66  30.12  10 /11
                                       6  44.1   0.7 IFL  475 Hor    56.5   1.4   2.2 141.6 1.89  29.47  12 /13
                                       7  40.3   0.7 IFL  475 Hor    50.1   1.4   2.3 148.0 1.95  28.37  14 /15
                                       8  44.5   0.5 IFL  475 Hor    53.4   0.8   1.7 145.2 1.36  33.99  16 /17
                                       9  39.2   0.6 IFL  475 Hor    48.1   1.1   1.8 150.3 1.62  28.38  18 /19
                                      10   6.4   0.2 IFL  475 Hor     6.4   0.2   0.8 192.9 0.75   5.82  20 /21
                                      11   5.8   0.1 IFL  475 Hor     5.8   0.1   0.4 193.8 0.38   5.41  22 /23
                                      12  27.9   0.5 IFL  475 Hor    32.3   0.7   1.7 165.4 2.31  21.76  24 /25
                                      13  30.4   0.6 IFL  475 Hor    36.0   0.9   2.3 161.2 2.78  22.70  26 /27
                                      14  62.6   0.8 IFL  475 Hor    79.0   1.3   3.1 117.6 3.42  43.40  28 /29
                                      15  52.7   0.9 IFL  475 Hor    65.1   1.3   3.4 131.4 3.47  37.30  30 /31
                                      16  49.9   0.8 IFL  475 Hor    65.0   0.9   3.2 131.8 3.24  31.95  32 /33
                                      17  64.6   0.8 IFL  475 Hor    75.4   0.9   3.2 121.3 3.28  50.80  34 /35
                                      18  61.4   0.9 IFL  475 Hor    76.8   1.6   3.6 119.3 3.91  42.56  36 /37
                                      19  67.1   1.0 IFL  475 Hor    81.9   1.2   3.9 113.9 4.11  48.57  38 /39
                                         -----  ----                ----- ----- ----- ----- ---- ------ --- ---
                                    LPAR 828.6  12.6                 1020  20.9  46.8  2936 44.9  594.5   0 / 0
 

And then, from the z/VM side, we can look at the system thread by thread.

 
Report: ESACPUU      CPU Utilization Report                        Vel
----------------------------------------------------------------------
         <----Load---->           <--------CPU (percentages)-------->
         <-Users-> Tran     CPU   Total  Emul  User   Sys  Idle Steal
Time     Actv In Q /sec CPU Type   util  time ovrhd ovrhd  time  time
-------- ---- ---- ----  -  ----  ----- ----- ----- ----- ----- -----
13:05:00   97  218  3.1  0  IFL    26.4  24.2   0.7   1.5  72.4   1.2
                         1  IFL    25.4  23.7   0.6   1.1  73.5   1.2
                         2  IFL    24.5  22.8   0.7   1.1  74.6   0.9
                         3  IFL    25.8  24.1   0.6   1.0  73.3   0.9
                         4  IFL    20.0  18.3   0.6   1.1  79.2   0.8
                         5  IFL    21.1  19.6   0.5   1.0  78.1   0.8
                         6  IFL    20.8  19.5   0.5   0.9  78.5   0.7
                         7  IFL    20.3  19.0   0.5   0.8  79.0   0.7
                         8  IFL    23.8  22.3   0.5   1.0  75.4   0.8
                         9  IFL    23.5  22.0   0.6   1.0  75.6   0.8
                        10  IFL    26.0  24.0   0.6   1.4  73.2   0.8
                        11  IFL    29.0  27.6   0.6   0.9  70.1   0.8
                        12  IFL    27.3  25.4   0.7   1.1  71.8   0.9
                        13  IFL    29.2  27.4   0.7   1.1  69.8   1.0
                        14  IFL    25.5  23.7   0.7   1.2  73.5   1.0
                        15  IFL    24.5  22.8   0.7   1.1  74.5   1.0
                        16  IFL    22.8  21.4   0.4   1.0  76.5   0.7
                        17  IFL    30.6  29.4   0.4   0.8  68.7   0.7
                        18  IFL    23.3  21.8   0.6   0.9  75.9   0.8
                        19  IFL    24.8  23.5   0.5   0.8  74.4   0.8
                        20  IFL     3.3   2.6   0.1   0.5  96.4   0.4
                        21  IFL     3.1   2.7   0.1   0.3  96.5   0.4
                        22  IFL     2.1   1.9   0.1   0.2  97.7   0.2
                        23  IFL     3.7   3.4   0.1   0.2  96.1   0.2
                        24  IFL    16.0  14.8   0.3   0.8  82.9   1.1
                        25  IFL    16.4  15.1   0.4   0.9  82.5   1.2
                        26  IFL    16.9  15.2   0.5   1.2  81.7   1.4
                        27  IFL    19.1  17.5   0.5   1.1  79.5   1.4
                        28  IFL    36.1  33.7   0.7   1.8  62.2   1.7
                       ....
                        38  IFL    36.7  33.9   0.6   2.2  61.2   2.0
                        39  IFL    45.2  43.0   0.5   1.6  52.7   2.1
                                  ----- ----- ----- ----- ----- -----
System:                            1019 951.3  20.8  46.4  2937  44.9
 

Now with 816% core-assigned time (828.6% assigned minus 12.6% overhead), z/VM sees 1019% "total thread" busy time. So with the 20 cores, there are two different utilization numbers: core busy, 816% out of 20 cores, or thread utilization, 1019% out of 40 threads. Both are important from a performance analysis perspective.

Processor Cache Reporting

One of the most interesting scenarios showing the value of the mainframe cache data is the following. The ESAMFC report below comes from an IBM benchmark without SMT. This is a z13 (the 5GHz processor speed gives that away) with 6 processors in the LPAR. It shows the cycles used by the workload on each processor and the number of instructions executed by each processor, all as per-second rates. At the tail end of the benchmark, processor utilization drops from 92% to 67% as some of the drivers complete. But please note: the instruction rate goes up!

Even though the utilization dropped, the number of instructions executed went up: as the remaining drivers stopped fighting for the CPU cache, cache residency greatly improved. The last metric is the important one - cycles per instruction. If the processor cache is overloaded, cycles are wasted loading data into the level 1 cache. As contention for the L1 cache drops, so do the cycles used per instruction. As a result, more instructions are executed using much less CPU.

Report: ESAMFC       MainFrame Cache Analysis Rep
-------------------------------------------------
.             <-------Processor------>
.              Speed/<-Rate/Sec->
Time     CPU Totl User  Hertz Cycles Instr Ratio
-------- --- ---- ----  ----- ------ ----- -----
14:05:32   0 92.9 64.6  5000M  4642M 1818M 2.554
           1 92.7 64.5  5000M  4630M 1817M 2.548
           2 93.0 64.7  5000M  4646M 1827M 2.544
           3 93.1 64.9  5000M  4654M 1831M 2.541
           4 92.9 64.8  5000M  4641M 1836M 2.528
           5 92.6 64.6  5000M  4630M 1826M 2.536
             ---- ----  ----- ------ ----- -----
System:       557  388  5000M  25.9G 10.2G 2.542
-------------------------------------------------
14:06:02   0 67.7 50.9  5000M  3389M 2052M 1.652
           1 67.8 51.4  5000M  3389M 2111M 1.605
           2 69.0 52.4  5000M  3450M 2150M 1.605
           3 67.2 50.6  5000M  3359M 2018M 1.664
           4 60.8 44.5  5000M  3042M 1625M 1.872
           5 70.1 53.8  5000M  3506M 2325M 1.508
             ---- ----  ----- ------ ----- -----
System:       403  304  5000M  18.8G 11.4G 1.640

It was this analysis that showed the need to understand the impact of the L1 cache, how traditional measures of capacity and CPU consumption need to be re-evaluated, and how workloads really do have an impact on the physical capacity of the CPU.
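The system totals from the two ESAMFC intervals above make the point numerically. A quick sanity check (totals are rounded in the report, so the last digit of the ratio differs slightly from the printed Ratio column; variable names are mine):

```python
# System totals from the two ESAMFC intervals, in cycles and
# instructions per second (values as printed in the report).
G = 1e9
samples = {
    "14:05:32": {"cycles": 25.9 * G, "instr": 10.2 * G},  # ~93% busy
    "14:06:02": {"cycles": 18.8 * G, "instr": 11.4 * G},  # ~68% busy
}

for time, s in samples.items():
    cpi = s["cycles"] / s["instr"]
    print(time, "CPI =", round(cpi, 3))

# Fewer cycles consumed, yet MORE instructions completed:
assert samples["14:06:02"]["instr"] > samples["14:05:32"]["instr"]
```

Lower utilization, lower CPI, higher instruction rate: the benchmark drivers were starving each other's L1 cache.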

A typical production workload with SMT enabled shows eight threads with a respectable average cycles-per-instruction (CPI) ratio of 1.7, at about 50% thread utilization. The question for the capacity planner is what happens to the CPI when core utilization goes up. If the CPI rises significantly, work may be taking much more time (and many more cycles) to execute, and the system capacity available is much less than it appears.

Report: ESAMFC       MainFrame Cache Magnitudes
------------------------------------------------
              <-------Processor------>
               Speed/<-Rate/Sec->
Time     CPU Totl User  Hertz Cycles Instr Ratio
-------- --- ---- ----  ----- ------ ----- -----
09:01:00   0 47.0 45.9  5000M  2290M 1335M 1.716
           1 50.0 48.9  5000M  2439M 1480M 1.648
           2 45.5 44.4  5000M  2219M 1329M 1.669
           3 47.3 46.1  5000M  2313M 1331M 1.738
           4 42.5 41.0  5000M  2078M 1164M 1.785
           5 53.6 52.7  5000M  2623M 1750M 1.499
           6 44.3 43.3  5000M  2163M 1179M 1.834
           7 56.3 55.3  5000M  2758M 1665M 1.657
             ---- ----  ----- ------ ----- -----
System:       386  378  5000M  17.6G 10.5G 1.681

In this case, there are 17.6B cycles per second being utilized. The L1 cache is broken out into instruction cache and data cache. Of the 17.6B cycles consumed, 2.3B are used for instruction cache loads and another 4.2B for data cache loads. Thus, of the 17.6B cycles per second used, only about 11B are used for executing instructions.

 
 
Report: ESAMFC       MainFrame Cache Magnitudes Velocity Software Corpor
------------------------------------------------------------------------
              <-------Processor------> 
               Speed/<-Rate/Sec->       Instruction <---Data-->
Time     CPU Totl User  Hertz Cycles Instr Ratio Writes Cost Writes Cost
-------- --- ---- ----  ----- ------ ----- ----- ------ ---- ------ ----
09:01:00   0 47.0 45.9  5000M  2290M 1335M 1.716   13M  285M 8771K  470M
           1 50.0 48.9  5000M  2439M 1480M 1.648   13M  287M 9592K  564M
           2 45.5 44.4  5000M  2219M 1329M 1.669   13M  285M 8207K  455M
           3 47.3 46.1  5000M  2313M 1331M 1.738   13M  289M 9584K  568M
           4 42.5 41.0  5000M  2078M 1164M 1.785   11M  295M 7381K  447M
           5 53.6 52.7  5000M  2623M 1750M 1.499   14M  283M   11M  566M
           6 44.3 43.3  5000M  2163M 1179M 1.834   12M  309M 9235K  455M
           7 56.3 55.3  5000M  2758M 1665M 1.657   14M  320M   15M  685M
             ---- ----  ----- ------ ----- ----- ------ ---- ------ ----
System:       386  378  5000M  17.6G 10.5G 1.681  102M 2353M   79M 4210M
 
 
But it gets worse. There is also the cost of DAT (Dynamic Address Translation). Each reference to an address must have a valid translated address in the TLB (Translation Lookaside Buffer). In this installation's case, of the 17.6B cycles used, 6.5B were used for loading the cache, and now we see that another 3.6B cycles are used for address translation. In this case, 19% of the cycles utilized are for address translation. This too goes up as the core becomes more utilized and there are more cache misses.
Report: ESAMFC       MainFrame Cache Magnitudes Velocity Software Corpor
-------------------------------------------------------------
             .<-Translation Lookaside buffer(TLB)->
              .  CPU Cycles
Time     CPU Totl User . Instr  Data Instr  Data  Cost  Lost
-------- --- ---- ---- . ----- ----- ----- ----- ----- -----
09:01:00   0 47.0 45.9 .    87   517 1832K  539K 19.13  438M
           1 50.0 48.9 .   109   506 1471K  525K 17.48  426M
           2 45.5 44.4 .   127   470 1258K  542K 18.66  414M
           3 47.3 46.1 .    81   522 1980K  560K 19.55  452M
           4 42.5 41.0 .   115   524 1363K  496K 20.06  417M
           5 53.6 52.7 .    47   660 2949K  466K 17.01  446M
           6 44.3 43.3 .    82   541 2050K  538K 21.27  460M
           7 56.3 55.3 .    34   728 4796K  538K 20.10  554M
             ---- ---- .  ----- ----- ----- ----- ----- ----
System:       386  378 .    72   557   18M 4205K 19.11 3609M
At this point, anyone having to perform capacity planning must realize that there is a lot of guesswork in future capacity planning models...
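A back-of-envelope cycle accounting, using the system totals from the ESAMFC reports above, shows how much of the consumed CPU never executes an instruction. Note the report computes its TLB "Cost" percentage (19.11) on a slightly different base, so this rough share comes out a bit higher; variable names are mine:

```python
# System totals from the ESAMFC reports above (per second).
G = 1e9
total_cycles     = 17.6 * G   # cycles consumed
icache_load_cost = 2.353 * G  # instruction-cache load cost
dcache_load_cost = 4.210 * G  # data-cache load cost
tlb_lost_cycles  = 3.609 * G  # cycles lost to address translation

# Cycles left for actually executing instructions:
productive = total_cycles - icache_load_cost - dcache_load_cost - tlb_lost_cycles
tlb_share = 100 * tlb_lost_cycles / total_cycles

print(round(productive / G, 1), "B productive cycles/sec")
print(round(tlb_share, 1), "% of cycles lost to translation")
```

Under this simple model, well under half of the consumed cycles are doing the workload's actual instruction execution; the rest feed the cache hierarchy and the TLB.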

User Chargeback / Accounting

The traditional methods of chargeback are based on CPU seconds consumed. CPU consumed was based on the time the virtual machine was actually dispatched to a CPU, and that number was very repeatable. In the SMT world, that number fails to be repeatable, and will be larger for a given workload. It is larger because even though the virtual machine is dispatched on a thread of a core for a period of time, during some of that time the core is also being used by the other thread, increasing the time on thread without necessarily changing the cycles required for the unit of work.

The IBM monitor facility attempts to alleviate this problem. The traditional metrics are still reported, along with two additional sets of metrics, one of which is not "filled in". The new metrics are "MT-Equivalent", what the server would have used if running alone on the core, and "MT Prorated", which actually attempts to charge for the cycles consumed.

The example we are working with had 1019% of the threads busy, of which, from the CPU performance data, 46% is system overhead. From the CPU perspective, that leaves 972% chargeable to real users. But wait... we only have 816% of core time we should be charging for, and some of that was idle.

Analyzing the user workload by traditional CPU time, now really thought of as "thread time", the capture ratio is 100%: we know exactly which virtual machine to charge for the 972% of thread time. As a chargeback model, that validates the data. But realistically, this measure is the time a virtual machine was dispatched on a thread. It is very accurate but less useful for chargeback, because it is not repeatable; it depends on the workload and on L1 cache contention.

The next metrics, "MT-Equivalent", are noticeably less: this is the time the threads would have used if SMT were disabled. That is much closer to the 830% that should be charged.

In the early (z13) days, a third set of metrics was provided but always showed zeros; these "MT Prorated" values seem to be a "best guess" and were simply not available in early SMT days. To finish this scenario: the MT-Equivalent metrics would be the best to use for chargeback.

Report: ESAUSP5      User SMT CPU Consumption Analysis             Ve
---------------------------------------------------------------------
         <------CPU Percent Consumed   (Total)---->   <-CPU PCT Prima
UserID       
/Class   Total  Virt   Total  Virtual  Total Virtual  Total  Virtual
-------- ----- -----   -----  -------  ----- -------  -----  -------
13:05:00 972.1  951.3  830.8    813.0  972.1   951.3  830.8    813.0
 ***Key User Analysis ***
TCPIP     0.34   0.11   0.29     0.09   0.34    0.11   0.29     0.09
 ***User Class Analysis***
Servers   0.41   0.15   0.35     0.13   0.41    0.15   0.35     0.13
ZVPS      0.70   0.60   0.61     0.53   0.70    0.60   0.61     0.53
TheUsers 971.0  950.6  829.9    812.3  971.0   950.6  829.9    812.3
 ***Top User Analysis***
LINUX195 202.8  202.3  176.6    176.1  202.8   202.3  176.6    176.1
LINUX203 77.36  76.77  64.13    63.63  77.36   76.77  64.13    63.63
LINUX199 67.44  66.75  56.32    55.73  67.44   66.75  56.32    55.73
LINUX204 57.35  56.22  49.20    48.22  57.35   56.22  49.20    48.22
LINUX198 49.73  48.74  43.41    42.55  49.73   48.74  43.41    42.55
LINUX197 40.01  39.35  34.17    33.61  40.01   39.35  34.17    33.61
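The gap between raw thread time and MT-Equivalent in the report above can be checked with a little arithmetic. A sketch using the LINUX195 numbers from the report; the variable names and the interpretation as a "discount" are mine:

```python
# Per-user CPU measures for LINUX195 from the ESAUSP5 report above.
thread_time_pct   = 202.8  # raw time dispatched on a thread (not repeatable)
mt_equivalent_pct = 176.6  # what the work would have cost running alone on the core

# How much smaller MT-Equivalent is than raw thread time:
discount_pct = 100 * (thread_time_pct - mt_equivalent_pct) / thread_time_pct
print(round(discount_pct, 1), "% below raw thread time")
```

That difference is the SMT inflation of dispatched time: the thread was on the core, but the core was shared.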
 

In a current analysis of our demonstration workload on our z15, a quick analysis from bottom to top shows the prorated values. From the user chargeback model, there is 25% thread time, 21% "MT-Equivalent" (what the workload would have taken if SMT-2 were not enabled), and then the "prorated" estimate (23.6%) of what the workload actually consumed.

 
Report: ESAUSP5      User SMT CPU Consumption Analys
----------------------------------------------------
         <------CPU Percent Consumed   (Total)---->
UserID     
/Class   Total  Virt   Total  Virtual  Total Virtual
-------- ----- -----   -----  -------  ----- -------
12:00:00 25.14  24.13  21.19    20.34  23.60   22.73
 ***Key User Analysis ***
TCPIP     0.04   0.03   0.04     0.02   0.04    0.02
TCPIP2    0.17   0.08   0.14     0.07   0.15    0.07
RACFVM    0.00   0.00   0.00     0.00   0.00    0.00
SFSZVPS4  0.04   0.02   0.03     0.02   0.03    0.02
 ***User Class Analysis***
Servers   0.08   0.06   0.06     0.05   0.07    0.06
Velocity  1.87   1.81   1.63     1.58   1.76    1.70
TEST      0.75   0.62   0.65     0.54   0.65    0.54
Web       0.11   0.10   0.10     0.09   0.09    0.08
REDHAT    0.41   0.40   0.33     0.33   0.35    0.34
SUSE      2.08   2.05   1.70     1.67   1.88    1.86
ORACLE    2.96   2.72   2.50     2.30   2.69    2.48
TheUsrs  16.67  16.26  14.05    13.70  15.92   15.57
 ***Top User Analysis***
MONGO01  10.29  10.27   8.69     8.68  10.00    9.98
SLES12    4.54   4.53   3.83     3.82   4.40    4.39
S11S2ORA  2.32   2.10   1.96     1.77   2.09    1.90
SLES15    1.82   1.80   1.48     1.46   1.65    1.63
 

For your chargeback model, from a repeatability perspective one could justify using the MT-Equivalent metrics; from a real-consumption perspective, one could justify using the "prorated" CPU time. The numbers for both are there in zVPS.

Looking at real resources consumed: of the 2 IFLs assigned to the LPAR, 24% core time is assigned; doubling that gives 48% thread time, and subtracting the 20% thread idle time leaves 28% thread time actually used. And as shown above, should chargeback be based on cycles, or on instructions consumed? The metrics are available...

Report: ESALPARS     Logical Partition Summary                     Velo
-----------------------------------------------------------------------
           <--------Logical Partition------->  <-Assigned
                        Virt CPU  <%Assigned>  <---LPAR-->  <-Thread->
Time       Name     Nbr CPUs Type Total  Ovhd  Weight  Pct  Idle   cnt
--------   -------- --- ---- ---- -----  ----  ------ ----  ------ ---
12:00:00   Totals:   00    7  CP   71.8   0.3     250  100
           Totals:   00   12 IFL   33.3   1.0    1175  100
           VSIVM4    04    2 IFL   24.3   0.5     150 12.8  20.14    2
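The arithmetic in the paragraph above, applied to the VSIVM4 line of this report (2 IFLs, SMT-2); a sketch with my own variable names:

```python
# ESALPARS values for the VSIVM4 LPAR above.
core_assigned_pct = 24.3    # % of core time assigned
thread_idle_pct   = 20.14   # one thread idle while the core was assigned
threads_per_core  = 2       # SMT-2

thread_assigned_pct = core_assigned_pct * threads_per_core   # ~48.6%
thread_busy_pct = thread_assigned_pct - thread_idle_pct      # ~28.5%
print(round(thread_busy_pct, 1))
```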
 
 

Conclusions

There are a lot of new metrics, and a real need to understand how SMT impacts user chargeback and capacity planning. Please provide feedback to Barton on any ideas or information you learn in your endeavors.



