Adaptive Replica Selection benchmarks
Author | Lee Hinman (lee@elastic.co) |
Date | 2017-08-30 14:58:46 |
Introduction
These benchmarks were run on the adaptive replica selection (ARS) branch, based on a recent master. Since ARS is configurable, there are tests both with and without it.
This uses six machines running on Google Compute Engine; each machine has 16 CPUs and 60 GB of RAM. Elasticsearch is given 31 GB of RAM, and all other performance-related settings are unchanged.
The machines are set up as follows:
Each test runs 40,000 queries per lap, with 5 laps, for a total of 200,000 queries. Rally is used with 100 clients sending requests simultaneously, as quickly as possible, and is configured to send all requests to the client node (esclient).
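Because each Rally client waits for a response before sending its next request, the setup behaves like a closed loop, so Little's Law (L = λW) ties throughput and latency together: mean latency ≈ number of clients / throughput. The snippet below is a rough sanity check under that assumption only; it ignores client-side overhead and the roughly 1% of failed requests, and it estimates the mean latency, whereas Rally reports percentiles. The function name is hypothetical.

```python
def expected_mean_latency_ms(clients: int, throughput_ops_s: float) -> float:
    """Closed-loop Little's Law with zero think time: W = N / X, in milliseconds.

    Assumes every client always has exactly one request in flight.
    """
    if throughput_ops_s <= 0:
        raise ValueError("throughput must be positive")
    return clients / throughput_ops_s * 1000.0


if __name__ == "__main__":
    # 100 clients; 95.7866 ops/s is the no-ARS, no-load median throughput reported below.
    print(f"{expected_mean_latency_ms(100, 95.7866):.0f} ms")
```

This prints about 1044 ms, close to the 1,003 ms median latency of that run, which is why throughput and latency tend to move in opposite directions in the tables below.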
The query is based on a user's test case in which some nodes had uneven response times, usually due to a problem with one particular node in the cluster. For more information, see the track at rally-internal-tracks/3890.
It consists of four different benchmarks:
- One replica with no load applied to any of the nodes
- One replica with load applied to one of the nodes
- Four replicas with no load applied
- Four replicas with load applied
For the "load" scenarios, load was introduced on the es3 node with `stress -i 8 -c 8 -m 8 -d 8`.
1 replica, no-load version
No ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 72.756 | 71.452 | 70.484 | 71.909 | 72.147 | s | 358.748 |
Total Old Gen GC | | 0 | 0 | 0 | 0 | 0 | s | 0 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 75.046 | 86.2669 | 95.5561 | 94.4944 | 94.4099 | ops/s | 75.046 |
Median Throughput | complex-query | 94.9815 | 96.5402 | 96.6245 | 95.0058 | 95.3971 | ops/s | 95.7866 |
Max Throughput | complex-query | 96.0466 | 97.2942 | 99.1457 | 98.5228 | 98.0666 | ops/s | 99.1457 |
50th percentile latency | complex-query | 1000.23 | 993.939 | 1008.95 | 1012.77 | 1000.85 | ms | 1003.29 |
90th percentile latency | complex-query | 1334.58 | 1329.26 | 1343.43 | 1350.75 | 1340.92 | ms | 1339.69 |
99th percentile latency | complex-query | 1644.43 | 1634.48 | 1648.19 | 1658.76 | 1655.41 | ms | 1648.34 |
99.9th percentile latency | complex-query | 1877.37 | 1866.27 | 1883.54 | 1889.51 | 1906.72 | ms | 1890.48 |
99.99th percentile latency | complex-query | 2136.38 | 2131.63 | 2288.19 | 2120.27 | 2117.74 | ms | 2144.03 |
100th percentile latency | complex-query | 2339.11 | 2304.25 | 2513.49 | 2275.12 | 2398.82 | ms | 2513.49 |
50th percentile service time | complex-query | 1000.23 | 993.939 | 1008.95 | 1012.77 | 1000.85 | ms | 1003.29 |
90th percentile service time | complex-query | 1334.58 | 1329.26 | 1343.43 | 1350.75 | 1340.92 | ms | 1339.69 |
99th percentile service time | complex-query | 1644.43 | 1634.48 | 1648.19 | 1658.76 | 1655.41 | ms | 1648.34 |
99.9th percentile service time | complex-query | 1877.37 | 1866.27 | 1883.54 | 1889.51 | 1906.72 | ms | 1890.48 |
99.99th percentile service time | complex-query | 2136.38 | 2131.63 | 2288.19 | 2120.27 | 2117.74 | ms | 2144.03 |
100th percentile service time | complex-query | 2339.11 | 2304.25 | 2513.49 | 2275.12 | 2398.82 | ms | 2513.49 |
error rate | complex-query | 1.04 | 1.06 | 0.86 | 0.99 | 1.03 | % | 0.99 |
ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 104.813 | 99.567 | 92.415 | 93.434 | 92.012 | s | 482.241 |
Total Old Gen GC | | 0 | 0 | 0.068 | 0 | 0.075 | s | 0.143 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 71.6274 | 96.4191 | 97.3081 | 94.654 | 95.4668 | ops/s | 71.6274 |
Median Throughput | complex-query | 93.9667 | 96.828 | 98.461 | 97.712 | 97.8372 | ops/s | 97.6061 |
Max Throughput | complex-query | 95.904 | 99.7006 | 101.285 | 98.5563 | 98.4476 | ops/s | 101.285 |
50th percentile latency | complex-query | 995.373 | 995.155 | 977.072 | 984.347 | 977.937 | ms | 985.724 |
90th percentile latency | complex-query | 1366.45 | 1356.61 | 1333.67 | 1337.76 | 1337.2 | ms | 1346.43 |
99th percentile latency | complex-query | 1696.72 | 1673.96 | 1660.46 | 1653.38 | 1651.95 | ms | 1670.29 |
99.9th percentile latency | complex-query | 1980.4 | 1954.44 | 1903.94 | 1951.45 | 1938.64 | ms | 1949.28 |
99.99th percentile latency | complex-query | 2256.85 | 2260.59 | 2171.71 | 2216.76 | 2314.91 | ms | 2256.85 |
100th percentile latency | complex-query | 2621.21 | 2399.68 | 2387.3 | 2562.51 | 2568.04 | ms | 2621.21 |
50th percentile service time | complex-query | 995.373 | 995.155 | 977.072 | 984.347 | 977.937 | ms | 985.724 |
90th percentile service time | complex-query | 1366.45 | 1356.61 | 1333.67 | 1337.76 | 1337.2 | ms | 1346.43 |
99th percentile service time | complex-query | 1696.72 | 1673.96 | 1660.46 | 1653.38 | 1651.95 | ms | 1670.29 |
99.9th percentile service time | complex-query | 1980.4 | 1954.44 | 1903.94 | 1951.45 | 1938.64 | ms | 1949.28 |
99.99th percentile service time | complex-query | 2256.85 | 2260.59 | 2171.71 | 2216.76 | 2314.91 | ms | 2256.85 |
100th percentile service time | complex-query | 2621.21 | 2399.68 | 2387.3 | 2562.51 | 2568.04 | ms | 2621.21 |
error rate | complex-query | 1.06 | 0.98 | 1.02 | 0.94 | 1 | % | 1 |
1 replica, no load summary
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median Throughput (ops/s) | 95.7866 | 97.6061 | 1.8995350 |
50th percentile latency (ms) | 1003.29 | 985.724 | -1.7508397 |
90th percentile latency (ms) | 1339.69 | 1346.43 | 0.50310146 |
99th percentile latency (ms) | 1648.34 | 1670.29 | 1.3316427 |
Without any load, median throughput and latency differ by less than 2% in either direction when only the client node is targeted.
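The Change % column is consistent with the relative change of the ARS value against the no-ARS baseline, (ARS - no ARS) / no ARS × 100; recomputing it from the summary table reproduces the published figures. A minimal sketch (the function name is illustrative):

```python
def change_pct(no_ars: float, ars: float) -> float:
    """Relative change of the ARS value versus the no-ARS baseline, in percent."""
    return (ars - no_ars) / no_ars * 100.0


# (metric, no-ARS value, ARS value), copied from the 1 replica, no load summary.
rows = [
    ("Median Throughput (ops/s)", 95.7866, 97.6061),
    ("50th percentile latency (ms)", 1003.29, 985.724),
    ("90th percentile latency (ms)", 1339.69, 1346.43),
    ("99th percentile latency (ms)", 1648.34, 1670.29),
]

for metric, base, new in rows:
    print(f"{metric}: {change_pct(base, new):+.2f}%")
```

Note that the sign means different things per row: a positive change is better for throughput, while a negative change is better for latency.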
Here are the results after applying Adrien's suggestions from the PR to the code and re-testing:
Metric | Operation | Value | Unit |
---|---|---|---|
Total Young Gen GC | | 567.738 | s |
Total Old Gen GC | | 0.262 | s |
Min Throughput | complex-query | 77.2156 | ops/s |
Median Throughput | complex-query | 98.4153 | ops/s |
Max Throughput | complex-query | 103.568 | ops/s |
50th percentile latency | complex-query | 976.994 | ms |
90th percentile latency | complex-query | 1333.18 | ms |
99th percentile latency | complex-query | 1657.75 | ms |
99.9th percentile latency | complex-query | 1969.66 | ms |
99.99th percentile latency | complex-query | 2449.54 | ms |
100th percentile latency | complex-query | 3094.64 | ms |
50th percentile service time | complex-query | 976.994 | ms |
90th percentile service time | complex-query | 1333.18 | ms |
99th percentile service time | complex-query | 1657.75 | ms |
99.9th percentile service time | complex-query | 1969.66 | ms |
99.99th percentile service time | complex-query | 2449.54 | ms |
100th percentile service time | complex-query | 3094.64 | ms |
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median Throughput (ops/s) | 95.7866 | 98.4153 | 2.7443296 |
50th percentile latency (ms) | 1003.29 | 976.994 | -2.6209770 |
90th percentile latency (ms) | 1339.69 | 1333.18 | -0.48593331 |
99th percentile latency (ms) | 1648.34 | 1657.75 | 0.57087737 |
Just as before, the difference is negligible (under 3% for every metric) when the cluster is unloaded.
1 replica, with load on es3
No ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 300.074 | 272.544 | 266.54 | 268.632 | 274.019 | s | 1381.81 |
Total Old Gen GC | | 0 | 0.293 | 0 | 0.348 | 0.368 | s | 1.009 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 31.9327 | 39.3799 | 41.9122 | 41.3685 | 40.0552 | ops/s | 31.9327 |
Median Throughput | complex-query | 38.2174 | 40.2514 | 42.4385 | 41.9149 | 41.3764 | ops/s | 41.1558 |
Max Throughput | complex-query | 39.5881 | 43.5512 | 53.5363 | 48.204 | 58.5938 | ops/s | 58.5938 |
50th percentile latency | complex-query | 410.565 | 394.156 | 413.918 | 399.447 | 437.455 | ms | 411.721 |
90th percentile latency | complex-query | 5391.53 | 5240.86 | 5026.99 | 5101.87 | 5297.32 | ms | 5215.34 |
99th percentile latency | complex-query | 6469.9 | 6138.16 | 5881.31 | 5974.67 | 6268.04 | ms | 6181.48 |
99.9th percentile latency | complex-query | 8497.22 | 6824.09 | 6520.29 | 6553.15 | 6979.88 | ms | 7022.92 |
99.99th percentile latency | complex-query | 13174.6 | 7573.65 | 7001.51 | 7197.16 | 7436.44 | ms | 10347.9 |
100th percentile latency | complex-query | 14173.2 | 8231.89 | 7544.85 | 7688.05 | 8060.05 | ms | 14173.2 |
50th percentile service time | complex-query | 410.565 | 394.156 | 413.918 | 399.447 | 437.455 | ms | 411.721 |
90th percentile service time | complex-query | 5391.53 | 5240.86 | 5026.99 | 5101.87 | 5297.32 | ms | 5215.34 |
99th percentile service time | complex-query | 6469.9 | 6138.16 | 5881.31 | 5974.67 | 6268.04 | ms | 6181.48 |
99.9th percentile service time | complex-query | 8497.22 | 6824.09 | 6520.29 | 6553.15 | 6979.88 | ms | 7022.92 |
99.99th percentile service time | complex-query | 13174.6 | 7573.65 | 7001.51 | 7197.16 | 7436.44 | ms | 10347.9 |
100th percentile service time | complex-query | 14173.2 | 8231.89 | 7544.85 | 7688.05 | 8060.05 | ms | 14173.2 |
error rate | complex-query | 1.03 | 1.03 | 0.93 | 1.01 | 0.92 | % | 0.98 |
With ARS
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 104.178 | 102.469 | 102.617 | 103.145 | 102.971 | s | 515.38 |
Total Old Gen GC | | 0 | 0 | 0 | 0 | 0 | s | 0 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 42.8211 | 85.2494 | 85.5463 | 85.7206 | 84.9813 | ops/s | 42.8211 |
Median Throughput | complex-query | 88.2411 | 88.3537 | 88.8263 | 90.2476 | 89.1758 | ops/s | 88.8048 |
Max Throughput | complex-query | 89.0012 | 89.2967 | 89.8035 | 90.5627 | 89.905 | ops/s | 90.5627 |
50th percentile latency | complex-query | 935.323 | 952.327 | 911.456 | 954.175 | 956.159 | ms | 943.221 |
90th percentile latency | complex-query | 1963.25 | 1992.08 | 2036.04 | 1890.75 | 1933.65 | ms | 1962.73 |
99th percentile latency | complex-query | 2638.29 | 2672.81 | 2774.82 | 2516.11 | 2552.83 | ms | 2648.86 |
99.9th percentile latency | complex-query | 3119 | 3136.93 | 3283.85 | 2913.59 | 3043.9 | ms | 3134.2 |
99.99th percentile latency | complex-query | 3694.82 | 3402.12 | 3777.1 | 3315.67 | 3598.85 | ms | 3518.94 |
100th percentile latency | complex-query | 4223.11 | 3499.83 | 4210.43 | 3423.48 | 4049.51 | ms | 4223.11 |
50th percentile service time | complex-query | 935.323 | 952.327 | 911.456 | 954.175 | 956.159 | ms | 943.221 |
90th percentile service time | complex-query | 1963.25 | 1992.08 | 2036.04 | 1890.75 | 1933.65 | ms | 1962.73 |
99th percentile service time | complex-query | 2638.29 | 2672.81 | 2774.82 | 2516.11 | 2552.83 | ms | 2648.86 |
99.9th percentile service time | complex-query | 3119 | 3136.93 | 3283.85 | 2913.59 | 3043.9 | ms | 3134.2 |
99.99th percentile service time | complex-query | 3694.82 | 3402.12 | 3777.1 | 3315.67 | 3598.85 | ms | 3518.94 |
100th percentile service time | complex-query | 4223.11 | 3499.83 | 4210.43 | 3423.48 | 4049.51 | ms | 4223.11 |
error rate | complex-query | 0.97 | 1.03 | 1.12 | 1.01 | 1.05 | % | 1.04 |
1 replica, with load summary
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 41.1558 | 88.8048 | 115.77712 |
50th percentile latency (ms) | 411.721 | 943.221 | 129.09227 |
90th percentile latency (ms) | 5215.34 | 1962.73 | -62.366212 |
99th percentile latency (ms) | 6181.48 | 2648.86 | -57.148450 |
So ARS trades a higher 50th percentile latency for a large reduction in 90th and 99th percentile latency, while more than doubling median throughput from 41 ops/s to 88 ops/s.
With ARS, requests were routed away from es3 (the stressed node) and handled by the non-loaded nodes instead, as the search thread pool statistics show:
node_name | name | active | queue | rejected | completed |
---|---|---|---|---|---|
es4 | search | 0 | 0 | 0 | 241512 |
es1 | search | 0 | 0 | 0 | 220730 |
es2 | search | 0 | 0 | 0 | 226168 |
es3 | search | 0 | 0 | 0 | 113115 |
esclient | search | 0 | 0 | 0 | 202875 |
es5 | search | 0 | 0 | 0 | 212850 |
In the non-ARS scenario, the requests are evenly distributed.
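The per-node numbers above look like `_cat/thread_pool` output restricted to the search pool (presumably something like `GET _cat/thread_pool/search?v&h=node_name,name,active,queue,rejected,completed`; the exact request is an assumption). The sketch below parses that text and compares one node's completed count with the mean of the others. Keep in mind that `completed` counts search thread pool tasks, not client requests, and is cumulative since node start, which is why the cluster was restarted fresh before each run. The function names are hypothetical.

```python
CAT_OUTPUT = """\
node_name name   active queue rejected completed
es4       search 0      0     0        241512
es1       search 0      0     0        220730
es2       search 0      0     0        226168
es3       search 0      0     0        113115
esclient  search 0      0     0        202875
es5       search 0      0     0        212850
"""


def completed_by_node(cat_text: str) -> dict[str, int]:
    """Parse whitespace-aligned _cat text (with a header row) into {node_name: completed}."""
    lines = cat_text.strip().splitlines()
    header = lines[0].split()
    node_idx = header.index("node_name")
    completed_idx = header.index("completed")
    return {
        cols[node_idx]: int(cols[completed_idx])
        for cols in (line.split() for line in lines[1:])
    }


def share_vs_peers(counts: dict[str, int], node: str) -> float:
    """Ratio of one node's completed tasks to the mean of all other nodes."""
    others = [v for k, v in counts.items() if k != node]
    return counts[node] / (sum(others) / len(others))


counts = completed_by_node(CAT_OUTPUT)
print(f"es3 completed {share_vs_peers(counts, 'es3'):.2f}x the average of the other nodes")
```

With the ARS numbers above this gives roughly 0.51, i.e. the stressed node handled about half as many search tasks as its peers.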
And running again after Adrien's feedback and subsequent changes:
Metric | Operation | Value | Unit |
---|---|---|---|
Total Young Gen GC | | 489.716 | s |
Total Old Gen GC | | 0 | s |
Min Throughput | complex-query | 35.7991 | ops/s |
Median Throughput | complex-query | 90.8303 | ops/s |
Max Throughput | complex-query | 96.1928 | ops/s |
50th percentile latency | complex-query | 923.891 | ms |
90th percentile latency | complex-query | 1915.97 | ms |
99th percentile latency | complex-query | 2620.81 | ms |
99.9th percentile latency | complex-query | 3101.01 | ms |
99.99th percentile latency | complex-query | 4120.6 | ms |
100th percentile latency | complex-query | 4989.43 | ms |
50th percentile service time | complex-query | 923.891 | ms |
90th percentile service time | complex-query | 1915.97 | ms |
99th percentile service time | complex-query | 2620.81 | ms |
99.9th percentile service time | complex-query | 3101.01 | ms |
99.99th percentile service time | complex-query | 4120.6 | ms |
100th percentile service time | complex-query | 4989.43 | ms |
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 41.1558 | 90.8303 | 120.69866 |
50th percentile latency (ms) | 411.721 | 923.891 | 124.39735 |
90th percentile latency (ms) | 5215.34 | 1915.97 | -63.262798 |
99th percentile latency (ms) | 6181.48 | 2620.81 | -57.602225 |
So the improvements remain roughly the same as before.
4 replicas, no load
With 4 replicas, every node can handle a request, so rather than choosing between two copies of the data, the routing formula can potentially pick any node in the cluster.
In the user's test case, one node sometimes ends up with a queue of search requests because it is a hot spot for the data and cannot keep up. This was addressed by the Little's Law work (automatic search queue resizing) and benchmarked in my other write-up at https://writequit.org/org/es/stress-run.html
Note that in these tests I did not allow the queue to resize automatically, since I didn't want resizing to affect the performance results and produce false positives or negatives.
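Little's Law states L = λW: if tasks arrive at rate λ and each spends W seconds in the system on average, about L tasks are in the system. Applied to a search queue, it suggests keeping the queue near λW so that queued work can still finish within a target response time W. A minimal sketch of that relationship only, not the Elasticsearch implementation; the function name, the rounding, the minimum of 1 and the example rates are all assumptions.

```python
def littles_law_queue_size(task_rate_per_s: float, target_response_s: float) -> int:
    """Little's Law, L = lambda * W, rounded to a whole number of queue slots.

    task_rate_per_s: tasks the node completes per second (lambda).
    target_response_s: desired average time in the system (W).
    Never returns less than 1 (an assumption for this sketch).
    """
    if task_rate_per_s < 0 or target_response_s <= 0:
        raise ValueError("rate must be >= 0 and target must be > 0")
    return max(1, round(task_rate_per_s * target_response_s))


# Hypothetical rates: a healthy node vs. a stressed node, both with a 1 s target.
for name, rate in [("healthy", 250.0), ("stressed", 100.0)]:
    print(name, littles_law_queue_size(rate, 1.0))
```

A slower node completes fewer tasks per second, so its queue target shrinks, which limits how much work can pile up on it; that is the effect these benchmarks deliberately disable so it cannot be confused with ARS.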
No ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 133.994 | 126.421 | 120.122 | 117.057 | 109.267 | s | 606.861 |
Total Old Gen GC | | 0 | 0.068 | 0 | 0.052 | 0.066 | s | 0.186 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 61.8983 | 83.9801 | 88.1633 | 83.4439 | 87.2767 | ops/s | 61.8983 |
Median Throughput | complex-query | 83.621 | 86.2051 | 89.7287 | 87.8284 | 87.9597 | ops/s | 87.689 |
Max Throughput | complex-query | 85.1025 | 88.4334 | 94.6429 | 89.4471 | 91.013 | ops/s | 94.6429 |
50th percentile latency | complex-query | 1454.42 | 1403.75 | 1305.12 | 1315.04 | 1308 | ms | 1348.53 |
90th percentile latency | complex-query | 2020.12 | 1972.7 | 1864.37 | 1909.29 | 1851.94 | ms | 1930.5 |
99th percentile latency | complex-query | 2448.26 | 2404.8 | 2294.22 | 2344.43 | 2257.22 | ms | 2363.1 |
99.9th percentile latency | complex-query | 2812.64 | 2734.92 | 2703.48 | 2664.67 | 2558.15 | ms | 2716.1 |
99.99th percentile latency | complex-query | 3080.82 | 2987.73 | 3039 | 3073.03 | 2841.12 | ms | 3055.47 |
100th percentile latency | complex-query | 3196.18 | 3398.03 | 3244.51 | 3260.1 | 3212.67 | ms | 3398.03 |
50th percentile service time | complex-query | 1454.42 | 1403.75 | 1305.12 | 1315.04 | 1308 | ms | 1348.53 |
90th percentile service time | complex-query | 2020.12 | 1972.7 | 1864.37 | 1909.29 | 1851.94 | ms | 1930.5 |
99th percentile service time | complex-query | 2448.26 | 2404.8 | 2294.22 | 2344.43 | 2257.22 | ms | 2363.1 |
99.9th percentile service time | complex-query | 2812.64 | 2734.92 | 2703.48 | 2664.67 | 2558.15 | ms | 2716.1 |
99.99th percentile service time | complex-query | 3080.82 | 2987.73 | 3039 | 3073.03 | 2841.12 | ms | 3055.47 |
100th percentile service time | complex-query | 3196.18 | 3398.03 | 3244.51 | 3260.1 | 3212.67 | ms | 3398.03 |
error rate | complex-query | 1.1 | 0.95 | 0.96 | 0.94 | 0.99 | % | 0.99 |
ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 105.76 | 100.472 | 96.16 | 92.079 | 90.213 | s | 484.684 |
Total Old Gen GC | | 0 | 0 | 0.056 | 0 | 0.044 | s | 0.1 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 72.5623 | 95.187 | 91.4413 | 97.7224 | 97.4145 | ops/s | 72.5623 |
Median Throughput | complex-query | 96.3588 | 97.8425 | 96.7584 | 99.3148 | 99.7013 | ops/s | 97.8271 |
Max Throughput | complex-query | 97.4177 | 98.136 | 97.3876 | 107.248 | 100.778 | ops/s | 107.248 |
50th percentile latency | complex-query | 982.194 | 985.57 | 991.243 | 976.461 | 972.69 | ms | 981.778 |
90th percentile latency | complex-query | 1388.58 | 1383.4 | 1392.7 | 1366.4 | 1358.64 | ms | 1377.62 |
99th percentile latency | complex-query | 1768.21 | 1770.2 | 1768.3 | 1733.5 | 1705.18 | ms | 1751.05 |
99.9th percentile latency | complex-query | 2111.05 | 2114.6 | 2121.01 | 2059.34 | 2024.64 | ms | 2090.17 |
99.99th percentile latency | complex-query | 2316.13 | 2468.77 | 2450.8 | 2308.73 | 2278.21 | ms | 2394.18 |
100th percentile latency | complex-query | 2500.14 | 2726.6 | 2695.71 | 2608 | 2584.61 | ms | 2726.6 |
50th percentile service time | complex-query | 982.194 | 985.57 | 991.243 | 976.461 | 972.69 | ms | 981.778 |
90th percentile service time | complex-query | 1388.58 | 1383.4 | 1392.7 | 1366.4 | 1358.64 | ms | 1377.62 |
99th percentile service time | complex-query | 1768.21 | 1770.2 | 1768.3 | 1733.5 | 1705.18 | ms | 1751.05 |
99.9th percentile service time | complex-query | 2111.05 | 2114.6 | 2121.01 | 2059.34 | 2024.64 | ms | 2090.17 |
99.99th percentile service time | complex-query | 2316.13 | 2468.77 | 2450.8 | 2308.73 | 2278.21 | ms | 2394.18 |
100th percentile service time | complex-query | 2500.14 | 2726.6 | 2695.71 | 2608 | 2584.61 | ms | 2726.6 |
error rate | complex-query | 0.97 | 0.92 | 0.96 | 0.95 | 1 | % | 0.96 |
4 replicas, no load summary
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 87.689 | 97.8271 | 11.561427 |
50th percentile latency (ms) | 1348.53 | 981.778 | -27.196429 |
90th percentile latency (ms) | 1930.5 | 1377.62 | -28.639213 |
99th percentile latency (ms) | 2363.1 | 1751.05 | -25.900300 |
ARS improves both throughput and latency, at every percentile measured.
Additionally, requests are routed roughly evenly even though round-robin selection is not used; instead, the formula selects the "least loaded" node to send each request to.
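The text refers to "the formula" without stating it. ARS is modeled on the C3 replica-ranking function, Ψ(s) = R(s) - 1/μ(s) + q̂(s)³/μ(s) with q̂(s) = 1 + os(s)·n + q(s), where R is the response time, 1/μ the service time, q the node's queue size, os the requests outstanding to that node and n the number of clients, each tracked as a moving average. The sketch below ranks candidate copies with that function and picks the lowest score. It is an illustration under those assumptions, not Elasticsearch's code; the class, field names and numbers are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    name: str
    response_time_ms: float  # moving average of response time seen by the coordinator
    service_time_ms: float   # moving average of service time (1 / service rate)
    queue_size: float        # moving average of the node's search queue length
    outstanding: int         # requests this coordinator currently has in flight to it


def c3_rank(node: NodeStats, num_clients: int, b: int = 3) -> float:
    """C3-style score R - 1/mu + q_hat**b / mu (lower is better),
    with 1/mu expressed directly as the service time."""
    q_hat = 1 + node.outstanding * num_clients + node.queue_size
    return (node.response_time_ms
            - node.service_time_ms
            + (q_hat ** b) * node.service_time_ms)


def select_copy(candidates: list[NodeStats], num_clients: int) -> NodeStats:
    """Pick the copy whose node has the lowest (best) rank."""
    return min(candidates, key=lambda n: c3_rank(n, num_clients))


# Hypothetical stats: es3 is stressed, so it is slower and has a backlog.
es1 = NodeStats("es1", response_time_ms=100.0, service_time_ms=50.0, queue_size=0.0, outstanding=1)
es3 = NodeStats("es3", response_time_ms=400.0, service_time_ms=200.0, queue_size=5.0, outstanding=1)
print(select_copy([es1, es3], num_clients=1).name)
```

Because q̂ is cubed, a node with a backlog is penalized heavily, while lightly loaded nodes with similar scores trade places as their outstanding-request counts rise, which is consistent with the roughly even routing described above.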
4 replicas, with load on es3
No ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 153.063 | 150.372 | 148.406 | 141.904 | 142.388 | s | 736.133 |
Total Old Gen GC | | 0 | 0 | 0 | 0 | 0.235 | s | 0.235 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 32.2423 | 51.8938 | 49.0778 | 51.7943 | 52.0408 | ops/s | 32.2423 |
Median Throughput | complex-query | 50.9153 | 52.6325 | 50.6031 | 52.6305 | 52.4415 | ops/s | 52.1863 |
Max Throughput | complex-query | 52.1392 | 57.8188 | 55.4156 | 62.4498 | 60.4905 | ops/s | 62.4498 |
50th percentile latency | complex-query | 2552.22 | 2591.87 | 2612.83 | 2601.61 | 2578.97 | ms | 2587.46 |
90th percentile latency | complex-query | 3449.25 | 3486.15 | 3530.09 | 3503.63 | 3473.51 | ms | 3489.49 |
99th percentile latency | complex-query | 4149.98 | 4186.45 | 4207.7 | 4148.67 | 4144.06 | ms | 4168.83 |
99.9th percentile latency | complex-query | 4672.63 | 4870.2 | 4720.98 | 4632.22 | 4681.02 | ms | 4719.7 |
99.99th percentile latency | complex-query | 4987.28 | 5296.97 | 5145.64 | 4986.35 | 5163.95 | ms | 5146.07 |
100th percentile latency | complex-query | 5526.07 | 5633.2 | 5606.27 | 5420.99 | 5462.41 | ms | 5633.2 |
50th percentile service time | complex-query | 2552.22 | 2591.87 | 2612.83 | 2601.61 | 2578.97 | ms | 2587.46 |
90th percentile service time | complex-query | 3449.25 | 3486.15 | 3530.09 | 3503.63 | 3473.51 | ms | 3489.49 |
99th percentile service time | complex-query | 4149.98 | 4186.45 | 4207.7 | 4148.67 | 4144.06 | ms | 4168.83 |
99.9th percentile service time | complex-query | 4672.63 | 4870.2 | 4720.98 | 4632.22 | 4681.02 | ms | 4719.7 |
99.99th percentile service time | complex-query | 4987.28 | 5296.97 | 5145.64 | 4986.35 | 5163.95 | ms | 5146.07 |
100th percentile service time | complex-query | 5526.07 | 5633.2 | 5606.27 | 5420.99 | 5462.41 | ms | 5633.2 |
error rate | complex-query | 0.89 | 1.01 | 0.97 | 0.94 | 0.99 | % | 0.96 |
ARS, 5 laps with the cluster started fresh before running the benchmarks
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 102.981 | 102.723 | 103.387 | 101.932 | 102.316 | s | 513.339 |
Total Old Gen GC | | 0 | 0 | 0 | 0 | 0 | s | 0 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 29.2484 | 81.7369 | 82.0965 | 85.1784 | 86.0431 | ops/s | 29.2484 |
Median Throughput | complex-query | 84.7027 | 87.0279 | 86.5161 | 86.2562 | 87.8629 | ops/s | 86.5302 |
Max Throughput | complex-query | 85.0649 | 87.6369 | 87.2331 | 93.7006 | 89.5872 | ops/s | 93.7006 |
50th percentile latency | complex-query | 928.932 | 960.769 | 944.702 | 949.644 | 943.277 | ms | 945.357 |
90th percentile latency | complex-query | 2216.35 | 2045.28 | 2110.41 | 2093.92 | 2059.62 | ms | 2099.22 |
99th percentile latency | complex-query | 3842.12 | 3243.75 | 3374.46 | 3487.2 | 3323.93 | ms | 3463.71 |
99.9th percentile latency | complex-query | 5766.9 | 4004.19 | 4186.89 | 4354.08 | 4327.47 | ms | 4463.67 |
99.99th percentile latency | complex-query | 6690.88 | 4452.9 | 4685.68 | 4721.74 | 4818.03 | ms | 6149.39 |
100th percentile latency | complex-query | 7687.69 | 5242.12 | 5011.89 | 5263.98 | 5054.94 | ms | 7687.69 |
50th percentile service time | complex-query | 928.932 | 960.769 | 944.702 | 949.644 | 943.277 | ms | 945.357 |
90th percentile service time | complex-query | 2216.35 | 2045.28 | 2110.41 | 2093.92 | 2059.62 | ms | 2099.22 |
99th percentile service time | complex-query | 3842.12 | 3243.75 | 3374.46 | 3487.2 | 3323.93 | ms | 3463.71 |
99.9th percentile service time | complex-query | 5766.9 | 4004.19 | 4186.89 | 4354.08 | 4327.47 | ms | 4463.67 |
99.99th percentile service time | complex-query | 6690.88 | 4452.9 | 4685.68 | 4721.74 | 4818.03 | ms | 6149.39 |
100th percentile service time | complex-query | 7687.69 | 5242.12 | 5011.89 | 5263.98 | 5054.94 | ms | 7687.69 |
error rate | complex-query | 0.99 | 0.97 | 0.94 | 1.04 | 0.92 | % | 0.97 |
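With ARS, requests were distributed as follows:
node_name | name | active | queue | rejected | completed |
---|---|---|---|---|---|
es3 | search | 0 | 0 | 0 | 119263 |
es2 | search | 0 | 0 | 0 | 224000 |
es4 | search | 0 | 0 | 0 | 234915 |
es5 | search | 0 | 0 | 0 | 207679 |
esclient | search | 0 | 0 | 0 | 203000 |
es1 | search | 0 | 0 | 0 | 229143 |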
4 replicas, with load summary
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 52.1863 | 86.5302 | 65.810184 |
50th percentile latency (ms) | 2587.46 | 945.357 | -63.463899 |
90th percentile latency (ms) | 3489.49 | 2099.22 | -39.841639 |
99th percentile latency (ms) | 4168.83 | 3463.71 | -16.914098 |
So ARS improves the 50th, 90th and 99th percentile latencies, while increasing median throughput from 52 ops/s to 86 ops/s.
As the number of replicas goes up, I expect adaptive replica selection to do better, since it has more potentially unloaded machines to choose from when servicing a request.
Evenly distributing requests instead of using the client node
One more test: instead of sending requests only to the client node, I had Rally send them to all nodes in the cluster (including the client node). For this test I dropped back to one replica, since that is the most common use case.
No ARS 1 replica
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 166.788 | 157.16 | 145.218 | 125.436 | 118.413 | s | 713.015 |
Total Old Gen GC | | 0 | 0 | 0.096 | 0.118 | 0 | s | 0.214 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 64.3486 | 85.5629 | 86.9623 | 89.4434 | 89.6475 | ops/s | 64.3486 |
Median Throughput | complex-query | 86.1217 | 89.5194 | 89.4143 | 90.6719 | 91.0868 | ops/s | 89.6289 |
Max Throughput | complex-query | 87.6732 | 90.0012 | 90.3885 | 93.114 | 93.7311 | ops/s | 93.7311 |
50th percentile latency | complex-query | 1124.43 | 1097.21 | 1043.54 | 1038.33 | 1160.2 | ms | 1088.81 |
90th percentile latency | complex-query | 1736.69 | 1734.69 | 1633.39 | 1632.16 | 1821.2 | ms | 1706.07 |
99th percentile latency | complex-query | 2550.62 | 2399.38 | 2178.97 | 2189.69 | 2722.73 | ms | 2481.1 |
99.9th percentile latency | complex-query | 3016.49 | 2893.91 | 2951.67 | 2624.62 | 3143.67 | ms | 2991.56 |
99.99th percentile latency | complex-query | 3360.23 | 3266.18 | 3490.71 | 2893.75 | 3397.57 | ms | 3360.22 |
100th percentile latency | complex-query | 3507.84 | 3751.69 | 4471.25 | 3186.82 | 3732.73 | ms | 4471.25 |
50th percentile service time | complex-query | 1124.43 | 1097.21 | 1043.54 | 1038.33 | 1160.2 | ms | 1088.81 |
90th percentile service time | complex-query | 1736.69 | 1734.69 | 1633.39 | 1632.16 | 1821.2 | ms | 1706.07 |
99th percentile service time | complex-query | 2550.62 | 2399.38 | 2178.97 | 2189.69 | 2722.73 | ms | 2481.1 |
99.9th percentile service time | complex-query | 3016.49 | 2893.91 | 2951.67 | 2624.62 | 3143.67 | ms | 2991.56 |
99.99th percentile service time | complex-query | 3360.23 | 3266.18 | 3490.71 | 2893.75 | 3397.57 | ms | 3360.22 |
100th percentile service time | complex-query | 3507.84 | 3751.69 | 4471.25 | 3186.82 | 3732.73 | ms | 4471.25 |
error rate | complex-query | 1.08 | 0.98 | 0.93 | 0.99 | 1.07 | % | 1.01 |
ARS 1 replica
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 128.553 | 124.833 | 119.194 | 105.361 | 100.273 | s | 578.214 |
Total Old Gen GC | | 0 | 0 | 0 | 0.124 | 0 | s | 0.124 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 65.0194 | 91.3818 | 92.2215 | 96.2927 | 95.7731 | ops/s | 65.0194 |
Median Throughput | complex-query | 92.3494 | 94.4122 | 95.7469 | 96.909 | 98.121 | ops/s | 95.7207 |
Max Throughput | complex-query | 93.5308 | 94.8664 | 96.326 | 100.97 | 98.5577 | ops/s | 100.97 |
50th percentile latency | complex-query | 1031.64 | 1025.99 | 1010.09 | 999.791 | 987.14 | ms | 1010.36 |
90th percentile latency | complex-query | 1449.99 | 1439.34 | 1417.91 | 1413.7 | 1386.26 | ms | 1422.55 |
99th percentile latency | complex-query | 1815.91 | 1797.08 | 1783.24 | 1780.29 | 1738.78 | ms | 1784.96 |
99.9th percentile latency | complex-query | 2124.39 | 2111.02 | 2078.33 | 2122.11 | 2011.52 | ms | 2095.82 |
99.99th percentile latency | complex-query | 2495 | 2344.26 | 2466.2 | 2406.79 | 2316.72 | ms | 2428.25 |
100th percentile latency | complex-query | 2749.13 | 2569.92 | 2552.57 | 2882.8 | 2706.45 | ms | 2882.8 |
50th percentile service time | complex-query | 1031.64 | 1025.99 | 1010.09 | 999.791 | 987.14 | ms | 1010.36 |
90th percentile service time | complex-query | 1449.99 | 1439.34 | 1417.91 | 1413.7 | 1386.26 | ms | 1422.55 |
99th percentile service time | complex-query | 1815.91 | 1797.08 | 1783.24 | 1780.29 | 1738.78 | ms | 1784.96 |
99.9th percentile service time | complex-query | 2124.39 | 2111.02 | 2078.33 | 2122.11 | 2011.52 | ms | 2095.82 |
99.99th percentile service time | complex-query | 2495 | 2344.26 | 2466.2 | 2406.79 | 2316.72 | ms | 2428.25 |
100th percentile service time | complex-query | 2749.13 | 2569.92 | 2552.57 | 2882.8 | 2706.45 | ms | 2882.8 |
error rate | complex-query | 1.07 | 0.98 | 0.95 | 0.96 | 0.95 | % | 0.98 |
Round robin requests summary
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 89.6289 | 95.7207 | 6.7966917 |
50th percentile latency (ms) | 1088.81 | 1010.36 | -7.2051138 |
90th percentile latency (ms) | 1706.07 | 1422.55 | -16.618310 |
99th percentile latency (ms) | 2481.1 | 1784.96 | -28.057716 |
So ARS still improves throughput and latency in every category.
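The Change % column in these summary tables appears to be the relative change of the ARS value against the No-ARS baseline, i.e. (ARS − No ARS) / No ARS × 100. Negative values for latency therefore mean ARS is faster. The short Python check below recomputes the round robin rows from the table above. The function name is my own.

```python
def change_pct(no_ars: float, ars: float) -> float:
    """Relative change of the ARS value versus the No-ARS baseline, in percent."""
    return (ars - no_ars) / no_ars * 100.0


# (metric, No ARS, ARS) taken from the round robin summary table
rows = [
    ("Median throughput (ops/s)", 89.6289, 95.7207),
    ("50th percentile latency (ms)", 1088.81, 1010.36),
    ("90th percentile latency (ms)", 1706.07, 1422.55),
    ("99th percentile latency (ms)", 2481.1, 1784.96),
]

for metric, baseline, with_ars in rows:
    # Positive throughput change is good; negative latency change is good.
    print(f"{metric}: {change_pct(baseline, with_ars):+.7f}%")
```

The same formula reproduces the other summary tables in this document, including the PMC comparison.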
Here's a run after making the changes Adrien recommended on the PR:
Metric | Operation | Value | Unit |
---|---|---|---|
Total Young Gen GC | | 463.485 | s |
Total Old Gen GC | | 0.064 | s |
Min Throughput | complex-query | 65.9433 | ops/s |
Median Throughput | complex-query | 95.9248 | ops/s |
Max Throughput | complex-query | 99.6405 | ops/s |
50th percentile latency | complex-query | 1014.5 | ms |
90th percentile latency | complex-query | 1412.53 | ms |
99th percentile latency | complex-query | 1752.06 | ms |
99.9th percentile latency | complex-query | 2041.12 | ms |
99.99th percentile latency | complex-query | 2353.43 | ms |
100th percentile latency | complex-query | 3001.61 | ms |
50th percentile service time | complex-query | 1014.5 | ms |
90th percentile service time | complex-query | 1412.53 | ms |
99th percentile service time | complex-query | 1752.06 | ms |
99.9th percentile service time | complex-query | 2041.12 | ms |
99.99th percentile service time | complex-query | 2353.43 | ms |
100th percentile service time | complex-query | 3001.61 | ms |
The results are nearly the same as for the version without those changes:
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 89.6289 | 95.9248 | 7.0244084 |
50th percentile latency (ms) | 1088.81 | 1014.5 | -6.8248822 |
90th percentile latency (ms) | 1706.07 | 1412.53 | -17.205625 |
99th percentile latency (ms) | 2481.1 | 1752.06 | -29.383741 |
Conclusion
For the final summary: to consider the feature successful, we want to see a large throughput improvement and negative changes (i.e. reductions) in the latency percentiles.
Test case | Median throughput change % | 50th pct. latency change % | 90th pct. latency change % | 99th pct. latency change % |
---|---|---|---|---|
1 replica, no load | 1.9% | -1.7% | 0.5% | 1.3% |
1 replica, with load | 115.8% | 129.0% | -62.3% | -57.1% |
4 replicas, no load | 11.6% | -27.2% | -28.6% | -25.9% |
4 replicas, with load | 65.8% | -63.5% | -39.8% | -16.9% |
1 replica, round robin, no load | 6.8% | -7.2% | -16.6% | -28.1% |
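The success criterion stated above can be written as a small predicate over the rows of this table. This is a sketch with one assumption: the text does not quantify a "high" throughput improvement, so any positive gain counts here. The min_throughput_gain parameter is a hypothetical knob for tightening that.

```python
# (median throughput %, 50th %, 90th %, 99th % latency change) per test case
summary = {
    "1 replica, no load": (1.9, -1.7, 0.5, 1.3),
    "1 replica, with load": (115.8, 129.0, -62.3, -57.1),
    "4 replicas, no load": (11.6, -27.2, -28.6, -25.9),
    "4 replicas, with load": (65.8, -63.5, -39.8, -16.9),
    "1 replica, round robin, no load": (6.8, -7.2, -16.6, -28.1),
}


def meets_criterion(changes, min_throughput_gain=0.0):
    """True if throughput rises by more than min_throughput_gain percent
    and every latency percentile goes down."""
    throughput, *latency = changes
    return throughput > min_throughput_gain and all(c < 0 for c in latency)


for case, changes in summary.items():
    print(f"{case}: {'meets' if meets_criterion(changes) else 'misses'} the criterion")
```

Under this strict reading, the two single-replica, non-round-robin cases miss. The unloaded one misses because of small 90th/99th percentile increases, and the loaded one misses because of its higher median latency.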
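Adaptive replica selection improves throughput and latency in almost all tests. The exceptions are the single-replica, no-load case, where 90th and 99th percentile latency rise slightly (0.5% and 1.3%), and the single-replica, loaded case, where 50th percentile latency more than doubles in exchange for much lower tail latency. While not perfect, it should help route around overloaded nodes.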
Additionally, since it is dynamically configurable, it can easily be toggled on or off as desired.
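Because ARS is a dynamic cluster setting, switching it between runs only needs a single cluster settings update. Below is a minimal standard-library Python sketch that builds, but does not send, that request. Two things are assumptions. The setting name cluster.routing.use_adaptive_replica_selection is the one used by released Elasticsearch versions (6.1 and later), and the benchmarked branch may have named it differently. The localhost URL is also an assumption.

```python
import json
import urllib.request

# Assumption: setting name from released Elasticsearch (6.1+), not necessarily the PR branch.
ARS_SETTING = "cluster.routing.use_adaptive_replica_selection"


def build_ars_toggle_request(base_url: str, enabled: bool) -> urllib.request.Request:
    """Build (but do not send) a PUT _cluster/settings request that toggles ARS."""
    body = {"persistent": {ARS_SETTING: enabled}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


if __name__ == "__main__":
    req = build_ars_toggle_request("http://localhost:9200", enabled=False)
    print(req.get_method(), req.full_url)
    print(req.data.decode("utf-8"))
    # Against a live cluster this would be sent with urllib.request.urlopen(req).
```

A persistent setting survives a full cluster restart. For quick back-to-back benchmark runs either kind of update works, since the setting is read dynamically.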
Updated tests after feedback
Simon left feedback on the PR that outstanding search requests were not being measured correctly (he was right!), so I fixed how outstanding requests are counted and changed b=4 back to b=3 before re-running the benchmarks for the adaptive replica selection cases. I did not rerun the non-ARS cases, since they should not change.
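Adaptive replica selection is modeled on the replica-ranking formula from C3 (Suresh et al., NSDI 2015). That formula scores a node s as Ψ(s) = R(s) − 1/μ(s) + q̂(s)^b / μ(s), where q̂(s) = 1 + os(s)·n + q(s). I am assuming that the b changed above is this exponent. The formula also shows why miscounting outstanding requests (os) matters: they feed straight into q̂, which is then raised to the power b. The Python sketch below uses hypothetical field names of my own. It illustrates the formula, not the code in the PR.

```python
from dataclasses import dataclass


@dataclass
class NodeStats:
    # Hypothetical names for the per-node statistics C3 tracks.
    response_time: float  # R(s): EWMA of response time observed by this client (ms)
    service_time: float   # 1/mu(s): EWMA of service time reported by the node (ms)
    queue_size: float     # q(s): EWMA of the node's search queue size
    outstanding: int      # os(s): requests this client currently has in flight to s


def c3_rank(s: NodeStats, num_clients: int, b: float = 3) -> float:
    """C3 score R(s) - 1/mu(s) + q_hat(s)**b / mu(s); the lowest-ranked node is preferred."""
    q_hat = 1 + s.outstanding * num_clients + s.queue_size
    # 1/mu(s) is the service time, so dividing by mu(s) is multiplying by it.
    return s.response_time - s.service_time + (q_hat ** b) * s.service_time


busy = NodeStats(response_time=10.0, service_time=5.0, queue_size=2.0, outstanding=1)
idle = NodeStats(response_time=12.0, service_time=6.0, queue_size=0.0, outstanding=0)

for b in (3, 4):
    print(b, c3_rank(busy, num_clients=1, b=b), c3_rank(idle, num_clients=1, b=b))
```

With b=3 the busy node scores 325.0 against 12.0 for the idle one. With b=4 it scores 1285.0, so a larger exponent penalizes queued work much more sharply. Overcounting outstanding requests has a similar inflating effect.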
Single replica, non-loaded case
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 75.073 | 75.041 | 74.993 | 74.623 | 73.222 | s | 372.952 |
Total Old Gen GC | | 0 | 0 | 0 | 0 | 0 | s | 0 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 63.5801 | 97.0808 | 98.5696 | 96.4159 | 98.4785 | ops/s | 63.5801 |
Median Throughput | complex-query | 96.8881 | 98.2529 | 99.9219 | 98.1566 | 99.3179 | ops/s | 98.537 |
Max Throughput | complex-query | 98.8394 | 99.3881 | 100.639 | 98.9853 | 104.276 | ops/s | 104.276 |
50th percentile latency | complex-query | 962.508 | 981.686 | 965.195 | 974.092 | 965.343 | ms | 970.15 |
90th percentile latency | complex-query | 1319.55 | 1339.31 | 1322.88 | 1332.44 | 1318.87 | ms | 1326.79 |
99th percentile latency | complex-query | 1642.93 | 1658.19 | 1650.91 | 1648.07 | 1645.92 | ms | 1648.8 |
99.9th percentile latency | complex-query | 1932.01 | 1954.33 | 1917.32 | 1916.36 | 1913.22 | ms | 1925.47 |
99.99th percentile latency | complex-query | 2351.23 | 2270.86 | 2234.84 | 2250.36 | 2268.94 | ms | 2270.86 |
100th percentile latency | complex-query | 2422.38 | 2440.09 | 2444.61 | 2631.93 | 2423.32 | ms | 2631.93 |
50th percentile service time | complex-query | 962.508 | 981.686 | 965.195 | 974.092 | 965.343 | ms | 970.15 |
90th percentile service time | complex-query | 1319.55 | 1339.31 | 1322.88 | 1332.44 | 1318.87 | ms | 1326.79 |
99th percentile service time | complex-query | 1642.93 | 1658.19 | 1650.91 | 1648.07 | 1645.92 | ms | 1648.8 |
99.9th percentile service time | complex-query | 1932.01 | 1954.33 | 1917.32 | 1916.36 | 1913.22 | ms | 1925.47 |
99.99th percentile service time | complex-query | 2351.23 | 2270.86 | 2234.84 | 2250.36 | 2268.94 | ms | 2270.86 |
100th percentile service time | complex-query | 2422.38 | 2440.09 | 2444.61 | 2631.93 | 2423.32 | ms | 2631.93 |
error rate | complex-query | 0.9 | 0.96 | 1.05 | 1.01 | 1.01 | % | 0.99 |
Single replica, under load case
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 138.294 | 135.652 | 129.945 | 121.155 | 122.426 | s | 647.472 |
Total Old Gen GC | | 0 | 0 | 0.067 | 0 | 0 | s | 0.067 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 53.6798 | 83.1811 | 87.0763 | 88.1993 | 80.6025 | ops/s | 53.6798 |
Median Throughput | complex-query | 85.5729 | 87.2077 | 89.1485 | 89.4732 | 87.7697 | ops/s | 87.8231 |
Max Throughput | complex-query | 86.8944 | 88.1925 | 89.9637 | 91.725 | 88.4414 | ops/s | 91.725 |
50th percentile latency | complex-query | 1016.17 | 1006.04 | 992.635 | 1005.78 | 1015.35 | ms | 1007.22 |
90th percentile latency | complex-query | 1851.86 | 1843.43 | 1853.17 | 1806.01 | 1845.79 | ms | 1839.46 |
99th percentile latency | complex-query | 2466.59 | 2443.15 | 2454.92 | 2368.89 | 2431.93 | ms | 2433.55 |
99.9th percentile latency | complex-query | 2918.42 | 2785.14 | 2843.5 | 2751.1 | 2863.54 | ms | 2835.38 |
99.99th percentile latency | complex-query | 3185.07 | 3094.71 | 3197.62 | 3057 | 3283.16 | ms | 3245.19 |
100th percentile latency | complex-query | 3728.13 | 3565.79 | 3472.04 | 3742.51 | 3680.95 | ms | 3742.51 |
50th percentile service time | complex-query | 1016.17 | 1006.04 | 992.635 | 1005.78 | 1015.35 | ms | 1007.22 |
90th percentile service time | complex-query | 1851.86 | 1843.43 | 1853.17 | 1806.01 | 1845.79 | ms | 1839.46 |
99th percentile service time | complex-query | 2466.59 | 2443.15 | 2454.92 | 2368.89 | 2431.93 | ms | 2433.55 |
99.9th percentile service time | complex-query | 2918.42 | 2785.14 | 2843.5 | 2751.1 | 2863.54 | ms | 2835.38 |
99.99th percentile service time | complex-query | 3185.07 | 3094.71 | 3197.62 | 3057 | 3283.16 | ms | 3245.19 |
100th percentile service time | complex-query | 3728.13 | 3565.79 | 3472.04 | 3742.51 | 3680.95 | ms | 3742.51 |
error rate | complex-query | 0.97 | 0.92 | 1.01 | 1.08 | 0.95 | % | 0.99 |
Single replica round robin, no load
Metric | Operation | Lap 1 | Lap 2 | Lap 3 | Lap 4 | Lap 5 | Unit | TOTAL |
---|---|---|---|---|---|---|---|---|
Total Young Gen GC | | 137.313 | 130.443 | 123.806 | 110.669 | 106.573 | s | 608.804 |
Total Old Gen GC | | 0 | 0 | 0 | 0.128 | 0 | s | 0.128 |
Heap used for segments | | 55.6507 | 55.6507 | 55.6507 | 55.6507 | 55.6507 | MB | 55.6507 |
Heap used for doc values | | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | 0.000324249 | MB | 0.000324249 |
Heap used for terms | | 44.3083 | 44.3083 | 44.3083 | 44.3083 | 44.3083 | MB | 44.3083 |
Heap used for points | | 6.06202 | 6.06202 | 6.06202 | 6.06202 | 6.06202 | MB | 6.06202 |
Heap used for stored fields | | 5.28004 | 5.28004 | 5.28004 | 5.28004 | 5.28004 | MB | 5.28004 |
Segment count | | 5 | 5 | 5 | 5 | 5 | | 5 |
Min Throughput | complex-query | 68.0178 | 90.9538 | 92.648 | 94.4686 | 93.0936 | ops/s | 68.0178 |
Median Throughput | complex-query | 91.5291 | 95.6761 | 96.0065 | 96.3352 | 96.4799 | ops/s | 95.9452 |
Max Throughput | complex-query | 92.8571 | 96.1195 | 97.2325 | 97.5942 | 96.8988 | ops/s | 97.5942 |
50th percentile latency | complex-query | 1040.13 | 1010.26 | 1011.09 | 1004.91 | 1003.2 | ms | 1013.61 |
90th percentile latency | complex-query | 1456.38 | 1422.49 | 1414.84 | 1416.78 | 1406.63 | ms | 1423.83 |
99th percentile latency | complex-query | 1831.73 | 1787.15 | 1765.59 | 1784.83 | 1746.04 | ms | 1783.73 |
99.9th percentile latency | complex-query | 2140.53 | 2073.31 | 2075.77 | 2176.83 | 2056.19 | ms | 2107.3 |
99.99th percentile latency | complex-query | 2359.15 | 2428.91 | 2462.54 | 2481.45 | 2362.07 | ms | 2465.01 |
100th percentile latency | complex-query | 2628.35 | 2541.27 | 2785.96 | 2694.77 | 2589.07 | ms | 2785.96 |
50th percentile service time | complex-query | 1040.13 | 1010.26 | 1011.09 | 1004.91 | 1003.2 | ms | 1013.61 |
90th percentile service time | complex-query | 1456.38 | 1422.49 | 1414.84 | 1416.78 | 1406.63 | ms | 1423.83 |
99th percentile service time | complex-query | 1831.73 | 1787.15 | 1765.59 | 1784.83 | 1746.04 | ms | 1783.73 |
99.9th percentile service time | complex-query | 2140.53 | 2073.31 | 2075.77 | 2176.83 | 2056.19 | ms | 2107.3 |
99.99th percentile service time | complex-query | 2359.15 | 2428.91 | 2462.54 | 2481.45 | 2362.07 | ms | 2465.01 |
100th percentile service time | complex-query | 2628.35 | 2541.27 | 2785.96 | 2694.77 | 2589.07 | ms | 2785.96 |
error rate | complex-query | 0.94 | 0.95 | 1 | 0.93 | 0.98 | % | 0.96 |
Updated tests summary
Single replica, non-loaded case:
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median Throughput (ops/s) | 95.7866 | 98.537 | 2.8713828 |
50th percentile latency (ms) | 1003.29 | 970.15 | -3.3031327 |
90th percentile latency (ms) | 1339.69 | 1326.79 | -0.96290933 |
99th percentile latency (ms) | 1648.34 | 1648.8 | 0.027906864 |
So again, the latency difference is small (3.3% or less at every measured percentile), as expected for an unloaded cluster.
Single replica, es3 under load:
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 41.1558 | 87.8231 | 113.39179 |
50th percentile latency (ms) | 411.721 | 1007.22 | 144.63654 |
90th percentile latency (ms) | 5215.34 | 1839.46 | -64.729816 |
99th percentile latency (ms) | 6181.48 | 2433.55 | -60.631596 |
And again, the loaded case shows a large throughput improvement (about 113%), trading a higher 50th percentile latency (about +145%) for large reductions in 90th and 99th percentile latency (about 65% and 61%).
Single replica, round robin requests:
Metric | No ARS | ARS | Change % |
---|---|---|---|
Median throughput (ops/s) | 89.6289 | 95.9452 | 7.0471689 |
50th percentile latency (ms) | 1088.81 | 1013.61 | -6.9066228 |
90th percentile latency (ms) | 1706.07 | 1423.83 | -16.543284 |
99th percentile latency (ms) | 2481.1 | 1783.73 | -28.107291 |
Again, a nice improvement in both throughput and latency for the unloaded round-robin test case.
PMC tests
I didn't want to test only my own scenario, so I also ran the PMC tests with and without ARS enabled. Note that the throughput for this test is targeted at 20 ops/s, so I don't expect much difference there, only in latency.
Metric | Operation | No ARS | ARS | Unit | Change % |
---|---|---|---|---|---|
Total Young Gen GC | | 71.609 | 80.071 | s | 11.816950 |
Min Throughput | default | 1734.13 | 1173.94 | ops/s | -32.303807 |
Median Throughput | default | 2493.65 | 2110.04 | ops/s | -15.383474 |
Max Throughput | default | 2608.02 | 2258.78 | ops/s | -13.391002 |
50th percentile latency | default | 20.2543 | 29.9434 | ms | 47.837249 |
90th percentile latency | default | 66.1935 | 66.8977 | ms | 1.0638507 |
99th percentile latency | default | 263.346 | 133.466 | ms | -49.319147 |
99.9th percentile latency | default | 496.514 | 234.434 | ms | -52.784010 |
99.99th percentile latency | default | 687.331 | 255.652 | ms | -62.805111 |
100th percentile latency | default | 753.063 | 259.545 | ms | -65.534756 |
50th percentile service time | default | 20.2543 | 29.9434 | ms | 47.837249 |
90th percentile service time | default | 66.1935 | 66.8977 | ms | 1.0638507 |
99th percentile service time | default | 263.346 | 133.466 | ms | -49.319147 |
99.9th percentile service time | default | 496.514 | 234.434 | ms | -52.784010 |
99.99th percentile service time | default | 687.331 | 255.652 | ms | -62.805111 |
100th percentile service time | default | 753.063 | 259.545 | ms | -65.534756 |
Min Throughput | term | 1289.22 | 1514.94 | ops/s | 17.508261 |
Median Throughput | term | 1568.08 | 1910.02 | ops/s | 21.806285 |
Max Throughput | term | 1641.91 | 1974.78 | ops/s | 20.273340 |
50th percentile latency | term | 37.8038 | 36.8043 | ms | -2.6439141 |
90th percentile latency | term | 106.132 | 88.4687 | ms | -16.642766 |
99th percentile latency | term | 308.381 | 167.496 | ms | -45.685370 |
99.9th percentile latency | term | 579.552 | 287.72 | ms | -50.354757 |
99.99th percentile latency | term | 772.559 | 342.839 | ms | -55.622936 |
100th percentile latency | term | 785.949 | 357.648 | ms | -54.494757 |
50th percentile service time | term | 37.8038 | 36.8043 | ms | -2.6439141 |
90th percentile service time | term | 106.132 | 88.4687 | ms | -16.642766 |
99th percentile service time | term | 308.381 | 167.496 | ms | -45.685370 |
99.9th percentile service time | term | 579.552 | 287.72 | ms | -50.354757 |
99.99th percentile service time | term | 772.559 | 342.839 | ms | -55.622936 |
100th percentile service time | term | 785.949 | 357.648 | ms | -54.494757 |
Min Throughput | phrase | 1212.21 | 1607 | ops/s | 32.567789 |
Median Throughput | phrase | 1474.6 | 1889.54 | ops/s | 28.139156 |
Max Throughput | phrase | 1598.05 | 2007.98 | ops/s | 25.651888 |
50th percentile latency | phrase | 31.4457 | 22.0566 | ms | -29.858136 |
90th percentile latency | phrase | 116.218 | 98.1068 | ms | -15.583817 |
99th percentile latency | phrase | 360.709 | 288.757 | ms | -19.947381 |
99.9th percentile latency | phrase | 594.483 | 384.245 | ms | -35.364846 |
99.99th percentile latency | phrase | 698.474 | 538.333 | ms | -22.927267 |
100th percentile latency | phrase | 706.989 | 541.952 | ms | -23.343645 |
50th percentile service time | phrase | 31.4457 | 22.0566 | ms | -29.858136 |
90th percentile service time | phrase | 116.218 | 98.1068 | ms | -15.583817 |
99th percentile service time | phrase | 360.709 | 288.757 | ms | -19.947381 |
99.9th percentile service time | phrase | 594.483 | 384.245 | ms | -35.364846 |
99.99th percentile service time | phrase | 698.474 | 538.333 | ms | -22.927267 |
100th percentile service time | phrase | 706.989 | 541.952 | ms | -23.343645 |
Min Throughput | articles_monthly_agg_uncached | 413.915 | 351.617 | ops/s | -15.050916 |
Median Throughput | articles_monthly_agg_uncached | 642.901 | 606.109 | ops/s | -5.7228096 |
Max Throughput | articles_monthly_agg_uncached | 673.724 | 627.425 | ops/s | -6.8721019 |
50th percentile latency | articles_monthly_agg_uncached | 14.1694 | 14.6811 | ms | 3.6113032 |
90th percentile latency | articles_monthly_agg_uncached | 15.8405 | 17.8053 | ms | 12.403649 |
99th percentile latency | articles_monthly_agg_uncached | 26.3351 | 30.0847 | ms | 14.238032 |
99.9th percentile latency | articles_monthly_agg_uncached | 67.0225 | 77.817 | ms | 16.105785 |
99.99th percentile latency | articles_monthly_agg_uncached | 77.734 | 90.7812 | ms | 16.784419 |
100th percentile latency | articles_monthly_agg_uncached | 78.4629 | 90.9573 | ms | 15.923959 |
50th percentile service time | articles_monthly_agg_uncached | 14.1694 | 14.6811 | ms | 3.6113032 |
90th percentile service time | articles_monthly_agg_uncached | 15.8405 | 17.8053 | ms | 12.403649 |
99th percentile service time | articles_monthly_agg_uncached | 26.3351 | 30.0847 | ms | 14.238032 |
99.9th percentile service time | articles_monthly_agg_uncached | 67.0225 | 77.817 | ms | 16.105785 |
99.99th percentile service time | articles_monthly_agg_uncached | 77.734 | 90.7812 | ms | 16.784419 |
100th percentile service time | articles_monthly_agg_uncached | 78.4629 | 90.9573 | ms | 15.923959 |
Min Throughput | articles_monthly_agg_cached | 3274.88 | 3362.04 | ops/s | 2.6614716 |
Median Throughput | articles_monthly_agg_cached | 3397.29 | 3383.61 | ops/s | -0.40267390 |
Max Throughput | articles_monthly_agg_cached | 3511.35 | 3388.89 | ops/s | -3.4875475 |
50th percentile latency | articles_monthly_agg_cached | 2.48462 | 2.49701 | ms | 0.49866780 |
90th percentile latency | articles_monthly_agg_cached | 3.0102 | 2.96122 | ms | -1.6271344 |
99th percentile latency | articles_monthly_agg_cached | 5.33889 | 5.96885 | ms | 11.799456 |
99.9th percentile latency | articles_monthly_agg_cached | 59.0387 | 67.0066 | ms | 13.496063 |
99.99th percentile latency | articles_monthly_agg_cached | 60.1156 | 71.0272 | ms | 18.151029 |
100th percentile latency | articles_monthly_agg_cached | 69.1591 | 71.3471 | ms | 3.1637196 |
50th percentile service time | articles_monthly_agg_cached | 2.48462 | 2.49701 | ms | 0.49866780 |
90th percentile service time | articles_monthly_agg_cached | 3.0102 | 2.96122 | ms | -1.6271344 |
99th percentile service time | articles_monthly_agg_cached | 5.33889 | 5.96885 | ms | 11.799456 |
99.9th percentile service time | articles_monthly_agg_cached | 59.0387 | 67.0066 | ms | 13.496063 |
99.99th percentile service time | articles_monthly_agg_cached | 60.1156 | 71.0272 | ms | 18.151029 |
100th percentile service time | articles_monthly_agg_cached | 69.1591 | 71.3471 | ms | 3.1637196 |