OpenBSD NFS Performance Tuning
Recently I've started using my FreeBSD server as an active NFS server again, instead of just a giant file storage system for old pictures and docs. I prefer to keep as much data as possible on a central storage system; that way, individual client machines can be rebuilt at a moment's notice with no data loss.
Until a few months ago, my home desktop was running CentOS. Now it is an OpenBSD desktop running CWM, and I noticed that NFS client performance didn't seem as good as it was on the old Linux desktop. So, I figured it was time for some performance testing.
I knew that this could very well be the trade-off of switching back to a BSD desktop. However, it was now in my own best interest to find out how to get every bit of performance out of my new desktop.
After some research, I found at least five variables that can be tweaked for NFS performance. There could be more factors, but these are the five that I found (an example of setting each one follows the list):
- sysctl parameter vfs.nfs.iothreads, on a range of 4-20. Default is 4.
- NFS option TCP vs UDP. Default is UDP.
- NFS option soft vs hard. Default is hard.
- NFS option readahead, on a range of 1-4. Default is 4.
- NFS options rsize and wsize (read/write buffer sizes). Powers of 2, greater than or equal to 1024. Default is 8192.
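On the client side, the first knob is a sysctl and the other four map onto mount_nfs(8) flags. Here is a quick sketch of where each one lives; the server name, export path, and mount point are only placeholders:

# iothreads is a client sysctl (default 4, range 4-20)
sysctl vfs.nfs.iothreads=4
# the other four are mount_nfs(8) flags:
#   -T            use TCP (UDP is the default)
#   -s            soft mount (hard is the default)
#   -a n          readahead count, 1-4
#   -r/-w bytes   rsize/wsize buffers
mount_nfs -T -s -a 4 -r 32768 -w 32768 nfs-server:/export/data /mnt/data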
Now, from my experience with NFS shares on Linux, it appeared to me that the default read/write buffer size here is rather small. I am used to our NFS shares at work defaulting to 65536 bytes, and they can be as large as 1048576 bytes (1M) for some particular NFS servers. Why would OpenBSD default to such a small number? On the Linux desktop, I remember it being negotiated to 131072.
Well, after some manual remounting, I discovered something that seemed odd to me: TCP mounts could never be mounted with buffer sizes in excess of 65536 bytes, and UDP mounts could never use any buffer size greater than 32768 bytes. Combing through the OpenBSD source on GitHub led me to this file:
sys/nfs/nfsproto.h
#define NFS_MAXDGRAMDATA 32768
#define NFS_MAXDATA MAXBSIZE
As you can see, it looks like the max UDP buffer size is set to 32768, and the max TCP buffer size is set to MAXBSIZE, which appears to be configured here:
sys/sys/param.h
#define MAXBSIZE (64 * 1024)
This seems to explain why TCP can be set to 65536 bytes, whereas UDP is limited to 32768 bytes. In order to make my tests a little more equal between TCP and UDP, I recompiled a kernel with NFS_MAXDGRAMDATA bumped up to 65536. My honest assumption was that the larger buffer would speed up connections overall. I did not go any higher than 65536, though, since I've read that the buffer size should generally not exceed the block size of the underlying filesystem. Also, from my reading, changing MAXBSIZE would require changes to more of the source than just this one file.
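If you want to make the same change, it is just the one #define in sys/nfs/nfsproto.h, followed by a kernel rebuild; roughly the usual OpenBSD procedure (the stock GENERIC.MP config is assumed):

# edit sys/nfs/nfsproto.h so that the UDP limit reads:
#   #define NFS_MAXDGRAMDATA 65536
# then rebuild and install the kernel, and reboot
cd /usr/src/sys/arch/$(machine)/compile/GENERIC.MP
make obj
make config
make
make install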
Now that I knew the parameters that could be modulated, with only 1024 possible permutations, I wrote a little script to test every combination. The script was tweaked over the course of a couple of days, as each full test takes about a day or more. The first full test copied over a data lump created from /dev/random with dd, with a size of 256M. I know that a smaller file will probably report higher transfer speeds than a larger one, but I think it is sufficiently large for this simple test. I am currently rewriting the script to transfer three different lumps at three different sizes. I will include the script once I can fully test it.
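Generating the test lump itself is just a dd from /dev/random; something along these lines (the output name is arbitrary):

dd if=/dev/random of=/tmp/lump.bin bs=1m count=256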
In the meantime, I will include the data generated from the transfer tests below. A few interesting things stuck out to me.
In order to keep things as fair between transfers as possible, the script runs a sync, then unmounts the share and remounts it with the next combination of settings. This keeps each iteration as clean as possible between runs. I also collect the sync and unmount times of each run.
I will add the script later; each admin may see different results than I did and may need their own tweaks.
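Until then, the general shape of the loop is roughly the following. The export, mount point, and file paths are placeholders, the timing here is only whole-second resolution, and this is a sketch rather than the actual script:

#!/bin/sh
# iterate over every combination, remount, copy the lump, record times
SERVER=nfs-server:/export/data   # placeholder export
MNT=/mnt/nfstest                 # placeholder mount point
LUMP=/tmp/lump.bin               # the 256M test file from dd
OUT=/tmp/results.csv

for io in $(jot 17 4); do                          # iothreads 4-20
  sysctl vfs.nfs.iothreads=$io
  for proto in udp tcp; do
    [ $proto = tcp ] && topt=-T || topt=
    for m in hard soft; do
      [ $m = soft ] && sopt=-s || sopt=
      for ra in 1 2 3 4; do                        # readahead
        for buf in 8192 16384 32768 65536; do      # rsize/wsize
          mount_nfs $topt $sopt -a $ra -r $buf -w $buf $SERVER $MNT
          t0=$(date +%s); cp $LUMP $MNT/; t1=$(date +%s)
          sync;           t2=$(date +%s)
          umount $MNT;    t3=$(date +%s)
          echo "$io,$proto,$m,$ra,$buf,$((t1-t0)),$((t2-t1)),$((t3-t2))" >> $OUT
        done
      done
    done
  done
done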
Results
Top 30 results when transferring a 256M data file:
iothreads protocol mount readahead buffer transfer time transfer speed sync time (s) umount time (s)
4 tcp soft 3 32768 0:00:03 73.20MB/s 0.70 67.22
4 tcp hard 3 32768 0:00:03 70.57MB/s 0.75 65.75
5 tcp hard 4 32768 0:00:03 70.30MB/s 19.19 47.13
5 tcp soft 1 32768 0:00:03 69.20MB/s 17.90 46.68
5 tcp soft 2 32768 0:00:03 68.83MB/s 18.52 44.98
5 tcp hard 1 32768 0:00:03 68.64MB/s 19.08 45.90
4 tcp hard 4 32768 0:00:03 67.57MB/s 0.74 68.63
6 tcp soft 4 32768 0:00:03 67.22MB/s 10.69 52.85
4 tcp soft 4 32768 0:00:03 66.95MB/s 0.68 70.96
5 tcp soft 4 32768 0:00:03 66.80MB/s 23.96 66.58
5 tcp soft 3 32768 0:00:03 66.66MB/s 17.81 46.40
4 tcp soft 2 32768 0:00:03 66.45MB/s 0.80 68.15
5 tcp hard 2 32768 0:00:03 66.38MB/s 18.71 46.05
7 tcp hard 1 32768 0:00:03 66.33MB/s 17.71 42.02
5 tcp hard 3 32768 0:00:03 66.18MB/s 20.95 46.21
7 tcp hard 3 32768 0:00:03 65.52MB/s 17.38 42.05
11 tcp hard 4 32768 0:00:03 65.31MB/s 17.35 44.80
7 tcp hard 4 32768 0:00:03 64.84MB/s 17.54 45.64
10 tcp hard 2 32768 0:00:03 64.61MB/s 2.04 49.79
10 tcp soft 1 32768 0:00:03 64.61MB/s 1.74 49.74
7 tcp hard 2 32768 0:00:03 64.27MB/s 17.84 43.92
7 tcp soft 1 32768 0:00:03 64.06MB/s 18.39 43.17
9 tcp soft 3 32768 0:00:04 63.71MB/s 17.35 42.67
6 tcp soft 2 32768 0:00:04 63.55MB/s 4.87 49.46
6 tcp hard 4 32768 0:00:04 63.47MB/s 7.18 54.42
7 tcp soft 2 32768 0:00:04 63.41MB/s 18.83 43.72
6 tcp soft 3 32768 0:00:04 63.26MB/s 8.69 49.45
6 tcp hard 3 32768 0:00:04 63.09MB/s 3.67 58.31
7 tcp soft 4 32768 0:00:04 63.02MB/s 16.51 42.65
10 tcp hard 1 32768 0:00:04 62.61MB/s 1.94 47.95
Top 30 results when transferring a 1024MB data file:
iothreads protocol mount readahead buffer transfer time transfer speed sync time (s) umount time (s)
5 tcp hard 2 32768 0:00:19 53.18MB/s 1.48 42.95
4 tcp soft 2 32768 0:00:19 52.93MB/s 0.06 40.71
7 tcp hard 4 32768 0:00:19 52.19MB/s 7.65 44.48
4 tcp soft 3 32768 0:00:19 52.06MB/s 0.06 45.61
5 tcp soft 1 32768 0:00:19 52.02MB/s 1.42 42.71
4 tcp hard 1 32768 0:00:19 51.60MB/s 0.08 42.46
6 tcp soft 3 32768 0:00:19 51.54MB/s 0.08 47.62
5 tcp soft 3 32768 0:00:19 51.52MB/s 1.48 42.72
4 tcp soft 4 32768 0:00:19 51.47MB/s 0.07 43.57
5 tcp hard 4 32768 0:00:19 51.28MB/s 1.43 44.55
5 tcp soft 4 32768 0:00:20 51.08MB/s 1.50 42.84
13 tcp soft 3 32768 0:00:20 50.96MB/s 31.16 14.48
6 tcp soft 4 32768 0:00:20 50.93MB/s 0.09 47.37
13 tcp soft 4 32768 0:00:20 50.84MB/s 29.83 13.90
6 tcp hard 3 32768 0:00:20 50.80MB/s 0.55 47.55
13 tcp hard 2 32768 0:00:20 50.69MB/s 34.10 15.52
13 tcp soft 2 32768 0:00:20 50.64MB/s 30.35 13.75
5 tcp hard 1 32768 0:00:20 50.20MB/s 1.47 44.43
13 tcp hard 4 32768 0:00:20 50.08MB/s 31.16 13.63
6 tcp hard 1 32768 0:00:20 50.03MB/s 0.10 47.92
5 tcp soft 2 32768 0:00:20 50.02MB/s 1.52 43.00
6 tcp soft 1 32768 0:00:20 49.92MB/s 0.15 46.92
6 tcp soft 2 32768 0:00:20 49.75MB/s 0.51 47.84
4 tcp hard 4 32768 0:00:20 49.73MB/s 0.06 44.28
13 tcp soft 1 32768 0:00:20 49.70MB/s 30.90 14.50
7 tcp soft 4 32768 0:00:20 49.69MB/s 6.10 43.08
5 tcp hard 3 32768 0:00:20 49.69MB/s 1.58 43.66
4 tcp soft 1 32768 0:00:20 49.69MB/s 0.09 46.28
6 tcp hard 2 32768 0:00:20 49.65MB/s 0.18 46.86
13 tcp hard 3 32768 0:00:20 49.44MB/s 31.57 14.89
It appears that increasing iothreads did not increase throughput, nor did increasing readahead. But a buffer size of 32768 over TCP consistently produced the best results.
Analyzing
In order to quickly find the best overall combination of values, I wrote another script that looks at each individual setting in the data. This script only finds the best average values: the settings that work well most of the time, not the single fastest combination. It sorts the data by fastest transfer, then adds up each setting value's placing across the runs (1-1024); the value with the lowest total had the highest average placing.
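A minimal sketch of that ranking pass, assuming comma-separated results in the same column order as the loop sketch above, with the transfer time in the sixth field:

#!/bin/sh
# sort runs fastest-first, then add up each setting value's placing;
# the lowest total means the best average placing
sort -t, -k6,6n /tmp/results.csv | awk -F, '
{
  rank++
  io[$1] += rank; proto[$2] += rank; mnt[$3] += rank
  ra[$4] += rank; buf[$5]   += rank
}
END {
  print "iothreads:";   for (v in io)    print v "," io[v]
  print "protocol:";    for (v in proto) print v "," proto[v]
  print "mount:";       for (v in mnt)   print v "," mnt[v]
  print "readahead:";   for (v in ra)    print v "," ra[v]
  print "buffer size:"; for (v in buf)   print v "," buf[v]
}'

Here are the totals my script produced for this data set (a lower sum means a better average placing):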
iothreads:
10,36019
11,32874
12,33641
13,34088
14,34913
15,35264
16,36803
17,37525
18,39251
19,36405
20,37461
4,26563
5,31043
6,31944
7,33194
8,34634
9,34281
best: 4
protocol:
tcp,344656
udp,241247
best: udp
mount:
hard,292213
soft,293690
best: hard
readahead:
1,145902
2,146944
3,147972
4,145085
best: 4
buffer size:
16384,127966
32768,58819
65536,181338
8192,217780
best: 32768
The way I interpret this is that iothreads do have some impact on transfer performance, with 4 iothreads placing best; in fact, 4-6 seemed to give the best results overall. Between TCP and UDP, UDP absolutely blows away TCP on the best average, which is odd, since the fastest runs were all using TCP, not UDP. I take this to mean that UDP has a good average transfer speed, but that TCP requires specific combinations of settings to boost its performance.
There was not a whole lot of variance between hard and soft mounts, and the readahead settings were almost identical. Buffer size, however, definitely had an impact on performance: 32K easily ran away with the "most impactful" title in this test, with the default setting of 8192 having the worst performance.
Interpretations
For most users, these parameters seem to provide the best average results:
- iothreads set to 4
- readahead set to 4
- udp
- soft
- rsize and wsize set to 32768
However, when looking for the quickest transfer speeds in my case, I found that TCP gave me better performance than UDP. Choosing hard vs soft is really up to the individual: if you are using the NFS mount to back up data, I would stick with hard; for everyday use, soft should be fine. Altogether, this combination seemed to provide the best performance for me (the equivalent mount command follows the list):
- iothreads set to 4
- readahead set to 3
- tcp
- soft
- rsize and wsize set to 32768
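Concretely, that works out to something like the following, with your own server and mount point substituted in:

sysctl vfs.nfs.iothreads=4
mount_nfs -T -s -a 3 -r 32768 -w 32768 nfs-server:/export/data /mnt/data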
Conclusions
Change your r/w buffer sizes to 32768! This simple change easily had a stronger influence than any other change from the default values. The other defaults seem generally fine for most transfers, but in the end, every setup may need slightly different settings.
Notes
NFS Server:
OS: FreeBSD 12.0
CPU: Intel(R) Xeon(R) CPU E3-1275 v6 @ 3.80GHz (3792.20-MHz K8-class CPU)
Network: 4 igb NICs bonded using LACP
NFS Client:
OS: OpenBSD Current (6.5-beta)
CPU: Intel(R) Core(TM) i3-6100 CPU @ 3.70GHz, 3701.39 MHz, 06-5e-03
Network: 4 em NICs bonded using LACP
Also tested on OpenBSD 6.4.