This post is the first of a series and continues in Part 2: Memory
In light of the release of FileCatalyst Direct v3.0, I thought I’d write a few articles about the road to achieving 10Gbps speeds. It seems to me the best place to start is with the endpoint of the transfer: the storage media. Why? Before we even think about the file transfer protocol, we have to be sure that our disks can keep up. It’s very easy for people to forget how much of a bottleneck disk read & write IO can be.
Disk IO Tests
10Gbps (1250MB/s) is fast, especially for a single file stream. By comparison:
- Your average enterprise SATA HDD drive will read/write between 50 and 100MB/s (400-800Mbps)
- Your average SSD SATA drive will read data at 300MB/s, and write at 200MB/s (1.6 Gbps-2.4 Gbps)
- Fibre Channel SAN arrays normally connect at between 500MB/s and 1000MB/s (4Gbps or 8Gbps)
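All of the conversions in this post use decimal units: 8 bits per byte and 1000Mbps per Gbps. If you want to sanity-check the numbers yourself, the arithmetic is nothing more than this small stand-alone Java helper (written for this post, not part of FileCatalyst):

```java
public class ThroughputUnits {
    // Decimal convention used in this post: 1 MB/s = 8 Mbps, 1 Gbps = 1000 Mbps
    static double mbPerSecToGbps(double mbPerSec) { return mbPerSec * 8 / 1000; }

    public static void main(String[] args) {
        System.out.println(mbPerSecToGbps(1250)); // 10.0 -- the 10Gbps target
        System.out.println(mbPerSecToGbps(100));  // 0.8  -- a fast SATA HDD
        System.out.println(mbPerSecToGbps(300));  // 2.4  -- SATA SSD reads
    }
}
```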
Getting FileCatalyst to run 10Gbps speeds (1250 MB/s) requires testing your hardware infrastructure. Having a fiber SAN may not be enough.
For our internal test environment, this meant adding a few RAID 0 arrays (8 SSD drives or 16 HDD drives) in order to achieve the desired speeds. This is certainly not a recommendation for a production solution, but it was the simplest way to reach the speeds we needed for testing.
To determine what your system is capable of providing, FileCatalyst includes tools within FileCatalyst Server, HotFolder, and Command Line Client. The tools are command-line scripts designed to emulate how the FileCatalyst software manages Disk IO, and can give you a good approximation of your system’s potential before a single byte has transferred over the network.
C:\Users\ccharette\Desktop\applettests\CL>java -jar FileCatalystCL.jar -testIO
Entering Write TestIO. This will run a series of tests on the file system to
attempt to discover the optimal values for # of writer threads and write block
size for your system. The test is both IO intensive and CPU intensive. Please
give the test adequate time to complete.
Please enter the drive/path you wish to test (ie: C:/ or /mnt/data/ ): c:\temp
File to be written: c:\temp/test.io
Please enter the size of file you wish write (in MB, default 500): 10000
File size: 10000MB.
Please enter the timeout length (secs) per run (default 60 secs): 180
Timeout: 180 seconds.
Please enter the number of runs to perform for each settings (default 5): 3
Number of runs per iteration: 3
Test if buffer size used in writes to disk affect performance.
Please enter a buffer array to attempt (default: '64,128,256,512,1024,2048,4096')
Size in KB, comma delimited: 4,16,64,256,1024,4096,16384
4 Buffer Size values (KB): 4,16,64,256,1024,4096,16384.
Test if multiple writters offer performance benefit when saving a block to disk.
Please enter a writer thread array to attempt (default: '1,2,4,6,8'): 1,2
2 Thread values: 1,2.
How many files would you like to create concurrently for IO test (default 1)?
Note: number of files will never exceed number of writer threads during tests.
1 files will be created in test: 1.
Test using Direct IO when allocating buffer space by Java (default true): true
Use DirectIO = true.
Mode used to open up files (rw/rws/rwd -- default rw): rw
Mode used = rw.
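To give a feel for what the write test exercises, here is a stripped-down, stand-alone Java sketch of the same idea, written for illustration rather than taken from FileCatalyst: allocate a direct ByteBuffer, open the target file in "rw" mode, push fixed-size blocks at it, and report MB/s. The real tool does the same kind of work while sweeping buffer sizes and thread counts for you.

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;

public class WriteBench {
    // Usage: java WriteBench <path> <fileSizeMB> <blockSizeKB>
    public static void main(String[] args) throws Exception {
        String path   = args.length > 0 ? args[0] : "test.io";
        long fileSize = (args.length > 1 ? Long.parseLong(args[1]) : 1024L) * 1024 * 1024;
        int blockSize = (args.length > 2 ? Integer.parseInt(args[2]) : 256) * 1024;

        // Direct buffer, matching the "Direct IO" buffer-allocation option in the test.
        ByteBuffer block = ByteBuffer.allocateDirect(blockSize);
        while (block.hasRemaining()) block.put((byte) 0x55);

        long written = 0;
        long start = System.nanoTime();

        // "rw" mode: plain read/write, letting the OS use its file system cache.
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            while (written < fileSize) {
                block.rewind();
                written += channel.write(block);
            }
            channel.force(false); // flush so the timing is not just the OS cache filling up
        }

        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d KB blocks, 1 writer thread: %.1f MB/s%n",
                blockSize / 1024, written / 1e6 / seconds);
    }
}
```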
Things to note:
- It is better to create a file large enough to represent the total data set you expect to transfer (e.g. a 9GB DVD ISO), or larger than the amount of memory the OS can use for file system buffers (see results below).
- The timeout should be a length of time within which you expect a single copy of the file can easily be written.
- Specify the number of runs to perform for each setting; the average across the runs is what gets reported.
- Buffer size represents an optional switch on the Server & Clients which dictates how large each read/write IO should be from the Java code down to the file system. You should experiment with a few values here, as different disk configurations sometimes yield vastly different optimal results.
- The Server and Clients support multiple read and write threads per file (a rough sketch of the multi-writer idea follows this list).
- Keep the # of files to create concurrently at 1 if you are testing only a single file transfer (single client to a single server endpoint). This is actually one of the hardest test cases to manage, as there are OS-level locks which often task the CPU and limit the throughput you can get when writing to a single file. If you are looking to test 10 clients each utilizing a 1Gbps connection, select multiple files (much higher file IO is possible when multiple files are being saved at a time).
- Always select Direct IO, since this is what the FileCatalyst application uses.
- Select the default “rw” mode, which takes advantage of OS-level memory buffers if available.
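For the multi-writer case referenced above, the sketch below (again illustrative Java, not FileCatalyst code) has several threads write interleaved blocks into the same pre-sized file using positional FileChannel writes. It shows why a single file is the hard case: every thread funnels into one file handle and the same OS-level bookkeeping, which is roughly where the locking overhead mentioned in the notes shows up.

```java
import java.io.RandomAccessFile;
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class MultiWriterBench {
    // Usage: java MultiWriterBench <path> <fileSizeMB> <blockSizeKB> <threads>
    public static void main(String[] args) throws Exception {
        String path   = args.length > 0 ? args[0] : "test.io";
        long fileSize = (args.length > 1 ? Long.parseLong(args[1]) : 1024L) * 1024 * 1024;
        int blockSize = (args.length > 2 ? Integer.parseInt(args[2]) : 256) * 1024;
        int threads   = args.length > 3 ? Integer.parseInt(args[3]) : 2;

        long blocks = fileSize / blockSize;
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        long start = System.nanoTime();

        try (RandomAccessFile raf = new RandomAccessFile(path, "rw");
             FileChannel channel = raf.getChannel()) {
            raf.setLength(fileSize); // pre-size the file so every offset is valid
            CountDownLatch done = new CountDownLatch(threads);

            for (int t = 0; t < threads; t++) {
                final int id = t;
                pool.submit(() -> {
                    try {
                        // Each thread gets its own direct buffer and an interleaved set of blocks.
                        ByteBuffer block = ByteBuffer.allocateDirect(blockSize);
                        for (long b = id; b < blocks; b += threads) {
                            block.clear();
                            long pos = b * blockSize;
                            while (block.hasRemaining()) {
                                pos += channel.write(block, pos); // positional write: no shared file pointer
                            }
                        }
                    } catch (Exception e) {
                        e.printStackTrace();
                    } finally {
                        done.countDown();
                    }
                });
            }
            done.await();
            channel.force(false);
        } finally {
            pool.shutdown();
        }

        double seconds = (System.nanoTime() - start) / 1e9;
        System.out.printf("%d writer thread(s), %d KB blocks: %.1f MB/s%n",
                threads, blockSize / 1024, blocks * blockSize / 1e6 / seconds);
    }
}
```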
Results
Machine 1a: Windows 7, single SSD drive
Test parameters: file: c:/temp/test.io, size: 10000000000 bytes, timeout: 180000 ms, directIO: true, file mode: rw, Max # files to use: 1
Write throughput (MB/s):

Buffer size (KB) | 1 thread | 2 threads
4                | 240      | 150
16               | 237      | 150
64               | 238      | 150
256              | 236      | 152
1024             | 237      | 152
4096             | 236      | 149
16384            | 236      | 196

Machine 1b: Windows 7, single HDD drive
Test parameters: file: E:/tmp/test.io, size: 10000000000 bytes, timeout: 300000 ms, directIO: true, file mode: rw, Max # files to use: 1
Write throughput (MB/s):

Buffer size (KB) | 1 thread | 2 threads
4                | 89       | 52
16               | 92       | 53
64               | 92       | 53
256              | 92       | 53
1024             | 95       | 54
4096             | 100      | 55
16384            | 104      | 79
Machine 2a: Ubuntu RAID 0, 8 x SSD, 10GB file
Test parameters: file: /opt/tmp//test.io, size: 10000000000 bytes, timeout: 60000 ms, directIO: true, file mode: rw, Max # files to use: 1
Write throughput (MB/s):

Buffer size (KB) | 1 thread | 2 threads
4                | 1431     | 1282
16               | 1722     | 1614
64               | 2059     | 1748
256              | 2239     | 1933
1024             | 2095     | 2050
4096             | 2078     | 2048
16384            | 1841     | 1720

Machine 2b: Ubuntu RAID 0, 8 x SSD, 60GB file
Test parameters: file: /opt/tmp//test.io, size: 60000000000 bytes, timeout: 180000 ms, directIO: true, file mode: rw, Max # files to use: 1
Write throughput (MB/s):

Buffer size (KB) | 1 thread | 2 threads
4                | 1093     | 893
16               | 1177     | 1131
64               | 1400     | 1184
256              | 1387     | 1324
1024             | 1402     | 1303
4096             | 1293     | 1294
16384            | 1140     | 1271
Observations: Note that neither machine benefitted from multiple writer threads; performance was consistently higher with a single writer.
Machine 1a: When writing to SSD, we can get 230+ MB/s (>1.8Gbps) of write speed when using 1 thread. Block sizes do not affect throughput.
Machine 1b: Same machine, but utilizing a slower secondary HDD drive. When using slower disks, the software can only get a fraction of the bandwidth (in this case < 1Gbps). We do see marginal improvements as the block size grows, so limiting the block size is not a good idea. By default, the FileCatalyst application will use the largest block size it can (determined by the UDP block size).
Machine 2a: Can read/write at 2000+MB/s (>16Gbps) for 10GB files. We can also see a sweet spot around a 256KB write block, where smaller writes adversely affect performance (as do larger blocks). This system, however, has 48GB of RAM, so the numbers it provides are actually above what I would expect the disks themselves to deliver.
Machine 2b: Same test, but with a 60GB file. Now we have realistic numbers that match the disk IO, giving us a system capable of sustaining 1350MB/s (10.8Gbps) write speed.
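The Machine 2a vs. 2b difference suggests a simple rule: make sure the test file cannot fit in the OS file system cache, or the numbers will be inflated. A small helper like the one below (my own sketch, not a FileCatalyst tool; the 1.5x margin is just an assumption that comfortably exceeds RAM) can pick a suitable file size before you run the test:

```java
import java.lang.management.ManagementFactory;

public class TestFileSize {
    public static void main(String[] args) {
        // com.sun.management extension of the standard MXBean; available on HotSpot/OpenJDK JVMs.
        com.sun.management.OperatingSystemMXBean os =
                (com.sun.management.OperatingSystemMXBean) ManagementFactory.getOperatingSystemMXBean();

        long ramMB = os.getTotalPhysicalMemorySize() / (1024 * 1024);

        // Assumption: 1.5x physical RAM is a comfortable margin past what the OS
        // can hold in its file system cache (Machine 2b used a 60GB file on a 48GB box).
        long suggestedMB = ramMB * 3 / 2;

        System.out.println("Physical RAM               : " + ramMB + " MB");
        System.out.println("Suggested testIO file size : " + suggestedMB + " MB");
    }
}
```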
Configuration Values
On the Server & Client side, the following configuration options are therefore available to set (CLI arguments shown):
- numBlockWriters [#]
- writeBufferSizeKB [# KB]
- writeFileMode [rw/rwd/rws]
- numBlockReaders [#]
- readBufferSizeKB [# KB]
These are machine-specific settings. To maximize performance, you need to run tests on both endpoints (client and server). On the server, if client connections will be doing both uploads and downloads, you should run both read and write tests.
For both the FileCatalyst Server and HotFolder, these settings are configuration file changes that must be set manually (fcconf.conf for the Server, fchf.conf for the HotFolder). For the CLI, they may be passed in as run-time arguments.
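For example, once the testIO runs above point at a single writer thread and roughly 256KB blocks, the tuned values could be passed to the Command Line Client along these lines. The argument names are the ones listed above, but treat the exact invocation syntax as illustrative rather than copied from the product documentation:

```
java -jar FileCatalystCL.jar -numBlockWriters 1 -writeBufferSizeKB 256 -writeFileMode rw [other transfer arguments...]
```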
Conclusion
Knowing the limits of your system IO is the first required step in achieving high speed transfers. FileCatalyst v3.0 provides several tools to help you both benchmark those limits and tune the application to best take advantage of your system.
Want to learn more? Continue to the follow-up article, Part 2: Memory