Thursday, January 29, 2009

rsync vs scp vs ssh + tar

Many will agree that copying large directory trees with many files of varying sizes and compressibility from one Linux box to another can crawl along like a caterpillar, whereas copying a few huge files is, surprisingly, relatively fast. The primary reason is that as the number of entries (actual files plus directories) increases, the file system and kernel overheads add to the network slowness.
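To get a feel for how many entries a tree holds before copying it, a rough count can help. This is a minimal sketch, assuming a POSIX shell with find, wc and tr; count_entries is a hypothetical helper name and not part of the experiment itself.

```sh
count_entries() {
    # $1 = root of the tree to inspect
    # Assumes no file names contain newlines (such a name would be counted twice).
    files=$(find "$1" -type f | wc -l | tr -d ' ')
    # The directory count includes the root directory itself.
    dirs=$(find "$1" -type d | wc -l | tr -d ' ')
    echo "files=$files dirs=$dirs"
}

# Example: count_entries matlab_install
```

A high file count relative to the total size is the case where per-file overhead is likely to dominate.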

Here are some interesting results from a small experiment I performed:

The same data was copied between the same two machines, under similar network conditions and loads, using different methods with and without compression. Both Fedora installations are new and the home partitions almost empty (I have no idea at the moment what would happen if they were somewhat filled). Two dry runs were carried out before the actual tests to give the cache benefit to all methods.
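The warm-up-then-measure routine can be sketched roughly as follows. This is only an illustration, assuming a shell whose date supports +%s; timed_run is a hypothetical helper, and it reports whole wall-clock seconds rather than the real/user/sys breakdown shown in the results.

```sh
timed_run() {
    # $1 = number of untimed warm-up runs; remaining arguments = command to run
    warmups=$1
    shift
    i=0
    while [ "$i" -lt "$warmups" ]; do
        "$@" >/dev/null 2>&1      # warm-up run, output discarded
        i=$((i + 1))
    done
    start=$(date +%s)
    "$@"                          # the measured run
    end=$(date +%s)
    echo "elapsed: $((end - start))s" >&2
}

# Example: timed_run 2 rsync -a matlab_install user@host:dest
```

For a copy test, the destination would need to be cleared between runs, otherwise later runs have less work to do.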

Data being copied: a 1.6GB MATLAB 2008a (Unix) installation, containing a bunch of AVI videos of several megabytes each, a moderate number of jar files and many small .m files.
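A mixed tree like this may compress unevenly, so it can be worth gauging compressibility before choosing a compressed transfer. The following is a rough sketch, assuming tar, gzip and awk are installed; compression_ratio is a hypothetical helper and was not used in the experiment.

```sh
compression_ratio() {
    # $1 = directory to test
    # Prints compressed size / uncompressed size of the tar stream with gzip -1.
    raw=$(tar -cf - -C "$1" . | wc -c)
    packed=$(tar -cf - -C "$1" . | gzip -1 | wc -c)
    awk -v r="$raw" -v p="$packed" 'BEGIN { printf "%.2f\n", p / r }'
}

# Example: compression_ratio matlab_install
```

A ratio close to 1 suggests that compression would mostly burn CPU time without shrinking the transfer much.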

source machine:
Pentium D 820@2.8GHz, 2GB DDR2 dual channel@667, Intel 946 mobo, on board SATA controller; Samsung 160GB SATA hard disk drive@7200rpm and 8M cache, Linux f10 2.6.27.12-170.2.5.fc10.i686,
35GB ext3 source partition (fresh).

target machine:
Core 2 Duo E4500@2.2GHz, 2GB DDR2 dual channel@667, Intel 965 mobo, on board SATA controller; Samsung 160GB SATA hard disk drive@7200rpm and 2M cache, Linux f10 2.6.27.12-170.2.5.fc10.i686,
100GB ext3 copy-to partition (fresh).

Results

Method: uncompressed recursive scp
Command: scp -rq matlab_install prashant@10.105.41.19:matlab_install_regular_scp_unc
Timing: real 9m54.554s, user 0m23.204s, sys 0m15.103s
Avg. CPU util.: 20%

Method: compressed recursive scp
Command: scp -Crq matlab_install prashant@10.105.41.19:matlab_install_regular_scp
Timing: real 11m8.391s, user 3m48.508s, sys 0m25.200s
Avg. CPU util.: 85%

Method: uncompressed recursive rsync
Command: rsync -a matlab_install prashant@10.105.41.19:matlab_install_regular_rsync_unc
Timing: real 3m3.604s, user 0m26.709s, sys 0m21.664s
Avg. CPU util.: 40%

Method: compressed recursive rsync
Command: rsync -az matlab_install prashant@10.105.41.19:matlab_install_regular_rsync
Timing: real 4m11.651s, user 3m11.847s, sys 0m31.892s
Avg. CPU util.: 90%

Method: uncompressed tar+ssh
Command: tar -cf- matlab_install | ssh prashant@10.105.41.19 'tar -xf- -C ~/matlab_install_hack_unc'
Timing: real 2m59.706s, user 0m21.428s, sys 0m14.020s
Avg. CPU util.: 20%

Method: compressed tar+ssh
Command: tar -cf- matlab_install | gzip -f1 | ssh prashant@10.105.41.19 'tar -xzf- -C ~/matlab_install_hack_compr'
Timing: real 2m44.349s, user 2m7.709s, sys 0m18.114s
Avg. CPU util.: 60%
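To compare the methods on a single scale, the real times can be converted into an effective transfer rate. This is a rough sketch, assuming the 1.6GB figure means 1600 MB (the post does not say whether the size is decimal or binary); to_seconds and throughput_mb_s are hypothetical helper names.

```sh
to_seconds() {
    # Convert a time-style value such as 9m54.554s into seconds.
    printf '%s\n' "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

throughput_mb_s() {
    # $1 = data size in MB, $2 = elapsed real time such as 2m44.349s
    secs=$(to_seconds "$2")
    awk -v mb="$1" -v s="$secs" 'BEGIN { printf "%.2f\n", mb / s }'
}

throughput_mb_s 1600 9m54.554s   # uncompressed scp, prints 2.69
throughput_mb_s 1600 3m3.604s    # uncompressed rsync, prints 8.71
```

Under that size assumption, uncompressed scp moved roughly 2.7 MB/s while uncompressed rsync moved roughly 8.7 MB/s.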

Conclusion

As seen from the timings, rsync and tar+ssh perform comparably, though tar+ssh beats rsync here.
On the other hand, when updating a huge tree, rsync wins hands down - insane speedups!
scp is not to be used with more than a hundred files. Period.

2 comments:

  1. What's the point of enabling compression if it is going to be SLOWer...

  2. Compression works nicely only when the files being copied are compressible, which is usually the case when copying large plaintext files.

    I guess the cost/time of compression is more than the saved transfer-time in this case.
