rsync vs scp vs ssh + tar

Many will agree that copying large directory trees, with many files of varying sizes and compressibility, from one Linux box to another can crawl like a caterpillar, whereas copying a few huge files is surprisingly fast by comparison. The primary reason is that as the number of entries (actual files plus directories) increases, the file system and kernel overheads add to the network slowness.

Here are some interesting results from a small experiment I performed:

The same data was copied between the same two machines, under similar network conditions and loads, using different methods with and without compression. Both Fedora installations are new and the home partitions almost empty (I have no idea at the moment what would happen if they were somewhat filled). Two dry runs were carried out before the actual tests to give the cache benefit to all methods.

Data being copied: a 1.6GB matlab2008a (unix) installation. It contains a bunch of AVI videos of several megabytes each, a moderate number of jar files and many small .m files.

source machine:
Pentium D 820@2.8GHz, 2GB DDR2 dual channel@667, Intel 946 mobo, on board SATA controller; Samsung 160GB SATA hard disk drive@7200rpm and 8M cache, Linux f10 2.6.27.12-170.2.5.fc10.i686,
35GB ext3 source partition (fresh).

target machine:
Pentium core2 duo E4500@2.2GHz, 2GB DDR2 dual channel@667, Intel 965 mobo, on board SATA controller; Samsung 160GB SATA hard disk drive@7200rpm and 2M cache, Linux f10 2.6.27.12-170.2.5.fc10.i686,
100GB ext3 copy-to partition (fresh).

Results

Method                          real         user        sys         avg. CPU util. (%)
uncompressed recursive scp      9m54.554s    0m23.204s   0m15.103s   20
compressed recursive scp        11m8.391s    3m48.508s   0m25.200s   85
uncompressed recursive rsync    3m3.604s     0m26.709s   0m21.664s   40
compressed recursive rsync      4m11.651s    3m11.847s   0m31.892s   90
uncompressed tar+ssh            2m59.706s    0m21.428s   0m14.020s   20
compressed tar+ssh              2m44.349s    2m7.709s    0m18.114s   60

Commands

uncompressed recursive scp:
scp -rq matlab_install prashant@10.105.41.19:matlab_install_regular_scp_unc

compressed recursive scp:
scp -Crq matlab_install prashant@10.105.41.19:matlab_install_regular_scp

uncompressed recursive rsync:
rsync -a matlab_install prashant@10.105.41.19:matlab_install_regular_rsync_unc

compressed recursive rsync:
rsync -az matlab_install prashant@10.105.41.19:matlab_install_regular_rsync

uncompressed tar+ssh:
tar -cf- matlab_install | ssh prashant@10.105.41.19 'tar -xf- -C ~/matlab_install_hack_unc'

compressed tar+ssh:
tar -cf- matlab_install | gzip -f1 | ssh prashant@10.105.41.19 'tar -xzf- -C ~/matlab_install_hack_compr'
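
To compare the methods at a glance, the "real" times can be converted to seconds and divided. The following is a minimal sketch, assuming durations in the XmY.YYYs form printed by the shell's time; the helper names to_seconds and speedup are made up for illustration.

```shell
#!/bin/sh
# Convert a `time`-style duration such as "9m54.554s" into seconds.
to_seconds() {
    # Split on 'm' and 's': field 1 is minutes, field 2 is seconds.
    printf '%s\n' "$1" | LC_ALL=C awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Print how many times longer the first duration is than the second.
speedup() {
    LC_ALL=C awk -v slow="$(to_seconds "$1")" -v fast="$(to_seconds "$2")" \
        'BEGIN { printf "%.2f\n", slow / fast }'
}

# Example: uncompressed scp versus uncompressed tar+ssh
# speedup 9m54.554s 2m59.706s
```

With the numbers from this experiment, uncompressed scp comes out at roughly 3.3 times the wall-clock time of uncompressed tar+ssh.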

Conclusion

As seen from the timings, rsync and tar+ssh perform comparably for a fresh copy, though tar+ssh beats rsync here.
On the other hand, when updating an existing huge tree, rsync wins hands down, since it transfers only what has changed. Insane speedups!
scp is not to be used with more than a hundred files. Period.
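
The tar+ssh trick is simply one tar writing an archive to standard output and another tar unpacking it from standard input. Here is a local sketch of the same pipe that can be tried without a second machine; the function name tar_pipe_copy is hypothetical. Note that tar's -C option expects the target directory to exist, so the sketch creates it first.

```shell
#!/bin/sh
# Copy the directory tree SRC into directory DEST by streaming a tar
# archive through a pipe, without creating an intermediate archive file.
tar_pipe_copy() {
    src=$1
    dest=$2
    # tar -C fails if the target directory is missing, so create it.
    mkdir -p "$dest" || return 1
    # The first tar changes into SRC's parent so the archive contains only
    # the last path component; the second tar unpacks it inside DEST.
    tar -C "$(dirname "$src")" -cf - "$(basename "$src")" | tar -xf - -C "$dest"
}

# Remote variant (assumption: user@host is reachable over ssh):
# tar -cf - matlab_install | ssh user@host 'mkdir -p ~/dest && tar -xf - -C ~/dest'
```

Replacing the second tar with an ssh invocation, as in the commented line, gives the method measured above.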

UUID business

Use *nix? Use many distributions at a time? Use many distributions on many disks?

Managing distributions installed on multiple disks can be a real pain in the a**. Say you have Fedora 10 installed on one SATA disk and Debian on another. Now, can you confidently tell which is the system drive? It gets really messy.

Have a portable hard drive you want mounted at a fixed place whenever it is attached? If there is another USB drive already attached, there is no way you can tell in advance whether the new one will be sdc or sdd.

UUIDs come to the rescue. Though the kernel itself neither supports nor understands them, many distributions provide tools in the initrd which can work with UUIDs. Every partition (strictly speaking, the file system on it) has a Universally Unique IDentifier (UUID), which can be used as a globally unique name for that partition.

Finding UUID of a partition


This command needs the 'udev' package, which comes pre-installed on many modern Linux distributions. If it is missing, please install it following your distribution's guidelines.

/lib/udev/vol_id --uuid /dev/sdaXX

replacing XX with the partition number.
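
udev also maintains a directory of symlinks, /dev/disk/by-uuid, where each link is named after a UUID and points at the matching device node. On a system where vol_id is not available, the UUID can be recovered from those links. A minimal sketch follows; the function name uuid_of_device is hypothetical, and it assumes a readlink that supports -f (as GNU coreutils does).

```shell
#!/bin/sh
# Print the UUID whose symlink in BYUUID_DIR resolves to DEVICE.
# BYUUID_DIR defaults to /dev/disk/by-uuid; it is a parameter only so the
# function can be tried against a scratch directory.
uuid_of_device() {
    device=$(readlink -f "$1") || return 1
    dir=${2:-/dev/disk/by-uuid}
    for link in "$dir"/*; do
        # Skip anything that is not a symlink (including an unmatched glob).
        [ -L "$link" ] || continue
        if [ "$(readlink -f "$link")" = "$device" ]; then
            basename "$link"
            return 0
        fi
    done
    return 1
}

# Example:
# uuid_of_device /dev/sda8
```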

Using UUIDs

  1. Painless booting
    UUIDs come in handy when you have separate boot and / partitions. In this case, the kernel and initrd reside on the boot partition, while the root file system is on the / partition. Naturally, the kernel needs to be told which partition to use as root.

    Say the root partition is sda8, then

    # /lib/udev/vol_id --uuid /dev/sda8
    b036863a-2846-4f57-a6db-e7716f5d903c


    or use


    # udevadm info --query=all --name=/dev/sda8 | grep 'ID_FS_UUID=' | cut -d'=' -f2
    b036863a-2846-4f57-a6db-e7716f5d903c


    This UUID can be used in the kernel's root= parameter. Open up grub's menu.lst (or lilo.conf if you still use LILO), and make the kernel entry look something like

    kernel vmlinuz-2.6.27.9-159.fc10.i686 ro root=UUID=b036863a-2846-4f57-a6db-e7716f5d903c


    Now you can safely boot without worrying about which device name the root partition ends up with.

  2. Pseudo-permanent mountpoint settings

    Continuing from the USB disk example, say the partition on the USB drive has UUID 51e3a299-68f3-466f-86ac-428c60420621. An entry for it can be added to /etc/fstab, like the last line here:

    UUID=ba92ef0d-bb4e-4632-bf22-133d9d3fa1f4 / ext3 defaults 1 1
    UUID=b036863a-2846-4f57-a6db-e7716f5d903c /home ext3 defaults 1 2
    UUID=51e3a299-68f3-466f-86ac-428c60420621 /media/thumbdrive auto defaults 0 0

    Now, the USB thumbdrive will always be mounted at /media/thumbdrive :)
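
A mistyped UUID in /etc/fstab or in the kernel line is easy to miss, so it can be worth checking the entries against the links udev keeps in /dev/disk/by-uuid before rebooting. The following is only a sketch; the function name missing_fstab_uuids is hypothetical, and it assumes unquoted UUID=... values in the first fstab field.

```shell
#!/bin/sh
# List every UUID referenced in FSTAB that has no entry in BYUUID_DIR.
# Defaults are /etc/fstab and /dev/disk/by-uuid; both can be overridden
# so the check can be tried on scratch files.
missing_fstab_uuids() {
    fstab=${1:-/etc/fstab}
    dir=${2:-/dev/disk/by-uuid}
    # Take the first field of lines that start with UUID= (comment lines
    # start with '#', so they never match) and strip the prefix.
    awk '$1 ~ /^UUID=/ { sub(/^UUID=/, "", $1); print $1 }' "$fstab" |
    while read -r uuid; do
        [ -e "$dir/$uuid" ] || printf '%s\n' "$uuid"
    done
}

# Example:
# missing_fstab_uuids
```

A removable drive that is not plugged in will be reported too, which is expected; anything else in the output deserves a second look.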
Adios.
--
prashant

DirectX 11 (10) benchmark - comparison with Windows Server 2008 Standard's DirectX 10

I am trying out both the beta of Windows 7 (build 7000) and the release of Windows Server 2008 (Standard edition). Since both are on the same machine, a benchmark comparison was naturally possible and made sense.

Specs :
Windows 7 - Installed on a defragmented 40gig logical partition of a 7200rpm SATA disk, which still has other stuff and has 12 gigs free. Regular install, default drivers for sound (doesn't matter in this case).

Windows Server 2008 - Installed on a defragmented 20gig primary partition of the same disk, which has 5 gigs free. Regular install, Google's drivers for sound (again, doesn't matter in this case).

PC config :
Pentium D 820 (2.8GHz) (I know it is crappy... not a gamer's processor and all, but that's what I've got)
Intel 946GZIS mobo (PCIE 16x), 667MHz FSB.
2Gig (dual channel) RAM.
NVidia 8800GS, factory OC at 650MHz/mem 950MHz, a dx10 card.
Driver version 181.20 (used the same binary on both installations).
(again, that's a budget card; but it is sufficient for me)

3DMark Vantage settings - res 1280x1024, no AA
everything else Extreme, all tests were conducted.

Windows 7 (dx11)
GPU score = 4337.75
CPU score = 17495.47
3DMarks = 5342.16

Windows server 2008 std (dx10)
GPU score = 3975.24
CPU score = 17995.19
3DMarks = 0 (!!)

The demos visibly ran faster on DX11 (though I do not have the FPS numbers at the moment).

The GPU scores for DX11 show a 9.1% improvement over DX10. All in all, Windows 7 is definitely worth a try IMO (provided I can afford it OR my school has an MSDNAA agreement for it). It has other cool 'features' too, whose usefulness I doubt.
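
The 9.1% figure is the relative change between the two GPU scores. As a quick check, here is a small sketch of the arithmetic; the function name pct_change is made up for illustration.

```shell
#!/bin/sh
# Print the percentage change from OLD to NEW, rounded to one decimal.
pct_change() {
    LC_ALL=C awk -v old="$1" -v new="$2" \
        'BEGIN { printf "%.1f\n", (new - old) / old * 100 }'
}

# GPU score, Windows Server 2008 (dx10) -> Windows 7 (dx11)
# pct_change 3975.24 4337.75
# CPU score, same direction
# pct_change 17995.19 17495.47
```

By the same formula, the CPU score on Windows 7 comes out about 2.8% lower than on Windows Server 2008.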

Please drop a comment if you have any suggestions; since as you can see, I am new to graphics benchmarks :)