SUMMARY: Ethernet Gigabit throughput with UltraSparc 3 ?

From: DAUBIGNE Sebastien - BOR ( ) <>
Date: Fri Nov 28 2003 - 08:04:13 EST
As there were many interesting answers, I will quote all of them.
Basically, I could expect near 100 Mbytes/s with Jumbo Frames on a SunFire
(this does not take the disk backend into account; we are speaking of
memory-to-memory transfer only...).
Thank you all.

Joe Fletcher's answers:
On a V880 8x900 we did some basic tests using ftp which gave us about
This put about a 10-15% overhead on the machine (i.e. it takes about a whole
UltraIII CPU to drive the card in any serious sense). This is dumping data
from an FC array down separate HBAs to another array volume.
Just checked some old results from another site I used to run. Probably not
very interesting to you but on an Alpha ES40 4xEV6 serving a group of Intel
clients we managed to get about 80MB/s. The Alpha was linked into a 3COM
switch via gigabit with the clients each on a 100FD port on the same switch.
Each client was transferring a different set of files, some via ftp, some
via the SMB server software (ASU). We could get similar results using two
Alphas with memory filesystems mounted which allowed us to get the storage
out of the picture. Not representative of real world particularly but we
just wanted to see how fast it was capable of going. I suspect the file
caching helped quite a lot where the PC clients were concerned.
Christophe Dupre's answer:
What Ethernet card do you have in your server? Sun has at least two
chipsets used in gigabit cards: GEM (interface ge0) and Cassini (Sun
GigaSwift, interface ce0).
The GEM is older and pretty much all the processing is done by the CPU, so
the throughput isn't that great. The Cassini is much better and offloads some
processing (IP CRC and TCP CRC) to the card, yielding much better throughput.
Note that GEM is only 1000BaseSX, while Cassini does both fiber and copper.
What do you use to measure the throughput? I use iperf, and between two
servers (both UltraSPARC-II 400MHz, both dual-CPU), each with a GEM-based
card connected to a Cisco 4506 switch, I get 85Mbit/s for a single
connection and an aggregate of 94Mbit/s with about 40% kernel time
according to top. This is using an MTU of 1500 (the GEM and the Cisco switch
don't do jumbo frames). The TCP window size was 64KByte.
By comparison, iperf runs between a Sun UltraSPARC-III with a Sun GigaSwift
and a Dell PowerEdge 2650 with a Broadcom 1000TX card, connected through the
same Cisco Catalyst with 48KByte TCP windows, yield 480Mbit/s.
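Numbers like these can be sanity-checked without iperf itself. Below is a minimal single-connection, memory-to-memory throughput test over loopback in Python; it is an illustrative stand-in for iperf, not its implementation, and the buffer sizes and function name are my own choices (the 64KB buffer echoes the 64KByte window mentioned above).

```python
import socket
import threading
import time

def tcp_throughput_loopback(total_bytes=64 * 1024 * 1024, bufsize=65536):
    """Send total_bytes over a loopback TCP connection.

    Returns (bytes_received, bytes_sent, Mbytes_per_second).
    """
    server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    server.bind(("127.0.0.1", 0))        # ephemeral port
    server.listen(1)
    port = server.getsockname()[1]
    received = []

    def sink():
        # Accept one connection and count every byte until EOF.
        conn, _ = server.accept()
        n = 0
        while True:
            chunk = conn.recv(bufsize)
            if not chunk:
                break
            n += len(chunk)
        conn.close()
        received.append(n)

    t = threading.Thread(target=sink)
    t.start()

    client = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    client.connect(("127.0.0.1", port))
    payload = b"\0" * bufsize
    start = time.time()
    sent = 0
    while sent < total_bytes:
        client.sendall(payload)
        sent += len(payload)
    client.close()      # EOF tells the sink to stop counting
    t.join()
    server.close()

    elapsed = time.time() - start
    return received[0], sent, received[0] / elapsed / 1e6
```

Loopback of course measures only the protocol stack and memory bandwidth, not a NIC, which is exactly the "memory-to-memory" framing used in this thread.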
So before upgrading the CPU you should make sure you have a card that
offloads the CPU, like the GigaSwift. Next, jumbo frames don't matter much:
support is not standardized, not much equipment supports them, and you can
get pretty good performance without them.
I'm not sure how much CPU speed matters, though. I'll install a
GigaSwift in an UltraSPARC-II soon, so I can tell you the performance
difference then.
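One concrete way to see the jumbo-frame trade-off debated in this thread: at a fixed line rate, a 9000-byte MTU means roughly six times fewer packets, and therefore roughly six times less per-packet CPU work (interrupts, checksums, header processing). A rough sketch, ignoring Ethernet/IP/TCP header overhead, so the absolute numbers are approximate but the ratio holds:

```python
# Approximate packets per second at gigabit line rate for two MTUs.
# Framing overhead is ignored; only the ratio matters for per-packet
# CPU cost.

LINE_RATE_BYTES = 1_000_000_000 / 8   # 125 Mbytes/s

def packets_per_second(mtu):
    return LINE_RATE_BYTES / mtu

standard = packets_per_second(1500)   # ~83,333 pkts/s
jumbo = packets_per_second(9000)      # ~13,889 pkts/s
print(standard / jumbo)               # 6.0: jumbo frames cut per-packet work ~6x
```

This is consistent with the measurements later in this summary, where jumbo frames left throughput roughly unchanged but cut CPU load substantially.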
Jason Santos's answer:
I would suspect that your bottleneck on the E10K would be the SBUS
interface, not CPU speed.  With a gem or GigaSwift PCI card in a 750MHz
6800, we get about 60MB/s over NFS with a single thread.  Raw UDP or TCP
throughput would be much higher, although I never tested it.
Let me test now, stand by...
This is a quick test from a 4x750MHz 6800 to a 4x1200MHz V880 (no network
tuning, single thread):
ttcp-t: buflen=32768, nbuf=32768, align=16384/0, port=5001  tcp  ->
ttcp-t: socket
ttcp-t: connect
ttcp-t: 1073741824 bytes in 23.59 real seconds = 44441.28 KB/sec +++
ttcp-t: 1073741824 bytes in 23.16 CPU seconds = 45275.30 KB/cpu sec
ttcp-t: 32768 I/O calls, msec/call = 0.74, calls/sec = 1388.79
ttcp-t: 0.1user 23.0sys 0:23real 98% 0i+0d 0maxrss 0+0pf 3756+261csw
ttcp-t: buffer address 0x74000
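The ttcp figures above are internally consistent, which a quick arithmetic check confirms (44441 KB/sec is about 43 MB/s, well below gigabit line rate):

```python
# Sanity-check the ttcp output above.
total_bytes = 1073741824          # 1 GiB transferred
real_seconds = 23.59

# 32768 I/O calls of 32768 bytes each account for the full transfer
assert 32768 * 32768 == total_bytes

kb_per_sec = total_bytes / real_seconds / 1024
print(round(kb_per_sec, 2))         # close to the reported 44441.28 KB/sec
print(round(kb_per_sec / 1024, 1))  # i.e. about 43.4 MB/s
```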

The fastest Gigabit transfers I have ever seen were from an IBM x345 (dual
Intel Xeon 2.4GHz) over NFS to a NetApp FAS960; I was able to get over
100MB/sec, which is 80% of the theoretical max of 125MB/sec.
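For reference, that 125MB/sec ceiling is simply the gigabit line rate converted to bytes:

```python
# Gigabit Ethernet line rate in Mbytes/s (ignoring framing overhead)
line_rate_bits = 1_000_000_000
theoretical_max = line_rate_bits / 8 / 1e6   # 125.0 Mbytes/s

observed = 100
print(theoretical_max)             # 125.0
print(observed / theoretical_max)  # 0.8, i.e. the 80% quoted above
```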
Paul Theodoropoulos's answer:
Sun's 'Rule of Thumb' from the UltraSPARC II era was that you should have
300Mhz of ultraSPARC II horsepower per gigabit adapter. That's 'dedicated'
horsepower - if you had one 300Mhz cpu and one gigabit adapter, you'd have
no horsepower to spare for your applications. In practice, of course, the
gigabit gets throttled down and the horsepower shared. But I would expect
approximately the same performance requirements with UltraSPARC III.
Alex Madden's answer:
JV's answer:

#2) Throughput may depend more on the underlying storage architecture's
ability to READ. You will get better results with hardware RAID 0/1 than
with software RAID like DiskSuite or VxVM.
#3) Copper or optical gigE? I use optical, but I just got V240s last month
so I am beginning to experiment with their ce interfaces.
#4) On optical ge, with 14-column Veritas stripes, on large-ish dbf files
(1.5-2GB), 6x336MHz CPUs, I can get 45 MB/sec with 35% sys. I haven't had a
chance to tune and test my 10-12 CPU UltraSPARC-II (optical) or 2-CPU
UltraSPARC-III V240 (copper ce) boxes.

Tim Chipman's answer:
You might want to use the "ttcp" utility to test TCP throughput. It is more
likely to represent the "best case scenario" throughput that is in keeping
with statements like "gig-ether can do 100Mbytes/sec" :-)
We did a bit of testing here a while back, and I'm appending the info below
as a general reference, for what use it may be.
The test boxes were:

athlon MP running either Solaris x86 OR linux
ultraSparcII running solaris8

Note: based on my experience, it seems unlikely you will ever get "real
world data xfer" much above 50-55Mbytes/sec over gig-ether. "ttcp"
benchmarks are one thing, but real-world protocols are another.
NOTE: testing done here using two dual-athlon systems, identified as follows:
wulftest = redhat 8 (dual-1800mhz, 1 gig ram, 64-bit PCI)
wulf2 = redhat 8 (dual-2000mhz, 1 gig ram, 64-bit PCI)
thore = solaris8x86 (dual-2000mhz, 1 gig ram, 64-bit PCI)
(note - Wulf2 & Thore are actually the same system with 2 different HDDs to
boot the alternate OS'es)
ultra5 = 270mhz Ultra5 (nb, 32-bit PCI bandwidth only)

Gig-ether NICs being tested are all 64-bit PCI / Cat5 cards:
Syskonnect SK-9821
3Com 3C996B-T (BroadCom chipset)

(note, we had 2 x SK nics and 1 x 3com on-hand, so didn't test 3com<->3com
performance.)

Software used for testing was (1) TTCP and (2) Netbackup (for info on
TTCP, visit the URL: <>  )

Parameters tuned include Jumbo Frames (MTU of 1500 vs 9000);
combinations of NIC<->NIC and system<->system

Connection between NICs was made with a crossover cable, appropriately wired
(all strands) such that Gig-ether was operational.

Note these ##'s are NOT "comprehensive", i.e., NOT every combination of
tuneable parameters has been attempted / documented here. Sorry about that.

Hopefully, "something is better than nothing".

[TTCP results]
	SysKonnect <-> SysKonnect = 77 MB/s
*	Wulftest with Syskonnect (Redhat 8)
*	Thore with Syskonnect (Solaris x86)
*	Jumbo frames don't affect speed, but offload the systems by around
		20-40% for CPU loading.
	SysKonnect <-> 3COM = 78 MB/s
*	Wulftest with Syskonnect (Redhat 8)
*	Wulf2 with 3com (Redhat 8)
*	MTU = 1500

	SysKonnect <-> 3COM = 97 MB/s
*	Wulftest with Syskonnect (Redhat 8)
*	Wulf2 with 3com (Redhat 8)
*	MTU = 9000

	ULTRA5 <-> Wulftest tests with TTCP:
	(SysKonnect <-> Syskonnect NICs)
	with JumboFrames:
*	25% CPU load on Ultra5, 29 MB/s

	without JumboFrames:
*	60% CPU load on Ultra5, 17 MB/s

	[Netbackup results]
	Large ASCII file (5 gigs) = 50 MB/s
*	Wulftest with SysKonnect (Redhat 8)
*	Thore with 3COM (Solaris x86)
*	MTU 1500

	System backup (OS files, binaries) = 11 MB/s
*	Wulftest with SysKonnect (Redhat 8)
*	Thore with 3COM (Solaris x86)
*	MTU 1500



Basic question is: What effective throughput can I expect on a Gigabit
Ethernet link with UltraSparc-3 CPU, with or without Jumbo Frame support,
with or without multithreaded transfer?

I ask this because with UltraSparc-2 CPU (E10K) and GE link (without Jumbo
Frame support) we couldn't get more than:
-	15 Mbytes/s with single-threaded transfer
-	55 Mbytes/s with multithreaded transfer (the best rate was reached
with 10 threads)

(We measured application throughput, that is to say TCP throughput).

As you can see, the CPU overhead with a 1500 MTU was so high (truss showed
80% kernel) that we had to multithread the transfer to reach the best
throughput (55 Mbytes/s).
Unfortunately we were far from the theoretical limit (100 Mbytes/s?), even
though there were still CPU resources free (50%), and I can't determine
whether this was caused by the small MTU, the poor US-2 throughput, or both.
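One way to see why a single connection tops out while parallel threads keep scaling: a TCP connection can never carry more than one window of data per round trip, so per-connection throughput is bounded by window/RTT, and each extra connection brings its own window. A back-of-the-envelope sketch (the RTT values are assumptions for illustration, not measurements from this setup):

```python
# Per-connection TCP throughput is bounded by window_size / RTT
# (the bandwidth-delay product argument). RTT figures below are
# illustrative assumptions only.

def max_tcp_throughput(window_bytes, rtt_seconds):
    """Upper bound on one TCP connection's throughput, in Mbytes/s."""
    return window_bytes / rtt_seconds / 1e6

window = 64 * 1024  # a 64KByte window, as used elsewhere in this thread

# With a 1 ms RTT one connection is capped near 65 Mbytes/s; at 4 ms
# it drops to about 16 Mbytes/s, in the range of the 15 Mbytes/s
# single-threaded figure. Ten parallel connections each get their own
# window, which is one reason multithreading helped.
for rtt in (0.001, 0.004):
    print(rtt, max_tcp_throughput(window, rtt))
```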

I think that Jumbo Frames could increase the throughput and lower the CPU
overhead, but by how much?
Will the US-3 throughput help much?
Is there any chance of reaching the 100 Mbytes/s limit?

Thanks for your feedback, I will summarize.
Sebastien DAUBIGNE
<>  - (+33)
SchlumbergerSema - SGS/DWH/Pessac
sunmanagers mailing list
Received on Fri Nov 28 08:30:16 2003

This archive was generated by hypermail 2.1.8 : Thu Mar 03 2016 - 06:43:24 EST