SUMMARY: (Ancient posting) Gig-Ether NIC CPU utilization on single CPU/SMP systems?

From: Tim Chipman <>
Date: Thu Mar 27 2003 - 12:23:52 EST
Hi all. This is insanely overdue, but since I have some answers from a 
bit of in-house testing, plus a few replies from way back when, I 
thought it was a disservice to the list NOT to submit any kind of summary.

Many thanks to (in no particular order):

Joe Fletcher
Jed Dobson
Steven Haywood

General consensus:

-> Probably an SMP box with "fast" CPUs is a "good thing" to feed gig 
ether properly, although it will depend (of course) on use requirements, 
expectations, etc. [ie, a "FAST CPU" appears to be a requirement to 
"saturate" the data stream to a Gig-ether NIC].  For "lower-demand use" 
(ie, feeding backups) a single-CPU system probably will be OK. Note that 
in such cases Jumbo-Frame support is especially important (on NICs and 
switches), since it appears to significantly reduce CPU loading when 
feeding the gig-ether nic.
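As a rough sketch of why jumbo frames cut CPU loading (payload sizes assumed: ~1460 bytes of TCP payload per standard frame vs ~8960 bytes per jumbo frame, headers subtracted), the number of frames the CPU must service drops by roughly 6x:

```shell
# Back-of-envelope frame counts for a 1 GiB transfer (assumed payload
# sizes: 1460 B at MTU 1500, 8960 B at MTU 9000 -- headers subtracted).
bytes=$((1024 * 1024 * 1024))
std=$((bytes / 1460))      # frames the NIC/CPU must handle at MTU 1500
jumbo=$((bytes / 8960))    # frames at MTU 9000
echo "MTU 1500: $std frames, MTU 9000: $jumbo frames"
echo "reduction factor: ~$((std / jumbo))x"
```

Fewer frames means fewer interrupts and less per-packet header processing, which is consistent with the 20-40% CPU-offload observation below.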

-> note that in general, netbackup performance/throughput over gig-ether 
(ie, when backing up "many small files") is WAY below the "theoretical 
throughput capacity" as measured with "TTCP" between the same systems. 
This throughput is, however, significantly higher than NBU performance 
between the same systems when using 100mbit ether instead of gig-ether.

->  See (way) below for the text of replies, and also a URL where a great 
white-paper can be retrieved ("Understanding Gigabit Ether performance 
on SunFire systems") which is pertinent to this topic.

-> Since I made my posting back last fall, we've bought a couple of 
Gig-ether NICs and done some testing here in house. The summary of my 
results follows immediately below.

Again, many thanks to everyone for their help.

--Tim Chipman

====Results from our tests with Gig-Ether NICs====

NOTE: testing done here using two dual-Athlon systems (plus an Ultra5), identified as follows:

wulftest = redhat 8 (dual-1800mhz, 1 gig ram, 64-bit PCI)

wulf2 = redhat 8 (dual-2000mhz, 1 gig ram, 64-bit PCI)

thore = solaris8x86 (dual-2000mhz, 1 gig ram, 64-bit PCI)
     (note - Wulf2 & Thore are actually the same system with
     2 different HDDs to boot the alternate OS'es)

ultra5 = 270mhz Ultra5 (nb, 32-bit PCI bandwidth only)

Gig-ether NICs being tested are all 64-bit PCI / Cat5 cards:

     Syskonnect SK-9821
     3Com 3C996B-T (BroadCom chipset)

(note, we had 2 x SK nics and 1 x 3com on-hand, so didn't test 
3com<->3com performance.)

Software being used for testing was (1) TTCP and (2) Netbackup
(for info on TTCP, visit the URL: )
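For reference, a TTCP run looks roughly like the following (host names from the systems above; flags as found in common ttcp builds -- check your local build's usage output, since variants differ):

```shell
# Sketch of a typical TTCP throughput test between two hosts.
# Start the receiver first, in sink mode:
#   (on wulf2)
ttcp -r -s
# Then start the transmitter, pointed at the receiver:
#   (on wulftest)
ttcp -t -s wulf2
# ttcp reports MB/s and CPU time when the transfer completes.
```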

Parameters tuned include Jumbo Frames (MTU of 1500 vs 9000); 
combinations of NIC<->NIC and system<->system

Connection between NICs was made with a crossover cable, appropriately 
wired (all strands) such that Gig-ether was operational.

Note these numbers are NOT "comprehensive", ie, NOT every combination of 
tunable parameters has been attempted / documented here. Sorry about 
that. Hopefully, "something is better than nothing".

[TTCP results]

SysKonnect <-> SysKonnect = 77 MB/s
     - Wulftest with Syskonnect (Redhat 8)
     - Thore with Syskonnect (Solaris x86)
     - Jumbo frames don't affect speed,
     but reduce CPU loading by around 20-40%.

SysKonnect <-> 3COM = 78 MB/s
     - Wulftest with Syskonnect (Redhat 8)
     - Wulf2 with 3com (Redhat 8)
     - MTU = 1500

SysKonnect <-> 3COM = 97 MB/s
     - Wulftest with Syskonnect (Redhat 8)
     - Wulf2 with 3com (Redhat 8)
     - MTU = 9000

ULTRA5 <-> Wulftest tests with TTCP:
(SysKonnect <-> Syskonnect NICs)

with JumboFrames:
     - 25% CPU load on Ultra5, 29 MB/s

without JumboFrames:
     - 60% CPU load on Ultra5, 17 MB/s
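For context, the TTCP rates above can be converted to wire utilisation (1 Gbit/s is 125 MB/s before protocol overhead); a quick sketch:

```shell
# Convert measured MB/s to Mbit/s and % of the raw 1 Gbit/s wire rate.
for mbs in 77 97 29 17; do
  echo "$mbs MB/s = $((mbs * 8)) Mbit/s (~$((mbs * 100 / 125))% of raw gig)"
done
```

So even the best result (97 MB/s with jumbo frames) sits under 80% of the raw wire rate, which is roughly what TCP/IP framing overhead plus host limits would predict.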

[Netbackup results]

Large ASCII file (5 gigs) = 50 MB/s
     - Wulftest with SysKonnect (Redhat 8)
     - Thore with 3COM (Solaris x86)
     - MTU 1500

System backup (OS files, binaries) = 11 MB/s
     - Wulftest with SysKonnect (Redhat 8)
     - Thore with 3COM (Solaris x86)
     - MTU 1500

[compare: typically, we get ~ 5-6 MB/s for NBU performance between these 
systems if using 100mbit ether to do a backup of "Misc OS files/Binaries"]

====Replies to my original posting/query====

GREAT reference doc to read:

I've implemented a couple of Sun GigabitEthernet/2 cards on some E4500s.
When the cards are running at full whack (around 30 meg/second) usage on 
one of the CPUs hits 100% - from servicing interrupts put out by the 
NIC. Virtual Adrian reports lots of mutex contention (I think it's an 
old version of the monitoring script). Most of the spinlocks are on a 
single CPU, caused by ge_read. From this, I'd also say go with faster 
CPUs rather than more CPUs, as Solaris doesn't share interrupt load 
across multiple CPUs.

Well.....I have run GigE cards in machines from single CPU Sun Blade 
1000s to F12K without any problems. Have I actually done throughput 
tests? Well.....not really. I am mostly using GigE for backup 
servers where tape throughput is the big issue; keeping the tapes 
spinning and the system bus free to allow a tape stream is the goal. That 
said, on smaller servers I have not seen excessive CPU usage from 
gigabit devices.

FWIW using Sun's GigaSwift cards in V880s we generally
see about 10-15% CPU overhead when driving the card
at any sort of level. That's on an 8-way box. I'd want at
least a 2-way server (of whatever type) to do what you
are trying to do.



====Original posting====

Hi folks,

A general request for feedback from anyone who has deployed Gig-Ether
NICs (either Sun or 3rd-party) on either single-CPU or SMP SunSparc systems.

-> In order to maintain a "solid data stream" (assume <scenario a>
15 megs per second, <scenario b> 30 megs per second) -- do you
typically observe significant CPU loading (of ?? amount??) that appears
to correlate purely to feeding data to the NIC?

-> From what I have read, it seems that Gig-Ether NICs fall (broadly!)
into two categories:

-those which have simpler hardware, and provide no "CPU offload" for the
TCP stack / "data pump" of information into the GigE pipe

-those which DO have ASIC / specific hardware on the NIC to facilitate
some CPU offloading, to alleviate CPU loading issues when feeding the
GigE pipe.
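On Linux, one way to see which category a given NIC/driver pair falls into is to list its offload features (interface name hypothetical; requires the ethtool utility):

```shell
# List the offloads this NIC/driver actually provides: rx/tx
# checksumming, scatter-gather, TCP segmentation offload (TSO), etc.
ethtool -k eth0
```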

Typically, from what I can tell, the "simple" GigE NICs are "cheaper"
while the ASIC-offload NICs are "more expensive" (this is NOT a concern
for me, clearly - my aim is to get an idea of performance / CPU loading).

I'm inquiring on this theme specifically because we're contemplating
scaling our netbackup server from a Dual-DLT35/70 drive-based robot to a
Dual-SuperDLT 160/320 drive-based robot. The current DLT35/70 drives
officially support ~10 megs/second data stream to tape (per drive), and
we can easily feed 5-8 megs/second, depending on the nature of the
data being streamed, using a standard 100mbit ether NIC.

Feeding two SuperDLT drives @ 16(up to 32?) megs per second (in
parallel!) clearly requires a significantly larger pipe than we get with
typical 100mbit ether, hence these Gig-Ether questions.
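The arithmetic behind that claim, as a quick sketch (drive rates as quoted above; usable 100mbit payload assumed to be roughly 12 MB/s):

```shell
# Back-of-envelope: aggregate stream needed to keep two SuperDLT drives
# busy, vs. what the network pipe can carry.
drives=2
per_drive=16                           # MB/s native per drive (up to ~32 compressed)
need=$((drives * per_drive))
echo "need ${need} MB/s = $((need * 8)) Mbit/s aggregate"
# 100 Mbit ether carries ~12 MB/s of payload at best -- well short of 32 MB/s --
# so gig-ether (or multiple trunked 100mbit links) is the only realistic option.
```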

Our current NBU server is a single-CPU box; hence these scaling
questions WRT the CPU-loading footprint from streaming lots of data
through GigEther NICs.

If anyone has specific comments in this vein for <any particular Gig
Ether NICs - from any vendor> -- it certainly is very welcome.

As always, I'll post a summary.


--Tim Chipman
sunmanagers mailing list
Received on Thu Mar 27 12:31:08 2003
