SUMMARY: gigabit ethernet performance - Additional information

From: Kevin Buterbaugh <Kevin.Buterbaugh_at_lifeway.com>
Date: Mon Jun 11 2001 - 12:03:10 EDT
Greetings All,

     As promised, here's my summary to our gigabit ethernet performance
issues.  First of all, a big thank you to all who took the time to try to
help me out.  So that this post does not approach the length of "War and
Peace," I'm not going to include each and every response I received, but
the following people all tried to help:  Blake Matheny, Sean Berry, Jason
Grove, Danny Johnson, Marcelino Mata, John Marrett, Steve Hastings, Robert
Johannes, Elizabeth Lee, Jim Kerr, Gary Franczyk, Derrick Daugherty, Ben
Strother, Siddhartha Jain, Richard Skelton, Fedor Gnuchev, Sergio Gelato,
Marco Greene, Jim Ennis, Nelson Caparrosso, Anand Chouthai, Ying Xu, Thomas
Carter, Al Hopper, Sam Horrocks, Adrian Saidac, Kevin Amorin, Gary Mansell,
Walter Weber, and Arturo Bernal.

     The bottom line:  make sure you're using the right tool for the job!
;-)  We did not actually have a performance problem; we were simply not
pushing the gigabit ethernet with the tools we were using.  Please note
that I tried rcp, ftp, and nfs for my copies.  None of them produced more
than 17 Mbytes / second, even though I made sure I was avoiding any disk
I/O and was only copying from memory on Host1 to memory on
Host2 (and vice versa).  Also note that despite the claim made by several
list members, in our case rcp consistently produced better results than
either ftp or nfs.

     The right tool for the job?  ttcp (with multiple streams).  At the
suggestion of Fedor Gnuchev (who was kind enough to even send me a compiled
binary of ttcp for Solaris 2.8) and others on the list, I tried doing some
testing with ttcp.  Initially, I dismissed it, as the version I was using
(downloaded from playground.sun.com) was not producing any better results
than rcp.  In addition, it either did not support multiple streams or I
simply couldn't figure out how to get it to use multiple streams (it didn't
seem to recognize the "-s" option).

     However, the Sun engineer I was working with on the case I had open
sent me a copy of ttcp set up to use 5 streams by default.  When I ran it I
saw drastically improved numbers.  I'm including the actual results of my
test at the end of this e-mail, but I saw ~34.2 Mbyte / second going from
the E250 to the UE6000 and ~40.7 Mbyte / second going from the UE6000 to
the E250!  From everything I've read and been told, those are some very
good numbers for gigabit ethernet (i.e., you're never going to get 70 Mbyte
/ second out of gigabit ethernet in the real world).
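     For the curious, those aggregate figures are just the sum of the five
per-stream results shown below; for example, 85.33 + 85.33 + 78.77 + 78.77
+ 78.77 comes to roughly 407 Mbit/sec, or about 40.7 Mbyte/sec if you allow
roughly 10 bits on the wire per byte of payload.  I don't have the source
to the wrappers Sun sent me, but judging from the banner lines in the
output below, an equivalent "ttreceive" would be little more than the
following sketch (the ttcp flags are my guess from that output, not the
actual script):

#!/sbin/sh
# Sketch only: run five ttcp receivers in parallel, one per port.
# A matching ttsend would do the same with "-t -D" and the remote
# host's address as its final argument.
for port in 5001 5003 5004 5005 5006
do
        ./ttcp -r -s -l 65535 -n 2048 -b 65535 -p $port &
done
wait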

     As far as parameter tuning is concerned, the only parameters Sun
recommended tuning were tcp_xmit_hiwat (Solaris 2.8 default:  16384) and
tcp_recv_hiwat (Solaris 2.8 default:  24576).  We increased both of them to
65536.  However, we also tuned tcp_max_buf, increasing it from its default
of 1048576 to its maximum value of 1073741824.
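     If you want to check or apply the same settings by hand, it's just the
standard ndd syntax (the values are the ones above):

Host1# ndd -get /dev/tcp tcp_xmit_hiwat
16384
Host1# ndd -set /dev/tcp tcp_xmit_hiwat 65536
Host1# ndd -set /dev/tcp tcp_recv_hiwat 65536
Host1# ndd -set /dev/tcp tcp_max_buf 1073741824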

     Please note that any parameters you change with ndd will get set back
to their defaults after a reboot.  One way to make them permanent, as
suggested by Richard Skelton, is to create a new startup script (say,
/etc/rc2.d/S99gb_ndd) and add the ndd commands there.
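     Such a script needs nothing more than the ndd commands themselves;
something along these lines (a sketch, using the values we settled on):

#!/sbin/sh
# /etc/rc2.d/S99gb_ndd - reapply TCP tuning lost at reboot
ndd -set /dev/tcp tcp_xmit_hiwat 65536
ndd -set /dev/tcp tcp_recv_hiwat 65536
ndd -set /dev/tcp tcp_max_buf 1073741824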

     As I mentioned in my original post, the reason we brought in gigabit
ethernet in the first place is that Host1 and Host2 are
connected via UltraSCSI to one LTO drive each in our new StorageTek tape
library.  We plan on backing up other Sun servers over the network via
these hosts.  All the other servers, with one exception (which will also be
on gigabit ethernet), will remain on switched fast ethernet for the time
being.

     We have done some limited testing of backing these servers up with
Veritas NetBackup.  What we've found, not surprisingly, is that the more
servers you attempt to back up simultaneously, the better the performance
you get.  We did a test of just a couple of servers and saw throughput of
around 12 Mbyte / second (about what you'd expect).  We then did a test
backing up 6 of the aforementioned servers and saw throughput of over 20
Mbyte / second.  As the best performance we've seen out of the LTO drives
backing up local data on Host1 / Host2 is 24 Mbytes / second, we're pretty
content at this point with the gigabit ethernet.

     Again, my thanks to all for their help...

Kevin Buterbaugh
LifeWay

"Faster reboots are Microsoft's idea of 'high availability.'" - overheard
at SUPerG

==================================
ttcp output on the E250:

Host2# ./ttreceive
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5001, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5003, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5004, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5005, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5006, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: accept from 172.16.46.26
ttcp-r: accept from 172.16.46.26
ttcp-r: accept from 172.16.46.26
ttcp-r: accept from 172.16.46.26
ttcp-r: accept from 172.16.46.26
ttcp-r: 134215680 bytes in 12.00 real seconds = 85.33 Mbit/sec +++
ttcp-r: 5720 I/O calls, msec/call = 2.15, calls/sec = 476.67
ttcp-r: 0.0user 2.5sys 0:12real 21%
ttcp-r: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-r: 6114 I/O calls, msec/call = 2.18, calls/sec = 470.31
ttcp-r: 0.0user 2.4sys 0:13real 19%
ttcp-r: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-r: 5753 I/O calls, msec/call = 2.31, calls/sec = 442.54
ttcp-r: 0.1user 2.3sys 0:13real 19%
ttcp-r: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-r: 5645 I/O calls, msec/call = 2.36, calls/sec = 434.23
ttcp-r: 0.0user 2.4sys 0:13real 19%
ttcp-r: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-r: 6416 I/O calls, msec/call = 2.07, calls/sec = 493.54
ttcp-r: 0.0user 2.5sys 0:13real 19%

Host2# ./ttsend 172.16.46.26
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5001, sockbufsize=65535  tcp  -> 172.16.46.26
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5003, sockbufsize=65535  tcp  -> 172.16.46.26
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5004, sockbufsize=65535  tcp  -> 172.16.46.26
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5005, sockbufsize=65535  tcp  -> 172.16.46.26
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5006, sockbufsize=65535  tcp  -> 172.16.46.26
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: 134215680 bytes in 15.00 real seconds = 68.27 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 7.50, calls/sec = 136.53
ttcp-t: 0.0user 2.0sys 0:15real 13%

real       14.4
user        0.0
sys         2.0
ttcp-t: 134215680 bytes in 14.00 real seconds = 73.14 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 7.00, calls/sec = 146.29
ttcp-t: 0.0user 1.7sys 0:14real 12%

real       14.6
user        0.0
sys         1.8
ttcp-t: 134215680 bytes in 15.00 real seconds = 68.27 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 7.50, calls/sec = 136.53
ttcp-t: 0.0user 1.9sys 0:15real 13%

real       15.1
user        0.0
sys         1.9
ttcp-t: 134215680 bytes in 16.00 real seconds = 64.00 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 8.00, calls/sec = 128.00
ttcp-t: 0.0user 1.9sys 0:16real 12%

real       15.1
user        0.0
sys         1.9
ttcp-t: 134215680 bytes in 15.00 real seconds = 68.27 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 7.50, calls/sec = 136.53
ttcp-t: 0.0user 2.1sys 0:15real 14%

real       15.5
user        0.0
sys         2.1

==================================
ttcp output on the UE6000:

Host1# ./ttsend 172.16.46.17
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5001, sockbufsize=65535  tcp  -> 172.16.46.17
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5003, sockbufsize=65535  tcp  -> 172.16.46.17
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5004, sockbufsize=65535  tcp  -> 172.16.46.17
ttcp-t: socket
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: sndbuf
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5006, sockbufsize=65535  tcp  -> 172.16.46.17
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: buflen=65535, nbuf=2048, align=16384/0, port=5005, sockbufsize=65535  tcp  -> 172.16.46.17
ttcp-t: socket
ttcp-t: sndbuf
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: nodelay
ttcp-t: connect
ttcp-t: 134215680 bytes in 12.00 real seconds = 85.33 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 6.00, calls/sec = 170.67
ttcp-t: 0.0user 3.4sys 0:12real 29%

real       12.2
user        0.0
sys         3.5
ttcp-t: 134215680 bytes in 12.00 real seconds = 85.33 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 6.00, calls/sec = 170.67
ttcp-t: 0.0user 4.0sys 0:12real 33%

real       12.4
user        0.0
sys         4.0
ttcp-t: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 6.50, calls/sec = 157.54
ttcp-t: 0.0user 3.9sys 0:13real 30%

real       12.6
user        0.0
sys         3.9
ttcp-t: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 6.50, calls/sec = 157.54
ttcp-t: 0.0user 3.7sys 0:13real 29%

real       13.2
user        0.0
sys         3.7
ttcp-t: 134215680 bytes in 13.00 real seconds = 78.77 Mbit/sec +++
ttcp-t: 2048 I/O calls, msec/call = 6.50, calls/sec = 157.54
ttcp-t: 0.0user 3.7sys 0:13real 29%

real       13.3
user        0.0
sys         3.8

Host1# ./ttreceive
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5001, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5003, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5004, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5005, sockbufsize=65535  tcp
ttcp-r: socket
ttcp-r: buflen=65535, nbuf=2048, align=16384/0, port=5006, sockbufsize=65535  tcp
ttcp-r: rcvbuf
ttcp-r: socket
ttcp-r: rcvbuf
ttcp-r: accept from 172.16.46.17
ttcp-r: accept from 172.16.46.17
ttcp-r: accept from 172.16.46.17
ttcp-r: accept from 172.16.46.17
ttcp-r: accept from 172.16.46.17
ttcp-r: 134215680 bytes in 15.00 real seconds = 68.27 Mbit/sec +++
ttcp-r: 17785 I/O calls, msec/call = 0.86, calls/sec = 1185.67
ttcp-r: 0.1user 4.8sys 0:15real 33%
ttcp-r: 134215680 bytes in 14.00 real seconds = 73.14 Mbit/sec +++
ttcp-r: 17174 I/O calls, msec/call = 0.83, calls/sec = 1226.71
ttcp-r: 0.1user 5.0sys 0:14real 37%
ttcp-r: 134215680 bytes in 15.00 real seconds = 68.27 Mbit/sec +++
ttcp-r: 17320 I/O calls, msec/call = 0.89, calls/sec = 1154.67
ttcp-r: 0.1user 4.5sys 0:15real 31%
ttcp-r: 134215680 bytes in 16.00 real seconds = 64.00 Mbit/sec +++
ttcp-r: 16143 I/O calls, msec/call = 1.01, calls/sec = 1008.94
ttcp-r: 0.1user 4.8sys 0:16real 31%
ttcp-r: 134215680 bytes in 15.00 real seconds = 68.27 Mbit/sec +++
ttcp-r: 16049 I/O calls, msec/call = 0.96, calls/sec = 1069.93
ttcp-r: 0.1user 4.5sys 0:15real 31%

---------------------- Forwarded by Kevin Buterbaugh/Nashville/BSSBNOTES on
06/11/2001 09:44 AM ---------------------------

Sent by:  sunmanagers-admin@sunmanagers.org


To:   sunmanagers@sunmanagers.org
cc:

Subject:  gigabit ethernet performance - Additional information



Greetings again all,

     First of all, thank you to all of you who have taken the time to reply
to my original post (which I've included below).  I've gotten about two
dozen responses so far, and I'll be sure to thank each of you by name when I
finally do get to post a summary.

     Unfortunately, this is not a summary but some additional information
on my problem, which so far has not been solved.  First off, several of you
suggested using something other than rcp for testing the throughput.  I
have tried both ftp and nfs and have achieved worse results with each of
them (12 Mbyte/sec with ftp and 15 Mbyte/sec with nfs versus almost 17
Mbyte/sec with rcp).

     Several of you suggested swapping out the cables in case one was bad.
We effectively did that when we directly connected the two boxes with a
different cable (note that this also eliminates the switch as a possible
culprit).  Again, 16.something Mbyte/sec.

     Another suggestion was to up the MTU on the interface.  Unfortunately,
on Solaris at least you cannot increase the MTU beyond the default of 1500
(or at least I got an error when I tried and was told by Sun that you
can't; if you know a way, I'd like to hear from you).
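     For anyone curious, the attempt is nothing more than the obvious
command (the 9000 here is just an example jumbo-frame value):

Host1# ifconfig ge0 mtu 9000

which the ge interface rejects with an error rather than accepting a
larger MTU.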

     I received several suggestions of kernel parameter and / or ndd
settings to modify.  I have tried all the suggestions I have received.
16.something Mbyte/sec.

     I spoke with my local Sun SSE.  He suggested that I might have a
bottleneck on Host1, the UE6000: while it has 14 CPU's, they are only 250
MHz parts, and it is an SBus system.  I did another test while
simultaneously running "mpstat 5" on both hosts.  Host2 (the E250 with 2 x
400MHz CPU's) is currently not being used for anything (it'll be the
Veritas NetBackup master server next week).  During the rcp test, CPU idle
time dropped from 99-100% on each CPU down to as low as about 20% on each
CPU (but never lower).  On Host1, I had a couple of CPU's that were pegged,
100% busy.  However, as that is our main production box, I don't know
whether it was my test or something else that pegged the CPU's.  Therefore,
this weekend when the system is as idle as it ever gets, I'll dial in and
repeat the test.
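     If you want to do the same kind of monitoring, it's just a matter of
capturing mpstat on both boxes for the duration of the copy and comparing
the idle columns afterwards; for example (file names are arbitrary):

Host1# mpstat 5 > /var/tmp/mpstat.host1 &
Host2# mpstat 5 > /var/tmp/mpstat.host2 &
(run the rcp test, then kill both mpstat's and compare the output)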

     Speaking of my test: the reason I'm transferring ~750MB of data
from /tmp on Host1 to /tmp on Host2 is because I can make that much data
fit in memory.  There's no disk I/O involved.  That does make a difference.
Copying from /var/tmp on Host1 to /var/tmp on Host2 drops the throughput to
12 Mbyte/sec.  ftp'ing enough data so that it won't all fit in memory
(about 4 GB worth) dropped the throughput to less than 4 Mbyte/sec.
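     If you want to reproduce the memory-only test without hunting for a
real 750 MB file, mkfile will create one of the right size (the file name
is arbitrary):

Host1# mkfile 750m /tmp/750mbfile
Host2# cd /tmp
Host2# date; rcp Host1ge:/tmp/750mbfile .; date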

     I've opened a case with Sun on this.  One of the things the Sun
engineer suggested, as did a list member, is to do a "netstat -k ge0" on
both systems.  Here are the results, first for Host1, then for Host2:

Host1# netstat -k ge0
ge0:
ipackets 3088833 ierrors 0 opackets 9192702 oerrors 0 collisions 0
ifspeed 1000000000   rbytes 1774338662 obytes 4250094359 multircv 221774
multixmt 0 brdcstrcv 262442
brdcstxmt 93 norcvbuf 0 noxmtbuf 0 inits 6 mac_mode 2 xmit_dma_mode 6
rcv_dma_mode 4 nocarrier 5 nocanput 0 allocbfail 0 pause_rcv_cnt 0
pause_on_cnt 0 pause_off_cnt 0 pause_time_cnt 0 txmac_urun 0
txmac_maxpkt_err 0 excessive_coll 0 late_coll 0 first_coll 0
defer_timer_exp 0 peak_attempt_cnt 0 jabber 0 no_tmds 0
txinits 0 drop 0 rxinits 0 no_free_rx_desc 0 rx_overflow 0
rx_hang 0 rxtag_error 0 rx_align_err 0 rx_crc_err 0 rx_length_err 0
rx_code_viol_err 0 pci_badack 0 pci_dtrto 0 pci_data_parity_err 0
pci_signal_target_abort 0 pci_rcvd_target_abort 0 pci_rcvd_master_abort 0
pci_signal_system_err 0 pci_det_parity_err 0 pci_bus_speed 0
pci_bus_width 0 tx_late_error 0 rx_late_error 0 slv_parity_error 0
tx_parity_error 0 rx_parity_error 0 slv_error_ack 0 tx_error_ack 0
rx_error_ack 0 ipackets64 3088833 opackets64 9192702 rbytes64 1774338662
obytes64 12840028951
align_errors 0 fcs_errors 0   sqe_errors 0 defer_xmts 0
ex_collisions 0 macxmt_errors 0 carrier_errors 0 toolong_errors 0
macrcv_errors 0 ge_csumerr 0 ge_queue_cnt 0 ge_queue_full_cnt 0

Host2# netstat -k ge0
ge0:
ipackets 9651942 ierrors 0 opackets 2586787 oerrors 0 collisions 0
ifspeed 1000000000   rbytes 4233517004 obytes 1712088274 multircv 235805
multixmt 0 brdcstrcv 278645
brdcstxmt 43 norcvbuf 0 noxmtbuf 0 inits 6 mac_mode 2 xmit_dma_mode 6
rcv_dma_mode 4 nocarrier 5 nocanput 0 allocbfail 0 pause_rcv_cnt 0
pause_on_cnt 0 pause_off_cnt 0 pause_time_cnt 0 txmac_urun 0
txmac_maxpkt_err 0 excessive_coll 0 late_coll 0 first_coll 0
defer_timer_exp 0 peak_attempt_cnt 0 jabber 0 no_tmds 0
txinits 0 drop 238173 rxinits 0 no_free_rx_desc 0 rx_overflow 0
rx_hang 0 rxtag_error 0 rx_align_err 0 rx_crc_err 0 rx_length_err 0
rx_code_viol_err 0 pci_badack 0 pci_dtrto 0 pci_data_parity_err 0
pci_signal_target_abort 0 pci_rcvd_target_abort 0 pci_rcvd_master_abort 0
pci_signal_system_err 0 pci_det_parity_err 0 pci_bus_speed 33
pci_bus_width 0 tx_late_error 0 rx_late_error 0 slv_parity_error 0
tx_parity_error 0 rx_parity_error 0 slv_error_ack 0 tx_error_ack 0
rx_error_ack 0 ipackets64 9651942 opackets64 2586787 rbytes64 12823451596
obytes64 1712088274
align_errors 0 fcs_errors 0   sqe_errors 0 defer_xmts 0
ex_collisions 0 macxmt_errors 0 carrier_errors 0 toolong_errors 0
macrcv_errors 0 ge_csumerr 0 ge_queue_cnt 0 ge_queue_full_cnt 0

     Note that the number of drops on Host1 is 0 while it's over 200,000 on
Host2 (I've done the rcp / ftp / nfs copies in both directions)!  Sun is
checking on that, but in the meantime, does anyone on the list know if
that's significant?  Could I possibly have a bad PCI gigabit ethernet card
in Host2?  Or is there some other explanation for the drops?
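     In the meantime, one quick sanity check is to see whether the drop
counter actually grows during a transfer, rather than being left over from
something else; something like:

Host2# netstat -k ge0 | grep drop
(run a copy over ge0, then repeat and compare the drop count)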

     Another responder asked for the "ifconfig -a" output on both hosts, so
here it is, again first for Host1, then for Host2 (obviously, we're
NAT'ing):

Host1# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 172.16.44.26 netmask ffffff00 broadcast 172.16.44.255
        ether 8:0:20:7d:6c:3e
ge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet 172.16.46.26 netmask ffffff00 broadcast 172.16.46.255
        ether 8:0:20:7d:6c:3e

Host2# ifconfig -a
lo0: flags=1000849<UP,LOOPBACK,RUNNING,MULTICAST,IPv4> mtu 8232 index 1
        inet 127.0.0.1 netmask ff000000
hme0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 2
        inet 172.16.47.17 netmask ffffff00 broadcast 172.16.47.255
        ether 8:0:20:b1:ad:59
ge0: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 1500 index 5
        inet 172.16.46.17 netmask ffffff00 broadcast 172.16.46.255
        ether 8:0:20:b1:ad:59

     Again, I apologize to the list for the length of my post(s).  Just
trying to provide all the pertinent info.  Thanks again to everyone who's
tried to help so far.  I really do appreciate it.  For those of you who've
requested that I make sure to post a summary, I promise that I will.  It
may be a while, but rest assured, when this is solved, a summary will
follow...

Kevin Buterbaugh
LifeWay

"Fast, cheap, safe:  pick any two." - Brian Wong on RAID levels

---------------------- Forwarded by Kevin Buterbaugh/Nashville/BSSBNOTES on
06/01/2001 02:25 PM ---------------------------

Sent by:  sunmanagers-admin@sunmanagers.org


To:   sunmanagers@sunmanagers.org
cc:

Subject:  gigabit ethernet performance



Greetings All,

     We recently purchased Sun gigabit ethernet adapters for 2 of our Sun
servers.  "Host1" is a UE6000 (14 x 250 MHz CPU's, 9.5 GB RAM) and "Host2"
is an E250 (2 x 400 MHz CPU's, 2 GB RAM), both running Solaris 2.8 with the
recommended patch clusters installed.

     We added the cards, installed the Sun gigabit ethernet 3.0 software
and patch, connected the cables to our 3Com SuperStack gigabit ethernet
switch, plumbed the interfaces, and ifconfig'ed the interfaces.  They are
on a separate subnet from the hme interfaces.  I assigned the ge0
interfaces the hostnames "Host1ge" and "Host2ge."  Note that no one else is
using the gigabit ethernet interfaces besides us.  Also note that after
configuration I had messages in my /var/adm/messages file that the
auto-negotiated 1000Mbps full duplex link was up on both hosts.
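     For reference, the configuration itself was nothing exotic; done by
hand it's roughly the following (the /24 netmask is what we use on that
subnet):

Host1# ifconfig ge0 plumb
Host1# ifconfig ge0 Host1ge netmask 255.255.255.0 up

plus entries for Host1ge / Host2ge in /etc/hosts and in /etc/hostname.ge0
so the interfaces come back up at boot.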

     We then tested the transfer rate between the 2 hosts by copying a ~750
MB file to /tmp on Host1, cd'ing to /tmp on Host2 (of course, the whole
purpose of doing the copy from /tmp on Host1 to /tmp on Host2 is so that
it's memory based, i.e. the test is not slowed down by transferring data to
/ from disk), then executing the following command on Host2:  date; rcp
Host1ge:/tmp/750mbfile .; date.  I realize this is not the most accurate
test possible, but we just want to get a rough idea of the transfer
rate we're getting.
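     If you prefer a single command to the date / rcp / date sandwich,
timex reports the elapsed time directly; same test, just wrapped
differently:

Host2# timex rcp Host1ge:/tmp/750mbfile .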

     We were very disappointed when the copy took about 49 seconds, which
works out to slightly less than 17 Mbyte / second.  I have spent a good part
of the past 2 days trying to figure out why we're not seeing a higher
transfer rate.  I have searched SunSolve, the Sun Managers archive,
docs.sun.com, etc.  Another Sun Managers list member sent me some InfoDoc's
from Sun, as well.

     To make a long story somewhat shorter, we have tried every combination
of enabling / disabling autonegotiation on both the Sun hosts and the
switch.  We've tried manually forcing full duplex on and half duplex off
(with ndd).  I've increased tcp_xmit_hiwat and tcp_recv_hiwat to 100000 on
both hosts.  I've increased tcp_max_buf to 1073741824 (the max value
possible according to the Solaris 8 Tunable Parameters reference manual) on
both hosts.  After each and every change I repeated the aforementioned test,
and in each and every case I'm still seeing a 16.something Mbyte/sec
transfer rate.
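     For those wondering exactly what "manually setting" the duplex means,
the ge driver exposes its link settings through ndd as the adv_*_cap
parameters; forcing 1000 Mbit full duplex with autonegotiation off looks
roughly like this (check the parameter names against your ge driver
revision):

Host1# ndd -set /dev/ge instance 0
Host1# ndd -set /dev/ge adv_1000autoneg_cap 0
Host1# ndd -set /dev/ge adv_1000fdx_cap 1
Host1# ndd -set /dev/ge adv_1000hdx_cap 0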

     We also took another fiber cable and directly connected the 2 Sun
boxes together, bypassing the gigabit ethernet switch.  16.something
Mbyte/sec yet again.

     As an aside, I performed the same test using the hme interfaces on the
2 hosts (they're connected to a 100Mbit/sec switch) and got a transfer rate
of ~7Mbyte/sec, just about what I'd expect.

     My questions are these:  1)  Why am I not seeing a higher transfer
rate?  2)  What is a realistic transfer rate for gigabit ethernet (25
Mbyte/sec, 50 Mbyte/sec, ???)?  3)  Am I missing something here; some
parameter I need to tune?

     The reason we purchased the gigabit ethernet interfaces in the
first place is that we have also purchased a StorageTek tape library
with LTO drives connected to Host1 and Host2 via UltraSCSI.  As UltraSCSI
supports a transfer rate of ~40Mbyte/sec, I'd like to be able to transfer
data at at least that rate over the gigabit ethernet.  Is that an
unrealistic goal???  Thanks in advance, and of course I'll summarize...

Kevin Buterbaugh
LifeWay

"Anyone can build a fast CPU.  The trick is to build a fast system." -
Seymour Cray


_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers


Received on Mon Jun 11 17:03:10 2001
