First, many thanks to all of the sun-managers who have patiently
offered things to try, which made moving up the BayNetworks and SUN
Support feeding chains just that much quicker.
Here's the Summary of my summary. I've been having trouble getting
a SUN FastEthernet interface to talk to a BayNetworks Ethernet Workgroup
Switch. The problems went beyond the performance problem that has
been posted about several times recently. I'd not only see poor
performance, but see serious NFS outages, where the server could
ping the client, and the client could ping the server, but NFS was
unusable. And sometimes the two machines couldn't even ping each other
at all. The problem only occurred under moderate to heavy loads.
OK. I got several useful tips, and some more settings to diddle
with ndd. It also appears that the NFS FAQ on sunsolve isn't truncated
in the middle anymore, and it mentioned reducing the NFS read and write
sizes to 1024 bytes instead of 8192. I tried this, and was unable
to reproduce the problem. However, this is an untenable solution for
me -- this *killed* my NFS performance even more, and would require a
change on *every* client as well as to all the automounter maps. It's a
hack at best.
It did point out that the problem only occurred with packets larger than
the MTU (1500 bytes). I tried bing with 2048 byte packets (I'd been unable
to reproduce the problem with bing with its default 108 byte packets) and
saw a 10 or 15 second burst of nice, fast traffic to/from the server,
and then the server stopped receiving traffic altogether (just like w/NFS).
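
The MTU observation can be checked with a little arithmetic. The sketch below is only an illustration, not something from the original tests: it assumes NFS over UDP, plain 20-byte IPv4 headers with no options, and that bing's packet size counts ICMP data only. It also ignores the RPC/NFS headers, which add a bit more to each request. The function name fragments_needed is made up for this example.

```python
import math

def fragments_needed(ip_payload_bytes, mtu=1500, ip_header_bytes=20):
    """Return how many IPv4 fragments are needed to carry ip_payload_bytes."""
    # Every fragment except the last must carry a multiple of 8 payload bytes.
    per_fragment = (mtu - ip_header_bytes) // 8 * 8
    return max(1, math.ceil(ip_payload_bytes / per_fragment))

UDP_HEADER = 8   # assumption: NFS is running over UDP
ICMP_HEADER = 8  # assumption: bing's size excludes the ICMP header

cases = [
    ("NFS 8192-byte read/write", 8192 + UDP_HEADER),
    ("NFS 1024-byte read/write", 1024 + UDP_HEADER),
    ("bing 2048-byte packet", 2048 + ICMP_HEADER),
    ("bing 108-byte packet", 108 + ICMP_HEADER),
]

for label, size in cases:
    print(f"{label}: {fragments_needed(size)} fragment(s)")
```

Under those assumptions, the two cases that triggered the problem (8192-byte NFS transfers and 2048-byte bing packets) both need more than one fragment, while the two that didn't (1024-byte NFS transfers and 108-byte bing packets) fit in a single frame.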
Several people suggested using snoop on the server -- this just showed
that the server was suddenly receiving less traffic when the problem
occurred. Netstat shows the same thing. The number of collisions
is fairly low throughout.
SUN suggested applying the latest kernel and NFS jumbo patches,
which I will do, but it turns out this is not the problem. A friend of
mine brought over a 3Com switch (a LanPlex I believe) which we tried
out. With the default /dev/hme and /dev/tcp settings, the 3Com switch
worked just dandy! Throughput was 3 to 5 times better than I'd ever
seen from the Bay Workgroup Switch, and I couldn't reproduce the problem
(even using 8k packets, bing, and flooding all 6 other collision domains
with NFS requests).
So... It looks as if Bay Networks has some additional problems with
their 100BaseT media adapter on their Ethernet Workgroup Switch. Needless
to say, I'll be giving them a call and asking when I should expect the
next firmware revision and/or hardware patch...