[SUMMARY] millions of files in ZFS

From: Martin Preßlaber <presslaber_at_ips.at>
Date: Mon Jul 11 2011 - 09:33:01 EDT
Hi!

First of all, thank you for your mails!

In short: this is no problem for ZFS.

--- SUMMARY ---
I got a lot of answers from admins running servers with more than 90TB
and 200-300 million files in one pool (mail servers). Having a lot of files
in one directory gives poor performance with commands like "ls", and users
could have a hard time browsing such "big folders". But in general this
number of files is no problem: ZFS is a 128-bit filesystem and can hold
billions of files.
Nearly every mail recommended using only one pool; splitting it up always
turns out to have been the wrong decision unless you need different
reliability/performance setups (e.g. mirror vs. RAID 6). It's recommended to
have at least 48GB RAM for 100 million files and, if possible, a separate ZFS
log and/or cache device. A separate log device is highly recommended when
sharing the data via NFS, because NFS clients issue synchronous writes that
go through the ZFS intent log. Using several smaller LUNs should perform much
better than one big LUN (ZFS stripes across them, and each LUN gets its own
I/O queue).
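To make the "one pool, several LUNs, separate log/cache device" advice concrete, here is a minimal Python sketch that only assembles the `zpool create` command line; it does not run anything. The pool name `tank` and the device names (`c4t0d0` to `c4t9d0`, `c5t0d0`, `c5t1d0`) are hypothetical placeholders. Listing several devices at the top level builds a plain stripe, so redundancy is left to the array; the `log` and `cache` keywords add a separate intent log and an L2ARC cache device.

```python
"""Sketch: build a `zpool create` command for one pool striped over several
LUNs plus a separate log and cache device. Device names are hypothetical."""
import shlex


def zpool_create_argv(pool, data_luns, log_devs=(), cache_devs=()):
    """Return the argv list for `zpool create`.

    Every device listed at the top level becomes its own vdev, and ZFS
    stripes data across all of them (plain stripe, no ZFS redundancy).
    """
    if not data_luns:
        raise ValueError("need at least one data LUN")
    argv = ["zpool", "create", pool, *data_luns]
    if log_devs:
        argv += ["log", *log_devs]      # separate intent log (sync writes, e.g. NFS)
    if cache_devs:
        argv += ["cache", *cache_devs]  # L2ARC read cache
    return argv


if __name__ == "__main__":
    # Hypothetical: ten 2TB LUNs from the array, one log and one cache device.
    luns = [f"c4t{i}d0" for i in range(10)]
    argv = zpool_create_argv("tank", luns, log_devs=["c5t0d0"], cache_devs=["c5t1d0"])
    print(shlex.join(argv))
```

Printing the result gives a single shell line you can review before running it by hand.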
BUT: do not let a pool go over 80-85% utilization. This is by design: ZFS is
copy-on-write, so every write needs free blocks, and allocation slows down as
the pool fills up.
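Applied to the numbers from the original question (20TB of data, 80 million files), the rules of thumb above work out roughly as follows. This Python sketch assumes that the 48GB-per-100-million-files figure scales linearly, which is an assumption here and not something the replies stated, and it ignores any space lost to redundancy.

```python
"""Sketch: apply the summary's rules of thumb to 20TB / 80 million files.
Assumption: RAM need scales linearly from '48GB per 100 million files'."""

RAM_GB_PER_100M_FILES = 48  # rule of thumb quoted in the summary


def ram_gb_for_files(n_files):
    """Linear scaling of the 48GB-per-100M-files rule (an assumption)."""
    return RAM_GB_PER_100M_FILES * n_files / 100_000_000


def min_pool_tb(data_tb, max_utilization):
    """Smallest pool that keeps data_tb at or below max_utilization."""
    if not 0 < max_utilization <= 1:
        raise ValueError("max_utilization must be in (0, 1]")
    return data_tb / max_utilization


if __name__ == "__main__":
    print(f"RAM for 80M files: ~{ram_gb_for_files(80_000_000):.1f} GB")
    for util in (0.80, 0.85):
        size = min_pool_tb(20, util)
        print(f"20TB at <= {util:.0%}: pool >= {size:.2f} TB, free >= {size - 20:.2f} TB")
```

At 85% the 4TB of extra space asked about in the question is roughly enough (about 3.5TB needed); staying under 80% needs about 5TB free.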

Reported issue:
If your NFS clients traverse the directories too often, that can thrash the
DNLC (directory name lookup cache), which maps path names to vnodes. This can
happen with UFS as well as with ZFS, since the DNLC is a common OS facility
(check the hit rate with vmstat -s | grep 'name lookups').
http://download.oracle.com/docs/cd/E19620-01/805-4448/6j47cnj0u/index.html
This happened to us because people had 500k messages in their Maildirs, and
the IMAP software scanned all messages every time someone checked their
inbox. Whenever that happened, the DNLC hit ratio dropped from 97% to 10% for
a few seconds, which put too much pressure on the disk subsystem.
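The percentage printed by vmstat -s is cumulative since boot, so a drop like the one above (97% to 10% for a few seconds) hardly shows up in it. A minimal Python sketch of how one might watch it instead: parse the vmstat line, and compute the hit ratio over an interval from two samples of cumulative hit/miss counters. The exact vmstat line format and the counter source (for example the dnlcstats kstat) are assumptions to verify on your release; the counter values below are made up.

```python
"""Sketch: watch the DNLC hit ratio.

Assumptions: the `vmstat -s` line looks like
    '  123456789 total name lookups (cache hits 97%)'
and cumulative hit/miss counters can be sampled twice (e.g. from the
dnlcstats kstat; check the statistic names on your release).
"""
import re

_LOOKUP_RE = re.compile(r"(\d+)\s+total name lookups \(cache hits (\d+)%\)")


def parse_vmstat_lookups(text):
    """Return (total_lookups, hit_percent) from `vmstat -s` output, or None."""
    m = _LOOKUP_RE.search(text)
    if m is None:
        return None
    return int(m.group(1)), int(m.group(2))


def interval_hit_ratio(before, after):
    """Hit ratio between two cumulative samples {'hits': .., 'misses': ..}.

    Differencing two samples exposes short drops that the since-boot
    percentage averages away. Returns None if there were no lookups.
    """
    hits = after["hits"] - before["hits"]
    misses = after["misses"] - before["misses"]
    total = hits + misses
    return hits / total if total else None


if __name__ == "__main__":
    print(parse_vmstat_lookups("  123456789 total name lookups (cache hits 97%)"))
    # Hypothetical counters around an IMAP client rescanning a big Maildir.
    t0 = {"hits": 9_700_000, "misses": 300_000}
    t1 = {"hits": 9_710_000, "misses": 390_000}
    since_boot = t1["hits"] / (t1["hits"] + t1["misses"])
    print(f"since boot: {since_boot:.0%}, interval: {interval_hit_ratio(t0, t1):.0%}")
```

With these made-up numbers the since-boot figure still reads 96% while the interval ratio is 10%.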

Tuning guide:
http://www.solarisinternals.com/wiki/index.php/ZFS_Evil_Tuning_Guide

Tuning tip:

Use NFSv4 with ZFS and raise the buffer limits:

/etc/system:

    set zfs:zfs_arc_max = 0x200000000
    set ncsize=0x100000
    set nfs:nfs4_bsize=0x100000
    set ndd:tcp_recv_hiwat=1024000
    set ndd:tcp_xmit_hiwat=1024000
    set ndd:tcp_max_buf=4194304
    set ndd:tcp_cwnd_max=2097152
    set ndd:tcp_conn_req_max_q=1024
    set ndd:tcp_conn_req_max_q0=4096


Note: Not all OS versions can set these values via /etc/system; on
OpenSolaris we have to set the tcp* values with ndd in one of the netinit
scripts in /lib/svc/method.
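Following the note, a Python sketch that keeps the non-TCP values as /etc/system lines and turns the tcp* values into `ndd -set /dev/tcp` commands for a start script; ndd changes are lost at reboot, which is why they belong in such a script. Which script under /lib/svc/method to use is not shown, and listing tcp_max_buf first is an assumption (that it must be raised before the buffer sizes it caps). The sketch only prints text and changes nothing on the system.

```python
"""Sketch: split the tuning list into /etc/system lines and ndd commands."""

ETC_SYSTEM = {
    "zfs:zfs_arc_max": "0x200000000",
    "ncsize": "0x100000",
    "nfs:nfs4_bsize": "0x100000",
}

# tcp_max_buf listed first on the assumption that it must be raised before
# the buffer sizes it caps; dicts keep insertion order (Python 3.7+).
TCP_NDD = {
    "tcp_max_buf": 4194304,
    "tcp_recv_hiwat": 1024000,
    "tcp_xmit_hiwat": 1024000,
    "tcp_cwnd_max": 2097152,
    "tcp_conn_req_max_q": 1024,
    "tcp_conn_req_max_q0": 4096,
}


def etc_system_lines(tunables):
    """Render name/value pairs as /etc/system 'set' lines."""
    return [f"set {name}={value}" for name, value in tunables.items()]


def ndd_commands(tcp_params):
    """Render TCP parameters as ndd commands for a start script."""
    return [f"ndd -set /dev/tcp {name} {value}" for name, value in tcp_params.items()]


if __name__ == "__main__":
    print("# /etc/system")
    print("\n".join(etc_system_lines(ETC_SYSTEM)))
    print("# start script")
    print("\n".join(ndd_commands(TCP_NDD)))
```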


More information on ZFS and scalability can be found on the zfs-discuss
list:
http://mail.opensolaris.org/mailman/listinfo
http://opensolaris.org/jive/forum.jspa?forumID=80


UFS should remain supported in the next Solaris releases, at least for the
next 5 to 10 years, but everyone said it's time to migrate and that you will
love ZFS. Definitely go with ZFS; nothing can beat that filesystem, and once
you have worked with it you will never want anything else!



Once again, thanks for all the answers!

greetings from Austria,

-    martin presslaber


[Q] millions of files in ZFS

Hello everyone,

I am looking for recommendations or suggestions regarding ZFS and what is, in
my opinion, a huge number of files.
What we have now is a setup with Solaris 9 and UFS: around 40 file systems of
500GB each, mounted on a single server, each holding nearly 2 million files
(100KB to 10MB) in one directory (the layout is dictated by the application,
a medical software package). That's about 20TB with 80 million files (still
growing), so we plan to migrate to Solaris 10 and ZFS. 20TB in one pool is
often seen with ZFS, so that won't be a problem, BUT:

* Any experience with 80 million files and ZFS? 80M in a pool and 2M per
ZFS file system?
* Should we use one pool, or split the data into 2 or more pools?
* Will ZFS still perform with 2 million files in one directory? (UFS works
well enough.)
* With UFS we have 500GB LUNs; we plan to use 1-2TB LUNs for ZFS. Good idea?
* Is any tuning needed, or will it work with the standard settings?
* What about the "80% used space" performance issue? With 20TB of data in 40
ZFS file systems filled to 90-95%, will we need an additional 4TB of free
space inside the pool to guarantee good performance?

Some side notes:

* Don't worry about the hardware; the new server is an M5000 with 128GB RAM
and a lot of I/O, getting its LUNs from an HDS USPv with a bunch of 15k FC
disks.
* The 500GB FS size is a legacy limit; in the past we had too few inodes with
more than 500GB (this should be solved with a modern UFS+).
* With more than 2 million files in one directory, we had several issues
sharing the directories via NFS (v3).

It takes ages to copy the data from UFS to ZFS, so it would be horrible to
find that ZFS isn't working and we have to copy everything back to UFS.
What do you think: is it a good idea to migrate to ZFS, or should we stay on
UFS? (How long will UFS be supported in Solaris 11, 12, 13, 14...?)

Thanks in advance! If you have any more questions, don't hesitate to contact
me.
 - Martin Presslaber
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers
Received on Mon Jul 11 09:33:23 2011
