This is an update of:

    SUMMARY: Simple anti-spam system using open-source software and freely-available data
    http://www.sunmanagers.org/pipermail/sunmanagers/2003-August/024169.html

which you might want to browse through before reading this -- though it's not really necessary, as this is a complete rewrite.

This is the approach that I use. Let me emphasize "approach": I don't do all these things on all mail servers, and I don't do them in the exact same way, because every server/domain gets a different mix of incoming spam. It's always important to try to figure out what that mix looks like and tailor the blocking to match it. But most of this will work most of the time for most people -- and in a lot of cases it's turned out to be "good enough" that more work isn't necessary. In others, it's been "good enough" that the additional work required is made quite a bit easier by it.

So here goes.

I run sendmail and have had excellent results using a layered approach to blocking spam. The general idea is to use those measures which are computationally cheapest first, in order to reduce the burden on subsequent layers. The approach I'm taking (outlined below) would also work for other MTAs (e.g. postfix, exim) on other 'nix systems.

I don't do any kind of content analysis. I'm in agreement with Paul Vixie on this one: either people share our values or they don't. If they do, then they don't allow spam to flow out of their networks (at any rate beyond a trickle, which is probably inevitable). If they don't, then they're either actively supporting spammers or incidentally supporting them through neglect and incompetence -- and the reason doesn't really matter to me, my users, my systems or my networks.

More succinctly: systems and networks which emit spam are broken and should either be repaired immediately or physically disconnected from the Internet until they are.

More bluntly: I'm not going to waste my resources trying to sort out clean water from sewage.
That responsibility rests with the people whose servers and networks are spewing effluent through the pipes designated for water.

1. I use this:

    The Spamhaus Project: DROP (Don't Route Or Peer) List
    http://www.spamhaus.org/DROP/

at the firewall and router level, or in the sendmail 'access' file when that's not possible. These are networks which are 100% controlled by spammers, so no good can come of accepting their traffic.

I've augmented this locally with a few particularly problematic networks; for example, after reading these:

    Call for Internet Death Penalty: Burstnet/Hostnoc
    http://groups.google.com/groups?selm=20030708121252.GA14167%40example.com

    Call for Internet Death Penalty #2: Optigate/Optinrealbig
    http://groups.google.com/groups?selm=20040604204406.GA2771@example.com

    Call for Internet Death Penalty #3: Hopone/Superb
    http://groups.google.com/groups?selm=20040604204549.GA637@example.com

their network allocations are now a fixture in my deny lists. It's up to you, of course, but I see no reason to ever accept another packet from them.

2. I have configured sendmail to reject all mail from domains which don't resolve. This also blocks mail from broken mail servers, but since there's no way to tell them to fix their DNS... Sendmail comes set up this way by default on most systems.

3. I have set up sendmail to issue a multi-line SMTP greeting banner. This causes a surprising amount of the malware installed on hijacked Windows systems to fail, as it's not set up to deal with that. No doubt future malware will cope with this, but for the last year it's been very useful. Simple, easy, fast, and satisfying. ;-)

4. I then use a very large list of domains, via the sendmail 'access' file. This is handy because the access file is hashed, thus lookups are roughly O(1) no matter how large it becomes. But it's also error-prone: in fact, during the past two years, every time I've had a false positive reported to me, this is where I've traced it to on all but two occasions.
But, considering that I'm using a list of about 128,000 domains and have had fewer than a dozen false positives in two years, it seems like a reasonable approach. Doubly so because this step alone blocks from 30% to 40% of incoming spam with very little overhead. Even more so because it reduces the number of DNSBL queries (see step 8), which reduces not only my outbound traffic but also the load I impose on the DNSBLs that I'm using.

Many domain lists are also available; here are a few of them:

    http://www.rhyolite.com/anti-spam/unwelcome.html
    http://www.river.com/ops/spam/bad-domains.txt
    http://www.spamblocked.com/killfile
    http://www.znet.com/blocked-domains.html
    http://www.cluelessmailers.org/listings/blacklistbydomain.html
    http://obob.manilasites.com/
    http://www.carl.net/spam/access.txt
    http://www.unixgirl.com/blockeddomains.html
    http://www.cart00ney.org/blocklist.txt
    http://abuse.easynet.nl/spamlist-usage.html

Note: if you use a large list of domains in the sendmail 'access' file, you will want to RTFM on "makemap" and note the "-c" flag. The speedup in rebuilding the hash is quite significant.

5. I block all mail from certain TLDs on some mail servers because the people using those servers don't expect to ever receive mail from those places. I don't like doing this, because it's such a drastic measure, but it's too effective a technique not to use. In particular, I routinely block:

    .cn (China)
    .kr (Korea)
    .tw (Taiwan)

I'm about >this close< to adding .biz to that list.

Of course, if you actually expect to get non-spam mail from those TLDs, you probably can't do this. This is why I don't block .br, for example: I have users who actually get non-spam mail from there. But if you don't, you might want to consider blocking it.

6. I use a few special-purpose rules in the sendmail access file to take care of spam from hijacked CacheFlow servers, hijacked AOL proxy servers, often-forged addresses, and so on. Let me know if you want them: they're pretty simple/short/easy.

7.
I use ~150 subdomains (also in the sendmail access file) which correspond to dynamically-allocated IP space, e.g. "dhcp.example.com". I don't like doing this either, but it's also too effective not to use: spam from hijacked PCs on cable/DSL connections is epidemic. I have been slowly expanding this because it seems to be filling in gaps that the other measures are missing.

Note: in most cases, the users on such networks are contractually obligated to use their ISP's designated outbound mail server(s). So the only SMTP traffic that this measure blocks is (a) spam from zombies, (b) spam from the spammers' own systems, and (c) mail from people who are deliberately violating their own ISP's TOS. It's correct to say that (c) isn't necessarily spam, but I'm not going to lose any sleep over blocking it anyway.

8. I use multiple DNSBLs, each of which targets a slightly different mix of spam. For starters, I use

    cn-kr.blackholes.us
    tw.blackholes.us

for the same reason I block .cn, .kr and .tw -- see step 5 above. Again, this may not be a reasonable step for everyone, but check www.blackholes.us for other available DNSBLs that might be. They have quite a wide selection, both by country and by ISP/host. But locally, use of those two DNSBLs alone nails about 30% of incoming spam.

I then use these DNSBLs (each listed with DNSBL name and web site):

    sbl-xbl.spamhaus.org        http://www.spamhaus.org/sbl/
                                http://www.spamhaus.org/xbl/
    dnsbl.ahbl.org              http://www.ahbl.org/
    list.dsbl.org               http://dsbl.org/
    dnsbl.njabl.org             http://njabl.org/
    relays.ordb.org             http://ordb.org/
    l1.spews.dnsbl.sorbs.net    http://www.spews.org/

The Spamhaus SBL+XBL combined DNSBL is a must-have. I have never had a false positive with it. And the relatively recent addition of the XBL picks up millions of zombie Windows machines that are spewing spam.

The AHBL augments this nicely, and includes a RHSBL (right-hand-side BL) which handles blocking by domain name. If you don't want to do step 4, this is a good substitute.
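For anyone who hasn't looked at how an IP-based DNSBL is consulted, here is a minimal Python sketch (not what sendmail does internally, just the general convention): reverse the IPv4 octets, append the DNSBL zone, and do an ordinary A-record lookup; any answer means "listed". The function names dnsbl_query_name and first_listing are made up for illustration, and the sketch assumes that any lookup failure should count as "not listed".

```python
import socket


def dnsbl_query_name(ip, zone):
    """Build the DNS name to look up for an IPv4 address in a DNSBL zone."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ip,))
    # 192.0.2.1 checked against zone "example" becomes 1.2.0.192.example
    return ".".join(reversed(octets)) + "." + zone.strip(".")


def first_listing(ip, zones, resolve=socket.gethostbyname):
    """Return the first zone that lists ip, or None if none does.

    Zones are tried in the order given. Any lookup failure (NXDOMAIN,
    timeout, dead zone) is treated as "not listed" -- an assumption of
    this sketch, so that a broken DNSBL never blocks mail by itself.
    """
    for zone in zones:
        try:
            resolve(dnsbl_query_name(ip, zone))
        except OSError:  # socket.gaierror is a subclass of OSError
            continue
        return zone
    return None
```

For example, dnsbl_query_name("192.0.2.1", "sbl-xbl.spamhaus.org") gives "1.2.0.192.sbl-xbl.spamhaus.org". The resolve parameter is there only so the logic can be exercised with a stand-in resolver instead of live DNS.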
The DSBL, NJABL, and ORDB all pick up different combinations of open relays, open proxies, hijacked systems, etc.

The SPEWS list -- despite what some of its less-informed critics have said -- is very accurate and correctly targets the spam-supporting ISPs and hosts who are directly responsible for much of the spam we all endure.

Other DNSBLs that I have either used or am considering using:

    Blitzed OPM    http://opm.blitzed.org/
    PDL            http://www.pan-am.ca/pdl
    Leadmon        http://www.leadmon.net/spamguard/
    SORBS          http://dnsbl.sorbs.net/
    FiveTen        http://www.five-ten-sg.com/blackhole.php

NOTE: You should probably not use any DNSBL until you've read its policies.

NOTE: If you intend to make heavy use of these DNSBLs, you should probably read their web sites and see about doing zone transfers.

NOTE: I find it very useful to run a local copy of BIND in caching mode on every mail server, since those servers often get repeatedly pummeled from the same sets of addresses. This not only enhances performance locally, but cuts down on the load my servers impose on the DNSBLs.

NOTE: DNSBLs are invoked sequentially by sendmail, so it's a good idea to put the one that blocks the most spam as seen by your servers first. But figuring out which that is can be quite an effort. For most people, the Spamhaus SBL+XBL DNSBL is a pretty good first guess, though.

9. I'm experimenting with using rbldnsd to run my own internal DNSBL -- replacing, in part, the sendmail 'access' file. The upside of doing this is that rbldnsd stores information in a very compact format with a low memory footprint; it's designed to serve DNSBLs, not as a general purpose DNS server. Another advantage is that keeping the information in rbldnsd would allow it to be used by sendmail, postfix, exim, whatever. Yet another is that it can be queried easily (contrast with the sendmail 'access' file).
The downside is that it's another process to run; it requires a different format than sendmail (which means reworking scripts, etc.); and it's one more step that could conceivably fail. (Mitigating this is that sendmail presumes a non-responding DNSBL means "not listed" and thus fails soft.)

It's not clear to me yet how this experiment will turn out, but the early results are promising enough for me to suggest it to others as a possible course of action.

10. My best estimate of the performance of all this is that the local measures (1-7) block about half the spam that is blocked, and the DNSBLs (8) block the other half. The blocking rate itself appears to be somewhere around 93% to 97%: it varies as spammers switch networks or domains, or activate new groups of zombies. The false positive rate is about 1 per month; but I need to caveat that by stating that unreported false positives may still be lurking. (On the other hand: my users squawk pretty loud and fast when something goes wrong, so I don't think there are many.)

NOTE: Assessing performance of anti-spam techniques requires both the FN (false negative: unblocked spam) and FP (false positive: blocked non-spam) rates. It's easy to drive either to 0; it's hard to do both at once.

NOTE: Everybody's incoming spam and non-spam mix is different. The only way to really figure out which of these steps will best minimize (FP, FN) is to analyze the statistics. But 1, 2, 3, and some of 8 are nearly always a good first guess, and in some cases, they solve enough of the problem that further analysis/measures aren't necessary.

---Rsk
_______________________________________________
sunmanagers mailing list
sunmanagers@sunmanagers.org
http://www.sunmanagers.org/mailman/listinfo/sunmanagers

Received on Fri Jul 23 16:44:48 2004