From owner-spamtools@lists.abuse.net Sat Sep 13 08:14:28 2003
Return-Path: <paul@city-fan.org>
Delivered-To: ea_plusn-earlesfield-p-filtered@earlesfield.plus.com
Received: (qmail 17191 invoked from network); 13 Sep 2003 08:14:28 -0000
Received: from warrior.services.quay.plus.net (212.159.14.227) by
	netmail00.services.quay.plus.net with SMTP; 13 Sep 2003 08:14:28 -0000
Received: (qmail 16856 invoked from network); 13 Sep 2003 08:14:31 -0000
Received: from gatekeeper.city-fan.org (212.56.100.58) by
	warrior.services.quay.plus.net with SMTP; 13 Sep 2003 08:14:30 -0000
X-SQ: A
Received: from gatekeeper.city-fan.org (paul@localhost.intra.city-fan.org
	[127.0.0.1]) by gatekeeper.city-fan.org (8.12.9/8.12.9) with ESMTP id
	h8D8EPER006140 for <p-filtered@earlesfield.plus.com>; Sat, 13 Sep 2003
	09:14:25 +0100
Received: (from paul@localhost) by gatekeeper.city-fan.org
	(8.12.9/8.12.9/Submit) id h8D8EPx1006138 for
	p-filtered@earlesfield.plus.com; Sat, 13 Sep 2003 09:14:25 +0100
Received: from xuxa.iecc.com (xuxa.iecc.com [208.31.42.42]) by
	gatekeeper.city-fan.org (8.12.9/8.12.9) with SMTP id h8D8DwER006131 for
	<paul@city-fan.org>; Sat, 13 Sep 2003 09:14:04 +0100
Received: (qmail 23513 invoked by uid 85); 13 Sep 2003 08:13:21 -0000
Received: (qmail 23129 invoked from network); 13 Sep 2003 08:12:35 -0000
Received: from serv5.gtcs.com (HELO mail.gtcs.com) (209.181.16.5) by
	mail2.iecc.com with SMTP; 13 Sep 2003 08:12:35 -0000
Received: from home.gtcs.com  (home [209.181.16.2]) by mail.gtcs.com
	(8.11.3/gtcs-5.7.9) with ESMTP id h8D8CQg55456 (using TLSv1/SSLv3 with
	cipher EDH-RSA-DES-CBC3-SHA (168 bits) verified NO) for
	<spamtools@lists.abuse.net>; Sat, 13 Sep 2003 02:12:31 -0600 (MDT)
	(envelope-from: <antispam@gtcs.com>)
Content-class: urn:content-classes:message
X-Authentication-Warning: serv.gtcs.com: Host home [209.181.16.2] claimed
	to be home.gtcs.com
Received: from localhost (localhost [127.0.0.1]) by home.gtcs.com
	(8.11.3/8.11.3/wkstn-1.02) with ESMTP id h8D8CPM45248 for
	<spamtools@lists.abuse.net>; Sat, 13 Sep 2003 02:12:25 -0600 (MDT)
	(envelope-from antispam@gtcs.com)
Date: Sat, 13 Sep 2003 02:12:25 -0600 (MDT)
Message-Id: <200309130812.h8D8CPM45248.smij@home.gtcs.com>
From: Bruce Gingery <antispam@gtcs.com>
To: SpamTools <spamtools@lists.abuse.net>
Subject: [spamtools] Sendmail access maps -- beyond access.db
MIME-Version: 1.0
Content-Type: Text/PLAIN; charset="US-ASCII"
In-Reply-To: <1698323961.20030911182251@conti.nu>
References: <1698323961.20030911182251@conti.nu>
Precedence: list
List-Help: <mailto:spamtools-request@lists.abuse.net?subject=help> (List
	Instructions)
List-Unsubscribe: (Use this command to get off the list)
	<mailto:spamtools-request@lists.abuse.net?body=unsubscribe%20spamtools>
List-Subscribe: (Use this command to join the list)
	<mailto:spamtools-request@lists.abuse.net?body=subscribe%20spamtools>
List-Post: <mailto:spamtools@lists.abuse.net>
List-Owner: <mailto:owner-spamtools@lists.abuse.net> (Contact Person for
	Help)
List-ID: <spamtools.abuse.net>
Precedence: bulk
Sender: owner-spamtools@lists.abuse.net
Reply-To: spamtools@lists.abuse.net
X-Loop: paul@city-fan.org
X-Evolution-Source:
	pop://earlesfield+p-filtered%23mail.plus.net@gatekeeper.intra.city-fan.org/
Content-Transfer-Encoding: 8bit

Kai asked:
> Bruce Gingery wrote:
> > A lookup in Scheck_mail (with delayed checks) or Local_check_mail
> > (without delayed checks) could use a secondary access-style
> > lookup using a generic (virtual-users) style map processing.

> Can you elaborate on this ? I am a little confused about this concept.

  Not sure which you were confused about.

  The access database is a very coarse tool.  It has built-ins for
  use in the distribution, but many additional finer granularities
  are possible.  For example, by waiting until RCPT TO: time, you
  can check the MAIL FROM: in the light of the RCPT TO: information.
  But that is not directly supported by normal access.db content.

  Since access.db is already used for simple checks:
	Connected IP		or octet mask
	Connected domain	or parent domain
	MAIL FROM: domain
	MAIL FROM: user@domain
	RCPT TO:   domain
	RCPT TO:   user@domain	friend/hater
	Offer TLS
	Require TLS
	Use TLS outgoing
	Require TLS outgoing
	Certificate matching
	  etc. etc.
  to add complexity (e.g. does user Y@local accept mail from Z@example.com)
  it makes better sense to add a different map than to overload the tags
  in access.db even further.

  The other possible confusion, I'll deal with first...

  Normally, sendmail calls several top-level rulesets based upon
  what stage of the transfer negotiation is current -- if the ruleset
  exists.  I'll ignore AUTH and STARTTLS testing, to simplify things:

STANDARD  check_relay	- before any greeting
NONE	  check_vrfy	- after any VRFY, if they're enabled
NONE	  check_expn	- after any EXPN, if they're enabled
STANDARD  check_mail	- after 220, HELO/EHLO, 250, and MAIL FROM:
STANDARD  check_rcpt	- after each RCPT TO
NONE	  check_eoh	- after the CRLFCRLF marking end-of-headers
NONE	  check_compat	- after end-of-DATA phase with most of the info
			  gathered from the above no longer available.

  The Local_check_* rules are invoked near the beginning of their
  equivalent distribution ruleset.  By default the Local_* ones
  are empty, hence are effectively not even called.  But, if present:

STANDARD  check_relay
YOUR		calls Local_check_relay
STANDARD  check_mail
YOUR		calls Local_check_mail
STANDARD  check_rcpt
YOUR		calls Local_check_rcpt


  When you invoke FEATURE(`delay_checks') many of those are shuffled,
  in order-logic, and no check_mail/check_relay rulesets are even 
  created in the sendmail.cf.  Note reversed order ...

NONE	  check_relay - is not generated into sendmail.cf
NONE	  check_mail  - is not generated into sendmail.cf

STANDARD  check_rcpt
STANDARD	calls checkrcpt
YOUR			calls Local_check_rcpt
STANDARD	and calls checkmail
YOUR			calls Local_check_mail
STANDARD	and calls checkrelay
YOUR			calls Local_check_relay

   but, that also means that with delayed checks enabled, you may
   create your OWN check_relay and check_mail for checks that should
   NOT be delayed, since the standard checks are then generated without
   the underscore in the name.  That gives the possibility of (in order):

YOUR	  check_relay	- at connect time
YOUR	  check_mail	- after 220, HELO/EHLO, 250, and MAIL FROM:
STANDARD  check_rcpt	- after each RCPT TO:
STANDARD	calls checkrcpt
YOUR			calls Local_check_rcpt
STANDARD	and calls checkmail
YOUR			calls Local_check_mail
STANDARD	and calls checkrelay
YOUR			calls Local_check_relay

   hence FIVE places to check the three main negotiation parameters
   with your own specialty checks.  In effect - you can have your
   cake (delayed checks) and eat it, too (do some checks NOT delayed).
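
   The call orders listed above can be modelled in a few lines of
   Python.  This is only a sketch of the ordering as described in
   this message, not of sendmail itself; the function name and the
   "(yours)" labels are made up for illustration.

```python
# Sketch only: models the ruleset call order described above, not sendmail.
def ruleset_order(delay_checks: bool, own_early_rulesets: bool = False) -> list[str]:
    """Return the check rulesets in the order the text says they run."""
    if not delay_checks:
        # Standard layout: each Local_* ruleset is called from its parent.
        return [
            "check_relay", "Local_check_relay",
            "check_mail", "Local_check_mail",
            "check_rcpt", "Local_check_rcpt",
        ]
    order = []
    if own_early_rulesets:
        # Hand-written rulesets using the names delay_checks leaves free.
        order += ["check_relay (yours)", "check_mail (yours)"]
    # With delay_checks everything standard hangs off check_rcpt.
    order += [
        "check_rcpt",
        "checkrcpt", "Local_check_rcpt",
        "checkmail", "Local_check_mail",
        "checkrelay", "Local_check_relay",
    ]
    return order


if __name__ == "__main__":
    for name in ruleset_order(delay_checks=True, own_early_rulesets=True):
        print(name)
```

   Counting the entries that are yours (the two early rulesets plus
   the three Local_* ones) gives the five places mentioned above.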

   The differences among the Local_* rules are almost negligible,
   though, with delayed checks.

   Things done in check_relay are absolutely by remote IP/domain.
   Nothing else is known.  We haven't even greeted the caller with
   a 220 (hence could greet with something else, like a 554).
   We only have the connection details - his address, his domain
   if any, his port, our IP (we could be listening on more than one)
   our port (we SHOULD be listening on more than one, if we do
   both mail and submissions on the same daemon), and the name
   and flags for the mailer the connection is on.

   Things done in check_mail  can check the above, as well as
	* HELO/EHLO
	* HELO/EHLO parameter
	* MAIL FROM: parameters
   as well as combinations of those.  Some users should only
   be able to send from a local connection, or even background
   non-TCP connection.  Root, various daemons, even postmaster
   should PROBABLY only be able to send from limited local
   connections.  Others can be invalidated right here.  If
   somebody's sending with a MAIL FROM:<postmaster> we don't
   want to add our own hostname to that in processing, unless
   it really is the local postmaster.  Neither do we want to leave
   it as a bare <postmaster> address without domain.

   Things done in Local_check_rcpt, OR Local_check_mail OR
	Local_check_relay with delayed checks can check the above, plus
	* RCPT TO: parameters
   as well as combinations of ANY of those.

   That means that unless you care WHAT recipient is being addressed,
   things like non-valid HELO parameters can be checked in a ruleset
   check_mail, so long as you delay checks.  

   If you do not delay_checks, then you must move that to the
   Local_check_mail ruleset to avoid colliding with the standard
   check_mail ruleset.

   So what kind of checks aren't usually done (but should be)
   even in various options?

   1. Check the RCPT TO: address for outgoing mail -- has the mail
      been invalidated with an MX record in DNS of a dot?
      This can be added to ParseLocal, so long as you're not
      using a pseudo-domain handled locally but with an MX 0 .
      to the public.

   2. Check the MAIL FROM: domain for incoming mail.  Has the
      mail been invalidated with an MX record in DNS of a dot?

   3. Was the HELO (or EHLO) parameter valid?  RFC2821 is very
      much more restrictive about what's permitted than some
      past poor practices.   There are some domains (such as
      those hosted by mail.com) which NEVER should appear bare
      in a HELO/EHLO.  There are others never used by the domain
      themselves, but which have been exploited by mailware, such
      as compuserve.com.   Finally, there are dial-up direct-to-MX
      spammers, such as the "airs*.*" personnel services, which
      seem to always use their own domain in HELO/EHLO and/or
      MAIL FROM:, but morph all over the place.  There are also
      bogon [IPv4] addresses and bogon domains (never registered,
      often forged) to reject right at the HELO/EHLO - no matter
      who they claim to be sending mail from and to.

      If the host HELOs or EHLOs as your name, it would be risking
      a mail loop to accept anything from that host.  Same if it 
      HELOs or EHLOs with your [IPaddress].  Somebody's confused,
      and manual intervention is required!  Rejecting everything
      on such a connection SHOULD attract that manual attention,
      if there was any legitimacy to begin with.

      If it's a TLD that is not used in any root you subscribe to,
      then it's mail to reject.  For most people, that includes
      the few gTLD and special-purpose TLDs, and the ccTLDs, and
      those only.  For others, there are more.

      An unbracketed IPv4 address is NOT a legitimate "address literal",
      nor is an [IPv4] that doesn't match the actually connected client's
      address.  There could be a name mismatch, according to RFC2821,
      or even a fully-qualified-domain-name that doesn't resolve, and
      it would still be mail that shouldn't be rejected for a bad HELO/EHLO.

   4. Some domains only send from certain servers.  If you KNOW you
      have no dot-forward mail coming your way, then you can restrict
      MAIL FROM: various domains to their actual servers.  This
      especially includes the huge ISPs and FreeMail providers,
      so often abused because of presumed anonymity by spammers.

 None of this needs to be as late as RCPT TO: time.
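
 The HELO/EHLO tests from item 3 above (own name, own [IPaddress],
 unbracketed IPv4, [IPv4] not matching the connected client) can be
 sketched in Python.  This is a hedged illustration of the policy, not
 sendmail ruleset code: the function name, its arguments, and the
 REJECT/OK return tokens are assumptions, and only IPv4 literals are
 handled.

```python
import ipaddress


def helo_verdict(helo: str, client_ip: str, my_names: set[str], my_ips: set[str]) -> str:
    """Return 'REJECT' or 'OK' for a HELO/EHLO parameter (hypothetical policy)."""
    arg = helo.strip().lower().rstrip(".")

    # A host greeting us with our own name risks a mail loop.
    if arg in {name.lower() for name in my_names}:
        return "REJECT"

    if arg.startswith("[") and arg.endswith("]"):
        try:
            literal = ipaddress.IPv4Address(arg[1:-1])
        except ValueError:
            return "REJECT"          # not a well-formed [IPv4] literal
        if str(literal) in my_ips:
            return "REJECT"          # greeting us with our own address
        if literal != ipaddress.ip_address(client_ip):
            return "REJECT"          # literal does not match the real client
        return "OK"

    # An unbracketed IPv4 address is not a legitimate address literal.
    try:
        ipaddress.IPv4Address(arg)
    except ValueError:
        # Any other name passes: a mismatch or an unresolvable name
        # is not, by itself, grounds for rejection.
        return "OK"
    return "REJECT"


if __name__ == "__main__":
    print(helo_verdict("[192.0.2.99]", "192.0.2.10", {"mx.example.org"}, {"192.0.2.1"}))
```

 Bogon and unused-TLD tests would slot in as further lookups before
 the final OK; they are left out here because they depend on local
 lists.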

 There's plenty left for RCPT TO: time -- nonexistent users, those
 who want to ban all mail from an otherwise legitimate sender.  Block
 lists that aren't applied to whitelisted accounts like postmaster@
 or abuse@.  Checking a submitting IP address or fully-qualified-domain
 client name, against a list exploder RCPT TO: ... many many things.

OR

 The other possible confusion.  Using a "virtual users" style map,
 or even one more complex, in addition to simple access.db processing
 allows for fall-thrus and blanket white/blacklisting based on
 sender, apart from other usages.

 Listing an IP address in the LHS of access.db, or a series of octets
 indicating an /8, or /16, or /24, is a broad brush.  Similarly with
 a domainname.  They can be qualified with Connect: or From: tags,
 but still they have no bearing on the recipient.  Site policy is
 established.  A recipient can be declared a "friend" or "hater",
 but that can only be checked with delayed checks -- and still isn't
 by-sender/by-recipient, but only yes-no by-recipient when ANY flaws
 are found with the sender.

 In its simplest extension, a separate map is created with
	<sender>:<recipient>		action

 The fangs (angle brackets) MAY be needed for odd sender local-part or
 odd recipient local-part.  The action can be a token (like RELAY DELIVER
 OK BLOCK REFUSE) or a full ERROR:d.s.n:"COD message" -- provided the
 rules are invoked at or after the right time to check for matches in
 that map.
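
 To make the <sender>:<recipient> idea concrete, here is a small
 Python sketch of such a lookup with one fall-thru step.  It only
 models the map logic; the function name, the "<@domain>" wildcard
 key form, and the fall-thru order are assumptions for illustration,
 not something sendmail provides.

```python
from typing import Optional


def pair_lookup(table: dict[str, str], sender: str, recipient: str) -> Optional[str]:
    """Return the action for a sender/recipient pair, or None if nothing matches."""
    sender, recipient = sender.lower(), recipient.lower()
    sender_domain = sender.rpartition("@")[2]
    candidates = [
        f"<{sender}>:<{recipient}>",          # exact sender and recipient
        f"<@{sender_domain}>:<{recipient}>",  # any sender at that domain (assumed key form)
    ]
    for key in candidates:
        if key in table:
            return table[key]
    return None                               # fall thru to the normal checks


if __name__ == "__main__":
    table = {
        "<z@example.com>:<y@local.example>": "OK",
        "<@example.com>:<y@local.example>": 'ERROR:5.7.1:"550 not accepted"',
    }
    print(pair_lookup(table, "z@example.com", "y@local.example"))
    print(pair_lookup(table, "spammer@example.com", "y@local.example"))
```

 A None result means the pair has no opinion, so whatever access.db
 and the standard rulesets decide still applies.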

 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
 *    The WORST thing is to pass such decisions on to a local    *
 *    delivery agent like procmail, after having accepted        *
 *    delivery.  Then you either confirm deliverability (even    *
 *    if mail is discarded), or  bounce unnecessarily -- often   *
 *    to a forged sender.                                        *
 * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *

 For highly complex checks, a MILTER is needed with sendmail.  Sendmail
 itself is limited to only $1 thru $9 for positional parsed parameters 
 in any given rule.  It is POSSIBLE to do multi-line parsing with temporary
 storage macros to get more than 9, but that gets VERY bloated.

 Keeping it _somewhat_ neat ...  parse first THEN use those stored
 values in your lookups:

 Krelayctl hash /etc/mail/relayctl
 Kstorage macro

 SLocal_check_rcpt
 R<$*>                   $1
 # next line wraps twice
 R$*                  $: <$1> $(storage {tlocal} $@ FALSE $) 
        $(storage {flocal} $@ FALSE $) $(storage {hlocal} $@ FALSE $)
        $(storage {fnull} $@ FALSE $) $(storage {hident} $@ <> $)
 R<$* @ $=w >         $: <$1@$2> $(storage {tlocal} $@ TRUE $)
 # next line wraps
 R<$+ + $+ @ $+ >     $: $(storage {tuser} $@ $1 $) 
        $(storage {ttag} $@ $2 $) $(storage {tdom} $@ $3 $)
 R<$+ @ $+ >          $: $(storage {tuser} $@ $1 $) $(storage {tdom} $@ $2 $)
 R$*                  $: <$&{mail_addr}>
 R<$*@$=w>            $: <$1@$2> $(storage {flocal} $@ TRUE $)
 # next line wraps
 R<$+ + $+ @ $+ >     $: $(storage {fuser} $@ $1 $) 
        $(storage {ftag} $@ $2 $) $(storage {fdom} $@ $3 $)
 R<$+ @ $+ >          $: $(storage {fuser} $@ $1 $) $(storage {fdom} $@ $2 $)
 R<>                  $: $(storage {fnull} $@ TRUE $)
 R$*                  $: <$&{client_name}>
 R<$=w>               $: <$1> $(storage {hlocal} $@ TRUE $)
 R<[$-.$-.$-.$-]>     $: $(storage {hname} $@ <> $)
 R<$+>                $: $(storage {hname} $@ $1 $)
 R$*                  $: $&_
 R$+@$*               $: $(storage {hident} $@ $1 $)
 R$*                  $: <$&{rcpt_addr}>
 #
 # At this point
 #     $&{tuser}     contains either RCPT TO: local-part, or
 #                   RCPT TO: local part sans tag.
 #     $&{ttag}      contains RCPT TO: tag, if there was one
 #     $&{tdom}      contains domain-part from RCPT TO:
 #     $&{tlocal}    contains TRUE if we consider that domain local
 #     $&{fnull}     contains TRUE if it was MAIL FROM:<> null-sender
 #     $&{fuser}     contains MAIL FROM: local part, or
 #                   MAIL FROM: local part sans tag
 #     $&{ftag}      contains MAIL FROM: tag if any
 #     $&{fdom}      contains MAIL FROM: domain part
 #     $&{flocal}    contains TRUE if that domain part is considered local
 #     $&{hname}     contains the resolved name of the connected host
 #                   or <>, if rDNS failed.
 #     $&{client_addr} contains the address of the connected host
 #     $&r           contains SMTP or ESMTP, if this is a TCP connection
 #     $&s           contains the HELO or EHLO parameter
 #     $&f           contains the raw MAIL FROM: address
 #     $&g           contains the processed MAIL FROM: address.
 #     $&{client_resolve} contains OK TEMP or FAIL
 #     $&{hident}    contains the identd return (or <> if none) for client
 #
 # in addition to having the passed parameter restored for matching,
 # and any session authentication parameters (if active), or $&{dsn_*}
 # macro content if ESMTP and Delivery-Service-Notification was requested.
 #
 # so we can combine any of this to look up in a relayctl left-side
 # or even cross-match among it.  That's not the most efficient, but
 # it is the most flexible.  The only thing not taken into account
 # is actual content - which (without MILTER overrides to what
 # information is "current") cannot be cross-matched to the recipients.
 # - BUT -
 # Even more parsing is possible.  Do we want to try to de-VERP the
 # sender address to see if it matches the recipient address?  Or
 # to extract some kind of list name?
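
   For readers who don't parse sendmail rules in their head, the
   address splitting done above (user, +tag, domain, local-or-not,
   null sender) looks roughly like this in Python.  It is a sketch of
   the idea only: the function and dictionary keys are made up, and
   odd addresses with several "@" signs may split differently than the
   rules would.

```python
def split_address(addr: str, local_domains: set[str]) -> dict:
    """Split an envelope address into user, tag, domain and flags."""
    addr = addr.strip()
    if addr.startswith("<") and addr.endswith(">"):
        addr = addr[1:-1]                      # drop the angle brackets
    if addr == "":
        # MAIL FROM:<> null sender
        return {"null": True, "user": "", "tag": "", "dom": "", "local": False}
    if "@" in addr:
        localpart, _, dom = addr.rpartition("@")
    else:
        localpart, dom = addr, ""
    user, _, tag = localpart.partition("+")    # tag is "" when there is no +tag
    return {
        "null": False,
        "user": user,
        "tag": tag,
        "dom": dom,
        "local": dom.lower() in local_domains,
    }


if __name__ == "__main__":
    print(split_address("<user+lists@example.org>", {"example.org"}))
```

   Running it once for the RCPT TO: address and once for the MAIL
   FROM: address gives the same pieces that the t* and f* macros hold.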

   Perhaps the $&{tuser}@$&{tdom} doesn't want any mail with an ident
   of hidden-user nor squid, nor CacheFlowServer, regardless of +tag
   in his E-Mail address, but postmaster will take it anyways?
   
   More matching and shifting is possible in subsequent rules...
   Perhaps extracting parent domains of the E-Mail addresses.  Or
   lookups in an fdnsbl of the sender domain, HELO/EHLO param or
   resolved client domain.

   Maybe the tag is an expiring timestamp that needs to be checked
   against the $b date (when-client-connected) or $t current time()
   value, as well as being checked to see if someone "just made it up"
   to try to zap past an expiring tagged address.
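
   One way to tell a genuine expiring tag from one somebody "just
   made up" is to sign the expiry time.  The Python below is a sketch
   under stated assumptions: the tag layout (expiry in epoch seconds,
   a dash, then a truncated HMAC), the function names, and the shared
   secret are all hypothetical and not part of sendmail.

```python
import hashlib
import hmac


def make_tag(user: str, expires: int, secret: bytes) -> str:
    """Build a tag '<expiry>-<mac>' for user+TAG@domain (assumed layout)."""
    mac = hmac.new(secret, f"{user}:{expires}".encode(), hashlib.sha256).hexdigest()[:12]
    return f"{expires}-{mac}"


def tag_is_valid(user: str, tag: str, now: int, secret: bytes) -> bool:
    """True only if the tag was issued by us for this user and has not expired."""
    expires_text, sep, _mac = tag.partition("-")
    if not sep or not (expires_text.isascii() and expires_text.isdigit()):
        return False                           # not in the expected layout
    expected = make_tag(user, int(expires_text), secret)
    if not hmac.compare_digest(expected.encode(), tag.encode()):
        return False                           # made up or altered
    return now <= int(expires_text)            # genuine, so check the clock


if __name__ == "__main__":
    secret = b"change-me"
    tag = make_tag("user", 1063440000, secret)
    print(tag, tag_is_valid("user", tag, 1063439000, secret))
```

   Inside sendmail itself this kind of check would need an external
   map or a MILTER, since rulesets cannot compute an HMAC.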

   The VERPed sender for MIME-format digests of spamtools can
   be tested (reasonably) with a lookup in a map that contains
   list subscriptions:

   <recipientaddr>:<senderaddr>        OK

   with 
   R$*              $: $1 $| $(listsubs $1:<$&{fuser}@$&{fdom}> $)
   R$* $| OK        $@ OK

   because it would find an entry for 
	<recip@example.com>:<owner-spamtools@lists.abuse.net>
   with an OK - discarding the VERP info for the lookup.

   Not all VERP patterns are that easy.  They should be.  If such
   messages were forged, then merely add the IP address or domain
   name of the sending client as a 3rd parameter in the map's LHS
   and in the lookup.
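
   The subscription lookup, including the optional third parameter,
   can be sketched in Python as follows.  This is illustrative only:
   it assumes the VERP information rides in a +tag on the sender
   local part, that map keys are stored lower-case, and that the
   client is appended as a third ":<...>" field; none of those details
   come from sendmail itself.

```python
from typing import Optional


def list_sender_ok(listsubs: dict[str, str], recipient: str, sender: str,
                   client: Optional[str] = None) -> bool:
    """Check a list subscription entry after discarding the VERP tag."""
    localpart, _, dom = sender.rpartition("@")
    user = localpart.partition("+")[0]         # drop the +tag carrying VERP info
    key = f"<{recipient}>:<{user}@{dom}>"
    if client is not None:
        key += f":<{client}>"                  # assumed form of the 3rd parameter
    return listsubs.get(key.lower()) == "OK"


if __name__ == "__main__":
    subs = {"<recip@example.com>:<owner-spamtools@lists.abuse.net>": "OK"}
    verp = "owner-spamtools+recip=example.com@lists.abuse.net"
    print(list_sender_ok(subs, "recip@example.com", verp))
```

   With the client in the key, a forged copy arriving from any other
   host simply finds no entry.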