Mark.Martinec+amavis at ijs.si
Wed Jun 29 20:33:28 CEST 2011
> >> but, if clamav hung on the primary (like it has done twice since
> >> upgrading to 0.97.1),
> > Ah, it's happening to you as well? Happened here twice or three times
> > already :(
> >> amavisd just seems to sit there till I totally kill clamd with a
> >> sigsegv.
> > Yeah, same here.
> I still want to put a timeout in amavisd so that my secondary takes over.
> anyone help?
Well, timeouts on virus scanners are implemented at two levels:
by setting an alarm and catching its signal, and by a timeout
on a socket. The timeout for each operation is calculated
dynamically from the time remaining until a deadline
(improved somewhat in 2.7.0), so the only user-configurable setting
is $child_timeout, with a sensible value perhaps a bit under a minute,
like 45 seconds. That applies to a proxy setup with 2.7.0(-rc/pre).
With a post-queue setup one can afford a longer time limit.
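
To illustrate the two levels described above, here is a minimal Python
sketch (amavisd itself is written in Perl, and this is not its code).
An alarm guards the whole exchange, and each socket operation gets a
timeout recomputed from the time left until the deadline. The names
query_scanner, remaining and DeadlineExceeded are made up for the
example, and the sketch assumes a Unix system and the main thread.

```python
import signal
import socket
import time


class DeadlineExceeded(Exception):
    """Raised when the overall deadline for a scanner query has passed."""


def remaining(deadline):
    """Return the seconds left until `deadline` (a time.monotonic() value)."""
    left = deadline - time.monotonic()
    if left <= 0:
        raise DeadlineExceeded("no time left before deadline")
    return left


def _on_alarm(signum, frame):
    # Level 1: the alarm signal is caught and turned into an exception.
    raise DeadlineExceeded("alarm fired")


def query_scanner(sock, request, deadline):
    """Send `request` and read one reply, both bounded by `deadline`."""
    old_handler = signal.signal(signal.SIGALRM, _on_alarm)
    try:
        # Level 1: an alarm covering the whole exchange.
        signal.setitimer(signal.ITIMER_REAL, remaining(deadline))
        # Level 2: a socket timeout, recomputed before each operation.
        sock.settimeout(remaining(deadline))
        sock.sendall(request)
        sock.settimeout(remaining(deadline))
        return sock.recv(4096)
    except socket.timeout:
        raise DeadlineExceeded("socket operation timed out")
    finally:
        # Cancel the alarm and restore the previous handler.
        signal.setitimer(signal.ITIMER_REAL, 0)
        signal.signal(signal.SIGALRM, old_handler)
```

Neither level helps if the process is blocked in a system call that
cannot be interrupted, which is the failure mode suspected below.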
But apparently (at least in Ralf's case with 2.7.0) these timeout
mechanisms did not do their job. Seems like one of the operations
(connect/write/read/close) got stuck in an uninterruptible state.
This hasn't happened here yet (despite running 0.97.1), but
our mail traffic is much lighter than yours - so I don't see how
to test this. Is there a way to make clamd get stuck at will?
It would be useful to see the amavisd log (at log level 3 at least)
from when this happens, or perhaps later from a debug run while clamd
is still stuck.
It may be possible to have two instances of clamd running on
separate sockets, and when one fails switch over and restart
amavisd on the other, while leaving the first for experimentation.
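
As a rough aid for deciding when to switch, a small external probe
could check which clamd instance still answers. The Python sketch
below sends clamd's PING command over a Unix socket and expects PONG
within a timeout. The function names are invented for the example,
the socket paths would be whatever the two instances are configured
with, and restarting amavisd on the other socket is left to the
operator.

```python
import socket


def clamd_alive(path, timeout=5.0):
    """Return True if a clamd on Unix socket `path` answers PING in time."""
    try:
        with socket.socket(socket.AF_UNIX, socket.SOCK_STREAM) as s:
            s.settimeout(timeout)
            s.connect(path)
            s.sendall(b"PING\n")
            # A healthy clamd replies with PONG.
            return s.recv(16).strip() == b"PONG"
    except OSError:
        # Covers timeouts, a missing socket file and refused connections.
        return False


def first_responsive(paths, timeout=5.0):
    """Return the first socket path whose clamd responds, or None."""
    for path in paths:
        if clamd_alive(path, timeout):
            return path
    return None
```

A hung instance that still accepts connections but never replies is
reported as not alive once the timeout expires, so it can be left
running for experimentation while the other one takes the load.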