amavisd stalling out

J David jdavidlists at gmail.com
Thu Feb 28 03:10:04 CET 2019


Hello,

After using amavisd with relatively few problems for many years,
something has recently come up which is proving extremely difficult to
diagnose and correct.

After running fine for a few minutes, amavisd stalls out.  About five
minutes after that, this happens:

Feb 28 00:20:20 b1 amavis[6879]: (06879-01) (!!)TROUBLE in check_mail:
reg_proc FAILED: timed out
Feb 28 00:20:20 b1 amavis[6845]: (06845-01) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4184]: (04184-18) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6845]: (06845-01) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001526-06845-J_pe8wnf
Feb 28 00:20:20 b1 amavis[6836]: (06836-01) (!!)TROUBLE in check_mail:
forwarding FAILED: timed out
Feb 28 00:20:20 b1 amavis[6828]: (06828-01) (!!)TROUBLE in check_mail:
forwarding FAILED: timed out
Feb 28 00:20:20 b1 amavis[4105]: (04105-20) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6660]: (06660-03) (!!)TROUBLE in check_mail:
forwarding FAILED: timed out
Feb 28 00:20:20 b1 amavis[4108]: (04108-20) (!!)TROUBLE in check_mail:
forwarding FAILED: timed out
Feb 28 00:20:20 b1 amavis[4183]: (04183-20) (!!)TROUBLE in check_mail:
quar+notif FAILED: timed out
Feb 28 00:20:20 b1 amavis[6842]: (06842-01) (!!)TROUBLE in check_mail:
forwarding FAILED: timed out
Feb 28 00:20:20 b1 amavis[4078]: (04078-20) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4083]: (04083-20) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4083]: (04083-20) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001105-04083-BIJKpd1g
Feb 28 00:20:20 b1 amavis[6748]: (06748-02) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4132]: (04132-13) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4132]: (04132-13) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001115-04132-_pPVvTO6
Feb 28 00:20:20 b1 amavis[4172]: (04172-18) (!!)TROUBLE in check_mail:
spam_scan FAILED: timed out
Feb 28 00:20:20 b1 amavis[4114]: (04114-16) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4114]: (04114-16) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001113-04114-XIBpubu9
Feb 28 00:20:20 b1 amavis[4071]: (04071-17) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4095]: (04095-18) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4071]: (04071-17) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001103-04071-uWuQ_FjN
Feb 28 00:20:20 b1 amavis[4095]: (04095-18) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001107-04095-evFSHnz_
Feb 28 00:20:20 b1 amavis[6852]: (06852-01) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4098]: (04098-19) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6846]: (06846-01) (!!)TROUBLE in check_mail:
quar+notif FAILED: timed out
Feb 28 00:20:20 b1 amavis[6841]: (06841-01) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[6841]: (06841-01) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001525-06841-LjhaSm_A
Feb 28 00:20:20 b1 amavis[4169]: (04169-19) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4079]: (04079-20) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4079]: (04079-20) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001103-04079-QxWEpEkc
Feb 28 00:20:20 b1 amavis[6808]: (06808-01) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4089]: (04089-20) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6743]: (06743-02) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4128]: (04128-19) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6859]: (06859-01) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6865]: (06865-01) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4151]: (04151-20) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6727]: (06727-02) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[6770]: (06770-01) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4073]: (04073-20) (!!)TROUBLE in check_mail:
decide_mail_destiny FAILED: timed out
Feb 28 00:20:20 b1 amavis[4157]: (04157-11) (!)Requesting process
rundown, task exceeded allowed time
Feb 28 00:20:20 b1 amavis[4157]: (04157-11) (!)TempDir removal:
tempdir is to be PRESERVED:
/data/amavis/tmp/amavis-20190228T001116-04157-JivXLHdS
Feb 28 00:20:20 b1 amavis[6776]: (06776-02) (!!)TROUBLE in check_mail:
mime_decode-1 FAILED: timed out
Feb 28 00:20:20 b1 amavis[6879]: (06879-01) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001530-06879-GFZd6Pxo
Feb 28 00:20:20 b1 amavis[6828]: (06828-01) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001524-06828-DS1VbeD9
Feb 28 00:20:20 b1 amavis[4184]: (04184-18) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001117-04184-lixrt0pw
Feb 28 00:20:20 b1 amavis[6836]: (06836-01) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001526-06836-k3iHZPIY
Feb 28 00:20:20 b1 amavis[4105]: (04105-20) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001108-04105-1hQIqEGe
Feb 28 00:20:20 b1 amavis[6660]: (06660-03) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001506-06660-RKk3QK6F
Feb 28 00:20:20 b1 amavis[4108]: (04108-20) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001111-04108-ecbB3Q_C
Feb 28 00:20:20 b1 amavis[4183]: (04183-20) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001117-04183-ENU2SsX3
Feb 28 00:20:20 b1 amavis[6842]: (06842-01) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001526-06842-yabNzMXc
Feb 28 00:20:20 b1 amavis[4078]: (04078-20) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001103-04078-YBmu2Xal
Feb 28 00:20:20 b1 amavis[6748]: (06748-02) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001518-06748-coWC10OH
Feb 28 00:20:20 b1 amavis[4172]: (04172-18) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001117-04172-WWseqk4c
Feb 28 00:20:20 b1 amavis[6852]: (06852-01) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001527-06852-MJAWa9Wt
Feb 28 00:20:20 b1 amavis[4098]: (04098-19) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001107-04098-99luc2mN
Feb 28 00:20:20 b1 amavis[6846]: (06846-01) (!)PRESERVING EVIDENCE in
/data/amavis/tmp/amavis-20190228T001526-06846-SNCNRlZL

At this point, amavis is dead until restarted.  The version is
amavisd-new-2.11.1 on FreeBSD 11.

A "ps" of the running processes shows they are all stuck:

110 4063  716   0  20  0 155208 147576 select S     -  0:03.03
/usr/local/sbin/amavisd (master) (perl)
110 4068 4063   0  20  0 169304 158632 ucond  I     -  0:12.24
/usr/local/sbin/amavisd (ch20-avail) (perl)
110 4069 4063   0  20  0 165248 155816 ucond  I     -  0:10.28
/usr/local/sbin/amavisd (ch19-avail) (perl)
110 4071 4063   0  20  0 169304 159192 ucond  I     -  0:09.78
/usr/local/sbin/amavisd (ch17-finish) (perl)
[...]
110 6938 4063   0  20  0 156672 147984 ucond  I     -  0:00.03
/usr/local/sbin/amavisd (ch1-avail) (perl)
110 6942 4063   0  20  0 156672 147984 ucond  I     -  0:00.02
/usr/local/sbin/amavisd (ch1-avail) (perl)
110 6944 4063   0  20  0 156672 147984 ucond  I     -  0:00.03
/usr/local/sbin/amavisd (ch1-avail) (perl)
110 6951 4063   0  20  0 156664 147976 ucond  I     -  0:00.02
/usr/local/sbin/amavisd (perl)
110 6952 4063   0  20  0 156664 147976 ucond  I     -  0:00.02
/usr/local/sbin/amavisd (perl)

The ps above shows all the child processes are in wait state "ucond"
which is short for "user-mode condition."  This means they are all
waiting for something outside the kernel, like a POSIX mutex or
condition variable.  (This specifically excludes file locks, which
wait inside the kernel.)

So, it appears to an outsider that all the child processes are deadlocked.

To verify this, I tried to trace the processes.  It seems like the
last traced operation in each process is this:

  9660 perl     CALL
_umtx_op(0x80089e004,UMTX_OP_CV_WAIT,0x6<CVWAIT_ABSTIME|CVWAIT_CLOCKID>,0x80089d000,0)

So that does appear to bear out the idea that the processes are all
waiting for the same condition variable that never gets signaled.

Through tweaking amavisd.conf, I found that adding:

@bypass_spam_checks_maps  = (1);

Appears to prevent the issue, which suggests that it's somewhere in
the spam-handling code.

Does this make sense to anyone, or is there any way for me to
troubleshoot this further?

Thanks for any advice!


More information about the amavis-users mailing list