amavis crashing in Natty

Mark Martinec Mark.Martinec+amavis at ijs.si
Tue Apr 5 21:59:08 CEST 2011


Florian,
 
> 2011/4/2 Gary V <mr88talent at gmail.com>:
> > I thought that changes made to start-stop-daemon may have possibly
> > resulted in this issue, but after downgrading dpkg (of which
> > start-stop-daemon is a component) to the same version that is on
> > Ubuntu Maverick 10.10 (which does not exhibit the problem), the issue
> > remains.
> 
> if there is anything I can do to help debugging the issue, let me
> know. If I find a pattern or a cause, I'll let you know as well :)

I don't know what makes Maverick behave better than Natty, but
as far as I can tell it is pure luck and depends on timing.

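The underlying problem seems to be in Net::Server, whose master process
does not wait for its child processes to shut down after telling them to:
it sends them SIGTERM, but then only reaps the ones that have already
exited, using a non-blocking waitpid.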

So when the master process terminates, the restart script goes on
to start a new server, not realizing that some child processes
of the previous incarnation may still be shutting down.
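
A minimal, self-contained sketch of the shutdown pattern the patch applies:
send SIGTERM to every child, treat ESRCH ("no such process") as harmless,
then block in waitpid() until every child has been reaped, so the master
cannot exit while children are still shutting down. The helper names
spawn_children and close_children are hypothetical and not part of
Net::Server; the children merely simulate a slow shutdown on SIGTERM. Only
the "wait on each pid explicitly" branch is shown, not the process-group
(setsid) variant.

```perl
use strict;
use warnings;
use Errno qw(ESRCH);
use POSIX ();

# Hypothetical helper: fork $n children that take a moment to shut down.
sub spawn_children {
    my ($n) = @_;
    my %children;
    # Install the handler before forking so no child can receive SIGTERM
    # before its handler exists; the parent's handler is restored on return.
    local $SIG{TERM} = sub {
        select(undef, undef, undef, 0.2);   # simulate slow cleanup
        POSIX::_exit(0);                    # exit without running END blocks
    };
    for (1 .. $n) {
        my $pid = fork();
        die "fork failed: $!" unless defined $pid;
        if ($pid == 0) {
            sleep 1 while 1;                # child idles until signalled
        }
        $children{$pid} = 1;
    }
    return \%children;
}

# Hypothetical helper: signal all children, then block until each is reaped.
sub close_children {
    my ($children) = @_;
    for my $pid (keys %$children) {
        kill('TERM', $pid) || $! == ESRCH
            or warn "Error sending SIGTERM to child [$pid]: $!\n";
    }
    my %status;
    while (%$children) {
        my $pid = (keys %$children)[0];     # pick one explicitly
        my $got = waitpid($pid, 0);         # blocking wait, unlike WNOHANG
        $status{$pid} = $got == $pid ? $? : undef;
        delete $children->{$pid};
    }
    return \%status;
}

my $children = spawn_children(3);
my $status   = close_children($children);
for my $pid (sort { $a <=> $b } keys %$status) {
    printf "child %d: wait status %s\n", $pid,
        defined $status->{$pid} ? sprintf('%04x', $status->{$pid}) : 'unknown';
}
```

Because close_children only returns once the children hash is empty, a
restart script that waits for the master to exit can no longer overlap with
the previous generation's children.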


Try the following patch to Net::Server 0.99:

--- Net/Server.pm~	2010-07-09 16:55:31.000000000 +0200
+++ Net/Server.pm	2011-04-05 21:47:18.318431988 +0200
@@ -32,4 +32,5 @@
 use Fcntl ();
 use FileHandle;
+use Errno qw(ESRCH);
 use Net::Server::Proto ();
 use Net::Server::Daemonize qw(check_pid_file create_pid_file
@@ -1098,14 +1099,38 @@
 
   foreach my $pid (keys %{ $prop->{children} }) {
-    ### if it is killable, kill it
-    if( ! defined($pid) || kill(15,$pid) || ! kill(0,$pid) ){
-      $self->delete_child( $pid );
-    }
-
+    kill(15,$pid) || $! == ESRCH
+      or $self->log(1, "Error sending SIGTERM to a child process [$pid]: $!");
   }
 
   ### need to wait off the children
-  ### eventually this should probably use &check_sigs
-  1 while waitpid(-1, POSIX::WNOHANG()) > 0;
+  my $process_group;
+  $process_group = $$  if $prop->{setsid};
+  if ($process_group) {
+    $self->log(3, "Close_children, waiting for %s procs in proc group [%s]",
+                  scalar keys %{ $prop->{children} }, $process_group);
+  } else {
+    $self->log(3, "Close_children, waiting for %s processes",
+                  scalar keys %{ $prop->{children} });
+  }
+  while (%{ $prop->{children} }) {
+    my($pid, $child_pid);
+    if ($process_group) {
+      $child_pid = waitpid(-$process_group, 0); # any child of a process group
+    } else {
+      $pid = (keys %{ $prop->{children} })[0];  # pick one explicitly
+      $child_pid = waitpid($pid,0);
+    }
+    my $child_status = $?;
+    if (!$child_pid || $child_pid == -1) {
+      $self->log(1, "Unable to obtain exit status of a child process [$pid]");
+      $child_pid = $pid;
+    } elsif (!$child_status) {
+      $self->log(3, "Child process [$child_pid] terminated with success");
+    } else {
+      $self->log(1, "Child process [%s] terminated with status %04x",
+                 $child_pid, $child_status);
+    }
+    $self->delete_child($child_pid)  if $child_pid;
+  }
 
 }
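
The patch logs a non-zero child status as a raw hex value ("terminated with
status %04x"). A small sketch for reading such log lines, assuming the
standard layout of Perl's $? as documented in perlvar: exit code in the high
byte, signal number in the low 7 bits, and 0x80 set when a core was dumped.
describe_wait_status is a hypothetical helper, not part of Net::Server.

```perl
use strict;
use warnings;

# Hypothetical helper: decode a wait status as logged with "%04x".
sub describe_wait_status {
    my ($status) = @_;
    my $signal = $status & 127;   # low 7 bits: terminating signal, if any
    my $core   = $status & 128;   # bit 7: core dump flag
    my $exit   = $status >> 8;    # high byte: exit code
    # (A value of 127 would mean "stopped"; a plain blocking waitpid
    # does not report stopped children, so it is not handled here.)
    return sprintf("%04x: killed by signal %d%s",
                   $status, $signal, $core ? " (core dumped)" : "")
        if $signal;
    return sprintf("%04x: exited with code %d", $status, $exit);
}

print describe_wait_status($_), "\n" for 0x0000, 0x0100, 0x000f, 0x008b;
```

For example, a child killed by SIGTERM is logged as 000f, while one that
called exit(1) is logged as 0100.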



Mark

