Would it be reasonable for fatal_exit() to dump core?

Thu Oct 17 14:16:54 UTC 2024

Yes, I agree with that.

We have Restart=on-abnormal in our systemd unit for unbound. If it hits
a runtime error that is not recoverable, systemd makes sure a new
instance is started, hopefully restoring it to a working state. On the
other hand, if I just make a typo in the configuration file, that is a
permanent failure, and not the kind of issue a restart is likely to fix.
That is why I do not like Restart=on-failure for services.

It would indeed be great if the described errors could lead to
fatal_abort(). Of course, the primary issue behind them should be fixed
too. But the described error is the kind where a restart should fix the
issue. On Fedora, at least, services commonly do not create core dumps
by default, but that can be changed. Even without a core dump, it is
useful when a runtime error terminates the process differently from a
configuration error. abort() does that: the process is killed by SIGABRT
instead of exiting with status 1, and systemd treats that as an abnormal
termination, so Restart=on-abnormal restarts the service.
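A minimal sketch of the split discussed here, in C, under a few assumptions: fatal_abort() is hypothetical (the thread only proposes it), both functions log to stderr instead of going through unbound's own logging, and restart_on_abnormal() is an illustrative, simplified model of systemd's Restart=on-abnormal that ignores timeouts and the watchdog. wait_status_of() is a hypothetical demo helper. The point it shows is what a supervisor sees: exit(1) ends the process with exit status 1, which Restart=on-abnormal ignores, while abort() ends it with SIGABRT, which triggers a restart (and a core dump, if core dumps are enabled).

```c
#include <assert.h>
#include <signal.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Shared logging; the real daemon would use its own logger. */
static void log_fatal(const char *format, va_list args)
{
    fputs("fatal error: ", stderr);
    vfprintf(stderr, format, args);
    fputc('\n', stderr);
}

/* Obvious, non-recoverable errors such as a config typo: exit(1). */
void fatal_exit(const char *format, ...)
{
    va_list args;
    va_start(args, format);
    log_fatal(format, args);
    va_end(args);
    exit(1);
}

/* Hypothetical: unexpected runtime errors. abort() raises SIGABRT, so
 * the process ends by a signal (possibly leaving a core dump, depending
 * on resource limits) instead of with an exit status. */
void fatal_abort(const char *format, ...)
{
    va_list args;
    va_start(args, format);
    log_fatal(format, args);
    va_end(args);
    abort();
}

/* Simplified model of Restart=on-abnormal: restart only if the process
 * was killed by a signal other than the "clean" ones (SIGHUP, SIGINT,
 * SIGTERM, SIGPIPE). A non-zero exit status alone never restarts. */
bool restart_on_abnormal(int wstatus)
{
    if (!WIFSIGNALED(wstatus))
        return false;
    int sig = WTERMSIG(wstatus);
    return sig != SIGHUP && sig != SIGINT && sig != SIGTERM && sig != SIGPIPE;
}

/* Demo helper: call fatal(msg) in a child and return its wait status. */
int wait_status_of(void (*fatal)(const char *, ...), const char *msg)
{
    int wstatus = 0;
    fflush(NULL);              /* don't duplicate buffered output */
    pid_t pid = fork();
    assert(pid >= 0);
    if (pid == 0) {
        fatal("%s", msg);      /* never returns */
        _exit(0);              /* defensive; not reached */
    }
    pid_t done = waitpid(pid, &wstatus, 0);
    assert(done == pid);
    (void)done;
    return wstatus;
}
```

For example, wait_status_of(fatal_abort, "event_dispatch returned error -1") yields a status for which WIFSIGNALED() is true and WTERMSIG() is SIGABRT, so restart_on_abnormal() returns true. The same call with fatal_exit yields WEXITSTATUS() == 1 and no restart, which is the behaviour wanted for a configuration typo.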

Cheers,
Petr

On 15. 10. 24 5:33, David Pfitzner via Unbound-users wrote:
> Recently, I have had cases where unbound (v1.20.0) occasionally exits with
> a log message like:
>
>    fatal error: event_dispatch returned error -1, errno is Bad file descriptor
>
> If that is a known issue I would be interested to hear, but that is not
> actually my main point, which is: In this case it is not clear (at least to
> me) what the detailed cause of the fatal error was, and so I think it would
> be useful if unbound would generate a core file in cases like this, as that
> might help to understand the problem. That is, for the fatal_exit()
> function (which generates the above message) to call abort() rather than
> exit(1). So my question is, would it be reasonable to modify fatal_exit()
> to do that?
>
> I could imagine possibly not, because fatal_exit() may be called in a lot
> of cases, including, for example, bad configuration, and in many of those
> cases the cause of the error may be immediately obvious, so a core file
> could be considered superfluous.
>
> Or, would such a feature be more palatable if it was enabled by some sort
> of global config option or command-line parameter etc?
>
> Alternatively, one could change just the code (in comm_base_dispatch())
> which calls fatal_exit() with the above message, so that it dumps core
> instead of calling fatal_exit(). But then I would worry that other calls to
> fatal_exit() may have a similar problem in future.
>
> Or, maybe there should be two functions, eg fatal_exit() and fatal_abort(),
> and cases where the cause could be unclear could use the latter one.
>
> Any thoughts?
>
> I have applied a local change to make fatal_exit() dump core for me, but
> was wondering whether something like that could be applied upstream.
>
> Regards,
> David Pfitzner
>
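The global-option alternative from the quoted message could be sketched roughly as below. The option name "fatal-dump-core", the variable fatal_dump_core and the functions fatal_apply_config() and fatal_will_abort() are all hypothetical; nothing like them is claimed to exist in unbound. One consequence of this design is that errors found while the configuration is still being parsed happen before the switch is known, so they fall back to exit(1), which matches the wish to keep config typos from dumping core.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical switch, e.g. set from "fatal-dump-core: yes" or a
 * command-line flag. Off until the configuration has been read. */
static bool fatal_dump_core = false;

/* Hypothetical: called once after the configuration parsed successfully. */
void fatal_apply_config(bool dump_core)
{
    fatal_dump_core = dump_core;
}

/* Reports which way fatal_exit() would terminate right now. */
bool fatal_will_abort(void)
{
    return fatal_dump_core;
}

void fatal_exit(const char *format, ...)
{
    assert(format != NULL);
    va_list args;
    va_start(args, format);
    fputs("fatal error: ", stderr);
    vfprintf(stderr, format, args);
    fputc('\n', stderr);
    va_end(args);
    if (fatal_will_abort())
        abort();   /* SIGABRT; core dump if enabled */
    exit(1);       /* default, also used for errors before config is read */
}
```

Compared with separate fatal_exit() and fatal_abort() functions, this keeps every call site unchanged, at the cost of not distinguishing obvious errors from unclear ones once the option is on.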
-- 
Petr Menšík
Software Engineer, RHEL
Red Hat, http://www.redhat.com/
PGP: DFCF908DB7C87E8E529925BC4931CA5B6C9FC5CB