During the OCSP renewal process lots of things can go wrong. Some errors are recoverable, others can be ignored, and still others can be caused by temporary issues, e.g. a service interruption of the OCSP server in question. So extensive error handling is done to keep the daemon's threads running.
The following is an overview of what can be expected when exceptions occur.
| Exception | Origin | Situation | Handling |
|---|---|---|---|
| IOError/OSError | certfinder | Directory can’t be read. | Ignore, certfinder will try again at every refresh. |
| CertFileAccessError | certfinder | Certificate file can’t be read. | Schedule a retry 3x after n*60 s, then 3x every hour, then ignore. |
| CertParsingError | certparser | Can’t access the certificate file, it doesn’t parse, or part of the chain is missing. | Ignore, certfinder will try again at every refresh. |
| OCSPBadResponse | ocsprenewer | The response is empty or invalid, or the status is not “good”. | Schedule a retry 3x after n*60 s, then 3x every hour, then twice a day indefinitely. If it’s not a server issue, wait for the file to change. |
| urllib.error.URLError | ocsprenewer | An OCSP URL can’t be opened. | We can try again later, maybe there is a server-side issue. Some certificates contain multiple URLs, so we try each one at 10-second intervals and then start from the first again. Schedule a retry 3x after n*60 s, then 3x every hour, then twice a day. |
| requests.exceptions.Timeout | ocsprenewer | Data didn’t reach us within the expected time frame. | As for urllib.error.URLError. |
| requests.exceptions.ConnectTimeout | ocsprenewer | A connection can’t be established because the server doesn’t reply within the expected time frame. | As for urllib.error.URLError. |
| requests.exceptions.TooManyRedirects | ocsprenewer | The OCSP server redirects us too many times. The limit is quite high, so something is probably wrong with the OCSP server. | As for urllib.error.URLError. |
| requests.exceptions.HTTPError | ocsprenewer | An HTTP error code was returned, either a 4xx or a 5xx status code. | As for urllib.error.URLError. |
| requests.exceptions.ConnectionError | ocsprenewer | A connection to the OCSP server can’t be established. | As for urllib.error.URLError. |
| SocketError | ocspadder | A HAProxy socket can’t be opened. | Log a critical error. Every “send” action will try to re-open the socket. |
| BrokenPipeError | ocspadder | A HAProxy socket consistently has a broken pipe. | As for SocketError. |
| OCSPAdderBadResponse | ocspadder | HAProxy does not respond with ‘OCSP Response updated!’. | Schedule a retry 3x after n*60 s, then 3x every hour, then ignore. |

Note: when a certificate file is changed, certfinder will add the file back to the parsing queue.
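The retry schedules in the table follow a common pattern. The sketch below expresses that pattern as a single function, assuming that “n” in “3x n*60s” is the attempt number (so the first three retries wait 60, 120 and 180 seconds). The names `next_retry_delay` and `forever` are hypothetical and not part of ocspd.

```python
from typing import Optional

MINUTE = 60
HOUR = 60 * MINUTE
TWICE_A_DAY = 12 * HOUR


def next_retry_delay(attempt: int, forever: bool = False) -> Optional[int]:
    """Return the delay in seconds before retry number `attempt` (1-based).

    Hypothetical helper. Retries 1-3 wait attempt * 60 s, retries 4-6 wait
    one hour. After that the task is either given up on (None) or, when
    `forever` is True, retried twice a day indefinitely.
    """
    if attempt < 1:
        raise ValueError("attempt numbers start at 1")
    if attempt <= 3:
        return attempt * MINUTE
    if attempt <= 6:
        return HOUR
    return TWICE_A_DAY if forever else None
```

Under this reading, CertFileAccessError and OCSPAdderBadResponse would use `forever=False` (“then ignore”), while OCSPBadResponse and the network errors would use `forever=True` (“then twice a day”).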
This module holds the application specific exceptions.
Gets raised when an OCSP staple is not valid.
Gets raised when an OCSP renewal is run while not all requirements are met.
Gets raised by the OCSPAdder when it is impossible to connect to or use its socket.
Gets raised when HAProxy does not respond with “OCSP Response updated”.
Gets raised when a file can’t be accessed at all.
Gets raised when something went wrong while parsing the certificate file.
Gets raised when something went wrong while validating the certificate chain.
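A minimal sketch of how such an exceptions module can be laid out, assuming one shared base class. The base class name `OCSPDError` is hypothetical, and pairing the docstrings above with the class names from the error-handling table is our own reading; the two exceptions whose names the text does not give are left out.

```python
class OCSPDError(Exception):
    """Hypothetical common base class for all application-specific errors."""


class OCSPBadResponse(OCSPDError):
    """Gets raised when an OCSP staple is not valid."""


class SocketError(OCSPDError):
    """Gets raised by the OCSPAdder when its socket can't be used."""


class OCSPAdderBadResponse(OCSPDError):
    """Gets raised when HAProxy does not respond with "OCSP Response updated"."""


class CertFileAccessError(OCSPDError):
    """Gets raised when a file can't be accessed at all."""


class CertParsingError(OCSPDError):
    """Gets raised when something went wrong while parsing the certificate file."""
```

A shared base class lets an error handler tell the application's own, anticipated errors apart from genuinely unexpected ones with a single `except OCSPDError` clause.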
This module defines a context in which we can run actions that are likely to fail because they have intricate dependencies, e.g. network connections, file access, parsing certificates and validating their chains, without stopping execution of the application. Additionally, it logs these errors and, depending on the nature of the error, reschedules the task for a time by which we can reasonably expect the issue to be resolved.
It is generally considered bad practice to catch all remaining exceptions; however, this is a daemon, and we can’t afford for it to get stuck or crash. So, in the interest of staying alive, if an exception is not caught specifically, the handler will catch it, generate a stack trace and save it to a file in the current working directory. A log entry will be created explaining that there was an exception, stating the location of the stack trace dump and that the context will be dropped. It will also kindly ask the administrator to contact the developers, so the exception can be caught specifically in a future release. That will probably increase stability and might result in a retry rather than just dropping the context.
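The behaviour described above maps naturally onto a context manager. The sketch below is an illustration under stated assumptions, not ocspd’s implementation: `ActionContext`, `handle_failures`, the dump file name pattern and the fixed 60-second retry are all hypothetical, and only one known exception (a local stand-in for CertFileAccessError) is handled, to keep it short.

```python
import logging
import os
import time
import traceback
from contextlib import contextmanager

logger = logging.getLogger("ocspd-sketch")


class CertFileAccessError(Exception):
    """Stand-in for the application exception of the same name."""


class ActionContext:
    """Hypothetical action context: a task that can be rescheduled or dropped."""

    def __init__(self, name):
        self.name = name
        self.attempt = 0
        self.next_run = None  # None: nothing scheduled, the context is dropped

    def reschedule(self, delay):
        self.attempt += 1
        self.next_run = time.time() + delay


@contextmanager
def handle_failures(ctx, dump_dir="."):
    """Run the body of a `with` block without letting exceptions escape."""
    try:
        yield ctx
    except CertFileAccessError as exc:
        # A known, probably temporary problem: log it and retry later.
        logger.error("Can't access %s (%s), retrying in 60 s.", ctx.name, exc)
        ctx.reschedule(60)
    except Exception:
        # Anything else: dump the stack trace to a file and drop the context.
        # KeyboardInterrupt and SystemExit don't derive from Exception,
        # so they still propagate and the daemon can be stopped.
        trace = traceback.format_exc()
        path = os.path.join(dump_dir, "traceback-%d.log" % int(time.time()))
        try:
            with open(path, "w") as dump:
                dump.write(trace)
        except OSError as err:  # IOError is an alias of OSError in Python 3
            logger.critical("Could not dump stack trace to %s: %s", path, err)
        else:
            logger.critical(
                "Uncaught exception while handling %s, stack trace dumped "
                "to %s. The context is dropped; please report this to the "
                "developers.", ctx.name, path)
        ctx.next_run = None
```

A worker thread would wrap each action, e.g. `with handle_failures(ctx): parse(ctx.name)`, where `parse` is a placeholder. Because the generator swallows the exception, `contextmanager` suppresses it and the thread carries on with its next task.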
Dropping the context effectively means that a retry won’t occur, and since the
context will have no more references, it will be garbage collected.
There is, however, still a reference to the certificate model in
core.daemon.run.models. With no scheduled actions it will just sit idle until
the finder detects that the certificate file has either been removed, which
causes the entry in core.daemon.run.models to be deleted, or been changed.
If the certificate file is changed, the finder will schedule a parsing action
for it and it will be picked up again. Hopefully the issue that caused the
uncaught exception will have been resolved; if not, it will be caught again
and the cycle continues.
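The finder’s side of this cycle can be sketched as a reconciliation step run at every refresh. This is a simplified, hypothetical version: `refresh` and `schedule_parse` are illustrative names, and instead of certificate model objects the `models` dict here only stores each file’s last seen modification time.

```python
import os


def refresh(models, paths, schedule_parse):
    """Reconcile `models` with the certificate files found on disk.

    models: dict mapping file path -> last seen modification time
    paths: iterable of certificate paths the finder found this round
    schedule_parse: callback that queues a parsing action for a path
    """
    seen = set()
    for path in paths:
        try:
            mtime = os.stat(path).st_mtime
        except FileNotFoundError:
            continue  # removed between listing and stat: treat as gone
        seen.add(path)
        if models.get(path) != mtime:
            # New or changed file: queue it for parsing, which gives an
            # idle model (whose context was dropped) a new chance.
            models[path] = mtime
            schedule_parse(path)
    # Files that disappeared: forget their models.
    for path in list(models):
        if path not in seen:
            del models[path]
```

An unchanged file whose last context was dropped is left alone, exactly like the idle model described above, until its modification time changes or it disappears.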
This is a global variable that is overridden by ocspd.__main__ with the command line argument:
Handle lots of potential errors and reschedule failed action contexts.
When something bad happens, sometimes it is good to delete a related bad OCSP file so it can’t be served any more.
Check that HAProxy doesn’t cache it; it probably does, so we need to be able to tell HAProxy to forget it.
Examine the last exception and dump a stack trace to a file. If that fails due to an IOError or OSError, log the failure so that a sysadmin can make the directory writeable.