Class PhiAccrualFailureDetector

java.lang.Object
org.apache.pekko.remote.PhiAccrualFailureDetector
All Implemented Interfaces:
FailureDetector, FailureDetectorWithAddress

public class PhiAccrualFailureDetector extends Object implements FailureDetector, FailureDetectorWithAddress
Implementation of 'The Phi Accrual Failure Detector' by Hayashibara et al. as defined in their paper: [https://oneofus.la/have-emacs-will-hack/files/HDY04.pdf]

The suspicion level of failure is given by a value called φ (phi). The basic idea of the φ failure detector is to express the value of φ on a scale that is dynamically adjusted to reflect current network conditions. A configurable threshold is used to decide whether the current value of φ is considered to indicate a failure.

The value of φ is calculated as:


 φ = -log10(1 - F(timeSinceLastHeartbeat))
 
where F is the cumulative distribution function of a normal distribution with mean and standard deviation estimated from historical heartbeat inter-arrival times.
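The formula above can be sketched in a few lines of Java. This is a minimal illustration, not the Pekko source: the class name PhiSketch, its parameters, and the logistic approximation of the normal cumulative distribution function are assumptions chosen to keep the example free of dependencies.

```java
// A minimal sketch, not the Pekko source. The logistic approximation of the
// normal CDF used here is an assumption made to keep the example dependency-free.
final class PhiSketch {

    /** Approximates phi = -log10(1 - F(timeDiff)) for a normal distribution F. */
    static double phi(double timeDiffMillis, double meanMillis, double stdDevMillis) {
        double y = (timeDiffMillis - meanMillis) / stdDevMillis;
        // Logistic approximation: F(y) is roughly 1 / (1 + e).
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        if (timeDiffMillis > meanMillis) {
            // 1 - F = e / (1 + e); this form stays precise when e is tiny.
            return -Math.log10(e / (1.0 + e));
        } else {
            // Same quantity, written to avoid inf / inf when e overflows.
            return -Math.log10(1.0 - 1.0 / (1.0 + e));
        }
    }

    public static void main(String[] args) {
        // Hypothetical history: mean inter-arrival 1000 ms, standard deviation 100 ms.
        for (long elapsed : new long[] {1000, 1200, 1500}) {
            System.out.printf("elapsed=%d ms -> phi=%.3f%n", elapsed, phi(elapsed, 1000, 100));
        }
    }
}
```

When the elapsed time equals the mean, F is 0.5 and φ is log10(2), roughly 0.3. φ then grows quickly as the silence extends beyond the mean, which is why a single threshold works across different network conditions.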

param: threshold A low threshold is prone to generate many wrong suspicions but ensures quick detection in the event of a real crash. Conversely, a high threshold generates fewer mistakes but needs more time to detect actual crashes.

param: maxSampleSize Number of samples to use for calculation of mean and standard deviation of inter-arrival times.

param: minStdDeviation Minimum standard deviation to use for the normal distribution used when calculating phi. Too low a standard deviation might result in too much sensitivity to sudden, but normal, deviations in heartbeat inter-arrival times.

param: acceptableHeartbeatPause Duration corresponding to the number of potentially lost or delayed heartbeats that will be accepted before considering it to be an anomaly. This margin is important to be able to survive sudden, occasional pauses in heartbeat arrivals, due to, for example, garbage collection or network drops.

param: firstHeartbeatEstimate Bootstrap the stats with heartbeats that correspond to this duration, with a rather high standard deviation (since the environment is unknown in the beginning).

param: clock The clock, returning current time in milliseconds, but can be faked for testing purposes. It is only used for measuring intervals (duration).
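To make the roles of maxSampleSize, minStdDeviation and firstHeartbeatEstimate concrete, the following sketch keeps a bounded window of inter-arrival times. The class HeartbeatWindowSketch and its methods are hypothetical names. Seeding the window with two samples spread a quarter of the estimate around it is an assumption about how a "rather high standard deviation" could be obtained, not a statement about the actual implementation.

```java
import java.util.ArrayDeque;
import java.util.Deque;

// Hypothetical helper, not part of Pekko: a bounded window of heartbeat intervals.
final class HeartbeatWindowSketch {
    private final int maxSampleSize;
    private final Deque<Long> intervals = new ArrayDeque<>();
    private long sum = 0;
    private long squaredSum = 0;

    HeartbeatWindowSketch(int maxSampleSize, long firstHeartbeatEstimateMillis) {
        if (maxSampleSize < 1) {
            throw new IllegalArgumentException("maxSampleSize must be >= 1");
        }
        this.maxSampleSize = maxSampleSize;
        // Assumption: bootstrap with two samples around the estimate so that the
        // initial standard deviation is a quarter of the estimate.
        long spread = firstHeartbeatEstimateMillis / 4;
        add(firstHeartbeatEstimateMillis - spread);
        add(firstHeartbeatEstimateMillis + spread);
    }

    /** Records one inter-arrival time, dropping the oldest once the window is full. */
    void add(long intervalMillis) {
        if (intervals.size() == maxSampleSize) {
            long oldest = intervals.removeFirst();
            sum -= oldest;
            squaredSum -= oldest * oldest;
        }
        intervals.addLast(intervalMillis);
        sum += intervalMillis;
        squaredSum += intervalMillis * intervalMillis;
    }

    double mean() {
        return (double) sum / intervals.size();
    }

    /** Standard deviation of the window, never lower than the configured minimum. */
    double stdDeviation(long minStdDeviationMillis) {
        double m = mean();
        double variance = (double) squaredSum / intervals.size() - m * m;
        return Math.max(Math.sqrt(Math.max(variance, 0.0)), minStdDeviationMillis);
    }
}
```

With a firstHeartbeatEstimate of 1000 ms, the window starts with a mean of 1000 ms and a standard deviation of 250 ms. As real intervals arrive, the bootstrap samples are pushed out and the floor given by minStdDeviation prevents a very regular heartbeat from making the detector overly sensitive.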

  • Nested Class Summary

    Nested classes/interfaces inherited from interface org.apache.pekko.remote.FailureDetector

    FailureDetector.Clock
  • Constructor Summary

    Constructors
    Constructor
    Description
    PhiAccrualFailureDetector(double threshold, int maxSampleSize, scala.concurrent.duration.FiniteDuration minStdDeviation, scala.concurrent.duration.FiniteDuration acceptableHeartbeatPause, scala.concurrent.duration.FiniteDuration firstHeartbeatEstimate, FailureDetector.Clock clock)
    Constructor without eventStream to support backwards compatibility
    PhiAccrualFailureDetector(double threshold, int maxSampleSize, scala.concurrent.duration.FiniteDuration minStdDeviation, scala.concurrent.duration.FiniteDuration acceptableHeartbeatPause, scala.concurrent.duration.FiniteDuration firstHeartbeatEstimate, scala.Option<EventStream> eventStream, FailureDetector.Clock clock)
     
    PhiAccrualFailureDetector(com.typesafe.config.Config config, EventStream ev)
    Constructor that reads parameters from config.
  • Method Summary

    Modifier and Type
    Method
    Description
    scala.concurrent.duration.FiniteDuration
    acceptableHeartbeatPause()
     
    scala.concurrent.duration.FiniteDuration
    firstHeartbeatEstimate()
     
    final void
    heartbeat()
    Notifies the FailureDetector that a heartbeat arrived from the monitored resource.
    boolean
    isAvailable()
    Returns true if the resource is considered to be up and healthy and returns false otherwise.
    boolean
    isMonitoring()
    Returns true if the failure detector has received any heartbeats and started monitoring of the resource.
    int
    maxSampleSize()
     
    scala.concurrent.duration.FiniteDuration
    minStdDeviation()
     
    double
    phi()
    The suspicion level of the accrual failure detector.
    protected org.apache.pekko.remote.HeartbeatHistory
    recordInterval(long interval)
     
    void
    setAddress(String addr)
    Address of observed host will be set after detector creation.
    double
    threshold()
     

    Methods inherited from class java.lang.Object

    clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
  • Constructor Details

    • PhiAccrualFailureDetector

      public PhiAccrualFailureDetector(double threshold, int maxSampleSize, scala.concurrent.duration.FiniteDuration minStdDeviation, scala.concurrent.duration.FiniteDuration acceptableHeartbeatPause, scala.concurrent.duration.FiniteDuration firstHeartbeatEstimate, scala.Option<EventStream> eventStream, FailureDetector.Clock clock)
    • PhiAccrualFailureDetector

      public PhiAccrualFailureDetector(double threshold, int maxSampleSize, scala.concurrent.duration.FiniteDuration minStdDeviation, scala.concurrent.duration.FiniteDuration acceptableHeartbeatPause, scala.concurrent.duration.FiniteDuration firstHeartbeatEstimate, FailureDetector.Clock clock)
      Constructor without eventStream to support backwards compatibility
    • PhiAccrualFailureDetector

      public PhiAccrualFailureDetector(com.typesafe.config.Config config, EventStream ev)
      Constructor that reads parameters from config. Expects config properties named threshold, max-sample-size, min-std-deviation, acceptable-heartbeat-pause and heartbeat-interval.
  • Method Details

    • acceptableHeartbeatPause

      public scala.concurrent.duration.FiniteDuration acceptableHeartbeatPause()
    • firstHeartbeatEstimate

      public scala.concurrent.duration.FiniteDuration firstHeartbeatEstimate()
    • heartbeat

      public final void heartbeat()
      Description copied from interface: FailureDetector
      Notifies the FailureDetector that a heartbeat arrived from the monitored resource. This causes the FailureDetector to update its state.
      Specified by:
      heartbeat in interface FailureDetector
    • isAvailable

      public boolean isAvailable()
      Description copied from interface: FailureDetector
      Returns true if the resource is considered to be up and healthy and returns false otherwise.
      Specified by:
      isAvailable in interface FailureDetector
    • isMonitoring

      public boolean isMonitoring()
      Description copied from interface: FailureDetector
      Returns true if the failure detector has received any heartbeats and started monitoring of the resource.
      Specified by:
      isMonitoring in interface FailureDetector
    • maxSampleSize

      public int maxSampleSize()
    • minStdDeviation

      public scala.concurrent.duration.FiniteDuration minStdDeviation()
    • phi

      public double phi()
      The suspicion level of the accrual failure detector.

      If a connection does not have any records in the failure detector, it is considered healthy.

    • recordInterval

      protected org.apache.pekko.remote.HeartbeatHistory recordInterval(long interval)
    • setAddress

      public void setAddress(String addr)
      Description copied from interface: FailureDetectorWithAddress
      Address of observed host will be set after detector creation.
      Specified by:
      setAddress in interface FailureDetectorWithAddress
    • threshold

      public double threshold()
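
How heartbeat, isMonitoring, phi, isAvailable and threshold fit together can be sketched with a fake clock, as the clock parameter suggests for testing. AvailabilitySketch is a hypothetical class, not the Pekko implementation. It assumes a fixed mean and standard deviation for brevity, assumes that acceptableHeartbeatPause is added to the mean before φ is computed, and reuses a logistic approximation of the normal CDF.

```java
import java.util.function.LongSupplier;

// Hypothetical sketch, not the Pekko implementation: shows how the availability
// decision could follow from phi, the threshold and an injectable clock.
final class AvailabilitySketch {
    private final double threshold;
    private final long acceptablePauseMillis;
    private final double meanMillis;   // assumed fixed here; normally estimated from history
    private final double stdDevMillis; // assumed fixed here; normally estimated from history
    private final LongSupplier clock;  // returns the current time in milliseconds
    private Long lastHeartbeatMillis = null;

    AvailabilitySketch(double threshold, long acceptablePauseMillis,
                       double meanMillis, double stdDevMillis, LongSupplier clock) {
        this.threshold = threshold;
        this.acceptablePauseMillis = acceptablePauseMillis;
        this.meanMillis = meanMillis;
        this.stdDevMillis = stdDevMillis;
        this.clock = clock;
    }

    /** Records that a heartbeat arrived now. */
    void heartbeat() {
        lastHeartbeatMillis = clock.getAsLong();
    }

    boolean isMonitoring() {
        return lastHeartbeatMillis != null;
    }

    double phi() {
        if (lastHeartbeatMillis == null) {
            return 0.0; // no records yet: treated as healthy
        }
        double timeDiff = clock.getAsLong() - lastHeartbeatMillis;
        // Assumption: the acceptable pause shifts the expected arrival time.
        double shiftedMean = meanMillis + acceptablePauseMillis;
        double y = (timeDiff - shiftedMean) / stdDevMillis;
        double e = Math.exp(-y * (1.5976 + 0.070566 * y * y));
        return timeDiff > shiftedMean
                ? -Math.log10(e / (1.0 + e))
                : -Math.log10(1.0 - 1.0 / (1.0 + e));
    }

    boolean isAvailable() {
        return phi() < threshold;
    }
}
```

A test can drive the clock by hand, for example with a one-element long array captured by the lambda `() -> now[0]`. In this sketch, a short silence keeps the resource available, a silence far beyond the mean plus the acceptable pause makes it unavailable, and a new heartbeat restores availability.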