Krouwer Consulting
Building and quantifying fault trees – an example

This example will show how a fault tree helps in completing a FMEA. The example will also demonstrate some quantification.

Introduction and starting point
FMEA is a “bottom-up” approach and a fault tree is a “top-down” approach. Both approaches are useful. A difficulty with FMEA is that the entries form part of a table and, unlike a fault tree, a table cannot express much of the structure among its entries.

This example is an hCG blood test. The component being investigated is the reagent. The starting point for this section of the FMEA is:

Failure mode – outlier result

The questions are:

What is the failure effect?

This gives the following FMEA fragment:
A corresponding fault tree fragment is:

Outlier result
    OR HAMA interference in assay

Assume that this part of the FMEA is concerned with potential harm to the patient. There could be other effects of outliers too, such as customer complaints. An outlier result by itself does not inform one about severity with respect to patient harm, because patients are not directly connected to the assay. To assess the importance of the outlier, one must know what happens with outlier results. There are many possibilities. Consider one in which the hCG value is falsely elevated, leading to a diagnosis of and treatment for trophoblastic carcinoma when the patient does not have this condition. The fault tree now looks like this:

Error – Patient harm
    OR – Outlier result
        OR HAMA interference in assay

In the VA scheme for severity (1), this would be severity 3 (severe injury, but not death).

What about the frequency of occurrence of the root cause? In this case, “HAMA interference” is considered a discrete event: the assay either interferes or doesn’t (due to its design and formulation). This assay interferes, so in principle the frequency of occurrence is “always”! But this implies that every hCG sample assayed results in an outlier, which is not the case. What’s missing is that for an outlier to occur, the patient sample must contain human anti-mouse antibodies in sufficient quantity. So now the fault tree looks like the one below. Note that there are two AND events and that the original root cause, HAMA interference in the assay, has been changed from an OR to an AND gate. Both AND events have to occur for an outlier to result. Assume that 1% of patient samples have human anti-mouse antibodies. This gives a frequency of occurrence for outliers of 1%.

Error – Patient harm
    OR – Outlier result: freq. 1%
        AND Human anti-mouse antibodies in patient sample: freq. 1%
        AND HAMA interference in assay: freq. 100%

However, in a real lab, results are reviewed before they are reported, so this step must be added. The result review can be considered an error, detection, recovery scheme.
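Before the review step is added, the AND-gate quantification of the outlier frequency can be checked with a short sketch. This is only an illustration: it assumes the two input events are independent, and the function name `and_gate` is hypothetical, not part of any fault tree package.

```python
from math import prod


def and_gate(*probabilities: float) -> float:
    """Probability that all inputs occur together, assuming independence."""
    return prod(probabilities)


# Values taken from the example in the text.
p_hama_in_sample = 0.01    # 1% of patient samples contain human anti-mouse antibodies
p_assay_interferes = 1.0   # the assay design always interferes when HAMA is present

p_outlier = and_gate(p_hama_in_sample, p_assay_interferes)
print(f"Outlier frequency: {p_outlier:.1%}")
```

Because the assay term is 100%, the outlier frequency is driven entirely by how often HAMA is present in patient samples.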
HAMA assay interference is a known problem with immunoassays, and there are methods to detect it, which may be performed for certain assay results according to the lab’s rules. Assume that detection is successful 75% of the time (in detecting errors due to HAMA interference). Recovery means that the assay will be repeated to eliminate the interference and the new result reported to the clinician. Be aware that recovery is not always 100% effective; it can fail. Assume that in this case it is 99% effective. What is the outlier rate, given these assumptions? An Excel routine to automate these calculations has been programmed and is available here. In this case,

Error – Patient harm
    OR – Outlier result reported: freq. 26 per year
        OR – Result review fails: EDR sequence*
            OR – Outlier result
                AND Human anti-mouse antibodies in patient sample
                AND HAMA interference in assay

*EDR = error, detection, recovery

One still has to take into account two more things: 1) the outlier result must fall into a specific region of a (Parkes-type) error grid (2) to cause this level of patient harm, and 2) the clinician must act on the incorrect result. The error grid means that outliers (e.g., large errors) that don’t cross medical decision limits are not as dangerous as errors that do cross medical decision limits. In addition, the clinician has the opportunity to question the result and (for any reason) not act on it. If this happens, there may be no patient harm, and in any case the outlier is not involved. Assume that 5% of outliers fall into the dangerous region and that the clinician acts on the incorrect result 50% of the time:

26 x (outlier percent in dangerous region) x (percent clinician acts on incorrect result) = 26 x (5%) x (50%) = 0.65 per year

This gives as the final fault tree for this cause and effect:

Error – Patient harm (1) frequency of occurrence ~= slightly more than once in two years
    AND – Clinician acts on incorrect result (2)
    AND – Outlier falls into dangerous region of error grid (3)
    AND – Outlier result reported (4)
        OR – Result review fails: EDR sequence* (5)
            OR – Outlier result (6)
                AND Human anti-mouse antibodies in patient sample (7)
                AND HAMA interference in assay (8)
                OR – Other causes

*EDR = error, detection, recovery

Note that there are other possible causes for the outlier to occur (the bottommost OR gate), which would raise the frequency of this type of patient harm, but these causes are distinct from HAMA interference. Also, the original question of the frequency of occurrence of the root cause is being addressed by the frequency of occurrence of the effect of the root cause. Thus, the fault tree has helped to inform the FMEA.
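The chain of numbers above can be reproduced with a minimal sketch. Two things here are assumptions rather than statements from the text: the lab's annual test volume is not given, so `samples_per_year = 10_000` is chosen only because it reproduces the quoted figure of about 26 reported outliers per year; and the review is modeled as succeeding only when detection and recovery both succeed, treated as independent. All function and variable names are hypothetical.

```python
def edr_failure_probability(p_detect: float, p_recover: float) -> float:
    """Probability that an error survives review: detection or recovery fails."""
    return 1.0 - p_detect * p_recover


# Values stated in the text.
p_outlier = 0.01           # outlier frequency from the AND gate
p_detect = 0.75            # detection succeeds 75% of the time
p_recover = 0.99           # recovery is 99% effective
p_dangerous_region = 0.05  # outliers falling into the dangerous error grid region
p_clinician_acts = 0.50    # clinician acts on the incorrect result

# ASSUMPTION: annual volume is not stated in the text; 10,000 samples per year
# is used here only because it yields roughly 26 reported outliers per year.
samples_per_year = 10_000

p_review_fails = edr_failure_probability(p_detect, p_recover)
reported_per_year = samples_per_year * p_outlier * p_review_fails
harm_per_year = reported_per_year * p_dangerous_region * p_clinician_acts

print(f"Review fails: {p_review_fails:.2%}")
print(f"Outliers reported per year: {reported_per_year:.2f}")
print(f"Patient harm events per year: {harm_per_year:.2f}")
```

Under these assumptions the unrounded values are 25.75 reported outliers and about 0.64 harm events per year; the text rounds the first figure to 26 before multiplying, which gives 0.65.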
An outlier has many possible failure effects. The one studied here has a severity of 3 (serious harm to the patient) and is expected to occur slightly more than once in two years, which in the VA frequency scheme is the second-highest frequency of occurrence. It’s hard to imagine this level of analysis with only a FMEA table.
Further discussion

This fault tree could still be considered simplified, and of course all of the numbers have been made up, but note that 12 cases have been reported recently in which unnecessary treatment was carried out due to incorrect hCG results caused by HAMA interference (3). Quantifying an entire fault tree (or a large subsection) requires algorithms that are available only in advanced (and expensive) fault tree software. This software is warranted in these cases, provided that one has good input data.

The fault tree helps to suggest risk mitigations. For this example, among the possible lab risk mitigations are:
A manufacturer’s risk mitigation would require an expanded fault tree, with causes listed for the HAMA interference. This also illustrates the concept of not enumerating causes when they are not relevant. That is, a lab may know possible reagent causes for HAMA interference, but if the lab must use a manufacturer’s assay without reagent modification, these causes are not relevant. Finally, note that although a risk mitigation (or initial analysis) may result in a very tiny frequency of occurrence (e.g., once in 1,000 years), it still won’t be zero.

Building fault trees using a top down approach

This example was for illustration purposes: it involved building a fault tree from a FMEA, which, while possible, is not how a fault tree is typically built. Normally, one would use a top-down approach. The end result would be the same. Thus, the main error types are:

[Figure: fault tree with top event “Lab error”]

Expanding the harm to patient:

[Figure: fault tree with top event “Lab error”]

Expanding the outlier event, with help of a process flowchart:

[Figure: fault tree with top event “Lab error”]

Continuing with this tree would give the same results as above. Note that the process flowchart does not help with all parts of the fault tree.

References