What Does the Engineer Do?
An analysis that especially occupies the developer of the overall system:
- Hazard/Risk Analysis: First, the safety level of the system, subsystem, or function must be determined. It is derived from the possible damage and the probability with which such damage occurs, in most cases based on some kind of flow diagram. Especially in aviation there are predefined safety levels for different systems. Here is a short glossary of the most common abbreviations:
| Safety Level | Range | Highest Risk Level | Industry |
|---|---|---|---|
| DAL: Design Assurance Level | E..A | A | Aviation: ARP4761, ARP4754A, DO-178C, DO-254, ... |
| SIL: Safety Integrity Level | 1..4 | 4 | Industry: IEC 61508; railway: EN 50128/50129 |
| ASIL: Automotive Safety Integrity Level | A..D | D | Automotive: ISO 26262 |
| PL: Performance Level | a..e | e | Machinery: EN ISO 13849 |
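How a safety level can be derived from severity and probability can be sketched as a simple matrix lookup. Note that the matrix, the categories, and the resulting levels below are invented for illustration; each standard defines its own classification scheme (ISO 26262, for example, additionally considers controllability):

```python
# Sketch of deriving a required safety level from damage severity and
# occurrence probability. All categories and matrix entries are invented.

SEVERITY    = ["negligible", "minor", "major", "catastrophic"]
PROBABILITY = ["improbable", "remote", "occasional", "frequent"]

# RISK_MATRIX[severity_index][probability_index] -> required level
# ("QM" meaning no special safety level is required).
RISK_MATRIX = [
    ["QM", "QM", "A", "A"],
    ["QM", "A",  "A", "B"],
    ["A",  "B",  "C", "C"],
    ["B",  "C",  "D", "D"],
]

def required_level(severity, probability):
    # Look up the level for a given severity/probability classification.
    return RISK_MATRIX[SEVERITY.index(severity)][PROBABILITY.index(probability)]

print(required_level("catastrophic", "frequent"))  # highest level
print(required_level("minor", "remote"))
```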
The other analyses then take place on each level: system, subsystem, component, function, for software as well as for hardware, sometimes with different characteristics. So they affect every developer. There are different variants; the most important are:
- Fault Tree Analysis (FTA): The FTA proceeds deductively, i.e. from the failure to the cause: what are the faults in my system that can lead to a certain failure? E.g. which components must fail so that the safety-relevant function is compromised?
This makes the method suitable for design, especially for top-down system design. The FTA exists in two variants, a purely qualitative one and a quantitative one, for which probabilities are assigned to the fault events.
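For the quantitative variant, the top-event probability can be computed by propagating basic-event probabilities through the gates. This is a minimal sketch assuming independent events; all event names and numbers are invented:

```python
# Minimal quantitative fault tree sketch: basic events carry failure
# probabilities, AND gates multiply them (all inputs must fail), OR gates
# combine them as 1 - product(1 - p) (any failing input suffices).
# Assumes independent events; all numbers are illustrative.

from math import prod

def and_gate(*probs):
    # Output event occurs only if all inputs fail.
    return prod(probs)

def or_gate(*probs):
    # Output event occurs if at least one input fails.
    return 1 - prod(1 - p for p in probs)

# Top event "safety function lost": either the sensor fails,
# or both redundant processing channels fail.
p_sensor    = 1e-4
p_channel_a = 1e-3
p_channel_b = 1e-3

p_top = or_gate(p_sensor, and_gate(p_channel_a, p_channel_b))
print(f"P(top event) = {p_top:.3e}")
```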
- Failure Modes and Effects Analysis (FMEA): In contrast, the FMEA proceeds inductively, from the cause to the failure. For each subsystem/component the question asked here is: what kind of safety-relevant failures can arise from a fault? E.g. if this component changes its value over time (i.e. it ages), how does this affect the function? If a state machine swallows a bit, how does this affect the function? The FMEA also exists in two variants, a purely qualitative and a quantitative one. For the latter, the analysis is based on fault probabilities and fault mechanisms of the components.
In the industrial and automotive areas, usually an FMEDA (Failure Modes, Effects and Diagnostic Analysis) is performed, in which a reduction of the failure rates is credited for diagnostic mechanisms (e.g. read-back of output signals).
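The FMEDA bookkeeping can be sketched as follows: a diagnostic mechanism with coverage DC reduces the dangerous failure rate λ of a failure mode to λ·(1 − DC). All failure modes, rates, and coverage values below are invented for illustration:

```python
# FMEDA-style sketch: per failure mode, the diagnostic coverage (DC) of a
# mechanism (e.g. read-back of an output signal) reduces the dangerous
# failure rate. Rates are in FIT (failures per 1e9 hours); all values
# below are invented.

failure_modes = [
    # (name, rate_fit, diagnostic_coverage)
    ("output stuck-at-high", 10.0, 0.99),  # caught by read-back
    ("output stuck-at-low",  10.0, 0.99),  # caught by read-back
    ("drift over time",       5.0, 0.60),  # partially caught by a plausibility check
]

lambda_total    = sum(rate for _, rate, _ in failure_modes)
lambda_residual = sum(rate * (1 - dc) for _, rate, dc in failure_modes)

print(f"total dangerous rate:     {lambda_total:.1f} FIT")
print(f"residual after diagnosis: {lambda_residual:.1f} FIT")
```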
Based on the safety analyses, safety measures have to be implemented to catch the discovered faults:
- against random hardware failures
- against systematic software failures
- against systematic hardware failures
These measures can comprise: plausibility checks, redundancy (i.e. several systems that check each other), diverse redundancy (redundancy based on components that are designed and developed completely differently), program-flow monitoring, error correction for memories, and many more.
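Two of these measures, a plausibility check and the cross-checking of redundant channels, can be sketched as follows. The limits, tolerances, and function names are invented for illustration:

```python
# Sketch of a plausibility check on a sensor value and a comparison of two
# redundant channels that must agree within a tolerance. All limits and
# names are invented.

SENSOR_MIN, SENSOR_MAX = 0.0, 150.0  # assumed physically plausible range
CHANNEL_TOLERANCE = 0.5              # assumed maximum allowed deviation

def plausible(value):
    # Reject values outside the physically possible range.
    return SENSOR_MIN <= value <= SENSOR_MAX

def channels_agree(value_a, value_b):
    # Two redundant channels checking each other: a deviation beyond the
    # tolerance indicates a fault in one of them.
    return abs(value_a - value_b) <= CHANNEL_TOLERANCE

def safe_value(value_a, value_b):
    # Accept the reading only if both checks pass; otherwise signal a fault.
    if plausible(value_a) and plausible(value_b) and channels_agree(value_a, value_b):
        return (value_a + value_b) / 2
    return None  # the caller must enter the safe state

print(safe_value(100.0, 100.2))  # plausible and consistent
print(safe_value(100.0, 400.0))  # channel B implausible
```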
Errors in the requirements are the most prevalent cause of failure. This is why great importance is attached to requirements in functional safety. Several aspects have to be considered:
- V-Model: The requirements must be managed according to the V-model in all industries; this means:
- There are successively more detailed requirements on each level (e.g. system, software, software unit). The extent of the requirements for each element (system, software, unit) should be such that a human can still understand them; the details are moved to the next lower level.
- Basically, all requirements are tested on each level.
- Requirements Traceability: Requirements and tests must be traceable, among other things to make sure the overall product remains maintainable:
- Vertically: it must be clear which requirements on one level are covering the more abstract requirements on the next higher level.
- Horizontally: it must be clear which tests are testing which requirements.
- Bi-directionally: it must be possible, starting from one level, to follow the relationships to all other levels.
- Traceability Coverage Analysis: Evidence must be provided that all requirements on each level exist as more detailed requirements down to the implementation and that all requirements are tested.
- "Derived" Requirements: If the architecture or design gives rise to new requirements, e.g. from interfaces between different subsystems, these are "derived" requirements. In other words, "derived" requirements are those that cannot be traced to higher levels. Such requirements must undergo a separate analysis, which must establish that they jeopardize neither the function of the superordinate element nor its safety.
- No Unintended Functionality: Another important aspect of the handling of (especially "derived") requirements and traceability is to prevent unintended functionality from being inserted into the implementation, e.g. by the programmer. This usually comes from interpretable, i.e. insufficiently precise, requirements, or from good intentions like defensive programming. Both can lead to unintended malfunctions.
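The traceability checks above can be expressed as simple set operations over link tables. In the sketch below, the requirement IDs and links are invented; in practice this data comes from a requirements management tool:

```python
# Sketch of a traceability coverage analysis over two levels and tests.
# All requirement IDs and link tables are invented for illustration.

system_reqs   = {"SYS-1", "SYS-2"}
software_reqs = {
    # software requirement -> parent system requirements it refines
    "SW-1": {"SYS-1"},
    "SW-2": {"SYS-2"},
    "SW-3": set(),  # no parent: a "derived" requirement
}
tests = {
    # test case -> software requirements it verifies
    "TC-1": {"SW-1"},
    "TC-2": {"SW-2", "SW-3"},
}

# Vertical coverage: every system requirement must be refined somewhere.
refined = set().union(*software_reqs.values())
uncovered_system = system_reqs - refined

# Horizontal coverage: every software requirement must be tested.
tested = set().union(*tests.values())
untested_software = set(software_reqs) - tested

# Derived requirements (no parent) need a separate safety analysis.
derived = {r for r, parents in software_reqs.items() if not parents}

print("uncovered system reqs:", uncovered_system)
print("untested software reqs:", untested_software)
print("derived reqs needing analysis:", derived)
```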
One important point about the V-model: it should not primarily be seen as a Gantt chart, but above all as a data management model. It maps "divide and conquer" and the relationships between the artifacts. In practice this means that one cannot get by without iterations between the levels; of course, these should be minimized as much as possible for the sake of efficiency. A sequence follows naturally, because on the lower levels of detail no specification or design can be completed while the artifacts on the higher levels are not yet stable and released. Likewise, it is impossible on the higher integration levels to finalize testing as long as not all tests on the lower levels have run completely.
Verification is often equated with testing. This is not true: tests are just a small part of verification; most of verification consists of reviews.
- Reviews: Before their release, all artifacts must be verified by a review, often even by a reviewer with precisely defined independence. For some artifacts, several reviews take place, e.g. when a quality assurance review or a review against the standard is requested.
- Checklists: Usually, a checklist exists for each artifact. Without evidence of the performed reviews, the reviews are considered not done, so the filled-in checklists must be filed as review results.
- Tests: There are test specifications, test instructions, possibly test code, and, here again, evidence of all test results. The tests must be requirements-based; among other things, there may be no test without a corresponding requirement.
- Code Coverage Analysis: For the software, it must be demonstrated that all code is covered by the tests, including that all branches were taken. Note that it is a coverage analysis: coverage is not a test in itself, but an analysis of whether the tests satisfy some minimal quality criteria. Coverage can be demonstrated using tools for dynamic code analysis.
From the required code coverage and the requirements-based testing it follows, by the way, that it is not allowed (explicitly so in avionics with DO-178C) to write tests purely for code coverage, for which no requirements exist. So let's just generate a requirement? ...which, as a "derived" requirement, then needs a safety analysis. There must be no unintended functionality. This is why it is worthwhile to implement only what is really required.
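The principle behind dynamic coverage measurement can be illustrated with a minimal trace hook that records which lines of a function actually ran. This is only a sketch of the idea; real projects use dedicated, qualified coverage tools:

```python
# Minimal illustration of dynamic coverage measurement: a trace hook
# records executed lines of one function, then we compare against all
# lines of its body. This only demonstrates that coverage is an analysis
# of the tests, not a test in itself.

import sys

def classify(x):
    if x < 0:
        return "negative"
    return "non-negative"

executed = set()

def tracer(frame, event, arg):
    # Record only line events inside classify().
    if event == "line" and frame.f_code is classify.__code__:
        executed.add(frame.f_lineno)
    return tracer

sys.settrace(tracer)
classify(5)          # a "test" that exercises only one branch
sys.settrace(None)

first = classify.__code__.co_firstlineno
# The function body as defined above: the if, and the two returns.
body_lines = {first + 1, first + 2, first + 3}
missed = body_lines - executed
print("branch not taken -> missed lines:", missed)
```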
To ensure homogeneous quality across the overall project, standards are called for for many artifacts. These can be developed internally, but it makes dealing with external auditors easier if known standards are used, e.g. MISRA for C/C++ code.
- Requirement Standards: Those describe how requirements must be formulated, down to formatting.
- Design Standards: Clear guidelines for the design; they must cover all requirements of the applicable standards, such as no hidden data flow, hierarchical design, ...
- Coding Standards: For the software, only a safe, deterministic, and well-readable subset of the programming language shall be used. Compliance with coding standards like MISRA can be verified for the most part automatically using tools for static code analysis.
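In the spirit of such static-analysis tools, a deliberately naive check for an assumed rule banning certain C library functions could look like this (the banned list is illustrative; real checkers parse the code properly instead of scanning text):

```python
# Naive sketch of an automated coding-standard check: flag calls to banned
# C library functions in source text. The banned set is an invented,
# illustrative subset; a regex scan is only a first approximation of what
# real static-analysis tools do.

import re

BANNED = {"gets", "sprintf", "strcpy"}

def find_violations(source):
    violations = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for func in BANNED:
            # Match the function name followed by an opening parenthesis.
            if re.search(rf"\b{func}\s*\(", line):
                violations.append((lineno, func))
    return violations

c_source = """\
#include <string.h>
void copy(char *dst, const char *src) {
    strcpy(dst, src);   /* would be flagged */
}
"""
print(find_violations(c_source))
```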
For the electronics, only high-quality components should be selected. These should be available for as long as possible, so that the safety evidence does not have to be provided again and again upon component changes. In addition, good data for the calculation of failure rates would be favorable.
Unfortunately, outside of the AEC-Q qualifications for automotive, there are almost no "high reliability" parts anymore. The "standards" with numbers for failure rates have also fallen victim to the ravages of time, or rather to technological progress. And because, to my knowledge, there is no organization that collects new statistical data on the failure rates and modes of modern components, it can be very troublesome to produce a realistic analysis of the failure rate of a circuit.
No modern electronics or software development happens without software tools. Software? Is the software of all tools in the project free of errors? What happens if an error in a tool leads to an error in an artifact?
- Tool Classification: In a functional safety project this means that all tools have to be classified. It must be shown whether, and if so which, errors a tool can introduce into an artifact.
- Tool Qualification: Depending on the result of the above analysis, the tools must be qualified, i.e. it must be demonstrated that the tool, as it is used, does not introduce such errors, or that the errors can be caught.
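One common qualification approach is to run the tool on a set of reference inputs with known-correct outputs and check that it reproduces them. In the sketch below the "tool" is a trivial stand-in and all names are invented:

```python
# Sketch of tool qualification by reference cases: run the tool under
# qualification on inputs with known-correct outputs and collect any
# mismatches. The "tool" here is a trivial stand-in for e.g. a code
# generator or converter; all names are invented.

def tool_under_qualification(text):
    # Stand-in for the real tool as it is used in the project.
    return text.strip().upper()

reference_cases = [
    # (input, expected output) -- the qualification test suite
    ("  abc ", "ABC"),
    ("Safety", "SAFETY"),
]

failures = [
    (inp, expected, tool_under_qualification(inp))
    for inp, expected in reference_cases
    if tool_under_qualification(inp) != expected
]

print("tool qualified:", not failures)
```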