[chbot] Software and system safety / Therac25 (Re: Drone Delivery Service)

Charles Manning cdhmanning at gmail.com
Wed Oct 16 05:31:34 BST 2013


Hello Helmut

I for one would find such a talk/discussion very interesting.

Although there have been some great strides in software safety, where I have 
looked into things I have seen quite a bit of "baffle them with BS".

You might have heard of the Navy Procurement Office version of Archimedes' 
Principle: "The ship doesn't float until the paperwork weighs more than the 
ship." It would seem that some software practices sometimes fall in this 
category.

It is rather telling that some modern projects developed with all the 
procedures in place have still had spectacular failures.

The Patriot missile bug in 1991 killed 28 soldiers and injured about 100 
people when a Scud intercept failed at Dhahran. Unlike the Therac-25 bug 
(which required a user to input keystrokes in a certain pattern) the 
Patriot bug **always** manifests once the system runs continuously for 
long enough - the Dhahran battery had been up for around 100 hours. The 
Patriot bug was far more obvious than the Therac-25 bug, yet somehow the 
procedures and paperwork didn't catch it.
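
For anyone who hasn't seen the details: the system counted time in 
tenths of a second, and 0.1 has no exact binary representation, so the 
24-bit constant used for it was short by roughly 0.000000095 per tick. 
Here is a minimal sketch in C of how that truncation error grows with 
uptime (a sketch only, not the actual Patriot code, which used 
fixed-point integer arithmetic; the 1676 m/s Scud closing speed is the 
commonly quoted figure, not a parameter from the real system):

    /* Sketch only: 0.1 s chopped to a 24-bit binary fraction,
     * accumulated once per tick, as in the Patriot clock. */
    #include <stdio.h>

    int main(void)
    {
        /* floor(0.1 * 2^24) / 2^24 = 0.099999904632568... */
        const double tick = 1677721.0 / 16777216.0;
        const double hours[] = { 1.0, 8.0, 24.0, 100.0 };

        for (int i = 0; i < 4; i++) {
            double ticks = hours[i] * 3600.0 * 10.0;         /* 10 ticks/s */
            double drift = hours[i] * 3600.0 - ticks * tick; /* seconds */
            printf("%5.0f h uptime: drift %.4f s, tracking error ~%.0f m\n",
                   hours[i], drift, drift * 1676.0);
        }
        return 0;
    }

At 100 hours the clock is about a third of a second out, which at Scud 
speeds puts the range gate over half a kilometre from the target - no 
subtle race condition needed, just uptime.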

It is also telling that this bug killed far more people than the 
Therac-25, yet people get far less upset about it. Perhaps people became 
a lot more used to, and less fearful of, software bugs in the interim.

Ariane 5 is another spectacular product of complex procedures and 
processes, as well as of using Ada (which many people regard as a magic 
formula for safety). Indeed, it was an Ada runtime check that brought 
Ariane 5 down: a 64-bit floating-point value (the horizontal bias, in 
alignment code reused from Ariane 4) overflowed when converted to a 
16-bit signed integer, and the resulting unhandled exception shut down 
both inertial reference units.
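
In C the same narrowing conversion is undefined behaviour; in Ada the 
range check is implicit and raises an exception, which in the flight 
code went unhandled. A rough sketch of the failure mode (the numbers 
are made up for illustration - the real variable, BH, simply exceeded 
the range that held on every Ariane 4 trajectory):

    /* Sketch of the Ariane 501 failure mode: a 64-bit float
     * narrowed to a 16-bit signed integer. */
    #include <stdio.h>
    #include <stdint.h>

    static int16_t to_int16(double v)
    {
        if (v > INT16_MAX || v < INT16_MIN) {
            /* The analogue of Ada's unhandled Constraint_Error;
             * in the SRI this shut the whole computer down. */
            fprintf(stderr, "operand error: %g out of range\n", v);
            return v > 0 ? INT16_MAX : INT16_MIN; /* saturate instead */
        }
        return (int16_t)v;
    }

    int main(void)
    {
        printf("%d\n", to_int16(20000.0)); /* in range: Ariane 4 case */
        printf("%d\n", to_int16(50000.0)); /* hypothetical Ariane 5 value */
        return 0;
    }

The conversion had deliberately been left unprotected for performance 
reasons, on an analysis that was valid for Ariane 4 but was never 
revisited for Ariane 5's trajectory.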

Regards

Charles



On Tuesday 15 October 2013 22:31:21 Helmut Walle wrote:
> All very good points, Charles! Some further comments below...
>
> On 15/10/13 15:44, Charles Manning wrote:
> > [...]
> > If we have a life saving product that will save 1000 people per year,
> > but it has a bug that will kill 5 people per year then the cautionary
> > approach would be to delay release until we have ironed out the bug
> > that kills 5 people. Meanwhile we are not saving those other 1000
> > people per year, so by withholding the product we are letting people
> > die. Such delays in the name of safety can actually cause more
> > problems than they solve.
>
> Yes, there is a balance to be found here, and modern regulations and
> standards reflect this quite adequately. One concept used is the ALARP
> threshold (As Low As Reasonably Practicable), which is considered as
> part of the risk analysis; what counts as acceptable under ALARP is
> qualified in the respective standards. For medical equipment
> specifically, there is also the concept of "the benefits outweighing
> the risks".
>
> Example: using a wheelchair is a lot more hazardous than you might think
> when you see one. Now wheelchair design is somewhat constrained
> by the need to be able to manoeuvre inside buildings, which usually
> leads to a short vehicle with narrow track width. On top of that, it
> needs to have some height, so that the user can "sit" at the table like
> on a regular chair, and also to be able to reach things. The result is a
> significant tipping risk, and if you look at the accident statistics
> you will find that most fatal wheelchair accidents involve tipping the
> vehicle. Restraint systems exist, but are of limited use in an open
> vehicle like this. So these risks are well known, and they are
> consciously accepted, because the mobility benefit people get out of
> these vehicles is so great that a couple of hundred fatal accidents
> worldwide per year pale in comparison. This is sad for the people who
> die in these accidents, but it basically reflects the old observation
> that "staying in bed is a lot safer than getting up and leaving the
> house" - never mind that if you stayed in bed all day you would get
> decubitus ulcers.
>
> So to summarise this: we are not permitted to make things with
> completely arbitrary risks to the user, by using the cheap excuse that
> nothing in life is perfectly safe. Manufacturers need to analyse risks,
> and assess and evaluate them and come to the honest conclusion that
> their products are acceptable in this regard. And most of them selling
> into well-regulated markets (NZ, AU, EU, US) will do this, because if
> they don't a single incident can be enough for the respective
> authorities to close down their business.
>
> > We see a similar thing happening after Sept 11th when planes were
> > grounded in the interests of public safety and later when people chose
> > not to fly. This led to more car usage and approximately 1500 more
> > car deaths in the USA than in the previous or next year.
>
> Yes, this is different, because it is not about making any products.
> It's just a social and administrative response to a one-off incident.
> And we are all just human and as such often act irrationally.
> (Interesting point on the side regarding aviation safety - what is the
> purpose of these buoyancy vests that you find under each seat in
> passenger airliners? One more of these irrational things: if you crash
> into the water hard like AF447 everybody on board is dead before anyone
> could even say "vest"; if you manage a soft landing on water, which is
> technically quite possible, either the water is so cold that you freeze
> to death pretty quickly, or it is warm enough for big sharks...
> Although, maybe I am judging too harshly - did any of the passengers in
> Captain Sullenberger's Hudson River landing benefit from the vests?
> Probably...).
>
> > We get worried about electronics & software failing in a braking
> > system and such, but are less worried about mechanical failures (e.g.
> > broken cables) which are far more common.
>
> Yes, and there clearly is a difference between the wider public worrying
> on one side, and how a proper product or system risk analysis is
> performed. Electro-mechanical systems that incorporate a diverse range
> of technologies and parts are required to be analysed at system and
> component levels. And it is understandable to some extent that software
> is subject to intense scrutiny, because it is far less tangible or
> visible than mechanical parts.
>
> > As a species we have a very irrational way of looking at risk, and,
> > back to my main point, lawyers exploit that irrationality to make it
> > really hard to release new products.
>
> Agreed on the human factor's irrationality. Regarding the lawyers, I
> don't know. From my personal experience of working in the areas of
> medical equipment and electrical power supply and control systems, and
> also from taking a look at some whiteware regulations, I have to say
> that we now have some very good and appropriate regulations in these
> areas that also help to get the job done in an organised way. What the
> lawyers then do after a product has been released to the market is a
> different story altogether, and there are huge international
> differences. I know a European bicycle manufacturer that does not export
> any bikes to the US, because the litigation risk just isn't acceptable
> to them from a business point of view.
>
> System and software safety is an interesting topic, and it is highly
> relevant to robotics - if there is further interest in this I could give
> a talk about regulatory frameworks and how they are applied in practice
> sometime, and also present some examples of incidents as quick case
> studies. It may sound like a dry topic at first, but once it is seen in
> relation to people and how they might come to harm and should be
> protected it actually becomes much more interesting. Let me know if
> there is interest in this...
>
> Kind regards,
>
> Helmut.
>
> _______________________________________________
> Chchrobotics mailing list Chchrobotics at lists.linuxnut.co.nz
> http://lists.ourshack.com/mailman/listinfo/chchrobotics
> Mail Archives: http://lists.ourshack.com/pipermail/chchrobotics/
> Web site: http://kiwibots.org
> Meetings 3rd Monday each month at Tait Radio Communications, 175 Roydvale
> Ave, 6.30pm
>
> When replying, please edit your Subject line to reflect new content.




