RISKS-LIST: RISKS-FORUM Digest  Tuesday 24 January 1989  Volume 8 : Issue 14

        FORUM ON RISKS TO THE PUBLIC IN COMPUTERS AND RELATED SYSTEMS
   ACM Committee on Computers and Public Policy, Peter G. Neumann, moderator

Contents:
  Re: Medical Software -- testing and verification (Dave Parnas)
  NSA and the Internet (Vint Cerf)
  Re: Losing systems (Geoff Lane)
  Computing Projects that Failed (Dave Platt)
  Re: Object Oriented Programming (Benjamin Ellsworth)
  Re: Losing Systems (Henry Spencer)
  Computer Emergency Response Team (CERT) (Brian M. Clapper)
  Probability and Product Failure (Geoff Lane) [lack of independence]
  Probabilities and airplanes (Robert Colwell, Mike Olson, Dale Worley)

----------------------------------------------------------------------

Date: Mon, 23 Jan 89 07:48:13 EST
From: parnas@qucis.queensu.ca (Dave Parnas)
Subject: Re: Medical Software (Are computer risks different?) (RISKS-8.7)

In his contribution to RISKS-8.7, Jon Jacky states that the problems of
testing and verification are broadly similar whether or not the machine
includes a computer. I have heard this argument from many "old-time"
engineers and consider it quite false.

In testing conventional (analog) devices, we make use of the fact that the
functions are continuous and that one can put an upper bound on the
derivatives or the frequency spectrum. We use those mathematical properties
in deciding how many tests are required to validate a device. When digital
technology is involved, there are no such limits on the rate of change.
Further, with digital technology, the number of tests required for
black-box testing increases sharply with the lowest known upper bound on
the number of states in the device. If we do "white-box" testing, we can
reduce the number of tests required by exploiting the regularity of the
state space. In practice, that regularity is present and helpful for the
testing of hardware, but not terribly useful for software testing.

In short, the technology being used does make a big difference in testing
and validation. While I agree with Jon's statement that industry practices
in software development are often much worse than for other kinds of
technology, that is not the only explanation of our "special problem". The
technology itself is a great contributor and always will be.

David Parnas, Queen's University, Kingston Ontario
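
For a sense of the numbers, here is a minimal sketch, assuming (purely for
illustration) an analog device whose bounded bandwidth lets a finite set of
samples characterise it, versus exhaustive black-box testing of a small
digital device. The figures and function names are invented for the
example (Python):

    # Rough test-count comparison, analog vs. digital (illustrative only).

    def analog_tests(bandwidth_hz, duration_s, samples_per_cycle=2):
        """Tests needed to characterise a bandlimited continuous device:
        bounded derivatives mean a finite sampling rate suffices."""
        return int(bandwidth_hz * duration_s * samples_per_cycle)

    def digital_tests(state_bits, input_bits):
        """Exhaustive black-box tests for a digital device: every
        reachable state paired with every possible input."""
        return (2 ** state_bits) * (2 ** input_bits)

    print(analog_tests(10_000, 1.0))  # 20000 tests for a 10 kHz device
    print(digital_tests(32, 8))       # about 1.1e12 for a small machine

Even a device with only 32 bits of state and an 8-bit input would need on
the order of 10**12 exhaustive tests, while the analog bound stays in the
tens of thousands.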
------------------------------

Date: 23 Jan 1989 01:11-EST
From: CERF@A.ISI.EDU
Subject: NSA and the Internet

John Gilmore asks why NSA has 5 IMPs if they are NOT monitoring the
Internet. So far as I know, NSA does not have 5 IMPs on the Internet. It
has one, to support Dockmaster. The agency has a variety of internal
networks, of course, but none are likely to be linked to the Internet,
since they are used for classified applications for which the Internet is
not approved.

Does Mr. Gilmore have some evidence he wishes to present that suggests the
NSA is engaging in an unacceptable activity on the Internet?

Vint Cerf

------------------------------

Date: Mon, 23 Jan 89 09:34:58 GMT
From: "Geoff. Lane. Tel UK-061 275 6051"
Subject: Re: Losing systems

In my experience, the single most probable cause of a software project
failing is that the people who started the project have no real idea what
they want in the end. Almost everything else can be coped with, but when
you have to deal with a constant stream of "design changes", not even the
best people with the best equipment can succeed.

Geoff. Lane, UMRCC

------------------------------

Date: Mon, 23 Jan 89 10:28:18 PST
From: Dave Platt
Subject: Computing Projects that Failed

On the subject of computing projects that failed for one reason or another:
I recommend that interested RISKS readers look up some of Bob Glass's books
on this subject. Glass has collected quite a number of case studies,
changed the names to protect the innocent [and the guilty, too], and
organized them into categories according to the primary reason for the
failure (immature technology, wrong technology, mismanagement,
misimplementation, politics, etc.). Some of the stories are roaringly
funny... f'rinstance, the mainframe at "Cornbelt U." that survived a series
of mishaps during installation (including being watered by the University's
lawn sprinklers), only to end up destroying itself (and most of the
building) during an earthquake.

Glass has written half a dozen books on the computing industry (most of
them date back to the '70s and early '80s). The three most applicable to
RISKS issues are: "Computing Projects that Failed", "Computer Messiahs:
More Computing Projects that Failed", and "Computing Catastrophes". [I may
be off a bit in the exact wording of the titles; my copies are at home.]

Based on the recent contributions to RISKS concerning software-project
failures, it sounds to me as if most of the pitfalls that Glass wrote about
back in the '70s are alive and well in the late '80s!

Dave Platt    FIDONET: Dave Platt on 1:204/444    VOICE: (415) 493-8805
UUCP: ...!{ames,sun,uunet}!coherent!dplatt    DOMAIN: dplatt@coherent.com
USNAIL: Coherent Thought Inc., 3350 West Bayshore #205, Palo Alto CA 94303

------------------------------

Date: Mon, 23 Jan 89 13:54:59 pst
From: Benjamin Ellsworth
Subject: Re: Object Oriented Programming

Recently a professor from the local university taught a class on OOP at our
site. During the first lecture, he said that via OOP one can add
functionality to a module without changing the code. I asked incredulously,
"Without changing *any* code?" He said, "Yes." A manager at the class
sagely nodded his head.

I should hope the risks are obvious.

------------------------------

Date: Tue, 24 Jan 89 02:20:32 -0500
From: attcan!utzoo!henry@uunet.UU.NET
Subject: Re: Losing Systems

>The losing systems almost always contain some elements of newness; in fact
>on close inspection they usually contain several such elements...

To quote from John Gall's SYSTEMANTICS: "A complex system that works is
invariably found to have evolved from a simple system that worked."

So perhaps it's not so surprising that a lot of these
done-yet-again-from-scratch systems (how many different county records
systems does the world NEED?!?) fail.

Henry Spencer at U of Toronto Zoology    uunet!attcan!utzoo!henry
henry@zoo.toronto.edu

------------------------------

Date: Tue, 24 Jan 89 10:18:01 EST
From: clapper@NADC.ARPA (Brian M. Clapper)
Subject: Computer Emergency Response Team (CERT)

Excerpted from UNIX Today!, January 23, 1989 (reprinted without permission):

WASHINGTON -- The federal government's newly formed Computer Emergency
Response Team (CERT) is hoping to sign up 100 technical experts to aid in
its battle against computer viruses. CERT, formed last month by the
Department of Defense's Advanced Research Projects Agency (DARPA) ...
expects to sign volunteers from federal, military and civilian agencies to
act as advisors to users facing possible network invasion.

DARPA hopes to sign people from the National Institute of Standards and
Technology, the National Security Agency, the Software Engineering
Institute and other government-funded university laboratories, and even the
FBI.

The standing team of UNIX security experts will replace an ad hoc group
pulled together by the Pentagon last November to deal with the infection of
UNIX systems allegedly brought on by Robert Morris Jr., a government
spokesman said.

CERT's charter will also include an outreach program to help educate users
about what they can do to prevent security lapses, according to Susan
Duncal, a spokeswoman for CERT. The group is expected to produce a
"security audit" checklist to which users can refer when assessing their
network vulnerability. The group is also expected to focus on repairing
security lapses that exist in current UNIX software.

To contact CERT, call the Software Engineering Institute at Carnegie-Mellon
University in Pittsburgh at (412) 268-7090, or use the Arpanet mailbox
address cert@sei.cmu.edu.

------------------------------

Date: Mon, 23 Jan 89 09:17:33 GMT
From: "Geoff. Lane. Tel UK-061 275 6051"
Subject: Probability and Product Failure

Unfortunately, from reports here in Britain after the M1 plane crash, it
appears that there is a real problem with "common mode" failures in
aircraft engines: if one engine fails, the probability of a second failing
during the same flight is much higher than would be expected. The
probabilities of failure are not independent.

(BTW, in "fly by wire" systems they attempt to avoid common-mode errors in
the software by having three independent groups implement the system on
three different types of processor. First, this does NOT eliminate the
problems of errors in the system specification, from which all three
designs are derived. Second, what happens 10 years later when the software
is updated to incorporate new developments -- are three more independent
software houses commissioned to produce the new software, or would this be
done in-house by some part-time students?)

Geoff Lane, UMRCC
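
The effect Lane describes is easy to reproduce with a toy Monte Carlo
model. In the sketch below, every number is an invented illustration, not
real engine data: a rare shared defect (one mechanic, one batch of fuel)
raises all three engines' failure probabilities at once, and the rate of
double failures climbs well above the 3 p**2 that independence predicts
(Python):

    import random

    P_INDEPENDENT  = 1e-3  # per-flight failure probability of one engine
    P_COMMON       = 1e-4  # probability a flight carries a shared defect
    P_GIVEN_COMMON = 0.5   # per-engine failure probability given the defect

    def engines_failed(n_engines=3):
        # One simulated flight: a common cause raises every engine's
        # failure probability simultaneously.
        p = P_GIVEN_COMMON if random.random() < P_COMMON else P_INDEPENDENT
        return sum(random.random() < p for _ in range(n_engines))

    TRIALS = 2_000_000
    doubles = sum(engines_failed() >= 2 for _ in range(TRIALS))
    print("simulated P(>=2 fail):", doubles / TRIALS)
    print("independence predicts:", 3 * P_INDEPENDENT ** 2)

With these made-up numbers the common cause alone contributes about 5e-5 to
the double-failure rate, more than ten times the 3e-6 that the independence
assumption predicts.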
------------------------------

Date: Sun, 22 Jan 89 14:20:00 EST
From: mfci!colwell@uunet.UU.NET (Robert Colwell)
Subject: Probabilities (Re: RISKS-8.12)
Organization: Multiflow Computer Inc., Branford Ct. 06405

There is a definite danger to this analysis, stemming mostly from its
essential correctness. There was a plane within the last two years (if
memory serves) that lost all three of its engines on a flight precisely
because such events are not necessarily independent. It turned out that the
same mechanic had worked on all three engines and made the same mistake on
each (left off an oil seal, I think).

Another example is the nuclear reactor fire of a couple of years ago, where
all the redundant control wiring was for nought because somebody routed it
all through the same conduit, so it was all destroyed at the same time.

One must be extremely careful with abstract analyses like these -- they can
be seductive, and they can lead to unjustified conclusions.

------------------------------

Date: 23 Jan 89 10:05:05 PST (Mon)
From: mao@blia.UUCP (Mike Olson)
Cc: buck@siswat.UUCP, LordBah@cup.portal.com
Subject: real discrete probability and airplanes

as at least two people have pointed out, my analysis of the likelihood of
failure was wrong. i claimed that the probability of two engines failing
out of three was 6(p**2); the correct answer, of course, is 3(p**2). thanks
to A. Lester Buck (siswat!buck) and LordBah@cup.portal.com for pointing out
my error in a way befitting the kinder, gentler nation we now live in.

it's not quite clear what i was computing, but it certainly wasn't
probability. it wasn't even conditional probability, since i got the
independence argument wrong. it's important to remember one of the real
risks of the network -- the potential for embarrassing yourself in front of
hundreds (thousands?) of intelligent people. next time, i check my work.

mike olson, britton lee, inc.    ...!ucbvax!mtxinu!blia!mao

   [Also noted by Mike Wescott (m.wescott@ncrcae.Columbia.NCR.COM) and
   Dale Worley.  PGN]

------------------------------

Date: Mon, 23 Jan 89 10:11:29 EST
From: worley@compass.UUCP (Dale Worley)
Subject: Probability

Actually, given that the probability of an engine failing during the trip,
year, etc. is p, and the probability of it not failing is q = 1 - p, then:

  the probability of 0 engines failing is          q**3
  the probability of exactly 1 engine failing is   3 p q**2
  the probability of exactly 2 engines failing is  3 p**2 q
  the probability of all 3 engines failing is      p**3

Given (we hope!) that p is very small, q is essentially 1, and
p >> p**2 >> p**3, so we can approximate:

  the probability of (at least) 1 engine failing is   3 p
  the probability of (at least) 2 engines failing is  3 p**2

The trouble with "the probability of one engine failing is ... and the
probability of one of the remaining two failing is ..." is that it
double-counts the failures. For instance, the probability of engine A
failing *then* engine B is approximately (1/2) p**2, not p**2 as assumed by
the previous poster -- the other (1/2) p**2 of the time, engine B fails
before engine A.

Dale Worley, Compass, Inc.    compass!worley@think.com
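
Worley's figures are easy to verify mechanically. A minimal check in
Python (p = 0.01 is an arbitrary illustrative value; math.comb supplies the
binomial coefficients):

    from math import comb  # Python 3.8+

    p = 0.01  # arbitrary per-engine failure probability
    q = 1 - p

    # Exact binomial terms for k failures out of 3 independent engines.
    for k in range(4):
        print(f"P(exactly {k} fail) = {comb(3, k) * p**k * q**(3-k):.8f}")

    # The approximations quoted in the message.
    print("P(>=1 fail) =", 1 - q**3, "~ 3p =", 3 * p)
    print("P(>=2 fail) =", 3 * p**2 * q + p**3, "~ 3p**2 =", 3 * p**2)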
------------------------------

End of RISKS-FORUM Digest 8.14
************************