[chbot] Is there a statistician in the house?

Sun Oct 18 07:25:42 BST 2020

You can use a normal distribution for your random variable (the number of failures within a
sample) if the pass-fail outcomes of the individual units in that sample are statistically
independent of each other and identically distributed (the individual outcomes themselves need
not follow a normal distribution).
In practice this means that you cannot easily analyse the situation if your process is drifting
throughout the production of one batch - in that case you would have to stabilise your process
first. But as long as there are only random variations, you just have to select the units that
go into the sample as randomly as possible.
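
One way to pick the sample without bias is to let a random number generator choose the units.
This is only a sketch: it assumes the units of the batch can be numbered 0 to 9999 (for example
in production order), and the names BATCH_SIZE, SAMPLE_SIZE and sample_indices are made up for
the illustration.

```python
import random

BATCH_SIZE = 10_000   # units in the batch (figure from the original question)
SAMPLE_SIZE = 100     # units to pull out for testing

# Fixed seed only so the same pick can be reproduced later; assumed, not required.
rng = random.Random(2020)

# Draw distinct unit numbers so that every unit is equally likely to be chosen,
# regardless of when in the production run it was made.
sample_indices = sorted(rng.sample(range(BATCH_SIZE), SAMPLE_SIZE))

print(sample_indices[:10])
```

Taking, say, the first 100 units off the line instead would make the sample sensitive to exactly
the kind of drift mentioned above.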

With that given, the number of failures in a sample follows a binomial distribution, which for
large sample sizes can be approximated by a normal distribution. In practice the approximation
is used for sample sizes as low as 100, provided the expected numbers of both failures and
passes are not too small (a common rule of thumb is at least 5 to 10 of each). You can calculate
the mean and standard deviation of the distribution. Confidence intervals for a given target
confidence level can then be expressed as multiples of the standard deviation. If you want
higher confidence, your confidence intervals will be wider. Conversely, if you want a narrower
range of outcomes, your confidence will be lower.
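
As a rough illustration of the calculation described above, here is a minimal sketch of the
normal-approximation interval for the failure rate. The function name failure_rate_interval and
the choice of a 95% confidence level are assumptions made for the example; the numbers 36 and 50
are taken from the original question.

```python
from math import sqrt
from statistics import NormalDist


def failure_rate_interval(failures, sample_size, confidence=0.95):
    """Normal-approximation interval for the true failure rate of the batch."""
    p_hat = failures / sample_size                      # observed failure rate
    std_err = sqrt(p_hat * (1 - p_hat) / sample_size)   # standard deviation of p_hat
    # Number of standard deviations needed to cover the requested confidence.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    low = max(0.0, p_hat - z * std_err)
    high = min(1.0, p_hat + z * std_err)
    return p_hat, low, high


# 36 failures in a sample of 50, at an assumed 95% confidence level.
p_hat, low, high = failure_rate_interval(36, 50, confidence=0.95)
print(f"estimate {p_hat:.2f}, interval {low:.2f} to {high:.2f}")
```

Asking for 99% instead of 95% makes the interval wider, as described above. Note that with 0
failures in the sample this formula gives an interval of zero width, which is one sign that the
normal approximation is not suitable when failures are very rare.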

A few references that may help for your use case:

https://www.qualtrics.com/au/experience-management/research/determine-sample-size/?rid=ip&prevsite=en&newsite=au&geo=NZ&geomatch=au
https://www.dummies.com/education/math/statistics/choosing-a-confidence-level-for-a-population-sample/
https://www.quanterion.com/test-samples-how-many-are-needed/

Kind regards,

Helmut.

On 18/10/2020 17:57, Stephen Irons wrote:
> I seem to remember doing calculations like this a long time ago...there are a number of
> variations which are probably all related. I have not been able to find any Google search terms
> that give me anything useful.
> 
> A factory produces a batch of 10_000 units.
> 
>   * I test 100 units; there are 3 failures. What failure rate can I expect from the whole batch?
>     What is my confidence in that estimate?
>   * I test 100 units; there are 0 failures. What failure rate can I expect from the whole batch?
>     What is my confidence in that estimate?
>   * How many units do I need to test to have 99% confidence that there will be less than 1%
>     failure rate from the whole batch?
> 
> Can someone tell me what you call this type of calculation? Point me to a suitable reference site?
> 
> All of the examples I find online are of the form: a factory produces widgets with x% failure
> rate; out of a sample of y units, what is the probability of finding z defective units...this is
> probably the same calculation from the other direction.
> 
> This is just for interest. In my specific case, I had 36 failures out of a sample of 50 taken
> from a batch of a few thousand -- this is clearly not acceptable. But we now have a repeatable
> test that causes the failure.
> 
> Stephen Irons
> 
> _______________________________________________
> Chchrobotics mailing list Chchrobotics at lists.ourshack.com
> https://lists.ourshack.com/mailman/listinfo/chchrobotics
> Mail Archives: http://lists.ourshack.com/pipermail/chchrobotics/
> Meetings usually 3rd Monday each month. See http://kiwibots.org for venue, directions and dates.
> When replying, please edit your Subject line to reflect new subjects.
>