Surname survival program - final remarks

raylopez99

Posted by raylopez99 » 01 Oct 2007 15:31:11

http://www.sendspace.com/file/evao2n

Use the link above to download a Word version of the remarks below,
which may be easier to read.

RL


I have finished the second version of the genealogy program discussed
in these threads: http://tinyurl.com/2ozylv and http://tinyurl.com/37avfk

Below are the results.

CONCLUSION:

See below for an explanation. The first row (N = 1 or any number) is
what I call the "default" or "normal" rate.

N (initial boys) / Iterations (simulated family histories) / Average
surname survival / Extinction rate

N = 1 or any number (default) / 6000 iterations / 25% survival / 75% extinction rate
N = 2 / 2000 iterations / 40% survival / 60% extinction rate
N = 3 / 2000 iterations / 55% survival / 45% extinction rate
N = 4 / 1000 iterations / 67% survival / 33% extinction rate
N = 5 / 2000 iterations / 74% survival / 26% extinction rate
N = 6 / 2000 iterations / 78% survival / 22% extinction rate
N = 7 / 1000 iterations / 82% survival / 18% extinction rate
N = 8 / 2000 iterations / 87% survival / 13% extinction rate
N = 9 / 2000 iterations / 90% survival / 10% extinction rate
N = 10 / 3000 iterations / 92% survival / 8% extinction rate



BACKGROUND:

The topic is the percentage of surnames that survive in a population
over time, with western surname survival modeled as a Galton-Watson
process (http://en.wikipedia.org/wiki/Galton-Watson_process).
Equivalently, it is the survival rate over time of the male "Y"
chromosome of any one person, or founder of a family line.

The number of sons per father who reach reproductive age (hereinafter
"boys") was assumed to follow a Poisson distribution with mean (lambda)
1.15, as suggested by WG Whalley as the historical figure for western
populations.
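To make this assumption concrete, the Poisson probability of each number of boys can be computed directly. This is a minimal Python sketch, not code from the C# program; the function name `poisson_pmf` is illustrative:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Probability that a father has exactly k boys who reach reproductive age."""
    return math.exp(-lam) * lam ** k / math.factorial(k)

LAMBDA = 1.15  # mean number of boys per father, as assumed in the text

for k in range(8):
    print(f"{k} boys: {poisson_pmf(k, LAMBDA):.5f}")
```

With lambda = 1.15 this gives about 31.7%, 36.4% and 20.9% for zero, one and two boys, and about 0.10% and 0.017% for six and seven boys.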

Several initial scenarios were simulated. The normal, or default,
scenario lets the founder have any number of boys (including zero), or
exactly one boy, in the first generation. These two cases turn out to
be logically almost the same: having exactly one boy in the first
generation simply shifts the survival question by one generation and
has no other effect. For this default scenario, the average surname
survival rate over 6000 simulations (each simulation being one
computer-generated family history) is 25%. The other scenarios start
with exactly N boys in the first generation, for N = 2 through 10;
their results are summarized above.
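The simulated default rate can be compared with the standard Galton-Watson result. With Poisson offspring, the probability q that a single line eventually dies out is the smallest root of q = exp(lambda * (q - 1)), and a surname whose founder starts with N boys dies out only if all N independent sub-lines do, with probability q^N. The sketch below assumes an untruncated Poisson distribution and unlimited generations, unlike the program (six-boy cap, 10,000-node cutoff), so it is a benchmark rather than a reproduction; the function name is illustrative:

```python
import math

def extinction_probability(lam: float, iterations: int = 10_000) -> float:
    """Smallest root of q = exp(lam * (q - 1)), found by fixed-point iteration from 0.

    exp(lam * (q - 1)) is the Poisson probability generating function; iterating it
    from 0 gives the chance that one line has died out by generation n.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(lam * (q - 1.0))
    return q

q = extinction_probability(1.15)
for n_boys in (1, 2, 5, 10):
    # The surname dies only if every one of the n_boys independent lines dies.
    print(n_boys, f"survival {1 - q ** n_boys:.3f}")
```

For lambda = 1.15 the single-line survival probability comes out at about 25%, in line with the simulated default rate.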

BIZARRE OR UNUSUAL FINDINGS:

A large family with many initial boys is no absolute guarantee that
the family surname survives. While 9 or 10 boys give the founder's
surname (and/or "Y" chromosome) a survival rate of about 90% or better,
there is still roughly an 8 to 10% chance of extinction. See the
examples below of families with 9 or 10 initial boys going extinct in
as few as four generations! This happened quite often in any run of
1000 simulations. Four generations is the quickest extinction I have
found for N = 9 or 10. For the four-generation examples below, you can
check by hand that the program output is correct.

At the other end of the spectrum, the program found many family lines
that went extinct after surviving for many generations. This is most
spectacular in families with many initial boys: their surnames usually
survive, so when they do fail it is striking, and the failure often
takes a long time. For example, while most families either succeed or
fail within about 25 to 50 generations, a family with two initial boys
went extinct after 156 generations; another with three initial boys
died out after 88 generations; another with four initial boys after
113 generations; another with five initial boys after 162 generations;
and another with 10 initial boys after 111 generations!

PROGRAM OUTPUT:

How to read the arrays below: each entry is the number of boys of one
male, and the entries are listed in breadth-first ("level-order")
traversal of the family tree, starting from the founder (see
http://en.wikipedia.org/wiki/Tree_traversal; the example there,
"Level-order traversal sequence: F, B, G, A, D, I, C, E, H", is how
this program works; see also
http://en.wikipedia.org/wiki/Breadth-first_traversal). The first line
is the number of boys the founder had, the next lines are the numbers
of boys each of his boys had, and so on. A simple example: the sequence
1, 2, 1, 2, 0, 0, 0 means the founder had one boy, who had two boys,
who in turn had 1 and 2 boys respectively; those three boys had no
boys (the last three zeros), so the family surname died out.
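The level-order reading rule can be checked mechanically: each generation consumes as many entries as it has boys, and their sum is the size of the next generation. A minimal Python sketch (the function name and return values are illustrative, not the program's):

```python
def decode_level_order(offspring: list[int]) -> tuple[bool, int, int]:
    """Walk a level-order list of boy counts, one generation at a time.

    offspring[0] is the founder's number of boys; each later entry is the number
    of boys of the next man in breadth-first order.
    Returns (extinct, generations_with_boys, entries_consumed).
    """
    pos = 1
    current = offspring[0]  # boys in generation 1
    generations = 0
    while current > 0:
        generations += 1
        if pos + current > len(offspring):
            # The list stops before this generation's sons are fully listed.
            return False, generations, pos
        nxt = sum(offspring[pos:pos + current])  # boys in the next generation
        pos += current
        current = nxt
    return True, generations, pos

print(decode_level_order([1, 2, 1, 2, 0, 0, 0]))
```

For the simple example this reports extinction after three generations of boys, with all seven entries used. Fed the 9-boy and 10-boy four-generation arrays from the examples, it likewise stops at the entry marked as the point of extinction, leaving the later entries unread.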

Examples (arrays are truncated for brevity):

This family went extinct after 156 generations! (Initial input: 2 boys)
---+++------------------
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
Date: 9/30/2007 14:51:28; elap.time 0.2103024 extct? True % :39 itt: 214 cur.gen 156 TheNodeTree.NodeCount() 1434 CandidateNodesQueue.Count:0!END
Time now is9/30/2007 14:51:28
--
This family went extinct after 88 generations (initial input: 3 boys)
---+++------------------
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 3
Date: 9/30/2007 14:59:27; elap.time 0.1802592 extct? True % :54 itt: 434 cur.gen 88 TheNodeTree.NodeCount() 784 CandidateNodesQueue.Count:0!END

This family, with an initial input of 4 boys, went extinct after 113
generations!
---+++------------------
OffSpringBoys [0,i] is: 4
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
Date: 9/30/2007 15:15:05; elap.time 0.1902736 extct? True % :67 itt: 779 cur.gen 113 TheNodeTree.NodeCount() 1239 CandidateNodesQueue.Count:0!END

--
Family went extinct despite 10 initial boys (after 111 generations)!
---+++------------------
OffSpringBoys [0,i] is: 10
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
Date: 9/30/2007 15:29:38; elap.time 0.2103024 extct? True % :93 itt: 668 cur.gen 111 TheNodeTree.NodeCount() 1441 CandidateNodesQueue.Count:0!END

--
Family went extinct despite 10 initial boys in just 4 generations!
(This can be verified by hand from the array below.)

---+++------------------
OffSpringBoys [0,i] is: 10
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 3 <--too late, never reached! The family surname went extinct just before this

Date: 9/30/2007 15:43:47; elap.time 0.1802592 extct? True % :85 itt: 14 cur.gen 4 TheNodeTree.NodeCount() 19 CandidateNodesQueue.Count:0!END
--

Went extinct despite 10 boys in a mere 4 generations!
Time now is9/30/2007 16:13:05
---+++------------------
OffSpringBoys [0,i] is: 10
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 3 <--three boys in this line did not help!
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 3 <--three boys in the same "line" failed to help!
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0 <--went extinct here, too late for the 3 in the next line!
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
Date: 9/30/2007 16:13:05; elap.time 0.2203168 extct? True % :91 itt: 354 cur.gen 4 TheNodeTree.NodeCount() 25 CandidateNodesQueue.Count:0!END
--

Family with 9 initial boys that went extinct in four generations!
Time now is10/1/2007 05:23:48

---+++------------------
OffSpringBoys [0,i] is: 9
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 0 <--went extinct here!
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 5
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 1
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 0
OffSpringBoys [0,i] is: 4
OffSpringBoys [0,i] is: 2
OffSpringBoys [0,i] is: 3
OffSpringBoys [0,i] is: 2
Date: 10/1/2007 05:23:48; elap.time 0.2103024 extct? True % :89 itt: 59 cur.gen 4 TheNodeTree.NodeCount() 21 CandidateNodesQueue.Count:0!END

PROGRAM DETAILS

The program was written in C# 2.0 using Microsoft Visual Studio 2005,
as a console application. It has been debugged extensively and, having
been checked by hand, appears to work. It has three main parts.

First, a random number generator fills a 100,000-element integer array
with Poisson-distributed integers from zero to six (six being the upper
bound on the number of boys born to each father). The generator is
'reseeded' from the system clock at intervals of at least one
millisecond to keep it random.
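Repeated reseeding from the clock is not needed for randomness, and it can repeat a seed when two reseeds land in the same clock tick; a common alternative is to seed one generator once and draw every value from it. The Python sketch below swaps in Knuth's multiplication method to sample Poisson values directly, instead of the pre-filled array the program uses, and does not cap the count at six; the function name and the seed 2007 are arbitrary illustrations:

```python
import math
import random

def poisson_sample(lam: float, rng: random.Random) -> int:
    """Knuth's method: multiply uniforms until the product falls to exp(-lam) or below."""
    limit = math.exp(-lam)
    k = 0
    product = rng.random()
    while product > limit:
        k += 1
        product *= rng.random()
    return k

rng = random.Random(2007)  # seeded once; a fixed seed also makes runs reproducible
draws = [poisson_sample(1.15, rng) for _ in range(100_000)]
print(sum(draws) / len(draws), draws.count(0) / len(draws))
```

The sample mean should land close to 1.15 and the share of zeros close to 31.7%. Knuth's method is simple but slows down for large means; for a mean near 1 that does not matter.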

Second, the 'heart' or 'engine' of the program is an N-ary tree
builder (it builds a tree in which each node has up to N branches,
where N = 0 to 6) together with a tree manager. While the tree is
being built, the manager traverses it, tracks the number of
generations, links each child to his father, tracks program loops and
iterations, and checks whether the tree has exceeded an arbitrary
bound of 10,000 nodes. I assume that once the family tree reaches
10,000 boys as offspring (past and present, living and deceased), the
family will likely grow forever. This cutoff is of course arbitrary:
there is always a small chance that even such a large family will die
out, and in fact such families almost certainly exist. But for
statistical purposes they are not material, and the assumption seems
reasonable.
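The engine's core loop, growing a tree breadth-first and declaring survival at a node cap, can be sketched as follows. This Python version is an illustration, not the C# program: it keeps only each boy's generation number in a queue rather than parent links, and `simulate_line` and `draw_boys` are hypothetical names:

```python
from collections import deque
from typing import Callable

def simulate_line(initial_boys: int, draw_boys: Callable[[], int],
                  node_cap: int = 10_000) -> tuple[bool, int, int]:
    """Grow one family tree breadth-first.

    Returns (extinct, generations, nodes). The line counts as surviving once the
    tree holds node_cap boys, mirroring the arbitrary 10,000-node cutoff.
    """
    queue = deque([1] * initial_boys)  # generation number of each boy awaiting his sons
    nodes = initial_boys
    generations = 1 if initial_boys else 0
    while queue:
        if nodes >= node_cap:
            return False, generations, nodes  # assumed to survive forever
        gen = queue.popleft()
        sons = draw_boys()
        nodes += sons
        if sons:
            generations = max(generations, gen + 1)
        queue.extend([gen + 1] * sons)
    return True, generations, nodes  # queue empty: the surname died out
```

`draw_boys` can be any function returning a boy count, for example one that reads successive entries of a shuffled lookup table or samples a Poisson distribution directly. Passing a function that replays the fixed sequence 2, 1, 2, 0, 0, 0 reproduces the hand-checkable 1, 2, 1, 2, 0, 0, 0 family from the array explanation: extinct after three generations with six boys in total.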

Third, various logging and diagnostic methods write output to the
screen and/or a file, accept input from the user, and calculate
statistics. The user can specify the number of boys in the first
generation, as described above, or any sequence of boys, e.g.
2, 2, 0, 1, 3 (see the explanation of the arrays above).

The program has been verified by hand at every stage, using artificial
inputs with known results, and it agreed with all of them. It was also
checked against the graph in the Wikipedia article cited above for
Poisson means (lambda) of 0.9, 1.0, 1.05 and 1.1 (the four graphs
shown there), and the program's extinction rates agreed with that
graph.
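Extinction-by-generation curves of this kind can also be computed without simulation. With Poisson offspring, the probability that a line has died out by generation n satisfies q_n = exp(lambda * (q_(n-1) - 1)), starting from q_0 = 0. A short Python sketch for the four means mentioned; the function name is illustrative, and the generations printed are arbitrary choices, not necessarily those plotted in the Wikipedia figure:

```python
import math

def extinct_by_generation(lam: float, generations: int) -> list[float]:
    """P(line extinct by generation n) for n = 1..generations, Poisson(lam) offspring."""
    q, curve = 0.0, []
    for _ in range(generations):
        q = math.exp(lam * (q - 1.0))  # apply the generating function once per generation
        curve.append(q)
    return curve

for lam in (0.9, 1.0, 1.05, 1.1):
    curve = extinct_by_generation(lam, 50)
    print(lam, [round(curve[n - 1], 3) for n in (1, 10, 50)])
```

For lambda = 1.0 the values keep creeping toward 1, since a Galton-Watson line whose mean number of boys is exactly one (with any variation around that mean) eventually dies out with certainty.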

The spread of the output (its variance) depended strongly on the
number of iterations (simulated family histories), so at least 1000
simulations, and preferably more (6000 for the default case, see
above), were used. Even small runs of 100 to 200 simulations gave
statistically the same average, as can be expected: for the default
case, five such small runs gave survival rates (%) of 18, 21, 26, 28
and 28, which average 24%, close to the 24.7% (about 25%) found with
6000 iterations.
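This spread is roughly what binomial sampling predicts: if the true survival rate is p, an estimate from n independent simulations has standard error sqrt(p(1 - p)/n). A short Python sketch using the figures quoted above (p = 0.25 is taken from the default result; the helper name is illustrative):

```python
import math

small_runs = [18, 21, 26, 28, 28]  # default-case survival (%) from five runs of 100-200 iterations
print(sum(small_runs) / len(small_runs))  # prints 24.2

def standard_error(p: float, iterations: int) -> float:
    """Binomial standard error of a survival rate p estimated from `iterations` runs."""
    return math.sqrt(p * (1.0 - p) / iterations)

for n in (150, 1000, 6000):
    print(n, f"{100 * standard_error(0.25, n):.1f} percentage points")
```

At p = 0.25 this is about 3.5 percentage points for 150 iterations, 1.4 for 1000 and 0.6 for 6000, which is why the small runs scatter from 18% to 28% while the long runs are steady.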

CRITICISMS OF THE PROGRAM

One criticism of the program might be that it limits each father to
at most six boys. However, a far more important parameter is the
Poisson mean (lambda). For example, if lambda is decreased by a mere
0.15 (about 13%), from 1.15 to 1.0, the default survival rate drops
from 25% to 2%.

Six boys was chosen as the maximum because, by Poisson statistics, six
boys occur about 1.017 times in every 1000 fathers, while seven boys
occur about 1.67 times in every 10,000. Far more common are zero, one
or two boys (32%, 36% and 21%, respectively). Because the Poisson
array was limited to 100,000 elements, and to keep the 'granularity'
small, I rounded the Poisson probabilities to three decimal places.
This way, for "one" boy (the most common case, at about 36%), roughly
360 of every 1000 array elements (1000 × 0.36) were filled with "1"s.
Then I scrambled the array five times at random (probably overkill,
since even after one scrambling the array looked random to me) so that
each element's position is random. Including seven or more boys would
have required a much larger array, possibly 1M elements or more, which
would have made the program run 10 times longer, since it is not a good
idea to use a small array that fills up too quickly with the 'common'
numbers before scrambling. Further, such large numbers of boys would
typically have little or no effect unless they happened by chance to
appear early on (since most families either survive or die out within
20 to 50 generations, that is, early in the array). For this reason I
included a user-defined input method that lets the user enter any
number of initial boys, and, as can be seen, even 10 boys do not
necessarily guarantee survival.
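The lookup-array approach can be sketched in a few lines. This Python version assumes the array is filled in proportion to the Poisson probabilities rounded to three decimal places, with no entries for seven or more boys; how the original program handled the few slots left over by rounding is not stated, so this sketch simply leaves them out. The function names and the seed are illustrative:

```python
import math
import random

def poisson_pmf(k: int, lam: float) -> float:
    return math.exp(-lam) * lam ** k / math.factorial(k)

def build_boy_table(lam: float = 1.15, max_boys: int = 6, size: int = 100_000,
                    seed: int = 2007) -> list[int]:
    """Shuffled lookup table of boy counts in proportion to Poisson(lam).

    Each count k gets round(1000 * P(k)) slots per 1000; counts above max_boys are
    dropped, so rounding can leave the table slightly shorter than `size`.
    """
    per_block = size // 1000
    table: list[int] = []
    for k in range(max_boys + 1):
        table.extend([k] * (round(1000 * poisson_pmf(k, lam)) * per_block))
    random.Random(seed).shuffle(table)  # random.shuffle is a Fisher-Yates shuffle
    return table

table = build_boy_table()
print(len(table), sum(table) / len(table))
```

With lambda = 1.15 the rounded proportions for zero to six boys add up to 999 per 1000, so the table holds 99,900 entries and its mean is about 1.146 rather than 1.15. A single Fisher-Yates shuffle already gives every ordering equal probability, so one pass is enough.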

Nevertheless, the statistics might change a little if more than six
boys were permitted, though I doubt by much, since such families are
rare. It is also possible that the far more important parameter, the
Poisson mean (lambda), was higher for western populations in the past,
which would change the survival rate dramatically. A future version of
the program could 'sprinkle' the 100,000-element Poisson array with
sevens, eights, nines, tens and above to simulate these unusually
large families, but raising the Poisson mean (lambda) above the
present 1.15 would be at least as effective.

<EOF>

raylopez99

Re: Surname survival program - final remarks

Posted by raylopez99 » 01 Oct 2007 16:04:42

On Oct 1, 7:31 am, raylopez99 <raylope...@yahoo.com> wrote:

N= 3 / 2000 iterations / 55% survival / 75% extinction rate


Correction: should read "45%" extinction rate (100-55 = 45)

RL

Back to «alt.genealogy»