Need population statistics: average # children, standard dev
Moderator: MOD_nyhetsgrupper
-
raylopez99
Need population statistics: average # children, standard dev
Hi all in alt.geneology--
I am looking for information to perfect my genealogy tree program that
will show how quickly a family male surname will die off. Refer to
this thread (http://tinyurl.com/37avfk - Usenet thread) and the data
below my sig line for more information.
I have 90% of the program already finished. The only remaining step
is generating an accurate sequence of random integers representing
children per woman per generation. In order to do this, I need to
know several parameters such as:
1) average children per couple (I have assumed 2.33 children/mother,
which is the replacement rate for keeping a population going
indefinitely, but if anybody has a historical breakdown by region or
date, let me know);
2) the standard deviation of the average given in 1) above. I need
this because I'd like to set up a Gaussian (normal) distribution of
integers representing children born to a woman. The reason why is
explained below.
3) The number of girls per 1000 births, which I've found on the web,
but it varies by geographic region / nation (due to infanticide?). If
anybody has more info I'd like to see it.
4) Any other information that might be of interest (if anybody knows
of a Gaussian random integer sequencer, please let me know--Random.Org
doesn't seem to have a free one but I'll keep looking)
Rationale: I particularly need data 1) and 2) because they let me
estimate the "zeros" better. The "zeros" are people who don't
reproduce, which is, besides having all girls, the biggest factor in
the male surname not surviving. Here is a quick example of the
importance of zeros: suppose it is fated by a genie that a man and his
descendants will only have boys, in perpetuity, and lots of boys, at
least 10 boys per generation, with two exceptions: at least one
generation will have only one child, and at least one child in the
future will not have children. Every sequence allows the male surname
to live in perpetuity except this one: original patriarch --> has one
child (a boy of course) --> who never has kids.
So the number of zeros (which is a function of the normal
distribution, namely average kids/mother and std. dev), and their
placement (which is a matter of chance in the sequence), determines
whether the male surname survives.
Thanks
RL
-
http://tinyurl.com/37avfk (original thread)
http://homepages.newnet.co.uk/dance/webpjd/index.htm
This is old stochastic processes stuff, specifically discrete
branching theory. You can find it discussed, for example, in
"The Theory of Stochastic Processes" by Hilton David Miller
and David Roxbee Cox [1977]. You can buy it from Amazon or
look at the relevant bits on Google.
I think Miller and Cox's bottom line was for American lines
using some old census data, the probability of extinction of
the male lines of a single individual is 0.86. If you have
N males with the same surname as "initial conditions", the
probability of extinction is 0.86 ** N (assuming independent
breeding), which means the probability of extinction is pretty
low for most surnames in existence today (if birth patterns
continued as they were).
Cheers, B. (I am an applied mathematician, among other things.)
--
Dr. Brian Leverich Co-moderator, soc.genealogy.methods/GENMTD-L
Angeles Chapter LTC Admin Chair http://angeles.sierraclub.org/ltc/
P.O. Box 6831, Frazier Park, CA 93222-68
http://www.random.org/sequences/
William
Here is another book to check out; "The Inheritance of English
Surnames" by C. M. Sturges and B. C. Haggett, 1990, ISBN 0785548475.
It is in a few British and US libraries, but not on Amazon.com or ABE
books.
According to a Scientific American Computer Recreations article (May
1986, p 12-16), the authors were British Ministry of Defence employees
who traced genealogy in their spare time. They were puzzled by the
gradual disappearance of surnames in the records. They wrote a
simulation program such as you describe. They used these factors,
derived from a statistical analysis of genealogical records.
Number of males in a family
who will marry                 Probability
0                              0.317
1                              0.364
2                              0.209
3                              0.080
4                              0.023
5                              0.005
6                              0.001
I.e., they estimated that in any given family, there was a 31.7%
chance there would be no son to marry and (possibly) carry on the
surname, a 36.4% chance that one son would marry, etc.
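A minimal Python sketch of how a simulation could draw family sizes straight from this table. This is not Sturges and Haggett's program; the names MARRYING_SONS_PMF, mean_marrying_sons and draw_marrying_sons are invented for illustration.

```python
import random

# Probabilities from the table above:
# index k = number of sons in a family who will marry.
MARRYING_SONS_PMF = [0.317, 0.364, 0.209, 0.080, 0.023, 0.005, 0.001]


def mean_marrying_sons(pmf=MARRYING_SONS_PMF):
    """Expected number of marrying sons per family under the table."""
    total = sum(pmf)  # 0.999 because of rounding, so normalise
    return sum(k * p for k, p in enumerate(pmf)) / total


def draw_marrying_sons(rng, pmf=MARRYING_SONS_PMF):
    """Draw one family size; random.choices normalises the weights itself."""
    return rng.choices(range(len(pmf)), weights=pmf, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(1)  # fixed seed, only so a run is repeatable
    print("mean marrying sons per family:", round(mean_marrying_sons(), 3))
    print("ten sample families:", [draw_marrying_sons(rng) for _ in range(10)])
```

The table sums to 0.999 because of rounding, so the sketch normalises before taking the mean, which comes out close to 1.15 sons per family.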
-
raylopez99
Re: Need population statistics: average # children, standard
On Aug 15, 4:51 am, raylopez99 <raylope...@yahoo.com> wrote:
After posting this, I found the code below, which purports to
generate a Gaussian distribution, but if anybody has other code
please reply here.
RL
http://atlas.csd.net/~cgadd/knowbase/MATH0076.HTM
-
WGWhalley
Re: Need population statistics: average # children, standard
2) the standard deviation of the average given in 1) above. I need
this because I'd like to set up a Gaussian (normal) distribution of
integers representing children born to a woman.
Why not use an actual distribution instead of a normal distribution? A
normal distribution will predict a certain percentage of women with
negative numbers of children.
The Sturges and Haggett data for number of males who will marry fits a
Poisson distribution with a lambda of 1.15 (fits it exactly). A Poisson
distribution might serve better.
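The fit can be checked with a few lines of Python. A sketch only: TABLE repeats the published figures quoted earlier in the thread, and poisson_pmf and max_abs_difference are illustrative names.

```python
import math

LAMBDA = 1.15  # the Poisson mean suggested above

# Published table values: P(k sons who will marry), k = 0..6.
TABLE = [0.317, 0.364, 0.209, 0.080, 0.023, 0.005, 0.001]


def poisson_pmf(k, lam):
    """P(X = k) for a Poisson distribution with mean lam."""
    return math.exp(-lam) * lam ** k / math.factorial(k)


def max_abs_difference(table=TABLE, lam=LAMBDA):
    """Largest gap between the table and the Poisson probabilities."""
    return max(abs(p - poisson_pmf(k, lam)) for k, p in enumerate(table))


if __name__ == "__main__":
    for k, p in enumerate(TABLE):
        print(k, p, round(poisson_pmf(k, LAMBDA), 3))
    print("largest difference:", round(max_abs_difference(), 4))
```

Every gap is below 0.0005, so to three decimals each Poisson probability agrees with the table entry.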
-
raylopez99
Re: Need population statistics: average # children, standard
On Aug 15, 4:19 pm, WGWhalley <wgwhal...@gmail.com> wrote:
Thanks William Whalley. Indeed, as I found out yesterday when I
compiled and ran the Gaussian code to generate random numbers, with
any reasonable mean and sigma you get 'negative' births, which I was
just going to truncate until I saw your reply. Now the problem is to
find some code on the net that generates a Poisson distribution.
[five minutes later] I found this site -- http://www.willnaylor.com/wnlib.html,
which looks promising, and I'll get to it as soon as I finish some
more pressing matters, including taking a quick vacation trip...
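For reference, a Poisson integer generator is only a few lines. The sketch below uses Knuth's multiplication method in Python; it is not the wnlib code from the link above, and poisson_knuth is just an illustrative name.

```python
import math
import random


def poisson_knuth(lam, rng):
    """Draw one Poisson(lam) integer using Knuth's multiplication method.

    Multiply uniform random numbers together until the product drops to
    exp(-lam) or below; the number of extra factors needed is the draw.
    Fine for a small lam such as 1.15.
    """
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed, only so a run is repeatable
    print([poisson_knuth(1.15, rng) for _ in range(20)])
```

Unlike a Gaussian draw, the result can never be negative, so nothing has to be truncated at zero.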
RL
-
WGWhalley
Re: Need population statistics: average # children, standard
You may also want to review the Wikipedia article "Galton-Watson process".
-
raylopez99
Re: Need population statistics: average # children, standard
On Aug 17, 9:03 am, WGWhalley <wgwhal...@gmail.com> wrote:
Thanks again WG Whalley! You are very helpful. I just reviewed the
article http://en.wikipedia.org/wiki/Galton-Watson_process just after
literally finishing the beta version of my program to test the same a
few minutes ago (I was on vacation the last few weeks).
Just ran a first set of simulations, using Lambda = 1.15 (as you
suggested) in a Poisson distribution. I kept at most two decimals of
precision, which makes four boys the upper bound: PMF(x=4, Lambda =
1.15) = .023 or, for an array of 10000 Poisson integers, which I am
using, 2.3=2 (truncated) integers, while PMF(x=5,..) = .5 = 0. I
could use a 100k array and get 5 ints for x=5, but I figure that an
occasional large number of boys, at such small odds, will not really
affect the generation extinction issue.
My very preliminary calculations (haven't yet debugged it, but am
fairly confident of the overall algorithm, since I've tested it on a
smaller array already): out of 46 simulations with Lambda = 1.15, 32
ended in extinction, so the "extinction rate" is about 70% and the
survival rate about 30%. According to the Wikipedia chart this result
looks "roughly" correct: their lambda stops at 1.1, but extrapolating
an extra 0.05 gives roughly 70-30%. BTW my program allows lambda to
be adjusted programmatically, so as soon as I debug it thoroughly
I'll try different lambdas.
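A rough Python sketch of this kind of experiment, for anyone who wants to reproduce it. It is not the C# program described in this post: it assumes each man's number of sons is an independent Poisson draw with no cap on family size, and it reuses the 10000-node cutoff mentioned in this post as the point where a line is assumed to survive. All function names are illustrative.

```python
import math
import random

NODE_LIMIT = 10_000  # cutoff from the post: beyond this, assume survival


def poisson_draw(lam, rng):
    """One Poisson(lam) integer via Knuth's multiplication method."""
    threshold = math.exp(-lam)
    count = 0
    product = rng.random()
    while product > threshold:
        count += 1
        product *= rng.random()
    return count


def line_goes_extinct(lam, rng, node_limit=NODE_LIMIT):
    """Simulate one male line from a single founder.

    Returns True if a generation with no boys is reached, False if the
    total number of boys born passes node_limit first.
    """
    living = 1      # men in the current generation (starts with the founder)
    total_born = 0  # boys born so far across all generations
    while living > 0:
        sons = sum(poisson_draw(lam, rng) for _ in range(living))
        total_born += sons
        if total_born > node_limit:
            return False
        living = sons
    return True


def extinction_rate(lam, runs, seed=0):
    """Fraction of simulated lines that die out."""
    rng = random.Random(seed)
    extinct = sum(line_goes_extinct(lam, rng) for _ in range(runs))
    return extinct / runs


if __name__ == "__main__":
    print("extinction rate at lambda 1.15:", extinction_rate(1.15, runs=500))
```

With only a few dozen runs the estimate moves by several percentage points from batch to batch, so a few hundred runs or more give a steadier figure.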
Some questions I intend to answer:
1/ the effect on dynasty survival of making sure the first N
generations have M boys. For example, the grantor/founder of a
dynasty can stipulate in his will: "My first N descendants must have M
boys". What effect will this 'initial condition' of a "forced" number
of boys have on the overall long-term dynastic survival?
2/ I was going to see what effect different "lambdas" in the
Poisson distribution have, but that was before I saw your Wikipedia
cite. Now I will simply confirm whether the Wiki chart is accurate.
BTW, I wrote this program in C# (Console Mode) as a programming
exercise only (I am familiar with C++, a close cousin of C#, but this
is the first program I wrote in C#); it's kind of fun to work on this
project. I'll make a Windows GUI version later, with the "tree" of
descendants graphically mapped, so you can see which 'branch' had the
'longest run' for any tree that goes extinct (some of my initial runs
have gotten close to the 10000 node limit I've arbitrarily set as the
cutoff--anything over 10k nodes (total boys born) I assume will
last forever). A hand simulation found that sometimes the family that
has many boys doesn't necessarily last longest, since some of the boys
never have boys.
On a more practical level, at least for me, it's a surprise that the
obsession some fathers have with having boys is really ill-placed,
since about 70% of the time or so the male surname will go extinct
anyway, no matter what you do (with the caveats made above, to be
determined).
RL
-
raylopez99
Re: Need population statistics: average # children, standard
On Sep 4, 3:09 pm, raylopez99 <raylope...@yahoo.com> wrote:
Ran another 57 simulations just now (I haven't automated it yet, which
I'll do soon enough, so for now I have to hit Enter every time and
record results by hand) and found an extinction rate of 72% and a
survival rate of 28%, with Lambda=1.15.
I'm very confident the algorithm works, since for small trees (30
nodes or fewer) I traced the tree by hand and found that the tree
indeed goes extinct as predicted by the program.*
Another thing I might add to the program is to see how often the
family dies out after X generations, such as 50 generations (the
chart on Wiki implies this is rare).
RL
* a "widow maker" sequence is: {0, ...ANYTHING}, which by definition
goes extinct the first generation; another is {2,0,0,ANYTHING} or
{3,0,0,0, ANYTHING}, which means two or three boys the first
generation, who don't have boys the next generation. These sequences
occur, by chance, quite frequently in a Poisson distribution with
Lambda = 1.15. But they have to occur initially (at the root of the
tree) since if the tree grows too large, there exist too many branches
for the tree to die out. However, even for several hundred nodes,
sometimes the tree dies out, which is surprising.
For example, here is a 23 node tree sequence (founder included) of
random Poisson distribution numbers, lambda = 1.15, that dies out:
OffSpringBoys [0,i] is: 3 // first generation, after founder, is three boys
OffSpringBoys [0,i] is: 1 // one of three boys has a boy, X
OffSpringBoys [0,i] is: 1 // second of three boys has a boy, Y
OffSpringBoys [0,i] is: 0 // third of three boys has no boy
OffSpringBoys [0,i] is: 2 // boy X has two boys A, B
OffSpringBoys [0,i] is: 1 // boy Y has one boy, C
OffSpringBoys [0,i] is: 1 // boy A has a boy, A1
OffSpringBoys [0,i] is: 1 // boy B has a boy, B1
OffSpringBoys [0,i] is: 1 // boy C has a boy, C1
OffSpringBoys [0,i] is: 0 // boy A1 has no boys
OffSpringBoys [0,i] is: 1 // boy B1 has a boy, B2
OffSpringBoys [0,i] is: 0 // boy C1 has no boys
OffSpringBoys [0,i] is: 2 // boy B2 has two boys, B2A and B2B
OffSpringBoys [0,i] is: 1 // boy B2A has one boy, B3A
OffSpringBoys [0,i] is: 2 // boy B2B has two boys, B3B1 and B3B2
OffSpringBoys [0,i] is: 2 // boy B3A has two boys, Frank and Joe (sorry, I'm running out of numbers)
OffSpringBoys [0,i] is: 1 // boy B3B1 has a boy, Steve
OffSpringBoys [0,i] is: 1 // boy B3B2 has a boy, Moe
OffSpringBoys [0,i] is: 0 // Frank has no boys
OffSpringBoys [0,i] is: 1 // Joe has a boy, Joey
OffSpringBoys [0,i] is: 0 // Steve has no boys
OffSpringBoys [0,i] is: 0 // Moe is gay (and produces no boys)
OffSpringBoys [0,i] is: 0 // Joey has no boy kids --line goes extinct
As you can see from this actual example, the "dynasty" can last quite
a long time (10 generations and 23 nodes, counting the founder), and
yet still go extinct.
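The bookkeeping for a listing like this can be automated. A small Python sketch, assuming the numbers are in breadth-first order (founder first, then each generation left to right), which is how the comments in the listing read; summarise_line and SEQUENCE are illustrative names, and SEQUENCE is transcribed from the listing.

```python
def summarise_line(offspring):
    """Walk a breadth-first list of sons-per-man, founder first.

    Returns (nodes, generations, extinct). nodes counts every man whose
    entry was consumed, and generations includes the founder's own.
    """
    position = 0
    living = 1  # the founder
    generations = 0
    # Consume one whole generation at a time while the data lasts.
    while living > 0 and position + living <= len(offspring):
        generation = offspring[position:position + living]
        position += living
        generations += 1
        living = sum(generation)  # boys born become the next generation
    extinct = living == 0
    return position, generations, extinct


# Sons per man, in the order printed in the listing.
SEQUENCE = [3, 1, 1, 0, 2, 1, 1, 1, 1, 0, 1, 0,
            2, 1, 2, 2, 1, 1, 0, 1, 0, 0, 0]

if __name__ == "__main__":
    nodes, generations, extinct = summarise_line(SEQUENCE)
    print("nodes:", nodes, "generations:", generations, "extinct:", extinct)
```

If the list runs out before a generation with no boys is reached, the function reports extinct as False instead of guessing.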
RL
-
raylopez99
Re: Need population statistics: average # children, standard
Just ran a couple of thousand simulations on my program--pretty
confident it has few bugs and is working properly.
Here are my results, with explanations:
for L, lambda (Poisson mean) --> probability of surname survival (%)
L < 1.0  --> 0% (as predicted in http://en.wikipedia.org/wiki/Galton-Watson_process)
L = 1.0  --> 1%
L = 1.05 --> 3.5% (lower than the 10% predicted on the Wikipedia graph)
L = 1.1  --> 12.5% (lower than the 20% predicted on the Wiki graph)
L = 1.15 --> 20.5% (somewhat lower than predicted; this lambda is the actual lambda for a real population, see this thread)
I believe the simulation numbers are lower than what is shown in
Wikipedia because in my program the maximum number of boys per family
was limited to 4. I chose that limit because even four boys is rather
rare in the Poisson distribution for Lambda L = 1.15 (which is what I
concentrated on): four boys happens only 2% of the time, and five
boys only 0.5% of the time. However, it appears that a population
occasionally having a large number of boys (i.e., five or greater)
will double or triple the chances of surname survival (though it
still is low, see the Wikipedia graph).
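The effect of the cap can also be estimated without simulation. In a Galton-Watson process the extinction probability is the smallest root of q = G(q), where G is the probability generating function of the number of sons. The Python sketch below iterates that equation for a Poisson distribution with and without a cap on family size. It assumes a cap simply drops the larger families and renormalises the rest, which may not match exactly how the C# program builds its array; the function names are illustrative.

```python
import math


def poisson_pmf_list(lam, max_sons):
    """Poisson probabilities for 0..max_sons sons (not yet renormalised)."""
    return [math.exp(-lam) * lam ** k / math.factorial(k)
            for k in range(max_sons + 1)]


def extinction_probability(pmf, iterations=5_000):
    """Smallest root of q = G(q), found by iterating from q = 0.

    pmf[k] is the chance of k sons; it is renormalised to sum to 1.
    """
    total = sum(pmf)
    probs = [p / total for p in pmf]
    q = 0.0
    for _ in range(iterations):
        q = sum(p * q ** k for k, p in enumerate(probs))
    return q


def survival_probability(lam, max_sons):
    """Chance that a single founder's line never dies out."""
    return 1.0 - extinction_probability(poisson_pmf_list(lam, max_sons))


if __name__ == "__main__":
    for cap in (4, 6, 40):  # 40 stands in for "no practical cap"
        print("max sons", cap, "survival",
              round(survival_probability(1.15, cap), 3))
```

Under these assumptions the capped distributions give a lower survival probability than the uncapped one, which is the same direction as the simulation results.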
I will increase the number of boys to include families with five and
six boys, and see if this increases the survival rate, as it should.
Also I will include a stipulation in the population sequence where a
"forced" number of boys N appears in the beginning, that is, say N=4
boys in the first generation, and see how this increases the survival
rate. This is analogous to a grantor saying in a will "to inherit my
money, my offspring must have N boys in the first generation". This
should increase the survival rate--it will be interesting to see how
big "N" must be to get to a 50% survival rate.
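If the N first-generation boys each start an independent line, the whole family dies out only when all N lines do, so the extinction probability is q ** N, with q the single-line extinction probability (the same rule as the 0.86 ** N quoted earlier in the thread). A Python sketch under that independence assumption, with an uncapped Poisson number of sons; the function names are illustrative.

```python
import math


def poisson_extinction_probability(lam, iterations=5_000):
    """Single-line extinction probability.

    Smallest root of q = exp(lam * (q - 1)), found by iterating from 0.
    """
    q = 0.0
    for _ in range(iterations):
        q = math.exp(lam * (q - 1.0))
    return q


def survival_with_founders(lam, founders):
    """Chance that at least one of `founders` independent lines survives."""
    return 1.0 - poisson_extinction_probability(lam) ** founders


def founders_needed(lam, target=0.5, limit=1000):
    """Smallest N whose survival chance reaches target, or None if not found."""
    q = poisson_extinction_probability(lam)
    for n in range(1, limit + 1):
        if 1.0 - q ** n >= target:
            return n
    return None


if __name__ == "__main__":
    for n in range(1, 6):
        print(n, "boys:", round(survival_with_founders(1.15, n), 3))
    print("N needed for 50% at lambda 1.15:", founders_needed(1.15))
```

For a lambda below 1 every line eventually dies out, so no N reaches the target and founders_needed returns None.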
RL
-
raylopez99
Re: Need population statistics: average # children, standard
On Sep 13, 4:11 pm, raylopez99 <raylope...@yahoo.com> wrote:
Just ran about 1000 simulations, Lambda at 1.15, and preliminary
calculations show that increasing the Poisson sequence to include six
boys increases the survival rate from 21% to about 27%, which is
closer to what the graph at http://en.wikipedia.org/wiki/Galton-Watson_process
shows. The longest line that ultimately went extinct lasted 77
generations (in other simulations I've had over 150 generations before
going extinct), but most surnames will go extinct--if at all--before
25 generations or so.
Also at Lambda = 1.05, after 1000 simulations the survival rate is
11%, which is close to the Wikipedia graph of 10%. This shows the
program is working properly once larger families are included
(incidentally, the longest line that went extinct lasted 97
generations in this run).
All in all, it shows that the program works and that a critical
parameter for surname survival is large families of boys on
occasion--this is true even if the Poisson mean (lambda) stays the
same.
What this means is that the next step in the program, having an
initial "surge" of boys the first generation, should increase the
odds of survival dramatically. I'll see and post here for future
reference what this increase is.
If anybody reading this thread wants to see my source code, in C#,
drop me a line.
RL
-
WGWhalley
Re: Need population statistics: average # children, standard
raylopez99 wrote:
Just ran a couple of thousand simulations on my program--pretty
confident it has few bugs and is working properly.
for L, lambda (Poisson mean) --> probability of surname survival (%)
I thought it somewhat suspicious that the Galton-Watson rates of
propagating male children exactly fit the Poisson distribution. The
distribution may be an approximation of the actual situation.
This is analogous to a grantor saying in a will
"to inherit my money, my offspring must have N boys in the first
generation". This should increase the survival rate--it will be
interesting to see how big "N" must be to get to a 50% survival rate.
Something like that did happen, at least occasionally. A Whalley assumed
the name and arms of Gardiner when he married into that family.
-
raylopez99
Re: Need population statistics: average # children, standard
On Sep 16, 5:53 am, WGWhalley <wgwhal...@gmail.com> wrote:
Thanks WG Whalley for the reply--you are the architectural inspiration
behind this program--please feel free to provide any other suggestions
on genealogy related topics I can model, even database related (I'm
also learning SQL language).
Indeed, or the Poisson distribution may be very "chaotic", depending
on what the initial inputs are, which I am beginning to think is the
case. Let me explain: it seems, contrary to my initial impression,
that a large number of boys will indeed guarantee the survival of a
surname. When I increased the maximum number of boys from five to
six, and also slightly increased the number of such maximum-boy
occurrences (by shifting the decimal place: for X=5 in the Poisson
probability mass function I was previously using 5.3, rounded to 5,
per 1000, and I now use 53 per 10000, which gives an extra three), the
survival rate jumped from the low twenties to the high twenties, and
for the other values of lambda the odds increased as per my last post.
But, as before, the entire process is very unstable--it varies by up
to 3% plus/minus from the mean (per 1000 simulations--I have a loop
now). Sometimes you'll get 30% and sometimes 25%--meaning the initial
values are important and these rare "maximum number of boys" values
are making a big difference.
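The "5 per 1000 versus 53 per 10000" arithmetic can be checked with a short sketch. It assumes lambda = 1.15 and that each count is simply the Poisson probability scaled by the table size and rounded. How the actual program fills its array is not shown in this thread, so the function names and the rounding rule are assumptions.

```python
import math

def poisson_pmf(k, lam):
    """Probability of exactly k sons when the number of sons is Poisson(lam)."""
    return math.exp(-lam) * lam**k / math.factorial(k)

def poisson_counts(lam, total, max_k):
    """Rounded number of table slots, out of `total`, given to each son count 0..max_k."""
    return [round(total * poisson_pmf(k, lam)) for k in range(max_k + 1)]

lam = 1.15
per_1000 = poisson_counts(lam, 1000, 6)
per_10000 = poisson_counts(lam, 10000, 6)
print("slots per 1000: ", per_1000)    # entry for 5 sons is 5
print("slots per 10000:", per_10000)   # entry for 5 sons is 53
```

With a table of 1000 slots, the rare large families lose most of their fractional weight to rounding. The 10000-slot table keeps one more digit, which is where the "extra three" five-boy families come from.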
That's why I'm excited by the next phase of the program, as I
discussed. Today I mapped out what needs to be changed in the
program, and I should be done in a few hours (I would be done quicker
but I have to detour since I'm also making the user input fancier,
as a programming exercise).
Interesting. The problem of morganatic marriages also comes to mind,
though I think in such marriages the surname is retained, so in any
event the program is indifferent. http://www.answers.com/topic/morganatic
RL
PS--just ran another set of simulations for Lambda 1.15, here is the
output-- 28% survival rate:
Extinct? 28% of total simulations
Longest Generation that ultimately went extinct: 122, at
set(245/1000) {<--I've had up to 180 generations before the
generation surname went extinct!--RL}
Press any key to continue . . .
-
raylopez99
Re: Need population statistics: average # children, standard
UPDATE:
I'm finding that the Galton-Watson process is highly chaotic.
Below are some examples at different Poisson distributions, and at
different "shufflings" of the array (I take a Poisson Array of numbers
that is not shuffled randomly, and randomly shuffle it N times, where
N=3, N=5, N=10). I've checked the randomness of the tree carefully,
making sure I pick a true random number sequence every iteration.
L=Lambda / N=shuffles / Survival Rate (%) / Maximum generation #
that went extinct -- (the last column is just for fun, to see which
simulation had the longest-lived surname that nevertheless ultimately
went extinct; for example, the thirtieth generation from the founder
is generation 30, and most surnames have either gone extinct or are
safely established by the 50th generation)
All simulations comprise 1000 runs (1000 simulated "family histories"
if you will):
L=1.15 / 10 / 26% / 107
L=1.15 / 10 / 23% / 66
L=1.15 / 10 / 22% / 195 (195 * 25 years/gen. = 4875 years!) [if you
plot this family tree graphically, which in a later GUI-version of the
program I plan to do, you'll probably see, as I have with other such
long runs, an amazing number of 'near extinction' events that were
saved 'at the last minute' by one or more boys being born, until at
last a string of girls and/or childless men killed off all extant
branches of the family tree]
L=1.15 / 10 / 23% / 122
L=1.15 / 10 / 26% / 201(!) new record, 201* 25 years/gen. = 5025
years!
L=1.15 / 5 / 23% / 35 (35 is a new 'least longest' record, with 1000
simulations per series run)
L=1.15 / 3 / 25% / 63
L=1.15 / 3 / 22% / 52
L=1.05 / 10 / 10% / 190
L=1.05 / 3 / 10% / 322! (this is a different lambda, but even more
impressive since the extinction rate for this lambda is higher than
for L=1.15)
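One way to gauge how much of the batch-to-batch spread in the table is plain sampling noise is the standard error of a proportion. This is a minimal sketch, assuming the runs in a batch are independent and each survives with the same probability p. The function name and the example values of p are illustrative assumptions.

```python
import math

def standard_error(p, runs):
    """Standard error of a survival fraction estimated from `runs` independent runs."""
    return math.sqrt(p * (1.0 - p) / runs)

for p, runs in ((0.25, 1000), (0.25, 100), (0.10, 1000)):
    se = standard_error(p, runs)
    print(f"p={p:.2f}, runs={runs}: about 95% of batches fall within +/- {2 * se:.1%}")
```

For p near 25% and 1000 runs, two standard errors is a little under 3 percentage points, so batches landing anywhere from about 22% to 28% would be expected from sampling noise alone. With only 100 runs the band is roughly three times wider.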
RL
-
raylopez99
Re: Need population statistics: average # children, standard
I finished the beta version 2.0 of my program, which allows user
input. I'll double check later, but it seems to be working.
Ver 2.0 allows a user to input the first generation. For example,
if the first generation has 2 boys, you input 2. You can input a
sequence as well (2,2,3 means two boys in the first generation, who
have two and three boys, respectively).
Results are as follows: for Lambda 1.15, the survival rate went from
the low 20s (%) to the values below (100 simulations each--you really
need to run 1000 to get stable results, see my other posts):
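 2 boys (first generation) --> 39% survival rate
 3 boys (first generation) --> 60% survival rate
 4 boys (first generation) --> 64% survival rate
 5 boys (first generation) --> 78% survival rate
10 boys (first generation) --> 91% survival rate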
and for the sequence 1,2 (i.e. the first generation being told, by
their father in a will, "you must have two boys to inherit my money"):
1,2 (one boy in the first generation having two boys) --> 43% to 49%
survival rate (100 to 500 simulations).
In my next version of the program I'll see what the "winning
sequences" are in all probability--that is, what sequences of boys will
give a 50% surname survival rate? Clearly 1,2 (see above) comes close
to being one such sequence, but there are others.
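The simulated figures in the table above can be compared with a closed-form estimate. This is a minimal sketch, assuming Poisson-distributed sons with mean lambda and that each boy's line dies out independently with the same probability q. With k boys in a generation, all k lines must fail, so the survival rate is 1 - q**k. The function names are illustrative assumptions.

```python
import math

def extinction_probability(lam, tol=1e-12, max_iter=100_000):
    """Smallest root q of q = exp(lam * (q - 1)), by fixed-point iteration from q = 0."""
    q = 0.0
    for _ in range(max_iter):
        q_next = math.exp(lam * (q - 1.0))
        if abs(q_next - q) < tol:
            return q_next
        q = q_next
    return q

def survival_with_boys(lam, boys):
    """Survival probability when `boys` independent male lines are alive at once."""
    return 1.0 - extinction_probability(lam) ** boys

def boys_needed(lam, target=0.5):
    """Fewest simultaneous boys giving at least `target` survival (requires lam > 1)."""
    q = extinction_probability(lam)
    if q > 1.0 - 1e-9:
        raise ValueError("lam must exceed 1 for survival to be possible")
    return math.ceil(math.log(1.0 - target) / math.log(q))

for boys in (2, 3, 4, 5, 10):
    print(f"{boys:2d} boys --> {survival_with_boys(1.15, boys):.0%} survival rate")
print("boys needed for 50%:", boys_needed(1.15))
```

Under these assumptions, lambda = 1.15 gives about 44% for 2 boys and about 94% for 10, close to the simulated values, and 3 boys is the smallest first generation that clears 50%. The sequence 1,2 behaves like starting with 2 boys, which fits the 43% to 49% range reported for it.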
RL
-
singhals
Re: Need population statistics: average # children, standard
raylopez99 wrote:
Is anyone else having a problem figuring out exactly what it is
that Ray is doing?
In one post he discusses number of children (no qualifiers,
no adjectives).
In the next, he discusses number of adult children; then in
another, it's number of reproducing children; then suddenly,
it's number of sons, and now the birth-order of the sons.
Meanwhile, issues such as the 17th, 18th and 19th century
birthrates are being ignored as is the 21st century trend
toward women retaining and using their birth-name.
So is the issue at issue (so t'speak) how long it will take
for any given MAN's descendants (male or female) to die off,
how long it will take for any given MAN's straight-line male
descendants to die off, how long any given MAN's Y-chromo
will survive, or how long any given Y-chromo will survive as
an expression? The last two are nearly self-exclusive,
after all.
Cheryl
-
Tara
Re: Need population statistics: average # children, standard
"singhals" <singhals@erols.com> wrote in message
news:l-qdnWl3b_7S8WfbnZ2dnUVZ_smnnZ2d@rcn.net...
I thought the point was to see how many statistical analysis posts he could
upload to a genealogy newsgroup before everyone blocked him.
--
Tara Larkin
Remove NO SPAM to reply by email.
-
Lesley Robertson
Re: Need population statistics: average # children, standard
"singhals" <singhals@erols.com> wrote in message
news:OLydnQMUeP4INmfbnZ2dnUVZ_rTinZ2d@rcn.net...
Tara wrote:
I thought the point was to see how many statistical analysis posts he
could upload to a genealogy newsgroup before everyone blocked him.
Oh. Well. Then. That's different, innit?
Is he still at it?
Remember the story of the student's statistics report? 33.3% of the mice
lived, 33.3% of the mice died and the third mouse escaped.
Lesley Robertson
-
raylopez99
Re: Need population statistics: average # children, standard
On Sep 27, 12:56 am, "Lesley Robertson" <l.a.robert...@tnw.tudelft.nl>
wrote:
Hey, looks like I woke up the peanut gallery! What took you all so
long?
RL
-
WGWhalley
Re: Need population statistics: average # children, standard
Here is an article discussing the relationship of surnames with
Y-chromosome genetics.
http://www.le.ac.uk/genetics/maj4/SurnamesForWeb.pdf
-
raylopez99
Re: Need population statistics: average # children, standard
On Sep 28, 7:43 am, WGWhalley <wgwhal...@gmail.com> wrote:
Thanks, very interesting and deep article, I'll have to digest it
later.
Perhaps the use of surnames going back 5000 years and the fact most
surnames go extinct (except for a favored few) is why everybody in
China is named "CHOU"(!). In Turkey and especially Iceland, by
contrast, we can expect more surname diversity (see below).
RL
Most populations now use hereditary surnames, although the date of
their establishment varies greatly around the world, from almost 5000
years ago in China, to only 68 years ago in Turkey. There is also
variation among regions within countries and among social classes. In
Japan, for example, the governing classes took hereditary surnames
from the 13th century AD, but prohibited their use by other people
until 1868 (Ref. 1). Some societies still do without them and use, for
example, names based on father's forename (e.g. in Iceland), which
therefore change each generation.