Event-based databases

Steve Hayes · Posted by **Steve Hayes** » 19 February 2006 at 18:58

Since I first asked about event-based programs I've had recommendations of two
from various people:

a) The Master Genealogist
b) Genbox

So I've downloaded the evaluation versions of each to try them.

I hope to look at them in 3 ways:

1) With the sample data that comes with the program, to learn how to use it
2) With my own data, imported, to see what each does with it
3) Starting from scratch, to see how the event-based features work

Both seemed much improved since the last time I tried them, but I still had a
couple of problems getting started with the trial, which I hope experienced
users can help with.

In the Master Genealogist I tried to start a new project with the Wizard, and
it said "Click the Next button to continue", but I couldn't see the Next
button.

In Genbox I started a new database, tried to enter an Event, but couldn't see
how to save it.

--
Steve Hayes from Tshwane, South Africa
http://www.geocities.com/Athens/7734/stevesig.htm
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Bob Velke · Posted by **Bob Velke** » 19 February 2006 at 23:52

Steve said:

In the Master Genealogist I tried to start a new project with the Wizard, and
it said "Click the Next button to continue", but I couldn't see the Next
button.

It would help to know what step in the Wizard you were on. The Next button
is disabled (but not invisible) if you haven't completed the requested
fields on that screen.

If you can't see any buttons along the bottom at all, then it would seem
that the window is bigger than the screen and you need to reduce it (e.g.,
by dragging the top edge down and then dragging the window up).

Bob Velke
Wholly Genes Software
http://www.WhollyGenes.com

Steve Hayes · Posted by **Steve Hayes** » 20 February 2006 at 3:29

On Sun, 19 Feb 2006 21:58:59 +0000 (UTC), [email protected] (Bob Velke)
wrote:

Steve said:

In the Master Genealogist I tried to start a new project with the Wizard, and
it said "Click the Next button to continue", but I couldn't see the Next
button.

It would help to know what step in the Wizard you were on. The Next button
is disabled (but not invisible) if you haven't completed the requested
fields on that screen.

If you can't see any buttons along the bottom at all, then it would seem
that the window is bigger than the screen and you need to reduce it (e.g.,
by dragging the top edge down and then dragging the window up).

Thanks, I managed to do that and see the buttons at the bottom of the screen.

But then there were fields to fill in -- it asked for my name, date of birth
etc. But I could not reach the field below County, and could find no scroll
bars to move the screen up so that I could get to it.


Lars Erik Bryld

Scripsit Steve Hayes:

In Genbox I started a new database, tried to enter an Event, but
couldn't see how to save it.

There is no save option in Genbox, because everything is saved
instantaneously the moment you leave any particular field. The only
exception is if you leave every field of the event form empty.

--
Regards
Lars Erik Bryld

Guest · Posted by **Guest** » 20 February 2006 at 18:49

On Mon, 20 Feb 2006 17:20:21 +0100, Lars Erik Bryld
<[email protected]> wrote:

Scripsit Steve Hayes:

In Genbox I started a new database, tried to enter an Event, but
couldn't see how to save it.

There is no save option in Genbox, because everything is saved
instantaneously the moment you leave any particular field. The only
exception is if you leave every field of the event form empty.

Which is why I didn't like the program.

--

Dennis K.

wim · Posted by **wim** » 22 February 2006 at 15:31

Virtually every database works this way. If you don't want the record,
adapt it or delete it. I don't see why there should be a save option
for every record. That's like having to save every typed paragraph
in Word separately.

Wim

Guest · Posted by **Guest** » 22 February 2006 at 19:14

On 22 Feb 2006 06:31:17 -0800, "wim" <[email protected]> wrote:

Virtually every database works this way. If you don't want the record,
adapt it or delete it. I don't see why there should be a save option
for every record. That's like having to save every typed paragraph
in Word separately.

Most programs I use don't perform saves as I tab from field to field on
a single window.

I'm not going to get into an argument here. Software developers are free
to implement things as they see fit. I was just explaining my reason for
not buying Genbox. Others may like their implementation. I didn't. Maybe
the Genbox people want to hear the reasons why prospective customers
walk away from their product?

--

Dennis K.

Lars Erik Bryld

Scripsit Dennis K.:

Most programs I use don't perform saves as I tab from field to field
on a single window.

I'm not going to get into an argument here. Software developers are
free to implement things as they see fit. I was just explaining my
reason for not buying Genbox. Others may like their implementation.
I didn't. Maybe the Genbox people want to hear the reasons why
prospective customers walk away from their product?

I'd say the price could be a more pressing issue, and that's been
taken care of recently :)

I was under the impression that your genealogy application might be
Legacy, and at least that application uses the database default
behaviour of instant update when editing fields, just like Genbox. If
memory serves, so did Heredis and PAF. What TMG or FTM does, I have
repressed, and Brother's Keeper I've never tried.

--
Regards
Lars Erik Bryld

Guest · Posted by **Guest** » 23 February 2006 at 18:20

On Thu, 23 Feb 2006 17:31:17 +0100, Lars Erik Bryld
<[email protected]> wrote:

I was under the impression, that your genealogy application might be
Legacy

Correct.

, and at least that application use the database default
behaviour of instant update when editing fields

Partially correct (unless I am misunderstanding what you mean). I think
that editing from the Name List sort of works that way (I think the
database gets updated when you move to another record or move off the
Edit tab --- no Save button involved). Every other dialog that I can
think of has a Save button.

My only point was that my first impression with Genbox was "confusion".
Since I trialed Genbox 2 years ago I don't remember much else (other
than having many import problems).

--

Dennis K.

Lars Erik Bryld

Scripsit Dennis K.:

, and at least that application [i.e. Legacy] use the database
default behaviour of instant update when editing fields

Partially correct (unless I am misunderstanding what you mean). I
think that editing from the Name List sort of works that way (I
think the database gets updated when you move to another record or
move off the Edit tab --- no Save button involved). Every other
dialog that I can think of has a Save button.

Except location - also works like Names, as far as I remember

My only point was that my first impression with Genbox was
"confusion". Since I trialed Genbox 2 years ago I don't remember
much else (other than having many import problems).

Genbox is confusing for many good reasons. I believe that is the
unavoidable price to pay for the refined features it offers. Genbox
excels in source conflicts, name variants, Byzantine administrative
circumstances, and unilateral kinship linking, to name a few.
Unfortunately, the implication is that Genbox does not always behave
the way you'd expect it to. In such cases, the error is usually the
user's fault, not the application's.

I generally recommend Legacy to genealogy or application novices, then
migrating to Genbox when seasoned in both departments.

--
Regards
Lars Erik Bryld

Guest · Posted by **Guest** » 24 February 2006 at 15:56

On Thu, 23 Feb 2006 20:40:55 +0100, Lars Erik Bryld
<[email protected]> wrote:

, and at least that application [i.e. Legacy] use the database
default behaviour of instant update when editing fields

Partially correct (unless I am misunderstanding what you mean). I
think that editing from the Name List sort of works that way (I
think the database gets updated when you move to another record or
move off the Edit tab --- no Save button involved). Every other
dialog that I can think of has a Save button.

Except location - also works like Names, as far as I remember

Nope. Editing a Master Location involves opening a dialog, editing the
data, and clicking Save. Unless you are thinking about something else.

My only point was that my first impression with Genbox was
"confusion". Since I trialed Genbox 2 years ago I don't remember
much else (other than having many import problems).

Genbox is confusing for many good reasons. I believe that is the
unavoidable price to pay for the refined features it offers.

I should have specified that my confusion was over the user interface,
not the features. I've heard good things about Genbox, which is why I
demo'd it a couple of years ago. But I couldn't get past those hurdles
(UI and import).

Not that I think Legacy is perfect. I don't like their use of modal
dialogs for everything. I think that the program is beginning to suffer
a little bloat ... actually a lot of bloat. I think their reports (both
print and html) could use some work. Etc.

I guess if I want the perfect program I will have to design and write it
myself. ;-)

Lars Erik Bryld

Scripsit Dennis K.:

move off the Edit tab --- no Save button involved). Every other
dialog that I can think of has a Save button.

Except location - also works like Names, as far as I remember

Nope. Editing a Master Location involves opening a dialog, editing
the data, and clicking Save. Unless you are thinking about
something else.

I probably use the word "edit" in an imprecise manner. If you fill in
a Location-field, Legacy will suggest a name already in the list as
you type. If you digress from the content of the Master Location List,
then your typed name will be instantly added to the list as you leave
the field (without a save-button).

I should have specified that my confusion was over the user
interface, not the features. I've heard good things about Genbox,
which is why I demo'd it a couple of years ago. But I couldn't get
past those hurdles (UI and import).

I won't excuse import problems other than to remark that most of these
are generally the result of other applications' export problems :)

Not that I think Legacy is perfect. I don't like their use of modal
dialogs for everything. I think that the program is beginning to
suffer a little bloat ... actually a lot of bloat. I think their
reports (both print and html) could use some work. Etc.

Actually, I think that Reports in Legacy are rather good. Not as
delivered, but very customisable. I find that my customised version of
Legacy produces better narratives than Genbox in my native (Danish)
language. If only they could leave that bells'n'whistles thing with
in-built web browsers, and instead start adding actually useful
features, like witness support, then I'd probably never have migrated.

--
Kind regards
Lars Erik Bryld

Guest · Posted by **Guest** » 24 February 2006 at 18:28

On Fri, 24 Feb 2006 17:17:04 +0100, Lars Erik Bryld
<[email protected]> wrote:

Nope. Editing a Master Location involves opening a dialog, editing
the data, and clicking Save. Unless you are thinking about
something else.

I probably use the word "edit" in an imprecise manner. If you fill in
a Location-field, Legacy will suggest a name already in the list as
you type. If you digress from the content of the Master Location List,
then your typed name will be instantly added to the list as you leave
the field (without a save-button).

Gotcha. That also happens in several other places when the field you are
leaving involves a related table.

I should have specified that my confusion was over the user
interface, not the features. I've heard good things about Genbox,
which is why I demo'd it a couple of years ago. But I couldn't get
past those hurdles (UI and import).

I won't excuse import problems other than to remark that most of these
are generally the result of other applications' export problems :)

That's what every application wants you to believe. ;-)

--

Dennis K.

Doug McDonald · Posted by **Doug McDonald** » 24 February 2006 at 19:25

Dennis K. wrote:

Most programs I use don't perform saves as I tab from field to field on
a single window.

That's true. But most relational databases do. And the genealogy
programs seem to be built on relational database engines ... this is
one reason their data files are so obscenely bloated.

Doug McDonald

Guest · Posted by **Guest** » 24 February 2006 at 19:48

On Fri, 24 Feb 2006 12:25:27 -0600, Doug McDonald
<mcdonald@SnPoAM_scs.uiuc.edu> wrote:

Most programs I use don't perform saves as I tab from field to field on
a single window.

That's true. But most relational databases do.

Could you explain this for me? I think of the concept of "tabbing" as a
function of the user interface, not the database. Even in MS Access (the
UI which interfaces with the underlying database engine), the database
doesn't get updated if I tab to another field in the same row. The
update takes place when I move to another row.
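The distinction drawn here, that the moment of saving is chosen by the form and not by the database engine, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `RowForm`, the `person` table and its columns are hypothetical names, and nothing here reflects how Access, Legacy or Genbox are actually written.

```python
import sqlite3


class RowForm:
    """Hypothetical data-entry form bound to one row of a `person` table."""

    ALLOWED = ("name", "birth_place")  # the only columns this form may write

    def __init__(self, conn, person_id, save_on_field_exit):
        self.conn = conn
        self.person_id = person_id
        self.save_on_field_exit = save_on_field_exit
        self.pending = {}  # edits typed in but not yet written

    def leave_field(self, column, value):
        """Called by the UI when the user tabs out of a field."""
        if column not in self.ALLOWED:
            raise ValueError(f"unknown column: {column}")
        self.pending[column] = value
        if self.save_on_field_exit:  # "instant update" style
            self.flush()

    def leave_row(self):
        """Called by the UI when the user moves to another record."""
        self.flush()

    def flush(self):
        """Write all buffered edits to the database and commit them."""
        for column, value in self.pending.items():
            self.conn.execute(
                f"UPDATE person SET {column} = ? WHERE id = ?",
                (value, self.person_id),
            )
        self.conn.commit()
        self.pending.clear()


# Made-up sample data for the illustration.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT, birth_place TEXT)"
)
conn.execute("INSERT INTO person VALUES (1, 'A. Person', NULL)")
conn.commit()


def stored_name(conn):
    """Return the name currently stored for person 1."""
    return conn.execute("SELECT name FROM person WHERE id = 1").fetchone()[0]
```

With `save_on_field_exit=False` the table is untouched until `leave_row()` is called; with `True` every `leave_field()` writes immediately. The database engine is the same in both cases, only the form's policy differs.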

Anyway, it's been so long since I trialed Genbox that I don't exactly
remember at what point the actual updating takes place. All I do
remember is that the lack of a "Save" button was not helping my
understanding of what was going on. Do they still offer free trials?
Maybe I should try it again to see if it makes any more sense to me now.

--

Dennis K.

john · Posted by **john** » 24 February 2006 at 20:13

Dennis K. wrote:

On Fri, 24 Feb 2006 12:25:27 -0600, Doug McDonald
<mcdonald@SnPoAM_scs.uiuc.edu> wrote:

Most programs I use don't perform saves as I tab from field to field on
a single window.

That's true. But most relational databases do.

Could you explain this for me? I think of the concept of "tabbing" as a
function of the user interface, not the database. Even in MS Access (the
UI which interfaces with the underlying database engine), the database
doesn't get updated if I tab to another field in the same row. The
update takes place when I move to another row.

Anyway, it's been so long since I trialed Genbox that I don't exactly
remember at what point the actual updating takes place. All I do
remember is that the lack of a "Save" button was not helping my
understanding of what was going on. Do they still offer free trials?
Maybe I should try it again to see if it makes any more sense to me now.

You probably aren't tabbing from field to field in the same row. The
database is unlikely to be a single flat table.
The fields are probably in different tables and are just displayed on a
form on the screen. So tabbing is from table to table.

Dennis Lee Bieber

On Fri, 24 Feb 2006 13:48:41 -0500, Dennis K. declaimed the following in
soc.genealogy.computing:

Could you explain this for me? I think of the concept of "tabbing" as a
function of the user interface, not the database. Even in MS Access (the
UI which interfaces with the underlying database engine), the database
doesn't get updated if I tab to another field in the same row. The
update takes place when I move to another row.

It sounds like you are viewing single tables at a time in Access.

Problem is, something like TMG stores names in one table, Event date
and type in another, the memo text is a third table, the place is
11-records in another table (obscene yes... I'll summarize places
later).

So what you see as "one" record on screen is really multiple records
in many tables.

TMG places:

Instead of a record of:

ID#  field1  field2  field3      field4         field5  ...  field11
22   null    null    "San Jose"  "Santa Clara"  "CA"    ...  null

it uses

ID#  Place#  fieldPos  fieldVal
99   22      3         "San Jose"
100  22      5         "CA"
....
206  22      4         "Santa Clara"

(the jump in ID# reflects, for example, that the county name was not
part of the original data entry, but was added later, after some 100
other place data records).

The event record only knows that it is "place 22", and has to
fetch/sort/position all data associated with "22" from another table. If
you change the county, say, and tab to the next place field, it has to
modify the "position 4" record for that place (if there isn't one yet,
one has to be created).

{It's actually worse -- there can be a "place style", and the style
defines labels for each position; so there is another table that has to
be referenced.}
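A rough sketch of that layout, using Python's built-in sqlite3 module. The table and column names (`place_part`, `place_id`, `field_pos`, `field_val`) and the two helper functions are made up for the illustration; this is not TMG's actual schema or code, and the place-style table is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE place_part ("
    " id INTEGER PRIMARY KEY,"
    " place_id INTEGER NOT NULL,"
    " field_pos INTEGER NOT NULL,"
    " field_val TEXT NOT NULL,"
    " UNIQUE (place_id, field_pos))"
)
# The three rows from the example above.
conn.executemany(
    "INSERT INTO place_part (id, place_id, field_pos, field_val)"
    " VALUES (?, ?, ?, ?)",
    [(99, 22, 3, "San Jose"), (100, 22, 5, "CA"), (206, 22, 4, "Santa Clara")],
)
conn.commit()


def set_place_field(conn, place_id, field_pos, value):
    """Change one position of a place; create the row if it is missing."""
    cur = conn.execute(
        "UPDATE place_part SET field_val = ?"
        " WHERE place_id = ? AND field_pos = ?",
        (value, place_id, field_pos),
    )
    if cur.rowcount == 0:  # no row for this position yet
        conn.execute(
            "INSERT INTO place_part (place_id, field_pos, field_val)"
            " VALUES (?, ?, ?)",
            (place_id, field_pos, value),
        )
    conn.commit()


def place_fields(conn, place_id, width=11):
    """Rebuild the 11-slot 'wide' view of a place from its part rows."""
    slots = [None] * width
    rows = conn.execute(
        "SELECT field_pos, field_val FROM place_part"
        " WHERE place_id = ? ORDER BY field_pos",
        (place_id,),
    )
    for pos, val in rows:
        slots[pos - 1] = val  # positions are 1-based
    return slots
```

Calling `place_fields(conn, 22)` reassembles the wide record from the three stored rows, and `set_place_field(conn, 22, 4, ...)` is the kind of single-row write that would have to happen when a user changes the county and tabs away.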
--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Steve Hayes · Posted by **Steve Hayes** » 24 February 2006 at 20:26

On Fri, 24 Feb 2006 12:25:27 -0600, Doug McDonald
<mcdonald@SnPoAM_scs.uiuc.edu> wrote:

That's true. But most relational databases do. And the genealogy
programs seem to be built on relational database engines ... this is
one reason their data files are so obscenely bloated.

Eh?


Leif B. Kristensen

Doug McDonald wrote:

Dennis K. wrote:

Most programs I use don't perform saves as I tab from field to field
on a single window.

That's true. But most relational databases do.

No they don't. As has already been mentioned, that would be a function
of the interface, not of the database.

And the genealogy
programs seem to be built on relational database engines ... this is
one reason their data files are so obscenely bloated.

One of the major benefits of relational databases, is the elimination of
recurring text strings by the formal process of "normalization". Thus,
I can't quite see how you've got the impression that relational db's
should have to be "obscenely bloated".

I've got my genealogy data in a PostgreSQL database, and a compressed
SQL dump of the entire dataset of about 14000 persons still fits on a
floppy disk.
--
Leif Biberg Kristensen
http://solumslekt.org/

Guest · Posted by **Guest** » 24 February 2006 at 21:11

On Fri, 24 Feb 2006 19:13:58 GMT, Dennis Lee Bieber
<[email protected]> wrote:

Could you explain this for me? I think of the concept of "tabbing" as a
function of the user interface, not the database.

[snip]

It sounds like you are viewing single tables at a time in Access.

Problem is, something like TMG stores names in one table, Event date
and type in another, the memo text is a third table, the place is
11-records in another table (obscene yes... I'll summarize places
later).

I understand what you are saying and why the database gets updated in
this situation. Without the context you provided, I just didn't
understand his statement that "most relational databases do" with
regards to saving when moving from field to field.

Thanks,

Dennis K.

Paul Blair · Posted by **Paul Blair** » 24 February 2006 at 23:28

Dennis K. wrote:

On Thu, 23 Feb 2006 20:40:55 +0100, Lars Erik Bryld
<[email protected]> wrote:

, and at least that application [i.e. Legacy] use the database
default behaviour of instant update when editing fields
Partially correct (unless I am misunderstanding what you mean). I
think that editing from the Name List sort of works that way (I
think the database gets updated when you move to another record or
move off the Edit tab --- no Save button involved). Every other
dialog that I can think of has a Save button.
Except location - also works like Names, as far as I remember

Nope. Editing a Master Location involves opening a dialog, editing the
data, and clicking Save. Unless you are thinking about something else.

My only point was that my first impression with Genbox was
"confusion". Since I trialed Genbox 2 years ago I don't remember
much else (other than having many import problems).
Genbox is confusing for many good reasons. I believe that is the
unavoidable price to pay for the refined features it offers.

I should have specified that my confusion was over the user interface,
not the features. I've heard good things about Genbox, which is why I
demo'd it a couple of years ago. But I couldn't get past those hurdles
(UI and import).

Not that I think Legacy is perfect. I don't like their use of modal
dialogs for everything. I think that the program is beginning to suffer
a little bloat ... actually a lot of bloat. I think their reports (both
print and html) could use some work. Etc.

I guess if I want the perfect program I will have to design and write it
myself.

Bloat...ah yes. Results for import from one GEDCOM of 8000 folk, no pics.

FTW = 24.7Mb
Legacy = 15.8Mb
TMG = 43.5Mb
Genbox = 15.6Mb

TMG uses a lot of files, so there is probably a heap of slack space in
there. And all their dates insist on wasting space with a leading zero,
writing "08 Feb 2006" :-)

In truth, it's probably time for software houses to rethink their
products. Software itself is changing from stand-alone to web-centric.
The American family pattern, which underlies most commercial offerings,
is now less valid than before, as the world changes.

Paul

Doug McDonald · Posted by **Doug McDonald** » 24 February 2006 at 23:41

Leif B. Kristensen wrote:

Doug McDonald wrote:

And the genealogy
programs seem to be built on relational database engines ... this is
one reason their data files are so obscenely bloated.

One of the major benefits of relational databases, is the elimination of
recurring text strings by the formal process of "normalization". Thus,
I can't quite see how you've got the impression that relational db's
should have to be "obscenely bloated".

For example, my own Access datafile from Legacy is 13 megabytes, for
3200 people and a lot of notes text. A GEDCOM of the file is 4.2 megabytes,
and a zip backup of the file is 2.6 megabytes. That's bloat.

A gedcom without notes is 1.26 megabytes. The Legacy/Access .fdb from this is
7.5 megabytes, and the .zip of it is 1.8 megabytes.

Doug McDonald

Guest · Posted by **Guest** » 24 February 2006 at 23:44

On Sat, 25 Feb 2006 09:28:25 +1100, Paul Blair <[email protected]>
wrote:

Bloat...ah yes. Results for import from one GEDCOM of 8000 folk, no pics.

Well, I was referring to code bloat. I believe the Legacy 5.0 executable
was about 15Mb while 6.0 is about 19Mb. Plus 6.0 comes with the new 20Mb
(uncompressed) Research Guidance file, which many users don't even use.

--

Dennis K.

Paul Blair · Posted by **Paul Blair** » 24 February 2006 at 23:49

Dennis K. wrote:

On Sat, 25 Feb 2006 09:28:25 +1100, Paul Blair <[email protected]>
wrote:

Bloat...ah yes. Results for import from one GEDCOM of 8000 folk, no pics.

Well, I was referring to code bloat. I believe the Legacy 5.0 executable
was about 15Mb while 6.0 is about 19Mb. Plus 6.0 comes with the new 20Mb
(uncompressed) Research Guidance file, which many users don't even use.

Sort of suggests it's time to rationalise and house clean...

Paul

T.M. Sommers · Posted by **T.M. Sommers** » 25 February 2006 at 2:49

Doug McDonald wrote:

Leif B. Kristensen wrote:
Doug McDonald wrote:

And the genealogy programs seem to be built on relational database
engines ... this is
one reason their data files are so obscenely bloated.

One of the major benefits of relational databases, is the elimination of
recurring text strings by the formal process of "normalization". Thus,
I can't quite see how you've got the impression that relational db's
should have to be "obscenely bloated".

For example, my own Access datafile from Legacy is 13 megabytes, for
3200 people and a lot of notes text. A GEDCOM of the file is 4.2 megabytes,
and a zip backup of the file is 2.6 megabytes. That's bloat.

No, that is a consequence of using fixed-width fields in the
database, which technique is not unique to relational databases.
There are very good reasons for using fixed-width fields.
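To make the trade-off concrete, here is a small Python sketch of fixed-width records. The field widths (40 and 60 bytes) and the sample rows are invented for the illustration, and it says nothing about how Legacy or Access actually lay out their files; it only shows why fixed widths cost space and what they buy in return.

```python
import struct

# Hypothetical record layout: 4-byte id, 40-byte surname, 60-byte place.
RECORD = struct.Struct("<I40s60s")


def pack_fixed(rec_id, surname, place):
    """Pack one record; short strings are padded with NUL bytes."""
    return RECORD.pack(rec_id, surname.encode("utf-8"), place.encode("utf-8"))


def read_fixed(buf, index):
    """Jump straight to record number `index` without scanning the file."""
    rec_id, surname, place = RECORD.unpack_from(buf, index * RECORD.size)
    return (
        rec_id,
        surname.rstrip(b"\0").decode("utf-8"),
        place.rstrip(b"\0").decode("utf-8"),
    )


# Made-up sample rows.
people = [(1, "Smith", "York"), (2, "Kristensen", "Oslo"), (3, "Lee", "Perth")]

fixed = b"".join(pack_fixed(*p) for p in people)
variable = "\n".join(f"{i}\t{s}\t{pl}" for i, s, pl in people).encode("utf-8")
```

Every fixed record occupies `RECORD.size` bytes however short the names are, so `fixed` is several times larger than the tab-separated `variable` text. In exchange, `read_fixed(fixed, 1)` can compute the position of the second record directly, which is one of the good reasons for the technique.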

--
Thomas M. Sommers -- [email protected] -- AB2SB

Steve Hayes · Posted by **Steve Hayes** » 25 February 2006 at 4:59

On Sat, 25 Feb 2006 09:49:34 +1100, Paul Blair <[email protected]> wrote:

Well, I was referring to code bloat. I believe the Legacy 5.0 executable
was about 15Mb while 6.0 is about 19Mb. Plus 6.0 comes with the new 20Mb
(uncompressed) Research Guidance file, which many users don't even use.

Sort of suggests it's time to rationalise and house clean...

This is a message that needs to get across to software developers.
One-size-fits-all programs generally become so bloated that they collapse under
their own weight.

I haven't seen this "Research Guidance" thingy, having seen no need to upgrade
Legacy until it offers *real* improvements, but it sounds as though it should
be in a separate program.

I remember there was a rather useful disk utility called PCTools. It helped
one to repair disk defects, recover lost data etc. And then the developers
tried to turn it into an "everything" program - diary, address book, etc, and
put in all kinds of features that other programs did better.

It is sad to see the same thing happening to Legacy.


Steve Hayes · Posted by **Steve Hayes** » 25 February 2006 at 5:04

On Fri, 24 Feb 2006 21:01:43 +0100, "Leif B. Kristensen"
<[email protected]> wrote:

Doug McDonald wrote:

Dennis K. wrote:

Most programs I use don't perform saves as I tab from field to field
on a single window.

That's true. But most relational databases do.

No they don't. As has already been mentioned, that would be a function
of the interface, not of the database.

And the genealogy
programs seem to be built on relational database engines ... this is
one reason their data files are so obscenely bloated.

One of the major benefits of relational databases, is the elimination of
recurring text strings by the formal process of "normalization". Thus,
I can't quite see how you've got the impression that relational db's
should have to be "obscenely bloated".

Quite - the whole idea of relational databases is to save space.

I've given up on relational database programs (except for specialist packages
like genealogy programs) because no sooner have you begun to learn them than
they are obsolete and you have to start learning all over again.

So I keep things like parish register transcriptions in a flat file
database. It means that if I transcribe 100 records from one parish, the
parish name appears 100 times in the database. If it were a relational
database, the parish name would be stored only once. So the "obscenely
bloated" is a great mystery.


T.M. Sommers · Posted by **T.M. Sommers** » 25 February 2006 at 6:40

Steve Hayes wrote:

On Sat, 25 Feb 2006 09:49:34 +1100, Paul Blair <[email protected]> wrote:

Well, I was referring to code bloat. I believe the Legacy 5.0 executable
was about 15Mb while 6.0 is about 19Mb. Plus 6.0 comes with the new 20Mb
(uncompressed) Research Guidance file, which many users don't even use.

Sort of suggests it's time to rationalise and house clean...

This is a message that needs to get across to software developers.
One-size-fits-all programs generally become so bloated that they collapse under
their own weight.

The developers know that. Try to get the managers and
marketroids to understand it. They are the ones who love feature
matrices.

--
Thomas M. Sommers -- [email protected] -- AB2SB

T.M. Sommers · Posted by **T.M. Sommers** » 25 February 2006 at 6:50

Steve Hayes wrote:

I've given up on relational database programs (except for specialist packages
like genealogy programs) because no sooner have you begun to learn them than
they are obsolete and you have to start learning all over again.

Which ones? The ones I know about (free ones) seem to be pretty
stable. After all, no one wants to break existing apps. And
there is always xBase, which hasn't been under active development
in years. And who says you can't keep using an older version, if
drastic changes are made?

So I keep things like parish register transcriptions in a flat file
database. It means that if I transcribe 100 records from one parish, the
parish name appears 100 times in the database. If it were a relational
database, the parish name would be stored only once.

For something that small, you could build a relational database
using text files, using things like awk and join. The AWK book
even contains an implementation of a subset of SQL in awk. If
you are not using some flavor of Unix, download Cygwin.

--
Thomas M. Sommers -- [email protected] -- AB2SB

Dennis Lee Bieber

On Sat, 25 Feb 2006 09:28:25 +1100, Paul Blair <[email protected]>
declaimed the following in soc.genealogy.computing:

TMG uses a lot of files, so there is probably a heap of slack space in
there. And all their dates insist on wasting space with a leading zero,
writing "08 Feb 2006"

Most of the TMG total would likely be in the indexes into those
files. They went out of their way to avoid empty space in the place
table, for instance.

And internally, unless it is an irregular date, dates are stored
quite compactly.

The table descriptions (for version 3.6, but they haven't changed
much) used to be available on the WG web site.

Dennis Lee Bieber

On Sat, 25 Feb 2006 06:04:47 +0200, Steve Hayes <[email protected]>
declaimed the following in soc.genealogy.computing:

So I keep things like parish register transcriptions in a flat file
database. It means that if I transcribe 100 records from one parish, the
parish name appears 100 times in the database. If it were a relational
database, the parish name would be stored only once. So the "obscenely
bloated" is a great mystery.

Only if the designer of the database schema designed it with a
"parish" table, which can have just (say)

ID# Parish_Name

and your transcribed records all then link, using the ID# to the parish
table.

But tables just for parish records are unlikely to be a concept for
most genealogy programs -- the schema designers could have felt the
overhead of a second table with an ID# and need for a join operation to
retrieve data did not win out over a direct text field in each record
(especially if it was felt that the typical use would not have lots of
repeats).
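For what it's worth, the two-table arrangement described above might look like this in Python with the built-in sqlite3 module. The `parish` and `transcription` tables, their columns and the sample rows are all invented for the sketch; no genealogy program is claimed to use this schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE parish (
        id          INTEGER PRIMARY KEY,
        parish_name TEXT NOT NULL UNIQUE
    );
    CREATE TABLE transcription (
        id        INTEGER PRIMARY KEY,
        parish_id INTEGER NOT NULL REFERENCES parish(id),
        entry     TEXT NOT NULL
    );
    """
)

# The parish name is stored exactly once ...
conn.execute("INSERT INTO parish (id, parish_name) VALUES (1, 'Example Parish')")

# ... and each of the 100 transcribed records carries only its ID#.
conn.executemany(
    "INSERT INTO transcription (parish_id, entry) VALUES (1, ?)",
    [(f"baptism record {n}",) for n in range(1, 101)],
)
conn.commit()

# The join puts the name back next to every record when reading.
rows = conn.execute(
    "SELECT p.parish_name, t.entry"
    " FROM transcription AS t JOIN parish AS p ON p.id = t.parish_id"
    " ORDER BY t.id"
).fetchall()
```

The join in the final query is the overhead mentioned above: every read has to look the name up again, which a schema designer may judge not worth it if repeats are expected to be rare.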

Steve Hayes · Posted by **Steve Hayes** » 25 February 2006 at 8:53

On Sat, 25 Feb 2006 00:50:36 -0500, "T.M. Sommers" <[email protected]> wrote:

Steve Hayes wrote:

I've given up on relational database programs (except for specialist packages
like genealogy programs) because no sooner have you begun to learn them than
they are obsolete and you have to start learning all over again.

Which ones? The ones I know about (free ones) seem to be pretty
stable. After all, no one wants to break existing apps. And
there is always xBase, which hasn't been under active development
in years. And who says you can't keep using an older version, if
drastic changes are made?

Paradox.

I was just beginning to get the hang of doing things with it when they changed
the whole thing with ObjectPal, and when I went to Windows 98 my Paradox 4.5
would no longer work, and by then the original disks were bad so I could no
longer reinstall to see if that would help.

I have Access and OpenOffice Base, but I really don't have much clue how to
use them. I can design tables and link them, but the forms come out looking
like a dog's breakfast, with boxes all different sizes. And by the time I've
learnt how to do that they'll bring out an "Upgrade" that won't run on my
hardware, and by the time I've saved up for new hardware the upgrade will be
obsolete.

So instead of wasting my time trying to learn how to work programs in a
tortoise and hare fashion, so that I'll die before I begin entering any data,
I'd rather spend my time entering data in the programs I already have.

So I keep things like parish register transcriptions in a flat file
database. It means that if I transcribe 100 records from one parish, the
parish name appears 100 times in the database. If it were a relational
database, the parish name would be stored only once.

For something that small, you could build a relational database
using text files, using things like awk and join. The AWK book
even contains an implementation of a subset of SQL in awk. If
you are not using some flavor of Unix, download Cygwin.

But then I'd have to learn to use AWK. I tried it once, and I thought I'd
leave it to the hackers.

But perhaps if hackers and users and developers collaborate, we might be able
to help developers improve existing programs, and create a few new
collaborative open source ones (in Base?).

That's one reason I started the gensoft forum:

http://groups.yahoo.com/group/gensoft/

Tell us about AWK there!

--
Steve Hayes from Tshwane, South Africa
http://www.geocities.com/Athens/7734/stevesig.htm
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Steve Hayes · Posted by **Steve Hayes** » 25 February 2006 at 8:58

On Sat, 25 Feb 2006 06:10:00 GMT, Dennis Lee Bieber <[email protected]>
wrote:

On Sat, 25 Feb 2006 06:04:47 +0200, Steve Hayes <[email protected]>
declaimed the following in soc.genealogy.computing:

So I keep things like parish register transcriptions in a flat file
database. It means that if I transcribe 100 records from one parish, the
parish name appears 100 times in the database. If it were a relational
database, the parish name would be stored only once. So the "obscenely
bloated" is a great mystery.

Only if the designer of the database schema designed it with a
"parish" table, which can have just (say)

ID# Parish_Name

and your transcribed records all then link, using the ID# to the parish
table.

But tables just for parish records are unlikely to be a concept for
most genealogy programs -- the schema designers could have felt the
overhead of a second table with an ID# and need for a join operation to
retrieve data did not win out over a direct text field in each record
(especially if it was felt that the typical use would not have lots of
repeats).

The developer of the schema is me.

The "schema" (if you can call it that) is this -- I designed it.

Define Data Structure
Name of structure: BAPREG
Description line (optional): Recording entries from baptism registers
Record ID field(s): ID
Order key field(s): SURNAME GIVNAME
LABEL  NAME       INDEX  SORT  EMPHASIS
ID     *          Y      9     1
EN     ENTNUM     T      9     1
BP     BAPDATE    T      4     1
BD     BIRDATE    T      4     1
SX     SEX        N
SN     SURNAME    Y      9     1
GN     GIVNAME    Y      9     1
FS     FATHSNAME  Y      9     1
MS     MOTHSNAME  Y      9     1
OC     OCCUPAT    N
AB     ABODE      N
SP     SPONSORS   K      9     1
NO     NOTES      K      9     1
CO     COMMENTS   N
CL     CLERGY     N
PN     PAGE       N
CP     PLACE      K      9     1
DN     DENOM      N
DS     DATESEEN   N
SO     SOURCE     K      9     1

Stop words: A AN AND BY FOR FROM IN OF ON THE TO WITH
Leading articles: THE A AN

--
Steve Hayes from Tshwane, South Africa
http://www.geocities.com/Athens/7734/stevesig.htm
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Herman Viaene · Posted by **Herman Viaene** » 25 February 2006 at 10:02

Steve Hayes wrote:

On Sat, 25 Feb 2006 00:50:36 -0500, "T.M. Sommers" <[email protected]> wrote:

Steve Hayes wrote:

I've given up on relational database programs (except for specialist
packages like genealogy programs) because no sooner have you begun to learn
them than they are obsolete and you have to start learning all over again.

Which ones? The ones I know about (free ones) seem to be pretty
stable. After all, no one wants to break existing apps. And
there is always xBase, which hasn't been under active development
in years. And who says you can't keep using an older version, if
drastic changes are made?

Paradox.

Well, that is not a relational database to begin with. Just like its
"brother" at that time, dBASE, which I used a lot.

I was just beginning to get the hang of doing things with it when they
changed the whole thing with ObjectPal, and when I went to Windows 98 my
Paradox 4.5 would no longer work, and by then the original disks were bad
so I could no longer reinstall to see if that would help.

I know the feeling, and again it was a mistake on their part to mix interface
language and database language in one. I've even written a few small apps
in dBASE SQL (I never came across anyone else who did the same) and it was
quite easy for me to port these later to Oracle.

I have Access and OpenOffice Base..
snip...

I have been teaching Access, that's enough to steer away from it as far as
possible.
And all the other issues you mention are purely interface issues (might be
important on their own), but have again nothing to do with the underlying
database.
So, I would say, learn about database normalization (if you haven't already),
practise SQL and pick an interface designer - or design language - which
suits you and supports SQL, and you'll be happy.

Herman Viaene

Joe Makowiec · Posted by **Joe Makowiec** » 25 February 2006 at 11:50

On 25 Feb 2006 in soc.genealogy.computing, Steve Hayes wrote:

But perhaps if hackers and users and developers collaborate, we
might be able to help developers improve existing programs, and
create a few new collaborative open source ones (in Base?).

SQLite: http://sqlite.org/

Open source, single program/file, relational. Binaries available for
Linux, Windows, OS X

--
Joe Makowiec
http://makowiec.org/
Email: http://makowiec.org/contact/?Joe

Doug McDonald · Posted by **Doug McDonald** » 25 February 2006 at 20:47

T.M. Sommers wrote:

For example, my own Access datafile from Legacy is 13 megabytes, for
3200 people and a lot of notes text. A Gedcom of the file is 4.2 megabytes,
and a zip backup of the file is 2.6 megabytes. That's bloat.

No, that is a consequence of using fixed-width fields in the database,
which technique is not unique to relational databases. There are very
good reasons for using fixed-width fields.

Logical fixed-length fields do not imply the use of fixed-length
fields in stored files on disk. They DO result in extreme bloat,
and reduced performance. This reduced performance appears, of
course, only if the fields are mostly empty, which they usually are
in a Legacy or other genealogical file. This excludes "memo" fields,
which are variable length.

Doug McDonald

Dennis Lee Bieber

On Sat, 25 Feb 2006 09:59:04 +0200, Steve Hayes <[email protected]>
declaimed the following in soc.genealogy.computing:

The "schema" (if you can call it that) is this -- I designed it.

My regrets, but it doesn't show /me/ (at least) the type of
information I'd be looking for in a schema (SQL DDL) [and I don't have a
good example of linked tables; when I upgraded to a new computer, the
VisualCE data from my PDA didn't import correctly, JET added an
additional autonumber field instead of using the index I already had
defined -- subsequent synchronizations fail]

LABEL NAME INDEX SORT EMPHASIS
ID * Y 9 1
EN ENTNUM T 9 1
BP BAPDATE T 4 1
BD BIRDATE T 4 1
SX SEX N

I have no understanding of what your "sort" or "emphasis" columns
represent, nor what size any field is.

SN SURNAME Y 9 1
GN GIVNAME Y 9 1
FS FATHSNAME Y 9 1
MS MOTHSNAME Y 9 1

That looks like an awful lot of indices to maintain; especially
since each index typically duplicates the field data and then adds a
pointer to the record.

OC OCCUPAT N

I'll admit I'm being picky here, but if that is "occupation", might
I ask "whose"? Father, Mother, or individual being baptized?

DN DENOM N

I'd probably create a separate table just to hold denominations and
a unique ID/index. The field in this table then would only hold the ID#
of the record in the other table.

{The following bit of nonsense is one of the unlinked tables I have for
a web site I update at times; this is a MySQL table create statement.
This is a wasteful table, as I use fixed width character fields (the
longest "name" is 56, "URL" is 32, no "banner" currently, "site" and
"notes" are 32 and 75) -- they operate faster as the engine can just
jump from record to record; a single varchar() field requires the engine
to read sequentially to find record boundaries. Field width for integers
is display width, not internal [which are 32 or 64 bit; 4 or 8 byte].

CREATE TABLE `conventions` (
`ID` int(11) NOT NULL auto_increment,
`name` char(100) NOT NULL default '',
`URL` char(75) default NULL,
`banner` char(50) default NULL,
`width` int(11) default NULL,
`height` int(11) default NULL,
`sortdate` date NOT NULL default '0000-00-00',
`dates` char(50) NOT NULL default 'TBD',
`site` char(75) NOT NULL default 'TBD',
`notes` char(100) default NULL,
PRIMARY KEY (`ID`),
KEY `sortdate` (`sortdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

}
--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Steve Hayes · Posted by **Steve Hayes** » 26 February 2006 at 2:50

On Sat, 25 Feb 2006 22:03:36 GMT, Dennis Lee Bieber <[email protected]>
wrote:

On Sat, 25 Feb 2006 09:59:04 +0200, Steve Hayes <[email protected]>
declaimed the following in soc.genealogy.computing:

The "schema" (if you can call it that) is this -- I designed it.

My regrets, but it doesn't show /me/ (at least) the type of
information I'd be looking for in a schema (SQL DDL) [and I don't have a
good example of linked tables; when I upgraded to a new computer, the
VisualCE data from my PDA didn't import correctly, JET added an
additional autonumber field instead of using the index I already had
defined -- subsequent synchronizations fail]

LABEL NAME INDEX SORT EMPHASIS
ID * Y 9 1
EN ENTNUM T 9 1
BP BAPDATE T 4 1
BD BIRDATE T 4 1
SX SEX N

I have no understanding of what your "sort" or "emphasis" columns
represent, nor what size any field is.

The size of any field is determined by the data in it, plus the label.

The Index column indicates a Term index (first 80 characters, for sorting), a
keyword index (words only, for searching) or Y for both types of index. The
sort type indicates whether it is word for word, letter for letter, numbers
having letter values, numbers having numeric values, date (sort code 4), or
code (for library cataloguing codes etc).

The Emphasis column has to do with repeating fields. Are they all treated
equally? Or is only the first one counted? For example if you have an Update
field that shows when the record was updated, the emphasis code can make sure
it is sorted on the most recent update only.
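
As a rough illustration of the two index types described above, here is a small Python sketch. It uses the stop words and leading articles from the BAPREG definition; how the actual program builds its indexes is not stated in this thread, so the function names, the handling of punctuation, and the dropping of one leading article are all assumptions.

```python
# Stop words and leading articles as listed in the BAPREG definition.
STOP_WORDS = set("A AN AND BY FOR FROM IN OF ON THE TO WITH".split())
LEADING_ARTICLES = ("THE", "A", "AN")


def term_key(value, width=80):
    """Sort key for a term index: the first `width` characters,
    upper-cased, with one leading article dropped (an assumption)."""
    words = value.upper().split()
    if len(words) > 1 and words[0] in LEADING_ARTICLES:
        words = words[1:]
    return " ".join(words)[:width]


def keywords(value):
    """Entries for a keyword index: individual words minus stop words."""
    cleaned = "".join(c if c.isalnum() else " " for c in value.upper())
    return sorted({w for w in cleaned.split() if w not in STOP_WORDS})


place = "The Church of St Mary, Durham"
print(term_key(place))   # CHURCH OF ST MARY, DURHAM
print(keywords(place))   # ['CHURCH', 'DURHAM', 'MARY', 'ST']
```

A field marked Y would get both kinds of entry, T only the sort key, and K only the individual words.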

SN SURNAME Y 9 1
GN GIVNAME Y 9 1
FS FATHSNAME Y 9 1
MS MOTHSNAME Y 9 1

That looks like an awful lot of indices to maintain; especially
since each index typically duplicates the field data and then adds a
pointer to the record.

Yes, but if you are looking for all the children of certain parents in order
to try to sort into potential families, it's what makes it useful. There's a
certain amount of bloat, but a lot less than sending HTML copies of e-mail
messages.

OC OCCUPAT N

I'll admit I'm being picky here, but if that is "occupation", might
I ask "whose"? Father, Mother, or individual being baptized?

Whatever is recorded in that column in the register, sometimes headed
"Quality, Trade or Profession". In the case of infants, it's usually the
father's, but in the case of those of riper years, it may be the candidate's.

DN DENOM N

I'd probably create a separate table just to hold denominations and
a unique ID/index. The field in this table then would only hold the ID#
of the record in the other table.

Yes, so would I, if it were a relational database. But it isn't, because of
the difficulties I mentioned of keeping up with the "upgrades".

I did say that this is a "flat file" database, and I use it in spite of its
shortcomings because I know how to use it, and can spend more time entering
data than learning the program.

{The following bit of nonsense is one of the unlinked tables I have for
a web site I update at times; this is a MySQL table create statement.
This is a wasteful table, as I use fixed width character fields (the
longest "name" is 56, "URL" is 32, no "banner" currently, "site" and
"notes" are 32 and 75) -- they operate faster as the engine can just
jump from record to record; a single varchar() field requires the engine
to read sequentially to find record boundaries. Field width for integers
is display width, not internal [which are 32 or 64 bit; 4 or 8 byte].

CREATE TABLE `conventions` (
`ID` int(11) NOT NULL auto_increment,
`name` char(100) NOT NULL default '',
`URL` char(75) default NULL,
`banner` char(50) default NULL,
`width` int(11) default NULL,
`height` int(11) default NULL,
`sortdate` date NOT NULL default '0000-00-00',
`dates` char(50) NOT NULL default 'TBD',
`site` char(75) NOT NULL default 'TBD',
`notes` char(100) default NULL,
PRIMARY KEY (`ID`),
KEY `sortdate` (`sortdate`)
) ENGINE=MyISAM DEFAULT CHARSET=latin1;

}

Somewhere I've got a copy of MySQL, but I never learnt how to use it.

--
Steve Hayes from Tshwane, South Africa
http://www.geocities.com/Athens/7734/stevesig.htm
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Dennis Lee Bieber

On Sun, 26 Feb 2006 03:50:29 +0200, Steve Hayes <[email protected]>
declaimed the following in soc.genealogy.computing:

I did say that this is a "flat file" database, and I use it in spite of its
shortcomings because I know how to use it, and can spend more time entering
data than learning the program.

I suspect my CompSci background (and 25 years at Lockheed) makes me
shudder at file structures with repetitive data. Even old Hierarchical
database systems tried to reduce common items -- though they did get
messy at times. Relational databases actually started out as a theoretical
basis for viewing data, not for storing it; the idea being that the data
might still be in a hierarchical or network DBMS, but the user
views/accesses using relational algebra/calculus (or the combination now
known as SQL). xBase variants took the view and made it the storage (one
file per table), adding programmability to link between relations [a
nit: one table IS a relation -- the data in a row(tuple) is related to
itself, and to the key; links between tables are "joins"].

Forgive me, I'm rambling.

Somewhere I've got a copy of MySQL, but I never learnt how to use it.

With 4.1+, one can now download GUI based admin and query browsers
to access it. Quite an improvement over the command line tool set.

--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

T.M. Sommers · Posted by **T.M. Sommers** » 26 February 2006 at 6:40

Steve Hayes wrote:

On Sat, 25 Feb 2006 00:50:36 -0500, "T.M. Sommers" <[email protected]> wrote:
Steve Hayes wrote:

I've given up on relational database programs (except for specialist
packages like genealogy programs) because no sooner have you begun to learn
them than they are obsolete and you have to start learning all over again.

Which ones? The ones I know about (free ones) seem to be pretty
stable. After all, no one wants to break existing apps. And
there is always xBase, which hasn't been under active development
in years. And who says you can't keep using an older version, if
drastic changes are made?

Paradox.

I was just beginning to get the hang of doing things with it when they changed
the whole thing with ObjectPal, and when I went to Windows 98 my Paradox 4.5
would no longer work, and by then the original disks were bad so I could no
longer reinstall to see if that would help.

Ah, I see. I was thinking of the file structure and the API,
which usually are fairly stable, or at least backwards
compatible. GUIs built on top of the db are another matter.

So I keep things like parish register transcriptions in a flat file
database. It means that if I transcribe 100 records from one parish, the
parish name appears 100 times in the database. If it were a relational
database, the parish name would be stored only once.

For something that small, you could build a relational database
using text files, using things like awk and join. The AWK book
even contains an implementation of a subset of SQL in awk. If
you are not using some flavor of Unix, download Cygwin.

But then I'd have to learn to use AWK. I tried it once, and I thought I'd
leave it to the hackers.

If you know ObjectPascal (which is what I assume "ObjectPal" was
supposed to be), awk should be no problem at all.

But perhaps if hackers and users and developers collaborate, we might be able
to help developers improve existing programs, and create a few new
collaborative open source ones (in Base?).

Some open source programs already exist.

--
Thomas M. Sommers -- [email protected] -- AB2SB

T.M. Sommers · Posted by **T.M. Sommers** » 26 February 2006 at 6:44

Doug McDonald wrote:

T.M. Sommers wrote:

For example, my own Access datafile from Legacy is 13 megabytes, for
3200 people and a lot of notes text. A Gedcom of the file is 4.2 megabytes,
and a zip backup of the file is 2.6 megabytes. That's bloat.

No, that is a consequence of using fixed-width fields in the database,
which technique is not unique to relational databases. There are very
good reasons for using fixed-width fields.

Logical fixed-length fields do not imply the use of fixed-length
fields in stored files on disk. They DO result in extreme bloat,
and reduced performance. This reduced performance appears, of
course, only if the fields are mostly empty, which they usually are
in a Legacy or other genealogical file. This excludes "memo" fields,
which are variable length.

I would think that fixed-length fields in the files would improve
performance:

Variable:

Read length of field
Read field

Fixed:

Read field

One read instead of two, no calculations, easy to jump to any row
in the file, and so on.
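
A small Python sketch of the two access patterns, using in-memory byte buffers in place of disk files. The 20-byte width, the 1-byte length prefix, and the sample names are arbitrary choices for illustration, not how any particular database stores its rows.

```python
import io
import struct

names = ["Smith", "Van der Merwe", "Li"]

# Fixed-width: every record is padded to WIDTH bytes (an arbitrary choice).
WIDTH = 20
fixed = io.BytesIO(b"".join(n.encode("ascii").ljust(WIDTH, b" ") for n in names))


def read_fixed(f, row):
    f.seek(row * WIDTH)                          # jump straight to the row
    return f.read(WIDTH).rstrip(b" ").decode("ascii")


# Variable-width: each record is a 1-byte length followed by the data.
variable = io.BytesIO(
    b"".join(struct.pack("B", len(n)) + n.encode("ascii") for n in names)
)


def read_variable(f, row):
    f.seek(0)
    for _ in range(row):                         # skip earlier records one by one
        (length,) = struct.unpack("B", f.read(1))
        f.seek(length, io.SEEK_CUR)
    (length,) = struct.unpack("B", f.read(1))    # read length of field
    return f.read(length).decode("ascii")        # read field


print(read_fixed(fixed, 2), read_variable(variable, 2))
print(len(fixed.getvalue()), len(variable.getvalue()))
```

The fixed layout finds row 2 with one seek but takes 60 bytes; the variable layout takes 23 bytes but has to walk past the earlier records unless a separate table of offsets is kept.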

--
Thomas M. Sommers -- [email protected] -- AB2SB

T.M. Sommers · Posted by **T.M. Sommers** » 26 February 2006 at 6:52

Steve Hayes wrote:

On Sat, 25 Feb 2006 22:03:36 GMT, Dennis Lee Bieber <[email protected]>
wrote:

I'd probably create a separate table just to hold denominations and
a unique ID/index. The field in this table then would only hold the ID#
of the record in the other table.

Yes, so would I, if it were a relational database. But it isn't, because of
the difficulties I mentioned of keeping up with the "upgrades".

I did say that this is a "flat file" database, and I use it in spite of its
shortcomings because I know how to use it, and can spend more time entering
data than learning the program.

You can always fake relations with flat files. Instead of doing
a join, just manually seek the appropriate rows in the other table.
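
A minimal Python sketch of that idea. The two "files" are shown as strings so the example is self-contained; the pipe-delimited layout, the column order, and the sample data are hypothetical.

```python
import csv
import io

# Two hypothetical flat files, shown as strings so the sketch is self-contained.
PARISHES = "1|St Mary\n2|St Peter\n"
BAPTISMS = "Smith|Ann|1\nJones|Mary|2\nSmith|John|1\n"


def load_lookup(text):
    """Read the 'other table' once and key it on its ID column."""
    return {row[0]: row[1] for row in csv.reader(io.StringIO(text), delimiter="|")}


def baptisms_with_parish(baptism_text, parish_text):
    """Manual lookup in place of a join: replace each ID with its name."""
    parish_by_id = load_lookup(parish_text)
    result = []
    reader = csv.reader(io.StringIO(baptism_text), delimiter="|")
    for surname, givname, parish_id in reader:
        result.append((surname, givname, parish_by_id.get(parish_id, "?")))
    return result


for record in baptisms_with_parish(BAPTISMS, PARISHES):
    print(record)
```

This only works if the flat file can hold an ID field that points at a second file, which is the assumption being made here.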

--
Thomas M. Sommers -- [email protected] -- AB2SB

Steve Hayes · Posted by **Steve Hayes** » 26 February 2006 at 12:18

On Sun, 26 Feb 2006 00:52:27 -0500, "T.M. Sommers" <[email protected]> wrote:

I did say that this is a "flat file" database, and I use it in spite of its
shortcomings because I know how to use it, and can spend more time entering
data than learning the program.

You can always fake relations with flat files. Instead of doing
a join, just manually seek the appropriate rows in the other table.

There ARE no tables. It's a tag-type database.

--
Steve Hayes from Tshwane, South Africa
http://www.geocities.com/Athens/7734/stevesig.htm
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Joe Makowiec · Posted by **Joe Makowiec** » 26 February 2006 at 12:59

On 26 Feb 2006 in soc.genealogy.computing, T.M. Sommers wrote:

If you know ObjectPascal (which is what I assume "ObjectPal" was
supposed to be), awk should be no problem at all.

Nope, ObjectPAL, where PAL is an acronym for "Paradox Application
Language".

http://en.wikipedia.org/wiki/ObjectPAL

Some open source programs already exist.

MySQL, SQLite, PostgreSQL for starters. I think the OpenOffice.org folks
are working on a database component, too.

--
Joe Makowiec
http://makowiec.org/
Email: http://makowiec.org/contact/?Joe

Joe Makowiec · Posted by **Joe Makowiec** » 26 February 2006 at 13:03

On 26 Feb 2006 in soc.genealogy.computing, T.M. Sommers wrote:

Doug McDonald wrote:
T.M. Sommers wrote:

For example, my own Access datafile from Legacy is 13 megabytes,
for 3200 people and a lot of notes text. A Gedcom of the file is
4.2 megabytes, and a zip backup of the file is 2.6 megabytes.
That's bloat.

No, that is a consequence of using fixed-width fields in the
database, which technique is not unique to relational databases.
There are very good reasons for using fixed-width fields.

Logical fixed-length fields do not imply the use of fixed-length
fields in stored files on disk. They DO result in extreme bloat, and
reduced performance. This reduced performance appears, of course,
only if the fields are mostly empty, which they usually are in a
Legacy or other genealogical file. This excludes "memo" fields,
which are variable length.

I would think that fixed-length fields in the files would improve
performance:

Variable:

Read length of field
Read field

Fixed:

Read field

One read instead of two, no calculations, easy to jump to any row
in the file, and so on.

MySQL can use VARCHAR fields for text data. They do exactly the
process you suggest, but the performance hit, if any, is minimal. But
it saves substantially on storage space.

http://dev.mysql.com/doc/refman/5.0/en/char.html
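
To put rough numbers on the storage saving, here is a back-of-envelope Python sketch. It assumes a single-byte character set such as latin1, where CHAR(n) always occupies n bytes and VARCHAR(n) occupies the length of the value plus a 1-byte length prefix (2 bytes when the column can exceed 255 bytes). It ignores per-row overhead and the non-character columns. The declared widths are those of the `conventions` table quoted earlier in the thread; the sample values are invented.

```python
def char_bytes(declared, value):
    """CHAR(n): always n bytes for a single-byte character set."""
    return declared


def varchar_bytes(declared, value):
    """VARCHAR(n): the data plus a length prefix (1 byte up to 255, else 2)."""
    prefix = 1 if declared <= 255 else 2
    return len(value.encode("latin-1")) + prefix


# column name -> (declared width, invented sample value)
row = {
    "name": (100, "Example Convention"),
    "URL": (75, "http://example.org/"),
    "site": (75, "TBD"),
    "notes": (100, ""),
}

fixed_total = sum(char_bytes(n, v) for n, v in row.values())
variable_total = sum(varchar_bytes(n, v) for n, v in row.values())
print(fixed_total, variable_total)
```

For this made-up row the four character columns take 350 bytes as CHAR and 44 bytes as VARCHAR; real savings depend entirely on how full the fields usually are.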

--
Joe Makowiec
http://makowiec.org/
Email: http://makowiec.org/contact/?Joe

Leif B. Kristensen

Joe Makowiec wrote:

MySQL, SQLite, PostgreSQL for starters. I think the OpenOffice.org
folks are working on a database component, too.

OpenOffice has a database GUI builder, with a JDBC connection. So far,
I've had no luck in getting it to work with my PostgreSQL db.
--
Leif Biberg Kristensen
http://solumslekt.org/

Steve Hayes · Posted by **Steve Hayes** » 26 February 2006 at 19:15

On Sun, 26 Feb 2006 11:59:21 GMT, Joe Makowiec <[email protected]>
wrote:

On 26 Feb 2006 in soc.genealogy.computing, T.M. Sommers wrote:

If you know ObjectPascal (which is what I assume "ObjectPal" was
supposed to be), awk should be no problem at all.

Nope, ObjectPAL, where PAL is an acronym for "Paradox Application
Language".

http://en.wikipedia.org/wiki/ObjectPAL

Some open source programs already exist.

MySQL, SQLite, PostgreSQL for starters. I think the OpenOffice.org folks
are working on a database component, too.

Already available -- it's called "Base".

--
Steve Hayes from Tshwane, South Africa
http://www.geocities.com/Athens/7734/stevesig.htm
E-mail - see web page, or parse: shayes at dunelm full stop org full stop uk

Dennis Lee Bieber

On Sun, 26 Feb 2006 00:40:56 -0500, "T.M. Sommers" <[email protected]>
declaimed the following in soc.genealogy.computing:

If you know ObjectPascal (which is what I assume "ObjectPal" was
supposed to be), awk should be no problem at all.

No... "Object(oriented) Paradox Application Language"

At least it isn't ASPECT (something I doubt very many have seen --
since there isn't much call these days for terminal emulation programs
for connecting to dial-up systems... Even though ProComm also has telnet
[on top of dial-up; not much of a change once you get above the
low-level send/receive path], news, and email modules -- all I use it
for is FTP). ASPECT was ProComm's scripting language -- looked like a
hybrid between C (things like atol, atof, ceil) and Visual BASIC (no
parens around arguments in calls: addtopath front back), with commands
for building GUI interfaces.
--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Dennis Lee Bieber

On Sun, 26 Feb 2006 11:59:21 GMT, Joe Makowiec
<[email protected]> declaimed the following in
soc.genealogy.computing:

MySQL, SQLite, PostgreSQL for starters. I think the OpenOffice.org folks
are working on a database component, too.

Firebird is an independent spawning from the old Borland RDBMS
(InterBase, was it?). (The Mozilla people renamed their browser effort to
avoid a conflict -- Firefox was originally Firebird.)
--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

T.M. Sommers · Posted by **T.M. Sommers** » 27 February 2006 at 5:06

Joe Makowiec wrote:

On 26 Feb 2006 in soc.genealogy.computing, T.M. Sommers wrote:

If you know ObjectPascal (which is what I assume "ObjectPal" was
supposed to be), awk should be no problem at all.

Nope, ObjectPAL, where PAL is an acronym for "Paradox Application
Language".

My mistake. At least they are both from Borland. Anyway, I
think my point remains, that learning awk should be no problem
for someone who knows ObjectPa{sc}al.

--
Thomas M. Sommers -- [email protected] -- AB2SB

Doug McDonald · Posted by **Doug McDonald** » 27 February 2006 at 16:24

Dennis Lee Bieber wrote:

I suspect my CompSci background (and 25 years at Lockheed) makes me
shudder at file structures with repetitive data. Even old Hierarchical
database systems tried to reduce common items -- though they did get
messy at times. Relational databases actually started out as a theoretical
basis for viewing data, not for storing it; the idea being that the data
might still be in a hierarchical or network DBMS, but the user
views/accesses using relational algebra/calculus (or the combination now
known as SQL). xBase variants took the view and made it the storage (one
file per table), adding programmability to link between relations [a
nit: one table IS a relation -- the data in a row(tuple) is related to
itself, and to the key; links between tables are "joins"].

Forgive me, I'm rambling.

Genealogy is not exactly the perfect use for a true
relational system ... there are too many "gotchas". An
example of a true good use is the database I use for my
rock collection, and even that is not really perfectly
served, because some rocks may be "undertyped", and a true
relational database makes it hard to show the multiple
possibilities properly. It's possible to do, as I group the
possibilities into "groups", but that's a kludge. This is
the same problem as one has in genealogy where one knows
that one or another person is, surely, a parent of another,
but you don't know for sure which.

Doug McDonald

Doug McDonald · Posted by **Doug McDonald** » 27 February 2006 at 17:58

T.M. Sommers wrote:

which are variable length.

I would think that fixed-length fields in the files would improve
performance:

Variable:

Read length of field
Read field

Fixed:

Read field

One read instead of two, no calculations, easy to jump to any row in the
file, and so on.

Uh, no, in many cases. The first calculation is "is the data
in the RAM-memory data cache, or do we need to go to disk to
find it?". A smaller total database means that there is a
higher probability that you won't need to go to disk.

Without an in-RAM disk cache, and if the disk file system
allows random disk seeks controlled by the program, then
that is where fixed-length fields ... and thus fixed locations
in disk blocks ... speed things up.

Doug McDonald

Doug McDonald · Posted by **Doug McDonald** » 27 February 2006 at 17:59

Joe Makowiec wrote:

On 26 Feb 2006 in soc.genealogy.computing, T.M. Sommers wrote:

If you know ObjectPascal (which is what I assume "ObjectPal" was
supposed to be), awk should be no problem at all.

Nope, ObjectPAL, where PAL is an acronym for "Paradox Application
Language".

ObjectPAL is arguably the most difficult subject in the
world to learn. Certainly it is harder than quantum
mechanics or even quantum electrodynamics. Maybe quantum
gravity is harder, anybody know both?

Doug McDonald

Dennis Lee Bieber

On Mon, 27 Feb 2006 10:59:58 -0600, Doug McDonald
<mcdonald@SnPoAM_scs.uiuc.edu> declaimed the following in
soc.genealogy.computing:

ObjectPAL is arguably the most difficult subject in the
world to learn. Certainly it is harder than quantum
mechanics or even quantum electrodynamics. Maybe quantum
gravity is harder, anybody know both?

I suspect the research labs for the various quantum disciplines are
using MySQL, PostgreSQL, et al. rather than Paradox <G>
--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

T.M. Sommers · Posted by **T.M. Sommers** » 27 February 2006 at 19:35

Doug McDonald wrote:

T.M. Sommers wrote:

which are variable length.

I would think that fixed-length fields in the files would improve
performance:

Variable:

Read length of field
Read field

Fixed:

Read field

One read instead of two, no calculations, easy to jump to any row in
the file, and so on.

Uh, no, in many cases. The first calculation is "is the data in the
RAM-memory data cache, or do we need to go to disk to find it?".

That calculation will be done by hardware or the OS, depending on
which cache.

A smaller total database means that there is a
higher probability that you won't need to go to disk.

Depending on what else is using the cache. The question is
whether the hypothetical speedup is significant, or even
measurable. Has it been measured?

Without an in-RAM disk cache, and if the disk file system allows random
disk seeks controlled by the program, then that is where fixed-length
fields ... and thus fixed locations in disk blocks ... speed things up.

Once the first read is done, the data for the second will
probably be in a file buffer, so there will be only one disk
seek. But there are still two reads, two system calls, plus the
other overhead.

--
Thomas M. Sommers -- [email protected] -- AB2SB

T.M. Sommers · Posted by **T.M. Sommers** » 28 February 2006 at 6:17

Joe Makowiec wrote:

MySQL can use VARCHAR fields for text data. They do exactly the
process you suggest, but the performance hit, if any, is minimal. But
it saves substantially on storage space.

Most, if not all, RDBMSes have something similar. I don't know
how much of a performance difference there is, but I would also
say today that, with today's disks, the disk saving, except on
very large databases, is also minimal, relatively speaking. On a
100 gigabyte disk, who cares about a few tens of megabytes more
or less?

--
Thomas M. Sommers -- [email protected] -- AB2SB

Leif B. Kristensen

T.M. Sommers wrote:

I would think that fixed-length fields in the files would improve
performance:

That was true many years ago. Modern RDBMSes still support fixed-length
fields for the sake of backwards compatibility, but internally they are
usually implemented as variable-length fields. Strings are not stored
in the same physical tablespace as the integers. There, you'll find
just a pointer to the string. The overhead is negligible.

Consequently, you may as well use the general TEXT type for any textual
content of a database. But in some cases, as for consistency, you may
still want to use a CHAR or VARCHAR type where the length of the string
actually matters.
--
Leif Biberg Kristensen
http://solumslekt.org/

Dennis Lee Bieber

On Tue, 28 Feb 2006 08:06:18 +0100, "Leif B. Kristensen"
<[email protected]> declaimed the following in
soc.genealogy.computing:

That was true many years ago. Modern RDBMSes still support fixed-length
fields for the sake of backwards compatibility, but internally they are
usually implemented as variable-length fields. Strings are not stored
in the same physical tablespace as the integers. There, you'll find
just a pointer to the string. The overhead is negligible.

While the FoxPro variants (and maybe Paradox) separate fixed
numerics and text, not all RDBMs do...

MySQL "MyISAM" doesn't, and that is a common RDBM to find on web
hosting servers, its "InnoDB" format does use multiple files -- but that
is only to keep each file down to some "optimal" size, not to split
text; and if one were to specify BDB format, one has whatever that
engine does... What JET (Access) does is anyone's guess, as it stuffs
everything into a single file (so you still have lots of jumping around
within a file). I'm not sure what MaxDB (the MySQL supported release of
SAP DB) is doing. I'd have to check my textbook for Firebird/interbase
(unfortunately, MaxDB has very little hard-copy documentation).

--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Leif B. Kristensen

Dennis Lee Bieber wrote:

MySQL "MyISAM" doesn't, and that is a common RDBM to find on web
hosting servers, its "InnoDB" format does use multiple files -- but
that is only to keep each file down to some "optimal" size, not to
split text; and if one were to specify BDB format, one has whatever
that engine does...

I don't know if a database engine that supports neither
transactions nor foreign keys can be called "a modern RDBMS". I rather
thought of real RDBMSes like Oracle or PostgreSQL, both of which I have
some knowledge about.
--
Leif Biberg Kristensen
http://solumslekt.org/

Dennis Lee Bieber · Posted by **Dennis Lee Bieber** » 1 March 2006, 7:00

On Tue, 28 Feb 2006 19:27:34 +0100, "Leif B. Kristensen"
<[email protected]> declaimed the following in
soc.genealogy.computing:

I don't know if a database engine that supports neither
transactions nor foreign keys can be called "a modern RDBMS". I rather

Use the InnoDB backend, and you get both. OTOH, you lose the ability
to have a fulltext index on long text fields (useful if you want keyword
searches). The BDB backend gives transactions, but probably not foreign
keys.

thought of real RDBMSes like Oracle or PostgreSQL, both of which I have
some knowledge about.

The scary part about Oracle is that the company, as I understand it,
has bought out both Sleepycat (the producers of the BDB engine) and Inno
Oy (producers of the InnoDB engine)... That almost feels like a
Microsoft move -- buy out the competitor rather than try to compete on
your own [if Oracle can now block MySQL AB from using the alternate
storage engines... OTOH, one of the Firebird developers has now gone to
MySQL...]

Being pedantic (for the last time -- even I feel this has gone
rather off-thread)...

I spent a few hours today crawling through various textbooks to come
up with not very much...

MySQL
MyISAM
If all fields are fixed length, it uses a "static" table that
allows for fast access -- record 'n' is found at 'n*len'. If any field
is variable length, even fixed CHAR fields are treated as VARCHAR.
Static tables can be recovered easily as there is no problem finding the
start of the next record if one is corrupted. "Dynamic" tables need to
be defragmented at times; if a field update extends the data beyond the
original insert, a 6-byte link is used to connect to the extended data.

CHAR and VARCHAR (at least through 4.0; my more up-to-date books
were at home) allow for 0..255 characters (book hints that a future
version will go to 65K or more). CHAR is padded to the field length on
storage, and trailing spaces are stripped on retrieval. VARCHAR, OTOH,
strips spaces on storage (non-ANSI behavior). TEXT is a case-insensitive
BLOB, and does not do trailing space stripping.

Indices can be created on CHAR, VARCHAR, and TEXT fields. And the
table type supports full text search.

InnoDB
ACID transaction, and foreign key constraints. Indices are
B-Tree structured; the row data itself is stored with the primary index
value; each record has pointers to each field of the record.

Can not index TEXT/BLOB; cannot create a unique index on part of a
field

BDB
Transactions.

B-tree structure.

Primary key stored with the data; secondary keys do not point to
data records, but contain the value of the primary key.

Firebird
CHAR fields have trailing spaces stripped on storage, and are padded
on retrieval (just the opposite of MySQL).

VARCHAR does not strip or pad, but includes any supplied trailing
spaces in the "length" count when storing the data.

The above implies that BOTH CHAR and VARCHAR are /stored/ as
variable width fields. Length can be up to 32K.

BLOB (text type) can not be indexed. Stored in segments, default
size is 80-bytes per segment, application code (at low level) is
responsible for handling the segmentation.

I am not going to scan the various MSDE/SQL-Server or PostgreSQL
books I have to find what they have to say. And the only documentation
for MaxDB seems to be online HTML.
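
The trailing-space rules noted above for CHAR and VARCHAR are easy to mix
up, so here is a small Python model of them. It only mimics the behaviour
as summarised from the textbooks above (MySQL through 4.0, and Firebird);
it does not talk to either engine, and the function names are made up for
illustration.

```python
def mysql_char(value, width):
    """MySQL CHAR: padded to the field length on storage,
    trailing spaces stripped on retrieval."""
    stored = value.ljust(width)
    return stored, stored.rstrip(" ")


def mysql_varchar(value):
    """MySQL VARCHAR (as described for 4.0): trailing spaces
    stripped on storage."""
    stored = value.rstrip(" ")
    return stored, stored


def firebird_char(value, width):
    """Firebird CHAR: trailing spaces stripped on storage,
    padded back to the field length on retrieval."""
    stored = value.rstrip(" ")
    return stored, stored.ljust(width)


def firebird_varchar(value):
    """Firebird VARCHAR: no stripping or padding; trailing
    spaces count towards the stored length."""
    return value, value


# Each call returns a (stored, retrieved) pair for the same input.
for label, pair in [
    ("MySQL CHAR(10)", mysql_char("Bieber  ", 10)),
    ("MySQL VARCHAR", mysql_varchar("Bieber  ")),
    ("Firebird CHAR(10)", firebird_char("Bieber  ", 10)),
    ("Firebird VARCHAR", firebird_varchar("Bieber  ")),
]:
    print(label, [repr(s) for s in pair])
```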

--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

Chad Hanna · Posted by **Chad Hanna** » 1 March 2006, 13:29

In message <[email protected]>, T.M. Sommers
<[email protected]> writes

Joe Makowiec wrote:
MySQL can use VARCHAR fields for text data. They do exactly the
process you suggest, but the performance hit, if any, is minimal. But
saves substantially on storage space.

Most, if not all, RDBMSes have something similar. I don't know how
much of a performance difference there is, but I would also say today
that, with today's disks, the disk saving, except on very large
databases, is also minimal, relatively speaking. On a 100 gigabyte
disk, who cares about a few tens of megabytes more or less?

I do care about a few tens of megabytes as I'm concerned with databases
on servers that may be dealing with several queries at the same time.

Disks typically spin at 7200 rpm (latest Seagate 400Gbyte SATA/300),
though high-end SCSI drives can achieve 15,000, say 8 milliseconds per
revolution. That means, average access times to random data are a few
milliseconds. Data might be transferred off the disk at maybe 100 Mbytes
a second, that's 100 milliseconds for 10 Mbytes.

Information in chip memory can be accessed many thousands of times
faster. Faster and multiple processors mean that almost any scheme for
getting more information into chip memory will be worthwhile. Similarly
halving the amount of data stored on disk from 10 Mbytes to 5 Mbytes
will save you 50 milliseconds when you come to read it.
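
For anyone who wants to plug in their own figures, the arithmetic above
can be written as a short Python sketch. The function names are made up
for illustration, and the average rotational delay is taken as half a
revolution, which is the usual rule of thumb rather than a measured
value.

```python
def ms_per_revolution(rpm):
    """Time for one full platter revolution, in milliseconds."""
    return 60_000 / rpm


def avg_rotational_latency_ms(rpm):
    """On average the wanted sector is half a revolution away."""
    return ms_per_revolution(rpm) / 2


def transfer_ms(megabytes, mbytes_per_second):
    """Time to stream the given amount of data, in milliseconds."""
    return megabytes * 1000 / mbytes_per_second


print(round(ms_per_revolution(7200), 2))          # desktop SATA drive
print(round(ms_per_revolution(15000), 2))         # high-end SCSI drive
print(round(avg_rotational_latency_ms(7200), 2))
print(transfer_ms(10, 100), transfer_ms(5, 100))  # 10 vs 5 Mbytes at 100 Mbytes/s
```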

In genealogical terms, the savings are even greater - the average length
of an English surname is 6.5 characters and
'cholmondeley-featherstonehaugh' is 31 characters. Assuming we need a
byte count for varchar, that's 7.5 compared to 31, a saving of over 75%.
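
The saving works out as follows in a minimal Python sketch, using the
figures quoted above (6.5 characters on average, a 31-character fixed
field, one length byte). The function name and the one-byte-per-character
assumption are mine, for illustration only.

```python
def varchar_saving(avg_len, fixed_width, length_prefix=1):
    """Fraction of space saved by storing avg_len characters plus a
    length prefix instead of a fixed_width field (1 byte per character)."""
    variable_width = avg_len + length_prefix
    return 1 - variable_width / fixed_width


saving = varchar_saving(6.5, 31)
print(f"{saving:.1%}")
```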

The physical design (as opposed to the logical design) of databases is a
complex subject, and genealogical databases have their own quirks, e.g.
key distributions aren't uniform and dates are problematic.

Chad

--
Chad Hanna
IT Manager Berkshire Family History Society http://www.berksfhs.org.uk
Systems Developer FamilyHistoryOnline http://www.familyhistoryonline.net
FreeBSD Apache MySQL Perl mod_perl

Leif B. Kristensen

Dennis Lee Bieber wrote:

Leif B. Kristensen wrote:

I don't know if a database engine that supports neither
transactions nor foreign keys can be called "a modern RDBMS". I
rather

Use the InnoDB backend, and you get both. OTOH, you lose the ability
to have a fulltext index on long text fields (useful if you want keyword
searches). The BDB backend gives transactions, but probably not
foreign key.

I had the impression that we were discussing the MyISAM engine, which,
as I wrote, has neither foreign keys nor transactions. Of course I know
that InnoDB has built-in support for both. But that wasn't the issue.

thought of real RDBMSes like Oracle or PostgreSQL, both of which I
have some knowledge about.

The scary part about Oracle is that the company, as I understand it,
has bought out both Sleepycat (the producers of the BDB engine) and
Inno Oy (producers of the InnoDB engine)... That almost feels like a
Microsoft move -- buy out the competitor rather than try to compete on
your own [if Oracle can now block MySQL AB from using the alternate
storage engines... OTOH, one of the Firebird developers has now gone
to MySQL...]

I can't say that I like those moves by Oracle either, but it has
absolutely no bearing upon Oracle as a DB engine. I'm using Oracle at
work. That has nothing to do with whether I like the Oracle company's
policies or not. I fancy that you too might have done things at work
that your personal sense of morality does not quite agree with.

I spent a few hours today crawling through various textbooks to come
up with not very much...

MySQL
MyISAM
If all fields are fixed length, it uses a "static" table that
allows for fast access -- record 'n' is found at 'n*len'. If any field
is variable length, even fixed CHAR fields are treated as VARCHAR.

Yeah, that's what I said ... more or less.

CHAR and VARCHAR (at least through 4.0; my more up-to-date books
were at home) allow for 0..255 characters (book hints that a future
version will go to 65K or more). CHAR is padded to the field length on
storage, and trailing spaces are stripped on retrieval. VARCHAR, OTOH,
strips spaces on storage (non-ANSI behavior). TEXT is a
case-insensitive BLOB, and does not do trailing space stripping.

That's how MySQL handles it.

In my opinion, MySQL is a piece of over-hyped crap, and not much more
than a toy. If you want a top-notch open-source db engine, go with
PostgreSQL. It has everything, and is rock stable.

By the way, your surname coincides a lot with my middle name. It doesn't
perchance originate in the Sundsvall area of Sweden?
--
Leif Biberg Kristensen
http://solumslekt.org/

Dennis Lee Bieber · Posted by **Dennis Lee Bieber** » 4 March 2006, 6:55

On Fri, 03 Mar 2006 19:16:48 +0100, "Leif B. Kristensen"
<[email protected]> declaimed the following in
soc.genealogy.computing:

I had the impression that we were discussing the MyISAM engine, which,
as I wrote, has neither foreign keys nor transactions. Of course I know
that InnoDB has built-in support for both. But that wasn't the issue.

I believe the prior comments hadn't been specific to the storage
engine, just to MySQL in general; which is why I dropped in the InnoDB
comment (which is also the default engine in recent versions).

I can't say that I like those moves by Oracle either, but it has
absolutely no bearing upon Oracle as a DB engine. I'm using Oracle at

My concern was with regards to what Oracle (company) might do to
cripple future MySQL development should they get "nasty" over usage of
the two backend engines.

By the way, your surname coincides a lot with my middle name. It doesn't
perchance originate in the Sundsvall area of Sweden?

Not that I know of, though I've been unable (in my few hours a year)
to track the family back to the point of emigration. There are a whole
slew of "Bieber"s in the Pennsylvania/Ohio region -- but my family seems
to appear from nowhere.

General belief would put it as central German (or whatever entity
occupied that space in the early 1800s). German word for "beaver"...
--

==============================================================
[email protected] | Wulfraed Dennis Lee Bieber KD6MOG
[email protected] | Bestiaria Support Staff
==============================================================
Home Page: <http://www.dm.net/~wulfraed/>
Overflow Page: <http://wlfraed.home.netcom.com/>

T.M. Sommers · Posted by **T.M. Sommers** » 5 March 2006, 9:59

Chad Hanna wrote:

In message <[email protected]>, T.M. Sommers
<[email protected]> writes
Joe Makowiec wrote:

MySQL can use VARCHAR fields for text data. They do exactly the
process you suggest, but the performance hit, if any, is minimal.
But saves substantially on storage space.

Most, if not all, RDBMSes have something similar. I don't know how
much of a performance difference there is, but I would also say today
that, with today's disks, the disk saving, except on very large
databases, is also minimal, relatively speaking. On a 100 gigabyte
disk, who cares about a few tens of megabytes more or less?

I do care about a few tens of megabytes as I'm concerned with databases
on servers that may be dealing with several queries at the same time.

I was being, at least partly, facetious.

Disks typically spin at 7200 rpm (latest Seagate 400Gbyte SATA/300),
though high-end SCSI drives can achieve 15,000, say 8 milliseconds per
revolution. That means, average access times to random data are a few
milliseconds. Data might be transferred off the disk at maybe 100 Mbytes
a second, that's 100 milliseconds for 10 Mbytes.

I would expect that in normal operation one would rarely read 10
Mbytes from a genealogical database at one time. The drive and
the OS also will have buffers.

--
Thomas M. Sommers -- [email protected] -- AB2SB