Normalisation

supermessiah · May 7, 2020, 7:49pm

Right, I’ve been through countless tutorials on how to normalise a database. It’s something I have successfully done in the past, but quite a long time ago so I’ve forgotten a lot of it. Here is my UNF (normalised form) of a soccer match database:

UNF

Season
Tier
Division
Fixture
Result
Date
Status
Time
StadiumName
Attendance
RefereeName
First Half Goals
Second Half Goals

I’m struggling with 1NF. I’m having difficulty understanding the meaning of ‘repeating data’. The tutorials I’ve been through say you need to separate the repeating and non-repeating data. Okay, but I can look at all of the above and say it’s repeating, because… all of the data under each of those headings repeats somewhere. I’m confused about it.

I know that things from the above would have to be broken down during the normalisation process, such as ‘Fixture’, ‘Result’, ‘RefereeName’ and ‘First Half Goals’ & ‘Second Half Goals’, but I would appreciate any help for getting me started on this.

Thanks.

aducom · May 7, 2020, 8:25pm

The main trick of creating a datamodal is to prevent typing data more than once. So if you have a competition of clubs and you want to store the trainer and players of a team for every match then you will see that you are repeatingly typing over all data which is already there. Creating a good data-modal is not easy, but the steps are not too difficult if you take a practical attitude.
You don’t want to type every player, trainer and refereename over-and-over again. Suppose you have a typing error in referee and you want to make a list of all games for that referee you’;ll be in trouble.

So as a first step, just to get you started (didn’t think it all through):

tables for:

referee: number and name
player: number, and name (and other data: age etc).
club: number and name (and other data: main stadium, city etc)
team: clubnumber, season, playernumber
match: matchnumber, season, clubnumberhome, clubnumberguest, refereenumber, date, result
matchplayer: matchnumber, clubnumber, playernumber, fromminute, tominute (for changes)
matchgoal: matchnumber, clubnumber, playernumber, minute

supermessiah · May 7, 2020, 8:25pm

[QUOTE=aducom;12785]The main trick of creating a datamodal is to prevent typing data more than once. So if you have a competition of clubs and you want to store the trainer and players of a team for every match then you will see that you are repeatingly typing over all data which is already there. Creating a good data-modal is not easy, but the steps are not too difficult if you take a practical attitude.
You don’t want to type every player, trainer and refereename over-and-over again. Suppose you have a typing error in referee and you want to make a list of all games for that referee you’;ll be in trouble.

So as a first step, just to get you started (didn’t think it all through):

tables for:

referee: number and name
player: number, and name (and other data: age etc).
club: number and name (and other data: main stadium, city etc)
team: clubnumber, season, playernumber
match: matchnumber, season, clubnumberhome, clubnumberguest, refereenumber, date, result
matchplayer: matchnumber, clubnumber, playernumber, fromminute, tominute (for changes)
matchgoal: matchnumber, clubnumber, playernumber, minute[/QUOTE]Thank you for your response. I get what the benefits of a relational database is, and I roughly know what it’s going to look like in terms of the final table structures, but I need to do it for a project I’m developing for a client. I need to demonstrate the steps I’ve taken to normalise the data (using 1NF, 2NF and 3NF). But the problem is, I’m struggling with 1NF. I don’t quite get how to separate repeating and non-repeating data from the data I have. To me, it all looks repeating — but I know it’s not. What is the repeating and non-repeating data from the ones I’ve given (in my previous post)?

aducom · May 7, 2020, 8:25pm

1st NV means that you have to separate datafields containing repeating data into separate fields. So if you have a field ‘salary’ which contains textdata like : ‘scale 10, amount 1000’, then you need to split this into separate fields. A field must contain only one single value. So if you have a field ‘telephone’ which can contain more than one number, you must make separate fields for this (or make a separate reference table). I suspect this i.e. for attendance (what does this field contain) and goals. (Do you want to keep the players who scored?). Hope this helps. The first NV means that each field is ‘atomic’ and can contain only one value (per record)

supermessiah · May 7, 2020, 8:25pm

[QUOTE=aducom;12788]1st NV means that you have to separate datafields containing repeating data into separate fields. So if you have a field ‘salary’ which contains textdata like : ‘scale 10, amount 1000’, then you need to split this into separate fields. A field must contain only one single value. So if you have a field ‘telephone’ which can contain more than one number, you must make separate fields for this (or make a separate reference table). I suspect this i.e. for attendance (what does this field contain) and goals. (Do you want to keep the players who scored?). Hope this helps. The first NV means that each field is ‘atomic’ and can contain only one value (per record)[/QUOTE]Okay, so from my data, I would say (in brackets below):

UNF

Season
Tier (this is a number that would define what level the division is. Is this necessary?)
Division
Fixture (home team v away team ? these two can be separate fields)
Result (home goals v away goals ? these two can be separate fields)
Date
Status
Time
StadiumName
Attendance
RefereeName (first and last name ? these could be separate fields)
First Half Goals (home goals & away first half goals ? these could be separate fields)
Second Half Goals (home goals & away second half goals ? these could be separate fields)

So yeah, I could do that. But is that my ‘repeating data’ that I’ve put alongside brackets?

aducom · May 7, 2020, 8:25pm

I think so. Do you want to store the maker(s) of the goals? Then you should add a field for goalmaker. Later on (other NV) you will recognize this as a repeating field related to a master (game). Then you will create a separate table for this. For now you only define the separated fields needed to create the datamodal.

supermessiah · May 7, 2020, 8:25pm

[QUOTE=aducom;12791]I think so. Do you want to store the maker(s) of the goals? Then you should add a field for goalmaker. Later on (other NV) you will recognize this as a repeating field related to a master (game). Then you will create a separate table for this. For now you only define the separated fields needed to create the datamodal.[/QUOTE]No, I only want to store the data headings I have listed. It’s obvious that there will be several tables by the end of the normalisation process:

Match
Team
Referee
Stadium
Division

Okay, so for my 1NF, I’ve addressed the issue of having multiple values in each field by breaking down some of the headings. Problem is, the table split up into two (meant to be repeating and non-repeating data). However, I’m sure I’ve done this wrong. In the examples I’ve seen, they conveniently split up the table because they have appropriate data to fit into two tables. Looking at my set, I don’t know how to split it into two without splitting up what is necessary in each table.

Season
Division
Status
Date
Time
StadiumName
Attendance

HomeTeam
AwayTeam
HomeGoals
AwayGoals
RefereeFirstName
RefereeLastName
HomeFirstHalfGoals
AwayFirstHalfGoals
HomeSecondHalfGoals
AwaySecondHalfGoals

aducom · May 7, 2020, 8:25pm

As soon as you go into 2nd and 3rd NV you will see that you will need (primary and foreign) keys to identify datasets. You’ll have to split these data into groups by recognizing the repeating groups and assign keys to them. You have to identify the n:m relationships and split these into 1:n relationships. But for now, I don’t see what’s wrong with your list. Be aware that samples on the web always have a kind of ‘ideal’ situation in them.

supermessiah · May 7, 2020, 8:25pm

[QUOTE=aducom;12810]As soon as you go into 2nd and 3rd NV you will see that you will need (primary and foreign) keys to identify datasets. You’ll have to split these data into groups by recognizing the repeating groups and assign keys to them. You have to identify the n:m relationships and split these into 1:n relationships. But for now, I don’t see what’s wrong with your list. Be aware that samples on the web always have a kind of ‘ideal’ situation in them.[/QUOTE]After going to another forum and getting a second opinion, it seems like I should do away with the first half and second half goals fields, because they clash with the homeGoals and AwayGoals. It’s repeating data. I know you answered my question in the other thread about creating automatic columns that add up the data in fields. So it is possible in that respect, but I’ll leave it out for now to avoid confusion.

Well this is what someone else said for 1NF:

SeasonID
DivisionID
HomeTeamID
AwayTeamID
StadiumID
RefereeID
StatusID

HomeGoals
AwayGoals
Attendance
Date
Time

Problem now is how do I assign primary keys to that data? I’m guessing all the ones in the top table are primary keys. And then in 2NF would I split it all up in its individual tables with a table for each of the IDs above? Or would that come in 3NF?

aducom · May 7, 2020, 8:25pm

No, now you have to search for the repeating groups. Then you will find the keys (in your case I expect that statusID would not be found in the 2nd NV). The final step is to look into the determinated tables to look for dependencies. Normally you would enter ‘status’. But if you want to be sure that everybody is using the same status and avoid typing errors this value has a dependency. So you create a table for this. the primary keys need to be decided upon your structure. If you have a n-m relationship you need a table to convert this to 1-n relation ship (2nd nv).