How read a csv file that has several and different sections

giovannino · May 7, 2020, 9:14pm

Hi all,

I got to read a particular csv file with three sections: header, data registered/body and footer. (It’s comes from a temperature logger device…)
Wich is the best way to manage such kind of structured file import into mySql.
Do I need three different tables to store the three different data infos ?
How can I split the content of file and send data sections to different table ? Is it possible to test the first 2 chars (if not a number then is body data ??) ?.
Is it better to use a unique table and repeat for each record header and footer data (all the same for all rows) ?
More or less I can take first 15 rows and then the following data after 16 but I really do not how to take the footer content itself and stopping to read the body part.

“Serial No”;“MU-CD-0169-0226”
“H/w version”;“0.2u”
“Descr.”;“TEST”
“Battery”;“OK”
“Times are”;“Sender Local”
“Start”;“05/03/2014 15:25:33”
“Finish”;“05/03/2014 23:45:03”
“Sampling”;“30 seconds”
“Readings”;“1000 (29970 seconds)”
“Active”;“Temperatura dell’Aria -40.0 to 70.0 ?C”
“Spec.”;“Temperatura dell’Aria 4.0 to 8.0 ?C; Out of Spec = 999 ( 29970 seconds )”
“Average”;"";"";"";“19.5”
“Highest”;"";"";"";“23.8”
“Lowest”;"";"";"";“15.4”

“Reading”;“Elapsed (seconds)”;“Date”;“Time”;“Temperatura dell’Aria (?C)”
“1”;“0”;“05/03/2014”;“15:25:33”;“23.6”
“2”;“30”;“05/03/2014”;“15:26:03”;“23.8”
“3”;“60”;“05/03/2014”;“15:26:33”;“23.8”
“4”;“90”;“05/03/2014”;“15:27:03”;“23.7”
“5”;“120”;“05/03/2014”;“15:27:33”;“23.6”
“6”;“150”;“05/03/2014”;“15:28:03”;“23.5”
“7”;“180”;“05/03/2014”;“15:28:33”;“23.4”
“8”;“210”;“05/03/2014”;“15:29:03”;“23.3”
“9”;“240”;“05/03/2014”;“15:29:33”;“23.2”
“10”;“270”;“05/03/2014”;“15:30:03”;“23.1”
“11”;“300”;“05/03/2014”;“15:30:33”;“23.0”
“12”;“330”;“05/03/2014”;“15:31:03”;“23.0”
“13”;“360”;“05/03/2014”;“15:31:33”;“22.9”
“14”;“390”;“05/03/2014”;“15:32:03”;“22.8”
“15”;“420”;“05/03/2014”;“15:32:33”;“22.8”
“16”;“450”;“05/03/2014”;“15:33:03”;“22.7”
…
…
“994”;“29790”;“05/03/2014”;“23:42:03”;“19.5”
“995”;“29820”;“05/03/2014”;“23:42:33”;“19.5”
“996”;“29850”;“05/03/2014”;“23:43:03”;“19.5”
“997”;“29880”;“05/03/2014”;“23:43:33”;“19.5”
“998”;“29910”;“05/03/2014”;“23:44:03”;“19.5”
“999”;“29940”;“05/03/2014”;“23:44:33”;“19.5”
“1000”;“29970”;“05/03/2014”;“23:45:03”;“19.5”

“List of periods out-of-specification”
“Temperatura dell’Aria”;“1050 seconds from 15.1 to 16.0 ?C”
“”;“2550 seconds from 16.1 to 17.0 ?C”
“”;“1710 seconds from 17.1 to 18.0 ?C”
“”;“570 seconds from 18.1 to 19.0 ?C”
“”;“17505 seconds from 19.1 to 20.0 ?C”
“”;“4350 seconds from 20.1 to 21.0 ?C”
“”;“10 minutes from 21.1 to 22.0 ?C”
“”;“1350 seconds from 22.1 to 23.0 ?C”
“”;“285 seconds from 23.1 to 24.0 ?C”

MU-CD-0169-0226-0008.zip (6.77 KB)

aducom · May 7, 2020, 8:34pm

Well, you need ot parse each line and see what it is to decide. I think you need different tables as the fields differ. I would recommend to read each line one-by-one and look into the record to see if it’s a new declaration and then set a switch to select the correct sql statement. Something like this:

[code[
$MyName = {import}; // read filename

$file = fopen({import},“r”);

$data = fgetcsv($file,0,{separator});
if ($data[0] != ‘myheader’) { // test first record, must be header record
sc_error_message(‘This is not a valid file’);
sc_error_exit();
}

$err=’’;
$ok=’’;

while(! feof($file)) {
$data = fgetcsv($file,0,{separator});
$mykey=$data[0];

// test if data or header and select correct sql.

} // end while

fclose($file);

giovannino · May 7, 2020, 8:34pm

Hi Albert,
thanks for hints , I will study it !!
Bye

jsbinca · May 7, 2020, 8:34pm

Hi,
I thought I jump on this because it brought up long forgotten memories about programming a mainframe in cobol some 25 years ago.
Since the data is coming from an automated? device I assume the file has always the same structure. Also there is no error checking yet.
The idea is to handle the whole kaboodle as a string and chop it up into three chunks. You have to be careful with the quotation marks, it’s hard
to read but in most casese it is ’ " ; " ’ (single-double-character-double-single)

$myfile = file_get_contents(‘path/to/file.csv’);

$myfile = str_replace(’";"’,’","’,$myfile); // replace the field separator, we need comma for mysql

$header = substr($myfile,0,strpos($myfile,"

“)); // get the header part, it’s separated from the body by an empty line, hence two line breaks
$header = str_replace(’”,"","","","’,’","’,$header); // remove the needles columns
$header_array = explode("
“,$header); // stick the header into an array
foreach($header_array as $key => $value)
{
$header_array[$key] = substr($value,strpos($value,’”,"’)+2); // remove the field labels
}

$myfile = substr($myfile,strpos($myfile,"

")+2); // cut off the header block
$myfile = substr($myfile,strpos($myfile,’“1”’)); // and the column header

$body_array = explode("
“,substr($myfile,0,strpos($myfile,”

"))); // again pop it into an array

$myfile = substr($myfile,strpos($myfile,"

“)+2); //cut off body
$myfile = substr($myfile,strpos($myfile,’”,"’)+2);

$footer_array = str_replace(’"","’,’"’,explode("
",$myfile));// remove the empty column and convert it into an array

// create the insert values
$header_values = “(”.implode(’,’,$header_array).")";
$body_values = “(”.implode(’),(’,$body_array).")";
$footer_values = “(”.implode(’),(’,$footer_array).")";

// insert into db (transaction is strongly recommended)
sc_exec_sql(“INSERT INTO header_table VALUES $header_values”);
sc_exec_sql(“INSERT INTO body_table VALUES $body_values”);
sc_exec_sql(“INSERT INTO footer_table VALUES $footer_values”);

Just an idea.

jsb

giovannino · May 7, 2020, 8:34pm

Great Jsb !!!
that’s so interesting and it’s just really my thought.
I’ll update about my tests.
Bye

aducom · May 7, 2020, 8:34pm

This is a good solution, but there are a few things to consider too:

Reading the full file into memory has it’s drawbacks when the inputfile is large. I recommend to parse record-by-record
Parsing the header to generate an sql statement is a great solution. But only if the header values matches the database field names, but header fields can be changed easily
CSV can have different separators, but sometimes also quotes to take care of

Besides that, great solution.