Friday, April 27, 2012

Statistical View of Data in a Clustered Bar Chart

In this article, I shows us how to go about creating a statistical view of data, in the form of a clustered bar chart, using PHP and MySQL.  He provides a real-life scenario, along with accompanying code, to demonstrate how the job can be accomplished.
Statistical view of data in the shape of a clustered bar chart is somewhat similar to having to select between two attributes of a single entity. However, a real-life case scenario is far more complicated. As in most cases, the situation has a lot of entities strongly related with each other through E-R diagrams obeying the normalization rules. Moreover, in such cases, data is not only two-dimensional but, rather, it may be multi-dimensional.
Let's look at the following E-R diagram from which we have to present a statistical view (in the form of a clustered bar chart). How can we do this? This article is an attempt to explain the solution.
Prerequisites
You should have Apache web Server with PHP and MySQL running.  A fair amount of knowledge in PHP is also required. I have provided a zip archive of a demo along with this article.
The database consists of four tables.
  1. Students
  2. Programs
  3. Subjects
  4. Marks
Each of the first three tables has a one-to-many Relationship with the Marks table.
Database Schema
SQL statements to create these tables are as follows:
Marks Table
CREATE TABLE marks (
id int(11) NOT NULL auto_increment,
student_id int(11) DEFAULT '0' NOT NULL,
program int(11) DEFAULT '0' NOT NULL,
subject int(11) DEFAULT '0' NOT NULL,
marks int(11) DEFAULT '0' NOT NULL,
entry_date date,
PRIMARY KEY (id)
);
Programs Table
CREATE TABLE programs (
id int(11) NOT NULL auto_increment,
program varchar(255),
PRIMARY KEY (id),
UNIQUE program (program)
);
Students Table
CREATE TABLE students (
id int(11) NOT NULL auto_increment,
name varchar(100),
PRIMARY KEY (id)
);
Subjects Table
CREATE TABLE subjects (
id int(11) NOT NULL auto_increment,
subject varchar(255),
PRIMARY KEY (id),
UNIQUE subject (subject)
);
An HTML user interface is used to populate all of the four tables. First fill the programs and subjects tables, then populate the students table and in the final, give records to the marks table.
I shall not go into explaining how these four tables have been populated from the html forms, as I’m under the impression that this would be a trivial task.  Rather, I shall focus on how the statistical view is created.
Our input HTML form and related PHP processing code has been written in such a way that it can be enhanced easily. The code snippet for the user input form is composed of three related web controls which pull the data from three tables. Here, these three controls are web control arrays:
<?php
$query = "select s.id , s.name from students as s order by s.name";
$result = mysql_query($query);    
php?>
<select name= "sel_id[]" size = "10"  multiple  style = "background-color: #ffffe8;" >
<?php
while ($row = mysql_fetch_row($result))
echo "<option value=" . $row[0] . ">" . $row[1] . "</option>";   
php?>
</select>
The remaining two web controls are also populated in the same fashion:
<?php
$query = "select p.id , p.program from programs as p order by p.program";
$result = mysql_query($query);    
php?>
<select name= "sel_program_id[]" size = "5"  multiple  style = "background-color: #ffffe8;" >
<?php
while ($row = mysql_fetch_row($result))
echo "<option value=" . $row[0] . ">" . $row[1] . "</option>";   
php?>
</select>
<?php
$query = "select s.id , s.subject from subjects as s order by s.subject";
$result = mysql_query($query);    
php?>
<select name= "sel_subject_id[]" size = "5"  multiple  style = "background-color: #ffffe8;" >
<?php
while ($row = mysql_fetch_row($result))
echo "<option value=" . $row[0] . ">" . $row[1] . "</option>";   
php?>
</select>
Also, the range of a date can be provided as an input from the same page. The code snippet will further explain this; it has been repeated twice, once for Date (From) and again for Date (To).
<?php
$months = array (1=>"January" , "February" , "March" , "April" , "May" , "June" , "July" , "August" , "September" , "October" , "November" , "December");
php?>

<tr><td colspan=2 align=center><table border=0>
<tr><td>Date (From)</td>
<td>
<select name="sel_day_from">
<?php
for ($i = 1; $i < 10 ; $i++)
echo "<option value=0".$i.">" . $i . "</option>";
for ($i = 10; $i < 32; $i++)
echo "<option value=".$i.">" . $i . "</option>";
php?>
</select>
<select name="sel_month_from">
<?php
for ($i = 1; $i < 10 ; $i++)
echo "<option value=0".$i.">" . $months[$i] . "</option>";
for ($i = 10; $i < 13; $i++)
echo "<option value=".$i.">" . $months[$i] . "</option>";
php?>
</select>
<?php $dt_y = intval(date(Y)); php?>
<select name="sel_year_from">
<?php
for ($i = $dt_y; $i > 2000; $i--)
echo "<option value=" . $i . ">" . $i . "</option>";
php?>
</select>
</td></tr>
This is just a simple way in which to offer more options.  There is very little you have to do on the processing side.  Also, a new dimension will be processed in your bar chart.
So far on our input forms, there are four criteria the user can choose from:
  • student name
  • subject
  • program
  • and range of Date
Our database is now ready to be subjected to operations under drawing bar charts.
The processing PHP code, which is located in the file statistics_action.php in the support file , consists of a function named Collect_Data(....) and three FOR loops as the inputs are appearing from three multi-select web controls and one date control.
From the three multiple controls more than one choice can be made.  The fourth choice would appear from range of date, which is a single choice.
The processing code is organized in such a way that it needs three nested FOR loops.  You can increase or decrease them to add/drop a new dimension.
$dt_from = $sel_year_from . $sel_month_from . $sel_day_from;
$dt_to   = $sel_year_to . $sel_month_to . $sel_day_to;
$dt_valid = "False";
$back = "<a rel="nofollow" onclick="javascript:_gaq.push(['_trackPageview', '/outgoing/article_exit_link/975024']);" href="" onclick="history.back()">Back</a>";
if ( !(checkdate(intval($sel_month_from) , intval($sel_day_from) , 
$sel_year_from)) )
$msg = "<font color=red>Invalid From Date...</font>";
elseif ( !(checkdate(intval($sel_month_to) , intval($sel_day_to) ,
$sel_year_to)) )
$msg = "<font color=red>Invalid To Date...</font>";
else
{
$msg = formate_date($sel_year_from."-".$sel_month_from."-".$sel_day_from) . " ---- " .
formate_date($sel_year_to."-".$sel_month_to."-".$sel_day_to);
$dt_valid = "True";
}
The block of code above stores the date (from) and date (to) in two global variables (global variables have their visibility accessible inside the user defined funtion).
$dt_valid serves as a flag.  If it is true, then processing proceeds; otherwise, display a relevant message. We have used our checkdate() function, which returns true or false if the passing arguments are wrong dates.
If both dates are correct, then range of the date is stored in the $msg variable.
echo "<table><tr><td  bgcolor=whitesmoke>" . $msg . "..." . $back . "</td></tr></table>";
Next, we initialize three variable for the upper limits of the three nested FOR loops:
$total_counts_student_id = count($sel_id);  
$total_counts_program_id = count($sel_program_id);  
$total_counts_subject_id = count($sel_subject_id);
if ( ($total_counts_student_id > 0) && ($total_counts_program_id > 0) && ($total_counts_subject_id > 0) && ($dt_valid == 'True') ) 
{
echo "<table border=0 cellpadding=0 cellspacing=0 ><caption><h2>Student Marks</h2></caption>";
For each selected Program List from the input form, it will scan each program and subsequent student data:
for ($c=0; $c < $total_counts_program_id; $c++)
{
for ($b=0; $b < $total_counts_subject_id; $b++)
{
// Collection of Data
for ($a=0; $a < $total_counts_student_id; $a++)
{
Collect_Data($sel_id[$a] , $sel_program_id[$c] , $sel_subject_id[$b]);
} // end of the sel_id loop
if ($sum > 0)  // if data found  ...
{   
// Draw Graph            
$total_counts = count($student_data);
for ($i=0; $i < $total_counts; $i++)
{   
$bar_width = multi_factor * ( (100 * $student_data[$i] ) / $sum );     
$bar = ($i % bar_counter)."_bar.gif";  
echo "<tr><td><img src='images/$bar' width=$bar_width  height='10'>   ";
echo number_format($bar_width/multi_factor , 2, '.', '')."%    (".$student_name[$i] . " , " . $program[$i] . " , " . $subject[$i] . ")</td></tr>";    
}
// Display Data in Tabular Format   
echo "<tr><td><table border=1 bordercolor=blue>";
echo "<tr><th>Name</th><th>Program</th><th>Subject</th> <th>Marks</th><th>Date</th></tr>";
for ($i = 0; $i < $total_counts; $i++)
{
echo "<tr><td>" . $student_name[$i] . "</td><td>" . $program[$i] . "</td><td>" . $subject[$i] . "</td>";
if (isset($student_data[$i]) )  echo "<td>" . $student_data[$i] . "</td>";
else                            echo "<td>Not Taken</td>";
echo "<td>" . formate_date($student_dt[$i]) . "</td></tr>";
}
echo "</table></td></tr>";    $x = 0; $sum = 0;  echo "<tr><td><hr></td></tr>"; 
for ($i=0; $i < $total_counts; $i++)  /// purge the array ... better to purge all the arrays.....
{ array_pop($student_data); }
}
} // end of subject_id loop
} // end of program_id loop 

} // end of if
The second-last loop consists of a body with three sections:
  1. The inner-most loop iterates through each student’s name.  It calls the function Collect_Data(..) and stores it in five arrays:    $student_name[], $program[], $subject[], $student_data[], and $student_dt.  It also stores their sum in variable $sum.  From dividing $student_data by $sum, we can calculate the percentage. The width of the image is determined according to this percentile.  Also, an intelligent code is used to draw a new bar each time.  Then, using this data, it draws the bar chart accordingly.
  2. Display the information in tabular format by iterating and retrieving data from the above five arrays.
  3. Third part terminates the table.
  4. This part purges the above five arrays. This is essential in that if array size is smaller in the next iteration, the previous data may be written.
function Collect_Data($a_no , $program_id , $subject_id )
{
global $student_name;
global $program;
global $subject;
global $student_data;
global $student_dt;
global $sum;
global $x;
global $dt_from;
global $dt_to;
The above variables have been declared global so that their values can be accessed outside this user defined function; otherwise, they would become local variables, thus rendering them out of scope.
$query = "select s.name , p.program , sb.subject , m.marks , m.entry_date
from students as s inner join marks as m on s.id = m.student_id
inner join programs as p on m.program = p.id
inner join subjects as sb on sb.id = m.subject
where  s.id = $a_no and p.id = $program_id  and sb.id = $subject_id and m.entry_date >= $dt_from and m.entry_date <=$dt_to " ;
$result = mysql_query($query);
while ($row = mysql_fetch_row($result) )
{
$student_name[$x] = $row[0]; 
$program[$x] = $row[1];
$subject[$x] = $row[2];
$student_data[$x] = $row[3]; 
$student_dt[$x] =  $row[4];

$sum += $row[3];
$x++; // increment the array counter....
}
}  // end of the funciton Collect_Data()....
Our query in this function is extracting data from the four tables through inner-joins.  You can add more tables as per your requirements. The query extracts five values, which have been stored in five global arrays. Here, $x is acting as a counter. Every call to this function may increase its value if a record is found.
Processing time is calculated based on the following code:
function getmicrotime()
{
list($usec,$sec)=explode(" ",microtime());
return ((float)$usec+(float)$sec);
}
$start_time = getmicrotime();
$time_diff = (float)(getmicrotime() - $start_time);
echo "<tr><td colspan=6><i>Processing Time:- " . number_format($time_diff , 3,'.','') . " Seconds</i></td></tr>";
Plotting the clustered bar chart is really simple, even if we have multi-columned information. The logic is organized in a way that it can be easily customized according to the requirements.  I have given an idea how to implement this technique across multiple tables, especially in a tightly normalized database.

No comments:

Post a Comment