Case Statements in SELECT Queries
The SELECT statements is the most common operation performed on SQL tables. It returns a list of records in table like format(also known as a result set) from one or more tables. The full syntax of the SELECT statement is as follows
SELECT <columns> | <function(columns)> ---
FROM <table_name> ---
WHERE <condition> ---
GROUP BY <columns> | <function(columns)> ---
ORDER BY <columns> ---
HAVING ---
``
In SQL SELECT statements, it is possible to provide a one to one or many to one mapping between a value in a column and a value that we provide. These are achieved through CASE statements. The syntax of a CASE statement in SQL is as follows
CASE
WHEN Condition_1 THEN Result_1
WHEN Condition_2 THEN Result_2
ELSE Result_Default
END
In simple terms, CASE statements handle if/then logic analagous to that found in programming languages. Every CASE statement is matched with a corresponding END statement. The if/else logic is handled in the WHEN Condition THEN Result statement. Lets take a simple example where we store abbrieviations in the table instead of its full form, but would like to display the full form to the end user:
SELECT name,
gender,
CASE
WHEN gender = 'M' THEN 'Male'
ELSE 'Female'
END As gender_full
FROM users;
Here we are assuming we have a table named users which has a column gender with values M and F. We would like our result-set to contain the values Male and Female in the column gender_full instead of the abbrieviations M and F stored in the table. So our SELECT statement proceeds as follows:
- For each row in the table ‘users’, check to see if gender is ‘M’
- If true, then print the word ‘Male’ in the column we have designated as gender_full
- If condition evaluates to fale, return ‘Female’.
- Each row contains the name, gender and gender_full column values.
This example shows how the one to one mapping proceeds. We can construct a similar example where the CASE statement maps a range(many values) to a single value. Ex:
CASE
WHEN range > 1000 and range < 3000 THEN '1-3K'
ELSE '>3K'
END
We used the CASE statement to create a new column from our actual tables and generated result sets that proceed by mapping values in actual tables to values we provide. We will see more concrete examples below doing the same using Laravel.
Translating to Laravel
There are two ways to interact with the Database layer in Laravel
- Use DB Facade with the Query Builder for running fluent database queries. The Query Builder also provides protection against SQL Injection attacks(more on that later). The table method on the DB facade returns a query builder instance which can be chained with one or more functions that provide constraints on the query. The final get method executes the query and returns the result. The Laravel equivalent of the SQL Select statement with all its clauses using the Query builder interface is:
$users = DB::table(table_name) ->select(<columns> | Raw<columns>) ->where(<condition>) ->orderBy(<columns> | Raw<columns>) ->groupBy(<columns>) ->having(<condition>) ->get();
- The second method of interacting with the Database is through Laravel’s own ActiveRecord implementation known as Eloquent. Each database table has a corresponding Model which is used to interact with that table. Models are used to create, update and delete records easily. The model needs to be first defined. Once defined, a similar SQL SELECT query can be constructed as follows:
$users = User::select(<columns>) ->where(<condition>) ->orderBy(<columns>) ->groupBy(<columns>) ->having(<condition>) ->get();
Here the User model is a class file which extends Eloquent base class. Note that the fluent syntax is similar to the Query Builder form. This is because Eloquent uses Query builder natively.
The difference between the two approaches is in terms of the final resultset returned after get()
or a corresponding final method(such as first()
or all()
) is processed. The query builder returns a single or a collection of std_class objects whereas Eloquent returns a single or a collection of Model objects.
The Eloquent ORM approach is based on the active record pattern that provides an interface between the database and your application. It is responsible for the M in the MVC pattern. Models are useful to structure relationships or associations between other models. But is it really useful for Raw queries like the CASE Statements we discuss here? Lets take an example: We have a table called users with a corresponding model called User. When we run, User::find(1)
we retrieve a singe model that augments the columns and row retrieved from the table with useful helper methods such as update
, delete
and so on. But if we run a select query with a group by clause on a model, the result is a model with the helper methods being pretty useless. Ex: User::select('gender', 'count(*) as number_gender')->groupBy('gender')->get()
returns a collection of models with the gender and number of person(s) with that gender. Each model in the collection can be invoked with the update or delete method. But since it is not an actual model, calling the update or delete methods on the model may return success, but not really modify any records in the database. It might be more sane to use method 1 in this case, relying on pure query builder methods and bypassing Eloquent. Still if you are interested in the final result, such as sending data back to the user, the final result is serialized(converted to an array or JSON) and it really doesnt matter whether we use approach 1 or 2.
Setting up the Example
The example that we will be looking at is the following table of first ascents to mountains.
id | name | height_meters | first_ascent |
---|---|---|---|
1 | Mount Everest | 8848 | 1953-05-29 |
2 | Kilimanjaro | 5895 | 1889-10-06 |
3 | Denali | 6190 | 1913-06-07 |
4 | Chimborazo | 6263 | 1880-01-04 |
5 | K2 | 8611 | 1954-07-31 |
6 | Piz Palü | 3900 | 1835-08-12 |
7 | Cho Oyu | 8188 | 1954-10-19 |
To create the database schema, which includes the attributes we are interested in, we can use the artisan command make:migration
as follows
php artisan make:migration create_mountains_table
We define the needed attributes, name
, height_meters
and first_ascent
in the up()
method of the CreateMountainsTable class that the previous artisan command scaffolded.
public function up()
{
Schema::create('mountains', function (Blueprint $table) {
$table->bigIncrements('id');
$table->string('name');
$table->integer('height_meters');
$table->date('first_ascent');
});
}
To use Eloquent, we need to create a new model which can be scaffolded using the make:model
command of artisan:
php artisan make:model Mountain
By default, when a new model is created and saved to the database, timestamps are automatically
generated(and managed for all future model operations by Eloquent). Since we are not interested in them(our migrations do not have timestamps), we can set the timestamps to false. The resulting model file looks as follows:
class Mountain extends Model
{
/**
* Indicates if the model should be timestamped.
*
* @var bool
*/
public $timestamps = false;
}
To populate the table with data, we can create a seeder using the artisan command
php artisan make:seeder MountainTableSeeder
The run()
method of the MountainTableSeeder is as follows:
public function run()
{
$mountainData = [
['Mount Everest', 8848, '1953-05-29'],
['Kilimanjaro', 5895, '1889-10-06'],
['Denali', 6190, '1913-06-07'],
['Chimborazo', 6263, '1880-01-04'],
['K2', 8611, '1954-07-31'],
['Piz Palü', 3900, '1835-08-12'],
['Cho Oyu', 8188, '1954-10-19'],
];
foreach($mountainData as $value)
{
$model = new Mountain();
$model->name = $value[0];
$model->height_meters = $value[1];
$model->first_ascent = $value[2];
$model->save();
}
}
We are using Eloquent Model in the seeder. So dont forget to include use App\Mountain
at the topb of the file. Once the seeder is created, we need to run composer du
to regenerate the Composer autoloader system so that the seeder class is found. To run the seeder, we can call
php artisan db:seed --class=MountainTableSeeder
If all went well, we are set to experiment with our queries. If not consult the docs
Raw Select Queries in Laravel
Raw queries are written by means of select()
method of the Query Builder which accept a list of all the fields. Some of these fields can be an Illuminate\Database\Query\Expression
object. These objects can be generated by passing the SQL query to DB::raw()
method. For Example:
$results = DB::select( DB::raw("Any Sql Query here") );
In this article, we will convert a few of the SQL queries(written in SQLite) found here to Laravel Query Builder Form.
-
To generate a list of mountains and their ascents ordered by century of ascent, the query builder statement is
Mountain::select( name, DB::raw( 'CAST(FLOOR(DATE_FORMAT(date(first_ascent),"%Y")/100 + 1) AS Int) century' )) ->orderBy('century') ->get();
The steps of query evaluation according to ANSI SQL can be found here. The wiki article will give a rough idea of how queries are processed internally through multiple intermediate tables. MySQL may or may not follow this approach – but its useful to visualize the process.
-
In the following query, we output the number of Mountains climbed in the 19th century
Mountain::select( DB::raw( 'CAST(FLOOR(DATE_FORMAT(date(first_ascent),"%Y")/100 + 1) AS Int) century' ), DB::raw('count(*) AS aggregate')) ->groupBy('century') ->having('century','=',19) ->get()
It is difficut to construct complex queries with
count(*)
aggregate functions using Laravel Query Builder. One could construct something likeMountain::count()
which returns the number of records/models in the table. Anything more complex – we resort to raw statements. Also, counts can be added using the versatile collection methods that Laravel provides.
Case Statements
CASE in the SELECT Clause
If we would like to map our mountains according to ranges, we could write a CASE statement which in Laravel query builder form is as follows:
Mountain::select(
DB::raw('
CASE
WHEN height_meters>3000 AND height_meters<5000
THEN "3-5K"
ELSE ">5K"
END AS height_range'),
DB::raw('count(*) AS ascents')
)->groupBy('height_range')->get()
In this example, each height_meters field in each row of the table is tested according to the logical conditions height_meters>3000 AND height_meters<5000
. If this condition is true, a column labelled height_range
is populated with the text 3-5K
. If the condition fails, the height_range
field is populated with >5K
. The groupBy clause, reduces the table by unique values of height_range
. The aggregate functions work in tandem with the groupBy clause to produce the number of ascents in each height_range
value.
CASE In the ORDER BY Clause:
We can use a CASE statement in the ORDER BY SQL clause which can be used to sort results in ascending or descending order. Lets consider an example where we display a list of Mountains ordered by their date of first ascent. To see that we can chain multiple order statements together, we can additionally impose the following constraint: The mountains are ordered in ascending order of date if they lie in the range of 3000-6500m else they are listed in descending order of date.
Mountain::select('name', 'first_ascent', 'height_meters')
->orderByRaw('
CASE
WHEN height_meters > 3000 AND height_meters < 6500
THEN first_ascent
END ASC
')
->orderByRaw('
CASE
WHEN height_meters > 6500
THEN first_ascent
END DESC
')
->get();
In this example, we have used the orderByRaw
method of Query Builder. This method allows us to set a raw string as input and saves us a bit of time rather than typing out orderBy( DB::raw($sql) )
. orderByRaw
and the family of raw methods accepts an array of bindings as its second parameter which is very useful when we want to bind data from forms to the queries. It also takes care of the SQL Injection type attacks which we will discuss later.
Another issue that should be kept in mind is that orderBy
by default appends ASC to the end of the query, since its full signature is orderBy($column, $direction = 'asc')
. So if we were to use orderBy(DB::raw($sql))
, we should not include ASC in the raw expression. To sort the items in descending order, we can use orderBy(DB::raw($sql), 'desc')
.
CASE in the GROUP BY Clause
We can include CASE statements in the GROUP BY SQL Clause. Suppose we want to group the mountains by their height ranges(say 3-5Km and >5Km). Further more, we would like to calculate the maxiumum and minimum heights in those ranges and the number of mountains that belong to those ranges. The following Laravel query will generate the necessary output.
Mountain::select(
DB::raw('
CASE
WHEN height_meters > 3000 AND height_meters < 5000
THEN "3-5K"
ELSE ">5K"
END as height_range
'),
DB::raw('MAX(height_meters) AS maximum_height'),
DB::raw('MIN(height_meters) AS minimum_height'),
DB::raw('COUNT(*) as total')
)
->groupBy(DB::raw('
CASE
WHEN height_meters > 3000 AND height_meters < 5000
THEN "3-5K"
ELSE ">5K"
END
'))->get()
When executing this query, you might receive the following error
Illuminate/Database/QueryException with message 'SQLSTATE[42000]: Syntax error
or access violation:1055 'test.mountains.height_meters' isn't in GROUP BY
This is because Laravel is by default configured to be used in strict mode with Mysql 5.7. The strict mode by default includes the SQL mode ONLY_FULL_GROUP_BY
to be used. This in turn requires that all the fields in the SELECT clause to also be included in the GROUP BY clause. To fix this, open up database.php
and in the mysql
section, replace 'strict' => true
with the following:
'modes' => [
'STRICT_TRANS_TABLES',
'ERROR_FOR_DIVISION_BY_ZERO',
'NO_ZERO_DATE',
'NO_ZERO_IN_DATE',
'NO_AUTO_CREATE_USER',
'NO_ENGINE_SUBSTITUTION'
],
The above is in essence 'strict' => true
without the ONLY_FULL_GROUP_BY
mode enabled. In case, you write your SQL queries and test them in a console that uses a less strict version of SQL(like in phpMyAdmin for example) and are wondering why they dont work with Laravel, its most likely one of the SQL modes that have been set.
The groupBy clause in this example could be replaced by groupBy('height_range')
which works with the ONLY_FULL_GROUP_BY
SQL Mode and produces similar results as the original query.
Binding input data to the CASE Query
In normal queries, we rely on user input to shape our queries and retrieve pertinant data from the database. In the following example, we rely on user input to determine what is returned as the label for height_range
:
$label1 = Input::get('height1'); // For ex: "3-5K"
$label2 = Input::get('height2'); // For ex: ">5K"
Mountain::select(
DB::raw(
"CASE
WHEN height_meters>3000 AND height_meters<5000
THEN '$label1'
ELSE '$label2'
END AS height_range"),
DB::raw("count(*) AS ascents"))
->groupBy('height_range')
->toSql()
We have used toSql()
instead of get()
which returns the native SQL query as a string. This is useful to experiment with input data and to study if SQL injection attacks are possible.
DB::raw
in the above example are used to create arbitrary SQL queries inside the SELECT clause and the input values are not sanitized before being bound to the query. This creates the possibility of attack of database centric applications via SQL injection or cross site scripting(XSS) injection. In the case of XSS, we can sanitize the user input using htmlentities
function. For SQL injection attacks we can utilize parameter binding feature of the selectRaw
method. This method also does not need a DB::raw
as input and can take direct SQL query elements as strings. We can reformulate the statement in a safer fashion as follows:
Mountain::selectRaw(
"CASE
WHEN height_meters>3000 AND height_meters<5000
THEN label1 = :label1
ELSE label2 = :label2
END AS height_range",
array('label1' => $label1, 'label2' => $label2))
->selectRaw("count(*) AS ascents")
->groupBy('height_range')
->toSql()
SQL Injection Attacks
In the previous example, we relied on a very nify trick – generating a dynamic SQL string using the form input data. This is the only way to retrieve or insert data received from the user into the database. However it is wide open to SQL injection attacks, which are most common in Database centric applications. We also discussed binding parameters in the select or selectRaw methods of query builder to mitigate the effect. In this section, we will outline examples of SQL injection strategies so as to be aware of what might be possible.
The most common example of this attack type proceeds as follows… Lets say we have a SQL statement of the following type in PHP
$stmt = "INSERT INTO mountains (name, height_meters, first_ascent) VALUES ('$name', '$height', '$ascent' )";
The variables $name
, $height
and $ascent
are placeholders in the SQL statement which receive their values from a form. An attacker might for example submit a form with the following data entered into the $ascent
field:
hey this is an attack'); DROP TABLE mountains;-- '
Note the usage of the --
ie. a comment and a space after the comment. With this input the SQL query resembles the following:
$stmt = "INSERT INTO mountains (name, height_meters, first_ascent) VALUES ('$name', '$height', 'hey this is an attack'); DROP TABLE mountains;-- ' )"
With this sort of $stmt
, we have two SQL statements – one that adds an entry and another that drops the table mountains
. We could write out this in the mySQL console. But this attack will not actually work when using PDO and prepared statements of PHP. Let us take a quick look at the DB::select()
implementation of Laravel(found in Illuminate\Database\Connection.php)
$statement = $this->prepared($this->getPdoForSelect($useReadPdo)->prepare($query));
$this->bindValues($statement, $this->prepareBindings($bindings));
$statement->execute();
Nice thing about this approach is that the prepared statement and the bindings proceed as two different stages. We may imagine that the SQL prepared statement is first converted into an internal represenataion not amenable to be tampered with(as in the case of conversion of one SQL statement into two) and the bindings are later on added.
$name = "' OR ''='"
Mountain::whereRaw("name='$name'")->get();
The resultant SQL query is
SELECT * FROM mountains WHERE name='' OR ''=''
which retrieves all the data from the table. Even thought this example is trivial, this method can be used to gain access to a site(if a user is checked against a table of users checking against the condition that the resultset should be non zero) or find out all information in a table that is usually restricted(list all users that are members of a site).
To fix the problem in the above example, we can sanitize the input variable($name
in this case) by binding the parameter to our query as explicit strings using the second argument of whereRaw
which accepts an array of bindings.
Mountain::whereRaw("name = :name", ['name'=>$name])
Cross Site Scripting(XSS) Attack
Finally, let us consider the case of not properly sanitizing the user input. Consider the following example:
DB::select('INSERT INTO mountains (name, height_meters, first_ascent) VALUES ('<script>alert(\'Youve been hacked\')</script>', 1000, STR_TO_DATE('10/12/2015', '%d/%m/%Y'))');
This query when executed inserts a small script into the name field of the mountains table. If this record is retrieved and displayed in the blade view as unescaped data, then the script will execute.
$name = Mountain::find($id)->name;
return view('index', ['name'=>$name]);
// Somewhere in index.blade.php
{{ $name }} // ok
{!! $name !!} // Oh oh! script executed
So it might be wise to display the data with double curly braces which are automatically sent through PHP htmlspecialchars
function (or) we might sanitize the input with htmlentities
before inserting into the database.
Simulating CASE Statements with Laravel Accessors
What are Accessors?
Accessors allow us to format or create unique attributes in the Eloqeunt model when we access them. To define an accessor, we use the getTestMeAttribute()
method in the model. The accessor is defined using studly cased name of the of the corresponding snake-case column attribute. For example: if our table has a first_ascent column, and we want to return the column value in kilometers, we would write
public function getFirstAscentAttribute($value)
{
return $value/1000;
}
The $value
of the column is automatically passed into the accessor, allowing us to manipulate and return a modified value. Note: Our table and model has an attribute named first_ascent. The accessor does not automatically modify the value of this attribute. Instead we need to call the accessor as follows:
$model->first_ascent
Laravel automagically figures out that the method to be called is getFirstAscentAttribute
and passes in the attribute value.
Conditional Logic with Accessors
We can use Laravel Accessors to return new and computed values from existing attributes. This also means complex if-else conditions and filter logic can be placed in accessor methods. Let us recast the CASE example we saw earlier in terms of Laravel accessors:
protected $appends = ['height_range'];
public function getHeightRangeAttribute()
{
return ($this->height_meters > 3000 && $this->height_meters < 5000)
? '3-5K' : '>5K';
}
Our table does not have the height_range
attribute. We use $appends
property to add the attribute to our model. Note that attribute names in $appends
are in snake-case, and the corresponding accessors are defined in camelCase. Also the attributes in $appends
are only included in the result after model serialization to JSON or arrays. This is useful when retrieving data as part of an API request. Example:
return Mountain::all()->toArray()
Also to note: We can move the conditional logic to the Blade template files or the frontend client(Vue, Angular). Keeping the logic closer to the models make sense if we are dumping the data directly into a table plugin(such as server side AJAX calls in datatables). Not to mention, a bit faster.
Also, it is possible to dynamically append attributes to the returned data(in JSON or array form) by using the appends
method as follows:
return Mountain::all()->appends('height_range');