Aggregating & calculating values
OverviewTeaching: 15 min
Exercises: 5 minQuestions
How can we aggregate values in SQL for reports?
Can SQL be used to make calculations?Objectives
Use SQL functions like
AVGin combination with clauses like
Group Byto aggregate values and return results for reports.
Make calculations on fields using SQL.
SQL contains functions which allow you to make calculations on data in your database for reports. Some of the most common functions are
MAX, MIN, AVG, COUNT, SUM, and they will:
MAX (find the maximum value in a field),
MIN (find the minimum value in a field),
AVG (find the average value of a field),
COUNT (count the number of values in a field and present the total), and
SUM (add up the values in a field and present the sum).
Let’s say we wanted to get the average
Citation_Count for each of the
ISSNs. We can use
AVG and the
GROUP BY clause in a query:
SELECT ISSNs, AVG(Citation_Count) FROM articles GROUP BY ISSNs;
GROUP BY is used by SQL to arrange identical data into groups. In this case, we are arranging all the citation counts by ISSNs.
AVG acts on the
Citation_Count in parentheses. This process is also called aggregation which allows us to combine results by grouping records based on value and calculating combined values in groups.
As you can see, it is difficult to tell though what ISSN has the highest average citation count and the least. We can improve upon the query above by using
ORDER BY and
SELECT ISSNs, AVG(Citation_Count) FROM articles GROUP BY ISSNs ORDER BY Avg_Citation_Count DESC;
Write a query using an aggregate function that returns the number of article titles per ISSNs, sorted by title count in descending order. Which ISSN has the most titles? (Hint to choosing which aggregate function to use - it is one of the common aggreggate functions
MAX, MIN, AVG, COUNT, SUM.)
SELECT ISSNs, COUNT(Title) FROM articles GROUP BY ISSNs ORDER BY Title_Count DESC;
SQL offers a mechanism to filter the results based on aggregate functions, through the
For example, we can adapt the last request we wrote to only return information about journal
ISSNs with 10 or more published articles:
SELECT ISSNs, COUNT(*) FROM articles GROUP BY ISSNs HAVING Record_Count >= 10;
HAVING keyword works exactly like the
WHERE keyword, but uses aggregate functions instead of database fields. When you want to filter based on an aggregation like
MAX, MIN, AVG, COUNT, SUM, use
HAVING; to filter based on the individual values in a database field, use
HAVING comes after
GROUP BY. One way to think about this is: the data are retrieved (
SELECT), can be filtered (
WHERE), then joined in groups (
GROUP BY); finally, we only select some of these groups (
Write a query that returns, from the
articlestable, the average
Citation_Countfor each journal ISSN but only for the journals with 5 or more citations on average.
SELECT ISSNs, AVG(Citation_Count) FROM articles GROUP BY ISSNs HAVING AVG(Citation_Count)>=5;
In SQL, we can also perform calculations as we query the database. Also known as computed columns, we can use expressions on a column or multiple columns to get new values during our query. For example, what if we wanted to calculate a new column called
SELECT Title, ISSNs, Author_Count -1 FROM articles ORDER BY Author_Count -1 DESC;
We can use any arithmetic operators (
/) if we would like.
If you would like to learn more about calculated values, the Software Carpentry Databases and SQL lesson includes a useful episode on Calculating New Values.
SQL can be used for reporting purposes.
Queries can do arithmetic operations on field values.