In this article, I will explain django GROUP BY(Annotate and values) compare with sql GROUP BY clause.
In the context of databases and SQL, the GROUP BY clause is used to group rows with similar values in specified columns, allowing you to perform aggregate functions on each group. This operation is especially useful for summarizing and analyzing data based on common characteristics. When you use GROUP BY, the result set is divided into groups, and aggregate functions (such as SUM, COUNT, AVG, etc.) are applied to each group.
I have two tables: Product
, Sales
. Using those two table I will explain group by with proper example.
from django.db import models
class Product(models.Model):
name = models.CharField(max_length=255)
price = models.DecimalField(max_digits=10, decimal_places=2)
discount = models.DecimalField(max_digits=5, decimal_places=2)
class Sales(models.Model):
product = models.ForeignKey(Product, on_delete=models.CASCADE)
items = models.IntegerField()
created_at = models.DateTimeField(auto_now_add=True, verbose_name="created at")
updated_at = models.DateTimeField(auto_now=True, verbose_name="updated at")
Product table with sample data:
|---------------------------------|
| name | price | discount |
|---------------------------------|
| Product A | 20.00 | 2.50 |
| Product B | 15.50 | 1.75 |
| Product C | 30.00 | 5.00 |
| Product D | 25.75 | 3.25 |
|---------------------------------|
Sales table with sample data:
|-------------------------------------------------------|
| product | items | created_at | updated_at |
|-------------------------------------------------------|
| Product A | 5 | 2022-01-01 | 2022-01-02 |
| Product B | 8 | 2022-01-02 | 2022-01-03 |
| Product C | 10 | 2022-01-03 | 2022-01-04 |
| Product A | 3 | 2022-01-04 | 2022-01-05 |
| Product D | 6 | 2022-01-05 | 2022-01-06 |
| Product B | 7 | 2022-01-06 | 2022-01-07 |
| Product C | 4 | 2022-01-07 | 2022-01-08 |
| Product D | 9 | 2022-01-08 | 2022-01-09 |
| Product A | 2 | 2022-01-09 | 2022-01-10 |
| Product C | 5 | 2022-01-10 | 2022-01-11 |
|-------------------------------------------------------|
Syntax of GROUP BY function of SQL:
SELECT column1, aggregate_function1(column_name), aggregate_function2(column_name) ...
FROM table
GROUP BY column1
OR
SELECT column1, column2, ..., aggregate_function1(column_name), aggregate_function2(column_name) ...
FROM table
GROUP BY column1, column2, ...,
N.B: You just use such columns in the select statement, which is used in Group By. But you can use many aggregation functions.
Here's a breakdown of the syntax:
- SELECT: The columns you want to retrieve in the result set.
- aggregate_function: An aggregate function (e.g., COUNT, SUM, AVG, MIN, MAX) applied to one or more columns.
- FROM: The table from which to retrieve the data.
- GROUP BY: Specifies the columns by which to group the result set.
In Django, GROUP BY functionality is achieved by the values() and annotate() functions.
GROUP BY -> values()
aggregation_function -> annotate( function_name() )
Syntax of Django Annotate and values:
queryset = YourModel.objects.values('field1').annotate(variable_name=FunctionName('field_name'))
OR
queryset = YourModel.objects.values('field1', "field2).annotate(variable_name=FunctionName('field_name'))
GROUP BY and count function:
If you want to count the number of rows for each product in the 'Sales' table, you can use the following SQL query:
SQL:
SELECT product, COUNT(*) AS total_row
FROM Sales
GROUP BY product;
In this query:
- The product column is specified in the GROUP BY clause, indicating that the result set should be grouped by distinct products.
- The COUNT(*) function is used to count the number of rows for each group (each product).
- The result will show the total number of rows (transactions) for each distinct product in the 'Sales' table.
Here's an example of what the output might look like:
|-----------------|-------------|
| product | total_rows |
|-----------------|-------------|
| Product A | 3 |
| Product B | 2 |
| Product C | 3 |
| Product D | 3 |
|-----------------|-------------|
Django group by count:
import json
from django.db.models import Count
from your_app.models import Sales
sales_count_by_product = Sales.objects.values('product').annotate(total_rows=Count('id'))
Here's an example of what the output might look like:
{
"Product A": 3,
"Product B": 2,
"Product C": 3,
"Product D": 3
}
GROUP BY and sum function:
SQL:
SELECT product, SUM(items) AS total_items_sold
FROM Sales
GROUP BY product;
Here's an example of what the output might look like:
|-----------------|-------------|
| product | total_items_sold |
|-----------------|-------------|
| Product A | 10 |
| Product B | 15 |
| Product C | 19 |
| Product D | 15 |
|-----------------|-------------|
Django group by sum:
from django.db.models import Sum
from your_app.models import Sales # Replace 'your_app' with your actual app name
# Sum items for each product
sales_sum_by_product = Sales.objects.values('product').annotate(total_items=Sum('items'))
Here's an example of what the output might look like:
{
"Product A": 10,
"Product B": 15,
"Product C": 19,
"Product D": 15
}
GROUP BY and avg function:
SQL:
SELECT product, AVG(items) AS avg_items_sold
FROM Sales
GROUP BY product;
Here's an example of what the output might look like:
|-----------------|---------------------|
| product | avg_items_sold |
|-----------------|---------------------|
| Product A | 3.3333 |
| Product B | 7.5000 |
| Product C | 6.3333 |
| Product D | 5.0000 |
|-----------------|---------------------|
Django group by avg:
from django.db.models import Avg
from your_app.models import Sales # Replace 'your_app' with your actual app name
# Calculate average items for each product
sales_avg_by_product = Sales.objects.values('product').annotate(avg_items=Avg('items'))
Here's an example of what the output might look like:
{
"Product A": 3.33,
"Product B": 7.50,
"Product C": 6.33,
"Product D": 5.00
}
GROUP BY and min, max function:
SQL:
SELECT product, MIN(items) AS min_items_sold
FROM Sales
GROUP BY product;
|-----------------|---------------------|
| product | min_items_sold |
|-----------------|---------------------|
| Product A | 2 |
| Product B | 7 |
| Product C | 4 |
| Product D | 6 |
|-----------------|---------------------|
SQL:
SELECT product, MAX(items) AS max_items_sold
FROM Sales
GROUP BY product;
|-----------------|---------------------|
| product | max_items_sold |
|-----------------|---------------------|
| Product A | 5 |
| Product B | 8 |
| Product C | 10 |
| Product D | 9 |
|-----------------|---------------------|
Django group by MIN, MAX:
from django.db.models import Min, Max
# Calculate minimum and maximum items for each product
sales_min_max_by_product = Sales.objects.values('product').annotate(
min_items=Min('items'),
max_items=Max('items')
)
{
"Product A": {
"min_items": 2,
"max_items": 5
},
"Product B": {
"min_items": 7,
"max_items": 8
},
"Product C": {
"min_items": 4,
"max_items": 10
},
"Product D": {
"min_items": 6,
"max_items": 9
}
}