Using PostgreSQL array_agg Function for data aggregation
PostgreSQL array_agg Function
The array_agg function in PostgreSQL is an aggregate function that collects multiple values from a group and returns them as an array. This is especially useful for aggregating data from multiple rows into a single array format, enabling you to perform complex data manipulation, such as collecting a list of related items or grouping values from multiple rows.
Syntax
array_agg(expression [ORDER BY sort_expression ASC|DESC])
- expression: The column or expression to be aggregated.
- ORDER BY: (Optional) Specifies the order of the elements in the resulting array.
Examples and Code Explanation
Example 1: Basic Array Aggregation
Suppose we have a students table with the following data:
id | name | course |
---|---|---|
1 | Alice | Math |
2 | Bob | Math |
3 | Charlie | Science |
4 | David | Math |
5 | Eve | Science |
You can use array_agg to list all student names by course.
Code:
SELECT
course, -- Group results by course
array_agg(name) AS student_names -- Collect names in an array for each course
FROM
students -- From students table
GROUP BY
course;
Result:
course student_names Math {Alice, Bob, David} Science {Charlie, Eve}
Explanation:
- array_agg(name): Aggregates the name field for each group (course) and returns it as an array.
- GROUP BY course: Groups the results by course, so each course has a separate row with all student names in that course.
Example 2: Using ORDER BY in array_agg
You can use the ORDER BY clause within array_agg to specify the order of items within the array.
Code:
SELECT
course, -- Group results by course
array_agg(name ORDER BY name ASC) AS student_names
FROM
students
GROUP BY
course;
This query sorts the names within each course in alphabetical order in the resulting arrays.
Example 3: Aggregating Multiple Columns
array_agg can also aggregate complex expressions, such as concatenating multiple columns.
Code:
SELECT
course, -- Group results by course
array_agg(name || ' (' || id || ')') AS student_details -- Concatenate name and id within array elements
FROM
students
GROUP BY
course;
Result:
course student_details Math {Alice (1), Bob (2), David (4)} Science {Charlie (3), Eve (5)}
Explanation:
- name || ' (' || id || ')': This concatenates name and id to give details for each student.
- array_agg: Collects each concatenated result into an array per group.
Example 4: Flattening an Array from Subquery
Sometimes, you may want to use array_agg in combination with a subquery to aggregate data more flexibly.
Code:
SELECT
array_agg(name) AS all_students
FROM
(SELECT DISTINCT name FROM students) AS unique_names; -- Get unique names before aggregation
Explanation:
- Subquery: Selects distinct student names to avoid duplicates.
- array_agg: Collects all unique student names into a single array.
Explanation of Key Points:
- Aggregating Data with array_agg: This function is ideal for collecting multiple rows into a single array, providing a compact, readable way to display grouped data.
- Using ORDER BY: The ORDER BY clause within array_agg gives control over the order of array elements, making it easier to manage sorted lists.
- Combining Expressions: array_agg can be combined with expressions and concatenations, making it possible to display multiple fields in a single array element.
All PostgreSQL Questions, Answers, and Code Snippets Collection.
- Weekly Trends and Language Statistics
- Weekly Trends and Language Statistics