Are you a software engineer who works with PostgreSQL? Do you often find yourself needing to manipulate and aggregate strings in your database? If so, you're in luck because PostgreSQL provides a powerful function called string_agg. In this blog post, you dive into what string_agg is, how it's related to MySQL's group_concat, and walk through a couple of examples using real data. By the end, you'll have a solid understanding of how to make the most string_agg in your PostgreSQL queries. Let’s get going!
Table of contents #
- Postgres string_agg
- Related to MySQL group_concat
- Example of billionaires data
- List the billionaires by country and birth year with count per country
- List the billionaires by category and count per category
- Conclusion
Postgres string_agg #
Postgres string_agg is an aggregate function that allows you to concatenate values from multiple rows into a single string. It is particularly useful when dealing with string aggregation or concatenation in a SQL query. This function can be applied to columns of text, varchar, or other string types. The basic syntax of string_agg is as follows:
string_agg(expression, delimiter)
The two parts are expression and delimeter:
- expression: The expression or column you want to concatenate.
- delimiter: The separator that will be inserted between the concatenated values.
For instance, you have customers in multiple countries, and for a campaign, you want to send emails to them. The need is to get all the customer emails grouped by country it can be done with a query that looks like the below:
SELECT country.name, string_agg(customer.email, ‘, ’)
FROM customer INNER JOIN country ON customer.country_id = country.id
GROUP By country.name
Related to MySQL group_concat #
If you're familiar with MySQL, you might have used the group_concat function. string_agg in Postgres serves a similar purpose as group_concat in MySQL. They both allow you to aggregate and concatenate values from multiple rows into a single string. However, there are some syntax and implementation differences between the two.
Below is an example of the usage of group_concat taken directly form the MySQL docs:
SELECT student_name,
GROUP_CONCAT(DISTINCT test_score
ORDER BY test_score DESC SEPARATOR ' ')
FROM student
GROUP BY student_name;
Here the query will get the distinct test_score highest first grouped by student and separated by a space character.
Now that we have a clear understanding of what string_agg is and how it compares to group_concat, let's move on to some practical examples using real data of billionaires of the world.
Example of billionaires data #
Yes, you heard it right, the top 100 billionaires of the world. Thankfully, Kaggle has a billionaires dataset of 2023 that lists 2540 billionaires in the world with many fields. I downloaded the CSV and removed most of the fields not relevant to this example. I also took only the top 100 billionaires from that list to create a Postgres table as follows:
CREATE TABLE "billionaire" (
"id" serial PRIMARY KEY,
"rank" integer,
"worth" integer,
"name" varchar,
"gender" varchar,
"category" varchar,
"country" varchar,
"city" varchar,
"source" varchar,
"industries" varchar,
"citizenship_country" varchar,
"organization" varchar,
"title" varchar,
"birth_year" integer
);
I have deliberately kept the structure very simple and have not added any index or unique indexes to the table as it will have only 100 rows. The goal of this tutorial is to understand the Postgres string_agg function, not something else. The data of the 10 (or 100) billionaires can be inserted with the following insert statement:
INSERT INTO "billionaire" (
rank, worth, name, gender, category, country, city,
source, industries, citizenship_country, organization, title, birth_year
) VALUES
(1,211000,'Bernard Arnault & family','M','Fashion & Retail','France','Paris','LVMH','Fashion & Retail','France','LVMH Moët Hennessy Louis Vuitton','Chairman and CEO',1949),
(2,180000,'Elon Musk','M','Automotive','United States','Austin','Tesla, SpaceX','Automotive','United States','Tesla','CEO',1971),
(3,114000,'Jeff Bezos','M','Technology','United States','Medina','Amazon','Technology','United States','Amazon','Chairman and Founder',1964),
(4,107000,'Larry Ellison','M','Technology','United States','Lanai','Oracle','Technology','United States','Oracle','CTO and Founder',1944),
(5,106000,'Warren Buffett','M','Finance & Investments','United States','Omaha','Berkshire Hathaway','Finance & Investments','United States','Berkshire Hathaway Inc. (Cl A)','CEO',1930),
(6,104000,'Bill Gates','M','Technology','United States','Medina','Microsoft','Technology','United States','Bill & Melinda Gates Foundation','Cochair',1955),
(7,94500,'Michael Bloomberg','M','Media & Entertainment','United States','New York','Bloomberg LP','Media & Entertainment','United States','Bloomberg','CEO',1942),
(8,93000,'Carlos Slim Helu & family','M','Telecom','Mexico','Mexico City','Telecom','Telecom','Mexico','América Móvil','Honorary Chairman',1940),
(9,83400,'Mukesh Ambani','M','Diversified','India','Mumbai','Diversified','Diversified','India','Reliance Industries','Founder and Chairman',1957),
(10,80700,'Steve Ballmer','M','Technology','United States','Hunts Point','Microsoft','Technology','United States','Los Angeles Clippers','Owner',1956);
The above is not the full list of top 100 billionares in 2023, it is just the 10. You can get the full list of top 100 in this gist.
You can run Postgres with Docker and Docker Compose locally. Still, The easiest way for you to test this without needing to install Postgres locally will be to use ElephantSQL. Their free plan will give you a database with a max of 20MB of data and 5 concurrent connections which is more than enough to try out this example. You can follow their official getting started documentation to set up a database and insert the date using the above SQL statements.
Once you have the database set up with the 100 rows added. You can ask various questions as you will do in the next section.
List the billionaires by country and birth year with count per country #
You can get answers to questions like listing all the billionaires by country and year of birth youngest first with the count by country. It can be done easily by using the string_agg
function in Postgres as below:
SELECT
b.country,
STRING_AGG (
b.name || ' - ' || b.birth_year, ', '
ORDER BY b.birth_year DESC
) AS billionaire_birth_year,
COUNT(b.name) as no_of_billionaires
FROM
billionaire b
GROUP BY
b.country
ORDER BY no_of_billionaires DESC;
The output will look like the below:
The output has been truncated with the top 10 rows but it will list all the 20 rows/countries if you run it on ElephantSQL Browser or any other CLI tool like psql or a GUI like DataGrip. You can already see the power of string_agg function in PostgreSQL. Let’s ask one more question in the next section and that will be all the examples.
List the billionaires by category and count per category #
The query this time is which category of work has the most billionaires and list the names alphabetically and number of billionaires by category. It can be achieved by:
SELECT
b.category,
STRING_AGG (
b.name, ', '
ORDER BY b.name ASC
) AS billionaire_names,
COUNT(b.name) as no_of_billionaires
FROM
billionaire b
GROUP BY
b.category
ORDER BY no_of_billionaires DESC;
The output will look like the below:
Surely in 2023, there will be more billionaires making their money from the technology category than any other. For instance, in the top 100 billionaires of the world Real Estate has 2 whereas tech has 16, so work in tech :). You can play around with the data and ask more questions to get more insights with SQL and string_agg.
Using string_agg with WITHIN GROUP #
In addition to the basic usage of string_agg, PostgreSQL provides an extension to this function called WITHIN GROUP, allowing you to specify the order in which values are concatenated within each group, which is particularly valuable when the sequence of items in the concatenated string is significant.
The syntax for using WITHIN GROUP is as follows:
string_agg(expression, delimiter) WITHIN GROUP (ORDER BY ordering_expression)
The WITHIN GROUP extension in PostgreSQL's string_agg function enables ordered concatenation, providing control over the sequence of values within each group. This functionality proves especially useful for tasks like listing products sorted by prices.
Let's update the aforementioned query to obtain answers to questions such as listing all the billionaires by country and year of birth, with the youngest first, along with the count by country:
SELECT
b.country,
STRING_AGG (
b.name || ' - ' || b.birth_year, ', '
ORDER BY b.birth_year DESC
WITHIN GROUP (ORDER BY b.birth_year DESC)
) AS billionaire_birth_year,
COUNT(b.name) as no_of_billionaires
FROM
billionaire b
GROUP BY
b.country
ORDER BY no_of_billionaires DESC;
In this query, the string_agg function will concatenate billionaire names along with their birth years (ordered by birth year in descending order) within each country group and count the number of billionaires in each country, ultimately sorting the results by the number of billionaires in descending order.
Did you notice any difference? In this query, we added 'WITHIN GROUP'. The key distinction is that the previous query applies the ordering to the entire result set, whereas in this one, we ensure that the concatenation is ordered within each country group as well.
Why does this matter? Consider a scenario where the order of items in the concatenated string is crucial, such as when creating a comma-separated list of items in a specific sequence. The WITHIN GROUP extension empowers you to control this order, providing a powerful tool for precise string aggregation.
It's important to note that if the order of items is not significant for your use case, you can continue to use string_agg without WITHIN GROUP as demonstrated earlier.
Conclusion #
PostgreSQL's string_agg function is a powerful tool for string aggregation in SQL queries. It allows you to concatenate values from multiple rows into a single string, making it ideal for tasks such as generating comma-separated lists or creating custom reports.
In this blog post, you found out what string_agg is and how it relates to MySQL's group_concat. You also learned its practical use with real data, including listing billionaires by country and category.
The next time you find yourself working on a PostgreSQL project that involves string aggregation, remember the string_agg function and how it can simplify your queries and data processing. Happy querying!