(no title)
Sonata | 2 years ago
Given the following tables:
CREATE TABLE person (
id INT PRIMARY KEY,
name VARCHAR NOT NULL
);
CREATE TABLE pet (
id INT PRIMARY KEY,
name VARCHAR NOT NULL,
owner_id INT NOT NULL
);
It is common to want to join them, like so: SELECT owner.id, owner.name, COUNT(pet.id) AS numberOfPets
FROM person AS owner
LEFT OUTER JOIN pet ON owner.id = pet.owner_id
GROUP BY owner.id
This doesn't work in standard SQL, because all columns in the SELECT list have to either be aggregated or included in the GROUP BY. owner.name is neither. That is a bit silly though because we know each result row will only have one unambiguous value for the owner name, since the GROUP BY is on a unique column from the same table as the owner name.We can solve this with ANY_VALUE:
SELECT owner.id, ANY_VALUE(owner.name) AS name, COUNT(pet.id) AS numberOfPets
FROM person AS owner
LEFT OUTER JOIN pet ON owner.id = pet.owner_id
GROUP BY owner.id
brabel|2 years ago
Not relevant, but I thought it may help other readers understand the problem.
mtts|2 years ago
That said, if owner.name has different values and you didn’t actually care which one you’d get ANY_VALUE could maybe be useful.
Though in that case you could also just use MIN or MAX, which already exist.
orthoxerox|2 years ago
I think all optimizers are smart enough already, as this is a super common pattern. I honestly would have preferred this logic to be made more explicit: if you group by the primary key (or by the table name, as someone suggested), you can use all columns from the table without aggregate functions.
This kinda breaks down when you introduce subqueries and CTEs, which is where myopic optimizers can screw the pooch as well, so any_value can be of use there.