Wednesday, March 21, 2012

RFC: EXISTS (SELECT ... FROM) optional?

Hi,
in SQL there is a term that's bothering me:
EXISTS (SELECT ... FROM
I guess generations of SQL programmers put some brains into thinking of some
expression to put after the "SELECT" term. I usually use a NULL, like
EXISTS (SELECT NULL FROM
I tend to believe this term is as redundant as DELETE * FROM. Wouldn't it be
good to have this part optional, like with the DELETE statement? Like:
EXISTS ([SELECT <column>[,<column>] FROM ] <tablename> ...)
so it would be a valid term to write:
EXISTS (MyTable WHERE MyIdColumn = 1234)
RFC,
Axel DahmenAxel Dahmen wrote:
> Hi,
> in SQL there is a term that's bothering me:
> EXISTS (SELECT ... FROM
> I guess generations of SQL programmers put some brains into thinking of so
me
> expression to put after the "SELECT" term. I usually use a NULL, like
> EXISTS (SELECT NULL FROM
> I tend to believe this term is as redundant as DELETE * FROM. Wouldn't it
be
> good to have this part optional, like with the DELETE statement? Like:
> EXISTS ([SELECT <column>[,<column>] FROM ] <tablename> ...)
> so it would be a valid term to write:
> EXISTS (MyTable WHERE MyIdColumn = 1234)
> RFC,
> Axel Dahmen
You are quite right that EXISTS is a pretty silly syntax for a semi
join in SQL. I believe that EXISTS pre-dates the ANSI outer join
syntax. That probably explains why EXISTS exists at all.
Another problem is that EXISTS only implements two-value logic
(TRUE/FALSE) even though the predicate that forms the join (the WHERE
clause in the subquery) may give an UNKNOWN result.
David Portas
SQL Server MVP
--|||On 15 Dec 2005 03:29:17 -0800, David Portas wrote:

>Axel Dahmen wrote:
>You are quite right that EXISTS is a pretty silly syntax for a semi
>join in SQL. I believe that EXISTS pre-dates the ANSI outer join
>syntax. That probably explains why EXISTS exists at all.
Hi David,
An existence check with an EXISTS clause is only equivalent to an
existence check with an outer join if the relationship is not one to
many. Otherwise, you'll either have to use a DISTINCT or GROUP BY to get
rid of the duplicates, or remove duplicates in a derived table before
doing the join.
In either case, an EXISTS is simpler and more elegant.

>Another problem is that EXISTS only implements two-value logic
>(TRUE/FALSE) even though the predicate that forms the join (the WHERE
>clause in the subquery) may give an UNKNOWN result.
Why is that a problem? Do you also have a problem with SIGN that can
only return -1, 0, or -1 even though the integer domain contains many
more values?
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||Hugo Kornelis wrote:
> On 15 Dec 2005 03:29:17 -0800, David Portas wrote:
>
> Hi David,
> An existence check with an EXISTS clause is only equivalent to an
> existence check with an outer join if the relationship is not one to
> many. Otherwise, you'll either have to use a DISTINCT or GROUP BY to get
> rid of the duplicates, or remove duplicates in a derived table before
> doing the join.
> In either case, an EXISTS is simpler and more elegant.
A proper semijoin would implicitly be DISTINCT and would therefore work
much the same as EXISTS.

>
> Why is that a problem? Do you also have a problem with SIGN that can
> only return -1, 0, or -1 even though the integer domain contains many
> more values?
>
The problem is that EXISTS doesn't propagate nulls. It converts unknown
results to FALSE ones. The end results can be odd when you try to make
sense of the logic. For example:
CREATE TABLE foo (x INTEGER NOT NULL PRIMARY KEY) ;
CREATE TABLE bar (x INTEGER NOT NULL PRIMARY KEY, y INTEGER NULL) ;
INSERT INTO foo (x) VALUES (1) ;
INSERT INTO bar (x,y) VALUES (1,NULL) ;
SELECT x
FROM foo
WHERE NOT EXISTS
(SELECT *
FROM bar
WHERE bar.x = foo.x
AND bar.y = 2) ;
Result:
x
--
1
(1 row(s) affected)
But this doesn't match the expected interpretation of y=NULL. If y is
unknown then we cannot definitively say that no row exists in Bar where
x=1 and y=2. We should surely expect EXISTS to return UNKINOWN and
therefore no row would be returned. The above example contradicts the
following apparently identical query using the same data:
SELECT x
FROM bar
WHERE y=2
AND EXISTS
(SELECT *
FROM foo
WHERE foo.x = bar.x) ;
x
--
(0 row(s) affected)
David Portas
SQL Server MVP
--|||David Portas wrote:
> therefore no row would be returned. The above example contradicts the
> following apparently identical query using the same data:
> SELECT x
> FROM bar
> WHERE y=2
> AND EXISTS
> (SELECT *
> FROM foo
> WHERE foo.x = bar.x) ;
>
Oops. That should have been:
SELECT x
FROM bar
WHERE y<>2
AND EXISTS
(SELECT *
FROM foo
WHERE foo.x = bar.x) ;
The result is the same as before.
David Portas
SQL Server MVP
--|||>> in SQL there is a term that's bothering me:
EXISTS (SELECT ... FROM
I guess generations of SQL programmers put some brains into thinking of
some expression to put after the "SELECT" term. <<
Now I get to tell ANSI X3H2 stories!
The predicates that use a subquery are "[NOT] EXISTS(<subquery> )",
"<expr> [SOME|ANY] <comp op><subquery>" and "<expr> ALL <comp
op><subquery>"; they required a single-column <subquery> in the
original SQL Standard.
When you used a "EXISTS (SELECT * FROM.." syntax, the fiction was that
the engine picked a single column from the list to use.
In the very first SQL engines, it made a difference if you had a
constant or a column name or a star. Constants ran faster in
Oracle because it had a crappy optimizer -- and still does-- so you
will see Oracle code with "SELECT 1" in their code.
The star was preferred because the better optimizers could look for an
indexed column and use the index for any outer references without
looking at base tables at all.
Today, it does not matter -- the SELECT list is ignored. But the
SELECT * is considered better style because it clearly shows that we
are working at the table level and the unit of work is a whole row.
There is no "DELETE * FROM" in SQL and never has been. That is Access,
a proprietary language that is nothing like SQL except for a few stolen
keywords.
Finally, We thought about it in X3H2 over 20 years ago and rejected
it because (1) It was a way to pass hints to the optimizers back
then (2) Think about nesting subqueries in an EXISTS()!!! Your
proposed syntax would require a major change in the BNF and syntax
rules for a subquery, for the predicates, etc.|||>> The problem is that EXISTS doesn't propagate NULLs. It converts UNKNOWN r
esults to FALSE ones. <<
No. No. No. The EXISTS() is not defined to have an UNKNOWN result at
all. The unit of work is a row, not a column so it cannot produce an
UNKNOWN.
This is ***important*** because an UNKNOWN "converts" to a FALSE in the
ON and WHERE clauses in the DML and to a TRUE in the CHECK() and other
predicates in the DDL.
Gee, Dave, I have not "Celko-ed" in a loooooong time!|||On 16 Dec 2005 03:36:05 -0800, David Portas wrote:
(snip)
>A proper semijoin would implicitly be DISTINCT and would therefore work
>much the same as EXISTS.
Hi David,
Agreed. But that would require a new operator, as the OUTER JOIN does
not (and should not) imply a DISTINCT.

>The problem is that EXISTS doesn't propagate nulls. It converts unknown
>results to FALSE ones. The end results can be odd when you try to make
>sense of the logic. For example:
>CREATE TABLE foo (x INTEGER NOT NULL PRIMARY KEY) ;
>CREATE TABLE bar (x INTEGER NOT NULL PRIMARY KEY, y INTEGER NULL) ;
>INSERT INTO foo (x) VALUES (1) ;
>INSERT INTO bar (x,y) VALUES (1,NULL) ;
>SELECT x
> FROM foo
> WHERE NOT EXISTS
> (SELECT *
> FROM bar
> WHERE bar.x = foo.x
> AND bar.y = 2) ;
>Result:
>x
>--
>1
>(1 row(s) affected)
>But this doesn't match the expected interpretation of y=NULL. If y is
>unknown then we cannot definitively say that no row exists in Bar where
>x=1 and y=2. We should surely expect EXISTS to return UNKINOWN and
>therefore no row would be returned.
Well, I would surely not expect EXISTS to return UNKNOWN, since I know
how the EXISTS operator is defined. :-)
But I see your point. The EXISTS operator should be renamed to something
like KNOWN_FOR_SURE_TO_EXIST to match it's real behaviour. And a new
operator, somewhat like EXISTS but able to return UNKNOWN as well might
have it's merit.

> The above example contradicts the
>following apparently identical query using the same data:
(snip)
(copy corrected version from other post)
>SELECT x
> FROM bar
> WHERE y<>2
> AND EXISTS
> (SELECT *
> FROM foo
> WHERE foo.x = bar.x) ;
>x
>--
>(0 row(s) affected)
It's probably just me being dense, but I fail to see why you feel that
this second query should be identical to the first.
Best, Hugo
--
(Remove _NO_ and _SPAM_ to get my e-mail address)|||--CELKO-- wrote:
> No. No. No. The EXISTS() is not defined to have an UNKNOWN result at
> all. The unit of work is a row, not a column so it cannot produce an
> UNKNOWN.
> This is ***important*** because an UNKNOWN "converts" to a FALSE in the
> ON and WHERE clauses in the DML and to a TRUE in the CHECK() and other
> predicates in the DDL.
> Gee, Dave, I have not "Celko-ed" in a loooooong time!
That's what the SQL says but my contention is that this isn't a natural
interpretation given SQL's treatment of 3VL elsewhere. The definition
of EXISTS makes less sense for the user and is more likely to lead to
mistakes. Partly I suppose the choice of keyword is at fault. "EXISTS"
doesn't really check for existence of a real world entity since (in
SQL's model) its existence or non-existence may be an unknown.
"SUBQUERY RETURNS ROWS" might be a more accurate name for SQL's EXISTS
operator.
A SEMIJOIN operator would be a more useful variant of EXISTS because we
so commonly code that function using INNER joins. Unfortunately, given
that SQL supports tables without keys and can only support projection
through the DISTINCT keyword I can see how a SEMIJOIN operator might
seem tricky and counter-intuitive to some users also.
David Portas
SQL Server MVP
--

No comments:

Post a Comment