USING Clause and Column Auto-rename
What fields will I see as a result of a SELECT query that retrieves multiple fields with identical names?
Hold on before answering; it's not all that obvious :)
Let me explain what the answer depends on:
- The type of client from which the query is made.
- Whether we are using USING or not.
- Whether there is data in the sources or not.
- Whether we are looking directly at the result of the executed query or at the result from the history using RESULT_SCAN.
Cool, right? Now, I'll briefly explain each point :)
Client Type
Database client creators love adding spices to the work process of data engineers :) All examples discussed were executed in Snowsight (in SnowSQL, the result will be different). To be fair, it's not always clear which part of the described chaos is attributed to the client and which to the database itself. But we won't delve into that now…
The creators decided that when more than one column with the same name is encountered in the result, it would be a good idea to rename them! A, A1, A2… The idea is great, but how do I now understand if these are actually existing columns or not?
In reality, such columns do not exist. You cannot refer to them in SQL.
USING
USING is syntactic sugar to shorten the ON clause. But it affects not only ON, which might seem logical, but also the list of columns.
CREATE TABLE T1 (A INT, B INT);
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A);
It is implied that the result should be identical to:
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 ON T1.A = T1_1.A
and we should get A, B, A, B (or the renamed version described above).
But no! USING removes one column A. Really, why do you need 2 identical columns that will break everything for you later? So, it should be A, B, B.
Everything is fine here. But what if we continue to use JOIN with USING?
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
...
Do you think that each time another column B will be added? Nope! :) All subsequent JOIN USING with identically named columns will simply be ignored! Moreover, this will happen until a JOIN with ON is encountered!
SELECT
*
FROM T1
LEFT JOIN T1 as T1_1 USING(A)
LEFT JOIN T1 as T1_2 USING(A)
LEFT JOIN T1 as T1_3 ON T1.A = T1_3.A
will return A B B2 A2 B_3
Empty and Non-Empty Tables
Yes, Snowflake (Snowsight) renames columns differently depending on whether there is data in the tables or not!
-- EMPTY TABLES
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2
-- TABLE T1 IS NOT EMPTY
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A B
RESULT_SCAN
This function returns the result of a query by its UUID in QUERY_HISTORY:
SELECT * FROM t1 CROSS JOIN t1 a;
-- A B A_2 B_2
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));
-- A B A_1 B_1
But now, A1 and B1 are actual columns!
SELECT A_1 FROM
(
SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
);
-- OK
More examples are here: Data Lineage Challenge - Snowflake