Virtual (computed/calculated/derived/generated) columns

  • Anton Revyako
    Anton Revyako

    Founder of dwh.dev

Virtual (computed/calculated/derived/generated) columns

In this series of posts, we'll be discussing some "unique" behaviors of the Snowflake analytical base, which you won't find in the documentation.

Let's start with virtual (computed/calculated/derived/generated) columns.

In the documentation, virtual columns are mentioned only in the context of EXTERNAL TABLEs and STREAMs. These are METADATA$… columns that contain record metadata.

But you can also create virtual columns in regular tables:

CREATE TABLE T (
  A INT,
  B INT,
  C INT AS (A + B )
)

Such columns cannot be updated. Also, you cannot perform INSERT INTO specifying these columns. You'll get an error message:

> Virtual column 'C' is an invalid target.

ATTENTION!

Values in virtual columns are not saved on disk. They are calculated each time they are accessed. Thus, you can't save on one-time calculation during insertion/update. This behavior differs from the usual GENERATED columns in PostgreSQL. To save calculated data, you can use DEFAULT expressions for columns. Unfortunately, this approach won't protect against forceful data update in such columns.

What are virtual columns for?

  1. If you want to "hard code" an algorithm and get prettier SELECT queries

  2. They can also be used as a parameter for CLUSTER BY.

  3. There's another use for such columns - to use them as analogs of constraints.

The only difference being - constraints prevent the insertion and change of invalid data. However, since values of virtual columns are not saved, but calculated every time they are accessed, invalid data will still make its way into your table.

By making a selection on virtual columns, you can conveniently track violations of set restrictions.

CREATE TABLE T (
  A INT,
  C BOOLEAN AS (A>0)
);
  1. Bonus: You can discourage your colleagues from using
SELECT * FROM T

and list all columns with a comma.

Make a virtual column with the expression (1/0) and when trying to make a SELECT * … you'll get an error.

Although, now this can be bypassed using SELECT * EXCLUDE …

Also, a side effect will be - everyone will hate you 🙂

Virtual column implementation subtleties

Expressions are rewritten by the Snowflake engine.

CASE WHEN A >0 THEN A ELSE B END

becomes

IFF(A > 0, A, B )

when exporting DDL.

Numeric types are confusing 🙂

CREATE TABLE T (
  A INT,
  B INT AS (1/(2*A))
);

When exporting DDL you will indeed get column B with type NUMBER(38,0), but the column type when accessing it will correspond to NUMBER(7,6).

Now we’ll tell you about a nuance that will lead to the creation of tables from which data cannot be read!

We already know that Snowflake can rewrite the expression specified as the value of a virtual column. But what will happen if we put a whole query in there?

CREATE TABLE MY_TBL2 (
    KEY NUMBER(38,0),
    CHECK_SUM NUMBER(38,0) AS (SELECT key FROM MY_TBL1)
);

The table will be created!

Snowflake will check the query in the virtual column and if the MYTBL1 table doesn't exist, then the MYTBL2 table will not be created.

This query will not be rewritten, although some time ago we would have seen something like this:

AS ((SELECT 1 AS "KEY" FROM TABLE (GENERATOR)ROWCOUNT => 1, rowCount => 1) GENERATOR))

You can insert data into the MY_TBL2 table, but you won't be able to retrieve it afterwards.

And what if we try a simpler query?

CREATE TABLE MY_TBL3 (
    KEY NUMBER(38,0),
    CHECK_SUM NUMBER(38,0) AS (SELECT 1)
);

The table will also be created
but accessing it will be impossible.