Improve Snowflake Performance: The Query Profile

Having worked with over 50 Snowflake customers across Europe and the Middle East, I have analyzed hundreds of Query Profiles and identified many issues around performance and cost.

In this article, I'll explain:

  • What the Snowflake Query Profile is, and how to read and understand its components
  • How the Query Profile reveals the way Snowflake executes queries and provides insights for potential query tuning
  • What to look out for in a Query Profile, and how to identify and resolve SQL issues

By the end of this article, you should have a much better understanding and appreciation of this feature and know how to identify and resolve query performance issues.

What Is a Snowflake Query Profile?

The Snowflake Query Profile is a visual diagram explaining how Snowflake executed your query. It shows the steps taken, the data volumes processed, and a breakdown of the most important statistics.

Query Profile: A Simple Example

To demonstrate how to read the query profile, let's consider this relatively simple Snowflake SQL:

select o_orderstatus,
       sum(o_totalprice)
from orders
where year(o_orderdate) = 1994
group by all
order by 1;

The above query was executed against a copy of the Snowflake sample data in the snowflake_sample_data.tpch_sf100.orders table, which holds 150m rows or about 4.6GB of data.

Here's the query profile it produced. We'll explain the components below.

Query Profile: Steps

The diagram below illustrates the Query Steps. These are executed from the bottom up, and each step represents an independently executing process that handles a batch of a few thousand rows of data in parallel.

There are many step types available, but the most common include:

  • TableScan [4] – Indicates data being read from a table. Notice this step took 94.8% of the overall execution time, which means the query spent most of its time scanning data. Note that we cannot tell from this whether the data was read from the virtual warehouse cache or remote storage.
  • Filter [3] – This attempts to reduce the number of rows processed by filtering out data. Notice the Filter step takes in 22.76M rows and outputs the same number, which raises the question of whether the WHERE clause filtered any results at all.
  • Aggregate [2] – This indicates a step summarizing results. In this case, it produced the sum(orders.totalprice). Notice that this step received 22.76M rows and output just one row.
  • Sort [1] – This represents the ORDER BY on orderstatus. It sorts the results before returning them to the Result Set.

Note: Each step also includes a sequence number to help identify the order of operations. Read these from highest to lowest.

Query Profile: Overview and Statistics

Query Profile: Overview

The diagram below summarizes the components of the Profile Overview, highlighting the most important elements.

The components include:

  • Total Execution Time: This indicates the actual time in seconds the query took to complete. Note: The elapsed time is usually slightly longer as it includes other components, such as compilation time and time spent queuing for resources (see the sketch after this list).
  • Processing %: Indicates the percentage of time the query spent waiting for the CPU. When this is a high percentage of total execution time, it indicates the query is CPU-bound, performing complex processing.
  • Local Disk I/O %: Indicates the percentage of time spent waiting for SSD.
  • Remote Disk I/O %: This indicates the percentage of time spent waiting for Remote Disk I/O. A high percentage indicates the query was I/O-bound, meaning performance is best improved by reducing the time spent reading from disk.
  • Synchronizing %: This is seldom useful and indicates the percentage of time spent synchronizing between processes. It tends to be higher as a result of sort operations.
  • Initialization %: Tends to be a low percentage of overall execution time and indicates time spent compiling the query. A high percentage usually indicates a potentially over-complex query with many sub-queries but a short execution time, which suggests the query is best improved by simplifying its design to reduce complexity and, therefore, compilation time.
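To see how elapsed time breaks down into compilation, queuing, and execution outside the Query Profile, the ACCOUNT_USAGE.QUERY_HISTORY view can be queried. This is a minimal sketch; the 24-hour window is just an example, and ACCOUNT_USAGE data arrives with some latency.

-- Sketch: break down elapsed time for recent queries (times in milliseconds).
select query_id,
       total_elapsed_time,
       compilation_time,
       queued_provisioning_time + queued_overload_time as queued_time,
       execution_time
from   snowflake.account_usage.query_history
where  start_time > dateadd(hour, -24, current_timestamp())
order by total_elapsed_time desc
limit 20;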

Query Profile Statistics

The diagram below summarizes the components of the Profile Statistics, highlighting the most important elements.

The components include:

  • Scan Progress: This indicates the percentage of data scanned so far. While the query is still executing, it can be used to estimate the percentage of time remaining.
  • Bytes Scanned: This indicates the number of bytes scanned. Unlike row-based databases, Snowflake fetches only the columns needed, and this figure shows the data volume fetched from local and remote storage.
  • Percentage Scanned from Cache: This is often mistaken for an important statistic to watch. However, when considering the performance of a particular SQL statement, Percentage Scanned from Cache is a poor indicator of good or bad query performance and should be largely ignored.
  • Partitions Scanned: This indicates the number of micro-partitions scanned and tends to be a critical determinant of query performance. It also indicates the volume of data fetched from remote storage and the extent to which Snowflake could partition eliminate, that is, skip over partitions, as explained below.
  • Partitions Total: Shows the total number of partitions in all tables read. This is best read in conjunction with Partitions Scanned and indicates the efficiency of partition elimination. For example, this query fetched 133 of 247 micro-partitions and scanned just 53% of the data. A lower percentage indicates a higher rate of partition elimination, which can significantly improve queries that are I/O-bound. A query to surface statements with poor pruning is sketched after this list.
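As a rough way of finding queries that scan nearly every micro-partition, the same ACCOUNT_USAGE.QUERY_HISTORY view exposes these two statistics. A sketch only; the 80% threshold and the seven-day window are arbitrary examples.

-- Sketch: recent queries that scanned more than 80% of the available partitions.
select query_id,
       query_text,
       partitions_scanned,
       partitions_total,
       round(partitions_scanned / nullif(partitions_total, 0) * 100, 1) as pct_scanned
from   snowflake.account_usage.query_history
where  start_time > dateadd(day, -7, current_timestamp())
and    partitions_total > 0
and    partitions_scanned / partitions_total > 0.8
order by partitions_total desc
limit 20;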

A Join Query Profile

While the simple example above illustrates how to read a query profile, we need to understand how Snowflake handles JOIN operations between tables to fully appreciate how Snowflake works.

The SQL query below includes a join of the customer and orders tables:

select  c_mktsegment
,       count(*)
,       sum(o_totalprice)
,       count(*)
from    customer
,       orders
where   c_custkey = o_custkey
and     o_orderdate between ('01-JAN-1992') and ('31-JAN-1992')
group by 1
order by 1;

The diagram below illustrates the relationship between these tables in the Snowflake sample data in the snowflake_sample_data.tpch_sf100 schema.

The diagram below illustrates the Snowflake Query Plan used to execute this query, highlighting the initial steps that involve fetching data from storage.

One of the most important insights about the Query Profile above is that each step represents an independently running parallel process that executes concurrently. It uses advanced vectorized processing to fetch and process a few thousand rows at a time, passing them to the next step to process in parallel.

Snowflake uses this architecture to break down complex query pipelines, executing individual steps in parallel across all the CPUs in a Virtual Warehouse. It also means Snowflake can read data from the ORDERS and CUSTOMER tables in parallel using the TableScan operations.

How Does Snowflake Execute a JOIN Operation?

The diagram below illustrates the processing sequence of a Snowflake JOIN operation. To read the sequence correctly, always start from the Join step and follow the left leg, in this case, down to the TableScan of the ORDERS table, step 5.

The diagram above indicates the steps were as follows:

  • TableScan [5]: This fetches data from the ORDERS table, returning 19.32M rows out of 150M. This reduction is explained by Snowflake's ability to automatically partition eliminate, that is, skip over micro-partitions, as described in the article on Snowflake Cluster Keys. Notice that the query spent 9.3% of its time processing this step.
  • Filter [4]: Receives 19.32M rows and logically represents the following line in the above query:
 
 and o_orderdate between ('01-JAN-1992') and ('31-JAN-1992')
 
 

This step represents filtering rows from the ORDERS table before passing them to the Join [3] step above. Surprisingly, this step appears to do no actual work, as it receives and emits 19.32M rows. However, Snowflake uses Predicate Pushdown, which filters the rows in the TableScan [5] step before reading them into memory. The output of this step is passed to the Join step.

  • Join [3]: Receives ORDERS rows but needs to fetch the corresponding CUSTOMER entries. We, therefore, need to skip down to the TableScan [7] step.
  • TableScan [7]: Fetches data from the CUSTOMER table. Notice this step takes 77.7% of the overall execution time and, therefore, offers the most significant potential benefit from query performance tuning. This step fetches 28.4M rows, even though there are 1.5Bn rows in the CUSTOMER table, because Snowflake automatically tunes this step.
  • JoinFilter [6]: This step represents an automatic Snowflake performance tuning operation that uses a Bloom Filter to avoid scanning micro-partitions on the right-hand side of a Join operation. In summary, as Snowflake has already fetched the ORDERS entries, it only needs to fetch the CUSTOMER rows that match them. This explains why the TableScan [7] returns only 28M of the 1.5Bn possible entries. It is worth noting this performance tuning is applied automatically, although it could be improved using a Cluster Key on the ORDERS table on the join columns.
  • Join [3]: This represents the actual join of data in the CUSTOMER and ORDERS tables. It is important to understand that every Snowflake Join operation is implemented as a Hash Join.

What Is a Snowflake Hash Join?

While it may appear we are disappearing into the Snowflake internals, bear with me. Understanding how Snowflake executes JOIN operations highlights a critical performance-tuning opportunity.

The diagram below highlights the key statistics to watch out for in any Snowflake Join operation.

The diagram above shows the number of rows fed into the JOIN and the total rows returned. In particular, the left leg delivered fewer rows (19.32M) than the right leg (28.4M). This is important because it highlights an infrequent but critical performance pattern: the number of rows fed into the left leg of a JOIN should always be lower than the right.

The reason for this critical rule is revealed in the way Snowflake executes a Hash Join, which is illustrated in the diagram below:

The above diagram illustrates how a Hash Join operation works: it reads an entire table into memory and generates a unique hash key for each row. It then performs a full scan of the other table, looking up each row against the in-memory hash keys to join the two data sets.

Therefore, it is essential to correctly identify the smaller of the two data sets and read it into memory while scanning the larger of the two, but sometimes Snowflake gets it wrong. The screenshot below illustrates the situation:

In the above example, Snowflake needed to read eight million entries into memory, create a hash key for each row, and then perform a full scan of just 639 rows. This leads to very poor query performance and a join that should take seconds but often takes hours.

As I have explained previously in an article on Snowflake Performance Tuning, this is often the result of multiple nested joins and GROUP BY operations, which make it difficult for Snowflake to estimate the cardinality correctly.

While this happens infrequently, it can lead to extreme performance degradation, and the best-practice approach is to simplify the query, perhaps breaking it down into multiple steps using transient or temporary tables, as sketched below.
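For illustration only, a complex nested query might be broken into steps by materializing the smaller, pre-filtered data set first. The intermediate table and the filter below are hypothetical; the point is that materializing an intermediate result gives Snowflake accurate cardinality for the subsequent join.

-- Sketch: materialize the smaller data set, then join against it.
create or replace transient table building_customers as
select c_custkey, c_mktsegment
from   customer
where  c_mktsegment = 'BUILDING';

select cs.c_mktsegment,
       sum(o.o_totalprice)
from   building_customers cs
join   orders o
  on   o.o_custkey = cs.c_custkey
group by all;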

Identifying Issues Using the Query Profile

Query Profile Join Explosion

The screenshot below illustrates a common issue that often leads to both poor query performance and (more importantly) incorrect results.

Notice that the output of the Join [4] step does not match the row counts fed into either the left or right leg, despite the fact that the query join clause is a simple join by CUSTKEY.

This issue is often referred to as a "Join Explosion" and is usually caused by duplicate values in one of the tables. As indicated above, it often leads to poor query performance and should be investigated and fixed.
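A quick check for the usual culprit, duplicate join keys, is sketched below. This assumes the join column is c_custkey; substitute the actual key and repeat the check for each side of the join.

-- Sketch: look for duplicate join keys on one side of the join.
select c_custkey,
       count(*) as key_count
from   customer
group by c_custkey
having count(*) > 1
order by key_count desc;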

Note: One possible way to identify Join Explosion automatically is to use the Snowflake function GET_QUERY_OPERATOR_STATS, which allows programmatic access to the query profile.
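As a minimal sketch of that approach, the table function can be run against the most recent statement and filtered to Join operators whose output is far larger than their input. The row-count fields are assumed to be exposed under the operator_statistics column; names may vary by release.

-- Sketch: flag Join operators emitting far more rows than they receive.
select operator_id,
       operator_type,
       operator_statistics:input_rows::number  as input_rows,
       operator_statistics:output_rows::number as output_rows
from   table(get_query_operator_stats(last_query_id()))
where  operator_type = 'Join'
and    operator_statistics:output_rows > 2 * operator_statistics:input_rows;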

Accidental Cartesian Join

The screenshot below illustrates another common issue easily identified in the Snowflake query profile: a Cartesian Join operation.

Similar to the Join Explosion above, this query profile is produced by a mistake in the SQL query. The error produces an output that multiplies the size of both inputs. Again, this is easy to spot in a query profile, and although it may, in some cases, be intentional, if not, it can lead to very poor query performance.
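The classic cause is simply forgetting the join predicate. The sketch below is deliberately broken to show the pattern; restoring the where c_custkey = o_custkey clause removes the Cartesian Join.

-- Sketch: the missing join predicate pairs every CUSTOMER row with every ORDERS row.
select c_mktsegment,
       sum(o_totalprice)
from   customer,
       orders
group by 1;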

Disjunctive OR Query

Disjunctive database queries are queries that include an OR in the WHERE clause. This is an example of a legitimate use of the Cartesian Join, but one that can easily be avoided.

Take, for example, the following query:

select distinct l_linenumber
from snowflake_sample_data.tpch_sf1.lineitem,
     snowflake_sample_data.tpch_sf1.partsupp
where (l_partkey = ps_partkey)
or
      (l_suppkey = ps_suppkey);

The above query produced the following Snowflake Query Profile and took 7m 28s to complete on an XSMALL warehouse, despite scanning only 28 micro-partitions.

However, when the same query was rewritten (below) to use a UNION statement, it took just 3.4 seconds to complete, a 132 times performance improvement for very little effort.

select l_linenumber
from snowflake_sample_data.tpch_sf1.lineitem
join snowflake_sample_data.tpch_sf1.partsupp
on   l_partkey = ps_partkey
union
select l_linenumber
from snowflake_sample_data.tpch_sf1.lineitem
join snowflake_sample_data.tpch_sf1.partsupp
on   l_suppkey = ps_suppkey;

Notice the Cartesian Join operation accounted for 95.8% of the execution time. Also, the Profile Overview indicates that the query spent 98.9% of its time processing. This is worth noting as it demonstrates a CPU-bound query.

Wrapping Columns in the WHERE Clause

While this issue is harder to identify from the query profile alone, it illustrates one of the most important statistics available: Partitions Scanned compared to Partitions Total.

Take the following SQL query as an example:

select o_orderpriority,
       sum(o_totalprice)
from  orders
where o_orderdate = to_date('1993-02-04','YYYY-MM-DD')
group by all;

The above query completed in 667 milliseconds on an XSMALL warehouse and produced the following profile.

Notice the sub-second execution time and that the query only scanned 73 of 247 micro-partitions. Compare this to the following query, which took 7.6 seconds to complete, 11 times slower than the previous query, to produce the same results.

select o_orderpriority,
       sum(o_totalprice)
from  orders
where to_char(o_orderdate, 'YYYY-MM-DD') = '1993-02-04'
group by all;

The screenshot above shows the second query was 11 times slower because it needed to scan 243 micro-partitions. The reason lies in the WHERE clause.

In the first query, the WHERE clause compares the ORDERDATE to a fixed literal. This meant that Snowflake was able to perform partition elimination by date.

where o_orderdate = to_date('1993-02-04','YYYY-MM-DD')

In the second query, the WHERE clause converts the ORDERDATE column to a character string, which reduces Snowflake's ability to filter out micro-partitions. This meant more data needed to be processed, which took longer to complete.

where to_char(o_orderdate, 'YYYY-MM-DD') = '1993-02-04'

Therefore, the best practice is to avoid wrapping database columns in functions, especially user-defined functions, which can severely impact query performance.

Identifying Spilling to Storage in the Snowflake Query Profile

As discussed in my article on improving query performance by avoiding spilling to storage, this tends to be an issue that is easy to identify and potentially resolve.

Take, for example, this simple benchmark SQL query:

select ss_sales_price
from   snowflake_sample_data.TPCDS_SF100TCL.STORE_SALES
order by SS_SOLD_DATE_SK, SS_SOLD_TIME_SK, SS_ITEM_SK, SS_CUSTOMER_SK,
         SS_CDEMO_SK, SS_HDEMO_SK, SS_ADDR_SK, SS_STORE_SK,
         SS_PROMO_SK, SS_TICKET_NUMBER, SS_QUANTITY;

The above query sorted a table with 288 billion rows and took over 30 hours to complete on a SMALL virtual warehouse. The critical point is that the Query Profile Statistics showed it spilled over 10 TB to local storage and 8 TB to remote storage. Furthermore, because it took so long, it cost over $183 to complete.

The screenshot above shows the query profile, execution time, and bytes spilled to local and remote storage. It is also worth noting that the query spent 70.9% of its time waiting for Remote Disk I/O, consistent with the data volume spilled to Remote Storage.

Compare the results above to the screenshot below, which shows the same query executed on an X3LARGE warehouse.

The Query Profile above shows that the query completed in 38 minutes and produced no remote spilling. In addition to completing 48 times faster than on the SMALL warehouse, it cost $121.80, a roughly 33% reduction in cost. Assuming this query was executed daily, that would amount to annual savings of over $22,000.
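For completeness, the fix in this case was simply to run the statement on a larger warehouse. A sketch only; the warehouse name below is hypothetical, and resizing should always be weighed against the higher per-second credit rate.

-- Sketch: resize a (hypothetical) warehouse before running the sort-heavy query,
-- then size it back down afterwards to avoid paying for idle capacity.
alter warehouse benchmark_wh set warehouse_size = 'X3LARGE';
-- ... run the query ...
alter warehouse benchmark_wh set warehouse_size = 'SMALL';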

Conclusion

The example above illustrates the point I made in an article on controlling Snowflake costs. Snowflake Data Engineers and Administrators tend to put far too much emphasis on tuning performance alone. However, Snowflake has changed the landscape, and we need to focus on both maximizing query performance and controlling cost.

The tasks of managing cost while maximizing performance may seem at odds with each other, but using the Snowflake Query Profile and the techniques described in this article, there is no reason why we cannot deliver both.
