Leveraging Snowflake AI/ML for Anomaly Detection – DZone – Uplaza

Anomaly detection is the process of identifying data points that deviate from the expected results in time-series data. These deviations can have a large impact on forecasting models if they are not identified before the model is created. The Snowflake Cortex AI/ML suite helps you train models to spot and correct these outliers in order to improve the quality of your results. Detecting outliers also helps in identifying the source of the deviations in processes.

Anomaly detection works with both single-series and multi-series data. Multi-series data represents multiple independent threads of events. For example, if you have sales data for multiple stores, each store’s sales can be checked separately by a single model based on the store identifier. These outliers can be detected in time-series data using the Snowflake built-in class SNOWFLAKE.ML.ANOMALY_DETECTION.
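As a minimal sketch of the multi-series case described above, the statements below train one model over several stores at once. The table store_sales_daily and its columns (store_id, sale_date, total_sales) are hypothetical and are not part of this article's dataset. SERIES_COLNAME is the documented parameter that tells the class which column separates the series.

```sql
-- Hypothetical source table: store_sales_daily(store_id, sale_date, total_sales).
-- The view exposes one row per store per day with a TIMESTAMP_NTZ column.
CREATE OR REPLACE VIEW store_sales_training AS
SELECT
    store_id,
    TO_TIMESTAMP_NTZ(sale_date) AS ts,
    total_sales
FROM store_sales_daily;

-- One model object covers every store; each store_id is fitted as its own series.
CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION store_sales_model(
  INPUT_DATA => TABLE(store_sales_training),
  SERIES_COLNAME => 'STORE_ID',
  TIMESTAMP_COLNAME => 'TS',
  TARGET_COLNAME => 'TOTAL_SALES',
  LABEL_COLNAME => ''
);
```

When detecting anomalies with a multi-series model, the same SERIES_COLNAME argument is passed to DETECT_ANOMALIES so that each row is compared against its own store's history.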

Follow the steps below to implement anomaly detection on a time-series dataset.

  • Create an anomaly detection object by passing the training data. This object fits a model to the training data that you provide.
  • Using this anomaly detection model object, call the DETECT_ANOMALIES method to identify anomalies by passing the data to analyze.

In this article, I will focus on using the SNOWFLAKE.ML.ANOMALY_DETECTION class to detect anomalies in superstore sales.

Data Setup and Exploration

In this article, we will use the historical Technology sales data for a superstore. The following code can be used to explore the technology sales.

select * from superstore.superstore_ml_functions.superstore_sales where category = 'Technology';
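Before building any model, it can help to profile the series rather than scan raw rows. The query below is a sketch that assumes the Order_Date, Category, and Sales columns used elsewhere in this article. It reports the date range, how many distinct days have sales, and the total sales amount.

```sql
-- Profile the Technology sales: date coverage, volume, and total amount.
SELECT
    MIN(order_date)            AS first_order_date,
    MAX(order_date)            AS last_order_date,
    COUNT(DISTINCT order_date) AS days_with_sales,
    COUNT(*)                   AS order_lines,
    SUM(sales)                 AS total_sales
FROM superstore.superstore_ml_functions.superstore_sales
WHERE category = 'Technology';
```

A large difference between days_with_sales and the number of calendar days in the range points to gaps in the series, which is worth knowing before training a time-series model.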

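Having explored the historical sales, let’s create two tables: one that stores the last year of sales and one that stores the older, historical sales. In the sections below, the historical table feeds the training dataset and the last-year table feeds the dataset to analyze.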

CREATE OR REPLACE TABLE superstore_tech_sales_last_year AS (
    SELECT
        to_timestamp_ntz(Order_Date) AS timestamp,
        Segment,
        Category,
        Sub_Category,
        Sales
    FROM
        superstore_sales
    WHERE
        Order_Date > (SELECT max(Order_Date) - interval '1 year' FROM superstore_sales where category = 'Technology')
        and category = 'Technology'
    GROUP BY
        all
);

CREATE OR REPLACE TABLE superstore_tech_sales_historical AS (
    SELECT
        to_timestamp_ntz(Order_Date) AS timestamp,
        Segment,
        Category,
        Sub_Category,
        Sales
    FROM
        superstore_sales
    WHERE
        Order_Date 

Identifying Anomalies

In this section, we will focus on creating training datasets, analysis datasets, and models to detect the anomalies in a time-series dataset.

The following code can be used to create a training dataset from six months of historical sales, which the forecast model uses to detect the anomalies.

CREATE OR REPLACE VIEW superstore_tech_sales_historical_training
  AS SELECT timestamp, sum(sales) as sales FROM superstore_tech_sales_historical where timestamp = '2022-01-01' group by timestamp;

After creating the training dataset, let’s create the model to detect the anomalies using the SNOWFLAKE.ML.ANOMALY_DETECTION class. The key parameters are as follows.

  • Model name: anomaly_basic_model
  • Training dataset: superstore_tech_sales_historical_training

Along with these two key attributes, we also need to specify the timestamp column and the key metric column in the dataset. In our use case, SALES is the key metric in which we want to identify the outliers. This call might take a few minutes to build the model.

  CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION anomaly_basic_model(
  INPUT_DATA => TABLE(superstore_tech_sales_historical_training),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES',
  LABEL_COLNAME => '');

Now that we have the basic model ready, we will create the data to analyze using this model. You can use the code below to create a view. This view will be the source for the anomaly detection.

CREATE OR REPLACE VIEW superstore_tech_sales_for_analysis
  AS SELECT timestamp, sum(sales) as sales FROM superstore_tech_sales_last_year where timestamp 

After creating the training data, model, and analysis dataset, the final stage in this process is to identify the anomalies. Use the code below to look for outliers in our analysis dataset.

CALL anomaly_basic_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME =>'TIMESTAMP',
  TARGET_COLNAME => 'SALES'
);

The call above returns, for each timestamp, the actual value, the forecast, the lower and upper bounds of the prediction interval, and a flag that calls out whether the value is an anomaly.
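The sketch below shows one way to narrow that output to the flagged rows only. It assumes the result columns are named TS, Y, FORECAST, LOWER_BOUND, UPPER_BOUND, and IS_ANOMALY, as described in the Snowflake documentation; verify them in your account. The prediction_interval value of 0.95 is purely illustrative. It is an optional setting in CONFIG_OBJECT that controls how wide the band between the lower and upper bounds is.

```sql
-- Re-run detection with an explicit (illustrative) prediction interval.
CALL anomaly_basic_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES',
  CONFIG_OBJECT => {'prediction_interval': 0.95}
);

-- Run immediately after the CALL, in the same session:
-- read the CALL's result set and keep only the rows flagged as anomalies.
SELECT ts, y AS actual_sales, forecast, lower_bound, upper_bound
FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()))
WHERE is_anomaly = TRUE
ORDER BY ts;
```

A narrower interval tightens the band and flags more points, while a wider interval flags fewer, so the value is a trade-off between sensitivity and noise.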

While detecting anomalies, you can also provide labeled data to the model. For example, if you want to mark certain sales as abnormal and want the forecast models to treat them as outliers while forecasting the sales, you can supply labeled data through the LABEL_COLNAME parameter. This is called supervised anomaly detection.

The following code block creates a new training dataset with an additional attribute called LABEL, trains a model on it, and runs the detection again. LABEL is a Boolean column that identifies the outliers. Any day whose total sales exceed $1,000 is labeled as an outlier here.

CREATE OR REPLACE VIEW superstore_tech_sales_historical_training_with_label
  AS SELECT DATE_TRUNC('day',timestamp) as timestamp, sum(sales) as sales, case when sum(sales) > 1000 then true else false end as label FROM superstore_tech_sales_historical where timestamp = '2022-01-01' group by DATE_TRUNC('day',timestamp);

CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION anomaly_labeled_model(
  INPUT_DATA => TABLE(superstore_tech_sales_historical_training_with_label),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES',
  LABEL_COLNAME => 'LABEL');

CALL anomaly_labeled_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME =>'TIMESTAMP',
  TARGET_COLNAME => 'SALES'
); 

The following is the output for a supervised forecast model.
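To see what the labels change, one option is to store the output of both models and compare the flags side by side. This is a sketch only: the table names basic_model_results and labeled_model_results are hypothetical, and the TS, Y, and IS_ANOMALY column names are assumed from the Snowflake documentation.

```sql
-- Score the analysis view with the unsupervised model and keep the output.
CALL anomaly_basic_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES'
);
CREATE OR REPLACE TABLE basic_model_results AS
  SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

-- Score the same view with the supervised (labeled) model.
CALL anomaly_labeled_model!DETECT_ANOMALIES(
  INPUT_DATA => TABLE(superstore_tech_sales_for_analysis),
  TIMESTAMP_COLNAME => 'TIMESTAMP',
  TARGET_COLNAME => 'SALES'
);
CREATE OR REPLACE TABLE labeled_model_results AS
  SELECT * FROM TABLE(RESULT_SCAN(LAST_QUERY_ID()));

-- List the timestamps where the two models disagree.
SELECT
    b.ts,
    b.y          AS sales,
    b.is_anomaly AS basic_flag,
    l.is_anomaly AS labeled_flag
FROM basic_model_results b
JOIN labeled_model_results l
  ON b.ts = l.ts
WHERE b.is_anomaly <> l.is_anomaly
ORDER BY b.ts;
```

Each CREATE TABLE must run directly after its CALL, because LAST_QUERY_ID() refers to the most recent statement in the session.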

Conclusion

In this article, we explored the Snowflake AI/ML capabilities for uncovering anomalies by creating forecast models. As a next step, I recommend continuing to learn the Snowflake Cortex framework. You can explore designing anomaly visualizations and creating an automated anomaly detection pipeline for recurring training and execution.
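As a starting point for the recurring-training idea, a Snowflake task can rebuild the model on a schedule. The sketch below is an assumption-laden outline, not a tested pipeline: the task name retrain_anomaly_basic_model, the warehouse my_wh, and the weekly schedule are all hypothetical. The role that owns the task also needs the privileges to create the model and run tasks.

```sql
-- Hypothetical weekly retraining task; my_wh is a placeholder warehouse name.
CREATE OR REPLACE TASK retrain_anomaly_basic_model
  WAREHOUSE = my_wh
  SCHEDULE = 'USING CRON 0 2 * * 1 UTC'  -- every Monday at 02:00 UTC
AS
  EXECUTE IMMEDIATE
  $$
  BEGIN
    -- Rebuild the model from the current contents of the training view.
    CREATE OR REPLACE SNOWFLAKE.ML.ANOMALY_DETECTION anomaly_basic_model(
      INPUT_DATA => TABLE(superstore_tech_sales_historical_training),
      TIMESTAMP_COLNAME => 'TIMESTAMP',
      TARGET_COLNAME => 'SALES',
      LABEL_COLNAME => '');
  END;
  $$;

-- Tasks are created in a suspended state; resume to activate the schedule.
ALTER TASK retrain_anomaly_basic_model RESUME;
```

A second task could call DETECT_ANOMALIES on new data and write the flagged rows to a table, which would then feed a dashboard or an alert.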
