Snowflake Advanced Cost Control: Ingest Optimizations

Published at: 6/17/2024
Categories: data, snowflake, costoptimization, datasuperhero
Author: kiniama

For Large Data Set Ingestion, Advanced Cost Control Measures Are Required That Optimize Data Sets Before or At Ingest

What approaches work if you are constantly ingesting data, real-time or not? If you have petabytes of data or billions of rows, ingestion can add up and account for a large share of the cost of your Snowflake and cloud implementations.

A recent client was ingesting a high volume of data that was reprocessed daily, yet updates could still arrive for records dating back up to five years.

Ultimately, you will need to reduce the initial data set or change how you ingest your data.

Tip 1: Snowpipe Streaming

For streaming workloads, move to Snowpipe Streaming. It can deliver considerable savings: one person I spoke with recently saw costs drop from 500 dollars/day to 27 dollars/day after moving from Snowpipe to Snowpipe Streaming. Snowflake Sales Engineers estimate Snowpipe Streaming at about one-third the cost of Snowpipe, but that example is far better still.

If you are looking to learn how to implement Snowpipe Streaming, try this lab.

Tip 2: Optimize for Deltas

When dealing with large datasets built from multiple API calls, it's better to optimize for deltas, the difference between two subsequent load dates. Typically, only around 5% of the data changes daily, so instead of processing the entire dataset on each load, we can compare against the previous load and process only the 5% that has changed. This approach can significantly improve the next steps in the data analysis process.

Even when the source is not an API, any ingestion that requires comparing tables benefits from this tip, especially when the incoming data set is large and must be unpacked and cleaned, or deduplicated and merged.
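As a sketch of the idea, here is a minimal Python example that splits a new snapshot into inserts, updates, and deletes against the previous load. The key column name and the row-hash approach are illustrative assumptions, not the client's actual pipeline:

```python
import hashlib
import json

def row_fingerprint(row: dict) -> str:
    """Stable hash of a row's contents, independent of key order."""
    return hashlib.sha256(json.dumps(row, sort_keys=True).encode()).hexdigest()

def compute_delta(previous: list, current: list, key: str):
    """Return (inserts, updates, deletes) between two snapshots keyed by `key`."""
    prev = {r[key]: row_fingerprint(r) for r in previous}
    curr = {r[key]: r for r in current}
    inserts = [r for k, r in curr.items() if k not in prev]
    updates = [r for k, r in curr.items()
               if k in prev and row_fingerprint(r) != prev[k]]
    deletes = [k for k in prev if k not in curr]
    return inserts, updates, deletes
```

Only the inserts and updates (the ~5%) then need to flow into the downstream merge, instead of the full snapshot.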

Tip 3: Don’t keep the history in the intermediate tables

Often, historical data is kept in intermediate working tables. Our take: don't keep it, for several reasons. First, it saves storage. Second, it reduces the risk of an inefficient execution plan. Third, it lets you choose a better clustering key, since you no longer have to partition your data by load date.
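The pruning rule can be sketched in a few lines of Python; the `load_date` field name is an assumption for illustration, and in practice you would express the same filter in SQL against the intermediate table:

```python
def latest_snapshot(rows: list, load_date_field: str = "load_date") -> list:
    """Keep only rows from the most recent load, dropping accumulated history."""
    if not rows:
        return []
    latest = max(r[load_date_field] for r in rows)
    return [r for r in rows if r[load_date_field] == latest]
```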

Tip 4: File size for ingestion: 100–250 MB

It seems like a small thing, but following Snowflake's file-size recommendation is a great help. It improves ingestion by making any COPY INTO statement as efficient as possible. If you can control file sizes, do so: right-sized files keep the per-file overhead charge small relative to the total amount of data loaded, to the point where the overhead cost is immaterial. An efficient data pipeline is a cheaper pipeline.
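If you build the staged files yourself, batching records up to a target size is straightforward. A minimal sketch, assuming newline-delimited records and ignoring compression (the 150 MB default is just a midpoint of the recommended range):

```python
def chunk_records(records, target_bytes=150 * 1024 * 1024):
    """Group serialized records into batches of roughly `target_bytes` each."""
    batches, batch, size = [], [], 0
    for rec in records:
        line = rec if isinstance(rec, bytes) else rec.encode()
        # Flush the current batch once adding this record would exceed the target.
        if batch and size + len(line) > target_bytes:
            batches.append(batch)
            batch, size = [], 0
        batch.append(line)
        size += len(line)
    if batch:
        batches.append(batch)
    return batches
```

Each resulting batch would then be written (and compressed) as one file in the stage.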

Tip 5: Parquet format

The final tip not to be discounted is to use the Parquet format instead of the typical CSV.

Parquet provides efficient compression, smaller file sizes, and columnar storage. The columnar layout yields better compression ratios and faster read times for analytical queries that access only specific columns. Smaller files mean faster transfer rates and reduced storage costs, and the optimized structure and smaller size make parallel processing easier.

Plus, Parquet files carry embedded metadata such as schema and column statistics, making your queries faster as well.
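To see why the columnar layout compresses better than row-oriented CSV, here is a minimal stdlib sketch using synthetic data, with zlib standing in for Parquet's codecs. Grouping each column's values together lets the compressor exploit the repetition within low-cardinality columns:

```python
import zlib

# Synthetic table: three low-cardinality columns plus a unique id,
# standing in for a typical fact-table extract.
rows = [
    ("active" if i % 3 else "inactive",
     "us-east-1" if i % 7 else "eu-west-1",
     "gold" if i % 5 else "silver",
     str(i))
    for i in range(10_000)
]

# Row-oriented layout, like CSV: values of different columns interleaved.
row_bytes = "\n".join(",".join(r) for r in rows).encode()

# Column-oriented layout, like Parquet: each column stored contiguously.
col_bytes = "\n".join(",".join(col) for col in zip(*rows)).encode()

row_compressed = len(zlib.compress(row_bytes))
col_compressed = len(zlib.compress(col_bytes))
print(f"row-oriented: {row_compressed} B, column-oriented: {col_compressed} B")
```

Real Parquet goes further with per-column encodings (dictionary, run-length) chosen to match each column's data, which is where the biggest wins come from.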

Conclusion

As we have explored, managing costs while ingesting extensive data sets into Snowflake requires a strategic data handling and processing approach. Organizations can significantly reduce their cloud and Snowflake costs by optimizing data ingestion through Snowpipe Streaming, focusing on deltas, and minimizing unnecessary historical data storage.

Incorporating the strategies mentioned in this blog can help you reduce expenses and improve the overall efficiency of your data management system. These tips should be viewed not only as cost-saving measures but also as essential components of a strong data strategy promoting scalable, efficient, and cost-effective Snowflake operations. To ensure your data processes remain lean and robust, encourage a culture of continuous learning and keep yourself updated with the latest methods and trends.

If you do need help, check out Infostrux Solutions' approaches to Data Engineering.

I am Augusto Rosa, VP of Engineering for Infostrux Solutions, Snowflake Data Superhero, and SME. Thanks for reading my blog post. You can follow me on LinkedIn.

Subscribe to Infostrux Medium Blogs https://medium.com/infostrux-solutions for the most interesting Data Engineering and Snowflake news.
