
Data optimization is a must for maximum efficiency


This article is part of a VB Lab Insights series paid for by Capital One.


For cloud-based companies, the ability to leverage nearly unlimited amounts of data can unlock possibilities that lead to more innovative products and experiences for customers. But more data coming from more sources can also lead to challenges.

Maybe you can’t find the right data when you need it. Maybe you have trouble accessing the data once you do find it. Maybe unlocking valuable insights from your data requires far too much processing. And when cloud data platforms charge for compute based on consumption, as most do, these inefficiencies translate into unnecessary expenses that push you to deprioritize unlocking value from your data.

When data was stored on-premises, you worried less about managing these inefficiencies: constraints like limited compute and processing power prevented you from overspending. Those shackles come off in the cloud, where you can scale, effectively without limit, on demand, anytime you need to. That power requires companies to focus on data optimization and manage the inefficiencies it creates, or risk rising costs.

A quick solution would be to tightly control how employees access and use data. But that can slow the business down just when it needs data to generate insights and make informed decisions. The smarter answer is to give employees access to relevant, high-quality data to fuel innovation, and to focus on strategic optimization to manage that data efficiently.

It’s about balancing cost and performance

Before optimizing, you need to evaluate which cloud data platform is the right fit for your use case. Once you have decided, think through the cost implications of that use case (storage costs, loading costs, compute) and weigh them against the value you expect to derive from it. If it’s a fit, you can then focus on managing the inefficiencies.

There will be inefficiencies even when you have made the right decision for your use case. Not everyone using your cloud data platform will be an expert at executing a use case efficiently, so turn those inefficiencies into teaching moments. Focus on visibility, alerting and recommendations: give users visibility into potential inefficiencies, alert them quickly when inefficiencies occur, and generate recommendations for fixing them.

There are four areas of data optimization that will help you operate more efficiently, scale faster and get the most value out of your data:

1. Compute optimization

You don’t need the same compute size at all times; workload varies by day of the week and time of day. Scheduling different warehouse sizes to match the query load expected at those times can save significant resources, as in the sketch below.
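Here is a minimal sketch of schedule-based warehouse sizing, assuming Snowflake (the platform named later in this article) and its Python connector; the account, warehouse name, sizes and time windows are hypothetical placeholders, not a prescription.

```python
# A minimal sketch of schedule-based warehouse sizing, assuming Snowflake
# and its Python connector. The account, warehouse name, sizes and time
# windows below are hypothetical; adapt them to your environment.
import datetime

import snowflake.connector  # pip install snowflake-connector-python

# Assumed schedule: larger compute for weekday business hours, smaller otherwise.
SIZE_BY_WINDOW = {
    "weekday_business_hours": "LARGE",
    "off_hours": "XSMALL",
}

def current_window(now: datetime.datetime) -> str:
    """Classify the current time into a named scheduling window."""
    if now.weekday() < 5 and 8 <= now.hour < 18:
        return "weekday_business_hours"
    return "off_hours"

def resize_warehouse(conn, warehouse: str) -> None:
    """Resize the warehouse to match the query load expected right now."""
    size = SIZE_BY_WINDOW[current_window(datetime.datetime.now())]
    # ALTER WAREHOUSE ... SET WAREHOUSE_SIZE is standard Snowflake SQL.
    conn.cursor().execute(f"ALTER WAREHOUSE {warehouse} SET WAREHOUSE_SIZE = '{size}'")

if __name__ == "__main__":
    conn = snowflake.connector.connect(
        account="my_account",  # hypothetical placeholders
        user="scheduler_user",
        password="...",
    )
    resize_warehouse(conn, "REPORTING_WH")  # run from cron, or via a scheduled task
```

A job like this could run from cron; the same resizing logic could equally be expressed as a scheduled Snowflake task.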

2. Query optimization

A badly written query can consume excess compute or scan far more data than needed, and users can also run the same query more often than necessary. These inefficiencies waste money and delay potential insights. Educating users on query writing helps avoid unwanted behaviors, and alerting mechanisms can flag a query that runs longer than expected, or runs when it shouldn’t be running at all; a simple check of that kind is sketched below.
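One way to implement that alerting, assuming a Snowflake deployment, is to poll the documented ACCOUNT_USAGE.QUERY_HISTORY view and flag queries that exceed a runtime threshold. The 10-minute threshold and the notify() stub below are assumptions; wire notify() into your own email or chat channel.

```python
# A minimal sketch of a long-running-query alert, assuming Snowflake's
# ACCOUNT_USAGE.QUERY_HISTORY view. The 10-minute threshold and the
# notify() stub are assumptions to adapt.
import snowflake.connector

THRESHOLD_MS = 10 * 60 * 1000  # flag anything over 10 minutes (assumption)

LONG_QUERIES_SQL = """
    SELECT query_id, user_name, warehouse_name, total_elapsed_time
    FROM snowflake.account_usage.query_history
    WHERE start_time >= DATEADD(hour, -1, CURRENT_TIMESTAMP())
      AND total_elapsed_time > %(threshold)s
    ORDER BY total_elapsed_time DESC
"""

def notify(message: str) -> None:
    print(message)  # placeholder for a real alerting channel

def alert_long_queries(conn) -> None:
    """Flag queries from the last hour that ran past the threshold."""
    cur = conn.cursor()
    cur.execute(LONG_QUERIES_SQL, {"threshold": THRESHOLD_MS})
    for query_id, user, warehouse, elapsed_ms in cur:
        notify(f"Query {query_id} by {user} on {warehouse} ran {elapsed_ms / 1000:.0f}s")
```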

3. Dataset optimization

Traditional data modeling techniques affect how you optimize your cloud data platform, so pay attention to decisions around star schemas, aggregate tables and materialized views. When managing petabytes of data in the cloud, you also need a retention strategy to move, archive or permanently purge data after a set period of time. How you load data is a consideration as well: Are you loading data that people aren’t accessing? Are you loading data in near real time that is accessed only infrequently? Match your dataset management to your actual use case, as in the retention sketch below, to avoid unnecessary expenditures.
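A retention job might look like the following sketch; the EVENTS and EVENTS_ARCHIVE tables, the EVENT_DATE column and the 365-day window are all hypothetical stand-ins for your own schema and policy.

```python
# A minimal sketch of a retention job. The EVENTS and EVENTS_ARCHIVE tables,
# the EVENT_DATE column and the 365-day window are hypothetical stand-ins
# for your own schema and retention policy.
import snowflake.connector

RETENTION_DAYS = 365

ARCHIVE_SQL = """
    INSERT INTO events_archive
    SELECT * FROM events
    WHERE event_date < DATEADD(day, %(cutoff)s, CURRENT_DATE())
"""

PURGE_SQL = """
    DELETE FROM events
    WHERE event_date < DATEADD(day, %(cutoff)s, CURRENT_DATE())
"""

def apply_retention(conn) -> None:
    """Copy rows past the retention window to an archive table, then purge them."""
    cur = conn.cursor()
    cutoff = {"cutoff": -RETENTION_DAYS}
    cur.execute(ARCHIVE_SQL, cutoff)  # archive first, so a failure never loses data
    cur.execute(PURGE_SQL, cutoff)
```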

4. Environment optimization

Be vigilant about controlling costs in the development environment, where they can quickly add up. Enforce policies such as capping compute at a small size in lower environments and shutting down compute immediately when it isn’t in use, as in the sketch below.
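A policy check along those lines might look like this sketch. The "_DEV" naming convention and the allowed sizes are assumptions for illustration; WAREHOUSE_SIZE and AUTO_SUSPEND are standard Snowflake warehouse settings.

```python
# A minimal sketch of a dev-environment policy check. The "_DEV" naming
# convention and the allowed sizes are assumptions for illustration;
# WAREHOUSE_SIZE and AUTO_SUSPEND are standard Snowflake warehouse settings.
import snowflake.connector

ALLOWED_DEV_SIZES = {"XSMALL", "SMALL"}

def enforce_dev_policy(conn) -> None:
    """Cap dev warehouse sizes and make idle dev compute shut itself down."""
    cur = conn.cursor()
    cur.execute("SHOW WAREHOUSES")
    columns = [col[0] for col in cur.description]
    for row in cur.fetchall():
        wh = dict(zip(columns, row))
        name = wh["name"]
        if not name.endswith("_DEV"):  # assumed naming convention
            continue
        size = wh["size"].upper().replace("-", "")  # e.g. "X-Small" -> "XSMALL"
        if size not in ALLOWED_DEV_SIZES:
            # Cap oversized dev warehouses back down to SMALL.
            cur.execute(f'ALTER WAREHOUSE "{name}" SET WAREHOUSE_SIZE = \'SMALL\'')
        # Suspend after 60 idle seconds so unused compute shuts itself down.
        cur.execute(f'ALTER WAREHOUSE "{name}" SET AUTO_SUSPEND = 60')
```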

There are also times when it makes sense to spend for compute. Certain month-end reports need to run fast. You may need extra compute to meet an SLA. Efficiency matters then, too — because you want to make sure those priority processes aren’t slowed down by something that doesn’t need to be running.

Efficiency can mean different things to different companies. For some, it means running at the lowest possible cost. For others, it means making sure that the most important jobs finish when they need to. For most, it means striking the perfect balance between cost and performance. When companies optimize their cloud data architectures for maximum efficiency, they spend less time managing their data and more time managing their business.

Good tools can help make the job easier

Centralized tooling adds another layer of accountability for optimization. Tooling lets users monitor for unexpected spikes in usage, or confirm that when extra compute comes online for that big month-end report, it gets shut down as soon as the report finishes running. Tools can provide broad visibility into data usage, identify new usage patterns and even generate recommendations to address problems quickly and proactively. We built Capital One Slingshot to help us optimize costs, reduce waste and inefficiency, and accelerate time-to-value on Snowflake, while adhering to governance requirements. A simple spike check of that kind is sketched below.
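To give a flavor of what such monitoring involves (this is not a description of Slingshot's internals), here is a minimal spike check over daily credit consumption, assuming Snowflake's documented ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view; the trigger of two times the trailing average is an arbitrary assumption.

```python
# A minimal sketch of spike detection over daily credit consumption, assuming
# Snowflake's ACCOUNT_USAGE.WAREHOUSE_METERING_HISTORY view. The trigger
# (latest day > 2x the trailing average) is an arbitrary assumption.
import snowflake.connector

DAILY_CREDITS_SQL = """
    SELECT warehouse_name,
           DATE_TRUNC('day', start_time) AS usage_day,
           SUM(credits_used)             AS credits
    FROM snowflake.account_usage.warehouse_metering_history
    WHERE start_time >= DATEADD(day, -15, CURRENT_DATE())
    GROUP BY 1, 2
    ORDER BY 1, 2
"""

def find_spikes(conn, factor: float = 2.0):
    """Yield (warehouse, latest_credits, baseline) where usage jumped."""
    history: dict[str, list[float]] = {}
    cur = conn.cursor()
    cur.execute(DAILY_CREDITS_SQL)
    for warehouse, _day, credits in cur:
        history.setdefault(warehouse, []).append(float(credits))
    for warehouse, series in history.items():
        if len(series) < 2:
            continue  # not enough history to establish a baseline
        baseline = sum(series[:-1]) / len(series[:-1])
        if baseline > 0 and series[-1] > factor * baseline:
            yield warehouse, series[-1], baseline
```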

Data will always be changing, and there will always be a need to focus on optimizing your data to reach maximum efficiency. When you’re operating efficiently, you’re able to cut through the noise and get insights from your data. In turn, that helps drive business results like hitting on the right pricing strategy faster than anyone else, being prepared to respond to natural disasters without disrupting business continuity, or seeing patterns that help you catch fraud before it becomes a problem. Ultimately, optimizing your data can help improve performance and drive tangible business value.

Salim Syed is VP and Head of Slingshot Engineering at Capital One Software.


VB Lab Insights content is created in collaboration with a company that is either paying for the post or has a business relationship with VentureBeat, and it’s always clearly marked. Content produced by our editorial team is never influenced by advertisers or sponsors in any way. For more information, contact sales@venturebeat.com.