dev-resources.site

The Three Pillars of Successful Database Maintenance and Monitoring

Published at: 2/14/2024
Categories: ownership, sql, mindset, database
Author: adammetis

Developers and DevOps engineers don’t own their databases. They query the data, they implement schema migrations, but they rarely maintain databases, tune the performance, and solve the issues when they appear. Instead, developers get help and support from the operations teams and database administrators.

Is this efficient? We learned with the DevOps movement that minimizing communication and putting people together to work hand-in-hand improves performance and makes the organization work faster. We also learned that DevOps engineers can efficiently manage both the development and deployment parts of the software development lifecycle. However, we rarely do that for our databases. In this article, we explain why shifting the ownership of the databases to developers and DevOps engineers is a must and how to achieve that.

Recommended reading: Why DevOps Should Own Their Databases and How to Do It

Why We Need the Shift of Ownership

When developers don’t own their databases, they can’t maintain the performance and fix issues on their own. They need support from other teams, which poses multiple challenges. Let’s look at these challenges to understand why we need to shift the ownership to the developers.

First, developers often don’t have access to the production database. They can’t access the data, they can’t see the performance metrics, and they can’t check how the database is configured. This makes it much harder to reason about the performance when it degrades. Developers can’t see which indexes would be helpful or what the data distribution is. Whenever there is a need to tune the performance of the applications accessing the database, developers need to rely on other teams to provide enough insight into what’s going on and how to improve it.

Human communication is slow. Opening tickets or issues with other teams, coordinating the communication, and making sure artifacts (logs, traces, schemas) are delivered promptly is hard. We already learned with the DevOps movement that it’s beneficial to minimize communication. We need to do the same with our database management. We need to minimize or even eliminate communication and let developers work on the issues in a self-serve manner. Slow communication makes people work less efficiently and makes troubleshooting take much longer due to more context switches. Everyone involved in the communication works on multiple things while waiting for a response, and this leads to feeling overloaded. One of the principles of Kanban is to limit the work in progress. The more communication we have, the more work in progress we need to deal with.

Another issue is the lack of motivation. If developers don’t own their databases, they feel detached from the problems. The “not my job” attitude leads to more issues because people don’t try to prevent them as it’s not in their ownership. This is often connected to not being exposed to the actual clients’ feedback and how people use the application. There are many ways to motivate people to do more and still feel less tired. Lack of motivation also makes people leave the company more often which makes others less efficient as they need to spend more time on onboarding and transferring knowledge.

Yet another issue with lack of ownership is the lack of development of skills. Developers do not learn databases because they don’t work with them. Even if they wanted to work on their own, it’s much harder because they don’t gain practical working knowledge of the systems. They only raise issues and wait for others to provide solutions. This makes people feel even more siloed and never leave their comfort zones. To let people develop, we need to challenge them and let them solve problems they are not familiar with. This is loosely connected to the idea of the fungible developer that was popular in the industry: the ability to replace developers with other developers with no loss (or gain) in productivity. While this sounds great in theory, reality shows that it doesn’t work. However, lack of ownership makes developers feel less attached to the products, and that makes them fungible.

Finally, a lack of ownership means that more teams are needed. This simply increases the cost as more people need to be involved in the work. Increased costs limit the potential of your company and make your business run less efficiently. Just like we want to optimize our CI/CD pipelines to work faster and cut resource usage, we should achieve the same goals with less work and fewer people so that others can work on something else.

Three Pillars of Ownership

What we need is true ownership of databases. Developers need to own their databases the same way they own their code and deployments. There are many reasons for that and many benefits. However, we know they can’t take the ownership on their own and they need help from other teams like platform engineers. There are three things that developers and DevOps engineers need to own their databases:

  • Database Observability and understanding built with tools that are database-aware and database-focused
  • Processes that define how to use the database observability tools and how to proceed in a self-service mode
  • The mindset of the true owners

Let’s now examine each of these in detail.

Owning Databases with Database Observability

To own a database, developers need to have tools that let them peek behind the scenes. These tools cannot just show raw metrics like CPU consumption. They need to focus on database-oriented details around transactions, execution plans, background tasks, fragmentation, partitioning, sharding, and many other things that are specific to databases.

Additionally, these tools must bring an understanding and coherent story. They need to connect multiple dots to present the flow and evolution of the system. They need to be aware of deployments, migrations, traffic patterns, and other intricacies of the database world. They need to be aware of the database engine edition and capabilities to suggest better solutions that developers can use.

Recommended reading: Observability vs Monitoring: Key Differences & How They Pair

What Database Observability Brings

Proper database observability has multiple benefits:

  • Observability can bring a coherent story that explains what happened. Instead of examining raw metrics, it can show the true understanding of what’s going on from a business perspective.
  • Observability tools can set thresholds for alarms and alerts automatically as they can observe the actual performance and see how it changes when background tasks start or when the traffic changes.
  • Observability tools can troubleshoot issues automatically since they can track the database configuration and changes, and they can submit pull requests automatically.
  • Observability tools can detect anomalies more easily as they don’t need to wait for historical data. They can just track the deployments and see whether changes are expected or not.
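
To make the idea of automatically set thresholds more concrete, here is a minimal sketch in Python. It is not how Metis or any specific tool works; it simply assumes that a set of recently observed query latencies is available and derives an alert threshold as the mean plus k sample standard deviations. The names alert_threshold, is_anomalous, and the parameter k are hypothetical and chosen only for illustration.

```python
from statistics import mean, stdev


def alert_threshold(baseline_ms, k=3.0):
    """Return mean + k * sample standard deviation of observed latencies.

    Assumption: baseline_ms holds latencies (in milliseconds) observed
    during normal operation; k controls how tolerant the alert is.
    """
    if len(baseline_ms) < 2:
        raise ValueError("need at least two baseline samples")
    return mean(baseline_ms) + k * stdev(baseline_ms)


def is_anomalous(latency_ms, baseline_ms, k=3.0):
    """Flag a latency that exceeds the threshold derived from the baseline."""
    return latency_ms > alert_threshold(baseline_ms, k)


# Hypothetical latencies observed for one query during normal traffic.
baseline = [10.0, 12.0, 11.0, 13.0, 9.0]
print(is_anomalous(40.0, baseline))  # far above the baseline
print(is_anomalous(12.0, baseline))  # within the normal range
```

A real tool would recompute the baseline when traffic patterns or background tasks change, which is what lets the threshold follow the actual behavior of the database instead of a hand-picked constant.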

All these things result in the following improvements to KPIs:

  • They reduce mean time to repair (MTTR) and mean time to detect (MTTD)
  • They reduce communication effort as developers don’t need to reach out to other teams
  • They increase the ability to self-serve the issues and free up the capacity of people (database administrators, developers, DevOps engineers, etc.) as they don’t need to be involved in troubleshooting that can be done automatically.
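
To track whether these KPIs actually improve, the two averages can be computed from incident records. The sketch below is only an illustration under stated assumptions: the Incident record and its field names are hypothetical, MTTD is measured from the start of the problem to its detection, and MTTR is measured from detection to resolution (some teams measure MTTR from the start of the problem instead).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Incident:
    # Hypothetical record; field names are assumptions for this sketch.
    started: datetime
    detected: datetime
    resolved: datetime


def mean_delta(deltas):
    """Average a collection of timedelta values."""
    deltas = list(deltas)
    if not deltas:
        raise ValueError("no incidents to average")
    return sum(deltas, timedelta()) / len(deltas)


def mttd(incidents):
    """Mean time to detect: from the start of the problem to detection."""
    return mean_delta(i.detected - i.started for i in incidents)


def mttr(incidents):
    """Mean time to repair: from detection to resolution (one common definition)."""
    return mean_delta(i.resolved - i.detected for i in incidents)


incidents = [
    Incident(datetime(2024, 1, 1, 10, 0), datetime(2024, 1, 1, 10, 10), datetime(2024, 1, 1, 10, 40)),
    Incident(datetime(2024, 1, 2, 9, 0), datetime(2024, 1, 2, 9, 20), datetime(2024, 1, 2, 10, 20)),
]
print(mttd(incidents), mttr(incidents))
```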

How Metis Helps with Database Observability

Metis brings database-oriented dashboards. They show the true insights into what happens with the database, configuration, data volume, and maintenance tasks:

You can see the transactions (1), rows (2), temporary files (3), cache hits (4). You can also examine table sizes (5), schema insights (6), indexes (7), queries (8) and extensions (9). This is how you can easily analyze whether the database is healthy. But there is more. Metis can analyze actual queries and see how they change over time:

This page shows the query content, duration, and number of calls. Most importantly, Metis can present insights into how to fix queries:

Metis can automate the database maintenance. Database administrators don’t need to keep an eye on the database anymore and developers can do it on their own.
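
As a rough illustration of one of the health signals mentioned above, cache hits, the following sketch computes a cache hit ratio from two counters. The article does not name a database engine; the sketch assumes PostgreSQL, where the pg_stat_database view exposes blks_hit and blks_read counters. The function takes plain integers, so it runs without a database connection, and it is not a description of how Metis computes its dashboards.

```python
def cache_hit_ratio(blks_hit, blks_read):
    """Return the share of block requests served from the buffer cache.

    Assumption: the arguments are the blks_hit and blks_read counters
    of PostgreSQL's pg_stat_database view for one database.
    Returns None when there has been no activity yet.
    """
    total = blks_hit + blks_read
    if total == 0:
        return None
    return blks_hit / total


# Hypothetical counter values read from the statistics view.
print(cache_hit_ratio(990, 10))
```

A ratio that drops after a deployment is the kind of database-specific signal that a raw CPU chart would not explain on its own.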

Owning Databases with Good Processes

Developers need to have well-defined processes to monitor and maintain their databases. When something goes wrong, there is no time to figure out what to do. Teams need to act fast and know exactly how to tackle the issue. They need to have standard operating procedures, playbooks, and instructions they can follow to avoid making things even worse.

Good processes make things run faster. This applies to all parts of the software development lifecycle. Just like we have CI/CD and automated tests, we need to have automated reviews that focus on databases. It’s not enough just to check the code. We need to check schema migrations, configurations, partitioning, and performance changes. All of that should be an inherent part of the flow.
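
An automated review of schema migrations can start very small. The sketch below is a hypothetical check, not a feature of any particular tool: it scans a migration script for two patterns that are commonly considered risky in PostgreSQL (an index created without CONCURRENTLY, and DROP TABLE or DROP COLUMN) and returns findings that a CI job could post as comments. The rule list and the review_migration function are assumptions made for illustration.

```python
import re

# Hypothetical rules; a real review would cover far more cases.
RULES = [
    (re.compile(r"\bcreate\s+(unique\s+)?index\b(?!\s+concurrently)", re.I),
     "index built without CONCURRENTLY may block writes"),
    (re.compile(r"\bdrop\s+(table|column)\b", re.I),
     "destructive change: confirm the data is no longer needed"),
]


def review_migration(sql):
    """Return (statement, message) pairs for statements matching a rule.

    Statements are split naively on ';', which is good enough for a
    sketch but would break on semicolons inside string literals.
    """
    findings = []
    for statement in filter(None, (s.strip() for s in sql.split(";"))):
        for pattern, message in RULES:
            if pattern.search(statement):
                findings.append((statement, message))
    return findings


migration = (
    "CREATE INDEX idx_a ON t(a); "
    "CREATE INDEX CONCURRENTLY idx_b ON t(b); "
    "ALTER TABLE t DROP COLUMN c;"
)
for statement, message in review_migration(migration):
    print(f"{message}: {statement}")
```

Failing the pipeline when the list of findings is not empty would make such a check an inherent part of the flow, in the same way automated tests are.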

When things break, systems need to notify about that automatically. They need to know when something is an issue to avoid false positives. Notifications need to be sent to the right people and over channels they can follow. There is no point in sending emails when developers use rules to mark these messages as read automatically. Low-priority messages can’t be mixed with high-priority ones to avoid clutter and reduce the cognitive load. All these things need to be configurable to make sure they are convenient for developers.
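
The separation of low-priority and high-priority messages can be sketched as a small routing table. The severity levels and the channel names ("pager", "daily-digest") below are hypothetical; the point is only that the mapping is configuration that a team can change, not logic hard-coded into the alerting system.

```python
from enum import Enum


class Severity(Enum):
    LOW = 1
    HIGH = 2


# Hypothetical routing table; a team would adjust it to the channels it follows.
ROUTES = {Severity.HIGH: "pager", Severity.LOW: "daily-digest"}


def route(alerts, routes=ROUTES):
    """Group alert messages by the channel configured for their severity."""
    by_channel = {}
    for severity, message in alerts:
        by_channel.setdefault(routes[severity], []).append(message)
    return by_channel


alerts = [
    (Severity.HIGH, "replication lag above threshold"),
    (Severity.LOW, "unused index detected"),
    (Severity.HIGH, "disk almost full"),
]
print(route(alerts))
```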

What Good Processes Bring

Smooth processes give the following:

  • Communication is reduced as fewer people need to be involved. Less communication means shorter MTTR.
  • Developers can work efficiently as they don’t need to figure out what to do. They can follow playbooks and standard procedures.
  • There is no need to manually review metrics and dashboards. All the tools can alert automatically which reduces MTTD.
  • Fewer teams are involved in the troubleshooting and developers can fix the issues in a self-serve manner which frees up the capacity of people.
  • The number of critical issues is reduced as issues are detected earlier.
  • Newcomers can onboard much faster as the processes are written down and standardized.
  • There is no loss of tribal knowledge when people move to different organizations.

How Metis Helps with Good Processes

Metis can analyze performance in developers’ environments before any changes are deployed to production:

These statistics can show traffic (1), a list of aspects of database interaction (2), various insights (3) with details (4), explanations (5), and remediation (6).

Metis can analyze schema migrations and performance as part of the CI/CD process:

By extending the regular CI/CD flow with Metis, developers don’t need to check various aspects manually. They can get automated comments and checks right in their CI/CD pipelines which they use today.

Metis can automatically detect anomalies and notify developers over various communication channels. Metis can be configured per project, per team, or per organization. This way developers can get dashboards and tools for the projects they work on and not be swamped with notifications they are not interested in.

Owning Databases with Mindset

Finally, we need to change our mindset. We need to challenge the typical understanding that developers don’t need to understand databases. Just like we shifted ownership of deployments and infrastructure configuration, we need to move the database maintenance to the people that are directly involved.

This is a tremendous challenge in the industry. Every change is challenging as people need to adapt to the new reality. It may also sound counterintuitive at first to put more responsibility on the developers who are already overloaded with their work. However, it’s not about making them do more work. It’s about replacing their existing work (communication, coordination, debugging, troubleshooting, manual load tests or time-consuming fixes) with automated work they can do much faster and with fewer parties involved.

True owners always look for improvements and optimizations. If developers don’t own databases, they can’t optimize and improve the maintenance. They need to be given the responsibility so they can shine and show their best.

Try these exercises to make a shift of mindset: Focus on hands-on database projects, learn query optimization, and practice data modeling. Simulate real-world scenarios, master backup/recovery techniques, and engage with DBA communities. Emphasize a mindset of security and performance, and continuously adapt by reflecting on new challenges.

What Mindset Brings

Benefits of the right mindset:

  • People care about what they own and look for improvements. These may be small things that we can’t track directly with KPIs but they can still affect work satisfaction or decrease the number of issues.
  • MTTR is reduced as maintainers are much closer to the systems.
  • We leave no broken windows anymore.

How Metis Helps with Mindset

Metis promotes the attitude of owning everything end-to-end. Developers can configure their databases and observability tools as they need so they are convenient to use. Developers don’t need to ask for permission anymore. They have everything they need and can move much faster. This promotes remote work and asynchronous environments that don’t waste time on synchronization and communication.

Metis considers databases true first-class citizens. The same way we care about our code quality or test logic automatically, we need to care about our databases and make sure they run smoothly. This is what Metis does with database guardrails.

What to Do Next

Build your database guardrails with Metis. Challenge your team’s organization and processes. Look for new solutions that can bring the maintenance to the teams that have an impact. Share responsibility and ownership with developers and empower them to change their mindset. Metis lets you do all of that but it all starts with you.
