“Don’t have duplication of effort across your monitoring stack.”


JAXenter: What are the top challenges associated with managing IT infrastructure in mixed (cloud, multi-cloud and hybrid) environments?

Ciaran Byrne: The biggest challenge is dealing with the complexity. It’s not just a matter of cloud versus on-premises: you have networks, servers, storage, virtual environments, containers, and applications that you have to discover and collect metrics on, and those are running in both cloud and on-premises environments. In most cases, you’ll be managing these mixed environments with multiple monitoring tools, which leads to tool sprawl, and you’ll have to make sense of large volumes of data coming from that diverse toolset. Mixed environments will also likely have interdependencies that make issues harder to detect and troubleshoot. Troubleshooting is further complicated because each environment has its own nuances for investigating and resolving issues, so operators and admins need a broad range of skills.

Once you’ve “solved” the problem of monitoring these hybrid environments, you have to understand which parts of this hybrid infrastructure are supporting which application services. Then you have to respond effectively when a problem is detected: triage and manage the incident, consolidate alerts that relate to the same event, and route the incident for remediation. If you’re doing all of this manually, it’s a long and cumbersome process that takes up too much of your IT ops team’s time, so you need to automate as much of it as possible.
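As a rough illustration of what automating the consolidate-and-route step could look like, here is a minimal Python sketch. It is not taken from any particular product: the `Alert` and `Incident` types, the five-minute window, the rule that alerts on the same resource and check belong to the same event, and the team names are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Alert:
    source: str       # monitoring tool that raised the alert (hypothetical names)
    resource: str     # affected host or service
    check: str        # what failed, e.g. "disk_full"
    timestamp: float  # seconds since some reference point


@dataclass
class Incident:
    resource: str
    check: str
    first_seen: float
    last_seen: float
    alerts: list = field(default_factory=list)


def consolidate(alerts, window=300.0):
    """Group alerts for the same resource and check into one incident
    when they arrive within `window` seconds of the previous alert."""
    incidents = []
    open_by_key = {}
    for alert in sorted(alerts, key=lambda a: a.timestamp):
        key = (alert.resource, alert.check)
        incident = open_by_key.get(key)
        # Start a new incident if none is open or the last alert is too old.
        if incident is None or alert.timestamp - incident.last_seen > window:
            incident = Incident(alert.resource, alert.check,
                                alert.timestamp, alert.timestamp)
            incidents.append(incident)
            open_by_key[key] = incident
        incident.alerts.append(alert)
        incident.last_seen = alert.timestamp
    return incidents


def route(incident, routing_table, default="ops-oncall"):
    """Pick a team for the incident based on the failed check (assumed rule)."""
    return routing_table.get(incident.check, default)


alerts = [
    Alert("cloudwatch", "db-1", "disk_full", 0.0),
    Alert("nagios", "db-1", "disk_full", 60.0),
    Alert("nagios", "web-1", "http_5xx", 100.0),
    Alert("cloudwatch", "db-1", "disk_full", 1000.0),
]
incidents = consolidate(alerts)
```

Here the two `disk_full` alerts that two different tools raise within a minute of each other end up in a single incident, while the one that arrives much later opens a new incident. A real system would likely correlate on richer signals than an exact match on resource and check.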


JAXenter: What does an IT Ops pro need to know about monitoring the major cloud platforms?

Ciaran Byrne: If you’re talking about the three major public cloud platforms (AWS, Azure, and Google Cloud Platform), they all have their own sets of services that have to be monitored. Each one has more than 50 different services across compute, storage, network, database, containers, security, IoT, etc., so there’s no one-size-fits-all approach that you can take. That said, these services are fairly similar, so a user familiar with one should be able to quickly learn another. They all have their own monitoring tools, like Amazon CloudWatch, Azure Monitor, and Google Cloud Operations Suite, that provide data that you need to aggregate and integrate. You’ll want to use an agent-based approach for some services and an API-based approach for others. You’ll also want to use a query language that’s designed for these environments, like PromQL, rather than the one you’d use for your internal environment, like SQL.

JAXenter: What are some best practices associated with managing mixed environments?

Ciaran Byrne: For starters, eliminate the swivel chair, where operators turn from one monitoring console to another. Even if you take a best-of-breed approach to monitoring, you’ll want to bring all of those metrics (server, storage, virtualized environments, network, cloud, containers, applications) into one place for aggregation and integration.

These environments are dynamic, so you’ll want to use ML/AI techniques to manage them; that way your management system is continually learning and updating. This makes incident management, event correlation, and alert consolidation much easier.

Don’t have duplication of effort across your monitoring stack. Your APM tools are great at application metrics, but they probably aren’t telling you anything about your infrastructure that you don’t already get from your ITIM (IT infrastructure monitoring) stack.

Deploy service-aware topology maps to understand the dependencies between your business services and IT services. Once you understand the business service dependencies, you can then get a better handle on how much your cloud services and other IT services are costing you and which services can be shut off or retired without impacting your business.
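To make the idea of a service-aware topology map concrete, the following Python sketch models dependencies as a simple graph and answers two questions: which business services are affected by a given component, and which components no business service depends on. The service names and the `DEPENDS_ON` structure are hypothetical, and a real topology map would normally be built by automated discovery rather than written by hand.

```python
from collections import deque

# Hypothetical topology: each key depends on the services listed for it.
DEPENDS_ON = {
    "checkout": ["payments-api", "web-frontend"],
    "reporting": ["warehouse-db"],
    "payments-api": ["orders-db"],
    "web-frontend": ["cdn"],
    "legacy-batch": ["old-vm"],
}
BUSINESS_SERVICES = {"checkout", "reporting"}


def reachable(graph, start):
    """Return everything `start` depends on, directly or indirectly."""
    seen = set()
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for dep in graph.get(node, ()):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


def impacted_business_services(graph, business_services, component):
    """Business services that would be affected if `component` failed."""
    return {s for s in business_services if component in reachable(graph, s)}


def retirable(graph, business_services):
    """Components that no business service depends on."""
    all_nodes = set(graph) | {d for deps in graph.values() for d in deps}
    needed = set(business_services)
    for service in business_services:
        needed |= reachable(graph, service)
    return all_nodes - needed


impacted = impacted_business_services(DEPENDS_ON, BUSINESS_SERVICES, "orders-db")
candidates = retirable(DEPENDS_ON, BUSINESS_SERVICES)
```

In this toy topology, a failure of `orders-db` affects only `checkout`, and `legacy-batch` with its `old-vm` show up as candidates for retirement because neither business service reaches them.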

The post “Don’t have duplication of effort across your monitoring stack.” appeared first on JAXenter.
