If you’ve worked in the operations space for the last 5+ years, you’ve likely heard of or have started using Prometheus. The proliferation of Prometheus for time series metrics formatting, querying, and storage across the open source world and enterprise IT has been shockingly fast, especially with teams using Kubernetes platforms like Google Kubernetes Engine (GKE). We launched Google Cloud Managed Service for Prometheus last year, which has helped organizations solve their scaling issues when it comes to managing Prometheus storage and queries.
There’s a lot to love about the extensive ecosystem of Prometheus exporters and integrations to monitor your application workloads and visualization tools like Grafana, but we can sometimes hit challenges when trying to leverage these tools beyond Kubernetes-based environments.
Crossing the chasm to the rest of your environment
What if you’re looking to unify your metrics across Kubernetes clusters and services running in VMs? Kubernetes makes it easy for Prometheus to auto-discover services and immediately start ingesting metrics, but today there is no common pattern for discovering VM instances.
We’ve seen a few customers try to solve this and hit issues like:
Building in-house dynamic discovery systems is hard
We’ve seen customers build their own API discovery systems against the Google Compute APIs, their Configuration Management Databases, or other systems they prefer as sources of truth. This can work, but it requires you to maintain this system in perpetuity, and it usually requires building an event-driven architecture so that updates arrive on a realistic timeline.
Managing their own daemonized Prometheus binaries
Maybe you love systemd on Linux. Maybe not so much. Either way, it’s certainly possible to build a Prometheus binary, daemonize it, and update its configuration to match your expected behavior and also scrape your local service for Prometheus metrics. This can work for many, but if your team is trying to avoid adding technical debt, as most are, you still have to track and maintain the Prometheus work. Maybe that even means rolling your own RPM to maintain this and managing the SLAs for this daemonized version.
There can be a lot of pitfalls and challenges with extending Prometheus over to the VM world, even though the benefits of a unified metric format and query syntax like PromQL are clear.
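To make the first of those challenges concrete, here is a minimal sketch of the kind of glue code an in-house discovery system ends up owning. It turns a list of VM inventory records into target groups in the JSON shape that Prometheus file-based service discovery reads. The record fields (`name`, `zone`, `internal_ip`, `status`), the sample data, and the function names are hypothetical stand-ins for whatever your source of truth returns, and port 9100 is assumed only because it is the default for the node exporter.

```python
import json

# Hypothetical inventory records. A real system would fetch these from the
# Compute API, a CMDB, or another source of truth.
INSTANCES = [
    {"name": "web-1", "zone": "us-central1-a", "internal_ip": "10.0.0.2", "status": "RUNNING"},
    {"name": "web-2", "zone": "us-central1-b", "internal_ip": "10.0.0.3", "status": "TERMINATED"},
    {"name": "db-1", "zone": "us-central1-a", "internal_ip": "10.0.0.4", "status": "RUNNING"},
]


def build_target_groups(instances, port=9100):
    """Group running instances by zone into file_sd-style target groups."""
    by_zone = {}
    for inst in instances:
        if inst["status"] != "RUNNING":
            continue  # skip stopped VMs so they are not listed as scrape targets
        by_zone.setdefault(inst["zone"], []).append(f'{inst["internal_ip"]}:{port}')
    # Sort zones and targets so the output is stable between runs.
    return [
        {"targets": sorted(targets), "labels": {"zone": zone}}
        for zone, targets in sorted(by_zone.items())
    ]


def render_file_sd(instances, port=9100):
    """Serialize the target groups as the JSON document Prometheus would read."""
    return json.dumps(build_target_groups(instances, port), indent=2)


if __name__ == "__main__":
    print(render_file_sd(INSTANCES))
```

Prometheus picks up changes to a file like this on its own, but something still has to regenerate the file every time a VM is created, stopped, or deleted. That regeneration loop is the part you would be maintaining in perpetuity.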
Making it simpler on Google Cloud
To make standardizing on Prometheus easier for you, we’re pleased to introduce support for Prometheus metrics in the Cloud Ops Agent, our agent for collecting logs and metrics from Google Compute Engine instances.
The Ops Agent was launched in 2021 and is based on the OpenTelemetry project for metrics collection, providing a great deal of flexibility from the community. That flexibility includes the ability to ingest Prometheus metrics, retain their shape, and upload them to Google Cloud Monitoring while maintaining the Prometheus metric structure.
This means that starting today you can deploy the Ops Agent and configure it to scrape Prometheus metrics.
Here’s a quick walkthrough of what that looks like: