Prometheus High Availability and Fault Tolerance strategy, long term storage with VictoriaMetrics

Prometheus itself, and the development team behind it, is focused on scraping metrics. It’s a particularly good solution for short-term retention of metrics. Long-term retention is another…


We have Prometheus and its ecosystem configured for High Availability (HA) and Fault Tolerance (FT). We run multiple groups of Prometheus instances, each focused on its own part of the infrastructure, and each group is relatively small.

Cool, but we are keeping the data for only, let’s say, 10 days. That’s probably the most important period to query, but of course it’s not enough. What about long-term storage for metrics?

This is where solutions like Cortex, Thanos, M3DB, VictoriaMetrics, and others come in. They can collect the metrics from different Prometheus instances, deduplicate them (you’ll have a lot of duplicates: remember, every Prometheus instance is duplicated, so every metric is collected twice), and provide a single point of storage for all the metrics you are collecting.
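To make the deduplication idea concrete, here is a minimal sketch in Python. It assumes both Prometheus replicas report the same series with slightly different timestamps, and it keeps one sample per time bucket. The function name `deduplicate`, the `Sample` type, and the 15-second interval are illustrative assumptions, not the actual implementation of any of the tools above.

```python
from typing import Iterable

# Hypothetical sample type for this sketch: (timestamp in milliseconds, value).
Sample = tuple[int, float]


def deduplicate(samples: Iterable[Sample], interval_ms: int) -> list[Sample]:
    """Keep one sample per interval_ms bucket: the one with the latest timestamp."""
    latest: dict[int, Sample] = {}
    # Sorting by timestamp means later samples overwrite earlier ones in a bucket.
    for ts, value in sorted(samples):
        bucket = ts // interval_ms
        latest[bucket] = (ts, value)
    return [latest[bucket] for bucket in sorted(latest)]


# Two replicas scraping the same target every 15s, 200ms apart (made-up data).
replica_a = [(1_000, 1.0), (16_000, 2.0), (31_000, 3.0)]
replica_b = [(1_200, 1.0), (16_200, 2.0), (31_200, 3.0)]

merged = deduplicate(replica_a + replica_b, interval_ms=15_000)
print(merged)
```

Six incoming samples collapse into three, one per scrape interval, which is roughly the effect you want from the storage layer when every Prometheus instance is doubled.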

Cortex, Thanos, and M3DB are great tools, definitely capable of providing long-term storage for metrics while being HA and FT themselves, but we chose the newcomer, VictoriaMetrics. This article will not compare all those tools, but I am going to describe why we chose VictoriaMetrics.

VictoriaMetrics is available in two configurations. The first is an all-in-one solution: easier to configure, with all the components together. It’s a good and stable solution that can also scale, but only vertically, so it may be the right choice depending on your needs. The second is the cluster solution, with separate components, so you can scale each component both vertically and horizontally.

We like complex things (that’s definitely not true) so we decided to use the cluster solution.

The cluster version of VictoriaMetrics is composed of three main components: “vmstorage” (responsible for storing the data), “vminsert” (responsible for writing the data into the storage), and “vmselect” (responsible for querying the data from the storage). The tool is very flexible, and vminsert and vmselect act as a sort of proxy in front of the storage.
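As a rough illustration of how clients address the two proxy components, the sketch below builds the write and query URLs in Python. The ports (8480 for vminsert, 8481 for vmselect), the URL layout, and the account ID of 0 follow the VictoriaMetrics cluster documentation as I understand it; treat them as assumptions and check them against the version you run. The host names are hypothetical.

```python
def vminsert_write_url(host: str, account_id: int = 0, port: int = 8480) -> str:
    """URL a Prometheus remote_write would target (assumed default port and path)."""
    return f"http://{host}:{port}/insert/{account_id}/prometheus/api/v1/write"


def vmselect_query_url(host: str, account_id: int = 0, port: int = 8481) -> str:
    """URL a dashboard or client would query (assumed default port and path)."""
    return f"http://{host}:{port}/select/{account_id}/prometheus/api/v1/query"


# Hypothetical load balancer names sitting in front of each component.
write_url = vminsert_write_url("vminsert-lb.example.internal")
query_url = vmselect_query_url("vmselect-lb.example.internal")
print(write_url)
print(query_url)
```

Writers only ever talk to vminsert and readers only to vmselect, so neither side needs to know how many vmstorage nodes exist behind them.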

Vminsert, as said, is responsible for inserting the data into vmstorage. There are many options that you can configure, but for the scope of this article, it’s important to know that you can easily replicate vminsert across an arbitrary number of instances and put a load balancer in front of them as a single entry point for incoming data. Vminsert is stateless, so it’s also easy to manage,…

luca carboni
