Observability is a crucial pillar of any application, and monitoring is an essential part of it. A well-suited, robust monitoring system helps you detect issues in your application and gives you insights once it’s deployed. It aids in performance and resource management. Most importantly, it can help you save costs by identifying issues in your infrastructure. One of the most popular monitoring tools is Prometheus.
It has set a de facto standard with its simple and powerful query language, PromQL, but it has limitations that make it unsuitable for long-term monitoring. Querying historical metrics in Prometheus is challenging because it isn’t designed for this purpose. Obtaining a global view of metrics in Prometheus can be complex. While Prometheus scales horizontally with ease at a small scale, it faces challenges when dealing with hundreds of clusters. In such scenarios, Prometheus requires significant disk space to store metrics, typically retaining data for around 15 days. For instance, generating 1TB of metrics per week can lead to increased costs when scaling horizontally, especially with the Horizontal Pod Autoscaler (HPA). Additionally, querying data beyond 15 days without downsampling further escalates these costs.
There are many projects that address this, such as Thanos, M3, Cortex, and VictoriaMetrics, and Thanos is the most popular among them. Thanos addresses these issues with Prometheus and is well suited to scaling Prometheus in environments with extensive metrics or multiple clusters where we require a global view of historical metrics. In this blog, we’ll explore the components of Thanos and try to simplify its architecture by building it step by step, starting with the main components. We will also have a demo using k6-metrics. Before diving into Thanos, I recommend reading “Monitoring with Prometheus” if you are not already familiar with Prometheus.
Thanos
Started in November 2017, Thanos is an open-source CNCF incubating project with over 12.8k stars on GitHub. Built on top of Prometheus, Thanos aims to provide a highly available Prometheus setup with long-term storage support and a global view of metrics. Companies like Disney, Adobe, eBay, SoundCloud, and ByteDance use Thanos for monitoring at scale. However, setting up Thanos can be complex and requires expertise with Prometheus and industry experience.
Now, let’s delve into the components of Thanos and understand its full architecture.
Thanos Components and Architecture
Thanos Question/Querier
Thanos Query serves as the backend for Thanos, using the gRPC StoreAPI to retrieve data from various components. It is completely stateless and horizontally scalable, allowing it to query multiple sources and merge the results into one, effectively avoiding duplicate metrics. With Thanos Query, data can be fetched from various sources. Below is an example of retrieving data from a Thanos Sidecar.
Figure: Thanos Query
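To make the merge-and-deduplicate idea concrete, here is a minimal in-memory sketch. It is not Thanos’s actual implementation, which works on StoreAPI streams and uses a more involved algorithm. The function names, the `replica` label name, and the sample data are all assumptions made for illustration.

```javascript
// Hypothetical sketch: series from two sources differ only by a replica label.
// Build a stable identity for a series, ignoring the replica label.
function seriesKey(labels, replicaLabel) {
  return Object.keys(labels)
    .filter((name) => name !== replicaLabel)
    .sort()
    .map((name) => `${name}=${labels[name]}`)
    .join(',');
}

// Merge series from several sources into one result per identity.
function mergeAndDeduplicate(sources, replicaLabel) {
  const merged = new Map();
  for (const source of sources) {
    for (const series of source) {
      const key = seriesKey(series.labels, replicaLabel);
      if (!merged.has(key)) {
        const labels = { ...series.labels };
        delete labels[replicaLabel];
        merged.set(key, { labels, samples: new Map() });
      }
      const target = merged.get(key).samples;
      for (const [timestamp, value] of series.samples) {
        // Simplification: the first source to report a timestamp wins.
        if (!target.has(timestamp)) target.set(timestamp, value);
      }
    }
  }
  return [...merged.values()].map(({ labels, samples }) => ({
    labels,
    samples: [...samples.entries()].sort((a, b) => a[0] - b[0]),
  }));
}

// Two replicas scraping the same target, with overlapping samples.
const replicaA = [
  { labels: { __name__: 'up', job: 'api', replica: 'a' }, samples: [[1000, 1], [2000, 1]] },
];
const replicaB = [
  { labels: { __name__: 'up', job: 'api', replica: 'b' }, samples: [[2000, 1], [3000, 1]] },
];

const result = mergeAndDeduplicate([replicaA, replicaB], 'replica');
console.log(JSON.stringify(result));
```

The two input series collapse into a single series without the replica label, which is the effect a user sees when querying highly available Prometheus pairs through Thanos Query.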
Prometheus is unaware of the StoreAPI, so Thanos Query requests metrics from the Thanos Sidecar. This way, Thanos Query indirectly communicates with the Prometheus instance in a sidecar architecture. While it’s possible to deploy Thanos Query without a sidecar model, let’s first explore the benefits and functionality of the sidecar model.
Thanos Sidecar
The Thanos Sidecar can do more than just retrieve metrics from Prometheus. It can also store these metrics in an Object Store. Thanos Query can then use the Store Gateway component to fetch data directly from the Object Store, eliminating the need to request metrics from the Sidecar. This allows for reduced retention in Prometheus, resulting in lower disk space usage and cost savings. The Sidecar sends TSDB block data from Prometheus to the Object Store every two hours by default, which reduces Prometheus’s resource consumption.
To avoid data loss during the two-hour window, Prometheus should remain stateful. However, to make Prometheus stateless, Thanos offers a component called Thanos Receiver. Using Thanos Receiver, we can eliminate the sidecar model. Before delving into the Receiver, let’s explore the functionality of Thanos Store Gateway.
Thanos Store Gateway
Thanos Store Gateway implements the Store API, enabling Thanos Query to retrieve data from the remote Object Store. Acting as an API gateway between the Object Store and Thanos Query, the Thanos Store facilitates efficient data access. The Thanos Sidecar can push data directly to this Object Store. The Store Gateway keeps some data from the Object Store on its local disk and keeps it in sync with the Object Store. Check out the illustration below.
Figure: Thanos Store
Using an Object Store eliminates the need to store large amounts of data on disk, helping us save on costs. Whenever we need any data, we can query it using Thanos Query. Thanos Query includes a dashboard component named Thanos Query Frontend, similar to that of Prometheus, where users can enter a PromQL query. Thanos Query then uses the gRPC Store API to retrieve the data through the Thanos Store.
Thanos Compactor
While we can store practically unlimited amounts of data in an Object Store, long-term storage can become costly. Downsampling our data helps mitigate this issue. When we downsample a block of data, we increase the time interval between its data points, for example, from one-minute resolution to five-minute resolution. Fewer data points per query improve PromQL performance over long time ranges and, combined with shorter retention for raw data, help keep storage costs down.
Figure: Thanos Compactor
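As a rough illustration of the downsampling described above, the sketch below groups one-minute samples into five-minute windows and keeps a few aggregates per window. The function name, the sample layout, and the choice of aggregates (count, sum, min, max) are assumptions for this example, not the Compactor’s real block format.

```javascript
// Hypothetical sketch: aggregate [timestampMs, value] samples into fixed windows.
function downsample(samples, windowMs) {
  const buckets = new Map();
  for (const [timestamp, value] of samples) {
    // Align each sample to the start of its window.
    const start = Math.floor(timestamp / windowMs) * windowMs;
    const bucket = buckets.get(start);
    if (bucket === undefined) {
      buckets.set(start, { start, count: 1, sum: value, min: value, max: value });
    } else {
      bucket.count += 1;
      bucket.sum += value;
      bucket.min = Math.min(bucket.min, value);
      bucket.max = Math.max(bucket.max, value);
    }
  }
  return [...buckets.values()].sort((a, b) => a.start - b.start);
}

const MINUTE = 60 * 1000;
// Ten one-minute samples with values 0 through 9.
const raw = Array.from({ length: 10 }, (_, i) => [i * MINUTE, i]);
const fiveMinute = downsample(raw, 5 * MINUTE);
console.log(fiveMinute);
```

Ten raw points become two aggregated points here. Keeping sum and count lets a query still compute an average over the window, and min and max preserve spikes that a plain average would hide.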
The Compactor is the only component in Thanos that can delete data from the Object Store, while the other components only read from or write to it. The Compactor consolidates multiple blocks of data into one, optimizing storage efficiency. It is best practice to run only one instance of the Compactor against an Object Store.
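The consolidation step can be pictured with a small sketch: several two-hour blocks are merged into one block that spans their combined time range. The block shape (`id`, `minTime`, `maxTime`, `samples`) and the function name are hypothetical. Real TSDB blocks also carry an index and chunk files, which are left out here.

```javascript
// Hypothetical sketch: merge several blocks into one covering their full range.
function compactBlocks(blocks) {
  if (blocks.length === 0) {
    throw new Error('no blocks to compact');
  }
  // Order blocks by start time so the result is deterministic.
  const ordered = [...blocks].sort((a, b) => a.minTime - b.minTime);
  return {
    minTime: ordered[0].minTime,
    maxTime: Math.max(...ordered.map((block) => block.maxTime)),
    // Remember which source blocks were merged, so they can be deleted later.
    sources: ordered.map((block) => block.id),
    samples: ordered
      .flatMap((block) => block.samples)
      .sort((a, b) => a[0] - b[0]),
  };
}

const HOUR = 60 * 60 * 1000;
const twoHourBlocks = [
  { id: 'block-b', minTime: 2 * HOUR, maxTime: 4 * HOUR, samples: [[2 * HOUR, 7]] },
  { id: 'block-a', minTime: 0, maxTime: 2 * HOUR, samples: [[0, 5]] },
  { id: 'block-c', minTime: 4 * HOUR, maxTime: 6 * HOUR, samples: [[4 * HOUR, 9]] },
];

const compacted = compactBlocks(twoHourBlocks);
console.log(compacted.sources, compacted.minTime, compacted.maxTime);
```

Because the merged block replaces its sources, only one process should perform this step at a time, which is one way to see why a single Compactor instance per Object Store is recommended.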
Thanos Ruler
Thanos Ruler evaluates Prometheus recording and alerting rules against the query endpoint passed to it and can be used for alerting. By default, the results evaluated by Thanos Ruler are written to local disk. Thanos Ruler can also be configured to store these results in a remote Object Store.
Figure: Thanos Ruler
Thanos Receiver
Using Thanos Receiver avoids the complexities associated with the Thanos Sidecar. When using the sidecar, permissions must be granted for the sidecar components to push metrics to the object store, which involves opening a new port for communication with the store. Thanos Receiver eliminates this complexity.
With Thanos Receiver, Prometheus is configured to use its remote write feature to send metrics directly to the receiver. The Thanos Receiver then pushes these metrics to the object store. The diagram below illustrates this setup. Prometheus continuously writes metrics to the Thanos Receiver, which, by default, pushes these metrics to the object store after two hours. To allow querying metrics in real time, the Thanos Receiver exposes a Store API for Thanos Query, which is useful for developers who want to see live metrics after a deployment.
Figure: Thanos Receiver
Thanos Receiver needs to determine how to distribute incoming time-series data across different nodes. To handle this, Thanos Receiver employs a hashring mechanism. When Thanos Receiver runs on Kubernetes, it relies on the Thanos Receiver controller, which automates hashring management. This component keeps the hashring up to date when the Thanos Receiver is scaled using HPA or other autoscalers.
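The core idea of a hashring can be sketched in a few lines: hash a series’ labels and use the hash to pick a node, so the same series always lands on the same receiver. This is a simplified hash-modulo sketch, not Thanos’s implementation. The endpoint names, the FNV-1a hash, and the function names are assumptions chosen for illustration.

```javascript
// 32-bit FNV-1a hash of a string (sufficient for ASCII label sets in this sketch).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// Serialize labels in sorted order so key order does not change the hash.
function seriesKey(labels) {
  return Object.keys(labels)
    .sort()
    .map((name) => `${name}=${labels[name]}`)
    .join(',');
}

// Pick the receiver responsible for a series (hypothetical helper).
function pickReceiver(labels, endpoints) {
  return endpoints[fnv1a(seriesKey(labels)) % endpoints.length];
}

// Hypothetical receiver endpoints.
const endpoints = [
  'thanos-receive-0:10901',
  'thanos-receive-1:10901',
  'thanos-receive-2:10901',
];

// Distribute 100 series, shaped like the ones produced by the load test later on.
const counts = new Map(endpoints.map((endpoint) => [endpoint, 0]));
for (let vu = 1; vu <= 100; vu++) {
  const target = pickReceiver({ __name__: `test_metric_${vu}`, service: 'bar' }, endpoints);
  counts.set(target, counts.get(target) + 1);
}
console.log(Object.fromEntries(counts));
```

With plain modulo hashing, changing the number of endpoints reassigns most series, which is why the hashring has to be kept up to date when receivers scale. Consistent-hashing schemes reduce how many series move on such a change.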
Thanos Question Frontend
The Thanos Query Frontend is a dashboard provided by Thanos that is similar to the Prometheus dashboard. It also uses PromQL as its query language. With this component, users can request metrics from the Thanos Query component.
Installation and Demo
In this demo, we’ll test Thanos and scale the Thanos Receiver using k6s-metrics.
- Installing Minio for object storage
- Installing Thanos and Prometheus
- Load test using k6s-metrics
Let’s start by creating a kind cluster.
kind create cluster --name my-cluster --config=
Installing Minio (Object Store)
Minio is a popular open-source object store and an alternative to AWS S3; we are using it here for our local setup. If you have S3 or similar storage, you can use it instead.
- Run the script below to install Minio in the thanos-test namespace.
#!/bin/bash
set -e
kubectl create ns thanos-test
echo "Installing Minio using Helm charts..."
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install minio bitnami/minio --version 14.2.0 -n thanos-test
sleep 40
echo "Exposing Minio on 127.0.0.1:8080"
echo "Username for Minio: admin"
echo "Password for Minio: $(kubectl get secrets -n thanos-test minio -o json | jq -r '.data."root-password"' | base64 -d)"
kubectl port-forward svc/minio 8080:9001 -n thanos-test &
echo
- Access the Minio dashboard at port 8080 and create a new bucket named “thanos”. Also, create an access key and secret. Once done, create a Kubernetes secret as shown below and fill in the access key and secret key fields.
apiVersion: v1
kind: Secret
metadata:
  name: minio-thanos
  namespace: thanos-test
stringData:
  objstore.yml: |
    type: S3
    config:
      bucket: "thanos"
      endpoint: "minio.thanos-test.svc.cluster.local:9000"
      insecure: true
      # Fill in the access key and secret created in the Minio dashboard.
      access_key:
      secret_key:
Installing Thanos and Prometheus
Please execute the following script to install Thanos and Prometheus.
#!/bin/bash
echo "Installing Thanos in $(kubectl config current-context)"
helm repo add bitnami https://charts.bitnami.com/bitnami
helm install thanos bitnami/thanos --version 15.1.0 -n thanos-test
sleep 60
echo "thanos is installed"
kubectl get all -n thanos-test
echo "Exposing thanos on 127.0.0.1:8081"
kubectl port-forward svc/thanos-query-frontend -n thanos-test 8081:9090 &
echo "Exposing grafana on 127.0.0.1:8082"
kubectl port-forward svc/grafana -n thanos-test 8082:3000 &
echo "Password for grafana: $(kubectl get secrets -n thanos-test grafana-admin -o json | jq -r '.data."GF_SECURITY_ADMIN_PASSWORD"' | base64 -d)"
echo "Username for grafana: admin"
echo "Installing kube-prometheus-stack for monitoring"
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts
helm install kube-prometheus-stack prometheus-community/kube-prometheus-stack --version 58.2.1 -n thanos-test -f kube-prometheus-stack-values.yaml
sleep 60
echo "Prometheus installed connect with grafana at port 8082"
Testing Using k6s-metrics
Use the script below to test Thanos. You can change the number of virtual users and other fields.
import { check, sleep } from 'k6';
import remote from 'k6/x/remotewrite';

// 100 virtual users writing metrics for 800 seconds.
export let options = {
  vus: 100,
  duration: '800s',
};

// Remote write client pointing at the Thanos Receiver endpoint.
const client = new remote.Client({
  url: 'http://127.0.0.1:8085/api/v1/receive',
});

export default function () {
  // Each virtual user writes one sample to its own metric.
  let res = client.store([
    {
      labels: [
        { name: '__name__', value: `test_metric_${__VU}` },
        { name: 'service', value: 'bar' },
      ],
      samples: [{ value: Math.random() * 100 }],
    },
  ]);
  check(res, {
    'is status 200': (r) => r.status === 200,
  });
  sleep(1);
}
You can use Grafana to visualize the Thanos Receiver’s resource consumption. More Grafana dashboards are available here.
Conclusion
Some of the benefits of using Thanos are:
- Long-term metrics storage
- Cost savings by using an Object Store
- Efficient queries with a global view
- Highly available Prometheus instances
- Data deduplication
Integrating Thanos into your monitoring setup can enhance your application by providing access to historical data and overcoming the limitations of a standalone Prometheus setup. Additionally, Thanos can help reduce the costs associated with Prometheus. However, Thanos may not be the ideal solution for everyone. Determine what works best for your infrastructure.