Avoiding Telemetry Data Loss With Fluent Bit

Are you ready to get started with cloud-native observability using telemetry pipelines?

This article is part of a series exploring a workshop that guides you through the open source project Fluent Bit: what it is, a basic installation, and setting up your first telemetry pipeline project. Learn how to manage your cloud-native data from source to destination using the telemetry pipeline phases covering collection, aggregation, transformation, and forwarding from any source to any destination.

In the previous article in this series, we explored what backpressure is, how it manifests in telemetry pipelines, and took the first steps to mitigate it with Fluent Bit. In this article, we look at how to enable Fluent Bit features that help avoid the telemetry data loss we saw in the previous article.

You can find more details in the accompanying workshop lab.

Before we get started, it's important to review the phases of a telemetry pipeline. In the diagram below, we see them laid out again. Each incoming event goes from input to parser to filter to buffer to routing before it is sent to its final output destination(s).

For clarity in this article, we'll split the configuration into files that are imported into a main Fluent Bit configuration file we'll name workshop-fb.conf.

Tackling Data Loss

Previously, we explored how input plugins can hit their ingestion limits when our telemetry pipelines scale beyond memory limits while using the default in-memory buffering of our events. We also saw that we can limit the size of our input plugin buffers to prevent our pipeline from failing on out-of-memory errors, but that pausing ingestion can also lead to data loss if clearing the input buffers takes too long.

To rectify this problem, we'll explore another buffering solution that Fluent Bit offers, ensuring data and memory safety at scale by configuring filesystem buffering.

To that end, let's explore how the Fluent Bit engine processes the data that input plugins emit. When an input plugin emits events, the engine groups them into a chunk. The chunk size is around 2MB. The default is for the engine to place this chunk only in memory.

We saw that limiting the in-memory buffer size did not solve the problem, so we're looking at modifying this default behavior of only placing chunks into memory. This is achieved by changing the property storage.type from the default memory to filesystem.

It's important to understand that memory and filesystem buffering mechanisms are not mutually exclusive. By enabling filesystem buffering for our input plugin, we automatically get both performance and data safety.

Filesystem Buffering Tips

When changing our buffering from memory to filesystem with the property storage.type filesystem, the settings for mem_buf_limit are ignored.

Instead, we need to use the property storage.max_chunks_up to control the size of our memory buffer. Perhaps surprisingly, with the default settings the property storage.pause_on_chunks_overlimit is set to off, so input plugins do not pause. Instead, input plugins switch to buffering only in the filesystem. We can control the amount of disk space used with storage.total_limit_size.

If the property storage.pause_on_chunks_overlimit is set to on, then the filesystem buffering mechanism behaves just like the mem_buf_limit scenario demonstrated previously.
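As a sketch of how these properties fit together (the values below are illustrative assumptions rather than the workshop settings we use later), storage.max_chunks_up lives in the SERVICE section while storage.type and the pause behavior are set per input plugin:

# Illustrative sketch only; the property names are real, but the values are example assumptions.
[SERVICE]
  storage.path          /tmp/fluentbit-storage
  # With chunks of roughly 2MB each, 64 chunks allows around 128MB of in-memory buffering.
  storage.max_chunks_up 64

[INPUT]
  Name         dummy
  Tag          big.data
  storage.type filesystem
  # Default is off: when the in-memory chunk limit is hit, keep ingesting and buffer only to the filesystem.
  storage.pause_on_chunks_overlimit off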

Configuring a Stressed Telemetry Pipeline

In this example, we're going to use the same stressed Fluent Bit pipeline to simulate the need for enabling filesystem buffering. All examples are shown using containers (Podman), and it's assumed you are familiar with container tooling such as Podman or Docker.

We begin the configuration of our telemetry pipeline in the INPUT phase with a simple dummy plugin generating a large amount of entries to flood our pipeline, as follows in our configuration file inputs.conf (note that the mem_buf_limit fix is commented out):

# This entry generates a large amount of success messages for the workshop.
[INPUT]
  Name   dummy
  Tag    big.data
  Copies 15000
  Dummy  {"message":"true 200 success", "big_data": "blah blah blah blah blah blah blah blah blah"}
  #Mem_Buf_Limit 2MB

Now ensure the output configuration file outputs.conf has the following configuration:

# This entry directs all tags (it matches any we encounter)
# to print to standard output, which is our console.
[OUTPUT]
  Name  stdout
  Match *

With our inputs and outputs configured, we can now bring them together in a single main configuration file. Using a file called workshop-fb.conf in our favorite editor, ensure the following configuration is created. For now, we just import the two files:

# Fluent Bit main configuration file.
#
# Imports section.
@INCLUDE inputs.conf
@INCLUDE outputs.conf

Let's now try testing our configuration by running it using a container image. The first thing needed is to ensure a file called Buildfile is created. This is going to be used to build a new container image and insert our configuration files. Note this file needs to be in the same directory as our configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf

Now we'll build a new container image, naming it with a version tag, as follows using the Buildfile and assuming you are in the same directory:

$ podman build -t workshop-fb:v8 -f Buildfile

STEP 1/4: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4
STEP 2/4: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> a379e7611210
STEP 3/4: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> f39b10d3d6d0
STEP 4/4: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COMMIT workshop-fb:v8
--> e74b2f228729
Successfully tagged localhost/workshop-fb:v8
e74b2f22872958a79c0e056efce66a811c93f43da641a2efaa30cacceb94a195

If we run our pipeline in a container configured with constricted memory (in our case, we need to give it around a 6.5MB limit), then we'll see the pipeline run for a bit and then fail due to overloading (OOM):

$ podman run --memory 6.5MB --name fbv8 workshop-fb:v8

The console output shows that the pipeline ran for a bit; in our case, up to event number 862 before it hit the OOM limits of our container environment (6.5MB):

...
[860] big.data: [[1716551898.202389716, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah blah"}]
[861] big.data: [[1716551898.202389925, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah blah"}]
[862] big.data: [[1716551898.202390133, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah blah"}]
[863] big.data: [[1

We can confirm that the stressed telemetry pipeline failed with an OOM status by inspecting our container for an OOM failure, validating that backpressure kicked in:

# Use the container name to inspect the reason it failed
$ podman inspect fbv8 | grep OOM

 "OOMKilled": true,

Having already tried in a previous lab to manage this with mem_buf_limit settings, we've seen that this also is not the real fix. To prevent data loss, we need to enable filesystem buffering so that when the memory buffer overflows, events are buffered in the filesystem until there is memory free to process them.

Using Filesystem Buffering

The configuration of our telemetry pipeline in the INPUT phase needs a slight adjustment: adding storage.type, set to filesystem, to enable it. Note that mem_buf_limit has been removed:

# This entry generates a large amount of success messages for the workshop.
[INPUT]
  Name   dummy
  Tag    big.data
  Copies 15000
  Dummy  {"message":"true 200 success", "big_data": "blah blah blah blah blah blah blah blah blah"}
  storage.type filesystem

We can now bring it all together in the main configuration file. Using the file called workshop-fb.conf in our favorite editor, update the file so that a SERVICE section is added with settings for managing the filesystem buffering:

# Fluent Bit main configuration file.
[SERVICE]
  flush 1
  log_level info
  storage.path /tmp/fluentbit-storage
  storage.sync normal
  storage.checksum off
  storage.max_chunks_up 5

# Imports section
@INCLUDE inputs.conf
@INCLUDE outputs.conf

A few words on the SERVICE section properties may be needed to explain their function:

  1. storage.path – Putting filesystem buffering in the tmp filesystem
  2. storage.sync – Using normal and turning off checksum processing
  3. storage.max_chunks_up – Set to 5 chunks (~2MB each, so ~10MB), the amount of memory allowed for events

Now it's time to test our configuration by running it using a container image. The first thing needed is to ensure a file called Buildfile is created. This is going to be used to build a new container image and insert our configuration files. Note this file needs to be in the same directory as our configuration files; otherwise, adjust the file path names:

FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4

COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
COPY ./inputs.conf /fluent-bit/etc/inputs.conf
COPY ./outputs.conf /fluent-bit/etc/outputs.conf

Now we'll build a new container image, naming it with a version tag, as follows using the Buildfile and assuming you are in the same directory:

$ podman build -t workshop-fb:v9 -f Buildfile

STEP 1/4: FROM cr.fluentbit.io/fluent/fluent-bit:3.0.4
STEP 2/4: COPY ./workshop-fb.conf /fluent-bit/etc/fluent-bit.conf
--> a379e7611210
STEP 3/4: COPY ./inputs.conf /fluent-bit/etc/inputs.conf
--> f39b10d3d6d0
STEP 4/4: COPY ./outputs.conf /fluent-bit/etc/outputs.conf
COMMIT workshop-fb:v9
--> e74b2f228729
Successfully tagged localhost/workshop-fb:v9
e74b2f22872958a79c0e056efce66a811c93f43da641a2efaa30cacceb94a195

If we run our pipeline in a container configured with constricted memory (a slightly larger value due to the memory needed for mounting the filesystem) – in our case, around a 9MB limit – then we'll see the pipeline running without failure:

$ podman run -v ./:/tmp --memory 9MB --name fbv9 workshop-fb:v9

The console output shows that the pipeline runs until we stop it with CTRL-C, with events rolling by as shown below.

...
[14991] big.data: [[1716559655.213181639, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah"}]
[14992] big.data: [[1716559655.213182181, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah"}]
[14993] big.data: [[1716559655.213182681, {}], {"message"=>"true 200 success", "big_data"=>"blah blah blah blah blah blah blah"}]
...

We can now validate the filesystem buffering by looking at the filesystem storage. Check the filesystem from the directory where you started your container. While the pipeline is running with memory restrictions, it will be using the filesystem to store events until the memory is free to process them. If you view the contents of the file before stopping your pipeline, you'll see a messy message format stored inside (cleaned up for you here):

$ ls -l ./fluentbit-storage/dummy.0/1-1716558042.211576161.flb

-rw-------  1 username  groupname   1.4M May 24 15:40 1-1716558042.211576161.flb

$ cat fluentbit-storage/dummy.0/1-1716558042.211576161.flb

??wbig.data???fP??
?????message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
?p???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
߲???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
?F???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
?d???message?true 200 success?big_data?'blah blah blah blah blah blah blah blah???fP??
...
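If you want to keep an eye on how much disk space the buffered chunks are consuming while the pipeline runs, a quick check of the storage directory works; this is a generic shell sketch that assumes the host-side directory created by our volume mount:

# Total size of the buffered chunk files (storage.path is /tmp/fluentbit-storage inside
# the container, mounted to ./fluentbit-storage on the host by our podman run command).
$ du -sh ./fluentbit-storage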

Final Thoughts on Filesystem Buffering

This solution is the way to deal with backpressure and other issues that might flood your telemetry pipeline and cause it to crash. It's worth noting that using a filesystem to buffer the events also introduces the limits of the filesystem being used.

It's important to understand that just as memory can run out, so too can the filesystem storage reach its limits. You should have a plan to address any possible filesystem challenges when using this solution, but that is outside the scope of this article.
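One simple guardrail, using the storage.total_limit_size property mentioned earlier, is to cap the disk space that filesystem chunks can consume for a given output. The value below is an illustrative assumption; once the cap is reached, Fluent Bit discards the oldest buffered chunks, so size it to match how much data you are willing to lose in the worst case:

# Sketch of outputs.conf with a disk usage cap; 500M is an example value, not a workshop setting.
[OUTPUT]
  Name  stdout
  Match *
  # Limit the filesystem chunks queued for this output; oldest chunks are discarded once the cap is hit.
  storage.total_limit_size 500M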

This completes our use cases for this article. Be sure to explore this hands-on experience with the accompanying workshop lab.

What's Next?

This article walked us through how Fluent Bit filesystem buffering provides a data- and memory-safe solution to the problems of backpressure and data loss.

Stay tuned for more hands-on material to help you along your cloud-native observability journey.
