Wednesday, 1 December 2021

Send All Azure App Insights Or Logs Data To Tool Like Datadog

 

Introduction

Microsoft Azure provides a variety of insights and monitoring logs that you can use to check the state of a resource and its performance. We can send these metrics and insights to other tools and create alerts and performance metrics of our own. Datadog is a tool that provides a rich experience to users who are monitoring resources in the cloud or on-premises.

How to transfer Azure metrics to Datadog?

We can transfer all of the Azure insights to Datadog in a few simple steps using Event Hubs. Event Hubs is designed to ingest millions of events per second, which makes it well suited for our purpose.

Create an event hub

First, we will create an event hub instance in Azure.

Once the event hub is created, you can send your insights or metrics data to it, and Datadog will consume the data from there.

Go to your Application Insights resource and find Diagnostic settings.

Here you will see the variety of metrics and logs that are available. Click on Add diagnostic setting.

In the log selection, select whichever categories you want to feed to the event hub. In the right section, select "Stream to an event hub", and in the lower right section, select the event hub namespace and the event hub instance.

Now you are all set from the Azure side. It's time to move to Datadog.

For Datadog to read the event hub:

  1. Create an app registration
  2. Give it read-only access to the newly created event hub

Datadog can then read these events from the event hub and generate rich, interactive performance metrics and alerts.

I hope this was helpful.

Thanks.

Implementing Circuit Breaker using Azure EventHubs And Azure Logic App

 

Introduction

A circuit breaker is a safeguard for extreme conditions: it stops the application automatically when something goes wrong. The problem can lie in our application itself or in the third-party APIs on which it depends. To implement circuit breaker logic, you need a good understanding of retry policies and transient exceptions. In electrical terms, breaking or opening a circuit means interrupting the wire at a single point to stop the flow of current, so that it does not damage the wire or the components connected to it.

Structure

We have an existing eCommerce system, say Contoso, which gets a million hits per second. In Contoso, we have a customer module hosted as a web service. We also have an order processing module, an internal module that signals the warehouse to start processing the items in an order. We wanted both modules to be decoupled and asynchronous, so we introduced a serverless layer that employs Azure Functions and Event Hubs.

We used Event Hubs because it can handle millions of events per second, with each throughput unit accepting up to 1,000 events per second or 1 MB of data per second, and because it gives us more control over retries than Service Bus or Queue Storage. When a customer places an order, we store the order in a database and push the data to the event hub. The Azure function is triggered by the event hub and sends the order data to the backend process via Refit.

Requirement

After this new implementation, a new problem arose. What if the backend order processing app is down for a few minutes? In that case, the Azure function will fail on every request and fill the logs with errors. While the backend API is down, the function also runs to no purpose, consuming compute units and increasing cost. On top of that, we would have to implement a lookup service to recover the data that was not processed during the backend API downtime.

So our requirement is to detect when our service produces a large number of errors within a limited time, so that we can stop the function app and fix the issue. We also want to retry a message up to 5 times if it generates a transient error.

Now comes the benefit of the event hub. Event Hubs maintains a pointer that indicates the position up to which events have been processed. When the event hub delivers a message to the function app, it expects a response. If the response is a success, the pointer advances by one; otherwise it stays where it is and the same message can be retried.
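To make the pointer behaviour concrete, here is a minimal sketch of a single partition as a toy in-memory model. It is not Event Hubs client code; the class name PartitionCursor and its members are hypothetical and exist only for illustration.

```csharp
using System;
using System.Collections.Generic;

// Toy model of one partition: the pointer only advances when the handler succeeds.
// PartitionCursor is a hypothetical name, not part of any Azure SDK.
public sealed class PartitionCursor
{
    private readonly IReadOnlyList<string> _events;

    public PartitionCursor(IReadOnlyList<string> events) => _events = events;

    // Index of the next event to deliver (the "pointer" described in the text).
    public int Position { get; private set; }

    public bool HasPending => Position < _events.Count;

    // Delivers the event at the pointer. Returns true and advances on success;
    // returns false and leaves the pointer in place when the handler throws.
    public bool DeliverNext(Action<string> handler)
    {
        if (!HasPending) return false;
        try
        {
            handler(_events[Position]);
            Position++;
            return true;
        }
        catch (Exception)
        {
            return false;
        }
    }
}
```

Calling DeliverNext with a handler that throws leaves Position unchanged, so the next call delivers the same event again. That is the property the retry logic below relies on.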

Solution

We can use either the [FixedDelayRetry] or the [ExponentialBackoffRetry] attribute to enable retries. The Azure Functions documentation on retry policies has more details.

[ExponentialBackoffRetry(5, "00:00:05", "00:00:50")]

The above attribute retries the same request up to 5 times after the first attempt, waiting at least 5 seconds before the first retry and at most 50 seconds before the last one. We also need one change in our code: in catch blocks, instead of just logging the error, we must rethrow it so that the event hub knows the request failed and does not advance the pointer.

Before we can stop the function app, we have to create the Azure logic app.
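To get a feel for how the waits could grow between the 5-second minimum and the 50-second maximum, here is a small sketch. It assumes plain doubling capped at the maximum; the Azure Functions runtime may compute its delays differently (for example with randomization), so treat this as an illustration only. BackoffSchedule is a hypothetical helper, not a runtime API.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: illustrates a capped doubling schedule, not the runtime's exact algorithm.
public static class BackoffSchedule
{
    // Returns the wait before each retry: start at min, double each time, cap at max.
    public static IReadOnlyList<TimeSpan> Delays(int maxRetryCount, TimeSpan min, TimeSpan max)
    {
        var delays = new List<TimeSpan>(maxRetryCount);
        TimeSpan next = min;
        for (int retry = 0; retry < maxRetryCount; retry++)
        {
            delays.Add(next);
            long doubled = next.Ticks * 2;
            next = doubled > max.Ticks ? max : TimeSpan.FromTicks(doubled);
        }
        return delays;
    }
}
```

Under this assumption, Delays(5, TimeSpan.FromSeconds(5), TimeSpan.FromSeconds(50)) yields waits of 5, 10, 20, 40, and 50 seconds, so a message that keeps failing occupies the function for roughly two minutes before the retries are exhausted.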

As you can see in the logic app designer, I have added one trigger and one action. You can use an HTTP trigger or an Event Grid trigger, but I already had an event hub in place, so I used the Event Hubs trigger.

Configure the trigger with the event hub instance name, the consumer group (default or any other), and the remaining details, then set up the connection to the event hub.

Connect to eventhub using APIConnection.

Here, select the event hub namespace and press Create. After this, add an action as the next step and use Azure Resource Manager to stop the function app with the following configuration:

Here you can filter the data according to your needs, or you can introduce a condition before this HTTP action to filter the data. You can also add an email action to notify the DevOps team with a message such as "something is wrong, stopping function app".

You need to make sure that this logic app has appropriate access to the function app which it is going to stop.

Now for the final circuit breaker logic and how to call this logic app.

Create a Redis cache instance in Azure, where we will store our error count. I wanted to stop the function app if it produced 40 errors within any 100-second window, so I set up a cache with sliding expiry.

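using System;
using System.Text;
using System.Threading.Tasks;
using Microsoft.Azure.EventHubs;
using Newtonsoft.Json;
using StackExchange.Redis;

// Reuse one multiplexer; opening a new Redis connection on every failure is expensive.
private static readonly Lazy<ConnectionMultiplexer> Redis =
    new Lazy<ConnectionMultiplexer>(() => ConnectionMultiplexer.Connect("Redis Connection String"));

// Records one failure and, once 40 failures fall within the last 100 seconds,
// sends the "Stop app" message that the logic app is listening for.
private async Task OpenCircuit() {
    const string key = "Redis key name";
    IDatabase db = Redis.Value.GetDatabase();
    DateTime now = DateTime.UtcNow;

    // Commands queued on a transaction must not be awaited before ExecuteAsync.
    var transaction = db.CreateTransaction();
    // Drop failures older than the 100-second window.
    _ = transaction.SortedSetRemoveRangeByScoreAsync(key, double.NegativeInfinity, now.AddSeconds(-100).Ticks);
    // Unique member, so two failures with the same timestamp are both counted.
    _ = transaction.SortedSetAddAsync(key, Guid.NewGuid().ToString(), now.Ticks);
    // Sliding expiry: the key disappears after 100 seconds without failures.
    _ = transaction.KeyExpireAsync(key, TimeSpan.FromSeconds(100));
    var slidingFailures = transaction.SortedSetLengthAsync(key);

    if (await transaction.ExecuteAsync()) {
        long failures = await slidingFailures;
        if (failures >= 40) {
            var eventHubClient = EventHubClient.CreateFromConnectionString("Eventhub connectionstring");
            try {
                var body = JsonConvert.SerializeObject("Stop app");
                await eventHubClient.SendAsync(new EventData(Encoding.UTF8.GetBytes(body)));
            } finally {
                await eventHubClient.CloseAsync();
            }
        }
    }
}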
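If the Redis commands are hard to follow, the same sliding-window idea can be sketched in memory. This is only an illustration of the counting logic, assuming failures are recorded in non-decreasing time order; SlidingFailureWindow is a hypothetical class and does not replace the shared Redis cache.

```csharp
using System;
using System.Collections.Generic;

// In-memory stand-in for the Redis sorted set: keeps failure timestamps from the last window.
// Assumes RecordFailure is called with non-decreasing timestamps.
public sealed class SlidingFailureWindow
{
    private readonly Queue<DateTime> _failures = new Queue<DateTime>();
    private readonly TimeSpan _window;
    private readonly int _threshold;

    public SlidingFailureWindow(TimeSpan window, int threshold)
    {
        _window = window;
        _threshold = threshold;
    }

    // Records a failure at 'now' and returns true when the circuit should open.
    public bool RecordFailure(DateTime now)
    {
        // Drop failures that fell out of the window (the remove-range-by-score step).
        while (_failures.Count > 0 && _failures.Peek() <= now - _window)
            _failures.Dequeue();

        _failures.Enqueue(now);                 // the sorted-set add step
        return _failures.Count >= _threshold;   // the sorted-set length check
    }
}
```

With new SlidingFailureWindow(TimeSpan.FromSeconds(100), 40), the fortieth failure inside any 100-second span returns true. This version is not thread-safe and only sees failures from one process, which is why the article keeps the count in Redis, where every function instance shares it.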

As soon as we send the "Stop app" message to the event hub that the logic app is listening to, the logic app stops our function app and saves us precious resources, money, and logs.

One good thing about this approach is that whenever the function app is restarted, manually or automatically, it resumes reading event hub messages from where it left off. Also, keep in mind that each partition of the event hub has its own pointer.

Hope this helps.

For further discussion feel free to connect with me.

Thanks.
