Using a Distributed Lock in a Hosted Service in ASP.NET Web API
What we gonna do?
One recent requirement in a client project was to query AWS Athena for analytics data every hour and display that data in a dashboard. We could have used Hangfire or AWS Lambda to run this as a job, but we decided to use a Hosted Service in our Web API to query the data once every hour. Hangfire felt like overkill for this work, and spinning up a new Lambda means going through a long approval and explanation process, which is tedious in enterprises.
We have a minimum of six instances of our API running in production, and it can scale up to a maximum of nine instances during peak traffic. We assumed that even if the Hosted Service ran inside all six instances at the same time, each background service would simply query AWS Athena and overwrite the latest data in our database, which we could then pull through an endpoint and show in the dashboard.
[Figure: Problem Statement]
But since this has lots of moving parts, things didn't go as expected in production. In this article, let's look at how we added a distributed lock to solve the problem.
Why we gonna do?
Everything worked once we put our logic inside a simple Hosted Service in the ASP.NET Web API app. The problem was that some time after deploying to production, we started getting Rate Limit Exceptions from AWS Athena. The exception count was alarming and kept increasing. We were not seeing these issues in lower environments, so we introduced a retry mechanism: catch the error, log it as a warning, retry up to 5 times with some delay, and log it as an error only if the final attempt fails. This reduced the errors, but the warning count was still alarming. After taking this change to production, we felt we had not actually solved the problem.
[Figure: Adding a retry mechanism]
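The retry step described above could look roughly like the following. This is a minimal sketch, not our production code: `RateLimitException`, `RetryHelper`, and the logging delegates are hypothetical names, and the 2 second default delay is an assumption, since the real delay and the exact exception type depend on the Athena client in use.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-in for the throttling error surfaced by the Athena client.
public sealed class RateLimitException : Exception
{
    public RateLimitException(string message) : base(message) { }
}

public static class RetryHelper
{
    // Runs the query up to maxAttempts times. Every throttled attempt except the
    // last is logged as a warning; the last one is logged as an error and rethrown.
    public static async Task<T> ExecuteWithRetryAsync<T>(
        Func<Task<T>> query,
        Action<string> logWarning,
        Action<string> logError,
        int maxAttempts = 5,
        TimeSpan? delay = null,
        CancellationToken cancellationToken = default)
    {
        var wait = delay ?? TimeSpan.FromSeconds(2); // assumed delay

        for (var attempt = 1; ; attempt++)
        {
            try
            {
                return await query();
            }
            catch (RateLimitException ex) when (attempt < maxAttempts)
            {
                logWarning($"Attempt {attempt} of {maxAttempts} was throttled: {ex.Message}");
                await Task.Delay(wait, cancellationToken);
            }
            catch (RateLimitException ex)
            {
                logError($"Giving up after {maxAttempts} attempts: {ex.Message}");
                throw;
            }
        }
    }
}
```

With six or more instances each running this loop, every instance produces its own warnings, which is why the warning count stayed high even though fewer calls ended in an error.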
Adding fuel to the fire, a new requirement came in to run the job every 30 minutes instead of every hour.
We decided to solve this permanently by adding a distributed lock backed by a table in our database, without relying on any libraries or tools. This way we don't introduce tech debt or a new dependency, and we make sure that only one instance queries AWS Athena, processes the data, and stores it in our database at any given point in time. We no longer over-poll AWS Athena, which reduces the Rate Limit Exceptions and also makes the job more cost efficient, since every query, along with the storage and memory it uses, adds to our cost.
[Figure: Adding a distributed lock]
So now, when an instance starts, its Hosted Service also starts and registers the instance ID in our settings table against this job name. If a scale-up happens, the latest instance to start wins and registers its own instance ID. So only one instance runs the job and processes the result.
But wait: what happens if the instance currently running the job becomes faulty and goes down for whatever reason, or a scale-down happens and that instance gets terminated? In either case, some other available instance should register itself and take over the background job. We handled that by adding a simple Last Run TimeStamp against the job entry in the settings table.
Now each instance writes the Last Run TimeStamp when it registers its instance ID. Every instance then checks the settings table, and if the Last Run TimeStamp is more than an hour old, any other healthy instance registers itself and starts running the job.
[Figure: Failure Scenarios]
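The registration and takeover rules can be sketched with an in-memory stand-in for the settings table. Everything here is illustrative: `JobLock` and `JobLockStore` are hypothetical names, the dictionary replaces the real table, and refreshing the timestamp on every successful check is an assumption about how the Last Run TimeStamp is kept current.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical shape of one job row in the settings table.
public sealed record JobLock(string JobName, string InstanceId, DateTimeOffset LastRunUtc);

// In-memory stand-in for the settings table; a real version would run SQL instead.
public sealed class JobLockStore
{
    private readonly Dictionary<string, JobLock> _rows = new();
    private readonly object _gate = new();
    private readonly TimeSpan _staleAfter;

    public JobLockStore(TimeSpan staleAfter) => _staleAfter = staleAfter;

    // Called on startup: the most recently started instance overwrites the owner.
    public void RegisterInstance(string jobName, string instanceId, DateTimeOffset nowUtc)
    {
        lock (_gate)
        {
            _rows[jobName] = new JobLock(jobName, instanceId, nowUtc);
        }
    }

    // Called on every tick: returns true when this instance should run the job.
    public bool CanExecute(string jobName, string instanceId, DateTimeOffset nowUtc)
    {
        lock (_gate)
        {
            if (_rows.TryGetValue(jobName, out var row))
            {
                var isOwner = row.InstanceId == instanceId;
                var isStale = nowUtc - row.LastRunUtc > _staleAfter;
                if (!isOwner && !isStale)
                {
                    return false; // a healthy owner exists, so stay idle
                }
            }

            // The owner refreshes its timestamp; a non-owner takes over a stale
            // or missing row by registering itself.
            _rows[jobName] = new JobLock(jobName, instanceId, nowUtc);
            return true;
        }
    }
}
```

Against a real database, the check and the write would need to happen in one conditional UPDATE or one transaction so that two instances cannot take over a stale row at the same moment. The `lock` statement only mimics that guarantee inside a single process.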
How we gonna do?
Here is the code we used to add a distributed lock in a multi-instance scenario.
Code Sample - Using a distributed lock in Hosted Service in ASP.NET Web API
Here is the explanation of the above code.
- Registers the instance: Stores the instance ID in the database using the RegisterInstance method.
- Runs every 30 minutes: Uses a timer, driven by TimeProvider, to trigger data collection.
- Checks eligibility: Ensures only one instance runs the job.
- Processes data: Queries Athena and updates the database using the ReadDataFromAthena method.
- Failsafe mechanism: If the current instance stops, another one takes over through the CanExecute method.
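As a rough illustration of how the steps in this list fit together, here is a minimal sketch of the loop. It is not the original code sample: `AthenaSyncWorker` and the job name are hypothetical, and the three delegates stand in for the RegisterInstance, CanExecute, and ReadDataFromAthena methods with assumed signatures. It assumes .NET 8 or later for `TimeProvider` and the `PeriodicTimer` overload that accepts it.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// In a Web API, RunAsync would be called from BackgroundService.ExecuteAsync.
public sealed class AthenaSyncWorker
{
    private const string JobName = "athena-sync"; // hypothetical job name

    private readonly Func<string, DateTimeOffset, Task> _registerInstance;
    private readonly Func<string, DateTimeOffset, Task<bool>> _canExecute;
    private readonly Func<CancellationToken, Task> _readDataFromAthena;
    private readonly TimeProvider _timeProvider;
    private readonly TimeSpan _interval;

    public AthenaSyncWorker(
        Func<string, DateTimeOffset, Task> registerInstance,
        Func<string, DateTimeOffset, Task<bool>> canExecute,
        Func<CancellationToken, Task> readDataFromAthena,
        TimeProvider timeProvider,
        TimeSpan interval)
    {
        _registerInstance = registerInstance;
        _canExecute = canExecute;
        _readDataFromAthena = readDataFromAthena;
        _timeProvider = timeProvider;
        _interval = interval;
    }

    // Unique per process, so each running instance has its own identity.
    public string InstanceId { get; } = $"{JobName}-{Guid.NewGuid():N}";

    public async Task RunAsync(CancellationToken stoppingToken)
    {
        // Step 1: register this instance as the current owner of the job.
        await _registerInstance(InstanceId, _timeProvider.GetUtcNow());

        // Step 2: tick on a fixed interval (30 minutes in our case).
        using var timer = new PeriodicTimer(_interval, _timeProvider);
        try
        {
            while (await timer.WaitForNextTickAsync(stoppingToken))
            {
                // Steps 3 and 5: only the owner, or an instance taking over
                // a stale lock, goes on to query Athena.
                if (await _canExecute(InstanceId, _timeProvider.GetUtcNow()))
                {
                    // Step 4: query Athena and update the database.
                    await _readDataFromAthena(stoppingToken);
                }
            }
        }
        catch (OperationCanceledException)
        {
            // Normal shutdown: the host cancelled the stopping token.
        }
    }
}
```

Passing `TimeProvider` in, rather than reading the system clock directly, makes it possible to substitute a fake clock in tests and drive the timer without waiting 30 real minutes.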
Summary
In a recent client project, we needed to query AWS Athena every hour to fetch analytics data for a dashboard. Instead of using Hangfire or AWS Lambda, we opted for a Hosted Service in our Web API, assuming multiple API instances querying Athena simultaneously would not cause issues. However, in production, we encountered frequent AWS Athena Rate Limit Exceptions. Implementing a retry mechanism reduced errors but left us with a high volume of warnings, prompting us to rethink our approach, especially when the query interval was reduced to 30 minutes.
To solve this permanently, we introduced a distributed lock using a database table, ensuring only one instance queries Athena at a time, reducing API calls and costs. Each instance registers itself with a timestamp, and if the current instance fails or is removed in a scale-down, another instance takes over by checking whether the last run timestamp is over an hour old. This approach eliminated redundant queries, reduced rate limit issues, and improved system efficiency.
This idea can be used to add a distributed lock to hosted services in any multi-instance scenario.
[Figure: Complete Idea]