FANA LLM v0.2.5 - Advanced API Rate Limiting and Exponential Backoff Integration
This is a critical update that adds security and reliability features to our API infrastructure: rate limiting and a robust retry mechanism with exponential backoff.
These features are designed to improve both the performance and security of our application. Once a rate limit is reached, the application returns a 429 error (Too Many Requests): Rate limit reached for FANA LLM requests.
What's new?
Advanced Rate Limiting For Chat Generations
Implementation: Uses FastAPILimiter with a Redis backend to manage rate limiting.
Functionality: Limits endpoints such as / to a maximum of 30 requests per 60 seconds, preventing abuse and ensuring equitable resource usage in line with the OpenAI rate limit for image generation.
Benefits: Helps protect the API from overuse and potential DDoS attacks, improving the security and availability of our services.
Advanced Rate Limiting For Image Generation
Implementation: Uses FastAPILimiter with a Redis backend to manage rate limiting.
Functionality: Limits endpoints such as / to a maximum of 15 requests per 60 seconds, preventing abuse and ensuring equitable resource usage in line with the OpenAI rate limit for image generation.
Benefits: Helps protect the API from overuse and potential DDoS attacks, improving the security and availability of our services.
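The two limits above (30 requests per 60 seconds for chat, 15 per 60 seconds for images) can be illustrated with a small in-memory sketch. This is not FastAPILimiter's Redis-backed implementation; it is a plain fixed-window counter written only to show the behavior a client should expect. The names FixedWindowLimiter and status_for are hypothetical, and keying the limit by a client ID is an assumption.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `times` requests per `seconds` window for each client key."""

    def __init__(self, times: int, seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.times = times
        self.seconds = seconds
        self.clock = clock  # injectable so the window can be tested without waiting
        self._windows: Dict[str, Tuple[float, int]] = {}  # key -> (window start, count)

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.seconds:
            # The previous window has expired: start a fresh one.
            start, count = now, 0
        if count >= self.times:
            self._windows[key] = (start, count)
            return False
        self._windows[key] = (start, count + 1)
        return True


def status_for(limiter: FixedWindowLimiter, client_id: str) -> int:
    """Return the HTTP status a request would receive: 200 if allowed, 429 if limited."""
    return 200 if limiter.allow(client_id) else 429


# Limits taken from the release notes; the per-client keying is an assumption.
chat_limiter = FixedWindowLimiter(times=30, seconds=60)
image_limiter = FixedWindowLimiter(times=15, seconds=60)

if __name__ == "__main__":
    codes = [status_for(image_limiter, "client-1") for _ in range(16)]
    print(codes.count(200), codes.count(429))
```

In the sketch, the sixteenth image request inside one window is rejected with 429, while a different client key still has its full allowance. A production setup would keep the counters in Redis so that all application workers share the same limits.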
Robust Retry Mechanism with Tenacity:
Implementation: Uses the tenacity library to implement retries with exponential backoff.
Scope: Applied to critical API operations that need resilience in the face of transient issues.
Benefits: Increases the reliability of endpoint operations by automatically retrying failed operations, optimizing for both performance and reduced error rates.
Functionality: The system waits 8 seconds before the first retry, 16 seconds before the next, and 64 seconds before the last. If all retries fail, the system takes appropriate action, such as logging an error or notifying the user.
Multiplier: Scales the exponentially growing wait time. The doubling between retries comes from the exponential base of 2 (the tenacity default); a multiplier of 1 leaves that sequence unscaled.
Min and Max: The wait time starts at a minimum of 8 seconds and never exceeds 64 seconds, regardless of the number of retries.
Stop After Attempt: Limits the number of retries to 3.
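To make the interaction of multiplier, min, max, and the attempt limit concrete, here is a minimal standard-library sketch of clamped exponential backoff. It is not the tenacity implementation. The helper names backoff_wait and retry_with_backoff are hypothetical, and the formula (multiplier times 2 to the power of attempt minus 1, clamped between min and max) is an assumption about how the parameters combine; the exact exponent offset can differ between library versions.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def backoff_wait(attempt: int, multiplier: float = 1, min_wait: float = 8,
                 max_wait: float = 64, base: float = 2) -> float:
    """Seconds to wait after failed attempt number `attempt` (1-based)."""
    raw = multiplier * base ** (attempt - 1)
    # Clamp the raw exponential value into [min_wait, max_wait].
    return max(min_wait, min(raw, max_wait))


def retry_with_backoff(operation: Callable[[], T], max_attempts: int = 3,
                       sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation` up to `max_attempts` times, sleeping between failures."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            sleep(backoff_wait(attempt))
    raise AssertionError("unreachable")


if __name__ == "__main__":
    # Print the wait schedule for the first seven failed attempts.
    print([backoff_wait(n) for n in range(1, 8)])
```

Under the assumed formula, the raw exponential value stays below the 8-second minimum for the first few attempts, so the early waits sit at the floor and only later ones climb toward 64 seconds. The wait sequence a given configuration actually produces is therefore worth confirming against the running code rather than inferring from the parameters alone.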
Enhanced Media Handling:
Update: Improved interaction with the LLM, which now supports a broader range of media types.
Impact: Allows users to interact more flexibly with the platform using different media formats, enhancing user experience.
Additional Information: For more details on configuring and utilizing the new rate limiting and retry features, please refer to our updated documentation or contact our support team.