FANA LLM v0.2.5 - Advanced API Rate Limiting and Exponential Backoff Integration

This is a critical update that introduces enhanced security and reliability features in our API infrastructure: rate limiting, and a robust retry mechanism with exponential backoff.

These features are designed to improve both the performance and security of our application. Once the limit is reached, the application responds with HTTP status 429: rate limit reached for FANA LLM requests.
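For a client, a 429 response is a signal to pause before sending the request again. The following is a minimal sketch of that behavior, not part of the FANA LLM codebase. The function name, the `send_request` callable, and its `(status, headers, body)` return shape are hypothetical, and the presence of a `Retry-After` header (in seconds) on the 429 response is an assumption.

```python
import time
from typing import Callable, Dict, Tuple

Response = Tuple[int, Dict[str, str], str]  # (status code, headers, body)


def call_with_rate_limit_handling(
    send_request: Callable[[], Response],
    max_attempts: int = 3,
    default_wait: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send_request, pausing and retrying while the server answers 429."""
    response = send_request()
    for _ in range(max_attempts - 1):
        status, headers, _body = response
        if status != 429:
            break
        # Honor Retry-After when present (assumed to be in seconds),
        # otherwise fall back to a fixed default wait.
        wait = float(headers.get("Retry-After", default_wait))
        sleep(wait)
        response = send_request()
    return response
```

The `sleep` parameter exists only so the pause can be replaced in tests. If every attempt is rate limited, the last 429 response is returned to the caller unchanged.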

What's new?

Advanced Rate Limiting For Chat Generations

Implementation: Utilizes FastAPILimiter integrated with a Redis backend to manage rate limiting effectively.
Functionality: Ensures that endpoints such as / are limited to a maximum of 30 requests per 60 seconds, preventing abuse and keeping resource usage equitable and in line with the OpenAI rate limit for chat generation.
Benefits: Protects the API from overuse and potential DDoS attacks, enhancing the security and availability of our services.

import logging

from fastapi import Depends, HTTPException
from fastapi_limiter.depends import RateLimiter

# router, get_api_key and fetch_data are defined elsewhere in the application.
@router.get(
    "/",
    tags=["api_v1"],
    summary="Read Main Endpoint",
    # Allow at most 30 requests per 60 seconds on this route.
    dependencies=[Depends(RateLimiter(times=30, seconds=60))]
)
def read_main(api_key: str = Depends(get_api_key)):
    try:
        # Assuming fetch_data might fail
        data = fetch_data(api_key)
        return {"msg": "Hello from FANA LLM API V1", "data": data}
    except Exception as e:
        # There is no retry state in this handler, so log the error itself.
        logging.error(f"Failed to fetch data: {e}")
        raise HTTPException(status_code=500, detail="Service unavailable, please try again later.")
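
The "30 requests per 60 seconds" rule can be pictured as a per-client counter that resets when its window expires. The sketch below is an in-memory, single-process illustration of that idea only. It is not how FastAPILimiter is implemented (the library keeps its counters in Redis so they are shared across workers), and the class name `FixedWindowLimiter` and the key `"client-a"` are hypothetical.

```python
import time
from typing import Callable, Dict, Tuple


class FixedWindowLimiter:
    """Allow at most `times` calls per `seconds` window for each key."""

    def __init__(self, times: int, seconds: float,
                 clock: Callable[[], float] = time.monotonic):
        self.times = times
        self.seconds = seconds
        self.clock = clock
        # key -> (window start time, calls counted in that window)
        self._windows: Dict[str, Tuple[float, int]] = {}

    def allow(self, key: str) -> bool:
        now = self.clock()
        start, count = self._windows.get(key, (now, 0))
        if now - start >= self.seconds:
            # The previous window has expired, so start a fresh one.
            start, count = now, 0
        if count >= self.times:
            return False  # a server would answer with HTTP 429 here
        self._windows[key] = (start, count + 1)
        return True


limiter = FixedWindowLimiter(times=30, seconds=60)
results = [limiter.allow("client-a") for _ in range(31)]
print(results.count(True), results[-1])
```

In this run the first 30 calls for the key are allowed and the 31st is rejected, which is the point at which the API would return 429.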

Advanced Rate Limiting For Image Generation

Implementation: Utilizes FastAPILimiter integrated with a Redis backend to manage rate limiting effectively.
Functionality: Ensures that endpoints such as / are limited to a maximum of 15 requests per 60 seconds, preventing abuse and keeping resource usage equitable and in line with the OpenAI rate limit for image generation.
Benefits: Protects the API from overuse and potential DDoS attacks, enhancing the security and availability of our services.

import logging

from fastapi import Depends, HTTPException
from fastapi_limiter.depends import RateLimiter

# router, get_api_key and fetch_data are defined elsewhere in the application.
@router.get(
    "/",
    tags=["api_v1"],
    summary="Read Main Endpoint",
    # Allow at most 15 requests per 60 seconds on this route.
    dependencies=[Depends(RateLimiter(times=15, seconds=60))]
)
def read_main(api_key: str = Depends(get_api_key)):
    try:
        # Assuming fetch_data might fail
        data = fetch_data(api_key)
        return {"msg": "Hello from FANA LLM API V1", "data": data}
    except Exception as e:
        # There is no retry state in this handler, so log the error itself.
        logging.error(f"Failed to fetch data: {e}")
        raise HTTPException(status_code=500, detail="Service unavailable, please try again later.")

Robust Retry Mechanism with Tenacity:

Implementation: Uses the tenacity library to implement retries with exponential backoff.
Scope: Applied to critical API operations that need resilience in the face of transient issues.
Benefits: Increases the reliability of endpoint operations by automatically retrying failed operations, optimizing for both performance and reduced error rates.
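Functionality: A failed operation is attempted up to 3 times in total, so at most 2 retries follow the initial call. With a multiplier of 1, the computed exponential wait for these early attempts is below the 8-second minimum, so the system waits 8 seconds before each retry; the 64-second maximum would only come into play if more attempts were allowed. If all attempts fail, tenacity raises an error (RetryError by default), and the system can then take appropriate action, such as logging an error or notifying the user.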

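import openai
from tenacity import retry, stop_after_attempt, wait_exponential

# The retry decorator wraps the call that can fail. The body of completion_with_backoff
# is not shown in these notes; the pre-1.0 OpenAI call below is an assumption.
@retry(wait=wait_exponential(multiplier=1, min=8, max=64), stop=stop_after_attempt(3))
def completion_with_backoff(**kwargs):
    return openai.ChatCompletion.create(**kwargs)

@router.post("/interact-with-llm/")
def interact_with_llm():  # hypothetical handler name
    return completion_with_backoff(model="gpt-3.5-turbo-16k-0613", messages=[{"role": "user", "content": "Once upon a time,"}])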

Multiplier: Scales the exponential wait. The doubling from one attempt to the next comes from tenacity's default exponent base of 2; a multiplier of 1 leaves that value unscaled.
Min and Max: Every wait is clamped to the range from 8 to 64 seconds, so no wait is shorter than 8 seconds or longer than 64 seconds regardless of the number of retries.
Stop After Attempt: Limits the total number of attempts to 3, that is, the initial call plus up to 2 retries.
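
To check what schedule these parameters produce, the wait calculation can be reproduced in a few lines of plain Python. This is a sketch that assumes the formula used by recent tenacity releases, multiplier × 2^(attempt − 1) clamped to the min and max; the function name `exponential_waits` is hypothetical and the code does not call tenacity itself.

```python
from typing import List


def exponential_waits(attempts: int, multiplier: float = 1, min_wait: float = 8,
                      max_wait: float = 64, exp_base: float = 2) -> List[float]:
    """Wait (in seconds) after each failed attempt, clamped to [min_wait, max_wait]."""
    waits = []
    for attempt in range(1, attempts + 1):
        # Unclamped exponential value for this attempt: 1, 2, 4, 8, 16, ...
        raw = multiplier * exp_base ** (attempt - 1)
        waits.append(max(min_wait, min(raw, max_wait)))
    return waits


# Two waits occur between three attempts.
print(exponential_waits(2))  # [8, 8]
# With more attempts the waits would eventually grow and then hit the cap.
print(exponential_waits(8))  # [8, 8, 8, 8, 16, 32, 64, 64]
```

Under this assumption, the 8-second floor dominates the early attempts. Starting at 8 seconds and doubling from there would require a larger multiplier (for example 8) together with a higher attempt limit.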

Enhanced Media Handling:

Update: Improvements in the LLM model interaction, now supporting a broader range of media types.
Impact: Allows users to interact more flexibly with the platform using different media formats, enhancing user experience.

Additional Information: For more details on configuring and utilizing the new rate limiting and retry features, please refer to our updated documentation or contact our support team.


Last updated 6 months ago