FANA LLM v0.3.4 - Advanced RAG Context Management and Multi-Model Image Generation

What's new?

FANA LLM v0.3.4 optimizes FANA context management and integrates Stability Diffusion Ultra and Replicate image models. Context is now selected by embedding-based retrieval, which gives the conversational AI more relevant history to work with.

Advanced RAG Context Management

The optimized retrieval-augmented generation (RAG) context management system uses embedding-based retrieval to select more relevant context:

  • User Input Conversion: User input is converted to an embedding.

  • Embedding Comparison: This embedding is compared to stored message embeddings.

  • Context Retrieval: The most similar messages are retrieved as context.

  • Response Generation: This context is used to generate more relevant responses.
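The four steps above can be sketched in a few lines of Rust. This is a minimal illustration, not the FANA implementation: the `StoredMessage` type and the `cosine_similarity` and `top_k_similar` functions are hypothetical names, and the use of cosine similarity as the comparison metric is inferred from the `CosineSimilarity` entry in the module list rather than confirmed by the source.

```rust
/// A stored message together with its precomputed embedding.
/// (Hypothetical type for illustration; not taken from the FANA source.)
#[derive(Debug, Clone)]
pub struct StoredMessage {
    pub content: String,
    pub embedding: Vec<f32>,
}

/// Cosine similarity of two vectors. Returns 0.0 when the lengths differ
/// or when either vector has zero magnitude.
pub fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    if a.len() != b.len() {
        return 0.0;
    }
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        0.0
    } else {
        dot / (norm_a * norm_b)
    }
}

/// Returns up to `k` stored messages ranked by similarity to `query`,
/// most similar first.
pub fn top_k_similar<'a>(
    query: &[f32],
    store: &'a [StoredMessage],
    k: usize,
) -> Vec<&'a StoredMessage> {
    // Score every stored message against the query embedding.
    let mut scored: Vec<(f32, &StoredMessage)> = store
        .iter()
        .map(|m| (cosine_similarity(query, &m.embedding), m))
        .collect();
    // Sort by descending score; total_cmp gives a total order over f32.
    scored.sort_by(|a, b| b.0.total_cmp(&a.0));
    scored.into_iter().take(k).map(|(_, m)| m).collect()
}
```

Calling `top_k_similar(&query_embedding, &store, 5)` would return the five closest messages, which can then be prepended to the prompt as context. A linear scan like this is fine for a per-session history; a larger store would likely need an approximate nearest-neighbor index.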

RAG Context Manager Module

Manages the context of user interactions with advanced embedding-based retrieval.

  • File: src/context_manager.rs

  • Key Functions:

    • ContextManager::new()

    • ContextManager::get_context_with_embedding()

    • ContextManager::add_message()

    • ContextManager::trim_context()

  • File: src/context_fetcher.rs

  • Key Functions:

    • StoreContext::MemoryAppender()

    • RetrieveContext::RetrieveAugmentedGeneration()

  • File: src/embedding.rs

  • Key Functions:

    • GenerateEmbedding::EmbeddingsGenerate()

    • ExtractAnNormalizeEmbedding::EmbeddingsNormalize()

    • CosineSimilarity::ComputeSimilarities()
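These notes list the function names but not their signatures, so the following is only a sketch of how a bounded context window and embedding normalization might look. The parameters, the `Message` struct, the `messages` accessor, and the free function `normalize` are all assumptions made for illustration; the real code in src/context_manager.rs and src/embedding.rs may differ.

```rust
use std::collections::VecDeque;

/// One chat turn held in the context window (hypothetical shape).
#[derive(Debug, Clone, PartialEq)]
pub struct Message {
    pub role: String,
    pub content: String,
}

/// Keeps at most `max_messages` of the most recent messages.
pub struct ContextManager {
    messages: VecDeque<Message>,
    max_messages: usize,
}

impl ContextManager {
    /// Assumed constructor signature: the cap is passed in by the caller.
    pub fn new(max_messages: usize) -> Self {
        Self { messages: VecDeque::new(), max_messages }
    }

    /// Appends a message, then trims so the window never exceeds the cap.
    pub fn add_message(&mut self, role: &str, content: &str) {
        self.messages.push_back(Message {
            role: role.to_string(),
            content: content.to_string(),
        });
        self.trim_context();
    }

    /// Drops the oldest messages until the window fits within the cap.
    pub fn trim_context(&mut self) {
        while self.messages.len() > self.max_messages {
            self.messages.pop_front();
        }
    }

    /// Iterates over the retained messages, oldest first.
    pub fn messages(&self) -> impl Iterator<Item = &Message> {
        self.messages.iter()
    }
}

/// Scales a vector to unit length; a zero vector is returned unchanged.
pub fn normalize(embedding: &[f32]) -> Vec<f32> {
    let norm = embedding.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm == 0.0 {
        return embedding.to_vec();
    }
    embedding.iter().map(|x| x / norm).collect()
}
```

Normalizing embeddings once at storage time means a later cosine comparison reduces to a plain dot product, which is one plausible reason to keep normalization as its own step.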

Image Diffusion with Stability Diffusion Ultra

The generate_image_stability_ai function in src/image_diffusion_stability_diffusion.rs integrates with the Stability AI API for advanced image generation:

pub async fn generate_image_stability_ai(user_input: &str) -> Result<(String, String), BoxError> {
    let api_token = env::var("STABILITY_API_KEY").expect("STABILITY_API_KEY not set");
    let client = Client::new();

    let request = StabilityAIRequest {
        prompt: format!("Generate a high-quality image: {}", user_input),
        // ... other parameters ...
    };

    let response = client.post("https://api.stability.ai/v1/generation/stable-diffusion-xl-1024-v1-0/text-to-image")
        .header("Authorization", format!("Bearer {}", api_token))
        .json(&request)
        .send()
        .await?;

    // ... process response and return image URL ...
}

Image Diffusion with Replicate

The generate_image_sdxl function in src/image_diffusion_sdxl_replicate.rs integrates with the Replicate API:

pub async fn generate_image_sdxl(user_input: &str) -> Result<(String, String), BoxError> {
    let api_token = env::var("REPLICATE_API_TOKEN").expect("REPLICATE_API_TOKEN not set");
    let client = Client::new();

    let request = ReplicateRequest {
        version: "stable-diffusion-xl-1024-v1-0",
        input: ReplicateInput {
            prompt: format!("Generate a creative image: {}", user_input),
            // ... other parameters ...
        },
    };

    let response = client.post("https://api.replicate.com/v1/predictions")
        .header("Authorization", format!("Token {}", api_token))
        .json(&request)
        .send()
        .await?;

    // ... process response and return image URL ...
}

Error Handling

Error handling is implemented throughout the application:

  • API Response Handling: Logs and handles errors during API interactions.

  • Fallbacks: Provides default behavior when translations or embeddings fail.

  • BoxError: Uses BoxError for unified error handling across async functions.
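The fallback and BoxError points above might look like the sketch below. The `BoxError` alias is an assumed definition (a common convention, but not shown in these notes), and `embed`, `embed_or_default`, and `parse_dimension` are hypothetical helpers that stand in for real API calls.

```rust
use std::error::Error;

/// Assumed definition: the notes name `BoxError` but do not show it.
pub type BoxError = Box<dyn Error + Send + Sync>;

/// Hypothetical stand-in for an embedding call that can fail.
fn embed(text: &str) -> Result<Vec<f32>, BoxError> {
    if text.trim().is_empty() {
        // A &str converts into a boxed error via `into()`.
        return Err("cannot embed empty input".into());
    }
    // Toy "embedding": byte length and word count, for illustration only.
    Ok(vec![text.len() as f32, text.split_whitespace().count() as f32])
}

/// Logs the failure and falls back to an empty embedding instead of
/// aborting the request.
pub fn embed_or_default(text: &str) -> Vec<f32> {
    match embed(text) {
        Ok(v) => v,
        Err(e) => {
            eprintln!("embedding failed, using fallback: {e}");
            Vec::new()
        }
    }
}

/// The `?` operator converts any standard error type into `BoxError`,
/// so functions with different failure modes can share one return type.
pub fn parse_dimension(raw: &str) -> Result<usize, BoxError> {
    let n: usize = raw.trim().parse()?;
    Ok(n)
}
```

The trade-off of a boxed error is that callers lose the concrete error type; that is usually acceptable when errors are only logged or reported, as described here.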


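Spawning Tasks for Faster Responses

To reduce response times, key functionality such as context updates and image generation is spawned as separate asynchronous tasks with tokio::spawn. These are lightweight tasks scheduled by the Tokio runtime rather than dedicated OS threads, and the runtime can make progress on several of them concurrently instead of running them one after another.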

Example: Spawning Context Management and Image Generation

tokio::spawn(async move {
    // Context management tasks
    let user_message = json!({
        "role": "user",
        "content": user_input_clone
    });
    let assistant_message = json!({
        "role": "assistant",
        "content": result_clone
    });

    let messages_to_append = vec![user_message, assistant_message];
    let combined_content = format!("{} {}", user_input_clone, result_clone);
    let embedding = generate_embedding(&combined_content).await.unwrap_or_default();
    if let Err(e) = append_to_chat_history(
        messages_to_append,
        &user_id_clone.unwrap_or_default(),
        &session_id_clone.unwrap_or_default(),
        Some(embedding)
    ).await {
        error!("Failed to update context: {:?}", e);
    } else {
        info!("Context updated successfully");
    }
});

tokio::spawn(async move {
    // Image generation tasks
    match generate_image_sdxl(&user_input).await {
        Ok((prompt, url)) => {
            println!("Generated Image URL: {}", url);
            println!("Prompt: {}", prompt);
        }
        Err(e) => {
            eprintln!("Error generating image: {}", e);
        }
    }
});

By using tokio::spawn, these tasks run concurrently on the Tokio runtime, so the caller does not have to wait for context updates or image generation to finish before continuing.


Future Enhancements

  • Multi-modal Context: Incorporate image understanding into context management.

  • Adaptive Model Selection: Dynamically choose between different image generation models based on the request.

  • Federated Learning: Implement privacy-preserving learning from user interactions.

