
Merge branch 'develop'
clemlesne committed Dec 8, 2024
2 parents 3edea50 + 1d7fdab commit f507f97
Showing 23 changed files with 1,465 additions and 910 deletions.
83 changes: 14 additions & 69 deletions README.md
@@ -188,28 +188,29 @@ graph LR
redis[("Cache<br>(Redis)")]
search[("RAG<br>(AI Search)")]
sounds[("Sounds<br>(Azure Storage)")]
sst["Speech-to-Text<br>(Cognitive Services)"]
sst["Speech-to-text<br>(Cognitive Services)"]
translation["Translation<br>(Cognitive Services)"]
tts["Text-to-Speech<br>(Cognitive Services)"]
tts["Text-to-speech<br>(Cognitive Services)"]
end
app -- Respond with text --> communication_services
app -- Ask for translation --> translation
app -- Ask to transfer --> communication_services
app -- Few-shot training --> search
app -- Translate static TTS --> translation
app -- Search RAG data --> search
app -- Generate completion --> gpt
gpt -. Answer with completion .-> app
app -- Generate voice --> tts
tts -. Answer with voice .-> app
app -- Get cached data --> redis
app -- Save conversation --> db
app -- Send SMS report --> communication_services
app -- Transform voice --> sst
sst -. Answer with text .-> app
app <-. Exchange audio .-> communication_services
app -. Watch .-> queues
communication_services -- Generate voice --> tts
communication_services -- Load sound --> sounds
communication_services -- Notifies --> eventgrid
communication_services -- Send SMS --> user
communication_services -- Transfer to --> agent
communication_services -- Transform voice --> sst
communication_services -. Send voice .-> user
communication_services <-. Exchange audio .-> agent
communication_services <-. Exchange audio .-> user
eventgrid -- Push to --> queues
@@ -218,63 +218,6 @@ graph LR
user -- Call --> communication_services
```
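
The `app -. Watch .-> queues` edge above is a polling loop over the Azure Storage Queues that Event Grid pushes into. Below is a minimal sketch of such a watcher, assuming a queue named `call-events` and a connection string in `AZURE_STORAGE_CONNECTION_STRING`; both names are illustrative, not the repo's actual configuration.

```python
import asyncio
import os

from azure.storage.queue.aio import QueueClient


async def watch_queue() -> None:
    # Queue name and connection string variable are assumptions for this sketch
    client = QueueClient.from_connection_string(
        conn_str=os.environ["AZURE_STORAGE_CONNECTION_STRING"],
        queue_name="call-events",
    )
    async with client:
        while True:
            # Consume events pushed by Event Grid, deleting each once handled
            async for message in client.receive_messages():
                print(message.content)  # Replace with real event handling
                await client.delete_message(message)
            await asyncio.sleep(1)  # Avoid a tight poll loop when the queue is empty


if __name__ == "__main__":
    asyncio.run(watch_queue())
```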

### Sequence diagram

```mermaid
sequenceDiagram
autonumber
actor Customer
participant PSTN
participant Text to Speech
participant Speech to Text
actor Human agent
participant Event Grid
participant Communication Services
participant App
participant Cosmos DB
participant OpenAI GPT
participant AI Search
App->>Event Grid: Subscribe to events
Customer->>PSTN: Initiate a call
PSTN->>Communication Services: Forward call
Communication Services->>Event Grid: New call event
Event Grid->>App: Send event to event URL (HTTP webhook)
activate App
App->>Communication Services: Accept the call and give inbound URL
deactivate App
Communication Services->>Speech to Text: Transform speech to text
Communication Services->>App: Send text to the inbound URL
activate App
alt First call
App->>Communication Services: Send static SSML text
else Callback
App->>AI Search: Gather training data
App->>OpenAI GPT: Ask for a completion
OpenAI GPT-->>App: Respond (HTTP/2 SSE)
loop Over buffer
loop Over multiple tools
alt Is this a claim data update?
App->>Cosmos DB: Update claim data
else Does the user want the human agent?
App->>Communication Services: Send static SSML text
App->>Communication Services: Transfer to a human
Communication Services->>Human agent: Call the phone number
else Should we end the call?
App->>Communication Services: Send static SSML text
App->>Communication Services: End the call
end
end
end
App->>Cosmos DB: Persist conversation
end
deactivate App
Communication Services->>PSTN: Send voice
PSTN->>Customer: Forward voice
```
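
Steps 5 and 6 above (receive the Event Grid event on the webhook, then accept the call) can be sketched as follows. This is an illustration under assumptions, not the app's actual handler: the endpoint path and environment variable names are made up, and the Event Grid subscription-validation handshake is omitted for brevity.

```python
import os

from azure.communication.callautomation import CallAutomationClient
from fastapi import FastAPI

app = FastAPI()
client = CallAutomationClient.from_connection_string(
    os.environ["COMMUNICATION_SERVICES_CONNECTION_STRING"]  # Assumed variable name
)


@app.post("/call/inbound")  # Assumed event URL registered with Event Grid
def on_incoming_call(events: list[dict]) -> None:
    for event in events:
        if event.get("eventType") == "Microsoft.Communication.IncomingCall":
            # Accept the call and give Communication Services the inbound URL
            # to which subsequent call events will be posted (step 6)
            client.answer_call(
                incoming_call_context=event["data"]["incomingCallContext"],
                callback_url=os.environ["CALL_EVENTS_URL"],  # Assumed variable name
            )
```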

## Deployment

### Prerequisites
@@ -751,7 +695,7 @@ prompts:

The delay mainly comes from two things:

- Voice in and voice out are processed by Azure AI Speech; both are implemented in streaming mode, but voice is not directly streamed to the LLM
- The LLM, more specifically the delay between the API call and the first inferred sentence, can be long (sentences are sent one by one as they become available), and even longer when the model hallucinates and returns empty answers (this happens regularly, and the application retries the call)

For now, the only impactful thing you can do is to work on the LLM part. This can be achieved with a PTU on Azure or by using a less capable model like `gpt-4o-mini` (selected by default in the latest versions). With a PTU on Azure OpenAI, you can halve the latency in some cases.
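
To illustrate the LLM point, here is a minimal sketch (not the app's actual code) of sentence-by-sentence streaming: completion tokens are buffered and flushed one sentence at a time, so text-to-speech can only start once the first full sentence has been inferred. The model name and the sentence-splitting regex are assumptions.

```python
import re

from openai import AsyncAzureOpenAI

# Assumes AZURE_OPENAI_ENDPOINT, AZURE_OPENAI_API_KEY and OPENAI_API_VERSION are set
client = AsyncAzureOpenAI()


async def stream_sentences(prompt: str):
    stream = await client.chat.completions.create(
        model="gpt-4o-mini",  # Assumed deployment name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    buffer = ""
    async for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            buffer += chunk.choices[0].delta.content
            # Flush every completed sentence as soon as it is available
            *sentences, buffer = re.split(r"(?<=[.!?])\s+", buffer)
            for sentence in sentences:
                yield sentence  # Each sentence can be sent to TTS immediately
    if buffer.strip():
        yield buffer  # Flush whatever remains at the end of the stream
```
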
@@ -795,6 +739,7 @@ Resiliency:
Security:

- [x] CI builds attestations
- [x] CodeQL static code checks
- [ ] GitOps for deployments
- [ ] Red team exercises

38 changes: 38 additions & 0 deletions app/helpers/cache.py
@@ -0,0 +1,38 @@
import functools
from collections import OrderedDict
from typing import Any


def async_lru_cache(maxsize: int = 128):
    """
    Caches a coroutine's return value each time it is called.

    If the maxsize is reached, the least recently used value is removed.
    """

    def decorator(func):
        cache: OrderedDict[tuple, Any] = OrderedDict()

        @functools.wraps(func)
        async def wrapper(*args, **kwargs) -> Any:
            # Create a cache key from args and kwargs, using frozenset for kwargs to ensure hashability
            key = (args, frozenset(kwargs.items()))

            if key in cache:
                # Move the recently accessed key to the end (most recently used)
                cache.move_to_end(key)
                return cache[key]

            # Compute the value since it's not cached
            value = await func(*args, **kwargs)
            cache[key] = value
            cache.move_to_end(key)

            if len(cache) > maxsize:
                cache.popitem(last=False)

            return value

        return wrapper

    return decorator
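
As a usage sketch (the fetcher below is hypothetical, not part of this commit), the decorator wraps any coroutine whose positional and keyword arguments are hashable; repeated awaits with the same arguments return the cached result without re-running the body:

```python
import asyncio

from app.helpers.cache import async_lru_cache


@async_lru_cache(maxsize=32)
async def fetch_config(name: str) -> str:
    print(f"computing {name}")  # Printed on cache misses only
    await asyncio.sleep(0.1)  # Stand-in for a network call
    return f"value-for-{name}"


async def main() -> None:
    print(await fetch_config("a"))  # Miss: runs the coroutine
    print(await fetch_config("a"))  # Hit: served from the cache


asyncio.run(main())
```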