
Feature to use the Azure or AWS-specific Image Caches of ImageSharp #15016

Closed
Piedone opened this issue Jan 8, 2024 · 17 comments · Fixed by #15028

@Piedone
Member

Piedone commented Jan 8, 2024

Is your feature request related to a problem? Please describe.

When caching images resized with ImageSharp, we use PhysicalFileSystemCache. This can cause an IO bottleneck, as happens on Azure App Services; see #14859.

Describe the solution you'd like

There are alternative Image Caches: AzureBlobStorageCache stores resized images in Azure Blob Storage, and AWSS3StorageCache stores them in AWS S3. These can offer a performance boost to apps running in either cloud by reducing the number of local IO transactions. Furthermore, if you run throwaway webservers that start with a blank slate after every deployment, including containers, then you can use these features to make your image cache persistent.

So, we could add features in OrchardCore.Media.Azure and OrchardCore.Media.AmazonS3 to optionally use these caches.

Note that unfortunately, at the moment these caches can't distinguish tenants; see this feature request. We could either contribute that (if the ImageSharp team wants it to begin with, or otherwise add it from a fork of ImageSharp.Web) or accept that the cache files will be in the root of an Azure Blob Storage container/AWS S3 bucket for all tenants, making tenant-specific cache purging or clean-up after tenant deletion infeasible.

I'm adding a simple prototype below, but I'm still actively testing AzureBlobStorageCache on DotNest:

using Azure.Storage.Blobs.Models;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using OrchardCore.Environment.Shell.Configuration;
using OrchardCore.Modules;
using SixLabors.ImageSharp.Web.Caching;
using SixLabors.ImageSharp.Web.Caching.Azure;

[RequireFeatures("OrchardCore.Media.Azure.Storage")]
public class ImageSharpAzureBlobCacheStartup : StartupBase
{
    private static readonly object _containerCreateLock = new();

    private readonly IShellConfiguration _shellConfiguration;

    private static bool _containerCreated;

    // This needs to be above OrchardCore.Media's 0 to replace the IImageCache implementation registered there.
    public override int Order => 1;

    public ImageSharpAzureBlobCacheStartup(IShellConfiguration shellConfiguration) => _shellConfiguration = shellConfiguration;

    public override void ConfigureServices(IServiceCollection services) =>
        // Following https://docs.sixlabors.com/articles/imagesharp.web/imagecaches.html we'd use
        // SetCache<AzureBlobStorageCache>(), but that's only available on IImageSharpBuilder after AddImageSharp(), which
        // happens in OrchardCore.Media. Thus, an explicit Replace() is necessary.
        services.Configure<AzureBlobStorageCacheOptions>(options =>
        {
            _shellConfiguration
                .GetSection("OrchardCore_Media_Azure").GetSection("ImageSharp").GetSection("AzureBlobStorageCacheOptions")
                .Bind(options);

            lock (_containerCreateLock)
            {
                if (!_containerCreated)
                {
                    AzureBlobStorageCache.CreateIfNotExists(options, PublicAccessType.None);
                    // We handle the locking.
#pragma warning disable S2696 // Instance members should not write to "static" fields
                    _containerCreated = true;
#pragma warning restore S2696 // Instance members should not write to "static" fields
                }
            }
        })
        .Replace(ServiceDescriptor.Singleton<IImageCache, AzureBlobStorageCache>());
}
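
For the AWS S3 side, a counterpart in OrchardCore.Media.AmazonS3 could presumably follow the same pattern. The below is only an untested sketch: the feature ID, the configuration section names, and the CreateIfNotExists call for S3 are assumptions mirroring the Azure prototype.

using Amazon.S3;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.DependencyInjection.Extensions;
using OrchardCore.Environment.Shell.Configuration;
using OrchardCore.Modules;
using SixLabors.ImageSharp.Web.Caching;
using SixLabors.ImageSharp.Web.Caching.AWS;

// Untested sketch; the feature ID and configuration section names below are assumptions.
[RequireFeatures("OrchardCore.Media.AmazonS3")]
public class ImageSharpAwsS3CacheStartup : StartupBase
{
    private readonly IShellConfiguration _shellConfiguration;

    // Same as for Azure: must be above OrchardCore.Media's 0 to replace the IImageCache registered there.
    public override int Order => 1;

    public ImageSharpAwsS3CacheStartup(IShellConfiguration shellConfiguration) => _shellConfiguration = shellConfiguration;

    public override void ConfigureServices(IServiceCollection services) =>
        services.Configure<AWSS3StorageCacheOptions>(options =>
        {
            // Hypothetical configuration path, mirroring the Azure prototype above.
            _shellConfiguration
                .GetSection("OrchardCore_Media_AmazonS3").GetSection("ImageSharp").GetSection("AWSS3StorageCacheOptions")
                .Bind(options);

            // Assuming ImageSharp.Web exposes a CreateIfNotExists helper for S3 analogous to the Azure one;
            // the once-only locking from the Azure prototype is omitted for brevity.
            AWSS3StorageCache.CreateIfNotExists(options, S3CannedACL.Private);
        })
        .Replace(ServiceDescriptor.Singleton<IImageCache, AWSS3StorageCache>());
}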

Describe alternatives you've considered

Using the Output Cache for /media, as elaborated under #14859; though that's perhaps worth doing anyway, in addition to this.
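
For illustration only, here's a minimal sketch of what path-based output caching for /media could look like with ASP.NET Core's built-in output caching middleware; matching on the /media path prefix and the one-hour lifetime are made-up example values, not something #14859 has settled on.

using System;
using Microsoft.AspNetCore.Builder;
using Microsoft.Extensions.DependencyInjection;

var builder = WebApplication.CreateBuilder(args);

builder.Services.AddOutputCache(options =>
    // Cache everything under /media; the one-hour lifetime is an arbitrary example.
    options.AddBasePolicy(policy => policy
        .With(context => context.HttpContext.Request.Path.StartsWithSegments("/media"))
        .Expire(TimeSpan.FromHours(1))));

var app = builder.Build();

app.UseOutputCache();

// The rest of the (Orchard Core) pipeline would be configured here as usual.
app.Run();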

@MikeAlhayek
Member

@Piedone I agree with the need to move this outside the file system. This was a topic during triage once, and I shared the same concern.

I just don't think it will have any performance gain. It'll free up resources on the web server for sure, but I can see it being slightly slower. Either way, I agree that adding this feature would be great.

@Piedone
Member Author

Piedone commented Jan 9, 2024

Thanks for your feedback!

On Azure App Services storage IO is a really narrow bottleneck: you can bump into its throttling even with single page views, and especially if the site sees any traffic. So I'd expect a performance increase in the average media request too (and the seemingly local storage there is a networked one anyway, with latencies similar to Blob Storage). But we'll see; we're testing it now.

In the meantime, I'll start working on this.

@Piedone
Member Author

Piedone commented Jan 9, 2024

Due to the limitation mentioned above, we can't easily manage tenants' caches separately unless we use separate containers/buckets. This means that purging the cache of just one tenant is not supported (nor purging the cache at all, unless you delete the blobs manually). Furthermore, a tenant's cache files are only removed when the tenant itself is removed if you use a separate container for each tenant.

I think this is acceptable. But perhaps I can add a background task that automatically deletes cache files older than ImageSharpMiddlewareOptions.CacheMaxAge, and/or an admin UI to purge the whole cache container's content for every tenant from the Default tenant. (It would be nicer to only do the latter if there's a single container for all tenants, and let each tenant purge their own cache if they have their own container. I don't think this is really possible to determine, though, since container names can be set with Liquid templating, so we'd basically need to check every other tenant's container too to determine whether the current one's is unique.)
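
For illustration, here's a rough, untested sketch of what such a cleanup background task could look like for the Azure case, reusing the options registered by the prototype above; the cron schedule and using the blob's LastModified as the cache entry's age are assumptions.

using System;
using System.Threading;
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Extensions.DependencyInjection;
using Microsoft.Extensions.Options;
using OrchardCore.BackgroundTasks;
using SixLabors.ImageSharp.Web.Caching.Azure;
using SixLabors.ImageSharp.Web.Middleware;

// Hypothetical cleanup task: deletes cache blobs older than ImageSharpMiddlewareOptions.CacheMaxAge.
[BackgroundTask(Schedule = "0 2 * * *", Description = "Purges expired ImageSharp cache blobs.")]
public class ImageSharpBlobCacheCleanupBackgroundTask : IBackgroundTask
{
    public async Task DoWorkAsync(IServiceProvider serviceProvider, CancellationToken cancellationToken)
    {
        var cacheOptions = serviceProvider.GetRequiredService<IOptions<AzureBlobStorageCacheOptions>>().Value;
        var maxAge = serviceProvider.GetRequiredService<IOptions<ImageSharpMiddlewareOptions>>().Value.CacheMaxAge;

        var container = new BlobContainerClient(cacheOptions.ConnectionString, cacheOptions.ContainerName);

        await foreach (var blob in container.GetBlobsAsync(cancellationToken: cancellationToken))
        {
            // The blob's LastModified is used as a proxy for the cache entry's age.
            if (blob.Properties.LastModified is { } lastModified &&
                DateTimeOffset.UtcNow - lastModified > maxAge)
            {
                await container.DeleteBlobIfExistsAsync(blob.Name, cancellationToken: cancellationToken);
            }
        }
    }
}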

What do you think?

I created the first draft of this, only for Azure for now: #15028.

@Piedone
Member Author

Piedone commented Jan 12, 2024

We'll have subpath config in ImageSharp: SixLabors/ImageSharp.Web#351 (reply in thread)

@Piedone
Member Author

Piedone commented Jan 17, 2024

Opened an ImageSharp PR to have subpath config: SixLabors/ImageSharp.Web#353.
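
Once that lands, tenant separation could boil down to binding a per-tenant subpath from the shell settings. A hypothetical sketch (the CacheFolder property name is an assumption about how the subpath option might be exposed):

using Microsoft.Extensions.DependencyInjection;
using OrchardCore.Environment.Shell;
using OrchardCore.Modules;
using SixLabors.ImageSharp.Web.Caching.Azure;

public class TenantImageCacheFolderStartup : StartupBase
{
    private readonly ShellSettings _shellSettings;

    public TenantImageCacheFolderStartup(ShellSettings shellSettings) => _shellSettings = shellSettings;

    public override void ConfigureServices(IServiceCollection services) =>
        services.Configure<AzureBlobStorageCacheOptions>(options =>
            // Assumed property name from the subpath feature; keeps each tenant's cache blobs under its own prefix.
            options.CacheFolder = _shellSettings.Name);
}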

@sebastienros
Member

Will the new ImageCache still check if the "file" exists and then download it locally (on the server) from the blob storage and serve it, or will the URLs that we generate contain a direct URL to the blob storage?

Azure App Services storage IO is a really narrow bottleneck

Do you know that when we do FS IO in App Service it's actually using Blob Storage? I believe this will resolve the limitations imposed by App Service, but maybe the blob storage cost will be higher. Have you been able to test it on the same site that was having issues already? (maybe you are waiting for the PR in IS to be done)

@sebastienros sebastienros added this to the 1.x milestone Jan 18, 2024
@Piedone
Member Author

Piedone commented Jan 18, 2024

Note that this is only about caching images resized with ImageSharp, not about any other Media files coming from Blob Storage (which I'm still looking into under #14859). Resized images are served directly by IS, and currently the only cache IS uses is the local file system one.

This issue and the corresponding PR are about swapping that out with the Blob Storage cache of IS. I.e., after that, IS will save cache files to Blob Storage and also serve them from there directly. No local files are used at all. The requests still go through the same pipeline though, so URLs don't change and clients aren't directly connecting to the blob's URL.

Yes, I'm aware that the local file system under Azure App Services is actually an Azure File Share. The problem with it is that it has a much lower (default) r/s, i.e. IOPS, limit (see here) than Blob Storage (which is 20k r/s). And the actual local file system, which I also tried, is even lower, at presumably 500 IOPS (see). I went through these under #14859.

So, with this change, we can spread out the IO operations across multiple storages, so hopefully neither of them is a bottleneck, at least unless you do anything extreme.

We're still testing the above prototype. The results are a mixed bag so far, but we'll see after gathering some more data.

@Piedone
Member Author

Piedone commented Jan 22, 2024

Here are the results after about a week:

  • I don't really see a change in the App Service's IO (looking at the "IO*" metrics under it). This is unexpected. The storage account's transactions increased, which is expected, of course. The latter is nowhere near the mentioned 20k r/s limit for the whole storage account (it occasionally spikes to 500 r/s but is usually much less).
  • Resized image requests are still topping the slowest ones for our app, and they actually became slower. See details below.

With Blob Storage, at least I can see exactly which operations take time. And while it's usually much faster, Blob Storage is frequently very slow. See e.g. this request:

[Screenshot: Application Insights trace of a resized image request with slow Blob Storage calls]

Each Blob Storage operation took more than 1.5 s! This is really slow. I see many more examples of the same, where simple Blob operations on 20-30 KB images take seconds. It's hopeless for us to optimize anything if the underlying Blob operations can take so long.

And while the most painful are requests with seconds of latency, multiple hundreds of milliseconds is also way more than what I'd expect. Note that the app and the Blob Storage account are in the same Azure region, so these requests supposedly don't even leave the datacenter.

Looking at the Success Server Latency and Success E2E Latency metrics, it's the same story (note that Success Server Latency is the time it takes Blob Storage to respond; it's not related to network latency):

[Screenshot: Success Server Latency and Success E2E Latency charts]

Note how the latency is usually within what you'd expect, <20 ms (though this is not visible in the chart), but then there's a huge spike, without anything special in the number of transactions:

[Screenshot: transaction count chart]

And it's not because of huge egress/ingress volume:

[Screenshots: egress and ingress charts]

This is one example, but we see Blob latencies spike to seconds, even close to 10s, multiple times a day.

@sebastienros can you ask somebody from the Azure team to check if this is normal? From what I can see in public sources, it isn't.

@MikeAlhayek
Member

@Piedone Just wondering what your settings look like. Which blob tier are you using? Maybe you are on a non-hot tier: https://learn.microsoft.com/en-us/azure/storage/blobs/access-tiers-overview. Are you using a private endpoint?

@Piedone
Member Author

Piedone commented Jan 22, 2024

It's on the hot tier, yes. BTW, I think hot/cool are not performance tiers but pricing ones.

The endpoint is public.

@sebastienros
Member

@Piedone I can try to ping them, but only if you create an isolated repro (no Orchard, for instance) that is easy to understand. They might even say "remove ImageSharp" to see if that is the problem. At that point we could even check the SLA and then file a support ticket. I think this is a sensible step before I start reaching out directly.

@Piedone
Member Author

Piedone commented Jan 23, 2024

I see, though the Success Server Latency (and the other storage metrics) is not really something that would depend on whether it's OC, IS, or anything else on the other end of the line.

I think the SLA for Standard storage transactions is 2 s (see page 83) for files <=1 MB. It's usually, but not always, within that. I can't find the Premium figure there, but at least for Files it says here that it should be <10 ms.

Now I switched over to Premium storage as a test. We'll see in the next few days whether it makes a difference. Interestingly, pricing-wise it should actually be better than the Hot tier of Standard: while its data storage price is $0.15/GB vs $0.021/GB, read/write operations are cheaper.

This is a bit more promising, since it seems that the CPU became the bottleneck, with some requests resembling this, where the time between the arrows is NOT storage latency (but, I assume, CPU):

[Screenshot: request trace]

Latency metrics, even E2E, are <10ms.

We'll see.

@Piedone
Member Author

Piedone commented Jan 23, 2024

Well, with premium storage I'm still frequently getting >1s Blob latencies, and resized image requests are still topping the slow requests list.

E.g. here the quickest Blob access was 400 ms, the slowest 1.7 s:

[Screenshot: request trace with Blob Storage read timings]

Note that this is a cache read for an existing cache item; as you can see, it's just fetching the blob.

This time though, checking the storage account's metrics for the time frame of such requests, Success Server Latency is at most 21 ms, with E2E latency being <=28 ms.

So, this is promising for the wider use case. However, we're using a standard App Service hosting model, so still, where is the rest of the time spent? Not much remains other than the network, but I can't see any issues that might cause this (and then again, this doesn't happen all the time).

@Piedone
Member Author

Piedone commented Jan 27, 2024

@Piedone
Member Author

Piedone commented Jan 30, 2024

In the meantime, as an additional issue, we see too many IS cache misses. My working theory is that this is caused by the webroot of our app being renewed due to staged publishing. And since IS loads images from there via MediaResizingFileProvider, if a file changes there, IS will re-cache it (see here).

The real, fundamental solution here might be a persistent (and fast) webroot folder. I'm not hopeful about being able to move that in any more granular way than setting WebApplicationOptions.WebRootPath, since a lot of IO code uses that to write directly to the file system.
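
A minimal sketch of what setting that would look like in Program.cs; the mounted path is a hypothetical example of a persistent location.

using Microsoft.AspNetCore.Builder;

// Point the webroot at a location that survives deployments; the path below is a hypothetical
// example of a mounted, persistent folder.
var builder = WebApplication.CreateBuilder(new WebApplicationOptions
{
    Args = args,
    WebRootPath = "/persistent/wwwroot",
});

var app = builder.Build();
app.Run();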

I'll try this too.

@Piedone
Member Author

Piedone commented Mar 8, 2024

I'll be able to share some real-life performance comparisons between various IS cache approaches.

@Piedone
Member Author

Piedone commented Mar 12, 2024

See: #14859 (comment)

@MikeAlhayek MikeAlhayek added this to the 2.0 milestone Sep 7, 2024