status	implementation	status_last_reviewed
accepted	done	2024-03-06

Supporting slug changes in the Publishing API

Problem

slug changes are costly and increasingly common for live content
we also need to handle slug changes for draft content

Slug changes in live content

GOV.UK's existing publishing tools were built upon the assumption that document slugs do not change. This was a helpful simplifying assumption in the early days of GOV.UK and mostly held true. In the rare cases where a published slug did need to change, the cost of manual intervention from a developer to correct the issue was reasonable.

However, as GOV.UK has matured, the factors in this trade-off have changed:

we now have vastly more content on GOV.UK, so although the chance of a particular document needing a slug change remains low, slug change requests are much more frequent overall
our systems have become more complex over time and the effort involved in performing a slug change manually without introducing errors has increased

If we need evidence of the cost and the frequency of slug changes, we could review the workload of 2nd line support.

Slug changes in draft content

We have always allowed slugs of draft content to change. This has never been an issue because draft items were contained within a single system, so there was no requirement to maintain consistency between multiple systems representing the same content item.

With the introduction of the 'content preview' system in Publishing API, draft items are no longer contained within a single system, so we now have to handle draft items whose slugs can change.

The primary identifier used for content items in the publishing API is the base_path. Using base_path as the primary identifier is based on the assumption that it does not change.

If the slug of a draft content item changes, our only option at present would be to require the publishing application to notify the publishing API of this change so that it can remove the document at the previous slug (publishing API currently does not support deletion of content items).
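The problem can be illustrated with a toy sketch. It assumes an in-memory store keyed by base_path; the `put_draft` and `demo` functions and the example paths are hypothetical and are not part of the Publishing API.

```python
from typing import Dict


def put_draft(store: Dict[str, dict], base_path: str, document: dict) -> None:
    """Store a draft under its base_path, as a base_path-keyed API would."""
    store[base_path] = document


def demo() -> Dict[str, dict]:
    store: Dict[str, dict] = {}
    put_draft(store, "/guidance/old-slug", {"title": "Example guidance"})
    # The editor changes the slug and the publishing app sends the draft again.
    # Nothing links the two requests, so the old entry is left behind.
    put_draft(store, "/guidance/new-slug", {"title": "Example guidance"})
    return store


if __name__ == "__main__":
    print(sorted(demo()))
```

In this sketch both paths remain in the store after the rename, which is the inconsistency described above.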

Proposal

Proposal 1: we should assume that slug changes will happen and incorporate this into the design of the publishing API

The simplifying assumption that slug changes do not happen is no longer serving us.

Proposal 2: use content_id as the primary identifier of content items

In order to cater for the above change in assumptions, we should use a persistent abstract identifier for content items. We already have such an identifier in the systems, in the form of content_id.

Since it's a GUID it can be generated independently and asynchronously by the publishing applications (no need for a central coordinating authority).
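As a minimal sketch of this point, assuming a publishing application written in Python (the function name `new_content_id` is hypothetical), an identifier can be minted locally without contacting any other service:

```python
import uuid


def new_content_id() -> str:
    """Mint a content_id locally; no central authority is consulted."""
    # uuid4() is built from random bits, so independent publishing apps can
    # generate identifiers concurrently with a negligible risk of collision.
    return str(uuid.uuid4())


if __name__ == "__main__":
    print(new_content_id())
```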

All publishing API endpoints should identify content items by content_id rather than by base_path.

This implies that content_id would be required for all content items (not an onerous requirement).

In order to transition to this approach there are a few options:

introduce a set of publishing API endpoints which accept content by GUID (e.g. PUT /content_by_guid/, PUT /draft_content_by_guid/ or something similar)
allow the existing endpoints to detect a slug which looks like a GUID and treat it as such. Although slightly hacky, the chance that a normal slug would match the pattern for a GUID is extremely low.
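The second option could be sketched roughly as follows. This assumes content_ids arrive in the canonical 8-4-4-4-12 hexadecimal form; the names `GUID_PATTERN`, `looks_like_guid` and `classify_identifier` are illustrative only and do not come from the Publishing API.

```python
import re

# Canonical 8-4-4-4-12 hexadecimal GUID layout (an assumption about how
# content_ids would be written in the request path).
GUID_PATTERN = re.compile(
    r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}",
    re.IGNORECASE,
)


def looks_like_guid(path_segment: str) -> bool:
    """Return True only if the whole segment is shaped like a GUID."""
    return GUID_PATTERN.fullmatch(path_segment) is not None


def classify_identifier(path: str) -> str:
    """Decide how a hypothetical existing endpoint would treat its path."""
    candidate = path.strip("/")
    return "content_id" if looks_like_guid(candidate) else "base_path"
```

Because `fullmatch` requires the entire segment to match, a multi-segment path that merely contains a GUID-shaped part is still treated as a base_path.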

Benefits of using `content_id` as primary identifier

This will allow the publishing API to understand when the slug of a content item posted has changed. It will then allow publishing API to either:

disallow the change
gracefully handle the change by propagating the change to any downstream systems (e.g. router, url arbiter etc). It could even put in place a redirect from the old URL to the new URL.
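Both behaviours can be sketched with a toy in-memory store keyed by content_id. Everything here (`ContentStore`, `put_content`, `SlugChangeNotAllowed`, the `allow_slug_changes` flag) is hypothetical and only illustrates the idea; propagation to downstream systems is reduced to recording a redirect.

```python
from dataclasses import dataclass, field
from typing import Dict


class SlugChangeNotAllowed(Exception):
    """Raised when the store is configured to reject slug changes."""


@dataclass
class ContentStore:
    """Toy store keyed by content_id (not the real Publishing API)."""

    allow_slug_changes: bool = True
    base_paths: Dict[str, str] = field(default_factory=dict)  # content_id -> base_path
    redirects: Dict[str, str] = field(default_factory=dict)   # old path -> new path

    def put_content(self, content_id: str, base_path: str) -> str:
        previous = self.base_paths.get(content_id)
        if previous is None:
            self.base_paths[content_id] = base_path
            return "created"
        if previous == base_path:
            return "unchanged"
        # Same content_id, different base_path: the slug has changed.
        if not self.allow_slug_changes:
            raise SlugChangeNotAllowed(f"{previous} -> {base_path}")
        self.base_paths[content_id] = base_path
        # Stand-in for notifying downstream systems of the change.
        self.redirects[previous] = base_path
        return "slug_changed"
```

The key design point is that the comparison is only possible because the identifier stays fixed while the base_path varies.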

Further down the line, if we move to a system where publishing API keeps some kind of 'transaction log' record, this API will allow us to keep a record of the changes of a document's slug over time. Having this data in a single transaction log will mean that we have all the information in one place to verify and enforce consistency in downstream systems.
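One possible shape for such a log is sketched below, assuming an append-only list of entries held in memory. The `TransactionLog` and `LogEntry` names and their fields are hypothetical; the RFC does not specify a log format.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class LogEntry:
    sequence: int      # position in the log, starting at 1
    content_id: str
    base_path: str


class TransactionLog:
    """Append-only record of every write, keyed by content_id."""

    def __init__(self) -> None:
        self._entries: List[LogEntry] = []

    def record(self, content_id: str, base_path: str) -> LogEntry:
        entry = LogEntry(len(self._entries) + 1, content_id, base_path)
        self._entries.append(entry)
        return entry

    def slug_history(self, content_id: str) -> List[str]:
        """Base paths the item has had, in order, skipping consecutive repeats."""
        history: List[str] = []
        for entry in self._entries:
            if entry.content_id != content_id:
                continue
            if not history or history[-1] != entry.base_path:
                history.append(entry.base_path)
        return history

    def current_base_path(self, content_id: str) -> Optional[str]:
        history = self.slug_history(content_id)
        return history[-1] if history else None
```

A downstream consistency check could then compare what each system serves against `current_base_path` for every content_id in the log.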

Status of this RFC

This is an early draft; there are probably many things I have missed or not thought about.

Do you agree with the end goal?
Do you see any issues with migrating to this?
Can you see any problems or risks I haven't identified?
Does anything need fleshing out further?

Thanks for reading and for your input!
