U - Define P1 WebAPI docs #3327
Comments
This issue might be useful to decide what you want to call a p1 API page: mdn/browser-compat-data#5674 |
We also have the scoping exercise in https://docs.google.com/document/d/1rHSMMyM4RSFjttXvWLwaqWDQiVzPlAQRqnchhLfp0tg/edit#heading=h.kq7cdsyjkr1c - this is a bit old but still probably mostly relevant. |
Principles
I'd like to frame this work with the following principles:
What we have in Web/API
There are 5596 pages under Web/API (https://wiki.developer.mozilla.org/en-US/docs/User:wbamberg/all-api-pages?raw&macros). There are 1095 pages at the top level. Fundamentally this documentation consists of two interleaved but semantically quite distinct documentation hierarchies.
API overview pages
We have 98 API overview pages (defined as top-level pages with a space in their name). Under these are (supposed to be) guide pages. In total we have 171 of these guide pages.
We also have the GroupData.json KumaScript macro, which is supposed to be a list of all the APIs we document. Each entry includes the URL for the overview page and the list of interfaces and dictionaries that the API contains. Ideally, there would be one entry in GroupData for every API overview page in Web/API. Actually there are only 87 objects in GroupData, and only 68 of these appear in the set of actual API overview pages.
It's tempting to use the higher-level concept of APIs as a way to get a handle on the scale of the Web APIs - we could for example say "Fetch is a P1 API". But the data here contains enough errors and omissions that this is hard to do.
Interface and Dictionary pages
This leaves us trying to define priorities at the level of interfaces (and sometimes dictionaries). There are about 997 such pages directly under Web/API.
Analysing the interfaces
I've taken the hierarchy of pages under Web/API and added traffic data to it, to make this spreadsheet: https://docs.google.com/spreadsheets/d/1UTAQG3pSrdBD2tIXMWRSATkOSSj-dcuXnG7ux5pFtDk/edit#gid=538131955. This has a row for every top-level page that's not an API overview page. In each row, it lists:
With this sheet we can list the top 10 interfaces by size:
If you sort by Traffic/page, you can capture 77.52% of traffic by including the following interfaces, which contain 1113 pages:
This seems like a good initial proposal for P1 Web/API docs. It is entirely based on traffic though, so it would be worth scanning the other interfaces to see if we are missing any that we would like to include. Note though that we are already at 1100 pages, so any proposals to add new interfaces should be accompanied by a proposal to remove one :). |
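The traffic-based selection described above can be sketched roughly as follows. This is only an illustration, not the spreadsheet's actual method: the `Interface` record, the `select_by_traffic_per_page` helper, the stop-at-a-target-share rule, and the sample numbers are all assumptions made up for the example.

```python
from dataclasses import dataclass


@dataclass
class Interface:
    name: str       # hypothetical interface name
    pages: int      # pages in the interface's subtree (assumed >= 1)
    traffic: float  # total traffic for the subtree


def select_by_traffic_per_page(interfaces, target_share):
    """Pick interfaces in descending traffic/page order until the
    selected set covers at least `target_share` of all traffic."""
    total = sum(i.traffic for i in interfaces)
    ranked = sorted(interfaces, key=lambda i: i.traffic / i.pages, reverse=True)
    selected, covered = [], 0.0
    for iface in ranked:
        if total == 0 or covered / total >= target_share:
            break
        selected.append(iface)
        covered += iface.traffic
    share = covered / total if total else 0.0
    page_count = sum(i.pages for i in selected)
    return selected, share, page_count


# Made-up sample data, purely for illustration.
sample = [
    Interface("A", pages=10, traffic=1000),
    Interface("B", pages=100, traffic=500),
    Interface("C", pages=5, traffic=400),
    Interface("D", pages=50, traffic=100),
]
chosen, share, page_count = select_by_traffic_per_page(sample, target_share=0.6)
print([i.name for i in chosen], share, page_count)
```

The same loop could stop on a page budget instead of a traffic share, which may be closer to how the ~1100-page figure was reached.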
Thanks Will! This is amazing research and very data-driven. Love it! I agree with the principles you've chosen. After choosing p1 pages by traffic, the second principle "we should prioritize coherent sets of pages" resonates a lot with me, and I think this is a point where the final list of pages might change a bit as we go. I agree that if we do that, we should think about what to remove from the list, too. An attempt to cluster your list:
Now, I guess you can imagine that each of these clusters is a work package, but they are incomplete, because, for example, when you work on Canvas, you probably want to fix all of the Canvas API page structures instead of just the most trafficked. Maybe there are even dependencies on the lower-traffic pages, because it turns out that some canvas (mixin, dictionary) pages need splitting or merging. So, my feeling is that in that case, it makes sense to look at all interfaces of a cluster (see here for canvas) and make them all fit the correct recipes. Does that make sense? I guess I'm coming from a more holistic approach, thinking that no one wants to dig into the other half of an API cluster again after we've fixed the first p1 part. That way we would also have good examples of whole API clusters that follow the newly defined and correct doc structures, which would shape our way forward into more API clusters we want to fix or document. |
Well... re clusters. We already have "clusters", of a sort: they are what's defined in GroupData.json. But as said above, the data in GroupData is incomplete and inconsistent. We could fix all that before starting the WebAPI linting. Why though? It would take a bunch of time and it's not clear how it really supports the work of linting our docs. By just looking at the interfaces level, we can start fixing up individual pages without looking at fixing the higher-level organization. If fixing the higher-level organization of Web/API is a goal of this project, then fine, but that's definitely a change in scope (this US is scoped at 1 point, hilariously).
Of course we can also say "the clusters in GroupData are no good, we should invent new ones". But I'd be very careful here. Grouping things into categories is always really tricky: there are like 30% of obvious cases, another 50% of cases where categorization is very subjective, and 20% that you just have to lump into "Miscellaneous", which is terrible because then people have to look in two places to find something.
In your example you've taken Canvas, but then actually pointed to GroupData (essentially) for the definition. But what about, say, "Fundamentals"? What are all the things we should add to that, to have a "complete" cluster? Are, say, Service Workers fundamentals? And looked at from the point of view of a developer, what sorts of things are considered fundamental? I expect you'll get different answers from different people. At least GroupData categories are backed by a real thing - the spec that defines the interfaces - rather than just the intuitions of a particular tech writer on a particular day.
I'm not exactly against defining new groups, especially higher-level ones than those in GroupData. But it's hard, and it takes time, and needs to be done as a complete project IMO.
Well...maybe. The thing about doing complete interfaces is that at least each subtree of the hierarchy is done at a time, and that seems like the most important thing. But. I do think that higher-level abstractions are helpful here. So for example the "Fetch API" abstraction that unites
|
I've added a new field to the spreadsheet, which represents the group that the interface belongs to, as defined in GroupData. Note that 252 top-level pages - a quarter of the total - are not assigned to a group at all. Also, 32 interfaces are listed under two - or more! - different APIs in GroupData. So I have had to pick the most likely option in those cases. We really, really, ought to clean this data. Anyway, we can use this to see how much extra work it would be to lint complete groups. I've added another sheet, "groups". This contains a row for every group that's represented in our 57 interfaces. Each row contains:
For example, four of our interfaces, In total we can see that to complete all API groups included here would mean linting 977 more pages, or almost double the total. An especially problematic group is "HTML DOM", which would commit us to 395 more pages. And then there are groups like "Service Workers API", where we're proposing to lint just 8 pages but would need to add another 97 to complete the group. On the other hand, we're at 55/61 pages in the "XMLHttpRequest" group, so it seems very worthwhile to finish this one... I'm confused about what's in "DOM" and what's in "HTML DOM". In GroupData, Also a quick look reveals a lot of DOM interfaces that are obsolete. So although I was reluctant to define new groups across the board, it might be worth thinking about reworking this area. |
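The two GroupData problems noted above (top-level pages with no group, and interfaces listed under more than one API) could be detected with a check along these lines. This is a minimal sketch: the flattened `groups` mapping of group name to interface names is an assumed simplification, not the real GroupData.json layout, and `audit_group_assignments` and all names in the sample are hypothetical.

```python
from collections import defaultdict


def audit_group_assignments(groups, top_level_pages):
    """Return (unassigned pages, interfaces assigned to several groups).

    `groups` maps a group name to the interface names it lists (an
    assumed, simplified shape); `top_level_pages` is the list of
    interface page names found directly under Web/API.
    """
    membership = defaultdict(list)
    for group_name, interfaces in groups.items():
        for iface in interfaces:
            membership[iface].append(group_name)

    unassigned = sorted(p for p in top_level_pages if p not in membership)
    multiply_assigned = {
        iface: sorted(names)
        for iface, names in membership.items()
        if len(names) > 1
    }
    return unassigned, multiply_assigned


# Made-up data, purely for illustration.
example_groups = {
    "Group A": ["Alpha", "Beta"],
    "Group B": ["Beta", "Gamma"],
}
example_pages = ["Alpha", "Beta", "Gamma", "Delta"]
unassigned, duplicated = audit_group_assignments(example_groups, example_pages)
print(unassigned)   # pages that belong to no group
print(duplicated)   # interfaces claimed by more than one group
```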
Thanks for your detailed comments!
Right, groups exist already. I was thinking more about how to proceed practically workload-wise. I mean, we could just pick random interfaces and lint them, or we could cluster the things we identified as p1 and work on cluster after cluster. Maybe just linting whatever we come across in the p1 list is fine, though, as you say. And if that's the most practical way forward, let's do that.
Yeah. I thought of the clusters as practical slices to work from, not exactly aiming to re-invent GroupData. I'm not sure how useful the current groups are to our readers and, as you say, that is a higher-level problem. I think the other areas (JS, for example) didn't suffer from these higher-level issues, so maybe my (naive) hope was that we can tackle it somehow, but you are right in identifying this as its own project / user story. It is quite complex.
Thanks, this is useful!
Wow, this is too much indeed
This is amazingly useful to know for each API group! Thanks for making this analysis, Will! ServiceWorkers is really surprising here. I wonder why things are like that. Are the rest very poor or useless pages? Are we creating many pages (say for dictionaries, enums, mixins, etc.) for an API, but most people just read the main interface and/or method pages? So, here I struggle to make a clear call on whether we should say let's do ServiceWorkers in totality, given my limited sense of the docs.
I agree this needs work. It is worth splitting this out into its own "re-grouping" user story. I don't know how much it rabbit-holes into defining P1 API docs, but it seems that your list above is still useful and our current best bet on what we think the P1s are. There are two options, I guess:
|
I wanted to briefly chime in here: I think the analysis is really good. I like the principles you've applied to selection. I itch a bit at wholly traffic-driven approaches and I like the way you've handled that. I'm also pleased to see that these principles satisfied another interest I had, that the pages we selected represented a good range of page types, to ensure we get a mostly complete set of recipes. Based on the selection, this seems likely (even if we lost some pages on the margins; more on that in a second).
I also favor completing groups, since I suspect this will improve the completeness of our recipes by drawing in more long-tail pages. But I'm also reluctant to balloon the number of P1 pages. Maybe we could complete the groups which are already very well covered (e.g., XMLHttpRequest, which is already at 90%), but not ones that aren't close (e.g., HTML DOM, at 47%). Or if you really want completeness in one area (say the HTML DOM group) then you might let completeness cut both ways: the selected pages for Media Capture and Streams and for Service Workers API are less than 10% of their respective groups. If you dropped the entirety of those groups, you'd shed over 200 pages for reallocation elsewhere.
Or you could do some combination: shedding groups that are poorly covered (say <10% coverage), completing groups that are well-covered (say >90% coverage), and letting the groups in between be incomplete. |
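The combined rule suggested above (shed groups under roughly 10% coverage, complete groups over roughly 90%, leave the rest partial) could be expressed as a small helper. This is a sketch only: `classify_group` and its return labels are hypothetical, and the thresholds are the "say <10%" and "say >90%" figures from the comment, not agreed values.

```python
def classify_group(selected_pages, total_pages, low=0.10, high=0.90):
    """Classify a group by the share of its pages already selected as P1.

    Returns "drop" below `low`, "complete" above `high`, and
    "leave partial" otherwise. Thresholds are illustrative defaults.
    """
    if total_pages <= 0:
        raise ValueError("total_pages must be positive")
    coverage = selected_pages / total_pages
    if coverage < low:
        return "drop"
    if coverage > high:
        return "complete"
    return "leave partial"


# Figures quoted earlier in the thread: 55 of 61 pages for XMLHttpRequest,
# and 8 selected plus 97 missing (105 total) for Service Workers API.
print(classify_group(55, 61))
print(classify_group(8, 8 + 97))
# 47 of 100 is a stand-in for the 47% quoted for HTML DOM, not real counts.
print(classify_group(47, 100))
```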
Thanks for the comments! Then here's my suggestion. We lint complete groups, with three exceptions, "DOM", "DOM Events", and "HTML DOM". Reasons for excepting these are:
So for these three groups only, we define a subset of the interfaces as P1s, based on traffic. For the other groups, we only lint complete groups. That means we either expand the set of interfaces in a group, so as to complete the group, or we remove the set, so as to omit the group completely. Also, we will throw in the 5 interfaces (56 pages) that are not currently assigned to any group. Given those principles I've added yet another sheet "P1 docs" that lists my concrete proposal, and that I'll copy here:
This lists P1 docs by group:
As you can see from a comparison with the "groups" sheet, I've removed several groups for which many pages were missing, or that just didn't seem that important. This gives us 1210 pages as P1. I'm really happy with this proposal. I think all the groups listed here are important Web APIs, and deserve to be considered P1. |
Other things to come out of this work:
|
We want to lint the WebAPI docs, but at this point we are only concerned with the most important WebAPI docs, which we call "P1 docs". We expect this to consist of about 1100 pages (out of ~5000 pages total under https://developer.mozilla.org/en-US/docs/Web/API).
In this story we will define precisely what is in P1, and thus what's in scope for the current round of linting.
Acceptance criteria