Track technology adoption and share #591

rviscomi · 2022-05-04T21:54:11Z

Add a new report that tracks the adoption and share of detected technologies.

Reports currently fall into timeseries and histograms, so we may need a new report template that handles more custom ways to explore and visualize this data.

The primary use case for this feature is to track CMS adoption, but it would be good to build this in a way that supports any given technology category and users can filter it down however they want.
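To make "adoption" and "share" concrete, here is a minimal sketch of how the two numbers could be computed for one category in one crawl. The input shape (`(site, category, technology)` tuples), the function name, and the definition of share as "fraction of sites with any detected technology in that category" are all assumptions for illustration, not something the issue specifies.

```python
from collections import defaultdict

def adoption_and_share(detections, category):
    """detections: iterable of (site, category, technology) tuples (hypothetical shape)."""
    sites_by_tech = defaultdict(set)
    for site, cat, tech in detections:
        if cat == category:
            sites_by_tech[tech].add(site)

    # Denominator (assumed): sites with at least one detection in this category.
    total = len(set().union(*sites_by_tech.values()))

    return {
        tech: {"adoption": len(sites), "share": len(sites) / total}
        for tech, sites in sites_by_tech.items()
    }

detections = [
    ("a.example", "CMS", "WordPress"),
    ("b.example", "CMS", "WordPress"),
    ("c.example", "CMS", "Drupal"),
    ("a.example", "Analytics", "SomeAnalytics"),
]
cms_stats = adoption_and_share(detections, "CMS")
```

Dimensions like rank or country could be handled by filtering `detections` before calling the function, which keeps the calculation itself independent of the category being explored.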

Similar to the CWV Technology Report, it could be useful to apply dimensions to the stats, like ranking and country. @jdevalk also suggested slicing by "new" sites.

tunetheweb · 2022-05-04T22:17:34Z

If slicing by new sites, we probably want to avoid the long tail of sites that drop in and out of our dataset depending on traffic that month, but aren’t really new, just low-traffic sites.

Could exclude any new sites in the largest 10m rank, and only look at new sites in top 1m or 100k sites that either haven’t appeared at all before or only in top 10m previously.
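As a rough sketch of that rule, assuming each site carries a rank bucket per month (for example 100_000, 1_000_000, 10_000_000) and that we can look up the buckets it had in earlier months. The function name, the input shape, and the default threshold are hypothetical.

```python
TOP_THRESHOLD = 1_000_000  # assumed cutoff; 100_000 would be the stricter option

def is_new_to_top(current_rank, previous_ranks, threshold=TOP_THRESHOLD):
    """current_rank: this month's rank bucket.
    previous_ranks: rank buckets from earlier months, empty if never seen."""
    if current_rank > threshold:
        return False  # still in the long tail, so don't count it as new
    # New if never seen before, or only ever seen outside the top bucket.
    return all(rank > threshold for rank in previous_ranks)
```

For example, `is_new_to_top(1_000_000, [10_000_000, 10_000_000])` is `True`, while a site that only ever shows up in the 10m bucket is never counted.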

jdevalk · 2022-05-05T06:00:06Z

@tunetheweb I was actually hoping we could find a source for truly new sites; sites that are just hitting the web.

tunetheweb · 2022-05-05T06:48:42Z

Not aware of any, to be honest. We could use meta dates but they are notoriously unreliable.

“New to top million” or similar is the best way I can think of to measure this. It would then also include sites that launched maybe a few months ago but are only now getting serious traffic/traction.

Maybe, once we figure out the algorithm to measure this, we can become that source 😁

rviscomi · 2022-05-05T19:08:26Z

Am I oversimplifying or can we just check to see if the website had ever been in the dataset?

jdevalk · 2022-05-05T19:10:56Z

@rviscomi ok, can I be really cheeky? I was hoping to “add” a bit to the dataset, so “on top”, not “within”. I think a certain search engine would know about some sites new to them?

rviscomi · 2022-05-05T19:15:16Z

I think we can only assume we're able to work with the data already publicly available to us.

Beyond "have we seen this URL before" we could also look at resource freshness data like the Last-Modified header of 1P content. If this were truly a new site, we wouldn't expect to see 3-year-old content, for example. It might still take time for a new site to reach the popularity threshold to be included in CrUX and ultimately HTTP Archive, as @tunetheweb noted.
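A minimal sketch of the freshness check, assuming we already have the `Last-Modified` header values for a site's first-party responses as strings. The function name and the one-year cutoff are assumptions, and the header can be missing or wrong, so this is a hint at best.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime

def looks_recently_created(last_modified_values, crawl_date, max_age=timedelta(days=365)):
    """last_modified_values: Last-Modified header strings from first-party responses.
    crawl_date: timezone-aware datetime of the crawl."""
    dates = []
    for value in last_modified_values:
        try:
            parsed = parsedate_to_datetime(value)
        except (TypeError, ValueError):
            continue  # skip malformed headers
        if parsed.tzinfo is None:
            parsed = parsed.replace(tzinfo=timezone.utc)
        dates.append(parsed)
    if not dates:
        return None  # no usable evidence either way
    # Old first-party content suggests the site is not actually new.
    return crawl_date - min(dates) <= max_age

crawl = datetime(2022, 5, 1, tzinfo=timezone.utc)
old_site = looks_recently_created(["Wed, 21 Oct 2015 07:28:00 GMT"], crawl)
new_site = looks_recently_created(["Fri, 01 Apr 2022 00:00:00 GMT"], crawl)
```

Returning `None` when no header parses keeps "unknown" separate from "old", which matters if many responses omit the header.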

@tomvangoethem or @nrllh might also be interested in this problem from a research perspective.

rviscomi · 2022-05-05T19:16:24Z

Perhaps worth forking the "new site" dimension from the technology adoption report for now.

tomvangoethem · 2022-05-05T21:31:51Z

From what I understand, with the "new site" dimension you're mainly interested in sites that were created/developed recently? How about using Certificate Transparency logs for that? Should be feasible to determine when a site's first certificate was issued (or, given that domains expire and get reused: the last time that the site did not have a valid certificate for a certain period of time).

Accessing CT logs might be a bit tricky though; depending on the number of sites to test, it might be feasible using the crt.sh or censys.io APIs. Censys also provides access to their data on BigQuery for research purposes (not sure if that would fall under "publicly available to us"?). Ingesting CT logs into the HTTP Archive dataset might also be an interesting option. Perhaps there's some other data sources that I don't know about?
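To illustrate the "first certificate, or last gap without a valid certificate" idea, here is a sketch that assumes the validity periods for a site's certificates have already been pulled from some CT source. It does not call any real API; the function name, the input shape, and the 90-day gap are hypothetical.

```python
from datetime import datetime, timedelta

def current_coverage_start(validity_periods, max_gap=timedelta(days=90)):
    """validity_periods: (not_before, not_after) datetime pairs, one per certificate.
    Returns the start of the most recent run of certificate coverage."""
    start = None
    covered_until = None
    for not_before, not_after in sorted(validity_periods):
        if covered_until is None or not_before - covered_until > max_gap:
            # First certificate, or a gap long enough to treat the domain as reused.
            start = not_before
            covered_until = not_after
        else:
            covered_until = max(covered_until, not_after)
    return start

periods = [
    (datetime(2015, 1, 1), datetime(2016, 1, 1)),
    (datetime(2015, 12, 1), datetime(2016, 12, 1)),
    (datetime(2021, 6, 1), datetime(2021, 9, 1)),
    (datetime(2021, 8, 15), datetime(2021, 11, 15)),
]
site_start = current_coverage_start(periods)
```

Here the multi-year gap after 2016 resets the start date to the 2021 certificate, which is the behavior described for expired and reused domains.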

rviscomi assigned jdevalk, rviscomi and tunetheweb May 4, 2022
