Semantic fingerprinting of scientific journals using the cortical.io API and visualization with Bokeh

bcschiffler/journals-semantic-fingerprints

Semantic fingerprinting of scientific journals

To compare texts for similarity, it helps to have an aggregate representation of each text. The notebook in this repo displays the semantic overlap in content between major scientific journals in the biomedical field.

We use the cortical.io API to get a semantic fingerprint for the abstracts of each journal, compare the fingerprints using the Jaccard distance metric, and finally plot them in an interactive map using Bokeh.
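The comparison step can be illustrated with a minimal sketch. It assumes that each fingerprint is available as a collection of active positions (integer indices), which is how a sparse binary fingerprint is commonly represented; the journal names and position values below are hypothetical toy data, and the helper functions `jaccard_distance` and `pairwise_distances` are illustrative names, not code taken from the notebook.

```python
from itertools import combinations


def jaccard_distance(a, b):
    """Jaccard distance between two fingerprints given as collections of active positions."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Assumption: two empty fingerprints are treated as identical.
        return 0.0
    return 1.0 - len(a & b) / len(union)


def pairwise_distances(fingerprints):
    """Return {(name_a, name_b): distance} for every unordered pair of names."""
    return {
        (m, n): jaccard_distance(fingerprints[m], fingerprints[n])
        for m, n in combinations(sorted(fingerprints), 2)
    }


# Hypothetical toy fingerprints; real ones would contain many more positions.
fingerprints = {
    "journal_a": [1, 5, 9, 12],
    "journal_b": [1, 5, 20, 31],
    "journal_c": [40, 41, 42, 43],
}

distances = pairwise_distances(fingerprints)
for pair, dist in distances.items():
    print(pair, round(dist, 3))
```

A distance of 0 means the two fingerprints share all active positions, and a distance of 1 means they share none. The resulting pairwise distances are the kind of input one could then project into two dimensions for plotting.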

Semantic fingerprinting is a technique that embeds a word or text in a context so that its conceptual links to other concepts are revealed. There are many ways to embed a text in a vector space. The method shown in this notebook relies on semantic folding, which has its origins in theoretical ideas about how the brain could store information, e.g., theories about sparse distributed representations. Find more information about semantic fingerprinting here.

The data used in this notebook (a sample of 200 abstracts for each journal) stems from PubMed queries and can be obtained, for example, using the scripts in this repository. With adjustments, however, the notebook can be used to gather and compare semantic fingerprints for any text-based data.

The final output HTML file is hosted here.
