Knowledge sources
rspeer edited this page Sep 27, 2011 · 13 revisions
Here's a checklist of sources we may include when building ConceptNet 5.
- OMCS assertions (represented similarly to ConceptNet 4) -- Rob will work on this
- Rule-based extractions from OMCS assertions (for example, the ones that made conceptnet_en_big.graph) -- Rob will work on this
- Scraping Wiktionary for definitions and translations -- Rob will work on this
- Verbosity
- Links among articles on Wikipedia
- ReVerb, Creative Commons version (lots of highly specific assertions from CC websites, alignable with ConceptNet) -- claimed by Yen-Ling
- WordNet (interesting alignment problem)
- XKCD color survey (and someone should make a cool visualization of it for Sponsor Week)
- VerbNet / FrameNet / PropBank (I get these confused with each other. Probably one or two of them can tell us very useful things about verb structure that aren't in any Linked Data project yet.)
- DBpedia (de facto Semantic Web standard, but ugly to actually query)
- Freebase (we have some code for it already, and it has lots of overlap with DBpedia; might want to be selective as it has hundreds of millions of assertions)
- Google N-grams 2006 (gets associations between words; problems with spam, stupidity, and sheer volume)
- Google Books 2009 collocations (ginormous, but we wouldn't attempt to use nearly all of it; need a good querying interface)