Knowledge sources

rspeer edited this page Oct 9, 2011 · 13 revisions

Here's a checklist of sources we may include when building ConceptNet 5.

Rob intends to start chugging through these sources faster when there's a way to pour in data through Gremlin instead of through the relatively slow REST API.

Source checklist

  • OMCS Chinese: done
  • ReVerb: done
  • Wikipedia links (Hooyoung?)
  • DBpedia categories (Justin)
  • Verbosity: done
  • OMCS English (ConceptNet 4)
  • OMCS Japanese
  • OMCS Portuguese
  • Wiktionary translations: preparing
  • GoalNet, from WikiHow and OMICS
  • Rule-based extractions from ConceptNet 4
  • WordNet

Potential future sources

  • XKCD color survey (and someone should make a cool visualization of it for Sponsor Week)
  • VerbNet / FrameNet / PropBank (I get these confused with each other. Probably one or two of them can tell us very useful things about verb structure that aren't in any Linked Data project yet.)
  • Freebase (we have some code for it already, and it overlaps heavily with DBpedia; we might want to be selective, as it has hundreds of millions of assertions)
  • Google N-grams 2006 (gets associations between words; problems with spam, stupidity, and sheer volume)
  • Google Books 2009 collocations (ginormous, but we wouldn't attempt to use nearly all of it; need a good querying interface)