Knowledge sources

rspeer edited this page Apr 30, 2012 · 13 revisions

Here's a checklist of sources we include or may someday include when building ConceptNet 5.

Source checklist

Done

  • OMCS English, Portuguese, Japanese, Dutch, Korean, French (ConceptNet 4)
  • OMCS Chinese
  • Filtered output from ReVerb over Wikipedia
  • GlobalMind
  • Verbosity
  • Wiktionary translations
  • WordNet (word senses should possibly be revised)
  • DBPedia's type relationships

On hold

  • GoalNet, from WikiHow and OMICS. We're missing some structure here; we need to be able to tell which list of steps corresponds to a particular plan, not just the steps for all plans together.
  • Full DBPedia. Maybe what we really want is just the nodes and simple is-a relationships, and an interface for extracting further information from DBPedia?

Potential future sources

  • Generalized ReVerb links (such as "TakesObject")
  • Wikipedia links
  • Rule-based extractions from ConceptNet 4
  • XKCD color survey
  • VerbNet / FrameNet / PropBank (I get these confused with each other. One or two of them can probably tell us very useful things about verb structure that aren't in any Linked Data project yet.)
  • Freebase (we already have some code for it, and it overlaps heavily with DBPedia; we might want to be selective, since it has hundreds of millions of assertions)
  • Google N-grams 2006 (gives associations between words; has problems with spam, stupidity, and sheer volume)
  • Google Books 2009 collocations (ginormous, but we wouldn't attempt to use nearly all of it; using it would probably involve accessing it through Amazon)
  • OpenCyc (but we'd need to decide what to do with compound concepts that are written as CycL expressions)

Sources we probably can't use

  • EuroWordNet. It seems to be under a very restrictive license.