
Changelog

Rob Speer edited this page Aug 27, 2014 · 44 revisions

ConceptNet 5.3.0 (unreleased)

ConceptNet 5.3 is being developed in the Git branch "version5.3". It introduces changes in many areas:

New API and search features

  • The search index is now implemented in pure Python, using SQLite. Solr is no longer a dependency.
  • The API now uses this new search index. One effect of this is that it matches only complete path components, not any prefix of a URI. Searching for "/c/en/cat" will get "/c/en/cat" and "/c/en/cat/n/animal", but not "/c/en/catamaran".
  • Exact matches are also possible. Searching for "/c/en/cat/." will find only "/c/en/cat".
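The matching rule described above can be illustrated with a minimal sketch. This is not ConceptNet's actual index code (which uses SQLite); the function name `matches` is hypothetical, and the sketch only assumes that URIs are slash-separated paths and that a trailing "/." requests an exact match.

```python
def matches(query, uri):
    """Return True if `uri` matches `query` by complete path components.

    Hypothetical helper for illustration; not part of the ConceptNet API.
    """
    # A trailing "/." asks for an exact match only.
    if query.endswith("/."):
        return uri == query[:-2]
    query_parts = query.strip("/").split("/")
    uri_parts = uri.strip("/").split("/")
    # Each query component must equal the URI component in the same position,
    # so "cat" does not match "catamaran".
    return uri_parts[:len(query_parts)] == query_parts


uris = ["/c/en/cat", "/c/en/cat/n/animal", "/c/en/catamaran"]
prefix_hits = [u for u in uris if matches("/c/en/cat", u)]
exact_hits = [u for u in uris if matches("/c/en/cat/.", u)]
```

Comparing whole components rather than raw string prefixes is what keeps "/c/en/catamaran" out of the results for "/c/en/cat".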

New and updated data

  • ConceptNet now imports data from Umbel, a Semantic Web-style view of OpenCyc.
  • Indonesian (id) and Malay (ms) concepts have been unified into the Malay macrolanguage (also designated ms), similarly to the way we already unify Chinese, because of their highly overlapping vocabularies. In the next version, we may be able to make distinctions between languages within a macrolanguage when necessary.
  • We've implemented a better Wiktionary reader using Grako, a framework for writing recursive parsers in Python. This parser is able to understand the structure of a Wiktionary entry, giving more results and fewer errors than the previous reader.
  • Wiktionary parsing now covers entries written in German as well as English. (As before, the entries are about words in hundreds of languages.)

Edges in msgpack format

The intermediate format for lists of ConceptNet edges is now msgpack instead of JSON. msgpack can represent the same data structures as JSON, but it saves disk space and parsing time.

Updated assoc-space building

The "assoc-space", a dimensionality-reduced vector space that represents which words are similar to other words, uses an updated version of the assoc_space package. It can now be built in shards that are combined to form the complete space, instead of having to be built all at once, which makes it possible to build using a reasonable amount of RAM.

Getting a subset of ConceptNet under the CC-By license

Some of ConceptNet's data is available under the Creative Commons Attribution (CC-By) license, even though the dataset as a whole requires the Creative Commons Attribution-ShareAlike license (CC-By-SA). This information is marked on each edge, but in ConceptNet 5.2, there was no easy way to get the CC-By subset.

By now, there are enough CC-By-SA data sources that it doesn't make sense to attempt a complete build of ConceptNet without them. However, ConceptNet 5.3's downloads include a file containing only the CC-By edges, as individual edges that aren't grouped into assertions.
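If you are working from a full list of edges instead of the prepared download, the same subset could be extracted by filtering on each edge's license. The sketch below assumes that each edge is a dictionary with a "license" field and that CC-By is written as "/l/CC/By"; both the field name and the value are assumptions here, and the sample edges are made up for illustration.

```python
# Assumed license URI for CC-By edges; check the edge data you actually have.
CC_BY = "/l/CC/By"


def cc_by_subset(edges):
    """Yield only the edges whose license is plain CC-By (hypothetical helper)."""
    for edge in edges:
        if edge.get("license") == CC_BY:
            yield edge


# Made-up sample edges, not real ConceptNet data.
edges = [
    {"rel": "/r/IsA", "start": "/c/en/cat", "end": "/c/en/animal",
     "license": "/l/CC/By"},
    {"rel": "/r/IsA", "start": "/c/en/cat", "end": "/c/en/pet",
     "license": "/l/CC/By-SA"},
]
subset = list(cc_by_subset(edges))
```

The filter works on individual edges, which is consistent with the CC-By file containing edges that aren't grouped into assertions.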

Deprecation of Python 2

ConceptNet 5.3's support code still runs on Python 2, but we would like to drop support for Python 2 in an upcoming version. As has been the case since version 5.2, the data cannot be built correctly on Python 2.

ConceptNet 5.2.3 (2014 June 27)

  • Fix a typo in the Makefile that prevented it from downloading the initial raw data.
  • Enforce the rate limit in the API.
  • Merge in NLP code from metanl, instead of having it as an external dependency. The dependency is now on the simpler package ftfy.
  • Add a MANIFEST.in so that the necessary data can still be found after a pip install or setup.py install.

ConceptNet 5.2.2 (2014 April 16)

  • Fix the accidental omission of nadya.jp data.

ConceptNet 5.2.1 (2014 April 8)

5.2.1 is a significant revision to the code that builds ConceptNet, but it retains mostly the same representation and almost all of the same knowledge as 5.2.0. The cases where they differ are largely due to bugs that were discovered in the refactor.

  • Reorganized much of the code for working with nodes and edges in ConceptNet.
  • The code is now designed for Python 3 as its primary environment. A small amount of compatibility code makes sure that it will still run on Python 2.7 as well, but it will not necessarily get the same results from all Unicode operations.
  • Removed a fair amount of dead code.
  • Added test cases that cover most of the code; removed tests for 5.0 that clearly wouldn't work anymore.
  • Combined assertions (such as what the 5.2 API returns) keep track of their full list of sources and their first-seen dataset, so they can be searched like edges in 5.1.

A change will be noticeable in the Web API, because for a while it was serving the union of ConceptNet 5.1 and 5.2 data structures, with both separate edges and combined assertions. Now it is only serving the combined assertions. The results should be similar, but with less duplication.

ConceptNet 5.2 (2013 September 17)

  • The set of knowledge sources has changed. JMdict is in. ReVerb is out, because we couldn't filter it well enough.
  • Some bugs in building from existing sources were fixed.
  • ConceptNet can now be built from its raw data using a Makefile. (See Build process)
  • The code comes with everything you need to build and query "assoc spaces" -- vector spaces representing semantic connections between concepts -- thanks to the open-source release of assoc_space by Luminoso.
  • The API now returns one result per assertion, even if that assertion comes from multiple sources.
  • Because of that, the representation of knowledge sources has changed. The sources used to be lists of reasons that an assertion got added, and each one implicitly represented a conjunction. The "sources" field in the API now always contains one element for each assertion, and that element contains the full AND-OR tree of sources.
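The change in the representation of sources can be sketched roughly as follows. This is an illustration only: the function `combine_edges`, the tuple-based tree, and the source names are all hypothetical, and the real "sources" field may be serialized differently. The idea shown is that each edge contributes one conjunction (AND) of sources, and the combined assertion holds the disjunction (OR) of those conjunctions as a single element.

```python
def combine_edges(edges):
    """Group edges into assertions, merging their sources into an AND-OR tree.

    Hypothetical helper; the tuple-based tree is an assumed representation.
    """
    grouped = {}
    for edge in edges:
        key = (edge["rel"], edge["start"], edge["end"])
        # Each edge's source list is implicitly a conjunction.
        conjunction = ("and", sorted(edge["sources"]))
        grouped.setdefault(key, []).append(conjunction)
    return [
        {
            "rel": rel,
            "start": start,
            "end": end,
            # Exactly one element, holding the full AND-OR tree.
            "sources": [("or", conjunctions)],
        }
        for (rel, start, end), conjunctions in grouped.items()
    ]


# Made-up edges with hypothetical source names.
edges = [
    {"rel": "/r/IsA", "start": "/c/en/cat", "end": "/c/en/animal",
     "sources": ["/s/contributor/example/alice", "/s/activity/example/web_form"]},
    {"rel": "/r/IsA", "start": "/c/en/cat", "end": "/c/en/animal",
     "sources": ["/s/rule/example_parser"]},
    {"rel": "/r/IsA", "start": "/c/en/dog", "end": "/c/en/animal",
     "sources": ["/s/rule/example_parser"]},
]
combined = combine_edges(edges)
```

Here the two "cat IsA animal" edges collapse into one assertion whose single "sources" element records both reasons it was added.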

ConceptNet 5.1 (2012 April 30)

Version 5.1 has a new, simpler representation of nodes and edges than ConceptNet 4.0 or 5.0, making it suitable to represent ConceptNet 5 with downloadable flat files and efficient search indexes.

  • Made base URIs shorter. For example, /concept/en/dog becomes /c/en/dog.
  • Changed the representation of assertions. Each assertion is a bundle of edges (hyperedges, really) that connect two arguments and a relation. These edges are labeled with all the appropriate metadata.
  • Created JSON and CSV flat-files.
  • Created a Solr index and an accompanying API. The MongoDB database is deprecated.

ConceptNet 5.1.1 was an incremental update that maintains full API compatibility with 5.1.

ConceptNet 5.0 (2011 October 28)

  • First API for ConceptNet 5.
  • All assertions were reified as nodes, with edges for arguments. This turned out to be an ineffective representation.