louismullie · pabramowitsch · Dec 11, 2014 · Dec 12, 2014
diff --git a/README.md b/README.md
@@ -1,13 +1,16 @@
 [![Build Status](https://secure.travis-ci.org/louismullie/stanford-core-nlp.png)](http://travis-ci.org/louismullie/stanford-core-nlp)
 
 **About**
-
+  
 This gem provides high-level Ruby bindings to the [Stanford Core NLP package](http://nlp.stanford.edu/software/corenlp.shtml), a set natural language processing tools for tokenization, sentence segmentation, part-of-speech tagging, lemmatization, and parsing of English, French and German. The package also provides named entity recognition and coreference resolution for English.
 
-This gem is compatible with Ruby 1.9.2 and 1.9.3 as well as JRuby 1.7.1. It is tested on both Java 6 and Java 7.
+This gem is compatible with Ruby 1.9.2, 1.9.3, and 2.1.1 as well as JRuby 1.7.1. It is tested on both Java 6 and Java 7.
+It is only compatible with Stanford CoreNLP release 3.4.1 and above; the package was substantially repackaged between versions 3.3 and 3.4.
 
 **Installing**
 
+NOTE: Please see the instructions under "Using the latest version of the Stanford CoreNLP" below. The packaging of the Stanford release has changed.
+
 First, install the gem: `gem install stanford-core-nlp`. Then, download the Stanford Core NLP JAR and model files. Two packages are available:
 
 * A [minimal package](http://louismullie.com/treat/stanford-core-nlp-minimal.zip) with the default tagger and parser models for English, French and German.
@@ -71,7 +74,7 @@ text.get(:sentences).each do |sentence|
     puts token.get(:named_entity_tag).to_s
     # Coreference
     puts token.get(:coref_cluster_id).to_s
-    # Also of interest: coref, coref_chain,
+    # Also of interest: coref, coref_chain, 
     # coref_cluster, coref_dest, coref_graph.
   end
 end
@@ -81,7 +84,7 @@ end
 
 The Ruby symbol (e.g. `:named_entity_tag`) corresponding to a Java annotation class is the `snake_case` of the class name, with 'Annotation' at the end removed. For example, `NamedEntityTagAnnotation` translates to `:named_entity_tag`, `PartOfSpeechAnnotation` to `:part_of_speech`, etc.
 
-A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the `config.rb` file inside the gem.
+A good reference for names of annotations are the Stanford Javadocs for [CoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/ling/CoreAnnotations.html), [CoreCorefAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/dcoref/CorefCoreAnnotations.html), and [TreeCoreAnnotations](http://nlp.stanford.edu/nlp/javadoc/javanlp/edu/stanford/nlp/trees/TreeCoreAnnotations.html). For a full list of all possible annotations, see the `config.rb` file inside the gem. 
 
 
 **Loading specific classes**
@@ -90,12 +93,12 @@ You may want to load additional Java classes (including any class from the Stanf
 
 ```ruby
 # Default base class is edu.stanford.nlp.pipeline.
-StanfordCoreNLP.load_class('PTBTokenizerAnnotator')
+StanfordCoreNLP.load_class('PTBTokenizerAnnotator')  
 puts StanfordCoreNLP::PTBTokenizerAnnotator.inspect
   # => #<Rjb::Edu_stanford_nlp_pipeline_PTBTokenizerAnnotator>
 
 # Here, we specify another base class.
-StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger')
+StanfordCoreNLP.load_class('MaxentTagger', 'edu.stanford.nlp.tagger') 
 puts StanfordCoreNLP::MaxentTagger.inspect
   # => <Rjb::Edu_stanford_nlp_tagger_maxent_MaxentTagger:0x007f88491e2020>
 ```
@@ -147,24 +150,26 @@ To run the specs for each language (after copying the JARs into the `bin` folder
 
 **Using the latest version of the Stanford CoreNLP**
 
-Using the latest version of the Stanford CoreNLP (version 3.3.1 as of 6/1/2014) requires some additional manual steps:
+Using the latest version of the Stanford CoreNLP (version 3.4.1 as of 8/27/2014) requires some additional manual steps:
 
-* Download [Stanford CoreNLP version 3.3.1](http://nlp.stanford.edu/software/stanford-corenlp-full-2014-01-04.zip) from http://nlp.stanford.edu/.
+* Download [Stanford CoreNLP version 3.4.1](http://nlp.stanford.edu/software/stanford-corenlp-full-2014-08-27.zip) from http://nlp.stanford.edu/.
 * Place the contents of the extracted archive inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/) or inside the directory location configured by setting StanfordCoreNLP.jar_path.
-* Download [the full Stanford Tagger version 3.3.1](http://nlp.stanford.edu/software/stanford-postagger-full-2014-01-04.zip) from http://nlp.stanford.edu/.
+* Extract the contents of the stanford-corenlp-3.4.1-models.jar file into the bin folder. The gem accesses both the JAR and its exploded file structure. Note that if you place the exploded Stanford model files outside
+  the gem's bin folder, StanfordCoreNLP.model_path should be set to the root of that file structure.
+* Download [the full Stanford Tagger version 3.4.1](http://nlp.stanford.edu/software/stanford-postagger-2014-08-27.zip) from http://nlp.stanford.edu/.
 * Make a directory named 'taggers' inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/) or inside the directory configured by setting StanfordCoreNLP.jar_path.
 * Place the contents of the extracted archive inside taggers directory.
 * Download [the bridge.jar file](https://github.com/louismullie/stanford-core-nlp/blob/master/bin/bridge.jar?raw=true) from https://github.com/louismullie/stanford-core-nlp.
-* Place the downloaded bridger.jar file inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/taggers/) or inside the directory configured by setting StanfordCoreNLP.jar_path.
+* Place the downloaded bridge.jar file inside the /bin/ folder of the stanford-core-nlp gem (e.g. [...]/gems/stanford-core-nlp-0.x/bin/) or inside the directory configured by setting StanfordCoreNLP.jar_path.
 * Configure your setup (for English) as follows:
 ```ruby
 StanfordCoreNLP.use :english
 StanfordCoreNLP.model_files = {}
 StanfordCoreNLP.default_jars = [
   'joda-time.jar',
   'xom.jar',
-  'stanford-corenlp-3.3.1.jar',
-  'stanford-corenlp-3.3.1-models.jar',
+  'stanford-corenlp-3.4.1.jar',
+  'stanford-corenlp-3.4.1-models.jar',
   'jollyday.jar',
   'bridge.jar'
 ]
@@ -178,8 +183,8 @@ StanfordCoreNLP.set_model('pos.model', 'french.tagger')
 StanfordCoreNLP.default_jars = [
   'joda-time.jar',
   'xom.jar',
-  'stanford-corenlp-3.3.1.jar',
-  'stanford-corenlp-3.3.1-models.jar',
+  'stanford-corenlp-3.4.1.jar',
+  'stanford-corenlp-3.4.1-models.jar',
   'jollyday.jar',
   'bridge.jar'
 ]
@@ -193,16 +198,17 @@ StanfordCoreNLP.set_model('pos.model', 'german-fast.tagger')
 StanfordCoreNLP.default_jars = [
   'joda-time.jar',
   'xom.jar',
-  'stanford-corenlp-3.3.1.jar',
-  'stanford-corenlp-3.3.1-models.jar',
+  'stanford-corenlp-3.4.1.jar',
+  'stanford-corenlp-3.4.1-models.jar',
   'jollyday.jar',
   'bridge.jar'
 ]
 end
 ```
+
 **Contributing**
 
 Simple.
 
 1. Fork the project.
-2. Send me a pull request!
+2. Send me a pull request!
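
The README hunk above states the naming rule for annotation symbols: take the Java class name, drop the trailing 'Annotation', and convert the rest to `snake_case`. A minimal sketch of that rule in plain Ruby follows. The helper name `annotation_symbol` is hypothetical and is not part of the gem's API; the gem's own conversion may treat edge cases (for example, runs of capital letters) differently.

```ruby
# Hypothetical helper (not part of the gem): derives the Ruby symbol for a
# Java annotation class name, following the rule stated in the README.
def annotation_symbol(class_name)
  # Drop the trailing 'Annotation' suffix, if present.
  base = class_name.sub(/Annotation\z/, '')
  # Insert '_' at each lowercase-or-digit to uppercase boundary, then downcase.
  snake = base.gsub(/([a-z\d])([A-Z])/, '\1_\2').downcase
  snake.to_sym
end

puts annotation_symbol('NamedEntityTagAnnotation').inspect  # => :named_entity_tag
puts annotation_symbol('PartOfSpeechAnnotation').inspect    # => :part_of_speech
```

The two calls reproduce the examples given in the README text (`:named_entity_tag` and `:part_of_speech`).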
diff --git a/lib/stanford-core-nlp.rb b/lib/stanford-core-nlp.rb
@@ -26,16 +26,16 @@ module StanfordCoreNLP
   StanfordCoreNLP.log_file = nil
 
   # Default JAR files to load.
+  # Note: must be version 3.4.1 or above.
   StanfordCoreNLP.default_jars = [
     'joda-time.jar',
     'xom.jar',
-    'stanford-parser.jar',
     'stanford-corenlp.jar',
-    'stanford-segmenter.jar',
     'jollyday.jar',
     'bridge.jar'
   ]
 
+
   # Default classes to load.
   StanfordCoreNLP.default_classes = [
     ['StanfordCoreNLP', 'edu.stanford.nlp.pipeline', 'CoreNLP'],
@@ -57,15 +57,15 @@ module StanfordCoreNLP
 
   require 'stanford-core-nlp/bridge'
   extend StanfordCoreNLP::Bridge
-
+  
   class << self
     # The model file names for a given language.
     attr_accessor :model_files
     # The folder in which to look for models.
     attr_accessor :model_path
     # Store the language currently being used.
     attr_accessor :language
-    #Custom properties
+    # Custom properties
     attr_accessor :custom_properties
   end
 
@@ -75,7 +75,7 @@ class << self
   # with the individual models inside. By default, this
   # is the same as the JAR path.
   self.model_path = self.jar_path
-
+  
   # ########################### #
   # Public configuration params #
   # ########################### #
@@ -106,7 +106,7 @@ def self.use(language)
 
   # Use english by default.
   self.use :english
-
+  
   # Set a model file.
   def self.set_model(name, file)
     n = name.split('.')[0].intern
@@ -118,7 +118,7 @@ def self.set_model(name, file)
   # ########################### #
 
   def self.bind
-
+    
     # Take care of Windows users.
     if self.running_on_windows?
       self.jar_path.gsub!('/', '\\')
@@ -133,16 +133,16 @@ def self.bind
       klass = const_get(info.first)
       self.inject_get_method(klass)
     end
-
+  
   end
-
+  
   # Load a StanfordCoreNLP pipeline with the
   # specified JVM flags and StanfordCoreNLP
   # properties.
   def self.load(*annotators)
-
+    
     self.bind unless self.bound
-
+    
     # Prepend the JAR path to the model files.
     properties = {}
     self.model_files.each do |k,v|
@@ -160,7 +160,7 @@ def self.load(*annotators)
       end
       properties[k] = f
     end
-
+    
     properties['annotators'] = annotators.map { |x| x.to_s }.join(', ')
 
     unless self.language == :english
@@ -172,46 +172,46 @@ def self.load(*annotators)
       # Otherswise throws java.lang.NullPointerException: null.
       properties['parse.buildgraphs'] = 'false'
     end
-
+    
     # Bug fix for NER system. Otherwise throws:
     # Error initializing binder 1 at edu.stanford.
     # nlp.time.Options.<init>(Options.java:88)
     properties['sutime.binders'] = '0'
-
+    
     # Manually include SUTime models.
     if annotators.include?(:ner)
-      properties['sutime.rules'] =
-      self.model_path + 'sutime/defs.sutime.txt, ' +
-      self.model_path + 'sutime/english.sutime.txt'
+      properties['sutime.rules'] = 
+      self.model_path + './edu/stanford/nlp/models/sutime/defs.sutime.txt, ' +
+      self.model_path + './edu/stanford/nlp/models/sutime/english.sutime.txt'
     end
-
+    
     props = get_properties(properties)
-
+    
     # Hack for Java7 compatibility.
     bridge = const_get(:AnnotationBridge)
     bridge.getPipelineWithProperties(props)
 
   end
-
+  
   # Hack in order not to break backwards compatibility.
   def self.const_missing(const)
     if const == :Text
       puts "WARNING: StanfordCoreNLP::Text has been deprecated." +
       "Please use StanfordCoreNLP::Annotation instead."
       Annotation
-    else
+    else 
       super(const)
     end
   end
 
   private
-
+  
   # Create a java.util.Properties object from a hash.
   def self.get_properties(properties)
     properties = properties.merge(self.custom_properties)
     props = Properties.new
     properties.each do |property, value|
-      props.set_property(property.to_s, value.to_s)
+      props.set_property(property, value)
     end
     props
   end
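
The last hunk above removes the `to_s` calls in `get_properties`. `java.util.Properties#setProperty` takes two Java strings, so any symbol key or non-string value, for instance one supplied through `custom_properties`, would now reach the bridge unconverted. One way to keep the call site simple while still guaranteeing strings is to normalize the hash first. This is a sketch only: `stringify_properties` is a hypothetical helper, and whether the bridge coerces such values on its own is an assumption that has not been checked here.

```ruby
# Hypothetical helper (not part of the gem): returns a copy of the hash
# with every key and value converted to a String.
def stringify_properties(properties)
  properties.each_with_object({}) do |(key, value), out|
    out[key.to_s] = value.to_s
  end
end

# Illustrative input mixing symbol keys and non-string values.
raw = { :annotators => 'tokenize, ssplit', 'sutime.binders' => 0 }
puts stringify_properties(raw).inspect
```

With such a step in place, the loop in `get_properties` could pass `property` and `value` through unchanged, as the new code does.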

diff --git a/lib/stanford-core-nlp/config.rb b/lib/stanford-core-nlp/config.rb
@@ -12,10 +12,10 @@ class Config
 
     # Folders inside the JAR path for the models.
     ModelFolders = {
-      :pos => 'taggers/',
-      :parse => 'grammar/',
-      :ner => 'classifiers/',
-      :dcoref => 'dcoref/'
+      :pos => 'edu/stanford/nlp/models/pos-tagger/english-left3words/',
+      :parse => '/edu/stanford/nlp/models/lexparser/',
+      :ner => '/edu/stanford/nlp/models/ner/',
+      :dcoref => '/edu/stanford/nlp/models/dcoref/'
     }
 
     # Tag sets used by Stanford for each language.
@@ -41,7 +41,7 @@ class Config
       },
 
       :ner => {
-        :english => 'all.3class.distsim.crf.ser.gz'
+        :english => 'english.all.3class.distsim.crf.ser.gz'
         # :german => {} # Add this at some point.
       },
 
@@ -351,7 +351,7 @@ class Config
         'ConstraintAnnotation'
       ],
 
-      'nlp.trees.semgraph.SemanticGraphCoreAnnotations' => [
+      'nlp.semgraph.SemanticGraphCoreAnnotations' => [
         'BasicDependenciesAnnotation',
         'CollapsedCCProcessedDependenciesAnnotation',
         'CollapsedDependenciesAnnotation'
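
In the `ModelFolders` hunk above, the `:pos` entry has no leading slash while `:parse`, `:ner`, and `:dcoref` do. If these prefixes are concatenated to `model_path` with plain `+` (an assumption, since the relevant lines are not shown in this diff), the result depends on whether `model_path` ends with a separator. The sketch below shows how `File.join` produces the same path either way. The helper name `model_file_path`, the `/opt/corenlp` root, and the tagger file name are all illustrative.

```ruby
# Hypothetical helper (not part of the gem): builds a model file path from
# a root directory, a folder prefix, and a file name. File.join collapses
# duplicate separators at the boundaries between its arguments.
def model_file_path(model_path, folder, file)
  File.join(model_path, folder, file)
end

# Prefix with a leading slash, root with a trailing slash.
puts model_file_path('/opt/corenlp/',
                     '/edu/stanford/nlp/models/ner/',
                     'english.all.3class.distsim.crf.ser.gz')

# Prefix without a leading slash, root without a trailing slash.
puts model_file_path('/opt/corenlp',
                     'edu/stanford/nlp/models/pos-tagger/english-left3words/',
                     'english-left3words-distsim.tagger')
```

Both calls yield a path with exactly one separator between each component, regardless of how the prefixes are written.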

diff --git a/spec/english_spec.rb b/spec/english_spec.rb
@@ -9,8 +9,8 @@
     StanfordCoreNLP.default_jars = [
       'joda-time.jar',
       'xom.jar',
-      'stanford-corenlp-3.3.1.jar',
-      'stanford-corenlp-3.3.1-models.jar',
+      'stanford-corenlp-3.4.1.jar',
+      'stanford-corenlp-3.4.1-models.jar',
       'jollyday.jar',
       'bridge.jar'
     ]
@@ -57,4 +57,4 @@
     pipeline.annotate(annotation)
     annotation.get(:sentences).size.should eql 2
   end
-end
+end
diff --git a/spec/french_spec.rb b/spec/french_spec.rb
@@ -10,8 +10,8 @@
     StanfordCoreNLP.default_jars = [
       'joda-time.jar',
       'xom.jar',
-      'stanford-corenlp-3.3.1.jar',
-      'stanford-corenlp-3.3.1-models.jar',
+      'stanford-corenlp-3.4.1.jar',
+      'stanford-corenlp-3.4.1-models.jar',
       'jollyday.jar',
       'bridge.jar'
     ]
@@ -36,4 +36,4 @@
       last_char.should eql [7, 8, 11, 16, 20, 23, 28, 35, 38, 46, 47, 50, 54, 56, 57, 58, 64, 67, 75, 76]
     end
   end
-end
+end
diff --git a/spec/german_spec.rb b/spec/german_spec.rb
@@ -10,8 +10,8 @@
     StanfordCoreNLP.default_jars = [
       'joda-time.jar',
       'xom.jar',
-      'stanford-corenlp-3.3.1.jar',
-      'stanford-corenlp-3.3.1-models.jar',
+      'stanford-corenlp-3.4.1.jar',
+      'stanford-corenlp-3.4.1-models.jar',
       'jollyday.jar',
       'bridge.jar'
     ]
@@ -33,4 +33,4 @@
       last_char.should eql [2, 7, 14, 19, 25, 31, 36, 44, 45]
     end
   end
-end
+end
diff --git a/spec/spec_helper.rb b/spec/spec_helper.rb
@@ -27,4 +27,4 @@ def get_information(text, with_name_tag=false, with_coref=false)
 
   [sentences, tokens, tags, lemmas, begin_char, last_char, name_tags, coref_ids]
 
-end
+end