Step 5 Preparing The Database

TBuscher137 edited this page Mar 11, 2016 · 2 revisions
  1. Open your browser and go to localhost:5984/_utils/
    If the database page does not open, something went wrong before this point. Go back and fix the earlier steps before continuing.
  2. Type cd (with no arguments, this returns you to your home directory)
  3. Type cd tools/
  4. Run bash ./getcorpora.sh /home/vagrant/
  5. Run bash ./en-sp-align_words.sh /home/vagrant/ /home/vagrant/tools/mgiza_configfile (This will take ~15 minutes)
  6. Type head -n 99 ../corpora/src_trg.dict.A3.final.part000 > ./sample_data.txt (This shortens the processing time during development by reducing the number of records loaded into the database)
  7. Run python ./parse_mgiza.py /home/vagrant/tools/sample_data.txt ./sample.out
  8. Go to Step 6
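
The sampling in step 6 can be wrapped in a small helper so that a missing input file is reported clearly instead of silently producing an empty sample. This is only a sketch: the function name `make_sample` and its arguments are hypothetical and are not part of the repository's tools.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the repository's tools directory):
# copy the first N lines of an alignment file into a smaller sample file.
make_sample() {
    local src="$1"          # input file, e.g. ../corpora/src_trg.dict.A3.final.part000
    local dest="$2"         # output file, e.g. ./sample_data.txt
    local lines="${3:-99}"  # number of lines to keep; defaults to 99 as in step 6

    # Fail early with a clear message if the alignment output is missing.
    if [ ! -f "$src" ]; then
        echo "make_sample: input file not found: $src" >&2
        return 1
    fi

    # Same operation as step 6: keep only the first $lines lines.
    head -n "$lines" "$src" > "$dest"
}
```

After sourcing the function, `make_sample ../corpora/src_trg.dict.A3.final.part000 ./sample_data.txt` should produce the same file as step 6. A third argument changes the sample size. If the alignment file uses the usual three-lines-per-sentence-pair layout (an assumption worth checking against your own output), a multiple of three keeps the last record whole.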