ocr cleaner has bug with gcc library / scikit image version #9

vsoch · 2019-01-04T20:53:21Z

The entire container's libraries / base image need to be debugged, unfortunately.

>>> maybe_text = dicom.select_text_among_candidates(saved_model)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "user/__init__.py", line 122, in select_text_among_candidates
    model = cPickle.load(fin)
  File "data/__init__.py", line 29, in <module>
    from sklearn.svm import LinearSVC
  File "/opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/__init__.py", line 13, in <module>
    from .classes import SVC, NuSVC, SVR, NuSVR, OneClassSVM, LinearSVC
  File "/opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/classes.py", line 1, in <module>
    from .base import BaseLibLinear, BaseSVC, BaseLibSVM
  File "/opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 8, in <module>
    from . import libsvm, liblinear
ImportError: /opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/libsvm.so: undefined symbol: __cxa_throw_bad_array_new_length

See notes in #8
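The undefined symbol `__cxa_throw_bad_array_new_length` is provided by newer releases of the GNU C++ runtime (libstdc++), so this error usually means `libsvm.so` was built against a newer libstdc++ than the one being loaded at runtime. A minimal sketch for checking that from Python, assuming a Linux system where the runtime is named `libstdc++.so.6` (the helper `library_exports` is hypothetical, not part of any package):

```python
import ctypes


def library_exports(libname, symbol):
    """Return True if the shared library `libname` loads and exports `symbol`.

    Uses the dynamic loader's normal search path, so the answer reflects a
    library that Python could pick up (e.g. inside the container).
    """
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        # The loader could not find or open the library at all.
        return False
    try:
        # Item access looks the symbol up via dlsym(); it raises
        # AttributeError when the library does not export that name.
        lib[symbol]
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: Linux with the GCC runtime soname libstdc++.so.6.
    print(library_exports("libstdc++.so.6", "__cxa_throw_bad_array_new_length"))
```

Running it before and after a fix such as `conda install libgcc` gives a quick hint about whether a newer runtime is visible. ctypes may resolve a different copy of libstdc++ than the one `libsvm.so` ends up linking to, so treat the result as a hint rather than proof.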


danielsnider · 2019-01-04T21:09:47Z

I've seen two people suggest running `conda install libgcc`:
[1] scikit-learn/scikit-learn#7869 (comment)
[2] https://stackoverflow.com/questions/42181453/sklearn-modules-on-ubuntu-oracle-virtual-box-throw-error

Would you have time to try it?

vsoch · 2019-01-04T21:11:41Z

yep!

vsoch · 2019-01-04T21:12:00Z

Lord, I hope the fix is that easy. An image that doesn't reproduce when you rebuild it is my worst nightmare.

vsoch · 2019-01-04T21:17:27Z

It could also help to try installing scikit-learn from conda instead of pip. But I have a terrible feeling there is going to be some new conflict with nolearn (I can't remember off the top of my head why I stayed with Python 2.7 in the first place, but it was some dependency issue).
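To see which route a package actually came from in an existing environment, here is a minimal sketch, assuming the standard conda layout in which every conda-installed package leaves a JSON record under `<prefix>/conda-meta/` (the helper name `conda_managed_packages` is hypothetical):

```python
import glob
import json
import os
import sys


def conda_managed_packages(prefix=sys.prefix):
    """Return the set of package names conda has recorded for `prefix`.

    Assumes the standard conda layout, where each conda-installed package
    leaves a JSON record in <prefix>/conda-meta/. Packages installed with
    pip are not expected to appear there.
    """
    names = set()
    for record in glob.glob(os.path.join(prefix, "conda-meta", "*.json")):
        with open(record) as fh:
            meta = json.load(fh)
        # Each record stores the package name under the "name" key.
        if "name" in meta:
            names.add(meta["name"])
    return names


if __name__ == "__main__":
    managed = conda_managed_packages()
    print("scikit-learn managed by conda:", "scikit-learn" in managed)
```

If `scikit-learn` is missing from that set while `import sklearn` still works, it was most likely installed with pip, which is the situation the conda-instead-of-pip suggestion is meant to change.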

vsoch · 2019-01-04T22:10:37Z

Okay, here is an update! The first error was with libgfortran:

  File "<string>", line 1, in <module>
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/__init__.py", line 170, in <module>
    from . import add_newdocs
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/lib/__init__.py", line 18, in <module>
    from .polynomial import *
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/lib/polynomial.py", line 19, in <module>
    from numpy.linalg import eigvals, lstsq, inv
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/linalg/__init__.py", line 51, in <module>
    from .linalg import *
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 29, in <module>
    from numpy.linalg import lapack_lite, _umath_linalg
ImportError: libgfortran.so.1: cannot open shared object file: No such file or directory
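The ImportError names an exact soname, `libgfortran.so.1`, so it can help to see which libgfortran the system actually offers. A minimal sketch using the standard library's `ctypes.util.find_library` (the wrapper `find_soname` is hypothetical). On Linux, `find_library` mainly consults the system linker cache, so it may not see libraries that live only inside a conda environment's `lib/` directory:

```python
import ctypes.util


def find_soname(short_name, wanted):
    """Report which soname the system resolves for `short_name`.

    Returns (found, matches): `found` is what ctypes.util.find_library
    reports (a file name like "libgfortran.so.N") or None, and `matches`
    says whether it equals the soname the failing extension asked for.
    """
    found = ctypes.util.find_library(short_name)
    return found, found == wanted


if __name__ == "__main__":
    # numpy's lapack_lite in the traceback above asks for libgfortran.so.1.
    print(find_soname("gfortran", "libgfortran.so.1"))
```

A result such as a different major version (or `None`) is consistent with the "cannot open shared object file" error and with needing to pin the older library version.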

I resolved it with:

conda install libgfortran==1

(If you install without pinning the version, you get a different error.) Then I get this error about numpy versions:

/opt/anaconda2/lib/python2.7/site-packages/dask/array/numpy_compat.py:32: RuntimeWarning: divide by zero encountered in divide
  not np.allclose(np.divide(1, .5, dtype='i8'), 2) or
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "user/__init__.py", line 2, in <module>
    from skimage.io import imread
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/io/__init__.py", line 7, in <module>
    from .manage_plugins import *
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 28, in <module>
    from .collection import imread_collection_wrapper
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/io/collection.py", line 14, in <module>
    from ..external.tifffile import TiffFile
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/__init__.py", line 1, in <module>
    from .tifffile import imsave, imread, imshow, TiffFile, TiffWriter, TiffSequence
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.py", line 293, in <module>
    from . import _tifffile
RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9

I'm still trying random numpy versions (ones reported to work in other repos) to see if that resolves it.
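The hex numbers in that RuntimeError are numpy C API versions, and numpy raises this error when an extension was built against a newer C API than the installed numpy exposes. A minimal sketch that decodes the message (the function name `parse_api_mismatch` is hypothetical):

```python
import re

# Matches numpy's "module compiled against API version 0x.. but this version
# of numpy is 0x.." message and captures both hex versions.
_API_MISMATCH = re.compile(
    r"compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def parse_api_mismatch(message):
    """Return (compiled_against, installed) C API versions as ints, or None."""
    match = _API_MISMATCH.search(message)
    if match is None:
        return None
    compiled, installed = (int(v, 16) for v in match.groups())
    return compiled, installed


if __name__ == "__main__":
    msg = ("module compiled against API version 0xa "
           "but this version of numpy is 0x9")
    print(parse_api_mismatch(msg))
```

Here 0xa (10) is greater than 0x9 (9): `_tifffile` was built against a newer numpy than the one installed. That suggests searching upward (a numpy at least as new as the one scikit-image was built against, or rebuilding scikit-image against the installed numpy) rather than trying older releases.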

vsoch · 2019-01-04T22:12:01Z

It's been resolving the conda environment for easily 5 minutes now. :/

vsoch · 2019-01-04T22:16:08Z

Is it worth trying to update the entire thing to python 3+, or is that a forest path I don't want to venture down?
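One concrete piece of a Python 3 port is the model loading in `select_text_among_candidates`, which calls `cPickle.load`, and `cPickle` does not exist in Python 3. A minimal compatibility sketch with hypothetical helper names; it only addresses the pickle format, not the scikit-learn version the model was trained with:

```python
import sys

try:
    import cPickle as pickle  # Python 2, as used in user/__init__.py
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically


def loads_compat(data):
    """Unpickle bytes written by either Python 2 or Python 3.

    Under Python 3, encoding="latin1" lets Python 2 str objects (including
    the raw buffers inside pickled numpy arrays) be decoded instead of
    raising a UnicodeDecodeError.
    """
    if sys.version_info[0] >= 3:
        return pickle.loads(data, encoding="latin1")
    return pickle.loads(data)


def load_compat(path):
    """File-based wrapper, roughly mirroring cPickle.load(fin)."""
    with open(path, "rb") as fin:
        return loads_compat(fin.read())
```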

danielsnider · 2019-01-04T22:16:18Z

Ack! Thank you for fighting the good fight. I wish dependency hell were a thing of the past. Need smarter Python.


danielsnider · 2019-01-04T22:20:55Z

It's a scary forest. My recent adventure down that path may help you a lot. I recently got `pydicom` and `gdcm` working in py3. Here's how: pydicom/pydicom#331 (comment)


vsoch · 2019-01-04T22:23:07Z

Thanks, this might help! The issue is with scikit-learn, but maybe a global update can still resolve it...

vsoch · 2019-01-04T23:16:51Z

Okay, so this won't work unless the model is rebuilt from scratch. It was built with an older sklearn; specifically, even if I can get the pickle to load, the `classes_` attribute is missing:

AttributeError: 'LinearSVC' object has no attribute 'classes_'

This would require downloading the entire CIFAR dataset and doing it over. Did you test the original image, and does it not work for you? -> https://hub.docker.com/r/vanessa/dicom-scraper

It's dangerous to use this as a base image, but we could potentially do that and install gdcm to read your images. It is of course a bad idea in the long term, because we would be stuck with that Python version (and everything else pinned in it) forever, but if you want a quick way to run it, that might be easiest.
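Since the unpickled model only fails later, when `classes_` is first accessed, a small guard can surface the version mismatch right after loading. A minimal sketch; the attribute names are the fitted attributes documented for scikit-learn's `LinearSVC`, while the helper names are hypothetical:

```python
# Fitted attributes a trained scikit-learn LinearSVC exposes (per the
# LinearSVC docs); the pickled model in this thread fails on classes_.
FITTED_ATTRS = ("classes_", "coef_", "intercept_")


def missing_fitted_attrs(model, required=FITTED_ATTRS):
    """Return the fitted attributes `model` lacks, in the order given."""
    return [name for name in required if not hasattr(model, name)]


def check_legacy_model(model):
    """Fail early with a clear message instead of deep inside predict()."""
    missing = missing_fitted_attrs(model)
    if missing:
        raise RuntimeError(
            "model is missing %s; it was probably pickled by an older "
            "scikit-learn and needs to be retrained" % ", ".join(missing)
        )
    return model
```

This only detects the problem; as noted above, the real fix is retraining the model with the scikit-learn version the container actually ships.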

danielsnider · 2019-01-05T06:00:59Z

That’s sad. Sorry about that. I appreciate your smart, pragmatic advice. The original docker image for the OCR scraper didn’t like my compressed dicom images. If you can share any results showing how well the ocr scraper works that would help me consider the options. We could trade notes later next week! Thank you again,


vsoch · 2019-01-05T07:22:14Z

Hey, I haven't lost hope; there are still two things to try:

1. using the original as a base image and installing gdcm
2. rebuilding the model

I'll try both this weekend and post an update. It would be really cool to be able to do that comparison! :)

vsoch · 2019-01-05T19:08:01Z

Hey @danielsnider, unfortunately this isn't going to work easily. Even rebuilding the model would require substantial refactoring, probably a full-time effort (I do this in my free time, mostly for fun). You could likely use the old image if you can find images that don't require gdcm, but it's probably not worth it.

I'm generally unhappy and disappointed with this work, and wish I could allocate the time to do it over - it was literally a small weekend project I did and then nobody needed it, so I didn't work on it further. Do you think it's worth trying to plug in some newer / better OCR implementation and update the image so you have something to test against?

danielsnider · 2019-01-05T19:21:10Z

I'm generally disappointed with Python dependencies! No worries, though. I've got a presentation Monday, so I have to stick with my own OCR implementation for now. I'll let you know how it goes, and I'll be very happy to share it.


NJ2020 · 2019-11-02T10:24:13Z


Thanks. How do we do the same thing (get pydicom and gdcm working in Python 3) on Windows 10?

vsoch mentioned this issue on Jan 4, 2019 in: Add GDCM dependency to Dockerfile #5 (Open)
