ocr cleaner has bug with gcc library / scikit image version #9

Open
vsoch opened this issue Jan 4, 2019 · 16 comments

Comments

@vsoch
Member

vsoch commented Jan 4, 2019

The entire container libraries / base needs to be debugged, unfortunately.

>>> maybe_text = dicom.select_text_among_candidates(saved_model)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "user/__init__.py", line 122, in select_text_among_candidates
    model = cPickle.load(fin)
  File "data/__init__.py", line 29, in <module>
    from sklearn.svm import LinearSVC
  File "/opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/__init__.py", line 13, in <module>
    from .classes import SVC, NuSVC, SVR, NuSVR, OneClassSVM, LinearSVC
  File "/opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/classes.py", line 1, in <module>
    from .base import BaseLibLinear, BaseSVC, BaseLibSVM
  File "/opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/base.py", line 8, in <module>
    from . import libsvm, liblinear
ImportError: /opt/anaconda2/lib/python2.7/site-packages/sklearn/svm/libsvm.so: undefined symbol: __cxa_throw_bad_array_new_length

See notes in #8
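An undefined-symbol ImportError like the one above usually means the C++ runtime found at load time is older than the one `libsvm.so` was built against. A minimal sketch for checking whether a given shared library exports a symbol, using only `ctypes` from the standard library. The helper name `exports_symbol` is made up for illustration, and the library name `stdc++` is an assumption about how the runtime is named on this system. The Python process inside the container may resolve a different copy of the library (for example one under /opt/anaconda2/lib), so pass a full path if you want to test a specific file.

```python
import ctypes
import ctypes.util


def exports_symbol(library, symbol):
    """Check whether a shared library exports a symbol.

    Returns True or False when the library loads, and None when the
    library itself cannot be loaded.
    """
    # Accept a bare name such as "stdc++" as well as a file name or full path.
    path = ctypes.util.find_library(library) or library
    try:
        lib = ctypes.CDLL(path)
    except OSError:
        return None
    try:
        lib[symbol]  # triggers the symbol lookup; AttributeError if missing
    except AttributeError:
        return False
    return True


if __name__ == "__main__":
    # Assumption: the C++ runtime can be located under the name "stdc++".
    print(exports_symbol("stdc++", "__cxa_throw_bad_array_new_length"))
```

If this prints False, the runtime that was checked does not provide the symbol, which is consistent with needing a newer libgcc/libstdc++ in the environment. A result of None only says the library could not be loaded from where this check looked.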

@danielsnider

I've seen two people say to run: conda install libgcc
[1] scikit-learn/scikit-learn#7869 (comment)
[2] https://stackoverflow.com/questions/42181453/sklearn-modules-on-ubuntu-oracle-virtual-box-throw-error

Would you have time to try it?

@vsoch
Member Author

vsoch commented Jan 4, 2019

yep!

@vsoch
Member Author

vsoch commented Jan 4, 2019

Lord, I hope the fix is that easy. An image that doesn't reproduce when you build it again is my worst nightmare.

@vsoch
Member Author

vsoch commented Jan 4, 2019

It could also help to try installing scikit-learn from conda instead of pip. But I have a terrible feeling there is going to be some new conflict with nolearn (I can't remember off the top of my head why I stayed with python 2.7 in the first place, but it was some dependency issue).

@vsoch
Member Author

vsoch commented Jan 4, 2019

Okay, here is an update! The first error was with libgfortran:

  File "<string>", line 1, in <module>
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/__init__.py", line 170, in <module>
    from . import add_newdocs
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/add_newdocs.py", line 13, in <module>
    from numpy.lib import add_newdoc
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/lib/__init__.py", line 18, in <module>
    from .polynomial import *
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/lib/polynomial.py", line 19, in <module>
    from numpy.linalg import eigvals, lstsq, inv
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/linalg/__init__.py", line 51, in <module>
    from .linalg import *
  File "/opt/anaconda2/lib/python2.7/site-packages/numpy/linalg/linalg.py", line 29, in <module>
    from numpy.linalg import lapack_lite, _umath_linalg
ImportError: libgfortran.so.1: cannot open shared object file: No such file or directory

I resolved it with:

conda install libgfortran==1

(If you install without pinning the version, you get another error.) Then I get this error about numpy versions:

/opt/anaconda2/lib/python2.7/site-packages/dask/array/numpy_compat.py:32: RuntimeWarning: divide by zero encountered in divide
  not np.allclose(np.divide(1, .5, dtype='i8'), 2) or
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "user/__init__.py", line 2, in <module>
    from skimage.io import imread
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/io/__init__.py", line 7, in <module>
    from .manage_plugins import *
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/io/manage_plugins.py", line 28, in <module>
    from .collection import imread_collection_wrapper
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/io/collection.py", line 14, in <module>
    from ..external.tifffile import TiffFile
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/__init__.py", line 1, in <module>
    from .tifffile import imsave, imread, imshow, TiffFile, TiffWriter, TiffSequence
  File "/opt/anaconda2/lib/python2.7/site-packages/skimage/external/tifffile/tifffile.py", line 293, in <module>
    from . import _tifffile
RuntimeError: module compiled against API version 0xa but this version of numpy is 0x9

And I'm still trying random numpy versions (from repos where it's reported to work) to see if that resolves it.
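The RuntimeError above already says which direction to move in: the extension was compiled against numpy C-API version 0xa, while the installed numpy reports 0x9, so the installed numpy is the older side. A small sketch that pulls the two hex values out of such a message and compares them. The function names are made up for illustration, and the sketch does not map API versions to numpy release numbers.

```python
import re

# Matches the wording of the RuntimeError numpy raises on a C-API mismatch.
_API_MISMATCH = re.compile(
    r"compiled against API version (0x[0-9a-fA-F]+) "
    r"but this version of numpy is (0x[0-9a-fA-F]+)"
)


def parse_api_mismatch(message):
    """Return (compiled_against, installed) as ints, or None if no match."""
    match = _API_MISMATCH.search(message)
    if match is None:
        return None
    return int(match.group(1), 16), int(match.group(2), 16)


def suggest(message):
    """Turn a mismatch message into a one-line hint."""
    parsed = parse_api_mismatch(message)
    if parsed is None:
        return "not a numpy C-API mismatch message"
    built, installed = parsed
    if built > installed:
        return "installed numpy is older than the one the extension was built against"
    return "installed numpy is not older than the build-time numpy"


if __name__ == "__main__":
    msg = ("RuntimeError: module compiled against API version 0xa "
           "but this version of numpy is 0x9")
    print(parse_api_mismatch(msg))
    print(suggest(msg))
```

For the message in this thread the hint points at upgrading numpy, or at installing a scikit-image build made against the older numpy, rather than trying versions at random.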

@vsoch
Member Author

vsoch commented Jan 4, 2019

It's been resolving the conda environment for easily 5 minutes now. :/

@vsoch
Member Author

vsoch commented Jan 4, 2019

Is it worth trying to update the entire thing to python 3+, or is that a forest path I don't want to venture down?

@danielsnider

danielsnider commented Jan 4, 2019 via email

@danielsnider

danielsnider commented Jan 4, 2019 via email

@vsoch
Member Author

vsoch commented Jan 4, 2019

Thanks, this might help! The issue is with scikit-learn, but maybe a global update can still resolve it...

@vsoch
Member Author

vsoch commented Jan 4, 2019

Okay, so this won't work unless the model is rebuilt from scratch. It was built with an older sklearn; specifically, even if I can get the pickle to load, the classes_ attribute is missing:

AttributeError: 'LinearSVC' object has no attribute 'classes_'

This would require downloading the entire CIFAR dataset and doing it over. Did you test the original image, and does it not work for you? -> https://hub.docker.com/r/vanessa/dicom-scraper

It's dangerous to use this as a base, but we could potentially do that and install gdcm to read your images. It is of course a bad idea long term, because we will forever be stuck with that python version, etc., but if you want a quick way to run it, that might be easiest.
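The missing `classes_` attribute is the usual symptom of unpickling a model under a different library version than the one that saved it. If the model does get rebuilt, one way to make this easier to diagnose later is to store the library versions next to the model and compare them at load time. A rough sketch under these assumptions: it uses Python 3 `pickle` rather than the `cPickle` call in the traceback, all function names are made up for illustration, and it still cannot help when the pickled class itself fails to import.

```python
import os
import pickle
import tempfile


def save_with_versions(obj, path, versions):
    """Pickle obj together with the library versions used to build it."""
    with open(path, "wb") as fout:
        pickle.dump({"versions": dict(versions), "payload": obj}, fout)


def load_with_versions(path, current_versions):
    """Load a bundle; also return libraries whose version differs now."""
    with open(path, "rb") as fin:
        bundle = pickle.load(fin)
    mismatched = {
        name: (saved, current_versions.get(name))
        for name, saved in bundle["versions"].items()
        if current_versions.get(name) != saved
    }
    return bundle["payload"], mismatched


def missing_attributes(model, required=("classes_",)):
    """List the attributes a loaded model is expected to have but lacks."""
    return [name for name in required if not hasattr(model, name)]


if __name__ == "__main__":
    # Stand-in payload; a real call would pass the trained model instead.
    path = os.path.join(tempfile.mkdtemp(), "model.pkl")
    save_with_versions({"weights": [0.1, 0.2]}, path, {"sklearn": "0.17"})
    payload, mismatched = load_with_versions(path, {"sklearn": "0.20"})
    print(payload, mismatched)
```

With scikit-learn installed, the versions dict could be filled from `sklearn.__version__`. A non-empty `mismatched` result or a non-empty `missing_attributes` list is a signal to rebuild the model instead of debugging the load.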

@danielsnider

danielsnider commented Jan 5, 2019 via email

@vsoch
Member Author

vsoch commented Jan 5, 2019

Hey I haven't lost hope - there are still two things to try!

  • using the original as a base image and installing gdcm
  • rebuilding the model

I'll try both this weekend and post an update. It would be really cool to be able to do that comparison! :)

@vsoch
Member Author

vsoch commented Jan 5, 2019

hey @danielsnider this isn't going to easily work unfortunately, and even rebuilding the model would require substantial refactoring that would probably require a full time effort (I do this in my free time, mostly for fun). You can likely use the old image if you can find non-gdcm images, but it's probably not worth it.

I'm generally unhappy and disappointed with this work, and wish I could allocate the time to do it over - it was literally a small weekend project I did and then nobody needed it, so I didn't work on it further. Do you think it's worth trying to plug in some newer / better OCR implementation and update the image so you have something to test against?

@danielsnider

danielsnider commented Jan 5, 2019 via email

@NJ2020

NJ2020 commented Nov 2, 2019

It's a scary forest. My recent adventure down that path may help you a lot. I recently got pydicom and gdcm working in py3. Here's how: pydicom/pydicom#331 (comment)

On Fri, Jan 4, 2019 at 5:16 PM Vanessa Sochat @.***> wrote: Is it worth trying to update the entire thing to python 3+, or is that a forest path I don't want to venture down?

Thanks. How do we do the same thing for Windows10?
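Whatever the platform, a first step is to confirm which of the two packages the Python 3 interpreter can actually see. A minimal sketch using only the standard library; it assumes the import names are `pydicom` and `gdcm`, and the function name is made up for illustration. It only reports availability and does not install anything.

```python
import importlib.util
import sys


def report_modules(names=("pydicom", "gdcm")):
    """Map each top-level module name to True if it can be found."""
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    print(sys.version)
    for name, found in report_modules().items():
        print(name, "found" if found else "missing")
```

Running it with the same interpreter that will run the scraper matters, since a package installed into one environment is invisible to another.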
