Ray component: tune-sklearn #40554

Closed
TheDohn opened this issue Oct 22, 2023 · 2 comments
Labels
  • bug: Something that is supposed to be working; but isn't
  • P1: Issue that should be fixed within a few weeks
  • tune: Tune-related issues


TheDohn commented Oct 22, 2023

What happened + What you expected to happen

  • Activated the Conda environment as defined below.
  • Ran the random forest example via python random_forest.py.
  • The original script can be found here: https://github.com/ray-project/tune-sklearn/blob/master/examples/random_forest.py
  • I expected it to run successfully and print the accuracy.
  • Instead I got the following error:
    DeprecationWarning: `fetch_trial_dataframes` is deprecated. Access the `trial_dataframes` property instead.
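One plausible reading of the message above, not confirmed in this thread, is that the newer Ray release raises `DeprecationWarning` as an exception rather than emitting it through `warnings.warn`, which would explain why the script stops instead of just printing a notice. The sketch below illustrates that pattern with a hypothetical stand-in class (`LegacyAnalysis` and `load_frames` are invented for illustration and are not Ray's or tune-sklearn's code); only the two names from the error message are taken from the report.

```python
class LegacyAnalysis:
    """Hypothetical stand-in for a results object; NOT Ray's actual class."""

    def __init__(self, frames):
        self._frames = dict(frames)

    @property
    def trial_dataframes(self):
        # The replacement named in the error message: a plain property.
        return self._frames

    def fetch_trial_dataframes(self):
        # Raising (instead of calling warnings.warn) makes the deprecation
        # a hard failure for every caller that still uses the old method.
        raise DeprecationWarning(
            "`fetch_trial_dataframes` is deprecated. "
            "Access the `trial_dataframes` property instead."
        )


def load_frames(analysis):
    """Try the old method first, then fall back to the property."""
    try:
        return analysis.fetch_trial_dataframes()
    except DeprecationWarning:
        # DeprecationWarning derives from Exception, so it can be caught.
        return analysis.trial_dataframes


frames = load_frames(LegacyAnalysis({"trial_0": [0.91, 0.93]}))
print(frames)
```

A caller that cannot upgrade right away could use a fallback like `load_frames`; the cleaner fix is to switch to the property in the calling library.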

Versions / Dependencies

  • Operating system:
      • ProductName: macOS
      • ProductVersion: 13.6 (Ventura)
      • BuildVersion: 22G120

The conda environment is defined by the following yaml (note that ray==2.7.1):

name: ray_testing
channels:
  - conda-forge
dependencies:
  - bzip2=1.0.8=h3422bc3_4
  - ca-certificates=2023.7.22=hf0a4a13_0
  - libffi=3.4.2=h3422bc3_5
  - libsqlite=3.43.2=h091b4b1_0
  - libzlib=1.2.13=h53f4e23_5
  - ncurses=6.4=h7ea286d_0
  - openssl=3.1.3=h53f4e23_0
  - pip=23.3.1=pyhd8ed1ab_0
  - python=3.11.0=h3ba56d0_1_cpython
  - readline=8.2=h92ec313_1
  - setuptools=68.2.2=pyhd8ed1ab_0
  - tk=8.6.13=hb31c410_0
  - wheel=0.41.2=pyhd8ed1ab_0
  - xz=5.2.6=h57fd34a_0
  - pip:
      - aiohttp==3.8.6
      - aiohttp-cors==0.7.0
      - aiorwlock==1.3.0
      - aiosignal==1.3.1
      - anyio==3.7.1
      - async-timeout==4.0.3
      - attrs==23.1.0
      - blessed==1.20.0
      - cachetools==5.3.1
      - certifi==2023.7.22
      - charset-normalizer==3.3.0
      - click==8.1.7
      - colorful==0.5.5
      - distlib==0.3.7
      - fastapi==0.104.0
      - filelock==3.12.4
      - frozenlist==1.4.0
      - fsspec==2023.10.0
      - google-api-core==2.12.0
      - google-auth==2.23.3
      - googleapis-common-protos==1.61.0
      - gpustat==1.1.1
      - grpcio==1.59.0
      - h11==0.14.0
      - idna==3.4
      - joblib==1.3.2
      - jsonschema==4.19.1
      - jsonschema-specifications==2023.7.1
      - msgpack==1.0.7
      - multidict==6.0.4
      - numpy==1.26.1
      - nvidia-ml-py==12.535.108
      - opencensus==0.11.3
      - opencensus-context==0.1.3
      - packaging==23.2
      - pandas==2.1.1
      - platformdirs==3.11.0
      - prometheus-client==0.17.1
      - protobuf==4.24.4
      - psutil==5.9.6
      - py-spy==0.3.14
      - pyarrow==13.0.0
      - pyasn1==0.5.0
      - pyasn1-modules==0.3.0
      - pydantic==1.10.13
      - python-dateutil==2.8.2
      - pytz==2023.3.post1
      - pyyaml==6.0.1
      - ray==2.7.1
      - referencing==0.30.2
      - requests==2.31.0
      - rpds-py==0.10.6
      - rsa==4.9
      - scikit-learn==1.3.1
      - scipy==1.11.3
      - six==1.16.0
      - smart-open==6.4.0
      - sniffio==1.3.0
      - starlette==0.27.0
      - tensorboardx==2.6.2.2
      - threadpoolctl==3.2.0
      - tune-sklearn==0.4.6
      - typing-extensions==4.8.0
      - tzdata==2023.3
      - urllib3==2.0.7
      - uvicorn==0.23.2
      - virtualenv==20.21.0
      - watchfiles==0.21.0
      - wcwidth==0.2.8
      - yarl==1.9.2
prefix: /opt/homebrew/Caskroom/miniconda/base/envs/ray_testing

Note that while the above environment fails, the following environment works. This implies that a breaking change was introduced somewhere between Ray 2.5.1 and 2.7.1.
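When comparing a failing and a working environment like the two in this report, it can help to print just the packages that matter instead of reading the full export. The snippet below is a minimal sketch using only the standard library; the helper name `installed_versions` is made up for illustration, and the package list is simply the three most relevant entries from the YAML files.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    report = installed_versions(["ray", "tune-sklearn", "scikit-learn"])
    for name, version in report.items():
        print(f"{name}: {version or 'not installed'}")
```

Running this once in each Conda environment gives a short, directly comparable list.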

name: ray_testing_2.5
channels:
  - conda-forge
dependencies:
  - bzip2=1.0.8=h3422bc3_4
  - c-ares=1.20.1=h93a5062_1
  - ca-certificates=2023.7.22=hf0a4a13_0
  - grpcio=1.59.1=py311h79dd126_0
  - libabseil=20230802.1=cxx17_h13dd4ca_0
  - libcxx=16.0.6=h4653b0c_0
  - libexpat=2.5.0=hb7217d7_1
  - libffi=3.4.2=h3422bc3_5
  - libgrpc=1.59.1=hbcf6334_0
  - libprotobuf=4.24.4=hc9861d8_0
  - libre2-11=2023.06.02=h1753957_0
  - libsqlite=3.43.2=h091b4b1_0
  - libzlib=1.2.13=h53f4e23_5
  - ncurses=6.4=h7ea286d_0
  - openssl=3.1.3=h53f4e23_0
  - pip=23.3.1=pyhd8ed1ab_0
  - python=3.11.6=h47c9636_0_cpython
  - python_abi=3.11=4_cp311
  - re2=2023.06.02=h6135d0a_0
  - readline=8.2=h92ec313_1
  - setuptools=68.2.2=pyhd8ed1ab_0
  - tk=8.6.13=hb31c410_0
  - wheel=0.41.2=pyhd8ed1ab_0
  - xz=5.2.6=h57fd34a_0
  - pip:
      - aiosignal==1.3.1
      - attrs==23.1.0
      - certifi==2023.7.22
      - charset-normalizer==3.3.1
      - click==8.1.7
      - filelock==3.12.4
      - frozenlist==1.4.0
      - idna==3.4
      - joblib==1.3.2
      - jsonschema==4.19.1
      - jsonschema-specifications==2023.7.1
      - msgpack==1.0.7
      - numpy==1.26.1
      - packaging==23.2
      - pandas==2.1.1
      - protobuf==4.24.4
      - pyarrow==13.0.0
      - python-dateutil==2.8.2
      - pytz==2023.3.post1
      - pyyaml==6.0.1
      - ray==2.5.1
      - referencing==0.30.2
      - requests==2.31.0
      - rpds-py==0.10.6
      - scikit-learn==1.3.1
      - scipy==1.11.3
      - six==1.16.0
      - tensorboardx==2.6.2.2
      - threadpoolctl==3.2.0
      - tune-sklearn==0.4.6
      - tzdata==2023.3
      - urllib3==2.0.7
prefix: /opt/homebrew/Caskroom/miniconda/base/envs/ray_testing_2.5

Reproduction script

The script is a minor variation of the example at https://github.com/ray-project/tune-sklearn/blob/master/examples/random_forest.py.

```python
"""
An example training a RandomForestClassifier, performing
randomized search using TuneSearchCV.
"""

from tune_sklearn import TuneSearchCV
from sklearn.ensemble import RandomForestClassifier
from sklearn import datasets
from sklearn.model_selection import train_test_split
from scipy.stats import randint
import numpy as np

import os  # I added this

# Load the digits dataset and hold out 20% of it for testing.
digits = datasets.load_digits()
x = digits.data
y = digits.target
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=.2)

clf = RandomForestClassifier()
param_distributions = {
    "n_estimators": randint(20, 80),
    "max_depth": randint(2, 10)
}

tune_search = TuneSearchCV(
    clf,
    param_distributions,
    n_trials=3,
    # I added local_dir to prevent this error: https://github.com/ray-project/ray/issues/40349
    local_dir=os.path.join(os.getcwd(), "ray_examples", "checkpoints"),
    verbose=True,
)

tune_search.fit(x_train, y_train)

# Fraction of test samples predicted correctly.
pred = tune_search.predict(x_test)
accuracy = np.count_nonzero(np.array(pred) == np.array(y_test)) / len(pred)
print(accuracy)
```



Issue Severity

Medium: It is a significant difficulty but I can work around it.

TheDohn added the bug and triage labels on Oct 22, 2023.
anyscalesam added the tune label on Oct 23, 2023.
matthewdeng added the P1 label and removed the triage label on Oct 24, 2023.

@justinvyu
Contributor

This will be fixed by ray-project/tune-sklearn#272, I will update this thread once a new release with the patch is in.

@justinvyu
Contributor

This has been fixed in tune-sklearn==0.5.0. Let me know if that works for you.
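For anyone scripting a check against the fixed release mentioned above, a rough sketch follows. The helpers `parse_version` and `has_fix` are hypothetical names, and the parsing assumes plain three-part dotted integers such as `0.4.6`; it does not handle suffixes like `rc1` or versions with a different number of parts.

```python
def parse_version(text):
    """Turn '0.5.0' into (0, 5, 0). Assumes plain dotted integers."""
    return tuple(int(part) for part in text.split("."))


def has_fix(installed, fixed_in="0.5.0"):
    """True if the installed version is at or above the fixed release."""
    return parse_version(installed) >= parse_version(fixed_in)


print(has_fix("0.4.6"))  # False: the version from the failing environment
print(has_fix("0.5.0"))  # True: the release reported as fixed
```

Comparing integer tuples rather than raw strings avoids ordering mistakes such as treating `0.10.0` as older than `0.5.0`.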
