Fedot crashes when a pandas DataFrame is used a second time #943

Closed
aPovidlo opened this issue Oct 19, 2022 · 0 comments · Fixed by #950

@aPovidlo
Collaborator

Fedot crashes when the same pandas DataFrame is passed to it a second time. It looks like Fedot modifies the input data in place while it runs, even though the variable is passed in from the global scope.
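
One way to check the suspicion of in-place modification is to snapshot the DataFrame before the call and compare afterwards. The sketch below is only an illustration: `fit_inplace` is a hypothetical stand-in for a fit routine that mutates its argument, not Fedot's actual code, and the issue does not show which mutation really happens.

```python
import pandas as pd


def fit_inplace(features: pd.DataFrame, target: str) -> None:
    """Hypothetical stand-in for a fit routine that mutates its input."""
    # Dropping a column with inplace=True changes the caller's object too.
    features.drop(columns=[target], inplace=True)


train = pd.DataFrame({"x": [1, 2, 3], "target": [0, 1, 0]})
snapshot = train.copy(deep=True)  # reference copy taken before the call

fit_inplace(train, target="target")

# DataFrame.equals compares shape, labels and values.
was_mutated = not train.equals(snapshot)
print(was_mutated)
print(list(train.columns))
```

If the same check reports a change around a real `fit` call, passing `train.copy()` to each call would be a plausible workaround, though that is untested here.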

Traceback (most recent call last):
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\api\api_utils\assumptions\assumptions_handler.py", line 56, in fit_assumption_and_check_correctness
    data_train, data_test = train_test_data_setup(self.data)
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 222, in train_test_data_setup
    train_data, test_data = _train_test_single_data_setup(data, split_ratio, shuffle_flag, **kwargs)
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 183, in _train_test_single_data_setup
    train_data, test_data = split_func(data, task, split_ratio,
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 131, in _split_table
    return _split_any(data, task, DataTypesEnum.table, split_ratio, with_shuffle)
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 104, in _split_any
    train_test_split(idx,
  File "C:\Users\andre\Documents\GitHub\FEDOT\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2448, in train_test_split
    n_train, n_test = _validate_shuffle_split(
  File "C:\Users\andre\Documents\GitHub\FEDOT\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2126, in _validate_shuffle_split
    raise ValueError(
ValueError: With n_samples=1, test_size=0.19999999999999996 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.
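
The numbers in the error message can be reproduced with simple arithmetic. The sketch below assumes that a fractional `test_size` is rounded up to a sample count and that the train set gets the remainder when `train_size` is `None`. It is a simplified illustration of the check behind the message, not scikit-learn's source, and `split_sizes` is a hypothetical helper name.

```python
import math
from typing import Tuple


def split_sizes(n_samples: int, test_size: float) -> Tuple[int, int]:
    """Return (n_train, n_test) for a fractional test_size (simplified)."""
    n_test = math.ceil(test_size * n_samples)  # test share is rounded up
    n_train = n_samples - n_test               # train gets what is left
    return n_train, n_test


# Values taken from the error message above.
n_train, n_test = split_sizes(n_samples=1, test_size=0.19999999999999996)
print(n_train, n_test)  # the train set ends up with zero samples
```

So the split itself is behaving as expected. The real problem is that only one sample reaches the splitter on the second run.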

Script to reproduce the bug:

import pandas as pd

from fedot.api.main import Fedot
from fedot.core.utils import fedot_project_root

problem = 'classification'

train_data_path = f'{fedot_project_root()}/cases/data/scoring/scoring_train.csv'
test_data_path = f'{fedot_project_root()}/cases/data/scoring/scoring_test.csv'

train = pd.read_csv(train_data_path)
test = pd.read_csv(test_data_path)

# First run: fit a predefined random forest pipeline on the DataFrame
baseline_model = Fedot(problem=problem, timeout=10, seed=42)
baseline_model.fit(features=train, target='target', predefined_model='rf')

baseline_model.predict(features=test)
print(baseline_model.get_metrics())

# Second run: reuse the same `train` DataFrame; per the report, Fedot fails here
auto_model = Fedot(problem=problem, seed=42, timeout=10, n_jobs=-1, preset='best_quality',
                   max_pipeline_fit_time=5, metric='roc_auc')

auto_model.fit(features=train, target='target')

prediction = auto_model.predict_proba(features=test)
print(auto_model.get_metrics())
@aPovidlo aPovidlo added the bug Something isn't working label Oct 19, 2022
@gkirgizov gkirgizov self-assigned this Oct 20, 2022
gkirgizov added a commit that referenced this issue Oct 20, 2022
… (#950)

* some name/move refactorings

* Fix inplace modification of input data during data define

* fixup! some name/move refactorings
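
The commit message mentions fixing in-place modification of the input data. The actual change in #950 is not shown here, so the following is only a generic sketch of the defensive-copy pattern such a fix typically relies on. `define_data` is a hypothetical helper, not a Fedot function.

```python
from typing import Tuple

import pandas as pd


def define_data(features: pd.DataFrame, target: str) -> Tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: split a frame into features and target
    without touching the caller's DataFrame."""
    working = features.copy()  # work on a copy, never on the input itself
    y = working.pop(target)    # pop mutates only the copy
    return working, y


train = pd.DataFrame({"x": [1, 2, 3], "target": [0, 1, 0]})

X1, y1 = define_data(train, "target")
X2, y2 = define_data(train, "target")  # second call sees the same intact input

print(list(train.columns))
```

Copying costs extra memory on large datasets, but it keeps repeated calls with the same DataFrame independent of each other.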