Fedot crashes when the same pandas DataFrame is used a second time #943

aPovidlo · 2022-10-19T22:11:46Z

Fedot crashes when the same pandas DataFrame is passed to it a second time. I suspect that Fedot modifies the input data in place while it runs, even when the variable is passed in from the global scope.

Traceback (most recent call last):
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\api\api_utils\assumptions\assumptions_handler.py", line 56, in fit_assumption_and_check_correctness
    data_train, data_test = train_test_data_setup(self.data)
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 222, in train_test_data_setup
    train_data, test_data = _train_test_single_data_setup(data, split_ratio, shuffle_flag, **kwargs)
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 183, in _train_test_single_data_setup
    train_data, test_data = split_func(data, task, split_ratio,
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 131, in _split_table
    return _split_any(data, task, DataTypesEnum.table, split_ratio, with_shuffle)
  File "C:\Users\andre\Documents\GitHub\FEDOT\fedot\core\data\data_split.py", line 104, in _split_any
    train_test_split(idx,
  File "C:\Users\andre\Documents\GitHub\FEDOT\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2448, in train_test_split
    n_train, n_test = _validate_shuffle_split(
  File "C:\Users\andre\Documents\GitHub\FEDOT\venv\lib\site-packages\sklearn\model_selection\_split.py", line 2126, in _validate_shuffle_split
    raise ValueError(
ValueError: With n_samples=1, test_size=0.19999999999999996 and train_size=None, the resulting train set will be empty. Adjust any of the aforementioned parameters.

Script to reproduce the bug:

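import pandas as pd

from fedot.api.main import Fedot
from fedot.core.utils import fedot_project_root

problem = 'classification'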

train_data_path = f'{fedot_project_root()}/cases/data/scoring/scoring_train.csv'
test_data_path = f'{fedot_project_root()}/cases/data/scoring/scoring_test.csv'

train = pd.read_csv(train_data_path)
test = pd.read_csv(test_data_path)

baseline_model = Fedot(problem=problem, timeout=10, seed=42)
baseline_model.fit(features=train, target='target', predefined_model='rf')

baseline_model.predict(features=test)
print(baseline_model.get_metrics())

auto_model = Fedot(problem=problem, seed=42, timeout=10, n_jobs=-1,
                   preset='best_quality', max_pipeline_fit_time=5,
                   metric='roc_auc')

auto_model.fit(features=train, target='target')

prediction = auto_model.predict_proba(features=test)
print(auto_model.get_metrics())
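
If the in-place modification suspected above is the cause, a quick way to confirm it is to snapshot the DataFrame before a call and compare afterwards. Below is a minimal sketch using pandas only. `was_mutated` and `drop_target_in_place` are hypothetical helpers written for illustration; the second is just a stand-in for any function that edits its input and is not FEDOT code. Passing `train.copy()` instead of `train` to the first `fit` might serve as a workaround under the same assumption.

```python
import pandas as pd


def was_mutated(func, frame: pd.DataFrame) -> bool:
    """Call func(frame) and report whether frame was changed in place."""
    snapshot = frame.copy(deep=True)
    func(frame)
    return not snapshot.equals(frame)


def drop_target_in_place(frame: pd.DataFrame) -> None:
    # Hypothetical stand-in for a call that edits its input; not FEDOT code.
    frame.drop(columns=['target'], inplace=True)


train = pd.DataFrame({'x': [1, 2, 3], 'target': [0, 1, 0]})

# Handing over a copy leaves the caller's frame untouched.
copy_case = was_mutated(lambda df: drop_target_in_place(df.copy()), train)

# Handing over the frame itself lets the callee change it.
direct_case = was_mutated(drop_target_in_place, train)

print(copy_case, direct_case, list(train.columns))
```

In the real script the same check could wrap the first `fit` call; if it reports a change, the second `fit` is working on different data than the first.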



aPovidlo added the bug Something isn't working label Oct 19, 2022

gkirgizov self-assigned this Oct 20, 2022

gkirgizov mentioned this issue Oct 20, 2022

Fix inplace modification of data during data definition (resolves #943) #950

Merged

gkirgizov closed this as completed in #950 Oct 20, 2022

gkirgizov added a commit that referenced this issue Oct 20, 2022

Fix inplace modification of data during data definition (resolves #943)…

3381f4b

… (#950) * some name/move refactorings * Fix inplace modification of input data during data define * fixup! some name/move refactorings
