The question of algorithm improvement #185

deedy5 · 2022-05-03T08:42:26Z

After fixing some bottlenecks (#183), I took the performance test results table and selected the dataset files on which the program showed a runtime greater than 0.1 s.
performance_comparison_master.xlsx
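
For readers who want to reproduce the selection step, here is a minimal sketch of how files above the 0.1 s threshold could be picked out. It is not the script used to build the spreadsheet: the helper name `slow_files`, the `detect_fn` parameter, and the dataset path in the usage comment are assumptions made for illustration.

```python
import time
from pathlib import Path
from typing import Callable, Iterable, List, Tuple


def slow_files(
    paths: Iterable[Path],
    detect_fn: Callable[[bytes], object],
    threshold: float = 0.1,
) -> List[Tuple[Path, float]]:
    """Return (path, seconds) pairs for files whose detection time exceeds threshold."""
    slow = []
    for path in paths:
        payload = path.read_bytes()
        # Time only the detection call, not the file read.
        start = time.perf_counter()
        detect_fn(payload)
        elapsed = time.perf_counter() - start
        if elapsed > threshold:
            slow.append((path, elapsed))
    return slow


# Hypothetical usage (the dataset location is an assumption):
#   from charset_normalizer import detect
#   slow_files(sorted(Path("./char-dataset").glob("*/*.*")), detect)
```

A single timing per file is noisy, so files close to the threshold may move in or out of the selection between runs.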

From these files I made a separate dataset
char-dataset_>0.1s.zip

and ran tests on it.

test file
test_0.1s.py

import sys
from glob import glob
from os.path import isdir

from charset_normalizer import detect

DATASET_DIR = "./char-dataset_>0.1s"


def performance_compare(size_coeff):
    # Run detect() on every dataset file, with its content repeated size_coeff times.
    if not isdir(DATASET_DIR):
        print(f"This script requires {DATASET_DIR} to be present in the package root directory")
        sys.exit(1)
    # Without recursive=True, "**" behaves like "*" and matches one directory level.
    for tbt_path in sorted(glob(f"{DATASET_DIR}/**/*.*")):
        with open(tbt_path, "rb") as fp:
            content = fp.read() * size_coeff
        detect(content)


if __name__ == "__main__":
    performance_compare(1)

1. pprofile

pprofile --format callgrind --out cachegrind.out.0.1s.test test_0.1s.py

cachegrind.out.0.1s.zip

2. vprof heatmap

vprof -c h test_0.1s.py

vprof (5_3_2022 10_48_28 AM).zip


deedy5 · 2022-05-03T10:29:41Z

Sorry, the previous vprof result is not relevant; apparently it was caused by a lack of memory.
I reduced the size of the dataset, leaving one file per encoding.
char-dataset_>0.1s.zip
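
The reduction to one file per encoding can be sketched as below. This assumes the dataset keeps each encoding in its own subdirectory, so the parent directory name identifies the encoding; the helper name `one_file_per_encoding` is hypothetical, and the choice of the first file in sorted order is arbitrary.

```python
from pathlib import Path
from typing import Dict, Iterable, List


def one_file_per_encoding(paths: Iterable[Path]) -> List[Path]:
    """Keep the first file (in sorted order) from each parent directory."""
    chosen: Dict[str, Path] = {}
    for path in sorted(paths):
        # Assumption: the parent directory name identifies the encoding.
        chosen.setdefault(path.parent.name, path)
    return list(chosen.values())
```

Copying the returned paths into a fresh directory would give a smaller dataset that still covers every encoding present in the original selection.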

vprof

vprof -c h test_0.1s.py

vprof (5_3_2022 1_13_21 PM).zip

There are no particularly pronounced bottlenecks.

The question is closed.

deedy5 closed this as completed May 3, 2022
