The question of algorithm improvement #185

Closed
deedy5 opened this issue May 3, 2022 · 1 comment

@deedy5
Contributor

deedy5 commented May 3, 2022

After fixing some bottlenecks (#183), I went through the performance test results table and selected the dataset files on which the program took longer than 0.1 s to run.
performance_comparison_master.xlsx

From these files I made a separate dataset
char-dataset_>0.1s.zip

and ran tests on it.
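
A minimal sketch of the selection step described above, assuming the results table has been reduced to (path, seconds) pairs. The sample paths, the timings, and the `select_slow` helper are hypothetical and only illustrate the 0.1 s cut-off:

```python
# Hypothetical (path, seconds) pairs standing in for rows of the results table.
timings = [
    ("./char-dataset/encoding_a/sample_1.txt", 0.02),
    ("./char-dataset/encoding_b/sample_2.txt", 0.31),
    ("./char-dataset/encoding_c/sample_3.txt", 0.10),
    ("./char-dataset/encoding_d/sample_4.txt", 0.14),
]


def select_slow(timings, threshold=0.1):
    """Return, in sorted order, the paths whose runtime is strictly above threshold seconds."""
    return sorted(path for path, seconds in timings if seconds > threshold)


slow_files = select_slow(timings)
print(slow_files)
```

The comparison is strict, so a file measured at exactly 0.1 s is left out of the reduced dataset.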


test file
test_0.1s.py

import sys
from glob import glob
from os.path import isdir

from charset_normalizer import detect

DATASET_DIR = "./char-dataset_>0.1s"


def performance_compare(size_coeff):
    """Run detect() on every dataset file, with its content repeated size_coeff times."""
    if not isdir(DATASET_DIR):
        print(f"This script requires {DATASET_DIR} to be present in the package root directory")
        sys.exit(1)
    # Note: without recursive=True, "**" matches exactly one directory level.
    for tbt_path in sorted(glob(f"{DATASET_DIR}/**/*.*")):
        with open(tbt_path, "rb") as fp:
            content = fp.read() * size_coeff
        detect(content)


if __name__ == "__main__":
    performance_compare(1)
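
The script above only exercises `detect`; it does not record how long each file takes. A per-file timing loop could look roughly like the sketch below. The `time_detection` helper is hypothetical, and the detector is passed in as a parameter so the loop does not depend on any particular library:

```python
import time


def time_detection(paths, detect, size_coeff=1):
    """Return {path: seconds} for a single detect() call on each file."""
    results = {}
    for path in paths:
        with open(path, "rb") as fp:
            content = fp.read() * size_coeff
        # Time only the detection call, not the file read.
        start = time.perf_counter()
        detect(content)
        results[path] = time.perf_counter() - start
    return results
```

With the test file above, it could be called as `time_detection(sorted(glob("./char-dataset_>0.1s/**/*.*")), detect)`. A single measurement per file is noisy, so repeating the run and taking the minimum may give steadier numbers.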

1. pprofile

pprofile --format callgrind --out cachegrind.out.0.1s.test test_0.1s.py

pprofile_test_0.1s
cachegrind.out.0.1s.zip

2. vprof heatmap

vprof -c h test_0.1s.py

vprof_heatmap
vprof (5_3_2022 10_48_28 AM).zip

@deedy5
Contributor Author

deedy5 commented May 3, 2022

Sorry, the previous vprof test is not relevant: its result was apparently caused by a lack of memory.
I reduced the size of the dataset, leaving one file per encoding.
char-dataset_>0.1s.zip
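
A rough sketch of how such a reduction might be done, assuming each encoding has its own subdirectory (an assumption about the dataset layout, not something stated here). The `one_file_per_encoding` helper and the sample paths are hypothetical:

```python
from pathlib import PurePosixPath


def one_file_per_encoding(paths):
    """Keep the first file (in sorted order) from each parent directory."""
    kept = {}
    for path in sorted(paths):
        # Assumption: the parent directory name identifies the encoding.
        encoding = PurePosixPath(path).parent.name
        kept.setdefault(encoding, path)
    return sorted(kept.values())


# Hypothetical sample paths, two for one encoding and one for another.
sample_paths = [
    "dataset/encoding_b/second.txt",
    "dataset/encoding_b/first.txt",
    "dataset/encoding_a/only.txt",
]
print(one_file_per_encoding(sample_paths))
```

Which file is kept per encoding is arbitrary in this sketch (the first by name); picking the slowest file per encoding would be another reasonable choice.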

1. vprof

vprof -c h test_0.1s.py


vprof (5_3_2022 1_13_21 PM).zip


There are no particularly pronounced bottlenecks.

The question is closed.

@deedy5 deedy5 closed this as completed May 3, 2022