Add text size to unstructured profiler #340

Merged
merged 18 commits into from
Jul 19, 2021

Conversation

AnhTruong
Contributor

  • add capacity to global stats
  • add tests

@JGSweets added the Medium Priority (significant improvement or bug / feature reducing overall performance) and New Feature (a feature addition not currently in the library) labels Jul 16, 2021
@JGSweets enabled auto-merge (squash) July 19, 2021 16:13
JGSweets previously approved these changes Jul 19, 2021
:type data: Union[list, numpy.array, pandas.DataFrame]
:param unit: capacity unit (B, K, M, or G)
:type unit: string
:return: capacity of the input data
Contributor

need to update docstring to get rid of capacity

Contributor Author

changed

}

# ensure all data are of type str
data = data.apply(str)

# get capacity
base_stats = {"memory_size": utils.get_capacity(data, unit='M')}
Contributor

This is still called capacity

Contributor Author

changed

README.md (outdated; resolved)
JGSweets previously approved these changes Jul 19, 2021
grant-eden previously approved these changes Jul 19, 2021
@@ -504,8 +504,39 @@ def get_memory_size(data, unit='M'):
     if unit not in unit_map:
         raise ValueError('Currently only supports the '
                          'memory size unit in {}'.format(list(unit_map.keys())))
-    capacity = 0
+    memory_size = 0
Contributor

is this tabbed over?
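
For readers following the thread, the hunk above validates `unit` against `unit_map` before accumulating a size. A minimal sketch of what such a helper might look like is below. It is not the merged implementation: the name `estimate_text_size`, the `UNIT_EXPONENTS` table, the UTF-8 byte counting, and the power-of-1024 scaling are all assumptions made for illustration. Only the unit letters (B, K, M, G) and the error message wording come from the thread.

```python
# Hypothetical sketch; names and scaling are assumptions, not DataProfiler's code.
UNIT_EXPONENTS = {"B": 0, "K": 1, "M": 2, "G": 3}


def estimate_text_size(data, unit="M"):
    """Return the UTF-8 encoded size of an iterable of values, scaled to `unit`."""
    if unit not in UNIT_EXPONENTS:
        raise ValueError(
            "Currently only supports the memory size unit in {}".format(
                list(UNIT_EXPONENTS.keys())
            )
        )
    total_bytes = 0
    for item in data:
        # Coerce to str first, mirroring the "ensure all data are of type str" step.
        total_bytes += len(str(item).encode("utf-8"))
    # Assumed convention: each unit step is a factor of 1024.
    return total_bytes / (1024.0 ** UNIT_EXPONENTS[unit])
```

Rejecting unknown units up front, as the reviewed hunk does, keeps a typo such as `unit='T'` from silently producing a wrongly scaled number.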

@AnhTruong dismissed stale reviews from grant-eden and JGSweets via 0cb1cb9 July 19, 2021 20:27
grant-eden previously approved these changes Jul 19, 2021
@JGSweets merged commit 6857bc1 into capitalone:main Jul 19, 2021
stevensecreti pushed a commit to stevensecreti/DataProfiler that referenced this pull request Jun 15, 2022
* add text size

* add error raise for unit

* clean code

* clean test

* fix test

* clean test

* clean test

* clean test

* clean test
3 participants