Switched inverse function to run pseudo-inverse, to account for correlated and 0 variance variables #5

Merged
merged 7 commits into from
May 3, 2017
Changes from 6 commits
4 changes: 2 additions & 2 deletions Kruskals/kruskals.py
@@ -67,7 +67,7 @@ def driver_score(self, directional=False):
         self._driver_score = (pij_row_mean + pijm_row_mean) / ((ind_c - 1) + fact)
         if directional:
             self._driver_score = self._driver_score * np.apply_along_axis(self.correlation_coef, 0, self._ndarr, self._arr)
-        return self._driver_score
+        return np.nan_to_num(self._driver_score)

     def percentage(self):
         """
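For context on the new return value, here is a minimal sketch of what `np.nan_to_num` does with its default arguments. The `raw_scores` array is made up for illustration and is not produced by the library. Note that besides turning `nan` into `0.0`, the call also replaces infinities with very large finite numbers.

```python
import numpy as np

# Hypothetical scores: one undefined (nan), one infinite, one ordinary value.
raw_scores = np.array([np.nan, np.inf, 0.25])

# With default arguments, nan becomes 0.0 and +inf becomes a very large
# finite float; ordinary values pass through unchanged.
cleaned = np.nan_to_num(raw_scores)

print(cleaned)
```

A consequence worth keeping in mind: after this change, a driver score of `0.0` can mean either "no measurable effect" or "the score was undefined".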
@@ -94,7 +94,7 @@ def pcor_squared(ndarr):
         """
         Internal method to calculate the partial correlation squared
         """
-        icvx = np.linalg.inv(np.cov(ndarr))
+        icvx = np.linalg.pinv(np.cov(ndarr))
         return (icvx[0, 1] * icvx[0, 1]) / (icvx[0, 0] * icvx[1, 1])

     def percentage(self):
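To illustrate why the switch matters, the sketch below builds a covariance matrix from made-up data in which one variable is constant. The `data` array is an assumption for illustration only. `np.linalg.inv` rejects the resulting singular matrix, while `np.linalg.pinv` returns a result with zeros in the row and column of the zero-variance variable.

```python
import numpy as np

# Rows are variables, columns are observations (np.cov's default layout).
# The second variable is constant, so its variance is exactly zero.
data = np.array([
    [1.0, 2.0, 3.0, 4.0],
    [0.0, 0.0, 0.0, 0.0],
])
cov = np.cov(data)

# Ordinary inversion fails because the covariance matrix is singular.
try:
    np.linalg.inv(cov)
    inv_failed = False
except np.linalg.LinAlgError:
    inv_failed = True

# The pseudoinverse is still defined for the same matrix.
pseudo = np.linalg.pinv(cov)

print(inv_failed)
print(pseudo)
```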
4 changes: 4 additions & 0 deletions README.md
@@ -6,6 +6,10 @@ Kruskal's Driver Analysis (Not to be confused with his Distance Measure algorith

 This package provides a python implementation of [Kruskal's Algorithm](https://en.wikipedia.org/wiki/Kruskal%27s_algorithm)

+Caveats
+------------
+
+To calculate the inverse it uses the [Moore–Penrose pseudoinverse](https://en.wikipedia.org/wiki/Moore%E2%80%93Penrose_pseudoinverse), which permits highly correlated independent variables to be passed, as well as variables that have zero variance. It is up to the user of this library to ensure they are comfortable with this. N.B. when ordinary matrix inversion would work, the pseudoinverse gives the same result, so it only changes behaviour in cases where ordinary inversion would fail.

 Installation
 ------------
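On the README's note about ordinary inversion: the code in this PR calls `np.linalg.pinv` unconditionally, which is safe because the pseudoinverse of an invertible matrix equals its ordinary inverse (up to floating-point rounding). A minimal check, using an arbitrary well-conditioned matrix `a` chosen for illustration:

```python
import numpy as np

# An arbitrary invertible, well-conditioned matrix (illustrative only).
a = np.array([
    [2.0, 1.0],
    [1.0, 3.0],
])

# For invertible input, pinv and inv agree up to rounding error.
same = np.allclose(np.linalg.inv(a), np.linalg.pinv(a))

print(same)
```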
2 changes: 0 additions & 2 deletions circle.yml
@@ -9,5 +9,3 @@ dependencies:
 test:
   override:
     - detox
-    - py.test --cov=./
-    - codecov --token=88ffd259-37b7-4c79-89a2-b4a00b83b519
23 changes: 23 additions & 0 deletions tests/test_kruskals.py
@@ -121,3 +121,26 @@ def test_that_direction_is_applied_on_directional_drivers_analysis():
     series = Kruskals.Kruskals(ndarr, arr).driver_score_to_series(True)

     assert (series.values < 0).any()
+
Collaborator: Also I think this should be a double newline.

Owner (Author): Not given that the others aren't. We can correct it in a different PR if you feel strongly.

+def test_ability_to_handle_all_same_type():
+    """
+    Test to make sure that kruskals can handle data
+    when all the values for an independent set are 0
+    """
+    ndarr = np.array([
+        [10, 0, 3, 4, 5, 6],
+        [6, 0, 4, 3, 5, 1],
+        [1, 0, 9, 1, 5, 1],
+        [9, 0, 2, 2, 5, 2],
+        [3, 0, 3, 9, 5, 3],
+        [1, 0, 2, 9, 5, 4],
+        [1, 0, 2, 9, 5, 4],
+        [1, 0, 2, 9, 5, 4]
+    ])
+
+    arr = np.array([1, 2, 3, 4, 5, 6, 7, 8])
+
+    series = Kruskals.Kruskals(ndarr, arr).driver_score()
+
+    assert series[1] == 0.0
+    assert series[4] == 0.0
Collaborator: New line at end of file

Owner (Author): ☺️
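The two code changes in this PR work together, and the sketch below shows one way to see it. It copies `pcor_squared` from the patched `Kruskals/kruskals.py` and applies it to a made-up two-variable array in which one variable has zero variance. How the library actually slices its inputs before calling `pcor_squared` is not shown in this diff, so the two-row `pair` array is an assumption. The pseudoinverse has a zero on the diagonal for the constant variable, so the ratio is `0 / 0`, which NumPy evaluates to `nan`; `np.nan_to_num` then maps it to `0.0`, matching the values the new test asserts.

```python
import numpy as np


def pcor_squared(ndarr):
    """Partial correlation squared, copied from the patched kruskals.py."""
    icvx = np.linalg.pinv(np.cov(ndarr))
    return (icvx[0, 1] * icvx[0, 1]) / (icvx[0, 0] * icvx[1, 1])


# Hypothetical input: rows are variables, columns are observations.
pair = np.array([
    [10.0, 6.0, 1.0, 9.0, 3.0],  # an ordinary variable
    [0.0, 0.0, 0.0, 0.0, 0.0],   # a zero-variance variable
])

# Silence the floating-point warning raised by the 0 / 0 division.
with np.errstate(invalid="ignore", divide="ignore"):
    raw = pcor_squared(pair)

# nan_to_num turns the undefined score into 0.0.
cleaned = np.nan_to_num(raw)

print(raw, cleaned)
```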

9 changes: 7 additions & 2 deletions tox.ini
@@ -1,7 +1,12 @@
 [tox]
-skipsdist = True
+skipsdist = true
 envlist = py27,py34,py35,py36
 [testenv]
+# necessary to make cov find the .coverage file
+# see http://blog.ionelmc.ro/2014/05/25/python-packaging/
+usedevelop = true
+skip_install = true
 commands =
-    py.test
+    py.test --cov=./
+    codecov --token=88ffd259-37b7-4c79-89a2-b4a00b83b519
     publish: python setup.py sdist upload --sign -r pypi