-
Notifications
You must be signed in to change notification settings - Fork 50
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Similarity Module (trajectroy distances) #13
Conversation
svenruf
commented
Sep 16, 2020
- Similarity Module to calculate tajectory distances (DTW, EDR, ...)
- Storing the ids of extracted triplegs also in the original positionfixes -> pfs can be used for further calculations
trackintel/similarity/detection.py
Outdated
from console_progressbar import ProgressBar | ||
|
||
|
||
def e_dist_tuples(c1,c2): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
should be replaced with a real euclidean distance function
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good work but we need to add some changes so that we can integrate it.
trackintel/similarity/__init__.py
Outdated
@@ -0,0 +1,2 @@ | |||
from .measures import * | |||
from .detection import * |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dominikbucher Is this something that we should do in every submodule?
trackintel/similarity/detection.py
Outdated
def e_dist_tuples(c1,c2): | ||
return np.sqrt((c1[0]-c2[0])**2 + (c1[1]-c2[1])**2) | ||
|
||
def similarity_matrix(data, method, field='tripleg_id', trsh=None, eps=None, dist=False, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there are some critical adjustments that have to be made before we can merge this.
- I don't think that tripleg similarity should be based on positionfixes with an ID column but only on the tripleg data (and then on the query points of the tripleg). This will allow us to cover use cases where we have triplegs but no positionfixes (e.g., GreenClass).
- I don't think that the matrix assignment should be ID based. At the moment we are calculating a full distance matrix (meaning a distance between all pairs) but store it in a sparse matrix. I can see the need to maintain the relation between IDs and rows but it should be done by either returning an ID vector or by telling the user in the documentation that the order in the distance matrix is the same as the order in the dataframes that were given to the function.
- The function should either call or be merged with the trackintel.geogr.distances.calculate_distance_matrix function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
edit: The function actually supports sparse calculation
trackintel/similarity/measures.py
Outdated
|
||
|
||
|
||
def ses_extract(tpl): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
doc missing
trackintel/similarity/measures.py
Outdated
@@ -0,0 +1,215 @@ | |||
""" | |||
Algorithms e_dtw and e_edr adapted from https://github.com/bguillouet/traj-dist |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the idea of having a collection of trajectory distance metrics as it is done in measures.py. In calculate_distance_matrix we tried to be as compatible as possible to scikit-learn. I think here it would also be a good idea to try to implement valid metrics so that we can _piggyback _ on the pairwise_distances implementation of scikit (maybe it is even possible to use their parallelization).
I think this is mostly already done here.
Update for merge plan:
|
… later. The dtw and edit distance implementations are based on point geodataframes (using position fixes) which is not compatible at the moment.
Codecov Report
@@ Coverage Diff @@
## master #13 +/- ##
==========================================
+ Coverage 48.91% 50.96% +2.05%
==========================================
Files 27 29 +2
Lines 1147 1238 +91
Branches 174 197 +23
==========================================
+ Hits 561 631 +70
- Misses 528 543 +15
- Partials 58 64 +6
Continue to review full report at Codecov.
|
@hong2223 : I'd merge it now if you have no other comments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
.