Similarity Module (trajectory distances) #13

Merged
merged 54 commits into from
Nov 17, 2020
Changes from 40 commits
Commits
6069414
Update postgis.py
svenruf Mar 11, 2020
c239d21
Update postgis.py
svenruf Mar 11, 2020
2564b0c
Update positionfixes.py
svenruf Mar 12, 2020
20cce15
copy from installed fw
svenruf Mar 18, 2020
83e1e2b
Update postgis.py
svenruf Mar 18, 2020
a9fdcc6
Revert "Update postgis.py"
svenruf Mar 18, 2020
203ea38
Revert "copy from installed fw"
svenruf Mar 18, 2020
ac27391
corrected syntax
Mar 18, 2020
ce3776c
corrected syntax
Mar 19, 2020
8fe89c4
corrected index error
Mar 19, 2020
07e87d9
updated read epsg to geoPandas 0.7
Mar 19, 2020
26d1345
Merge branch 'bugfixes_sr' into exp_sr
Mar 19, 2020
620af94
updated rest of io.postgis.py to geoPandas 0.7 (Read SRID)
Mar 19, 2020
d379f2a
updated read SRID to geoPandas 0.7
Mar 19, 2020
3791b15
Merge branch 'bugfixes_sr' into exp_sr
Mar 19, 2020
78d228f
updated case 3 in preprocessing/positionfixes to write tripleg_id int…
Mar 19, 2020
ea405fa
added first parts of similarity module
Mar 23, 2020
afd49b4
no message
Mar 25, 2020
2d86a4a
built detection method in similarity
Mar 31, 2020
dc51fd7
continued writing detection method
Mar 31, 2020
a092dba
Added similarity to testing
Apr 1, 2020
9e317ad
testing + test data
Apr 7, 2020
bd9bc56
continued implementing testing of similarity module
Apr 8, 2020
0c7e954
update testing
Apr 16, 2020
ddfd667
div bugfixes and changes
Apr 20, 2020
e11f42a
corrected tripleg_id writing into positionfixes, case 1
Apr 21, 2020
096a3ac
div in testing
Apr 27, 2020
3016156
Changed testing due to the "write tripleg_id in pfs" feature.
Apr 27, 2020
38baa9e
Built min_dist method for pre-check of 2 trajectories, inverted dtw d…
Apr 28, 2020
7e5c3c7
doc in similarity_detection + progress bar
Apr 28, 2020
7103bcc
div, edr added
May 3, 2020
c99f2c2
div. changes
May 5, 2020
f4b3326
begin to build start_end_similarity
May 5, 2020
afaa5ac
div.
May 12, 2020
3d09e92
norm by length of trajectory in dtw
May 14, 2020
6bd6ba9
div changes and bugfixes in measures.ses and detection
May 18, 2020
7babcf1
Added Doc of Start_end measure
May 22, 2020
8b7f358
Updated SED
May 30, 2020
aa3fd88
no message
Sep 16, 2020
4915e88
Merge remote-tracking branch 'upstream/master' into exp_sr
henrymartin1 Nov 16, 2020
20115ac
Feat: Trajectory metrics for distance matrix
henrymartin1 Nov 16, 2020
8935ace
Excluded start_end_dist from this pull request. We could still add it…
henrymartin1 Nov 16, 2020
7ba7b19
Fix: Added tests
henrymartin1 Nov 16, 2020
bf99141
merge to update to upstream
henrymartin1 Nov 16, 2020
137d2cd
Fix: clean-up unused files
henrymartin1 Nov 16, 2020
69b306a
Feat: x and y with different length
henrymartin1 Nov 16, 2020
abf6ed5
Fix: Doc and comments
henrymartin1 Nov 17, 2020
f6f887b
Fix: Doc and comments
henrymartin1 Nov 17, 2020
1c37a90
Fix: Doc and comments
henrymartin1 Nov 17, 2020
3cbce24
Fix: Doc error
henrymartin1 Nov 17, 2020
a9dd9a9
Fix: Deleted dependencies that are no longer in use
henrymartin1 Nov 17, 2020
949c44a
clean up
henrymartin1 Nov 17, 2020
bbdac10
Fix: Clean up requirements.txt
henrymartin1 Nov 17, 2020
8030076
Merge remote-tracking branch 'upstream/master' into exp_sr
henrymartin1 Nov 17, 2020
228 changes: 228 additions & 0 deletions examples/example_similarity/TI_similarity_introduction.ipynb
@@ -0,0 +1,228 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# trackintel Similarity Module\n",
"Demonstration notebook. Run the following cells to get an overview of what the similarity module provides."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Import of framework and data"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"import trackintel as ti\n",
"\n",
"pfs = ti.io.file.read_positionfixes_csv('testtplset.csv')"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### Preprocessing of the trajectories\n",
"When starting from raw tracking data, the following steps have to be performed:\n",
"- extract staypoints\n",
"- extract triplegs\n",
"\n",
"The test data are already preprocessed positionfixes. Calculating similarities between trajectories always requires the trajectories as positionfixes, and each positionfix carries a tripleg_id that identifies the trajectory it belongs to. You can select a single trajectory like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pfs[pfs['tripleg_id']==22]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's store two trajectories from the pfs data frame."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ta = pfs[pfs['tripleg_id']==22]\n",
"tb = pfs[pfs['tripleg_id']==33]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"A trajectory distance between these two can be calculated using the methods available in ti.similarity.measures. These are Dynamic Time Warping (DTW) and Edit Distance on Real Sequences (EDR). A third measure, Start End Distance, is also available but works a bit differently; it is explained later.\n",
"\n",
"The DTW distance of two trajectories can be calculated like this:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"ti.similarity.e_dtw(ta,tb)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"How can this distance be interpreted? The DTW distance uses a Euclidean distance function, so its value depends on the coordinates of the positionfixes and therefore on their coordinate reference system (CRS). To inspect the CRS, call:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(pfs.crs)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The CRS is empty. To set the initial coordinate reference system, you can assign its EPSG id to the pfs GeoDataFrame. In this case that is WGS84, with EPSG id 4326."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"pfs.crs='EPSG:4326'"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"To reproject the data set you could call pfs.to_crs(epsg=1234), where 1234 stands in for the target EPSG id. To avoid changing the positionfixes, it is recommended to reproject a copy or to reproject directly when calculating the similarity matrix of the data set. In this example the data is reprojected to CH1903+ (EPSG id 2056). To calculate a distance matrix with the DTW method, run the following code."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"distmatrix = ti.similarity.similarity_matrix(pfs.to_crs(epsg=2056), 'dtw', dist=True)"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The distance values are stored in 'distmatrix'. To access a value, use normal Python matrix indexing; the row and column indices correspond to the tripleg_ids."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"distmatrix[22,33]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Unlike the value calculated above, this DTW distance is in meters."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The similarity_matrix method can also be called on a positionfixes object. If the dist parameter is not set to True, the trajectory distances are inverted into similarities. This is recommended for large data sets (or data sets with high tripleg ids), as the matrix then does not store zero values and is more performant."
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"simmatrix = pfs.to_crs(epsg=2056).as_positionfixes.similarity_matrix('dtw')"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"simmatrix[22,33]"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Comparing the two matrices: the distance matrix stores a value at every position, whereas the similarity matrix stores only the relevant values:"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(distmatrix)"
]
},
{
"cell_type": "code",
"execution_count": null,
"metadata": {},
"outputs": [],
"source": [
"print(simmatrix)"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.7.6"
}
},
"nbformat": 4,
"nbformat_minor": 4
}
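For readers unfamiliar with the DTW distance used in the notebook above, here is a minimal pure-Python sketch of the classic dynamic-programming formulation with a Euclidean point distance. It is not the implementation added in this PR: the function name `dtw_distance` and the toy coordinates are assumptions made for illustration, and the sketch omits the normalisation by trajectory length mentioned in commit 3d09e92.

```python
import math


def dtw_distance(a, b):
    """Return the DTW distance between two sequences of (x, y) points.

    Illustrative sketch only; not the trackintel implementation.
    """
    n, m = len(a), len(b)
    # cost[i][j]: minimal accumulated cost of aligning a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # local cost: Euclidean distance between the two points
            d = math.hypot(a[i - 1][0] - b[j - 1][0], a[i - 1][1] - b[j - 1][1])
            # extend the cheapest of the three admissible predecessor alignments
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# Toy trajectories (hypothetical coordinates, different lengths)
ta = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
tb = [(0.0, 1.0), (1.0, 1.0), (2.0, 1.0), (3.0, 1.0)]

print(dtw_distance(ta, ta))  # identical trajectories have distance 0.0
print(dtw_distance(ta, tb))
```

Because the local cost is a plain Euclidean distance on the raw coordinates, longitude/latitude input yields a value in degrees, which is why the notebook reprojects to a metric CRS before interpreting the result.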
32 changes: 32 additions & 0 deletions examples/example_similarity/testtplset.csv
@@ -0,0 +1,32 @@
tracked_at,tripleg_id,user_id,longitude,latitude
2020-03-23 13:05:01+00:00,22,22,7.571296692000001,47.53789097
2020-03-23 13:05:08+00:00,22,22,7.571897507,47.53702179
2020-03-23 13:05:15+00:00,22,22,7.571940422000001,47.53631196
2020-03-23 13:05:22+00:00,22,22,7.5717473029999995,47.53573973
2020-03-23 13:05:29+00:00,22,22,7.57158637,47.53536308
2020-03-23 13:05:36+00:00,22,22,7.571178675,47.534732899999995
2020-03-23 13:05:43+00:00,22,22,7.570695877,47.53392886
2020-03-23 13:05:50+00:00,22,22,7.57029891,47.53326969
2020-03-23 13:05:57+00:00,22,22,7.569547892,47.53195132
2020-03-23 13:06:04+00:00,22,22,7.56919384,47.53111102
2020-03-23 13:06:11+00:00,22,22,7.569032907,47.53044457
2020-03-23 13:06:18+00:00,22,22,7.568678856,47.5294159
2020-03-23 13:06:25+00:00,22,22,7.568421364,47.52805397
2020-03-23 13:06:32+00:00,22,22,7.568228245,47.52686587
2020-03-23 13:06:39+00:00,22,22,7.567884922,47.52590957
2020-03-23 13:05:57+00:00,33,33,7.575137615,47.53776059
2020-03-23 13:06:04+00:00,33,33,7.575330734,47.53728255
2020-03-23 13:06:11+00:00,33,33,7.575502396,47.53673207
2020-03-23 13:06:18+00:00,33,33,7.575073242,47.53593531
2020-03-23 13:06:25+00:00,33,33,7.57481575,47.53518199
2020-03-23 13:06:32+00:00,33,33,7.574365139,47.53458803
2020-03-23 13:06:39+00:00,33,33,7.573978901,47.53367534
2020-03-23 13:06:46+00:00,33,33,7.57352829,47.53297994
2020-03-23 13:06:53+00:00,33,33,7.573442459,47.53228454
2020-03-23 13:07:00+00:00,33,33,7.573335171,47.53138629
2020-03-23 13:07:07+00:00,33,33,7.573292255,47.53057496
2020-03-23 13:07:14+00:00,33,33,7.573356627999999,47.52912613
2020-03-23 13:07:21+00:00,33,33,7.573378086,47.52809743
2020-03-23 13:07:28+00:00,33,33,7.573335171,47.52683689
2020-03-23 13:07:35+00:00,33,33,7.573120594,47.52605447
2020-03-23 13:07:42+00:00,33,33,7.572691441,47.52541693
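The sample file above contains only tripleg ids 22 and 33, so a matrix indexed directly by tripleg id is mostly empty. The sketch below illustrates why inverting distances makes sparse storage possible: in a similarity matrix a missing entry can be read as 0 ("not similar"), whereas in a distance matrix 0 would mean "identical". A plain dict stands in for a sparse matrix here; the helper `to_similarity`, the `1 / (1 + d)` inversion, the `max_dist` cut-off, the distance values and tripleg id 44 are all assumptions for illustration, not the PR's implementation.

```python
def to_similarity(distances, max_dist):
    """Invert pairwise distances into similarities, keeping only close pairs.

    distances: dict mapping (id_a, id_b) -> distance (>= 0)
    max_dist:  pairs farther apart than this are not stored at all
    """
    sim = {}
    for (i, j), d in distances.items():
        if d <= max_dist:
            # 1 / (1 + d) maps d = 0 to 1.0 and large d towards 0
            sim[(i, j)] = 1.0 / (1.0 + d)
    return sim


# Hypothetical pairwise distances keyed by tripleg id
distances = {(22, 33): 3.0, (22, 44): 5000.0, (33, 44): 4800.0}
sim = to_similarity(distances, max_dist=100.0)

print(sim)                     # only the (22, 33) pair is stored
print(sim.get((22, 44), 0.0))  # pairs that were dropped read as similarity 0
```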
4 changes: 3 additions & 1 deletion requirements.txt
@@ -9,4 +9,6 @@ geoalchemy2
osmnx
psycopg2
imageio
simplification
simplification
console_progressbar
scipy
4 changes: 3 additions & 1 deletion setup.py
@@ -33,7 +33,9 @@
'geoalchemy2',
'osmnx',
'scikit-learn',
'simplification'
'simplification',
'console_progressbar',
'scipy'
]

# What packages are optional?
7 changes: 7 additions & 0 deletions tests/data/sim1.csv
@@ -0,0 +1,7 @@
user_id;tracked_at;latitude;longitude;elevation;accuracy
1;2015-11-27T08:00:00Z;47.37651505;8.548779488;456;1
1;2015-11-27T08:20:22Z;47.39935;8.5277;524;1
1;2015-11-27T12:39:28Z;47.40854;8.51169;537;1
1;2015-11-27T13:57:20Z;47.41144;8.54472;447;1
1;2015-11-28T12:39:28Z;43.40854;8.51169;537;1

7 changes: 7 additions & 0 deletions tests/data/sim2.csv
@@ -0,0 +1,7 @@
user_id;tracked_at;latitude;longitude;elevation;accuracy
1;2015-11-27T08:00:00Z;47.37651505;8.548779488;456;1
1;2015-11-27T08:20:22Z;47.39935;8.5277;524;1
1;2015-11-27T12:39:28Z;47.40854;8.51169;537;1
1;2015-11-27T13:57:20Z;47.41144;8.54472;447;1
1;2015-11-28T12:39:28Z;43.40854;8.51169;537;1
1;2015-11-28T13:57:20Z;43.41144;8.54472;447;1
22 changes: 22 additions & 0 deletions tests/data/test_edr_no.csv
@@ -0,0 +1,22 @@
tracked_at,longitude,latitude,user_id
2020-03-23T10:01:11Z,7.739524841,47.53366085,1
2020-03-23T10:01:23Z,7.740460932,47.5340683,1
2020-03-23T10:01:47Z,7.741346061,47.5343997,1
2020-03-23T10:01:59Z,7.741574049,47.53449748,1
2020-03-23T10:02:11Z,7.742451131,47.53482706,1
2020-03-23T10:02:23Z,7.743835151,47.53535221,1
2020-03-23T10:02:35Z,7.744918764,47.53575965,1
2020-03-23T10:02:47Z,7.74631083,47.53629023,1
2020-03-23T10:02:59Z,7.750661373,47.53794891,1
2020-03-23T10:03:11Z,7.753686905,47.5389919,1
2020-03-23T10:03:23Z,7.760617733,47.54087503,1
2020-03-23T10:03:35Z,7.772955894,47.54323608,1
2020-03-23T10:03:47Z,7.783727646,47.54471349,1
2020-03-23T10:03:59Z,7.79705286,47.54455416,1
2020-03-23T10:04:11Z,7.800765038,47.54506111,1
2020-03-23T10:04:23Z,7.802739143,47.54533631,1
2020-03-23T10:04:35Z,7.799928188,47.54342438,1
2020-03-23T10:04:47Z,7.803447247,47.53803583,1
2020-03-23T10:04:59Z,7.80600071,47.53419688,1
2020-03-23T10:05:11Z,7.808768749,47.53005339,1
2020-03-23T10:05:23Z,7.812395096,47.52772072,1
22 changes: 22 additions & 0 deletions tests/data/test_edr_outlier.csv
@@ -0,0 +1,22 @@
tracked_at,user_id,longitude,latitude
2020-03-23T10:01:11Z,2,7.739524841,47.53366085
2020-03-23T10:01:23Z,2,7.740460932,47.5340683
2020-03-23T10:01:47Z,2,7.741346061,47.5343997
2020-03-23T10:01:59Z,2,7.741574049,47.53449748
2020-03-23T10:02:11Z,2,7.742451131,47.53482706
2020-03-23T10:02:23Z,2,7.743835151,47.53535221
2020-03-23T10:02:35Z,2,7.744918764,47.53575965
2020-03-23T10:02:47Z,2,7.74631083,47.53629023
2020-03-23T10:02:59Z,2,7.750661373,47.53794891
2020-03-23T10:03:11Z,2,7.753686905,47.5389919
2020-03-23T10:03:23Z,2,7.760617733,47.54087503
2020-03-23T10:03:35Z,2,7.755661010742187,47.56413326214536
2020-03-23T10:03:47Z,2,7.783727646,47.54471349
2020-03-23T10:03:59Z,2,7.79705286,47.54455416
2020-03-23T10:04:11Z,2,7.800765038,47.54506111
2020-03-23T10:04:23Z,2,7.802739143,47.54533631
2020-03-23T10:04:35Z,2,7.799928188,47.54342438
2020-03-23T10:04:47Z,2,7.803447247,47.53803583
2020-03-23T10:04:59Z,2,7.80600071,47.53419688
2020-03-23T10:05:11Z,2,7.808768749,47.53005339
2020-03-23T10:05:23Z,2,7.812395096,47.52772072
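The coordinates in test_edr_no.csv and test_edr_outlier.csv differ only in the fix at 10:03:35, which makes them a natural test for EDR's robustness to outliers. Below is a minimal sketch of the textbook Edit Distance on Real Sequences recurrence, assuming two points match when both coordinate differences are within a tolerance `eps`. The function name `edr_distance`, the parameter `eps` and the toy coordinates are hypothetical; the module's actual signature and matching rule may differ.

```python
def edr_distance(a, b, eps):
    """Return the EDR edit count between two sequences of (x, y) points.

    Illustrative sketch only; not the trackintel implementation.
    """
    n, m = len(a), len(b)
    # table[i][j]: edit cost of turning a[:i] into b[:j]
    table = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        table[i][0] = i  # delete all i points
    for j in range(m + 1):
        table[0][j] = j  # insert all j points
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # two points match if they are within eps in both dimensions
            match = (abs(a[i - 1][0] - b[j - 1][0]) <= eps
                     and abs(a[i - 1][1] - b[j - 1][1]) <= eps)
            subcost = 0 if match else 1
            table[i][j] = min(table[i - 1][j - 1] + subcost,  # match or substitute
                              table[i - 1][j] + 1,            # delete from a
                              table[i][j - 1] + 1)            # insert from b
    return table[n][m]


# Toy trajectories: identical except for one far-away outlier point
clean = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
outlier = [(0.0, 0.0), (1.0, 0.0), (2.0, 50.0), (3.0, 0.0)]

print(edr_distance(clean, clean, eps=0.5))    # 0
print(edr_distance(clean, outlier, eps=0.5))  # 1
```

However far the outlier lies from the rest of the track, it costs exactly one edit, whereas in a sum-of-distances measure such as DTW its contribution grows with its displacement.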