Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

DOC/TST: generate tripleg #609

Merged
merged 12 commits into from
Mar 15, 2024
Merged

Conversation

hongyeehh
Copy link
Member

@hongyeehh hongyeehh commented Mar 11, 2024

Following #607, this PR does the following:

  • Clean up tests for positionfix generation
  • Adding docstring to explain triplegs' time and geometry generation
  • Adding more tests, in particular test_stability(), to ensure consistency between two cases

@hongyeehh hongyeehh marked this pull request as draft March 11, 2024 14:07
Copy link

codecov bot commented Mar 11, 2024

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 93.41%. Comparing base (59e3e1c) to head (9abd8fd).

Additional details and impacted files
@@            Coverage Diff             @@
##           master     #609      +/-   ##
==========================================
- Coverage   93.44%   93.41%   -0.03%     
==========================================
  Files          33       33              
  Lines        2076     2082       +6     
  Branches      367      371       +4     
==========================================
+ Hits         1940     1945       +5     
  Misses        126      126              
- Partials       10       11       +1     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@hongyeehh
Copy link
Member Author

A major change to the _generate_triplegs_overlap_staypoints() function is to assign pfs geometries to triplegs instead of the defined initially staypoint geometries (basically removing line 439-445 in trackintel/preprocessing/positionfixes.py). This ensures consistency between the two cases in generate_triplegs() (the testing function test_stability()).

@munterfi, does this change affect your use cases or has any consequences for the original code? The tests seem to work fine after corresponding edits.

@hongyeehh hongyeehh marked this pull request as ready for review March 11, 2024 15:23
@munterfi
Copy link
Contributor

Hi @hongyeehh, thanks for extending the documentation and streamlining the test cases 👍

Concerning the LINESTRING geometries of the triplegs: Since the method says the triplegs should overlap with the staypoints, an overlap with the centroid of the staypoint is more intuitive from my point of view. Even if I see the inconsistency between the methods.

What is the advantage if the tripleg starts at the last positionfix of the staypoint and ends at the first positionfix of the next staypoint, if we assume that it is a staypoint where the observed movement only results from GPS inaccuracies but not from real movement of the actual subject?

In our usecase with very coarse GPS tracking data of locomotives, we add a further step after applying trackintel, where we detect gaps in the "lifetime" tracks of the vehicles. Those gaps are then quantified spatially and temporally. If there is no gap, the duration and distance between the staypoint and tripleg should consequently be zero. This assumption would now no longer apply, given the modification.

What do you think?

@hongyeehh
Copy link
Member Author

Hi @hongyeehh, thanks for extending the documentation and streamlining the test cases 👍

Concerning the LINESTRING geometries of the triplegs: Since the method says the triplegs should overlap with the staypoints, an overlap with the centroid of the staypoint is more intuitive from my point of view. Even if I see the inconsistency between the methods.

What is the advantage if the tripleg starts at the last positionfix of the staypoint and ends at the first positionfix of the next staypoint, if we assume that it is a staypoint where the observed movement only results from GPS inaccuracies but not from real movement of the actual subject?

In our usecase with very coarse GPS tracking data of locomotives, we add a further step after applying trackintel, where we detect gaps in the "lifetime" tracks of the vehicles. Those gaps are then quantified spatially and temporally. If there is no gap, the duration and distance between the staypoint and tripleg should consequently be zero. This assumption would now no longer apply, given the modification.

What do you think?

Thanks for the explanation! My primary concern is ensuring consistency in the code, e.g., obtaining the same tripleg generation results for cases 1 and 2.

Assigning pfs geom is probably the simplest solution, as it preserves the one-to-one correspondence of the geometries of pf-sp and pf-tpl, and does not lead to tpl table depending on the sp generation result. I agree that assigning sp geometries to triplegs might make more sense, considering the real-world GPS noise. This is more of a design choice, and I am fine with both options as long as the code makes the test_stability() pass :).

Maybe @NinaWie has a preference?

@NinaWie
Copy link
Member

NinaWie commented Mar 11, 2024

I agree that there needs to be consistency between the cases whether sp are provided as input or not. But I also agree that it would be nicer to use the actual staypoint geometry for the overlap method, since the staypoint geometry should indeed overlap with the tripleg geometry in this case. To implement this, we would have to recompute the staypoint centroid from the pfs, right? So within the tripleg generation functions, in the case where no staypoints are supplied, we need to use the staypoint_id column in pfs and use it to compute the staypoint geometry? This is a bit ugly since it should be done in the staypoint generation. The other workaround would be to add an assertion statement that throws an error when you try to use the overlap_staypoints method in the tripleg generation without supplying staypoints.

@hongyeehh hongyeehh self-assigned this Mar 12, 2024
Copy link
Member Author

@hongyeehh hongyeehh left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this version, I returned to assign sp geometry to tpls start and end. The docstrings are changed accordingly. More checks are implemented at the beginning of the function; the "between_staypoint" method now requires staypoint as input.

However, we still have the inconsistency issue between cases 1 and 2, because, in preprocessing of step 2 (line 268), we have not recovered the one-to-one match between pfs and sp, leading to wrong geometry assignments in line 458.

We can either:

  • Add more lines in the preprocessing to recover the match.
  • Remove case 2, and require input pfs to have corresponding "staypoint_id" column with the input sp.

I favor the latter, as I cannot think of its valid use case: users typically run the generate_triplegs() function after generate_staypoints(). Also, complicating preprocessing makes the function similar to generate_staypoints() as it needs to consider edge cases, e.g., temporal gaps.

trackintel/preprocessing/positionfixes.py Outdated Show resolved Hide resolved
trackintel/preprocessing/positionfixes.py Show resolved Hide resolved
tests/preprocessing/test_positionfixes.py Outdated Show resolved Hide resolved
@munterfi
Copy link
Contributor

munterfi commented Mar 14, 2024

Thank you for the suggestions!

I support excluding the case 2 entirely if the overlap_staypoints method is chosen.

A general input regarding the application scenario for case 2: Does this case imply that staypoints can be sourced differently, i.e. skipping the generate_staypoints step?

Given the high complexity and nesting within the generate_triplegs method, would it then make sense to extract case 2 into a separate function, e.g. assign_pfs_to_sps(pfs, sp), and set the staypoint_id in the positionfixes as a prerequisite for the generate_triplegs function?

Even though this is a breaking change for users, it would reduce the input checks and variants that need to be managed, and assign a single responsibility to generate_triplegs.

@hongyeehh
Copy link
Member Author

In the new version, an error will be raised if pfs with no staypoint_id is provided to the function, such that we only support case 1 for overlap_staypoints.

I like the idea of removing case 2 preprocessing from the function, which we can implement in future PRs. At the moment, I added a deprecation warning for the users. Since it is not important the print_progress input argument is removed.

Copy link
Member

@NinaWie NinaWie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the edits and cleanups! I also like the documentaiton, should be clear now.

trackintel/preprocessing/positionfixes.py Outdated Show resolved Hide resolved
@hongyeehh hongyeehh merged commit 4c0f376 into mie-lab:master Mar 15, 2024
8 checks passed
@hongyeehh hongyeehh deleted the doc-generate-tripleg branch March 15, 2024 09:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants