Added an example within the documentation for custom readers supporting pandas DataFrames. #707
… definition of custom Readers that support pandas DataFrames. This has the benefit of being able to take advantage of the range of data formats that pandas supports for ground truth and detection data being read into stonesoup. The example includes the custom definition of a DataFrameGroundTruthReader and DataFrameDetectionReader class. Both of these inherit from the GroundTruthReader class, along with a custom defined _DataFrameReader class. Each of these classes supports reading of pandas dataframes that are already read into memory, in a similar way to the CSVGroundTruthReader and CSVDetectionReader [issue 354].
Thanks for the contribution @BenjaminFraser. I see the docs are failing to build because pandas is a missing dependency. If you could add pandas the Lines 31 to 35 in 435883a
It'd be good to have the readers in the main code base (probably with an optional dependency on pandas) so users can easily access them. It'd also be good to keep the example you've created, both to show how to use them and, in reference to #354, to show how to create custom readers. (A minor issue is that if they are modified, we'll have to be sure to update both places, unless the example could do something with
Or use of Sphinx
…, ref pull request dstl#707 failing to build docs.
That's no problem at all, and including the Readers within the main code base sounds like a good idea! The only sticking point was including it with pandas as an optional dependency, but I'll look into that, which should hopefully be straightforward enough. I'll take a look later when I have the chance and put together another PR for those points!
We've done this before by simply raising an error on importing of dependencies. Stone-Soup/stonesoup/reader/opensky.py Lines 5 to 10 in 5276c1b
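The referenced lines from opensky.py are not reproduced in this thread, so the following is only a minimal sketch of the general pattern being described: attempt the import and re-raise with a message that names the missing optional package. The helper name `require_optional` and the message wording are hypothetical. In a reader module the same `try`/`except` would more likely sit at the top of the file around a plain `import pandas`; a helper is used here so that the sketch runs whether or not pandas is installed.

```python
import importlib


def require_optional(module_name, feature):
    """Import an optional dependency, failing with a descriptive error.

    Hypothetical helper: the name and the message text are illustrative.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as error:
        raise ImportError(
            f"Usage of {feature} requires that the optional package "
            f"`{module_name}` is installed.") from error


# A module that is installed imports normally...
json_module = require_optional("json", "the JSON helpers")

# ...while a missing one fails early with an actionable message.
try:
    require_optional("not_a_real_package_xyz", "the example readers")
except ImportError as error:
    message = str(error)
```

Chaining with `from error` keeps the original traceback visible, which helps when the import fails for a reason other than the package being absent.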
Codecov Report
Base: 94.81% // Head: 94.84% // Increases project coverage by +0.02%
Additional details and impacted files
@@ Coverage Diff @@
## main #707 +/- ##
==========================================
+ Coverage 94.81% 94.84% +0.02%
==========================================
Files 169 170 +1
Lines 8221 8296 +75
Branches 1216 1230 +14
==========================================
+ Hits 7795 7868 +73
- Misses 316 318 +2
Partials 110 110
groundtruth_dict = {}
updated_paths = set()
previous_time = None
for row in self.dataframe.to_dict(orient="records"):
Just a thought, but wondering if you could take advantage of pandas to group by time field (and path_id field), such that you can simplify the logic below (i.e. no need for the previous_time code)?
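As a rough illustration of this suggestion, here is a sketch of grouping by the time field so that each iteration already corresponds to one time step. The column names (`time`, `path_id`, `x`, `y`), the data, and the function `paths_by_time` are all hypothetical, and plain tuples stand in for the Stone Soup state and path types, so this is not the reader's actual implementation.

```python
import pandas as pd

# Hypothetical column names and values, for illustration only.
dataframe = pd.DataFrame({
    "time": pd.to_datetime(["2022-10-01 00:00:00", "2022-10-01 00:00:00",
                            "2022-10-01 00:00:01", "2022-10-01 00:00:01",
                            "2022-10-01 00:00:02"]),
    "path_id": ["a", "b", "a", "b", "a"],
    "x": [0.0, 10.0, 1.0, 11.0, 2.0],
    "y": [0.0, 5.0, 0.5, 5.5, 1.0],
})


def paths_by_time(dataframe, time_field="time", path_id_field="path_id"):
    """Yield (time, updated_path_ids, paths) once per unique timestamp."""
    paths = {}  # path_id -> list of (time, x, y) tuples seen so far
    # groupby sorts by the group key by default, so times come out in order.
    for time, group in dataframe.groupby(time_field):
        updated = set()
        for row in group.to_dict(orient="records"):
            path_id = row[path_id_field]
            paths.setdefault(path_id, []).append((time, row["x"], row["y"]))
            updated.add(path_id)
        yield time, updated, paths


steps = [(time, updated) for time, updated, _ in paths_by_time(dataframe)]
```

Because each group holds every row for one timestamp, there is no need to compare against a `previous_time` variable or to flush a final batch after the loop.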
I'm sure we could do this easily enough, although I've not included it in the current commit yet. Before doing this, I could do with confirming the precise functionality to avoid accidentally changing the current generator logic.
If we simply order by path_id and time, and then iteratively yield each time and updated_paths (detections for DataFrameDetectionReader), is that doing exactly the same functionality as the current groundtruth_paths_gen (and detections_gen)?
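One way to check the ordering question is with a simplified stand-in for the row-by-row logic. The function `row_by_row` below is an assumption about how a `previous_time`-based generator behaves (yield whenever the row time changes), not a copy of the PR's code, and the records are hypothetical. Under that assumption, time has to be the primary sort key: sorting by path_id first splits each time step into several yields and makes the yielded times go backwards.

```python
from datetime import datetime, timedelta

start = datetime(2022, 10, 1)
# Records in the shape produced by DataFrame.to_dict(orient="records");
# field names and values are hypothetical.
records = [
    {"time": start, "path_id": "a"},
    {"time": start, "path_id": "b"},
    {"time": start + timedelta(seconds=1), "path_id": "a"},
    {"time": start + timedelta(seconds=1), "path_id": "b"},
]


def row_by_row(records):
    """Yield (time, updated_path_ids) whenever the row time changes."""
    updated = set()
    previous_time = None
    for row in records:
        time = row["time"]
        if previous_time is not None and previous_time != time:
            yield previous_time, updated
            updated = set()
        previous_time = time
        updated.add(row["path_id"])
    if previous_time is not None:
        yield previous_time, updated  # flush the final time step


by_time = sorted(records, key=lambda row: (row["time"], row["path_id"]))
by_path = sorted(records, key=lambda row: (row["path_id"], row["time"]))

time_first = list(row_by_row(by_time))  # one yield per time step
path_first = list(row_by_row(by_path))  # one yield per row here
```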
def detections_gen(self):
    detections = set()
    previous_time = None
    for row in self.dataframe.to_dict(orient="records"):
Could possibly also group by time field here.
…ord, for consistency with the remainder of all Stone Soup documentation.
…DataFrameReader, as suggested by sdhiscocks.
…reader.py'. This includes a new _DataFrameReader class, which inherits from Reader and allows a Pandas DataFrame to be read. Two new classes are also developed for reading ground-truth data from pandas (DataFrameGroundTruthReader) and reading detections from pandas (DataFrameDetectionReader). Each of these new classes inherit from DetectionReader and _DataFrameReader, and yield outputs in the same way as the basic GroundTruthReader (yields previous time and updated_paths) and DetectionReader (yields previous time and detections) classes within Stone Soup. Tests are still to be generated for these classes.
…ns within both Custom_Pandas_Dataloader.py (documents example), and reader/pandas_reader.py.
…hin both Custom_Pandas_Dataloader.py (documents example), and reader/pandas_reader.py.
…ory. Tests now check both ground-truth reader and detection reader functionality.
…ectory. Now includes separate tests for DataFrameGroundTruthReader and DataFrameDetectionReader.
… from blank lines.
… pandas_reader.py. Developed the same tests as defined within test_generic.py.
…st_pandas_reader.py to check for case where pandas column contains string formatted datetimes, rather than being Timestamp (already formatted in pandas before creating reader).
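The last commit message mentions time columns that hold string-formatted datetimes rather than Timestamp values. A minimal sketch of the kind of normalisation that case implies is given below. The helper `as_datetime` is hypothetical and only handles ISO 8601 strings; the reader in this PR may well use a different parser.

```python
from datetime import datetime


def as_datetime(value):
    """Return `value` as a datetime, parsing ISO 8601 strings if needed.

    Hypothetical helper. pandas Timestamp objects pass through unchanged
    because pandas.Timestamp is a subclass of datetime.datetime.
    """
    if isinstance(value, datetime):
        return value
    return datetime.fromisoformat(value)


parsed = as_datetime("2022-10-22T12:30:00")
passed_through = as_datetime(datetime(2022, 10, 22, 12, 30))
```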
Added a new example (Custom_Pandas_Dataloader.py) within the documentation in docs/examples for the definition of custom Readers that support pandas DataFrames. This allows a wide range of data formats supported by pandas to be taken advantage of for Ground Truth Readers and Detection Readers, without the need to manually define custom data ingestion processes for each type, e.g. JSON, XML, Parquet, HDF5, .txt, .zip.
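To illustrate the point about formats, here is a small sketch in which the same hypothetical ground-truth table is loaded from CSV text and from JSON text. Either DataFrame could then be handed to a DataFrame-based reader; the reader call itself is left out because its signature is not shown in this description.

```python
import io

import pandas as pd

# The same hypothetical ground-truth table, serialised in two formats.
csv_text = ("time,path_id,x,y\n"
            "2022-10-01T00:00:00,a,0.0,0.0\n"
            "2022-10-01T00:00:01,a,1.0,0.5\n")
json_text = (
    '[{"time": "2022-10-01T00:00:00", "path_id": "a", "x": 0.0, "y": 0.0},'
    ' {"time": "2022-10-01T00:00:01", "path_id": "a", "x": 1.0, "y": 0.5}]')

# Only the pandas loading call changes between formats.
from_csv = pd.read_csv(io.StringIO(csv_text), parse_dates=["time"])
from_json = pd.read_json(io.StringIO(json_text), convert_dates=["time"])
```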
Given its similarity to the requirements of the custom reader documentation example (#354), I've linked this pull request to that, which hopefully is not a problem.
These classes do have the disadvantage of requiring the entire dataset in memory. However, it seems that the ability to directly use pandas DataFrames is a feature several users of Stonesoup have shown interest in, which is understandable given the flexibility and processing functionalities this can provide.
The example in Custom_Pandas_Dataloader.py includes the definitions of DataFrameGroundTruthReader and DataFrameDetectionReader classes. Each of these inherit from the existing GroundTruthReader class, along with a custom defined _DataFrameReader class.

These classes operate similarly to the existing CSVGroundTruthReader and CSVDetectionReader classes, except they take as input a pandas DataFrame already read into memory, rather than a path to .csv file. They also have modified generator functions for producing the time and paths / detections.

These have been useful for some work I've done using Stonesoup for some UAV-based non-cooperative radar research, and so hopefully they are also of value to other members of the community!
Update on progression and fixes to aspects of this PR, as of 22 Oct 22:
A point noted with the tests is that there is currently full coverage of all classes defined in pandas_reader.py; however, Codecov flags the pandas import check (which raises an import error if pandas is not installed) as failed.
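One possible way to exercise an import check like this in a test is to simulate the package being absent. The sketch below is an assumption about how that could be done, not something this PR does, and the helper `import_blocked` is hypothetical. It relies on the documented behaviour that a `None` entry in `sys.modules` makes the import system raise ImportError for that name. The standard-library `json` module stands in for pandas so that the sketch runs anywhere.

```python
import importlib
import sys


def import_blocked(module_name):
    """Return the ImportError raised when `module_name` is made unavailable.

    Setting an entry in sys.modules to None makes the import system raise
    ImportError for that name, which simulates a missing package.
    """
    saved = sys.modules.get(module_name)
    sys.modules[module_name] = None
    try:
        importlib.import_module(module_name)
    except ImportError as error:
        return error
    finally:
        # Restore the interpreter state so later imports are unaffected.
        if saved is None:
            sys.modules.pop(module_name, None)
        else:
            sys.modules[module_name] = saved
    return None


error = import_blocked("json")
import json  # the real module is importable again afterwards
```

In a real test the blocked name would be `pandas`, followed by reloading the reader module and asserting that ImportError is raised. A simpler alternative, if the branch is not worth testing, is coverage.py's default `# pragma: no cover` exclusion comment on the `except` line.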
To-do / enhancements:
inspect.getsource
.