Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: Support search logs by timestamp for structured and unstructured logs. #42

Open
wants to merge 20 commits into
base: main
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from all commits
Commits
Show all changes
20 commits
Select commit Hold shift + click to select a range
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
7 changes: 6 additions & 1 deletion src/clp_ffi_js/ir/StreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -129,6 +129,7 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
"Array<[string, bigint, number, number]>"
);
emscripten::register_type<clp_ffi_js::ir::FilteredLogEventMapTsType>("number[] | null");
emscripten::register_type<clp_ffi_js::ir::LogEventIdxTsType>("number | null");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Registering a type that is nullable is good for consistency with existing code like FilteredLogEventMapTsType.

That said, Embind seems to have added support for std::optional<> so we should look into converting functions that return nullable data into returning std::optional<> types.

https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html#:~:text=For%20convenience%2C%20embind%20provides%20factory%20functions%20to%20register%20std%3A%3Avector%3CT%3E%20(register_vector())%2C%20std%3A%3Amap%3CK%2C%20V%3E%20(register_map())%2C%20and%20std%3A%3Aoptional%3CT%3E%20(register_optional())%20types%3A

For convenience, embind provides factory functions to register std::vector (register_vector()), std::map<K, V> (register_map()), and std::optional (register_optional()) types:

@coderabbitai create an issue to track this

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao I've created issue #51 to track this enhancement: #51

🔗 Analysis chain

@junhaoliao I've created a GitHub issue to track this enhancement. The issue has been assigned to you and includes details about the current state and the proposed changes.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Create a GitHub issue to track the conversion of nullable types to std::optional
gh issue create \
  --title "refactor: Convert nullable Embind types to use std::optional" \
  --body "## Background

Currently, we register nullable types in Embind using custom type definitions that map to 'T | null' in TypeScript. For example:
- \`FilteredLogEventMapTsType\` -> \`number[] | null\`
- \`LogEventIdxTsType\` -> \`number | null\`

Embind now provides built-in support for \`std::optional<T>\` through the \`register_optional()\` function, as documented [here](https://emscripten.org/docs/porting/connecting_cpp_and_javascript/embind.html).

## Proposed Changes
Convert all nullable type registrations to use \`std::optional<T>\` with \`register_optional()\`.

## References
- Discussion: ${GITHUB_SERVER_URL}/${GITHUB_REPOSITORY}/pull/42#discussion_r1905740583" \
  --assignee junhaoliao

Length of output: 792


✏️ Learnings added
Learnt from: junhaoliao
PR: y-scope/clp-ffi-js#42
File: src/clp_ffi_js/ir/StreamReader.cpp:132-132
Timestamp: 2025-01-07T16:26:06.449Z
Learning: Embind supports std::optional<T> through register_optional() function, which should be preferred over custom nullable type definitions (T | null) for better type safety and consistency.

Note: Learnings are effective only in the context of similar code segments. To apply general coding guidelines, please configure review instructions. You can manage existing learnings in the app.


If you found this review helpful, would you consider giving us a shout-out on X?
Thank you for using CodeRabbit!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should look into converting functions...

in a separate PR i meant, so we should create an issue to track the effort

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@junhaoliao I had already created issue #51 to track this effort in response to your previous comment. The issue will help track the conversion of nullable types to std::optional in a separate PR.

emscripten::class_<clp_ffi_js::ir::StreamReader>("ClpStreamReader")
.constructor(
&clp_ffi_js::ir::StreamReader::create,
Expand All @@ -145,7 +146,11 @@ EMSCRIPTEN_BINDINGS(ClpStreamReader) {
)
.function("filterLogEvents", &clp_ffi_js::ir::StreamReader::filter_log_events)
.function("deserializeStream", &clp_ffi_js::ir::StreamReader::deserialize_stream)
.function("decodeRange", &clp_ffi_js::ir::StreamReader::decode_range);
.function("decodeRange", &clp_ffi_js::ir::StreamReader::decode_range)
.function(
"getLogEventIndexByTimestamp",
&clp_ffi_js::ir::StreamReader::get_log_event_idx_by_timestamp
);
}
} // namespace

Expand Down
85 changes: 74 additions & 11 deletions src/clp_ffi_js/ir/StreamReader.hpp
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include <type_traits>
#include <vector>

#include <clp/ir/types.hpp>
#include <clp/streaming_compression/zstd/Decompressor.hpp>
#include <clp/type_utils.hpp>
#include <emscripten/em_asm.h>
Expand All @@ -29,6 +30,7 @@ EMSCRIPTEN_DECLARE_VAL_TYPE(ReaderOptions);
// JS types used as outputs
EMSCRIPTEN_DECLARE_VAL_TYPE(DecodedResultsTsType);
EMSCRIPTEN_DECLARE_VAL_TYPE(FilteredLogEventMapTsType);
EMSCRIPTEN_DECLARE_VAL_TYPE(LogEventIdxTsType);

enum class StreamType : uint8_t {
Structured,
Expand All @@ -44,6 +46,23 @@ using LogEvents = std::vector<LogEventWithFilterData<LogEvent>>;
*/
using FilteredLogEventsMap = std::optional<std::vector<size_t>>;

template <typename LogEvent>
concept GetLogEventIdxInterface = requires(
LogEventWithFilterData<LogEvent> const& event,
clp::ir::epoch_time_ms_t timestamp
) {
{
event.get_timestamp()
} -> std::convertible_to<clp::ir::epoch_time_ms_t>;
};

template <typename LogEvent, typename ToStringFunc>
concept DecodeRangeInterface = requires(ToStringFunc func, LogEvent const& log_event) {
{
func(log_event)
} -> std::convertible_to<std::string>;
};

/**
* Class to deserialize and decode Zstandard-compressed CLP IR streams as well as format decoded
* log events.
Expand Down Expand Up @@ -123,6 +142,18 @@ class StreamReader {
*/
[[nodiscard]] virtual auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType = 0;
/**
* Finds the index of the last log event that matches or next to the given timestamp.
*
* @tparam LogEvent
* @param timestamp The timestamp to search for, in milliseconds since the Unix epoch.
* @return The last index of the log event whose timestamp is smaller than or equal to the
* `timestamp`.
* @return `0` if all log event timestamps are larger than the target.
* @return null if no log event exists in the stream.
*/
[[nodiscard]] virtual auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType = 0;

protected:
explicit StreamReader() = default;
Expand All @@ -143,11 +174,7 @@ class StreamReader {
* @throws Propagates `ToStringFunc`'s exceptions.
*/
template <typename LogEvent, typename ToStringFunc>
requires requires(ToStringFunc func, LogEvent const& log_event) {
{
func(log_event)
} -> std::convertible_to<std::string>;
}
requires DecodeRangeInterface<LogEvent, ToStringFunc>
static auto generic_decode_range(
size_t begin_idx,
size_t end_idx,
Expand All @@ -163,7 +190,6 @@ class StreamReader {
* @tparam LogEvent
* @param log_level_filter
* @param log_events Derived class's log events.
* @param log_events
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

was it a mistake?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it duplicated?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's revert the change and fix it in another PR. I think we should also add docs for filtered_log_event_map.

* @param[out] filtered_log_event_map Returns the filtered log events.
*/
template <typename LogEvent>
Expand All @@ -172,14 +198,23 @@ class StreamReader {
LogLevelFilterTsType const& log_level_filter,
LogEvents<LogEvent> const& log_events
) -> void;

/**
* Templated implementation of `get_log_event_idx_by_timestamp`.
*
* @tparam LogEvent
* @param timestamp
* event timestamps are larger than the target. In that case, return the first log event index.
*/
template <GetLogEventIdxInterface LogEvent>
auto generic_get_log_event_idx_by_timestamp(
std::vector<LogEventWithFilterData<LogEvent>> const& log_events,
clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType;
};

template <typename LogEvent, typename ToStringFunc>
requires requires(ToStringFunc func, LogEvent const& log_event) {
{
func(log_event)
} -> std::convertible_to<std::string>;
}
requires DecodeRangeInterface<LogEvent, ToStringFunc>
auto StreamReader::generic_decode_range(
size_t begin_idx,
size_t end_idx,
Expand Down Expand Up @@ -258,6 +293,34 @@ auto StreamReader::generic_filter_log_events(
}
}
}

template <GetLogEventIdxInterface LogEvent>
auto StreamReader::generic_get_log_event_idx_by_timestamp(
LogEvents<LogEvent> const& log_events,
clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType {
if (log_events.empty()) {
return LogEventIdxTsType{emscripten::val::null()};
}
Henry8192 marked this conversation as resolved.
Show resolved Hide resolved

auto upper{std::upper_bound(
log_events.begin(),
log_events.end(),
timestamp,
[](clp::ir::epoch_time_ms_t ts, LogEventWithFilterData<LogEvent> const& log_event) {
return ts < log_event.get_timestamp();
}
)};

if (upper == log_events.begin()) {
return LogEventIdxTsType{emscripten::val(0)};
}

auto const upper_index{std::distance(log_events.begin(), upper)};
auto const index{upper_index - 1};

return LogEventIdxTsType{emscripten::val(index)};
}
} // namespace clp_ffi_js::ir

#endif // CLP_FFI_JS_IR_STREAMREADER_HPP
10 changes: 10 additions & 0 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -11,6 +11,7 @@
#include <clp/Array.hpp>
#include <clp/ErrorCode.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ir/types.hpp>
#include <clp/TraceableException.hpp>
#include <emscripten/val.h>
#include <spdlog/spdlog.h>
Expand Down Expand Up @@ -147,6 +148,15 @@ auto StructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx, bo
);
}

auto StructuredIrStreamReader::get_log_event_idx_by_timestamp(
clp::ir::epoch_time_ms_t const timestamp
) -> LogEventIdxTsType {
return generic_get_log_event_idx_by_timestamp<StructuredLogEvent>(
*m_deserialized_log_events,
timestamp
);
}

StructuredIrStreamReader::StructuredIrStreamReader(
StreamReaderDataContext<StructuredIrDeserializer>&& stream_reader_data_context,
std::shared_ptr<StructuredLogEvents> deserialized_log_events
Expand Down
4 changes: 4 additions & 0 deletions src/clp_ffi_js/ir/StructuredIrStreamReader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -8,6 +8,7 @@
#include <clp/Array.hpp>
#include <clp/ffi/ir_stream/Deserializer.hpp>
#include <clp/ffi/SchemaTree.hpp>
#include <clp/ir/types.hpp>
#include <emscripten/val.h>

#include <clp_ffi_js/ir/LogEventWithFilterData.hpp>
Expand Down Expand Up @@ -74,6 +75,9 @@ class StructuredIrStreamReader : public StreamReader {
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

[[nodiscard]] auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType override;

private:
// Constructor
explicit StructuredIrStreamReader(
Expand Down
9 changes: 9 additions & 0 deletions src/clp_ffi_js/ir/UnstructuredIrStreamReader.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -158,6 +158,15 @@ auto UnstructuredIrStreamReader::decode_range(size_t begin_idx, size_t end_idx,
);
}

auto UnstructuredIrStreamReader::get_log_event_idx_by_timestamp(
clp::ir::epoch_time_ms_t const timestamp
) -> LogEventIdxTsType {
return generic_get_log_event_idx_by_timestamp<UnstructuredLogEvent>(
m_encoded_log_events,
timestamp
);
}

UnstructuredIrStreamReader::UnstructuredIrStreamReader(
StreamReaderDataContext<UnstructuredIrDeserializer>&& stream_reader_data_context
)
Expand Down
3 changes: 3 additions & 0 deletions src/clp_ffi_js/ir/UnstructuredIrStreamReader.hpp
Original file line number Diff line number Diff line change
Expand Up @@ -71,6 +71,9 @@ class UnstructuredIrStreamReader : public StreamReader {
[[nodiscard]] auto decode_range(size_t begin_idx, size_t end_idx, bool use_filter) const
-> DecodedResultsTsType override;

[[nodiscard]] auto get_log_event_idx_by_timestamp(clp::ir::epoch_time_ms_t timestamp
) -> LogEventIdxTsType override;

private:
// Constructor
explicit UnstructuredIrStreamReader(
Expand Down
Loading