feat(clp-s): Add support for ingesting logs from S3. #639

Merged
merged 27 commits into from
Jan 8, 2025
f274a2e
Add Path and NetworkAuth structs which describe how to access a resou…
gibber9809 Dec 16, 2024
d0807c5
Remove boost::filesystem usage from ZstdDecompressor
gibber9809 Dec 16, 2024
cfbfa94
Add utility to create a reader for a resource given Path and NetworkAuth
gibber9809 Dec 16, 2024
97f1fe7
Add utilities for finding all files or archives in a directory, and a…
gibber9809 Dec 16, 2024
b33e3ae
Update CMakeLists to pull in boost::urls
gibber9809 Dec 16, 2024
528c0bd
Update ArchiveReader to accept Path and NetworkAuth
gibber9809 Dec 16, 2024
6cfdf46
Accept clp::ReaderInterface in JsonFileIterator
gibber9809 Dec 16, 2024
63d8bdc
Update JsonParser to accept Path and NetworkAuth
gibber9809 Dec 16, 2024
e8f5b37
Update JsonConstructor to accept Path and NetworkAuth
gibber9809 Dec 16, 2024
29049b3
Remove unnecessary dependency from kql build target
gibber9809 Dec 16, 2024
df0144a
Update clp_s end to end tests to account for new interface
gibber9809 Dec 16, 2024
cc75ffe
Update command line arguments and simplify clp-s.cpp
gibber9809 Dec 16, 2024
a64bd6b
Catch exception when failing to open archive during search
gibber9809 Dec 16, 2024
8fa5aa2
Attempt to fix build issue on macos
gibber9809 Dec 17, 2024
fe84cd4
Attempt to fix macos build again
gibber9809 Dec 17, 2024
561634c
Revert all changes to kql CMakeLists to simplify review
gibber9809 Dec 18, 2024
307be9e
Properly detect http errors when ingesting over the network
gibber9809 Dec 19, 2024
7557c84
Fix bug introduced while splitting up changes
gibber9809 Dec 19, 2024
fc8fad6
Fix obvious bug introduced in recent commit
gibber9809 Dec 20, 2024
84bc5c3
Improve error message for curl error during ingestion
gibber9809 Dec 20, 2024
2a87da2
Log an error when environment variables are unavailable for presigned…
gibber9809 Dec 20, 2024
1f7660b
Apply suggestions from code review
gibber9809 Dec 30, 2024
dbc16fb
Complete rename
gibber9809 Dec 31, 2024
86fa165
Address code review comments
gibber9809 Dec 31, 2024
2a1ae5d
Merge remote-tracking branch 'upstream/main' into clp-s-s3-ingestion
gibber9809 Dec 31, 2024
0b2c41a
Deduplicate some validation code in CommandLineArguments.cpp
gibber9809 Dec 31, 2024
93a1696
Address rabbit comments and fix macos build
gibber9809 Jan 3, 2025
4 changes: 3 additions & 1 deletion components/core/CMakeLists.txt
@@ -275,6 +275,8 @@ set(SOURCE_FILES_clp_s_unitTest
src/clp_s/FileReader.hpp
src/clp_s/FileWriter.cpp
src/clp_s/FileWriter.hpp
src/clp_s/InputConfig.cpp
src/clp_s/InputConfig.hpp
src/clp_s/JsonConstructor.cpp
src/clp_s/JsonConstructor.hpp
src/clp_s/JsonFileIterator.cpp
@@ -613,7 +615,7 @@ target_include_directories(unitTest
target_link_libraries(unitTest
PRIVATE
absl::flat_hash_map
Boost::filesystem Boost::iostreams Boost::program_options Boost::regex
Boost::filesystem Boost::iostreams Boost::program_options Boost::regex Boost::url
${CURL_LIBRARIES}
fmt::fmt
kql
23 changes: 17 additions & 6 deletions components/core/src/clp_s/ArchiveReader.cpp
@@ -4,20 +4,30 @@
#include <string_view>

#include "archive_constants.hpp"
#include "InputConfig.hpp"
#include "ReaderUtils.hpp"

using std::string_view;

namespace clp_s {
void ArchiveReader::open(string_view archives_dir, string_view archive_id) {
void ArchiveReader::open(Path const& archive_path, NetworkAuthOption const& network_auth) {
Review comment (Contributor):

network_auth is not used in this method?

Reply (Contributor Author):

It gets used in the follow-up PR, when we start supporting single-file archive reading over the network.

I think it makes sense to keep it because Path requires NetworkAuth in the general case, and it makes the diff for the next PR a bit smaller.

if (m_is_open) {
throw OperationFailed(ErrorCodeNotReady, __FILENAME__, __LINE__);
}
m_is_open = true;
m_archive_id = archive_id;
std::filesystem::path archive_path{archives_dir};
archive_path /= m_archive_id;
auto const archive_path_str = archive_path.string();

if (false == get_archive_id_from_path(archive_path, m_archive_id)) {
throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
}

if (InputSource::Filesystem != archive_path.source) {
throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
}

if (false == std::filesystem::is_directory(archive_path.path)) {
throw OperationFailed(ErrorCodeBadParam, __FILENAME__, __LINE__);
}
auto const archive_path_str = archive_path.path;

m_var_dict = ReaderUtils::get_variable_dictionary_reader(archive_path_str);
m_log_dict = ReaderUtils::get_log_type_dictionary_reader(archive_path_str);
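The validation order in the new `ArchiveReader::open` can be sketched in isolation. This is a minimal illustration, not the PR's code: `InputConfig.hpp` is not shown in this diff, so the `Path`, `InputSource`, and `NetworkAuthOption` definitions below are hypothetical stand-ins (including the `Network` enumerator), `get_archive_id_from_path` is assumed to return the last path component, and `OpenCheck` and `check_archive_path` are names invented for this example in place of the thrown `OperationFailed` exceptions.

```cpp
#include <filesystem>
#include <string>

// Hypothetical stand-ins for the types declared in InputConfig.hpp (not shown in this diff).
enum class InputSource { Filesystem, Network };

struct Path {
    InputSource source;
    std::string path;
};

struct NetworkAuthOption {};  // Placeholder; the real fields are not visible here.

// Invented for this sketch; the PR throws OperationFailed(ErrorCodeBadParam, ...) instead.
enum class OpenCheck { Ok, NoArchiveId, NotFilesystem, NotDirectory };

// Assumed behaviour: use the last non-empty path component as the archive id.
bool get_archive_id_from_path(Path const& archive_path, std::string& archive_id) {
    std::filesystem::path p{archive_path.path};
    if (false == p.has_filename()) {
        // Handles a trailing separator such as "archives/abc/".
        p = p.parent_path();
    }
    archive_id = p.filename().string();
    return false == archive_id.empty();
}

// Runs the same three checks, in the same order, as the new ArchiveReader::open.
OpenCheck check_archive_path(
        Path const& archive_path,
        NetworkAuthOption const& /*network_auth*/,  // Unused for now, as in the PR.
        std::string& archive_id
) {
    if (false == get_archive_id_from_path(archive_path, archive_id)) {
        return OpenCheck::NoArchiveId;
    }
    if (InputSource::Filesystem != archive_path.source) {
        return OpenCheck::NotFilesystem;
    }
    if (false == std::filesystem::is_directory(archive_path.path)) {
        return OpenCheck::NotDirectory;
    }
    return OpenCheck::Ok;
}
```

Under these assumptions, a network path is rejected after its archive id has been extracted, which matches the order of checks in the hunk above.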
@@ -198,8 +208,9 @@ BaseColumnReader* ArchiveReader::append_reader_column(SchemaReader& reader, int3
column_reader = new DateStringColumnReader(column_id, m_timestamp_dict);
break;
// No need to push columns without associated object readers into the SchemaReader.
case NodeType::Object:
case NodeType::Metadata:
case NodeType::NullValue:
case NodeType::Object:
case NodeType::StructuredArray:
case NodeType::Unknown:
break;
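The case labels in this hunk group every node type that gets no column reader, now sorted alphabetically and including `Metadata`. A minimal sketch of that fall-through pattern follows. The enum lists only the `NodeType` values visible in this diff (the real enum may well have more), and `has_column_reader` is a hypothetical helper, not a function from the PR.

```cpp
// Subset of NodeType values that appear in this diff; the real enum is not shown.
enum class NodeType { DateString, Metadata, NullValue, Object, StructuredArray, Unknown };

// Hypothetical helper: reports whether a node type would get a column reader.
bool has_column_reader(NodeType type) {
    switch (type) {
        case NodeType::DateString:
            return true;
        // Types without associated column readers share one fall-through group.
        case NodeType::Metadata:
        case NodeType::NullValue:
        case NodeType::Object:
        case NodeType::StructuredArray:
        case NodeType::Unknown:
            break;
    }
    return false;
}
```

Listing every enumerator explicitly, rather than using `default:`, lets the compiler warn when a new node type is added without a decision about its column.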
9 changes: 4 additions & 5 deletions components/core/src/clp_s/ArchiveReader.hpp
@@ -7,9 +7,8 @@
#include <string_view>
#include <utility>

#include <boost/filesystem.hpp>

#include "DictionaryReader.hpp"
#include "InputConfig.hpp"
#include "PackedStreamReader.hpp"
#include "ReaderUtils.hpp"
#include "SchemaReader.hpp"
@@ -32,10 +31,10 @@ class ArchiveReader {

/**
* Opens an archive for reading.
* @param archives_dir
* @param archive_id
* @param archive_path
* @param network_auth
*/
void open(std::string_view archives_dir, std::string_view archive_id);
void open(Path const& archive_path, NetworkAuthOption const& network_auth);

/**
* Reads the dictionaries and metadata.
5 changes: 3 additions & 2 deletions components/core/src/clp_s/ArchiveWriter.cpp
@@ -270,9 +270,10 @@ void ArchiveWriter::initialize_schema_writer(SchemaWriter* writer, Schema const&
case NodeType::DateString:
writer->append_column(new DateStringColumnWriter(id));
break;
case NodeType::StructuredArray:
case NodeType::Object:
case NodeType::Metadata:
case NodeType::NullValue:
case NodeType::Object:
case NodeType::StructuredArray:
case NodeType::Unknown:
break;
}
1 change: 0 additions & 1 deletion components/core/src/clp_s/ArchiveWriter.hpp
@@ -4,7 +4,6 @@
#include <string_view>
#include <utility>

#include <boost/filesystem.hpp>
#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_io.hpp>

27 changes: 26 additions & 1 deletion components/core/src/clp_s/CMakeLists.txt
@@ -2,6 +2,17 @@ add_subdirectory(search/kql)

set(
CLP_SOURCES
../clp/aws/AwsAuthenticationSigner.cpp
../clp/aws/AwsAuthenticationSigner.hpp
../clp/BoundedReader.cpp
../clp/BoundedReader.hpp
../clp/CurlDownloadHandler.cpp
../clp/CurlDownloadHandler.hpp
../clp/CurlEasyHandle.hpp
../clp/CurlGlobalInstance.cpp
../clp/CurlGlobalInstance.hpp
../clp/CurlOperationFailed.hpp
../clp/CurlStringList.hpp
../clp/cli_utils.cpp
../clp/cli_utils.hpp
../clp/database_utils.cpp
@@ -28,11 +39,15 @@ set(
../clp/ffi/Value.hpp
../clp/FileDescriptor.cpp
../clp/FileDescriptor.hpp
../clp/FileReader.cpp
../clp/FileReader.hpp
../clp/GlobalMetadataDB.hpp
../clp/GlobalMetadataDBConfig.cpp
../clp/GlobalMetadataDBConfig.hpp
../clp/GlobalMySQLMetadataDB.cpp
../clp/GlobalMySQLMetadataDB.hpp
../clp/hash_utils.cpp
../clp/hash_utils.hpp
../clp/ir/EncodedTextAst.cpp
../clp/ir/EncodedTextAst.hpp
../clp/ir/parsing.cpp
@@ -43,18 +58,24 @@ set(
../clp/MySQLParamBindings.hpp
../clp/MySQLPreparedStatement.cpp
../clp/MySQLPreparedStatement.hpp
../clp/NetworkReader.cpp
../clp/NetworkReader.hpp
../clp/networking/socket_utils.cpp
../clp/networking/socket_utils.hpp
../clp/ReaderInterface.cpp
../clp/ReaderInterface.hpp
../clp/ReadOnlyMemoryMappedFile.cpp
../clp/ReadOnlyMemoryMappedFile.hpp
../clp/spdlog_with_specializations.hpp
../clp/streaming_archive/ArchiveMetadata.cpp
../clp/streaming_archive/ArchiveMetadata.hpp
../clp/streaming_compression/zstd/Decompressor.cpp
../clp/streaming_compression/zstd/Decompressor.hpp
../clp/Thread.cpp
../clp/Thread.hpp
../clp/TraceableException.hpp
../clp/time_types.hpp
../clp/type_utils.hpp
../clp/utf8_utils.cpp
../clp/utf8_utils.hpp
../clp/WriterInterface.cpp
@@ -89,6 +110,8 @@ set(
FileReader.hpp
FileWriter.cpp
FileWriter.hpp
InputConfig.cpp
InputConfig.hpp
JsonConstructor.cpp
JsonConstructor.hpp
JsonFileIterator.cpp
@@ -226,12 +249,14 @@ target_link_libraries(
clp-s
PRIVATE
absl::flat_hash_map
Boost::filesystem Boost::iostreams Boost::program_options
Boost::iostreams Boost::program_options Boost::regex Boost::url
${CURL_LIBRARIES}
clp::string_utils
kql
MariaDBClient::MariaDBClient
${MONGOCXX_TARGET}
msgpack-cxx
OpenSSL::Crypto
simdjson
spdlog::spdlog
yaml-cpp::yaml-cpp