Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/filestorage] replace path-unsafe characters in component names #20896

Conversation

andrzej-stencel
Copy link
Member

@andrzej-stencel andrzej-stencel commented Apr 13, 2023

Fixes #3148

This change was initially created to only fix #20731 by replacing the slash / characters in component names with a tilde ~. After that, we decided that it is a breaking change, and so it's better to fix other characters as well in a single breaking change.

@runforesight
Copy link

runforesight bot commented Apr 13, 2023

Foresight Summary

    
Major Impacts

build-and-test-windows duration(9 seconds) has decreased 30 minutes 25 seconds compared to main branch avg(30 minutes 34 seconds).
View More Details

⭕  build-and-test-windows workflow has finished in 9 seconds (30 minutes 25 seconds less than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
windows-unittest-matrix -     🔗  N/A See Details
windows-unittest -     🔗  N/A See Details

✅  check-links workflow has finished in 1 minute 4 seconds and finished at 13th Apr, 2023.


Job Failed Steps Tests
changed files -     🔗  N/A See Details
check-links -     🔗  N/A See Details

✅  telemetrygen workflow has finished in 1 minute 7 seconds and finished at 13th Apr, 2023.


Job Failed Steps Tests
build-dev -     🔗  N/A See Details
publish-latest -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details

✅  changelog workflow has finished in 2 minutes 44 seconds and finished at 13th Apr, 2023.


Job Failed Steps Tests
changelog -     🔗  N/A See Details

✅  prometheus-compliance-tests workflow has finished in 3 minutes 29 seconds (2 minutes 51 seconds less than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
prometheus-compliance-tests -     🔗  N/A See Details

✅  load-tests workflow has finished in 6 minutes 44 seconds (3 minutes 46 seconds less than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
setup-environment -     🔗  N/A See Details
loadtest (TestIdleMode) -     🔗  N/A See Details
loadtest (TestBallastMemory|TestLog10kDPS) -     🔗  N/A See Details
loadtest (TestMetric10kDPS|TestMetricsFromFile) -     🔗  N/A See Details
loadtest (TestMetricResourceProcessor|TestTrace10kSPS) -     🔗  N/A See Details
loadtest (TestTraceNoBackend10kSPS|TestTrace1kSPSWithAttrs) -     🔗  N/A See Details
loadtest (TestTraceBallast1kSPSWithAttrs|TestTraceBallast1kSPSAddAttrs) -     🔗  N/A See Details
loadtest (TestTraceAttributesProcessor) -     🔗  N/A See Details

✅  e2e-tests workflow has finished in 13 minutes 1 second and finished at 13th Apr, 2023.


Job Failed Steps Tests
kubernetes-test (v1.26.0) -     🔗  N/A See Details
kubernetes-test (v1.25.3) -     🔗  N/A See Details
kubernetes-test (v1.24.7) -     🔗  N/A See Details
kubernetes-test (v1.23.13) -     🔗  N/A See Details

✅  build-and-test workflow has finished in 34 minutes 24 seconds (12 minutes 12 seconds less than main branch avg.) and finished at 13th Apr, 2023.


Job Failed Steps Tests
govulncheck -     🔗  N/A See Details
setup-environment -     🔗  N/A See Details
check-codeowners -     🔗  N/A See Details
build-examples -     🔗  N/A See Details
checks -     🔗  N/A See Details
check-collector-module-version -     🔗  N/A See Details
correctness-metrics -     🔗  N/A See Details
correctness-traces -     🔗  N/A See Details
integration-tests -     🔗  N/A See Details
lint-matrix (receiver-0) -     🔗  N/A See Details
lint-matrix (receiver-1) -     🔗  N/A See Details
lint-matrix (processor) -     🔗  N/A See Details
lint-matrix (exporter) -     🔗  N/A See Details
lint-matrix (extension) -     🔗  N/A See Details
lint-matrix (connector) -     🔗  N/A See Details
lint-matrix (internal) -     🔗  N/A See Details
lint-matrix (other) -     🔗  N/A See Details
unittest-matrix (1.20, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.20, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.20, processor) -     🔗  N/A See Details
unittest-matrix (1.20, exporter) -     🔗  N/A See Details
unittest-matrix (1.20, extension) -     🔗  N/A See Details
unittest-matrix (1.20, connector) -     🔗  N/A See Details
unittest-matrix (1.20, internal) -     🔗  N/A See Details
unittest-matrix (1.20, other) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-0) -     🔗  N/A See Details
unittest-matrix (1.19, receiver-1) -     🔗  N/A See Details
unittest-matrix (1.19, processor) -     🔗  N/A See Details
unittest-matrix (1.19, exporter) -     🔗  N/A See Details
unittest-matrix (1.19, extension) -     🔗  N/A See Details
unittest-matrix (1.19, connector) -     🔗  N/A See Details
unittest-matrix (1.19, internal) -     🔗  N/A See Details
unittest-matrix (1.19, other) -     🔗  N/A See Details
lint -     🔗  N/A See Details
unittest (1.20) -     🔗  N/A See Details
unittest (1.19) -     🔗  N/A See Details
cross-compile (darwin, amd64) -     🔗  N/A See Details
cross-compile (darwin, arm64) -     🔗  N/A See Details
cross-compile (linux, 386) -     🔗  N/A See Details
cross-compile (linux, amd64) -     🔗  N/A See Details
cross-compile (linux, arm) -     🔗  N/A See Details
cross-compile (linux, arm64) -     🔗  N/A See Details
cross-compile (linux, ppc64le) -     🔗  N/A See Details
cross-compile (windows, 386) -     🔗  N/A See Details
cross-compile (windows, amd64) -     🔗  N/A See Details
build-package (deb) -     🔗  N/A See Details
build-package (rpm) -     🔗  N/A See Details
windows-msi -     🔗  N/A See Details
publish-check -     🔗  N/A See Details
publish-stable -     🔗  N/A See Details
publish-dev -     🔗  N/A See Details
rotate-milestone -     🔗  N/A See Details

🔎 See details on Foresight

*You can configure Foresight comments in your organization settings page.

atoulme
atoulme previously approved these changes Apr 18, 2023
@atoulme
Copy link
Contributor

atoulme commented Apr 20, 2023

Will this break existing checkpoint file paths, causing existing receivers to lose their history?

@atoulme atoulme dismissed their stale review April 23, 2023 17:28

Dismissing review since I have one mire question

@andrzej-stencel
Copy link
Member Author

andrzej-stencel commented Apr 24, 2023

Will this break existing checkpoint file paths, causing existing receivers to lose their history?

@atoulme You're right, this will change the directory used for storage for users already using components with a slash in the name, e.g. for the following configuration:

extensions:
  file_storage:
    directory: /tmp

receivers:
  filelog/logs/json:
    storage: file_storage

As long as the user has made sure the path /tmp/receiver_filelog_logs/json exists before running the collector (e.g. with mkdir -p /tmp/receiver_filelog_logs/json), this change will change the storage path from /tmp/receiver_filelog_logs/json to /tmp/receiver_filelog_logs~json.

I wonder how plausible this scenario is. 🤔

Perhaps we should treat this "fix" as a "breaking change" and introduce it via a feature gate?

@djaglowski
Copy link
Member

Code looks good to me but I do think we need to treat this as a breaking change.

@andrzej-stencel
Copy link
Member Author

Code looks good to me but I do think we need to treat this as a breaking change.

Fair enough, I'll add a feature gate for this.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label May 26, 2023
@andrzej-stencel
Copy link
Member Author

I'm still going to work on it. I think that if we want to treat this as a breaking change, it would be good to fix not only the slash issue, but also other potential problematic characters in one breaking change.

One example is percent sign %, which is problematic on Windows platform, but not on other platforms. To keep things consistent, I think we should replace the % in paths on all platforms, not only on Windows? 🤔

@github-actions github-actions bot removed the Stale label May 30, 2023
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 13, 2023
@andrzej-stencel
Copy link
Member Author

Let's keep it open 🙂 I'll be back!

@djaglowski djaglowski removed the Stale label Jun 13, 2023
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Jun 28, 2023
@andrzej-stencel andrzej-stencel force-pushed the replace-slashes-in-component-names branch from ead3989 to 7a877b5 Compare June 28, 2023 08:57
@andrzej-stencel andrzej-stencel changed the title [extension/filestorage] replace slashes in component names [extension/filestorage] replace path-unsafe characters in component names Jun 28, 2023
@andrzej-stencel andrzej-stencel force-pushed the replace-slashes-in-component-names branch from 7a877b5 to b916f7a Compare June 28, 2023 09:49
@andrzej-stencel andrzej-stencel marked this pull request as draft June 28, 2023 09:49
@github-actions github-actions bot removed the Stale label Jun 29, 2023
For example, for component `filelog/logs/json`, the data is stored:

- in path `receiver_filelog_logs/json` with the feature flag disabled (note that this is file named `json` inside directory named `receiver_filelog_logs`).
- in file `receiver_filelog_logs~json` with the feature flag enabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to encode unsafe characters in terms of safe characters instead of replacing? Replacing creates the possibility of name clashes, e.g. components filelog/logs/json and filelog/logs~json both convert to receiver_filelog_logs~json.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, the issue of clashes crossed my mind. Especially with the replacement being quite "eager", replacing many different characters. I'll change the implementation to encode unsafe characters instead of replacing with ~.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astencel-sumo may I ask if you get any chance to work on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@newly12 the vacation period is over and I'm going to get back to it.

@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions
Copy link
Contributor

github-actions bot commented Aug 1, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Aug 1, 2023
@djaglowski djaglowski removed the Stale label Aug 1, 2023
@MovieStoreGuy
Copy link
Contributor

Can I ask, why not just error when there is a component.Name() that is invalid?

@djaglowski
Copy link
Member

Can I ask, why not just error when there is a component.Name() that is invalid?

We're generating a filename based on several fields, including a component ID, so we shouldn't error on anything characters that could be part of a valid component ID.

@github-actions
Copy link
Contributor

github-actions bot commented Sep 5, 2023

This PR was marked stale due to lack of activity. It will be closed in 14 days.

@github-actions github-actions bot added the Stale label Sep 5, 2023
@djaglowski djaglowski removed the Stale label Sep 5, 2023
@github-actions
Copy link
Contributor

This PR was marked stale due to lack of activity. It will be closed in 14 days.

This prevents clashes between similar unsafe names, e.g.:
- `filelog/logs/json`
- `filelog/logs~json`
@andrzej-stencel andrzej-stencel force-pushed the replace-slashes-in-component-names branch from f38f5db to b4f921d Compare September 28, 2023 09:10
@andrzej-stencel
Copy link
Member Author

@tigrannajaryan I have reworked the implementation to encode unsafe characters instead of replacing. Please take a look.

@djaglowski djaglowski merged commit 91e56e4 into open-telemetry:main Oct 3, 2023
@github-actions github-actions bot added this to the next release milestone Oct 3, 2023
@andrzej-stencel andrzej-stencel deleted the replace-slashes-in-component-names branch October 3, 2023 13:45
jmsnll pushed a commit to jmsnll/opentelemetry-collector-contrib that referenced this pull request Nov 12, 2023
…ames (open-telemetry#20896)

Fixes open-telemetry#3148

This change was initially created to only fix open-telemetry#20731 by replacing the
slash `/` characters in component names with a tilde `~`. After that, we
decided that it is a breaking change, and so it's better to fix other
characters as well in a single breaking change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
7 participants