Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[extension/filestorage] replace path-unsafe characters in component names #20896

16 changes: 16 additions & 0 deletions .chloggen/replace-slashes-in-component-names.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,16 @@
# One of 'breaking', 'deprecation', 'new_component', 'enhancement', 'bug_fix'
change_type: breaking

# The name of the component, or a single word describing the area of concern, (e.g. filelogreceiver)
component: extension/filestorage

# A brief description of the change. Surround your text with quotes ("") if it needs to start with a backtick (`).
note: Replace path-unsafe characters in component names

# One or more tracking issues related to the change
issues: [3148]

# (Optional) One or more lines of additional information to render under the primary note.
# These lines will be padded with 2 spaces and then inserted directly into the document.
# Use pipe (|) for multiline entries.
subtext:
29 changes: 29 additions & 0 deletions extension/storage/filestorage/README.md
Original file line number Diff line number Diff line change
Expand Up @@ -93,3 +93,32 @@ processors:
exporters:
nop:
```

## Feature Gates

See the [Collector feature gates](https://github.com/open-telemetry/opentelemetry-collector/blob/main/featuregate/README.md#collector-feature-gates) for an overview of feature gates in the collector.

### `extension.filestorage.replaceUnsafeCharacters`

When enabled, characters that are not safe in file paths are replaced in component name using the extension before creating the file name to store data by the extension.

This may change the file path that the extension is writing component's data to.
For example, for component `filelog/logs/json`, the data is stored:

- in path `receiver_filelog_logs/json` with the feature flag disabled (note that this is file named `json` inside directory named `receiver_filelog_logs`).
- in file `receiver_filelog_logs~json` with the feature flag enabled.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be better to encode unsafe characters in terms of safe characters instead of replacing? Replacing creates the possibility of name clashes, e.g. components filelog/logs/json and filelog/logs~json both convert to receiver_filelog_logs~json.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's true, the issue of clashes crossed my mind. Especially with the replacement being quite "eager", replacing many different characters. I'll change the implementation to encode unsafe characters instead of replacing with ~.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@astencel-sumo may I ask if you get any chance to work on this?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@newly12 the vacation period is over and I'm going to get back to it.


The feature replaces all characters other than `A-Z`, `a-z`, `0-9`, dot `.`, hyphen `-`, underscore `_`, with a tilde `~`.
This replacement is done to prevent surprising behavior or errors in the file storage extension.

For more details, see the following issues:

- [File storage extension - invalid file name characters must be encoded #3148](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3148)
- [[filestorage] receiver name sanitization #20731](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/20731)

The schedule for this feature gate is:

- Introduced in v0.81.0 (July 2023) as `alpha` - disabled by default.
- Moved to `beta` in October 2023 - enabled by default.
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I described the schedule for the feature gate in terms of dates, not release numbers, as this is pretty far in the future and I believe it's easier for everyone to focus on time frames as opposed to arbitrary release numbers.

The schedule I selected is slow: three months before moving the gate to beta and another three months between marking it stable. This is because the component itself is marked as beta, so users might expect changes to not happen rapidly. Also I believe there's no rush - the feature gate is available, so users can switch it on if the original behavior bothers them.

Let me know what you think!

- Moved to `stable` in January 2024 - cannot be disabled.
- Removed three releases after `stable`.
30 changes: 29 additions & 1 deletion extension/storage/filestorage/extension.go
Original file line number Diff line number Diff line change
Expand Up @@ -7,13 +7,23 @@ import (
"context"
"fmt"
"path/filepath"
"regexp"

"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/extension"
"go.opentelemetry.io/collector/extension/experimental/storage"
"go.opentelemetry.io/collector/featuregate"
"go.uber.org/zap"
)

var replaceUnsafeCharactersFeatureGate = featuregate.GlobalRegistry().MustRegister(
"extension.filestorage.replaceUnsafeCharacters",
featuregate.StageAlpha,
featuregate.WithRegisterDescription("When enabled, characters that are not safe in file paths are replaced in component name using the extension. For example, the data for component `filelog/logs/json` will be stored in file `receiver_filelog_logs~json` and not in `receiver_filelog_logs/json`."),
featuregate.WithRegisterReferenceURL("https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/3148"),
featuregate.WithRegisterFromVersion("v0.81.0"),
)

type localFileStorage struct {
cfg *Config
logger *zap.Logger
Expand Down Expand Up @@ -49,7 +59,10 @@ func (lfs *localFileStorage) GetClient(_ context.Context, kind component.Kind, e
} else {
rawName = fmt.Sprintf("%s_%s_%s_%s", kindString(kind), ent.Type(), ent.Name(), name)
}
// TODO sanitize rawName

if replaceUnsafeCharactersFeatureGate.IsEnabled() {
rawName = sanitize(rawName)
}
absoluteName := filepath.Join(lfs.cfg.Directory, rawName)
client, err := newClient(lfs.logger, absoluteName, lfs.cfg.Timeout, lfs.cfg.Compaction)

Expand Down Expand Up @@ -84,3 +97,18 @@ func kindString(k component.Kind) string {
return "other" // not expected
}
}

// sanitize replaces characters in name that are not safe in a file path
func sanitize(name string) string {
// Specify unsafe characters by a negation of allowed characters:
// - uppercase and lowercase letters A-Z, a-z
// - digits 0-9
// - dot `.`
// - hyphen `-`
// - underscore `_`
// - tilde `~`
unsafeCharactersRegex := regexp.MustCompile(`[^A-Za-z0-9.\-_~]`)

// Replace all characters other than the above with the tilde `~`
return unsafeCharactersRegex.ReplaceAllString(name, "~")
}
54 changes: 54 additions & 0 deletions extension/storage/filestorage/extension_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,12 @@ import (
"sync"
"testing"

"github.com/stretchr/testify/assert"
"github.com/stretchr/testify/require"
"go.opentelemetry.io/collector/component"
"go.opentelemetry.io/collector/extension/experimental/storage"
"go.opentelemetry.io/collector/extension/extensiontest"
"go.opentelemetry.io/collector/featuregate"
)

func TestExtensionIntegrity(t *testing.T) {
Expand Down Expand Up @@ -185,6 +187,58 @@ func TestTwoClientsWithDifferentNames(t *testing.T) {
require.Equal(t, myBytes2, data)
}

func TestSanitize(t *testing.T) {
testCases := []struct {
name string
componentName string
sanitizedName string
}{
{
name: "safe characters",
componentName: `.UPPERCASE-lowercase_1234567890~`,
sanitizedName: `.UPPERCASE-lowercase_1234567890~`,
},
{
name: "unsafe characters",
componentName: `slash/backslash\colon:asterisk*questionmark?quote'doublequote"angle<>pipe|exclamationmark!percent%space `,
sanitizedName: `slash~backslash~colon~asterisk~questionmark~quote~doublequote~angle~~pipe~exclamationmark~percent~space~`,
},
}

for _, testCase := range testCases {
t.Run(testCase.name, func(t *testing.T) {
assert.Equal(t, testCase.sanitizedName, sanitize(testCase.componentName))
})
}
}

func TestComponentNameWithUnsafeCharacters(t *testing.T) {
err := featuregate.GlobalRegistry().Set("extension.filestorage.replaceUnsafeCharacters", true)
require.NoError(t, err)

tempDir := t.TempDir()

f := NewFactory()
cfg := f.CreateDefaultConfig().(*Config)
cfg.Directory = tempDir

extension, err := f.CreateExtension(context.Background(), extensiontest.NewNopCreateSettings(), cfg)
require.NoError(t, err)

se, ok := extension.(storage.Extension)
require.True(t, ok)

client, err := se.GetClient(
context.Background(),
component.KindReceiver,
newTestEntity("my/slashed/component*"),
"",
)

require.NoError(t, err)
require.NotNil(t, client)
}

func TestGetClientErrorsOnDeletedDirectory(t *testing.T) {
ctx := context.Background()

Expand Down
2 changes: 1 addition & 1 deletion extension/storage/go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -10,6 +10,7 @@ require (
go.opentelemetry.io/collector/component v0.80.0
go.opentelemetry.io/collector/confmap v0.80.0
go.opentelemetry.io/collector/extension v0.80.0
go.opentelemetry.io/collector/featuregate v1.0.0-rcv0013
go.uber.org/zap v1.24.0
)

Expand All @@ -30,7 +31,6 @@ require (
github.com/mitchellh/reflectwalk v1.0.2 // indirect
github.com/pmezard/go-difflib v1.0.0 // indirect
go.opentelemetry.io/collector/config/configtelemetry v0.80.0 // indirect
go.opentelemetry.io/collector/featuregate v1.0.0-rcv0013 // indirect
go.opentelemetry.io/collector/pdata v1.0.0-rcv0013 // indirect
go.opentelemetry.io/otel v1.16.0 // indirect
go.opentelemetry.io/otel/metric v1.16.0 // indirect
Expand Down