Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add PE field set under file and process #731

Merged
merged 10 commits into from
Feb 12, 2020
Merged

Add PE field set under file and process #731

merged 10 commits into from
Feb 12, 2020

Conversation

rw-access
Copy link
Contributor

Closes #676

Adds a PE field set under file and process.

@rw-access rw-access requested a review from webmat January 24, 2020 19:27
Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this field set is applicable to process, it should be nested at process.parent as well. Although the way to cleanly do that is in progress over in #698.

The way this is currently defined makes it inherently limited to Windows. I think you're right that the analogous executable metadata headers don't specify anything like the fields you're adding here (Mac's Mach-O or Linux' ELF). But if they ever do, it would be weird to have to use them via .pe.original_file_name... I wonder if we could directly add this to file and process instead? E.g. file.original_name, file.company, etc. Or perhaps file.meta.original_name.

Their definition could still mention Windows PE, and that's likely the only time they would be filled for now. But that definition could later be loosened, if say Mach-O (or another header standard) comes along and starts tracking the same concepts for the Mac or Linux.

schemas/pe.yml Show resolved Hide resolved
schemas/pe.yml Show resolved Hide resolved
schemas/pe.yml Outdated Show resolved Hide resolved
@webmat
Copy link
Contributor

webmat commented Feb 4, 2020

I think it's time to open a can of worms, here. Let's discuss the question, and see if we think this is worth looking into or not, at this time.

Actually it's two separate cans :-)

  1. I've long been thinking that we should have nested file under process. Perhaps most times we'd just end up populating process.file.path -- the equivalent of our current process.executable, and that's fine. However, any time we want to capture additional information about the file (owner, size, original_file_name, etc.) we're slowly duplicating fields from file to process. I think we should instead bite the bullet and nest file at process.file. Whether or not we deprecate process.executable for removal in ECS 2.0 is another matter.
  2. Second can of worms is that each one of these fields makes total sense on their own, directly on file. Windows could provide it via the PE headers, but they are all fundamental concepts that can totally be gathered in different ways on other platforms. Perhaps I would adjust the names this way, if we add them directly under file.:
  • pe.company moved to file.company
  • pe.description moved to file.description
  • pe.file_version renamed to file.version
  • pe.original_file_name renamed to file.original_name
  • pe.product moved to file.product -- this one I'm not sure whether to rename. Perhaps I would align with os.platform in spirit, and call it file.platform, as the platform this file is destined to run on?

If we decide to tackle these two cans of worms together, then we get a simplification. Add PE headers in file. directly, and nest file under process. which not only gives us all of the PE attributes, but also any other file attributes we need.

@rw-access WDYT?

@rw-access
Copy link
Contributor Author

rw-access commented Feb 4, 2020

1- I think we should hold off on process.file.* for now. More on this in my meta comment below.

2 - I don't think it makes sense to put all of these as top-level file fields, because most only make sense for a narrow subset of file events. Text files don't have "company" or "description" or "version". These concepts only exist within the file header, so they don't necessarily pertain to the metadata surrounding a file event (size, location, acting process, etc). The concept of file_version means one thing to PE/COFF, and something entirely different to sharepoint or VCS. A top level field called version is way too subject to interpretation, and I would like to avoid that.

Removing the pe fieldset and nesting introduces significant ambiguities for hypothetical use cases. I would rather stick with pe for our well known SIEM uses cases.

(meta) Honestly, I'm fairly resistant to making the process.file major sweeping changes in ECS now. I've been down that road before with https://car.mitre.org/data_model, and you end up recreating CyBox, which was such a perfect model that it was no longer usable. Instead, we opted for flat and took a pragmatic approach and it helped significantly. I think ECS needs to gain more stability first in the eyes of users, and I would rather accumulate the tech debt so that we can move forward with SIEM, and then for ECS 2.0 we can revisit these things with our lessons learned, and only make breaking changes once.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I think you convinced me on no 2, let's not introduce ambiguities, and let's keep it nested under pe..

With regards to no 1, I'm not proposing to make a breaking change now. I'm proposing to declare the deprecation earlier, in preparation for 2.0 :-) But are you saying that you don't think attributes like file.owner or file.attributes are necessary when looking at process? If that's the case, perhaps you're right and we should hold off.

However if we anticipate we may want to do that at some point, then instead of having to deprecate only process.executable, now we'd be deprecating and migrating process.pe and process.code_signature as well... That's why I bring it up now :-)

schemas/pe.yml Show resolved Hide resolved
schemas/pe.yml Show resolved Hide resolved
@webmat
Copy link
Contributor

webmat commented Feb 6, 2020

Alright, let's close can of worms no 1 as well, on process.file. If we need to track more details about the file behind a process, the guidance could be to populate both field sets at the top level in the same event, instead.

{
  "process": { "name": "ruby", "executable": "/usr/bin/ruby" },
  "file": { "path": "/usr/bin/ruby", "owner": "root", "mode": 0755 ... }
}

The clarifications on current field names convinced me as well. Let's proceed with the field names as they are. Descriptions look good as well :-)

The only thing we need to do here, is merge master, re-generate the files, and we'll be good to merge. Let's merge this PR first, and adjust #679 after.

@webmat
Copy link
Contributor

webmat commented Feb 6, 2020

The only thing we need to do here, is merge master, re-generate the files, and we'll be good to merge. Let's merge this PR first, and adjust #679 after.

Sorry, on top of the above, please don't forget to remove all of the unneeded format: attributes as well :-)

@rw-access
Copy link
Contributor Author

rw-access commented Feb 7, 2020

Thanks for the comments, and good idea for process.file.attributes, once we get there. I hadn't thought about that, but it could be really powerful. I don't think Sysmon or Endpoint collect this yet IIRC, but this could detect a lot of interesting detection use cases.

If DLL merges, first then I'll add the nesting for dll.pe here. Otherwise, I'll just flip depending on which one is ready first.

Copy link
Contributor

@webmat webmat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, LGTM :-)

@rw-access rw-access merged commit 40fb29b into elastic:master Feb 12, 2020
@rw-access rw-access deleted the pe-metadata branch February 12, 2020 21:22
webmat pushed a commit to webmat/ecs that referenced this pull request Mar 4, 2020
- code_signature (elastic#733)
- second entry for elastic#739 in the schema section, mentioning the addition of `process.parent.hash`

Also adjusted the wording of elastic#731 and elastic#747.
dcode pushed a commit to dcode/ecs that referenced this pull request Apr 15, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

PE header metadata
2 participants