This repository has been archived by the owner on Oct 22, 2023. It is now read-only.

Produce SpotBugs results in SARIF v2.1.0 format #95

Closed
ghost opened this issue Jun 19, 2020 · 17 comments

Comments

@ghost

ghost commented Jun 19, 2020

Now that SARIF v2.1.0 is an OASIS Standard, there's an internal Microsoft initiative to integrate SpotBugs into our SARIF-driven static analysis ecosystem. We'd like to do it in a way that benefits the entire community.

@michaelcfanning and I (@lgolding) are the co-editors of the SARIF specification, and we're happy to help in whatever way you think is most appropriate. We've integrated SARIF into open source tools such as Cake and ESLint, as well as Microsoft tools such as the C++, C#, and VB compilers.

There are various approaches to adding SARIF support:

  • Integrate SARIF production directly into SpotBugs as an output format, alongside your existing XML, HTML, emacs, and xdoc formats.

  • Provide an open source "SARIF converter" that would plug into the existing "SARIF MultiTool". We have taken this approach with other tools; for example, with the Fortify FPR converter.

  • If SpotBugs supported a plugin mechanism for output formats, as it does for analysis rules, a SARIF emitter could plug into it.

In addition to enabling SpotBugs to participate in SARIF-driven ecosystems, SARIF output could alleviate certain issues in the SpotBugs XML output format:

  • As mentioned in spotbugs/discuss/#69, the XML format has trouble representing multiple code flows that share a common set of locations. SARIF has explicit support for that scenario: a result can contain multiple code flows, and the code locations can be shared among the code flows (or not, on a per-location basis).

  • The BugInstance sub-element that specifies the location of the bug varies depending on the BugInstance's type. SARIF would provide a uniform representation for all results, because its result object has a locations property that is explicitly separate from its codeFlows property.
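To make the code-flow point concrete, here is a minimal Java sketch (hand-built JSON, no SARIF library) of one SARIF 2.1.0 run in which a single result has two code flows that share a starting location. It assumes the 2.1.0 mechanism where shared steps are stored once in `run.threadFlowLocations` and each step in a thread flow refers to them by `index`. The class and method names, the file path, and the line numbers are hypothetical and only illustrate the shape of the log.

```java
import java.util.List;

// Sketch only: hand-builds one SARIF 2.1.0 run whose single result has two
// code flows that share the same first step via run.threadFlowLocations.
class SarifCodeFlowSketch {

    // Quote and escape a string as a JSON string literal.
    static String q(String s) {
        StringBuilder sb = new StringBuilder("\"");
        for (char c : s.toCharArray()) {
            switch (c) {
                case '"': sb.append("\\\""); break;
                case '\\': sb.append("\\\\"); break;
                default:
                    if (c < 0x20) sb.append(String.format("\\u%04x", (int) c));
                    else sb.append(c);
            }
        }
        return sb.append('"').toString();
    }

    // A SARIF location object pointing at one line of one file.
    static String location(String uri, int line) {
        return "{\"physicalLocation\":{\"artifactLocation\":{\"uri\":" + q(uri)
                + "},\"region\":{\"startLine\":" + line + "}}}";
    }

    // A threadFlow whose steps reference run.threadFlowLocations by index.
    static String threadFlowByIndex(List<Integer> indices) {
        StringBuilder sb = new StringBuilder("{\"locations\":[");
        for (int i = 0; i < indices.size(); i++) {
            if (i > 0) sb.append(',');
            sb.append("{\"index\":").append(indices.get(i)).append('}');
        }
        return sb.append("]}").toString();
    }

    static String buildLog() {
        // Each shared step is stored once at the run level.
        String shared = "[{\"location\":" + location("src/Foo.java", 10) + "},"
                + "{\"location\":" + location("src/Foo.java", 20) + "},"
                + "{\"location\":" + location("src/Foo.java", 30) + "}]";
        // Two code flows: both start at index 0, then diverge (1 vs. 2).
        String codeFlows = "[{\"threadFlows\":[" + threadFlowByIndex(List.of(0, 1)) + "]},"
                + "{\"threadFlows\":[" + threadFlowByIndex(List.of(0, 2)) + "]}]";
        // The result's own location is kept separate from its code flows.
        String result = "{\"ruleId\":" + q("NP_NULL_ON_SOME_PATH")
                + ",\"level\":\"warning\""
                + ",\"message\":{\"text\":" + q("Possible null pointer dereference") + "}"
                + ",\"locations\":[" + location("src/Foo.java", 30) + "]"
                + ",\"codeFlows\":" + codeFlows + "}";
        return "{\"version\":\"2.1.0\",\"runs\":[{\"tool\":{\"driver\":{\"name\":\"SpotBugs\"}},"
                + "\"threadFlowLocations\":" + shared + ","
                + "\"results\":[" + result + "]}]}";
    }

    public static void main(String[] args) {
        System.out.println(buildLog());
    }
}
```

Storing the shared step once and referencing it by index is what lets several code flows overlap without duplicating location data; a flow can still inline a `location` directly for steps it does not share.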

@KengoTODA
Member

KengoTODA commented Jun 20, 2020

At the moment, the report generation process isn't pluggable. As a workaround, you can generate your report from the XML report, which is parser-friendly.

It's technically possible to make this process pluggable to solve the known issues, but it is currently stateful, so it will take some time.
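A rough sketch of that workaround in Java, using only the JDK's DOM parser: it reads `BugInstance` elements from an XML report and flattens each into a type, priority, and source location that an external converter could then map onto a SARIF result. The element and attribute names (`BugInstance@type`, `BugInstance@priority`, `SourceLine@sourcepath`, `SourceLine@start`) are assumptions about the SpotBugs XML layout, and `SpotBugsXmlReader` and `Finding` are hypothetical names. Picking the "primary" `SourceLine` is a simplification, since the element that carries the bug's location varies by bug type.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Sketch of the XML -> intermediate-model step of an external SpotBugs-to-SARIF
// converter. Element/attribute names are assumptions about the XML report layout.
class SpotBugsXmlReader {

    // Hypothetical flattened finding, later turned into a SARIF result.
    record Finding(String type, int priority, String sourcePath, int startLine) {}

    static List<Finding> read(String xml) {
        Document doc;
        try {
            doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        } catch (Exception e) {
            throw new IllegalArgumentException("Not a readable XML report", e);
        }
        List<Finding> findings = new ArrayList<>();
        NodeList bugs = doc.getElementsByTagName("BugInstance");
        for (int i = 0; i < bugs.getLength(); i++) {
            Element bug = (Element) bugs.item(i);
            Element line = primarySourceLine(bug);
            String start = line == null ? "" : line.getAttribute("start");
            findings.add(new Finding(
                    bug.getAttribute("type"),
                    Integer.parseInt(bug.getAttribute("priority")),
                    line == null ? "" : line.getAttribute("sourcepath"),
                    start.isEmpty() ? 0 : Integer.parseInt(start)));
        }
        return findings;
    }

    // Simplification: prefer a SourceLine that is a direct child of BugInstance,
    // otherwise fall back to the first nested one (e.g. inside <Class> or <Method>).
    static Element primarySourceLine(Element bug) {
        for (Node n = bug.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE
                    && ((Element) n).getTagName().equals("SourceLine")) {
                return (Element) n;
            }
        }
        NodeList nested = bug.getElementsByTagName("SourceLine");
        return nested.getLength() > 0 ? (Element) nested.item(0) : null;
    }

    public static void main(String[] args) {
        String xml = "<BugCollection>"
                + "<BugInstance type=\"NP_NULL_ON_SOME_PATH\" priority=\"1\">"
                + "<Class classname=\"Foo\"><SourceLine classname=\"Foo\" sourcepath=\"Foo.java\"/></Class>"
                + "<SourceLine classname=\"Foo\" sourcepath=\"Foo.java\" start=\"30\" end=\"30\"/>"
                + "</BugInstance></BugCollection>";
        for (Finding f : read(xml)) {
            System.out.println(f);
        }
    }
}
```

Keeping this intermediate model small is deliberate: the per-bug-type location rules can then be refined in `primarySourceLine` without touching the SARIF-writing side.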

@ghost
Author

ghost commented Jun 23, 2020

What do you think about the direct export option? We can continue to explore the XML->SARIF option, but it has some complications (described above). By the way, GitHub's automatic code scanning accepts SARIF 2.1.0 directly from tools that produce it, so supporting it would bring a tangible benefit to SpotBugs.

@KengoTODA
Member

It sounds awesome. And it should be technically possible to implement the direct export option.

I'm not familiar with this format and probably don't have time to handle it, so I hope other contributors will give it a try.

@uhafner

uhafner commented Jun 24, 2020

It would be really helpful to have a different format, since parsing the current format is a complicated task. (And using the internal parser is also not very elegant, since the whole SpotBugs library is required as a dependency.)

On the other hand, is there a Java parser for SARIF already available? As author of the Jenkins warnings plugin, I am a consumer of the current SpotBugs format. So exporting the bugs to a new format is only half the way; we then also need a way back from the XML file to the object model of the bugs. Is there anything planned here on the SARIF side?

@ghost
Author

ghost commented Jun 24, 2020

We currently have .NET and TypeScript language bindings for the SARIF object model. We don't yet have a Java binding. The .NET binding is actually generated programmatically from the SARIF JSON schema. If there is a JSON-schema-to-Java-OM utility around, you would get the bindings for free. Do you know of one? I didn't find one in a quick search just now.

@ghost
Author

ghost commented Jun 24, 2020

@uhafner, by the way, if your Jenkins warnings plugin consumed SARIF, you would automatically have support for any tool that produces that standard format. This sounds like another great opportunity. We can talk about it over on your repo if you'd like.

@michaelcfanning FYI.

@michaelcfanning

I believe that @lcartey of CodeQL has produced a Java OM from the SARIF schema.

@uhafner

uhafner commented Jun 26, 2020

I see. The schema looks quite complex. The tools that normally show up in the warnings plugin typically produce a list of warnings with a couple of properties only. Are there a lot of tools using SARIF already?

@michaelcfanning

It's a complex format, intended to cover the range of static analysis tools out there. You might find the SARIF Tutorial a little friendlier for ramping up. We have a well-developed C# SARIF SDK to facilitate reading, writing, and other scenarios, but that isn't helpful for Java, of course.

Internally at Microsoft, every tool that's run as part of security or other policy has SARIF support. Every tool owned internally, or for which a Microsoft engineer serves as open source coordinator (such as BinSkim), exports the format directly. Both of Microsoft's publicly shipping analysis platforms (PREfast and Roslyn) have direct support.

Externally, there's support in ESLint and the Clang analyzer (built-in). GrammaTech supports the format, as does Semmle/CodeQL, and MicroFocus Fortify is working on it (I believe). We have open source converters for Fortify and Contrast Security.

The discussion was actually prompted by Microsoft's use of SpotBugs, as Larry mentioned. All our engineering systems for producing results, filing work items, etc., are driven by the format, so we're looking for some sort of SARIF solution that lets us plug SpotBugs in.

@lgolding and I are both happy to help advise on SARIF support. If it's helpful, we could attempt a contribution for this. If direct support in SpotBugs looks too intimidating to take on, I think Microsoft will fund authoring an open source converter from SpotBugs XML.

@KengoTODA
Member

KengoTODA commented Jun 27, 2020

Note: the following files are key to generating the SpotBugs report. I'm not sure whether they provide enough features to generate a report in the SARIF format.

@KengoTODA
Member

I'll check sarif-tutorials later. Thanks for sharing!

@KengoTODA
Member

KengoTODA commented Jun 28, 2020

I'm working on this issue. I couldn't find a good way to generate a Java binding from the JSON schema, so I'm writing it on top of the org.json:json library.
https://github.com/spotbugs/spotbugs/compare/sarif-report

I'm still not sure whether we can solve the problem in the XML format.

@ghost
Author

ghost commented Jun 28, 2020

I'm glad you're working on this! I'm happy to video conference with you to discuss the details of how to map your internal bug representation to SARIF. We can start simple, and then there are many ways to produce SARIF output that's effective for end users. In fact I'm writing a document about that now, which I'll share with you very soon.

@KengoTODA
Member

KengoTODA commented Jun 28, 2020

Thank you. My current idea is:

And here are the known issues:

  • We need to discuss how to map SpotBugs' priority to the SARIF level.
  • SpotBugs' detailed descriptions of bug patterns are written in HTML (see example). We need to convert them to plain text or Markdown to use them as fullDescription.
  • By default, SpotBugs' bug patterns have no URL property. For the default detectors we can use spotbugs.readthedocs.io/en/stable/bugDescriptions.html#${BUG_PATTERN_TYPE}, but for plugins we have no way to provide one. I found <BugsUrl> in message.xml.

I will list more of my doubts and questions later.
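The three known issues could be approached with small helpers like the following Java sketch. The priority-to-level mapping (1 → `error`, 2 → `warning`, anything else → `note`) is a hypothetical choice, not something SpotBugs defines. The HTML-to-text conversion is deliberately naive: a regex tag stripper that decodes only a few entities. The help URL simply appends the pattern type to the readthedocs link quoted above, which may need adjusting if the page's anchors use a different format; for plugins it assumes a plugin-supplied URL (for example via `<BugsUrl>`) and otherwise emits none. `RuleMetadataSketch` and its methods are hypothetical names.

```java
import java.util.Optional;

// Hypothetical helpers for three open questions in a SpotBugs -> SARIF mapping.
class RuleMetadataSketch {

    // Base URL from the comment above; the anchor format is an assumption.
    static final String CORE_BUG_DESCRIPTIONS =
            "https://spotbugs.readthedocs.io/en/stable/bugDescriptions.html#";

    // Hypothetical mapping from SpotBugs priority (1 = high, 2 = normal, 3 = low,
    // higher values = experimental/ignored) to a SARIF level.
    static String levelFor(int priority) {
        switch (priority) {
            case 1: return "error";
            case 2: return "warning";
            default: return "note";
        }
    }

    // Naive HTML -> plain text for fullDescription: block tags become spaces,
    // other tags are dropped, a few entities are decoded, whitespace is collapsed.
    static String htmlToText(String html) {
        String text = html
                .replaceAll("(?i)</?(p|br|div|li|ul|ol|pre|h[1-6])\\b[^>]*>", " ")
                .replaceAll("<[^>]*>", "")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&nbsp;", " ")
                .replace("&amp;", "&"); // last, so "&amp;lt;" decodes to "&lt;"
        return text.replaceAll("\\s+", " ").trim();
    }

    // Core patterns link into the bug-description page; plugin patterns use a
    // plugin-supplied URL when there is one, otherwise no helpUri is emitted.
    static Optional<String> helpUri(String bugPatternType, boolean corePattern, String pluginBugsUrl) {
        if (corePattern) {
            return Optional.of(CORE_BUG_DESCRIPTIONS + bugPatternType);
        }
        if (pluginBugsUrl != null && !pluginBugsUrl.isBlank()) {
            return Optional.of(pluginBugsUrl);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        System.out.println(levelFor(2));
        System.out.println(htmlToText("<p>Calls <code>foo()</code> &amp; <b>bar</b>.</p>"));
        System.out.println(helpUri("NP_NULL_ON_SOME_PATH", true, null).orElse("(none)"));
    }
}
```

A real converter would likely want an HTML-to-Markdown step instead of plain text, so that code samples in the descriptions keep their formatting; the regex version is only a placeholder.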

@lcartey

lcartey commented Jul 10, 2020

@KengoTODA Sorry for not responding sooner, but I have had some success with:

http://www.jsonschema2pojo.org/

For generating a Java object model for SARIF. They also provide Maven/Gradle/Ant plugins for automating the process.

@KengoTODA
Member

SpotBugs 4.1.0 provides experimental support for SARIF 2.1.0. The generated report passes the latest SARIF validator.

It probably has both known and unknown issues (e.g. spotbugs/spotbugs#1221); please feel free to file them at https://github.com/spotbugs/spotbugs/issues

@yongyan-gh

@KengoTODA Sorry for not responding sooner, but I have had some success with:

http://www.jsonschema2pojo.org/

For generating a Java object model for SARIF. They also provide Maven/Gradle/Ant plugins for automating the process.

Hi @lcartey, have you ever successfully generated POJO classes from the SARIF JSON schema?
KengoTODA ran into an issue generating AdditionalProperties.
jsonschema2pojo may not fully cover the JSON Schema specification.
