-
Notifications
You must be signed in to change notification settings - Fork 32
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multi PEM parsing and speed up PEM parsing in general #39
Conversation
var pemString = pemString.utf8[...] | ||
var pemDocuments = [PEMDocument]() | ||
while true { | ||
guard let lazyPEMDocument = try pemString.readNextPEMDocument() else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tends to read better as while let
.
guard let data = Data(base64Encoded: base64EncodedDERString, options: .ignoreUnknownCharacters) else { | ||
throw ASN1Error.invalidPEMDocument(reason: "PEMDocument not correctly base64 encoded") | ||
} | ||
guard data.isEmpty == false else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Prefer if
to guard == false
.
guard data.isEmpty == false else { | ||
throw ASN1Error.invalidPEMDocument(reason: "PEMDocument has an empty body") | ||
} | ||
guard let type = String(discriminator) else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is this failable initializer coming from?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/// First find the BEGIN marker: `-----BEGIN <SOME DISCRIMINATOR>----- | ||
guard | ||
let beginDiscriminatorPrefix = self.firstRange(of: "-----BEGIN ".utf8[...]), | ||
let beginDiscriminatorSuffix = self[beginDiscriminatorPrefix.upperBound...].firstRange( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we encapsulate this pattern in a helper function? We do it twice.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also this function could provide a few ranges, instead of the two we have here, so we can do more semantic named range lookups later in the function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have refactored it into this method:
extension Substring.UTF8View {
func firstRangesOf(
`prefix`: Substring,
suffix: Substring
) -> (
prefix: Range<Index>,
infix: Range<Index>,
suffix: Range<Index>
)?
}
/// -----END <SOME DISCRIMINATOR>----- | ||
/// ``` | ||
/// This function attempts find the BEGIN and END marker. | ||
/// It then tries to extract the discriminator and the base64 encoded. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The base64 encoded what?
private func checkLineLengthsOfBase64EncodedString() throws { | ||
var message = self | ||
let lastIndex = message.index(before: message.endIndex) | ||
while message.isEmpty == false { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Avoid explicit comparison to false, prefer !message.isEmpty
.
Address all comments in this commit . |
Motivation
Trust roots on Ubuntu and other linux distributions are stored in a single PEM file that contains multiple certificates. We want to support loading these in the future and therefore need to be able to parse multiple certificates from a single PEM file.
Modificaitons
PEMDocument.parseMultiple(pemString:)
that returns an array ofPEMDocument
sPEMDocument(pemString:)
as well to speed up parsing and reduce allocations significantlyResult
TL;DR: Parsing & decoding a PEM document is now ~5x faster and mallocs ~12x less. This allows us to parse the WebPKI trust roots from its PEM string representation to the Swift type
Certificate
fromswift-certificates
in under 5ms.I have run a couple benchmarks (Swift 5.8.1 on arm64 (M1 Max) in docker) that parses 130 certificates (100 times in a loop) found at
/etc/ssl/certs
on Ubuntu.The first test just parse the PEM
String
to aPEMDocument
:The second test parse the PEM
String
to aPEMDocument
and subsequently as aCertificate
:I also run a benchmark that uses the new
PEMDocument.parseMultiple(pemString:)
method:which is roughly the same as parsing each PEM individually:
Note that macOS is even faster, likely because of the different base64 decode implementation: