Converting Audit bson file to JSON #27

Open
myuconnect opened this issue Apr 15, 2021 · 3 comments

Comments

@myuconnect

Hi Shane,

We have a requirement to process all audit BSON files from our MongoDB databases and store them in a centralized location for reporting. In our current process, we scan each audit BSON file, convert it to JSON, and send the JSON file to be persisted at the centralized location via a REST API call. Following is a snippet of the code...

from bson.json_util import loads, dumps, DEFAULT_JSON_OPTIONS
from bson import decode_all

if not self.util.isFileExists(auditFile):
    return self.util.buildResponse(self.Globals.unsuccess, f"file {auditFile} is missing")

myAuditFileSize = self.util.getFileSizeBytes(auditFile) / (1024 * 1024)

if myAuditFileSize > self.BSON_FILE_SIZE_LIMIT_MB:
    print(f"Audit bson file '{auditFile}' size is larger than {self.BSON_FILE_SIZE_LIMIT_MB}MB")
    return self.util.buildResponse(self.Globals.unsuccess, f"Audit bson file '{auditFile}' size is larger than {self.BSON_FILE_SIZE_LIMIT_MB}MB")

# 3. processing - converting bson to json
try:
    if self.util.getFileExtn(auditFile).lower() == "json":
        myMongoAuditData = self.util.readJsonFile(auditFile)
    else:
        # decode_all() reads the entire file into memory at once
        with open(auditFile, 'rb') as file:
            myMongoAuditData = decode_all(file.read())

    return myMongoAuditData
except Exception as error:
    return self.util.buildResponse(self.Globals.unsuccess, f"failed to decode audit file '{auditFile}': {error}")

We are facing issues processing larger BSON files, which forces us to restrict the size of the audit BSON files we can process. I need your help using the "bsonjs" module to convert the audit BSON file to a JSON file (ideally generating a smaller JSON file).

Please assist.

Thanks,

Anil Kumar

@ShaneHarvey
Collaborator

> We are facing issues processing larger BSON files, which forces us to restrict the size of the audit BSON files we can process. I need your help using the "bsonjs" module to convert the audit BSON file to a JSON file (ideally generating a smaller JSON file).

Can you describe the issue you're facing? What does the "audit bson file" look like? Does it contain many small documents, many large documents, or a single large document?

Have you tried using bson.decode_file_iter() from pymongo? It decodes a BSON stream from a file object without reading the entire file into memory at once.

import bson

with open(auditFile, 'rb') as file:
    for doc in bson.decode_file_iter(file):   # Iterate over all the documents in the file
        print(doc)
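
If the end goal is a JSON file, each decoded document can be serialized as it is read, for example with bson.json_util.dumps. A minimal sketch (the output path and newline-delimited JSON layout are only illustrative assumptions):

import bson
from bson.json_util import dumps

# Sketch: stream-convert a BSON audit file to newline-delimited JSON
# without loading the whole file into memory.
with open(auditFile, 'rb') as bson_file, open(auditFile + '.json', 'w') as json_file:
    for doc in bson.decode_file_iter(bson_file):
        json_file.write(dumps(doc))
        json_file.write('\n')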

@myuconnect
Author

Shane,

Thanks for your response. Our audit BSON files are huge, around 2 GB each. I was wondering if there is a way to use bsonjs to iterate over the BSON documents while converting them to JSON, the way we can with bson.decode_file_iter.

Thanks,

Anil

@ShaneHarvey
Collaborator

There is no decode_file_iter equivalent in bsonjs yet. We could add one or you could implement it yourself with some reading of the BSON format (see http://bsonspec.org/spec.html). Check out the decode_file_iter source from pymongo: https://github.com/mongodb/mongo-python-driver/blob/3.11.3/bson/__init__.py#L1135-L1161
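
For reference, here is a rough, untested sketch of what such an iterator might look like (this is not part of bsonjs): each BSON document in the stream starts with a 4-byte little-endian int32 giving its total size, so you can read one document's bytes at a time and hand them to bsonjs.dumps():

import struct
import bsonjs

def bsonjs_file_iter(file_obj):
    # Yield each document in the BSON stream as a MongoDB Extended JSON string,
    # converting one document at a time with bsonjs.dumps().
    while True:
        size_data = file_obj.read(4)
        if not size_data:
            break  # end of stream
        if len(size_data) != 4:
            raise ValueError("bad BSON: truncated length prefix")
        obj_size = struct.unpack("<i", size_data)[0]
        rest = file_obj.read(obj_size - 4)
        if len(rest) != obj_size - 4:
            raise ValueError("bad BSON: truncated document")
        yield bsonjs.dumps(size_data + rest)

with open(auditFile, 'rb') as file:
    for json_doc in bsonjs_file_iter(file):
        print(json_doc)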

Or you could try using bson.decode_file_iter() from pymongo instead of using bsonjs.
