gzip: Add to_deflate/from_deflate functions #472

Open
wants to merge 1 commit into
base: master

Conversation

wader
Owner

@wader wader commented Oct 23, 2022

No description provided.

@detunized

Hi! Thanks for a great tool! I recently discovered that the tool seems to have no support for inflating compressed data chunks. At least I have not been able to find it. I have a JSON file that contains deflated, base64-encoded strings, and I wanted to query | decode | decompress | repl those binary blobs, but as far as I can tell that isn't possible. I tried different things along these lines:

.key.subkey | base64 | zlib/inflate?

Now I found this pull request and it seems to do what I need. Does it? Would it be possible to merge this? Thanks!

@wader
Owner Author

wader commented Jun 6, 2023

Thanks! Adding to_deflate (or inflate?) shouldn't be much work; then I think you should be able to do:

.key.subkey |= (from_base64 | from_deflate | <do something> | to_deflate | to_base64)
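For readers outside fq, the update expression above can be sketched in Python. This only illustrates the round trip and is not how the PR implements it: `update_deflated` and the sample document are hypothetical, and raw deflate is selected through the standard library's `zlib` module with `wbits=-15`.

```python
import base64
import json
import zlib


def raw_deflate(data: bytes) -> bytes:
    """Compress to a raw deflate stream (no zlib header or checksum)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def raw_inflate(data: bytes) -> bytes:
    """Decompress a raw deflate stream."""
    return zlib.decompress(data, wbits=-15)


def update_deflated(encoded: str, transform) -> str:
    """base64-decode, inflate, apply transform, deflate, base64-encode."""
    plain = raw_inflate(base64.b64decode(encoded))
    return base64.b64encode(raw_deflate(transform(plain))).decode("ascii")


# Hypothetical document shaped like the `.key.subkey` path in the query above.
doc = {"key": {"subkey": base64.b64encode(raw_deflate(b"hello")).decode("ascii")}}

# Stand-in for <do something>: upper-case the inflated bytes.
doc["key"]["subkey"] = update_deflated(doc["key"]["subkey"], bytes.upper)
print(json.dumps(doc))
```

The transform runs on the inflated bytes, so whatever replaces `<do something>` never sees base64 or compressed data.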

I kind of forgot about this PR. I know I was a bit reluctant about how to support formats like this: should they be proper "normal" formats somehow, or just native jq functions like in this PR? Maybe in the end it would look the same to a user in a query anyway, so it may be ok to add.

@wader wader force-pushed the gzip-fromdeflate branch from 9502516 to dcfabdf Compare June 6, 2023 10:17
@wader
Owner Author

wader commented Jun 6, 2023

Rebased and added to_deflate:

➜  fq git:(gzip-fromdeflate) ✗ go run . -n '"abc" | to_deflate | to_base64 | debug | from_base64 | from_deflate'
["DEBUG","SkxKBgQAAP//"]
   │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f│0123456789abcdef│
0x0│61 62 63│                                      │abc│            │.: raw bits 0x0-0x2.7 (3)
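As an independent cross-check of the output above, the base64 string printed by `debug` can be inflated with Python's standard `zlib` module. This is a minimal sketch that assumes the stream is raw deflate, which is what `wbits=-15` selects.

```python
import base64
import zlib

# Base64 string printed by `debug` in the output above.
encoded = "SkxKBgQAAP//"

# wbits=-15 selects raw deflate: no zlib header, no trailing checksum.
plain = zlib.decompress(base64.b64decode(encoded), wbits=-15)
print(plain)  # b'abc'
```

A different compressor is free to emit different bytes for the same input, so comparing base64 strings across tools is not a reliable check. Comparing the inflated data is.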

Give that a go and see if it works for you

@wader wader changed the title gzip: Add fromdeflate function gzip: Add to_deflate/from_deflate functions Jun 6, 2023
@detunized

Thanks for the quick fix. I'm not able to build it on my machine at this point. I'll wait for a release and update.

@wader
Owner Author

wader commented Jun 8, 2023

Ok, but it would be great if you could try and see if it fits your use case. If you have golang installed you can run this PR-branch like this:

$ go run github.com/wader/fq@gzip-fromdeflate -n '"abc" | to_deflate | to_base64 | debug | from_base64 | from_deflate'
["DEBUG","SkxKBgQAAP//"]
   │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d 0e 0f│0123456789abcdef│
0x0│61 62 63│                                      │abc│            │.: raw bits 0x0-0x2.7 (3)

@wader
Owner Author

wader commented Jun 8, 2023

Could you provide some example json and how you would like to transform it?

@detunized

I tested with the go command line you've provided. It failed for me with the following:

❯ go run github.com/wader/fq@gzip-fromdeflate '.ck.a0.c | from_base64 | from_deflate' test.json
error: test.json: flate: corrupt input before offset 5
exit status 5

I think the problem is that there are at least two kinds of deflate formats: zlib and raw deflate. Raw deflate is just the compressed stream. Zlib-wrapped deflate has a two-byte header, like in my case:

00000000  78 da e5 55 41 6c 1b 45 14 1d 27 52 ea 6f 97 10  |xÚåUAl.E..'Rêo..|
00000010  68 b7 2d 8d 94 aa a0 aa 12 a7 b9 a0 ac 0f 43 85  |h·-..ª ª.§¹ ¬.C.|
00000020  54 09 19 5f 68 b3 87 25 54 48 b9 70 08 7b f1 89  |T.._h³.%TH¹p.{ñ.|
00000030  1a 5b 1c 40 0a 71 11 97 22 71 00 42 56 82 a9 47  |.[.@.q.."q.BV.©G|
00000040  8a 45 b5 68 4e 51 54 37 27 c3 8a 4a 94 48 b4 e2  |.EµhNQT7'Ã.J.H´â|
00000050  60 b4 6d 2d 4d 17 71 b4 d4 93 e1 cf ee da d9 a4  |`´m-M.q´Ô.áÏîÚÙ¤|
00000060  4e 5b 24 40 14 0e d6 9f f9 3b 9e f9 ff fd f7 ff  |N[$@..Ö.ù;.ùÿý÷ÿ|

The first two bytes, 78 da, are the zlib header. I think you'd have to make two sets of functions: one for zlib and one for the raw format.
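The difference between the two framings can be sketched in Python. The helper names `looks_like_zlib` and `inflate_auto` are hypothetical and the header test is only a heuristic based on RFC 1950 (compression method 8, header divisible by 31). It is not something the PR does.

```python
import zlib


def looks_like_zlib(data: bytes) -> bool:
    """Heuristic RFC 1950 header check, not a full validation."""
    if len(data) < 2:
        return False
    cmf, flg = data[0], data[1]
    # Low nibble of CMF is the compression method (8 = deflate), and the
    # two header bytes read as a big-endian integer must be divisible by 31.
    return (cmf & 0x0F) == 8 and ((cmf << 8) | flg) % 31 == 0


def inflate_auto(data: bytes) -> bytes:
    """Inflate zlib-wrapped data, or fall back to raw deflate."""
    if looks_like_zlib(data):
        try:
            return zlib.decompress(data)  # default wbits expects a zlib header
        except zlib.error:
            pass  # header check was a false positive; try raw deflate
    return zlib.decompress(data, wbits=-15)


# First three bytes of the hexdump above.
prefix = bytes.fromhex("78dae5")
print(prefix.hex(), looks_like_zlib(prefix))  # 78dae5 True
```

Guessing the framing like this is convenient interactively, but separate explicit functions avoid silently misreading a raw stream that happens to start with a valid-looking header.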

Take a look at CyberChef, they have Raw and Zlib inflate/deflate: https://gchq.github.io/CyberChef/#recipe=Zlib_Inflate(0,0,'Adaptive',false,false)Raw_Inflate(0,0,'Adaptive',false,false)

This is the json:

{
  "ck": {
    "a0": {
      "c": "eNrlVUFsG0UUHSdS6m+XEGi3LY2UqqCqEqe5oKwPQ4VUCRlfaLOHJVRIuXAIe/GJGlscQApxEZcicQBCVoKpR4pFtWhOUVQ3J8OKSpRItOJgtG0tTRdxtNST4c/u2tmkTlskQBQO1p/5O575//33/5uae/X9/Kcvv/jcauvXN57cVwyb73735Y9vXnr7pSPFy88bk71vPvrq5uTY7+MkkxubIqctBrCBP4/9EGyhzTAGseVgMAEVxrjBfbAEQIVz35PXA4NKaLsKKpTzT6gCj5LI13MVi30Ufd2gQAEsClS5HNZdRQ3q4t6klsugber/C+ZRGWz7+dB/TgpxS14qGnwVCvi2JZaF5TNQkmEcPs/wLlhsJol9NTAYxshnMF6M0ZdBQTGwVFO1qY+xSqgovNPHO/0vigbD95jJChzv5YS3dW5MsHvAYOLq+NUcVPt95yjU7cWGE9r1lTN2K2xXW17Zrh8AW+9KZ+PtYejUymW7E9peyWkstqpl5zgsVa/YvXotjNzr1UNl+1QLr5mvduxDEJ8qNQae/dBoOXY9zEOoT07C/IVWJ2zUWvvBCeuN1mIOwot9faxfbdSdaVi0w+SpUvJ2re4cAa/hVMvl2rHtQAzY5ZgaPO7YYT+DBMjsIycMBkmRWQI0F7rQz2Ch4wIK2qOSGi4CaSKQLhYYAfXcu8E1icV3/WADz3kuD94DbVeDnivZhqvXbrSuuIy3scgF9FmucHumpOumdD0Xgt+SQsfFEHyNcyzC0STUG2eWbtSWlmoto9bSWKUhGYF9OsM8PAiwy2MkA5kcObHdAEhEJL/FNR6cX8MG8BgPdBOsRc3gJzhJxEglzTCH5GWpZtDEX01wcxE3RV+JsFFB1Bw7cHDddWyNmOg/FfW7ljDhqSj/+3kyItsBHOmsH8KTGSj0eqXeVmb2lFE6NWtsLqyUFlbWMwvGNvFidA9Cv17r2I5zOn7sYoaQfZkJcsDAIhcQMIsxVgEEQXQDS3J/Q2KikgaGFNh53K9IBET66OPR94pk7JzAZMVbRQMkrIFO9DB8v2kMYih9XNjKrOyOZAqw4eoX+zXEIm7NQZK3IhJnyfR5jMdAANvxlOKGQDL7GKOP7wqMUZEAiUdjUi4HmsyWKeg2mWn03TJV5DNMCndcEzyTBW2cFucUxq1+LmoiTFzdG8T7y/YwWHcmNCzyQvVYcuBZQjBLMqZh15CNAOyD8XGSI9PkBX2miRzGsnDd103kkuayhqYpRAINrn3ksuAsXsvhmvtKr7mB0DUlhYqPft8FA0vLlYl7IaYBS6ogOKC6sIXcztBVFdvlxM6oWdqETUpgls6hvYv2PMy678Cmexvta2g/Q/s12jtgYEma5nU9V5hec/NatD4IEOiBnIWlD7NQXpsE5+yF0Ksda5GDA5QSiMgkztBGWEAgCRkBIXl6OMpLiDaJMU93zYjmQrpnoXUSp0gHx2gWrtzLQuPzLBz/Ngv261noYFTeSYD5+cYTeP0vFxp1m9yczGllzf8pZfV3Kas5QlnNR1RWE4dRd4Symnzo36GsUY8s+xaqqlIPUFYxQlkHgjBKWSMhIaLt/z+VNX+fsg6G0V7K2v1PKWtupLKKR1BWf5eymn+Rsvr/bmXNDpU1Khbjeyor/ceUNb9DWf2UsmI8+K7cU1m7I5S1+7gqa36orDylrCKlrH5KWWVKWWVKWWVKWVVKWRV+o3+jsnZTytp9jJX1D9BsYtI="
    }
  }
}

@wader wader force-pushed the gzip-fromdeflate branch from dcfabdf to db19876 Compare June 9, 2023 20:30
@wader
Owner Author

wader commented Jun 9, 2023

Aha, I see. I actually ran into the deflate (or just flate?) vs zlib confusion when I fiddled around with TLS support: the RFC says DEFLATE but in reality it's zlib. Took a while to understand :)

So something like this? This adds "extra" in front of .ck.a0.c. What format is the uncompressed data, btw?

$ go run . -i . ~/src/jq/a.json
json> .
{
  "ck": {
    "a0": {
      "c": "eNrlVUFsG0UUHSdS6m+XEGi3LY2UqqCqEqe5oKwPQ4VUCRlfaLOHJVRIuXAIe/GJGlscQApxEZcicQBCVoKpR4pFtWhOUVQ3J8OKSpRItOJgtG0tTRdxtNST4c/u2tmkTlskQBQO1p/5O575//33/5uae/X9/Kcvv/jcauvXN57cVwyb73735Y9vXnr7pSPFy88bk71vPvrq5uTY7+MkkxubIqctBrCBP4/9EGyhzTAGseVgMAEVxrjBfbAEQIVz35PXA4NKaLsKKpTzT6gCj5LI13MVi30Ufd2gQAEsClS5HNZdRQ3q4t6klsugber/C+ZRGWz7+dB/TgpxS14qGnwVCvi2JZaF5TNQkmEcPs/wLlhsJol9NTAYxshnMF6M0ZdBQTGwVFO1qY+xSqgovNPHO/0vigbD95jJChzv5YS3dW5MsHvAYOLq+NUcVPt95yjU7cWGE9r1lTN2K2xXW17Zrh8AW+9KZ+PtYejUymW7E9peyWkstqpl5zgsVa/YvXotjNzr1UNl+1QLr5mvduxDEJ8qNQae/dBoOXY9zEOoT07C/IVWJ2zUWvvBCeuN1mIOwot9faxfbdSdaVi0w+SpUvJ2re4cAa/hVMvl2rHtQAzY5ZgaPO7YYT+DBMjsIycMBkmRWQI0F7rQz2Ch4wIK2qOSGi4CaSKQLhYYAfXcu8E1icV3/WADz3kuD94DbVeDnivZhqvXbrSuuIy3scgF9FmucHumpOumdD0Xgt+SQsfFEHyNcyzC0STUG2eWbtSWlmoto9bSWKUhGYF9OsM8PAiwy2MkA5kcObHdAEhEJL/FNR6cX8MG8BgPdBOsRc3gJzhJxEglzTCH5GWpZtDEX01wcxE3RV+JsFFB1Bw7cHDddWyNmOg/FfW7ljDhqSj/+3kyItsBHOmsH8KTGSj0eqXeVmb2lFE6NWtsLqyUFlbWMwvGNvFidA9Cv17r2I5zOn7sYoaQfZkJcsDAIhcQMIsxVgEEQXQDS3J/Q2KikgaGFNh53K9IBET66OPR94pk7JzAZMVbRQMkrIFO9DB8v2kMYih9XNjKrOyOZAqw4eoX+zXEIm7NQZK3IhJnyfR5jMdAANvxlOKGQDL7GKOP7wqMUZEAiUdjUi4HmsyWKeg2mWn03TJV5DNMCndcEzyTBW2cFucUxq1+LmoiTFzdG8T7y/YwWHcmNCzyQvVYcuBZQjBLMqZh15CNAOyD8XGSI9PkBX2miRzGsnDd103kkuayhqYpRAINrn3ksuAsXsvhmvtKr7mB0DUlhYqPft8FA0vLlYl7IaYBS6ogOKC6sIXcztBVFdvlxM6oWdqETUpgls6hvYv2PMy678Cmexvta2g/Q/s12jtgYEma5nU9V5hec/NatD4IEOiBnIWlD7NQXpsE5+yF0Ksda5GDA5QSiMgkztBGWEAgCRkBIXl6OMpLiDaJMU93zYjmQrpnoXUSp0gHx2gWrtzLQuPzLBz/Ngv261noYFTeSYD5+cYTeP0vFxp1m9yczGllzf8pZfV3Kas5QlnNR1RWE4dRd4Symnzo36GsUY8s+xaqqlIPUFYxQlkHgjBKWSMhIaLt/z+VNX+fsg6G0V7K2v1PKWtupLKKR1BWf5eymn+Rsvr/bmXNDpU1Khbjeyor/ceUNb9DWf2UsmI8+K7cU1m7I5S1+7gqa36orDylrCKlrH5KWWVKWWVKWWVKWVVKWRV+o3+jsnZTytp9jJX1D9BsYtI="
    }
  }
}
json> .ck.a0.c |= (from_base64 | from_zlib | ["extra",.] | to_zlib | to_base64) | ., (.ck.a0.c | from_base64 | from_zlib)
{
  "ck": {
    "a0": {
      "c": "eJzklUFoHFUYx98kkO5/t8ZoO21tIKVKKXh6F9nZw7MIBYl7se0cxhiEHPQQ97Ii2HEXDwox2+KlQg9qzAN93QdZLCPvFEInOa0+LFgDtniITOzCdMTjQk+rzOxsskk3bQUVq6f37Tdvee/7f9///V4///ZbM2PnXv4g98mLzz+z6P/62uP7JqPGe99+8cMbV9554cjk1WfN0fbXH315a3To92FiZIfGyGmbASsM8Nj3wToDDMbQXQVMJuEyJkyhYUvAFUJ76kZgUoUmD+FSIS7TEB4lSa7NQ9bNUXi0FRQoYFPQkAss85CalMOmFrU5Q9OK/y+ZR1WwnRdb+Wkl5aa6MmmKRRQkYMt5aWuGUDG4QgtDtGCzifTui4HJNGwxAZcJ7WkVFEIGO2yETarR5ApuKNm0lnJTfz5pMg6bWawgAFsQ0YxrY5LdBcPI6vBqFpVOp3QUNWe2Xoqc2sIZx4+aFd8rO7UDcOJfxbPdn4exUS2XnY3I8Yql+qxfKZeOY65yzWnXqlGSXq4cKjun/MipTVU2nEPo7irWe5n9qPslpxblEMU7RzF1wd+I6lV/P0pRre7PZhFd6sTbOpV6rTSOWSdKjyqmZ1drpSPw6qVKuVw9tn0RE7sSY73DS07UMYiRNfaREyZD2mSWCi1k3Oin1I20gZK2qaIm12haCi4X4jJX8Pid4LoK4XEdrFDA4yJ4H/G6GLS5Yis8jnkSu5yJJuUocMDmkrctRZctxT2O4Le00d1mSLEkBEZWj6ZXvXlm7mZ1bq7qm1U/1qpfkgHa91eYw/0EuzpEDBhZcmLbACowGYMtYj2EuM4EPCaC2ARLiRl0qpOCLcPUDOcCk7I+M8SDv5jqxmmbhvSlRJswSMyxQwfOl2nIu4P+42R8ri0tPJHUf++cDKi2J0d/1Q+YkwkU2u1ie93InzKLp/Lm2sxCcWZh2Zgxtwevq+5BdGrVDadUOt097JJByD5jhBwwARQYYDPGXAjtyVZgK6FXFOApGphKwg6FdpUQl5WGp0Ty3VWMTUspN+WbkyYUlhAXehjfrZm9OxQ/LqwbC7tvMgbHj2qXOtW6H3Wt2StyMxniDBk/zwBTAM3uKyVMyVDQgK2FdqXQXkgCm0vaHcr5IB5m25J0e5hp8t22wiRnWhS3uQXPYkGTcUyHUm6GP03GgzCyureI97btQbLuLGiryTOVY+mGpwkxiEGGYtljyQYI9uHwMMmScfJcvKfBGFwIEfu6IUQyy7E0DSlTaYCG1nClYN1YbcVCh3EsTA00FIWrBTM1h6kAEVpwtZTj0PBCBAfCFtYpYNDFsLvOp+tEmKcNrFGCPD2HNXoHeXoeef4u1vgvyPNXsMY/RZ5/hTV+GyYHGtaN+F1hcSys60l8EAjiBzmDuYsZlJdGUTp7IfKqx3xysKdSKhEZRd2vR4VOp0TIAAnJk1tPebHS6ZCu5v2uGWAug5AM/JM5dDYq5XIG1+5mUP8sg+PfZOC8msHGxQy8k8DUVP0x1JyfL9RrDrk1mo3JmvtTZNW7yGoNIKv1kGS1GJqtAWS1xFZ+B1kTj8xrWzGE4X3IKgeQtQeEQWRNQEJkU/8/yZq7h6y9x2gvsrb+U2TNDiSrfAiy6l1ktf4isup/N1kzW2RNmsXEnmSl/xhZczvIqvvIqgBbCbUnWVsDyNp6VMma2yKr6COr7COr7iOr6iOr6iOr6iNr2EfWEBD0byRrq4+srUeYrH8EAAD//58rZPY="
    }
  }
}
     │00 01 02 03 04 05 06 07 08 09 0a 0b 0c 0d│0123456789abcd│
0x000│65 78 74 72 61 10 53 50 83 0b 9a 47 3e 23│extra.SP...G>#│.: raw bits 0x0-0xedc.7 (3805)
*    │until 0xedc.7 (end) (3805)               │              │

Another problem, though probably not one in practice, is that doing the above will not preserve the compression level and possibly the compression method (but I think RFC 1950 and the Go standard library only support deflate?).
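The level loss is visible in the thread's own data: the original value starts with `eNrl` (bytes 78 da) while the rewritten value starts with `eJzk` (bytes 78 9c). A small Python sketch reproduces the same effect. The payload and the assumption that the producer used level 9 are made up for illustration.

```python
import zlib

original = b"example payload " * 8

# Assume the producer used maximum compression (level 9).
stored = zlib.compress(original, 9)

# Round trip with the default level, as a generic re-encoder might.
rewritten = zlib.compress(zlib.decompress(stored))

# The second header byte carries the level hint, so it changes.
print(stored[:2].hex(), rewritten[:2].hex())

# The uncompressed data is unaffected.
print(zlib.decompress(rewritten) == original)
```

Only the encoded bytes differ. Any consumer that inflates the value sees identical data, which is why this is unlikely to matter in practice.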

@wader
Owner Author

wader commented Jun 10, 2023

Wondering if zlib should be turned into a proper format, meaning from_zlib would return a decode value structure with fields for compression method, level, and uncompressed data etc. But that would be a bit "asymmetric" with how to_zlib would work 🤔
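To make the idea concrete, here is a rough Python sketch of what such a decoded structure could contain. `ZlibValue` and `decode_zlib` are hypothetical names and this is not fq's decoder. The field layout follows the RFC 1950 header (CM, CINFO, FLEVEL, FDICT).

```python
import zlib
from dataclasses import dataclass


@dataclass
class ZlibValue:
    """Hypothetical decoded structure; fields follow RFC 1950."""
    compression_method: int  # CM, 8 means deflate
    window_bits: int         # CINFO + 8
    level_hint: int          # FLEVEL, 0 (fastest) to 3 (maximum)
    has_preset_dict: bool    # FDICT
    uncompressed: bytes


def decode_zlib(data: bytes) -> ZlibValue:
    """Split a zlib stream into header fields plus the inflated payload."""
    if len(data) < 2:
        raise ValueError("zlib header needs at least two bytes")
    cmf, flg = data[0], data[1]
    if ((cmf << 8) | flg) % 31 != 0:
        raise ValueError("invalid zlib header check bits")
    return ZlibValue(
        compression_method=cmf & 0x0F,
        window_bits=(cmf >> 4) + 8,
        level_hint=flg >> 6,
        has_preset_dict=bool(flg & 0x20),
        uncompressed=zlib.decompress(data),
    )


value = decode_zlib(zlib.compress(b"abc", 9))
print(value)
```

The asymmetry shows up here as well: a matching encoder would need the level and window size as inputs to get close to the original bytes, whereas a plain bytes-in, bytes-out `to_zlib` has nowhere to take them from.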
