Port variable-length-quantity exercise. #960

Open
wants to merge 4 commits into
base: main
12 changes: 12 additions & 0 deletions config.json
@@ -1174,6 +1174,18 @@
        "topics": [
          "strings"
        ]
      },
      {
        "slug": "variable-length-quantity",
        "name": "Variable Length Quantity",
        "uuid": "dd914e41-bff2-4954-a3cf-fabe603795c1",
        "practices": [],
        "prerequisites": [],
        "difficulty": 2,
        "topics": [
          "bitwise",
          "either"
        ]
      }
    ],
    "foregone": [
1 change: 1 addition & 0 deletions exercises/.gitignore
@@ -0,0 +1 @@

32 changes: 32 additions & 0 deletions exercises/practice/variable-length-quantity/.docs/instructions.md
@@ -0,0 +1,32 @@
# Instructions

Implement variable length quantity encoding and decoding.

The goal of this exercise is to implement [VLQ](https://en.wikipedia.org/wiki/Variable-length_quantity) encoding/decoding.

In short, the goal of this encoding is to encode integer values in a way that would save bytes.
Only the first 7 bits of each byte are significant (right-justified; sort of like an ASCII byte).
So, if you have a 32-bit value, you have to unpack it into a series of 7-bit bytes.
Of course, you will have a variable number of bytes depending upon your integer.
To indicate which is the last byte of the series, you leave bit #7 clear.
In all of the preceding bytes, you set bit #7.

So, if an integer is between `0-127`, it can be represented as one byte.
Although VLQ can deal with numbers of arbitrary sizes, for this exercise we will restrict ourselves to only numbers that fit in a 32-bit unsigned integer.
Here are examples of integers as 32-bit values, and the variable length quantities that they translate to:

```text
NUMBER      VARIABLE QUANTITY
00000000    00
00000040    40
0000007F    7F
00000080    81 00
00002000    C0 00
00003FFF    FF 7F
00004000    81 80 00
00100000    C0 80 00
001FFFFF    FF FF 7F
00200000    81 80 80 00
08000000    C0 80 80 00
0FFFFFFF    FF FF FF 7F
```
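
As an illustration of the rule described in the instructions (this is not part of the PR's `instructions.md` or its example solution), the encoding of a single value can be sketched as follows. The name `encodeVlq` is hypothetical, and the sketch assumes only `base`: the value is split into 7-bit groups from the least significant end, the final byte keeps bit 7 clear, and every preceding byte has bit 7 set.

```haskell
import Data.Bits (shiftR, (.&.), (.|.))
import Data.Word (Word32, Word8)

-- | Encode one 32-bit value as a VLQ byte sequence, most significant group first.
-- Hypothetical helper for illustration only; not the PR's 'encodes'.
encodeVlq :: Word32 -> [Word8]
encodeVlq n = go (n `shiftR` 7) [low7 n]
  where
    -- Keep the low 7 bits; used as-is for the final byte (bit 7 clear).
    low7 :: Word32 -> Word8
    low7 x = fromIntegral (x .&. 0x7F)

    -- Prepend the remaining 7-bit groups, each with bit 7 set.
    go :: Word32 -> [Word8] -> [Word8]
    go 0 acc = acc
    go x acc = go (x `shiftR` 7) ((low7 x .|. 0x80) : acc)
```

For instance, `0x2000` has low 7 bits `0x00` and the remaining group `0x40`, which becomes `0xC0` once bit 7 is set, matching the `C0 00` row of the table.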
8 changes: 8 additions & 0 deletions exercises/practice/variable-length-quantity/.meta/config.json
@@ -0,0 +1,8 @@
{
  "authors": [],
  "files": {
    "solution": [],
    "test": [],
    "example": []
  }
}
80 changes: 80 additions & 0 deletions exercises/practice/variable-length-quantity/.meta/tests.toml
@@ -0,0 +1,80 @@
[canonical-tests]

# zero
"35c9db2e-f781-4c52-b73b-8e76427defd0" = true

# arbitrary single byte
"be44d299-a151-4604-a10e-d4b867f41540" = true

# largest single byte
"ea399615-d274-4af6-bbef-a1c23c9e1346" = true

# smallest double byte
"77b07086-bd3f-4882-8476-8dcafee79b1c" = true

# arbitrary double byte
"63955a49-2690-4e22-a556-0040648d6b2d" = true

# largest double byte
"29da7031-0067-43d3-83a7-4f14b29ed97a" = true

# smallest triple byte
"3345d2e3-79a9-4999-869e-d4856e3a8e01" = true

# arbitrary triple byte
"5df0bc2d-2a57-4300-a653-a75ee4bd0bee" = true

# largest triple byte
"f51d8539-312d-4db1-945c-250222c6aa22" = true

# smallest quadruple byte
"da78228b-544f-47b7-8bfe-d16b35bbe570" = true

# arbitrary quadruple byte
"11ed3469-a933-46f1-996f-2231e05d7bb6" = true

# largest quadruple byte
"d5f3f3c3-e0f1-4e7f-aad0-18a44f223d1c" = true

# smallest quintuple byte
"91a18b33-24e7-4bfb-bbca-eca78ff4fc47" = true

# arbitrary quintuple byte
"5f34ff12-2952-4669-95fe-2d11b693d331" = true

# maximum 32-bit integer input
"7489694b-88c3-4078-9864-6fe802411009" = true

# two single-byte values
"f9b91821-cada-4a73-9421-3c81d6ff3661" = true

# two multi-byte values
"68694449-25d2-4974-ba75-fa7bb36db212" = true

# many multi-byte values
"51a06b5c-de1b-4487-9a50-9db1b8930d85" = true

# one byte
"baa73993-4514-4915-bac0-f7f585e0e59a" = true

# two bytes
"72e94369-29f9-46f2-8c95-6c5b7a595aee" = true

# three bytes
"df5a44c4-56f7-464e-a997-1db5f63ce691" = true

# four bytes
"1bb58684-f2dc-450a-8406-1f3452aa1947" = true

# maximum 32-bit integer
"cecd5233-49f1-4dd1-a41a-9840a40f09cd" = true

# incomplete sequence causes error
"e7d74ba3-8b8e-4bcb-858d-d08302e15695" = true

# incomplete sequence causes error, even if value is zero
"aa378291-9043-4724-bc53-aca1b4a3fcb6" = true

# multiple values
"a91e6f5a-c64a-48e3-8a75-ce1a81e0ebee" = true

Contributor

It seems that you are more informed than I am about the purpose of .meta/tests.toml.

I'd have to get back to you about this, or perhaps you can link to where its format and purpose are documented.

Author

Actually I don't either, but many other exercises have this file, and its content seems straightforward to follow.

@@ -0,0 +1,22 @@
name: variable-length-quantity
version: 0.0.0.1 # TODO: what should this number be?
Contributor

Example solutions don't have a version number.


dependencies:
  - base

library:
  exposed-modules: Vlq
  source-dirs: src
  ghc-options: -Wall
  dependencies:
    - mtl

tests:
  test:
    main: Tests.hs
    source-dirs: test
    dependencies:
      - variable-length-quantity
      - hspec
      - text
      - QuickCheck
@@ -0,0 +1,66 @@
{-# LANGUAGE BinaryLiterals #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE LambdaCase #-}
{-# LANGUAGE NumericUnderscores #-}

module Vlq
Contributor

How about VariableLengthQuantity?

Suggested change
module Vlq
module VariableLengthQuantity

Author

I have the impression that one-word module names are preferred for exercises with long names, but I couldn't find a good one. Encode is probably fine, given that Data.Text.Encoding in text has both encoding and decoding functions.

  ( encodes
  , decodes
  , DecodeError (..)
  )
where

import Control.Monad
import Control.Monad.Except
import Control.Monad.State.Strict
import Data.Bits
import Data.List
import Data.Word

data DecodeError
  = IncompleteSequence
  | TooManyBits
  deriving (Show, Eq)

encodeOne :: Word32 -> [Word8]
encodeOne 0 = [0]
encodeOne x = reverse . unfoldr go $ (x, True)
  where
    go (cur, fstOctet) = do
      guard $ cur /= 0
      let (q, r) = cur `quotRem` 0b1000_0000
          r' = fromIntegral $ if fstOctet then r else r .|. 0b1000_0000
      pure (r', (q, False))

encodes :: [Word32] -> [Word8]
encodes = concatMap encodeOne

decodeOne :: MonadError DecodeError m => [Word8] -> m Word32
Contributor

Very nice to have something like this in an example.

Author

Thanks! Does something need to be addressed in this part?

Contributor

No, it's perfectly fine.

decodeOne xs = do
  let l = length xs
  when (l == 0 || l > 5) $
    throwError IncompleteSequence
  when (l == 5 && head xs > 0b1000_1111) $
    throwError TooManyBits
  pure $ foldl (\acc x -> (acc `unsafeShiftL` 7) .|. (fromIntegral x .&. 0b0111_1111)) 0 xs

mayDecodeNext :: (MonadState [Word8] m, MonadError DecodeError m) => m (Maybe Word32)
mayDecodeNext =
  get >>= \case
    [] -> pure Nothing
    st
      | (highs, rest) <- span ((/= 0) . (.&. 0b1000_0000)) st ->
        Just
          <$> case rest of
            [] -> throwError IncompleteSequence
            low : rest' -> do
              put rest'
              decodeOne (highs <> [low])

decodes :: [Word8] -> Either DecodeError [Word32]
decodes = evalState (runExceptT decodeAll)
  where
    decodeAll =
      mayDecodeNext >>= \case
        Nothing -> pure []
        Just x -> (x :) <$> decodeAll
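
For comparison, the decoding rule can also be sketched without `mtl`, using only `base`. This is an alternative illustration and not the PR's example solution: the names `decodeVlq`, `VlqError`, `Incomplete` and `Overflow` are hypothetical, and the overflow check is one possible way to reject values that do not fit in 32 bits.

```haskell
import Data.Bits (shiftL, testBit, (.&.), (.|.))
import Data.Word (Word32, Word8)

-- | Hypothetical error type for this sketch (the PR uses 'DecodeError').
data VlqError = Incomplete | Overflow
  deriving (Show, Eq)

-- | Decode a stream of VLQ bytes into the 32-bit values it contains.
decodeVlq :: [Word8] -> Either VlqError [Word32]
decodeVlq = go 0 False
  where
    -- 'acc' holds the bits read so far for the current value;
    -- 'pending' is True while a value is still waiting for its final byte.
    go :: Word32 -> Bool -> [Word8] -> Either VlqError [Word32]
    go _ pending [] = if pending then Left Incomplete else Right []
    go acc _ (b : bs)
      -- Shifting left by 7 would push set bits past bit 31.
      | acc > 0x01FFFFFF = Left Overflow
      -- Bit 7 set: more bytes of this value follow.
      | testBit b 7 = go acc' True bs
      -- Bit 7 clear: this byte ends the value.
      | otherwise = (acc' :) <$> go 0 False bs
      where
        acc' = (acc `shiftL` 7) .|. fromIntegral (b .&. 0x7F)
```

Threading the accumulator explicitly avoids the state and error monads at the cost of a less modular structure than the `decodeOne` / `mayDecodeNext` split above.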
23 changes: 23 additions & 0 deletions exercises/practice/variable-length-quantity/package.yaml
@@ -0,0 +1,23 @@
name: variable-length-quantity
version: 0.0.0.1 # TODO: what should this number be?
Contributor

@sshine, Feb 28, 2021

This question cannot currently be answered.

This track's exercise versioning policy is described in the track README:

https://github.com/exercism/haskell#exercise-versioning

This originally meant "Go to the exercise's canonical-data.json and look up the version property". But since exercism/problem-specifications#1674, canonical data no longer contains exercise versions. The reason is political and has to do with people disagreeing about what should go into the problem-specifications repository. Some people got upset, the repository was frozen for a year, and the aftermath is that versioning was removed.

Some of the technical argumentation refers to automated test generators:

To prevent breaking changes, the canonical data is currently versioned using SemVer, so in theory test generators could use this to do "safe" updates. In practice though, most test generators always use the latest version to generate their exercises.

Since the Haskell track does not employ an automated test generator, the test generator is a person who does use the latest version, but does so manually. This reasoning does not seem to apply to this track as long as we manually maintain test files.

  • [...]
  • There is no longer any discussion whether a change is a patch, minor or major update.
  • We no longer need the versioning of the canonical data.

tl;dr: Canonical versions were removed, and our exercise versioning policy depends on them, so we need to make a new versioning policy.


dependencies:
  - base

library:
  exposed-modules: Vlq
  source-dirs: src
  ghc-options: -Wall
  # dependencies:
  # - foo       # List here the packages you
  # - bar       # want to use in your solution.

tests:
  test:
    main: Tests.hs
    source-dirs: test
    dependencies:
      - variable-length-quantity
      - hspec
      - text
      - QuickCheck
19 changes: 19 additions & 0 deletions exercises/practice/variable-length-quantity/src/Vlq.hs
@@ -0,0 +1,19 @@
module Vlq
  ( encodes
  , decodes
  , DecodeError (..)
  )
where

import Data.Word

data DecodeError
  = IncompleteSequence
  | TooManyBits
  deriving (Show, Eq)

encodes :: [Word32] -> [Word8]
encodes = error "You need to implement this function."

decodes :: [Word8] -> Either DecodeError [Word32]
decodes = error "You need to implement this function."
1 change: 1 addition & 0 deletions exercises/practice/variable-length-quantity/stack.yaml
@@ -0,0 +1 @@
resolver: lts-16.21