/**
* @file Compressor.hpp
* @author Pekka Mikkola <[email protected]>
*
* @section LICENSE
*
* This file is part of bwtc.
*
* bwtc is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* bwtc is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with bwtc. If not, see <http://www.gnu.org/licenses/>.
*
* @section DESCRIPTION
*
* Header for the Compressor class. The compressor is an abstraction of
* a compression pipeline with three stages, illustrated in the following
* diagram:
*
* input --> PRECOMPRESSION --> BWT --> ENTROPY CODING --> output
*
* The compressor has a memory limit that gives an upper bound for the
* amount of memory used. Within this limit, each phase processes blocks of
* data that are as large as possible.
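*
* A minimal sketch of the pipeline above. It assumes each stage is a plain
* function from a byte buffer to a byte buffer. The names Bytes, Stage and
* runPipeline are hypothetical and are not part of bwtc. The sketch treats
* memLimit simply as the maximum chunk size and ignores the working memory
* each stage needs.
*
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

// Hypothetical types: every stage maps a byte buffer to a byte buffer.
using Bytes = std::vector<std::uint8_t>;
using Stage = std::function<Bytes(const Bytes&)>;

// Feeds the input through PRECOMPRESSION --> BWT --> ENTROPY CODING in
// chunks of at most memLimit bytes and concatenates the encoded chunks.
Bytes runPipeline(const Bytes& input, std::size_t memLimit,
                  const Stage& precompress, const Stage& bwt,
                  const Stage& entropyCode) {
  assert(memLimit > 0);
  Bytes output;
  for (std::size_t pos = 0; pos < input.size(); pos += memLimit) {
    const std::size_t end = std::min(input.size(), pos + memLimit);
    Bytes chunk(input.begin() + static_cast<std::ptrdiff_t>(pos),
                input.begin() + static_cast<std::ptrdiff_t>(end));
    const Bytes coded = entropyCode(bwt(precompress(chunk)));
    output.insert(output.end(), coded.begin(), coded.end());
  }
  return output;
}
```
*
* If all three stages are identity lambdas, the function returns the input
* unchanged, which makes a convenient smoke test for the plumbing.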
*
* Precompression:
* The precompressor reads as much data into memory as possible and then
* compresses it. The precompressed data is then divided into BWT-blocks of
* maximum size (the data has to be divided further because computing the
* BWT needs more memory than precompression).
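*
* A minimal sketch of the split into BWT-blocks. It assumes the BWT needs a
* fixed number of bytes of working memory per input byte. The parameter
* bwtBytesPerByte is a hypothetical stand-in for whatever estimate bwtc
* actually uses. maxBwtBlockSize and splitIntoBwtBlocks are illustrative
* names, not bwtc functions.
*
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Largest BWT-block whose working memory fits within memLimit, assuming the
// transform needs bwtBytesPerByte bytes of memory per input byte.
std::size_t maxBwtBlockSize(std::size_t memLimit, std::size_t bwtBytesPerByte) {
  assert(bwtBytesPerByte > 0);
  return memLimit / bwtBytesPerByte;
}

// Splits precompressed data into consecutive BWT-blocks of maxBlock bytes;
// only the last block may be shorter.
std::vector<Bytes> splitIntoBwtBlocks(const Bytes& data, std::size_t maxBlock) {
  assert(maxBlock > 0);
  std::vector<Bytes> blocks;
  for (std::size_t pos = 0; pos < data.size(); pos += maxBlock) {
    const std::size_t len = std::min(maxBlock, data.size() - pos);
    const auto first = data.begin() + static_cast<std::ptrdiff_t>(pos);
    blocks.emplace_back(first, first + static_cast<std::ptrdiff_t>(len));
  }
  return blocks;
}
```
*
* For example, with a 1000-byte limit and 9 bytes of working memory per
* input byte, each BWT-block holds at most 111 bytes.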
*
* BWT:
* Each BWT-block is transformed using BWTManager, which handles the
* selection (and evaluation) of BWT algorithms.
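*
* A naive BWT that sorts all rotations of a block, included to make the
* transform itself concrete. This textbook construction runs in
* O(n^2 log n) time and was chosen for clarity. It is not one of the
* algorithms BWTManager selects from, and naiveBwt is a hypothetical name.
* The second return value is the row of the original block among the sorted
* rotations. In this simplified setting, that row is the single starting
* point the inverse transform needs.
*
```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Naive BWT by sorting all cyclic rotations of s (illustration only).
// Returns the last column of the sorted rotation matrix and the row index
// at which the unrotated s appears.
std::pair<std::string, std::size_t> naiveBwt(const std::string& s) {
  const std::size_t n = s.size();
  std::vector<std::size_t> rot(n);
  for (std::size_t i = 0; i < n; ++i) rot[i] = i;
  // Compare the rotations starting at a and b byte by byte (as unsigned bytes).
  std::sort(rot.begin(), rot.end(), [&](std::size_t a, std::size_t b) {
    for (std::size_t k = 0; k < n; ++k) {
      const auto ca = static_cast<unsigned char>(s[(a + k) % n]);
      const auto cb = static_cast<unsigned char>(s[(b + k) % n]);
      if (ca != cb) return ca < cb;
    }
    return false;
  });
  std::string last(n, '\0');
  std::size_t primary = 0;
  for (std::size_t i = 0; i < n; ++i) {
    last[i] = s[(rot[i] + n - 1) % n];  // character preceding the rotation
    if (rot[i] == 0) primary = i;
  }
  return {last, primary};
}
```
*
* For example, naiveBwt("banana") returns ("nnbaaa", 3).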
*
* Entropy coding:
* Each BWT-block is compressed independently using the selected entropy
* coder. Encoded BWT-blocks are written into the compressed file in the
* order specified by ?TODO?
*
*
* COMPRESSED FILE FORMAT:
*
* At the top level, the compressed file is divided into a file header and
* several precompression blocks.
*
* File header:
* The file header contains global information about the compressed file,
* such as the entropy coder used.
*
* Precompression block:
* Precompression blocks are independent of each other. Each one corresponds
* to a variable-sized portion of the input. A precompression block contains
* a block header followed by 1 to n BWT-blocks. The header contains the
* metadata needed to undo the precompression (for grammar compression, the
* grammar) and the number of BWT-blocks.
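*
* A minimal sketch of serializing a precompression block header with the two
* fields described above. The field order, the 32-bit little-endian widths
* and the names PrecompressionBlockHeader, putUint32 and serialize are
* assumptions for illustration. The actual bwtc encoding is defined in its
* source and may differ.
*
```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

using Bytes = std::vector<std::uint8_t>;

// Hypothetical in-memory form of a precompression block header.
struct PrecompressionBlockHeader {
  Bytes metadata;              // e.g. the grammar for grammar compression
  std::uint32_t numBwtBlocks;  // number of BWT-blocks that follow (>= 1)
};

// Appends v to out as 4 bytes, least significant byte first.
void putUint32(Bytes& out, std::uint32_t v) {
  for (int i = 0; i < 4; ++i)
    out.push_back(static_cast<std::uint8_t>(v >> (8 * i)));
}

// Assumed layout: [metadata length][metadata bytes][number of BWT-blocks].
Bytes serialize(const PrecompressionBlockHeader& h) {
  assert(h.numBwtBlocks >= 1);
  Bytes out;
  putUint32(out, static_cast<std::uint32_t>(h.metadata.size()));
  out.insert(out.end(), h.metadata.begin(), h.metadata.end());
  putUint32(out, h.numBwtBlocks);
  return out;
}
```
*
* The metadata length is written before the metadata, so a decoder can skip
* the grammar without parsing it.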
*
* BWT-block:
* A BWT-block contains a header, a trailer and the transformed,
* entropy-encoded data. The header contains the size of the compressed
* BWT-block and the size of the uncompressed BWT-block (before
* precompression). In addition, the header contains the data, written by
* the entropy encoder, that is needed to uncompress the block.
* The trailer contains the number of starting points used by the inverse
* transform and their positions.
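*
* A minimal sketch of inverting a BWT from its last column and a single
* starting point, using the standard LF-mapping. The trailer can record
* several starting points, but this sketch covers only the case of one.
* inverseBwt is a hypothetical name, not a bwtc function.
*
```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Inverts a BWT given its last column and one starting point (the row of
// the original text among the sorted rotations), using the LF-mapping.
std::string inverseBwt(const std::string& last, std::size_t start) {
  const std::size_t n = last.size();
  if (n == 0) return {};
  assert(start < n);
  // After the prefix sum, count[c] = number of characters in last smaller than c.
  std::array<std::size_t, 257> count{};
  for (unsigned char c : last) ++count[c + 1];
  for (std::size_t c = 1; c < 257; ++c) count[c] += count[c - 1];
  // lf[i] = count[last[i]] + occurrences of last[i] in last[0..i).
  std::vector<std::size_t> lf(n);
  std::array<std::size_t, 256> seen{};
  for (std::size_t i = 0; i < n; ++i) {
    const auto c = static_cast<unsigned char>(last[i]);
    lf[i] = count[c] + seen[c]++;
  }
  // Walk backwards from the starting row, emitting the text right to left.
  std::string text(n, '\0');
  std::size_t row = start;
  for (std::size_t k = n; k-- > 0;) {
    text[k] = last[row];
    row = lf[row];
  }
  return text;
}
```
*
* For example, inverseBwt("nnbaaa", 3) returns "banana".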
*
*/
#ifndef BWTC_COMPRESSOR_HPP_
#define BWTC_COMPRESSOR_HPP_
#include "bwtransforms/BWTManager.hpp"
#include "preprocessors/Precompressor.hpp"
#include "EntropyCoders.hpp"
#include "Streams.hpp"
#include <string>
namespace bwtc {
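struct Options {
  Options(size_t memLimit_, char entropyCoder_) :
      memLimit(memLimit_), entropyCoder(entropyCoder_) {}
  // memLimit is zero-initialized so that it is never read uninitialized.
  Options(char entropyCoder_) : memLimit(0), entropyCoder(entropyCoder_) {}
  size_t memLimit;    // upper bound for the memory used by the compressor
  char entropyCoder;  // identifier of the selected entropy coder
};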
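class Compressor {
 public:
  Compressor(const std::string& in, const std::string& out,
             const std::string& preprocessing, size_t memLimit,
             char entropyCoder);
  Compressor(InStream* in, OutStream* out,
             const std::string& preprocessing, size_t memLimit,
             char entropyCoder);
  ~Compressor();
  // Runs the compression pipeline using the given number of threads.
  size_t compress(size_t threads);
  // Writes the file header described above.
  size_t writeGlobalHeader();
  // Selects the BWT algorithm and the number of starting points.
  void initializeBwtAlgorithm(char choice, uint32 startingPoints);
 private:
  InStream *m_in;                // input stream
  OutStream *m_out;              // output stream
  EntropyEncoder *m_coder;       // entropy encoder for the BWT-blocks
  Precompressor m_precompressor; // precompression stage
  BWTManager m_bwtmanager;       // selects and runs BWT algorithms
  Options m_options;
};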
} //namespace bwtc
#endif