Add a low level block cache #658

Merged
merged 40 commits into from
Nov 5, 2020
Commits
5cc3797
Integrate file block cache
Jaifroid Oct 17, 2020
df6e498
Add ESLint documentation
Jaifroid Oct 17, 2020
977c2d9
Clarify use of Blob.arrayBuffer()
Jaifroid Oct 17, 2020
c637b7d
Prevent offset error (but still non-functional on split pages)
Jaifroid Oct 17, 2020
cd156b5
Comment cleanup
Jaifroid Oct 17, 2020
0e0d607
Fixed CONCAT (but some images not displaying)
Jaifroid Oct 18, 2020
0ff40b0
Ensure simple reads go through file-split checking
Jaifroid Oct 18, 2020
1531fea
Correct begin and end logic
Jaifroid Oct 18, 2020
440abb5
Add test in case subarray is empty
Jaifroid Oct 18, 2020
7422b6a
Re-organize code
Jaifroid Oct 18, 2020
b20c3e1
Attend to CodeFactor issue (unused Require definition)
Jaifroid Oct 18, 2020
d2cfcab
Document zimfile
Jaifroid Oct 18, 2020
8edc388
Document filecache
Jaifroid Oct 18, 2020
7ab9e51
Change var to const for constants
Jaifroid Oct 18, 2020
2e2ccc7
Changes from self review
Jaifroid Oct 19, 2020
38f5474
Simplify regex
Jaifroid Oct 19, 2020
d58a845
Use latest JSDoc format
Jaifroid Oct 19, 2020
67eb4ba
Normalize spacing
Jaifroid Oct 22, 2020
018676c
Documentation changes as per review
Jaifroid Oct 25, 2020
523f17d
Convert to use Map
Jaifroid Oct 27, 2020
e26e4cd
Strip redundant filename from cache and add reset
Jaifroid Oct 27, 2020
c5c610c
Rename reset to init
Jaifroid Oct 27, 2020
38cb3e7
Add a numeric ZIM file ID
Jaifroid Oct 31, 2020
d1230f8
Typo
Jaifroid Oct 31, 2020
c6f512c
Correct the documentation
Jaifroid Oct 31, 2020
00ac12e
Add typing and change format of cache key
Jaifroid Oct 31, 2020
0f63a4f
Restore separator
Jaifroid Nov 1, 2020
1044a12
Swap order and do not nullify cache._first and cache._last
Jaifroid Nov 1, 2020
14de7f0
Clear some cache properties on reset
Jaifroid Nov 1, 2020
03cbb58
Correct typo, conform to JSDoc API and faster check
Jaifroid Nov 2, 2020
a43d606
Fixed logic by following implementation in description
Jaifroid Nov 2, 2020
331707a
Refactor to use simple Map-based implementation of LRUCache
Jaifroid Nov 3, 2020
f22aaa6
Add support for IE11
Jaifroid Nov 4, 2020
d90cc7d
Delete 10% of cache in one go
Jaifroid Nov 4, 2020
862c09d
Make amount dynamic
Jaifroid Nov 4, 2020
fddb0f1
Raise to 25%
Jaifroid Nov 4, 2020
248923d
Avoid using that = this
Jaifroid Nov 4, 2020
5a7f345
Tidy
Jaifroid Nov 4, 2020
70c3ae5
Added clearer comment on procedure for moving a Cache entry
Jaifroid Nov 4, 2020
0471b86
Comment out metrics
Jaifroid Nov 4, 2020
186 changes: 186 additions & 0 deletions www/js/lib/filecache.js
@@ -0,0 +1,186 @@
/**
* filecache.js: Generic cache for small, frequently read file slices.
* It discards cached blocks according to a least-recently-used algorithm.
* It is used primarily for fast Directory Entry lookup, speeding up binary search.
*
* Copyright 2020 Mossroy, peter-x, jaifroid and contributors
* License GPL v3:
*
* This file is part of Kiwix.
*
* Kiwix JS is free software: you can redistribute it and/or modify
* it under the terms of the GNU General Public License as published by
* the Free Software Foundation, either version 3 of the License, or
* (at your option) any later version.
*
* Kiwix JS is distributed in the hope that it will be useful,
* but WITHOUT ANY WARRANTY; without even the implied warranty of
* MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
* GNU General Public License for more details.
*
* You should have received a copy of the GNU General Public License
* along with Kiwix JS (file LICENSE). If not, see <http://www.gnu.org/licenses/>
*/
'use strict';

define(['q'], function (Q) {
/**
* Set maximum number of cache blocks of BLOCK_SIZE bytes each
* Maximum size of cache in bytes = MAX_CACHE_SIZE * BLOCK_SIZE
* @constant
* @type {Number}
*/
const MAX_CACHE_SIZE = 4000;

/**
* The maximum blocksize to read or store via the block cache (bytes)
* @constant
* @type {Number}
*/
const BLOCK_SIZE = 4096;

/**
* A Block Cache employing a Least Recently Used caching strategy
* @typedef {Object} BlockCache
* @property {Number} capacity The maximum number of entries in the cache
* @property {Map} cache A map to store the cache keys and data
*/

/**
* Creates a new cache with max size limit of MAX_CACHE_SIZE blocks
* LRUCache implementation with Map adapted from https://markmurray.co/blog/lru-cache/
*/
function LRUCache() {
/** CACHE TUNING **/
// console.log('Creating cache of size ' + MAX_CACHE_SIZE + ' * ' + BLOCK_SIZE + ' bytes');
// Initialize persistent Cache properties
this.capacity = MAX_CACHE_SIZE;
this.cache = new Map();
}

/**
* Tries to retrieve an element by its id. If it is not present in the cache, returns undefined; if it is present,
* then the value is returned and the entry is moved to the bottom of the cache
* @param {String} key The block cache entry key (file.id + ':' + byte offset)
* @returns {Uint8Array | undefined} The requested cache data or undefined
*/
LRUCache.prototype.get = function (key) {
var entry = this.cache.get(key);
// If the key does not exist, return
if (!entry) return entry;
// Remove the key and re-insert it (this moves the key to the bottom of the Map: bottom = most recent)
this.cache.delete(key);
this.cache.set(key, entry);
// Return the cached data
return entry;
};

/**
* Stores a value in the cache by id and prunes the least recently used entry if the cache is larger than MAX_CACHE_SIZE
* @param {String} key The key under which to store the value (file.id + ':' + byte offset from start of ZIM archive)
* @param {Uint8Array} value The value to store in the cache
*/
LRUCache.prototype.store = function (key, value) {
// We get the existing entry's object for memory-management purposes; if it exists, it will contain identical data
// to <value>, but <entry> is strongly referenced by the Map. (It should be rare that two async Promises attempt to
// store the same data in the Cache, once the Cache is sufficiently populated.)
var entry = this.cache.get(key);
// If the key already exists, delete it and re-insert it, so that it will be added
// to the bottom of the Map (bottom = most recent)
if (entry) this.cache.delete(key);
else entry = value;
this.cache.set(key, entry);
// If we've exceeded the cache capacity, then delete the least recently accessed value,
// which will be the item at the top of the Map, i.e. the first position
if (this.cache.size > this.capacity) {
if (this.cache.keys) {
var firstKey = this.cache.keys().next().value;
this.cache.delete(firstKey);
} else {
// IE11 doesn't support the keys iterator, so we have to do a forEach loop through all 4000 entries
// to get the oldest values. To prevent excessive iterations, we delete 25% at a time.
var q = Math.floor(0.25 * this.capacity);
var c = 0;
// console.log('Deleting ' + q + ' cache entries');
this.cache.forEach(function (v, k, map) {
// Stop once exactly q entries have been deleted (c > q would delete q + 1)
if (c >= q) return;
map.delete(k);
c++;
});
}
}
};

/**
* A new Block Cache
* @type {BlockCache}
*/
var cache = new LRUCache();

/** CACHE TUNING **/
// DEV: Uncomment this block and blocks below marked 'CACHE TUNING' to measure Cache hit and miss rates for different Cache sizes
// var hits = 0;
// var misses = 0;

/**
* Read a certain byte range in the given file, breaking the range into chunks that go through the cache
* If a read of more than BLOCK_SIZE * 2 (bytes) is requested, do not use the cache
* @param {Object} file The requested ZIM archive to read from
* @param {Number} begin The byte from which to start reading
* @param {Number} end The byte at which to stop reading (end will not be read)
* @return {Promise<Uint8Array>} A Promise that resolves to the correctly concatenated data from the cache
* or from the ZIM archive
*/
var read = function (file, begin, end) {
// Read large chunks bypassing the block cache because we would have to
// stitch together too many blocks and would clog the cache
if (end - begin > BLOCK_SIZE * 2) return file._readSplitSlice(begin, end);
var readRequests = [];
var blocks = {};
// Look for the requested data in the blocks: we may need to stitch together data from two or more blocks
for (var id = Math.floor(begin / BLOCK_SIZE) * BLOCK_SIZE; id < end; id += BLOCK_SIZE) {
var block = cache.get(file.id + ':' + id);
if (block === undefined) {
// Data not in cache, so read from archive
/** CACHE TUNING **/
// misses++;
// DEV: This is an immediately invoked function expression: it is called at once with <id> as its argument,
// which becomes the <offset> parameter, so each read request captures its own block offset
readRequests.push(function (offset) {
return file._readSplitSlice(offset, offset + BLOCK_SIZE).then(function (result) {
cache.store(file.id + ':' + offset, result);
blocks[offset] = result;
});
}(id));
} else {
/** CACHE TUNING **/
// hits++;
blocks[id] = block;
}
}
/** CACHE TUNING **/
// if (misses + hits > 2000) {
// console.log('** Block cache hit rate: ' + Math.round(hits / (hits + misses) * 1000) / 10 + '% [ hits:' + hits +
// ' / misses:' + misses + ' ] Size: ' + cache.cache.size);
// hits = 0;
// misses = 0;
// }
// Wait for all the blocks to be read either from the cache or from the archive
return Q.all(readRequests).then(function () {
var result = new Uint8Array(end - begin);
var pos = 0;
// Stitch together the data parts in the right order
for (var i = Math.floor(begin / BLOCK_SIZE) * BLOCK_SIZE; i < end; i += BLOCK_SIZE) {
var b = Math.max(i, begin) - i;
var e = Math.min(end, i + BLOCK_SIZE) - i;
if (blocks[i].subarray) result.set(blocks[i].subarray(b, e), pos);
pos += e - b;
}
return result;
});
};

return {
read: read
};
});
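The two ideas in filecache.js, a Map used as an LRU store and block-aligned reads stitched into one result, can be sketched in a few lines. This is a simplified illustration, not the code from the pull request: the names `TinyLRU`, `readThroughCache` and `BLOCK` are hypothetical, the block size is shrunk to 4 bytes, and reads are synchronous over an in-memory Uint8Array rather than asynchronous reads from a ZIM archive.

```javascript
// Illustrative sketch only: TinyLRU, readThroughCache and BLOCK are hypothetical names
const BLOCK = 4; // tiny block size so the arithmetic is easy to follow

class TinyLRU {
    constructor(capacity) {
        this.capacity = capacity;
        this.map = new Map(); // a Map iterates in insertion order: first key = least recent
    }
    get(key) {
        if (!this.map.has(key)) return undefined;
        const value = this.map.get(key);
        // Delete and re-insert so the key moves to the most-recent end
        this.map.delete(key);
        this.map.set(key, value);
        return value;
    }
    store(key, value) {
        this.map.delete(key);
        this.map.set(key, value);
        // Over capacity: evict the first (least recently used) key
        if (this.map.size > this.capacity) {
            this.map.delete(this.map.keys().next().value);
        }
    }
}

// Reads bytes [begin, end) from source, one aligned block at a time, via the cache
function readThroughCache(cache, source, begin, end) {
    const result = new Uint8Array(end - begin);
    let pos = 0;
    for (let offset = Math.floor(begin / BLOCK) * BLOCK; offset < end; offset += BLOCK) {
        let block = cache.get(offset);
        if (block === undefined) {
            // Cache miss: stand-in for reading the block from the archive
            block = source.subarray(offset, offset + BLOCK);
            cache.store(offset, block);
        }
        // Part of this block that falls inside the requested range
        const b = Math.max(offset, begin) - offset;
        const e = Math.min(end, offset + BLOCK) - offset;
        result.set(block.subarray(b, e), pos);
        pos += e - b;
    }
    return result;
}

const source = Uint8Array.from({ length: 12 }, (_, i) => i); // bytes 0..11
const cache = new TinyLRU(2);
const bytes = readThroughCache(cache, source, 3, 9); // touches blocks 0, 4 and 8
// bytes holds 3,4,5,6,7,8; the cache keeps blocks 4 and 8, block 0 was evicted
```

Evicting from the front of the Map relies on the guarantee that a Map iterates in insertion order, which is why both `get` and `store` delete a key before setting it again.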
33 changes: 21 additions & 12 deletions www/js/lib/util.js
@@ -196,22 +196,31 @@ define(['q'], function(Q) {
}

/**
- * Reads a Uint8Array from the given file starting at byte offset begin and
- * for given size
+ * Reads a Uint8Array from the given file starting at byte offset begin until end
* @param {File} file The file object to be read
* @param {Integer} begin The offset in <File> at which to begin reading
- * @param {Integer} size The number of bytes to read
+ * @param {Integer} end The byte at which to stop reading (reads up to and including end - 1)
* @returns {Promise<Uint8Array>} A Promise for an array buffer with the read data
*/
- function readFileSlice(file, begin, size) {
- return Q.Promise(function (resolve, reject) {
- var reader = new FileReader();
- reader.onload = function (e) {
- resolve(new Uint8Array(e.target.result));
- };
- reader.onerror = reader.onabort = reject;
- reader.readAsArrayBuffer(file.slice(begin, begin + size));
- });
+ function readFileSlice(file, begin, end) {
+ if ('arrayBuffer' in Blob.prototype) {
+ // DEV: This method uses the native arrayBuffer method of Blob, if available, as it eliminates
+ // the need to use FileReader and set up event listeners; it also uses the method's native Promise
+ // rather than setting up potentially hundreds of new Q promises for small byte range reads
+ return file.slice(begin, end).arrayBuffer().then(function (buffer) {
+ return new Uint8Array(buffer);
+ });
+ } else {
+ return Q.Promise(function (resolve, reject) {
+ var reader = new FileReader();
+ reader.readAsArrayBuffer(file.slice(begin, end));
+ reader.addEventListener('load', function (e) {
+ resolve(new Uint8Array(e.target.result));
+ });
+ reader.addEventListener('error', reject);
+ reader.addEventListener('abort', reject);
+ });
+ }
}

/**
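The change of signature from (begin, size) to (begin, end) makes the helper end-exclusive, like Blob.slice itself. A minimal standalone sketch of the same feature-detection pattern follows. It is an illustration under assumptions, not the project's code: `readSlice` is a hypothetical name, a native Promise stands in for Q in the fallback path, and the example assumes an environment with a global Blob (a modern browser or a recent Node.js release).

```javascript
// Hypothetical helper: reads bytes [begin, end) from a Blob or File as a Uint8Array
function readSlice(blob, begin, end) {
    if (typeof blob.arrayBuffer === 'function') {
        // Native Promise-based path, no FileReader or event listeners needed
        return blob.slice(begin, end).arrayBuffer().then(function (buffer) {
            return new Uint8Array(buffer);
        });
    }
    // Fallback for environments without Blob.prototype.arrayBuffer
    // (assumption: a native Promise is used here in place of Q)
    return new Promise(function (resolve, reject) {
        var reader = new FileReader();
        reader.addEventListener('load', function (e) {
            resolve(new Uint8Array(e.target.result));
        });
        reader.addEventListener('error', reject);
        reader.addEventListener('abort', reject);
        reader.readAsArrayBuffer(blob.slice(begin, end));
    });
}

// Usage: end is exclusive, so (1, 4) yields the bytes at indices 1, 2 and 3
const blob = new Blob([Uint8Array.from([10, 20, 30, 40, 50])]);
const slicePromise = readSlice(blob, 1, 4).then(function (bytes) {
    return Array.from(bytes); // [20, 30, 40]
});
```

Because end is exclusive, a caller that previously passed a size would now pass begin + size instead.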