-
Notifications
You must be signed in to change notification settings - Fork 30
/
Copy pathREADME-py
79 lines (63 loc) · 3.65 KB
/
README-py
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
The implementation of the m-tree is on the module named mtree and the module
mtree.functions contains some support functions.
mtree.MTree - The class that implements the m-tree data structure.
The data objects must be any hashable object and the support functions
(distance and split functions) must understand them.
mtree.MTree(min_node_capacity, max_node_capacity, distance_function, split_function)
All the constructor arguments are optional:
* min_node_capacity - Specifies the minimum number of children for each
node. Must be at least 2, and the default is 50.
* max_node_capacity - Specifies the maximum number of children for each
node. Must be greater than min_node_capacity and the default is
2 * min_node_capacity - 1.
* distance_function - A callable which calculates the distance between
two data objects. The default is mtree.functions.euclidean_distance.
* split_function - A callable used when a node passes its max_node_capacity.
Its arguments are a set of data objects and the distance_function. It
must return a sequence with the following four values:
- First chosen data object.
- Subset with at least min_node_capacity objects based on the first
chosen data object. Must contain the first chosen data object.
- Second chosen data object.
- Subset with at least min_node_capacity objects based on the second
chosen data object. Must contain the second chosen data object.
The default value for split_function is a composition of the functions
mtree.functions.random_promotion and mtree.functions.balanced_partition.
mtree.MTree.add(data)
Adds and indexes a data object. The behavior of the m-tree becomes undefined
if the data was already add.
mtree.MTree.remove(data)
Removes a data object from the index. Raises KeyError if the object is not found.
mtree.MTree.get_nearest(query_data[, range][, limit])
Returns an iterator which yields the results of the query. Each result
is a namedtuple with a data object (result.data) and its distance from
the query_data. The results are returned in non-decreasing distance and
are limited by a range (the maximum distance) and/or a limit (the number
of results). Both range and limit arguments default to infinity.
The results are processed as they are fetched. So, you can iterate on the
results of a query with the default infinity limitations even on a very
big tree and then stop only when satisfied.
mtree.functions.euclidean_distance(data1, data2)
A distance function which calculates the distance between two data objects
representing coordinates. The data objects can be lists, tuples or any iterable
objects.
mtree.functions.random_promotion(data_objects, distance_function)
A promotion function which randomly chooses and returns two data objects.
mtree.functions.balanced_partition(promoted_data1, promoted_data2, data_objects, distance_function)
A partition function which partitions data objects by alternating the promoted
data and choosing the nearest object from data_objects. The algorithm is roughly
this:
partition1 := Empty
partition2 := Empty
Repeat until data_object is empty:
X := The object in data_objects which is the nearest to promoted_data1
Remove X from data_object
Add X to partition1
Y := The object in data_objects which is the nearest to promoted_data2
Remove Y from data_object
Add Y to partition2
Return partition1, partition2
mtree.functions.make_split_function(promotion_function, partition_function)
Returns a new split function composed by the callable arguments.
mtree.functions.make_cached_distance_function(distance_function)
Returns a cached version of the distance_function.