diff --git a/LICENSE b/LICENSE new file mode 100644 index 0000000..5ee07b5 --- /dev/null +++ b/LICENSE @@ -0,0 +1,21 @@ +Copyright (c) 2011-2012 Tiejun Cheng + +Permission is hereby granted, free of charge, to any person +obtaining a copy of this software and associated documentation +files (the "Software"), to deal in the Software without +restriction, including without limitation the rights to use, +copy, modify, merge, publish, distribute, sublicense, and/or sell +copies of the Software, and to permit persons to whom the +Software is furnished to do so, subject to the following conditions: + +The above copyright notice and this permission notice shall be +included in all copies or substantial portions of the Software. + +THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES +OF MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT +HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, +WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING +FROM, OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR +OTHER DEALINGS IN THE SOFTWARE. diff --git a/README.md b/README.md new file mode 100644 index 0000000..58bd481 --- /dev/null +++ b/README.md @@ -0,0 +1,195 @@ +FSelector: a Ruby package for feature selection and ranking +=========================================================== + +**Git**: [https://github.com/need47/fselector](https://github.com/need47/fselector) +**Author**: Tiejun Cheng +**Email**: [need47@gmail.com](mailto:need47@gmail.com) +**Copyright**: 2011-2012 +**License**: MIT License +**Latest Version**: 0.1.0 +**Release Date**: March 1st 2012 + +Synopsis +-------- + +FSelector is an open-access Ruby package that aims to integrate as many +feature selection/ranking algorithms as possible. It enables the +user to perform feature selection by either a single algorithm or by an +ensemble of algorithms. 
Below is a summary of FSelector's features. + +Feature List +------------ + +**1. available algorithms** + + algorithm alias feature type + ------------------------------------------------------- + Accuracy Acc discrete + AccuracyBalanced Acc2 discrete + BiNormalSeparation BNS discrete + ChiSquaredTest CHI discrete + CorrelationCoefficient CC discrete + DocumentFrequency DF discrete + F1Measure F1 discrete + FishersExactTest FET discrete + GiniIndex GI discrete + GMean GM discrete + GSSCoefficient GSS discrete + InformationGain IG discrete + MatthewsCorrelationCoefficient MCC, PHI discrete + McNemarsTest MNT discrete + OddsRatio OR discrete + OddsRatioNumerator ORN discrete + PhiCoefficient Phi discrete + Power Power discrete + Precision Precision discrete + ProbabilityRatio PR discrete + Random Random discrete + Recall Recall discrete + Relief_d Relief_d discrete + ReliefF_d ReliefF_d discrete + Sensitivity SN, Recall discrete + Specificity SP discrete + PMetric PM continuous + Relief_c Relief_c continuous + ReliefF_c ReliefF_c continuous + TScore TS continuous + +**2. feature selection approaches** + + - by a single algorithm + - by multiple algorithms in a tandem manner + - by multiple algorithms in a consensus manner + +**3. available normalization and discretization algorithms for continuous features** + + algorithm note + -------------------------------------------------------------------- + log normalization by logarithmic transformation + min_max normalization by scaling into [min, max] + zscore normalization by converting into zscore + equal_width discretization by equal width among intervals + equal_frequency discretization by equal frequency among intervals + ChiMerge discretization by ChiMerge method + +**4. supported input/output file types** + + - csv + - libsvm + - weka ARFF + - random (for testing purposes) + +Installing +---------- + +To install FSelector, use the following command: + + $ gem install fselector + +Usage +----- + +**1. 
feature selection by a single algorithm** + + require 'fselector' + + # use InformationGain as a feature ranking algorithm + r1 = FSelector::InformationGain.new + + # read from random data (or csv, libsvm, weka ARFF file) + # no. of samples: 100 + # no. of classes: 2 + # no. of features: 10 + # no. of possible values for each feature: 3 + # allow missing values: true + r1.data_from_random(100, 2, 10, 3, true) + + # number of features before feature selection + puts "# features (before): "+ r1.get_features.size.to_s + + # select the top-ranked features with scores >0.01 + r1.select_data_by_score!('>0.01') + + # number of features after feature selection + puts "# features (after): "+ r1.get_features.size.to_s + + # you can also use multiple algorithms in a tandem manner + # e.g. use the ChiSquaredTest with Yates' continuity correction + # initialize from r1's data + r2 = FSelector::ChiSquaredTest.new(:yates, r1.get_data) + + # number of features before feature selection + puts "# features (before): "+ r2.get_features.size.to_s + + # select the top-ranked 3 features + r2.select_data_by_rank!('<=3') + + # number of features after feature selection + puts "# features (after): "+ r2.get_features.size.to_s + + # save data to standard output as a weka ARFF file (sparse format) + # with selected features only + r2.data_to_weka(:stdout, :sparse) + + +**2. 
feature selection by an ensemble of algorithms** + + require 'fselector' + + # use both InformationGain and ChiSquaredTest + r1 = FSelector::InformationGain.new + r2 = FSelector::ChiSquaredTest.new + + # ensemble ranker + re = FSelector::Ensemble.new(r1, r2) + + # read random data + re.data_from_random(100, 2, 10, 3, true) + + # number of features before feature selection + puts '# features before feature selection: ' + re.get_features.size.to_s + + # based on the min feature rank among + # ensemble feature selection algorithms + re.ensemble_by_rank(re.method(:by_min)) + + # select the top-ranked 3 features + re.select_data_by_rank!('<=3') + + # number of features after feature selection + puts '# features after feature selection: ' + re.get_features.size.to_s + + + **3. normalization and discretization before feature selection** + + In addition to the algorithms designed for continuous features, one + can apply those designed for discrete features after (optional + normalization and) discretization. + + require 'fselector' + + # for continuous feature + r1 = FSelector::BaseContinuous.new + + # read the Iris data set (under the test/ directory) + r1.data_from_csv(File.expand_path(File.dirname(__FILE__))+'/iris.csv') + + # normalization by log2 (optional) + # r1.normalize_log!(2) + + # discretization by ChiMerge algorithm + # chi-squared value = 4.60 for a three-class problem at alpha=0.10 + r1.discretize_chimerge!(4.60) + + # apply ReliefF_d for discrete feature + # initialize with discretized data from r1 + r2 = FSelector::ReliefF_d.new(r1.get_sample_size, 10, r1.get_data) + + # print feature ranks + r2.print_feature_ranks + +Copyright +--------- +FSelector © 2011-2012 by [Tiejun Cheng](mailto:need47@gmail.com). +FSelector is licensed under the MIT license. Please see the {file:LICENSE} for +more information. 
diff --git a/Rakefile b/Rakefile new file mode 100644 index 0000000..549738c --- /dev/null +++ b/Rakefile @@ -0,0 +1,22 @@ +# +# make a ruby gem +# + +task :default => :gem + +task :gem do + Gem::Builder.new(eval(File.read('fselector.gemspec'))).build +end + +# +# test example +# +require 'rake' +require 'rake/testtask.rb' +task :test do + Rake::TestTask.new do |t| + t.libs = ['lib'] + t.test_files = FileList['test/test_*.rb'] + t.verbose = true + end +end \ No newline at end of file diff --git a/doc/Array.html b/doc/Array.html new file mode 100644 index 0000000..755fb68 --- /dev/null +++ b/doc/Array.html @@ -0,0 +1,693 @@ + + + + + + Class: Array + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: Array + + + +

+ +
+ +
Inherits:
+
+ Object + +
    +
  • Object
  • + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/util.rb
+ +
+
+ +

Overview

+
+

add functions to Array class

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + +
+

Instance Method Details

+ + +
+

+ + - (Float) ave + + + + Also known as: + mean + + +

+
+

average (mean)

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Float) + + + + — +

    average (mean)

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 14
+
+def ave
+  self.sum / self.size
+end
+
+
+ +
+

+ + - (Float) sd + + + +

+
+

standard deviation

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Float) + + + + — +

    standard deviation

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 32
+
+def sd
+  Math.sqrt(self.var)
+end
+
+
+ +
+

+ + - (Float) sum + + + +

+
+

summation

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Float) + + + + — +

    sum

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 7
+
+def sum
+  self.inject(0.0) { |s, i| s+i }
+end
+
+
+ +
+

+ + - (Object) to_scale(min = 0.0, max = 1.0) + + + +

+
+

scale to [min, max]

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 38
+
+def to_scale(min=0.0, max=1.0)
+  if (min >= max)
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "min must be smaller than max"
+  end
+  
+  old_min = self.min
+  old_max = self.max
+
+  self.collect do |v|
+    if old_min == old_max
+      max
+    else
+      min + (v-old_min)*(max-min)/(old_max-old_min)
+    end
+  end
+end
+
+
+ +
+

+ + - (Array<Symbol>) to_sym + + + +

+
+

to symbol

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Array<Symbol>) + + + + — +

    converted symbols

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 70
+
+def to_sym
+  self.collect { |x| x.to_sym }
+end
+
+
+ +
+

+ + - (Object) to_zscore + + + +

+
+

convert to zscore

+ +

ref: Wikipedia

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 60
+
+def to_zscore
+  ave = self.ave
+  sd = self.sd
+
+  return self.collect { |v| (v-ave)/sd }
+end
+
+
+ +
+

+ + - (Float) var + + + +

+
+

variance

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Float) + + + + — +

    variance

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 22
+
+def var
+  u = self.ave
+  v2 = self.inject(0.0) { |v, i| v+(i-u)*(i-u) }
+  
+  v2/(self.size-1)
+end
+
+
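The Array helpers documented above can be tried outside the gem with a small standalone sketch. `StatsSketch` is a hypothetical module name used here to avoid reopening `Array`; the formulas follow the listed source (mean, sample variance with an n-1 denominator, z-score, and min-max scaling). Note that a one-element array makes the variance undefined, and a constant array makes the z-score undefined, so callers may want to guard against those inputs.

```ruby
# Standalone sketch of the Array helpers documented above.
# StatsSketch is a hypothetical name; the gem itself adds these methods to Array.
module StatsSketch
  module_function

  # arithmetic mean
  def ave(xs)
    xs.inject(0.0) { |s, x| s + x } / xs.size
  end

  # sample variance (n-1 denominator), as in Array#var above
  def var(xs)
    u = ave(xs)
    xs.inject(0.0) { |v, x| v + (x - u) * (x - u) } / (xs.size - 1)
  end

  # standard deviation
  def sd(xs)
    Math.sqrt(var(xs))
  end

  # z-score of every element
  def to_zscore(xs)
    u = ave(xs)
    s = sd(xs)
    xs.collect { |x| (x - u) / s }
  end

  # linear rescaling into [min, max]; a constant array maps to max
  def to_scale(xs, min = 0.0, max = 1.0)
    raise ArgumentError, 'min must be smaller than max' if min >= max
    old_min = xs.min
    old_max = xs.max
    xs.collect do |x|
      old_min == old_max ? max : min + (x - old_min) * (max - min) / (old_max - old_min)
    end
  end
end

values = [2.0, 4.0, 6.0]
p StatsSketch.ave(values)       # 4.0
p StatsSketch.var(values)       # 4.0
p StatsSketch.to_zscore(values) # [-1.0, 0.0, 1.0]
p StatsSketch.to_scale(values)  # [0.0, 0.5, 1.0]
```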
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/Discretilizer.html b/doc/Discretilizer.html new file mode 100644 index 0000000..344efbd --- /dev/null +++ b/doc/Discretilizer.html @@ -0,0 +1,670 @@ + + + + + + Module: Discretilizer + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Module: Discretilizer + + + +

+ +
+ + + + + + + +
Included in:
+
FSelector::BaseContinuous
+ + + +
Defined in:
+
lib/fselector/algo_continuous/discretizer.rb
+ +
+
+ +

Overview

+
+

discretize continuous features

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + +
+

Instance Method Details

+ + +
+

+ + - (Object) discretize_chimerge!(chisq) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

discretize by ChiMerge algorithm

+ +

ref: ChiMerge: Discretization of Numeric Attributes

+ +

chi-squared values and associated p values can be looked up at +Wikipedia
+degrees of freedom: one less than number of classes

+ +
chi-squared values vs p values
+degree_of_freedom  p<0.10  p<0.05  p<0.01  p<0.001
+        1          2.71    3.84    6.64    10.83
+        2          4.60    5.99    9.21    13.82
+        3          6.25    7.82    11.34   16.27
+
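The thresholds in the table above are compared against a chi-squared statistic computed for each pair of adjacent intervals. The `calc_chisq` helper that the method calls is not shown in this section, so the following is only a sketch of the usual Pearson statistic on a 2 x k table of class counts. The name `chisq_between` and the choice to skip cells with zero expected count are assumptions, not the gem's confirmed behavior.

```ruby
# Hypothetical stand-in for calc_chisq, which is not shown in this section.
# cs1 and cs2 map class label => count for two adjacent intervals.
def chisq_between(cs1, cs2)
  classes = (cs1.keys + cs2.keys).uniq
  r1 = cs1.values.inject(0.0) { |s, v| s + v } # row total, first interval
  r2 = cs2.values.inject(0.0) { |s, v| s + v } # row total, second interval
  n = r1 + r2
  return 0.0 if n == 0.0

  chisq = 0.0
  classes.each do |k|
    col = cs1.fetch(k, 0.0) + cs2.fetch(k, 0.0) # column total for class k
    [[cs1.fetch(k, 0.0), r1], [cs2.fetch(k, 0.0), r2]].each do |observed, row|
      expected = row * col / n
      next if expected == 0.0 # assumption: skip empty cells to avoid dividing by zero
      chisq += (observed - expected)**2 / expected
    end
  end
  chisq
end

a = { setosa: 4.0, versicolor: 0.0 }
b = { setosa: 0.0, versicolor: 4.0 }
c = { setosa: 2.0, versicolor: 2.0 }
puts chisq_between(a, b) # perfectly separated classes: 8.0
puts chisq_between(c, c) # identical class distributions: 0.0
```

With two classes (one degree of freedom) and a threshold of 2.71, the first pair would stay separate and the second pair would be merged.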
+ + +
+
+
+

Parameters:

+
    + +
  • + + chisq + + + (Float) + + + + — +

    chi-squared value

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/discretizer.rb', line 88
+
+def discretize_chimerge!(chisq)
+  # chisq = 4.60 # for iris::Sepal.Length
+  # for initialization
+  hzero = {}
+  each_class do |k|
+    hzero[k] = 0.0
+  end
+  
+  # determine the final boundaries for each feature
+  f2bs = {}
+  each_feature do |f|
+    #f = "Sepal.Length"
+    # 1a. initialize boundaries
+    bs, cs, qs = [], [], []
+    fvs = get_feature_values(f).sort.uniq
+    fvs.each_with_index do |v, i|
+      if i+1 < fvs.size
+        bs << (v+fvs[i+1])/2.0
+        cs << hzero.dup
+        qs << 0.0
+      end
+    end
+    bs << fvs.max+1.0 # add the rightmost boundary
+    cs << hzero.dup
+    
+    # 1b. initialize counts for each interval
+    each_sample do |k, s|
+      next if not s.has_key? f
+      bs.each_with_index do |b, i|
+        if s[f] < b
+          cs[i][k] += 1.0
+          break
+        end
+      end
+    end
+    
+    # 1c. initialize chi-squared values between two adjacent intervals
+    cs.each_with_index do |c, i|
+      if i+1 < cs.size
+        qs[i] = calc_chisq(c, cs[i+1])
+      end
+    end
+    
+    # 2. iteratively merge intervals
+    until qs.empty? or qs.min > chisq
+      qs.each_with_index do |q, i|
+        if q == qs.min
+          #pp "i: #{i}"
+          #pp bs.join(',')
+          #pp qs.join(',')
+          
+          # update cs for merged two intervals
+          cm = {}
+          each_class do |k|
+            cm[k] = cs[i][k]+cs[i+1][k]
+          end
+          
+          # update qs if necessary
+          # before merged intervals
+          if i-1 >= 0
+            qs[i-1] = calc_chisq(cs[i-1], cm)
+          end
+          # after merged intervals
+          if i+1 < qs.size
+            qs[i+1] = calc_chisq(cm, cs[i+2])
+          end
+          
+          # merge
+          bs = bs[0...i] + bs[i+1...bs.size]
+          cs = cs[0...i] + [cm] + cs[i+2...cs.size]
+          qs = qs[0...i] + qs[i+1...qs.size]
+          
+          #pp bs.join(',')
+          #pp qs.join(',')
+          
+          # break out
+          break
+          
+        end
+      end
+    end
+    
+    # 3. record the final boundaries
+    f2bs[f] = bs
+  end
+
+  # discretize according to each feature's boundaries
+  each_sample do |k, s|
+    s.keys.each do |f|
+      s[f] = get_index(s[f], f2bs[f])
+    end
+  end
+  
+end
+
+
+ +
+

+ + - (Object) discretize_equal_frequency!(n_interval) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

discretize by equal-frequency intervals

+ + +
+
+
+

Parameters:

+
    + +
  • + + n_interval + + + (Integer) + + + + — +

    desired number of intervals

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/discretizer.rb', line 42
+
+def discretize_equal_frequency!(n_interval)
+  n_interval = 1 if n_interval < 1 # at least one interval
+  
+  # first determine the boundaries
+  f2bs = Hash.new { |h,k| h[k] = [] }
+  each_feature do |f|
+    fvs = get_feature_values(f).sort
+    # number of samples in each interval
+    ns = (fvs.size.to_f/n_interval).round
+    fvs.each_with_index do |v, i|
+      if (i+1)%ns == 0 and (i+1)<fvs.size
+        f2bs[f] << (v+fvs[i+1])/2.0
+      end
+    end
+    f2bs[f] << fvs.max+1.0 # add the rightmost boundary
+  end
+  
+  # then discretize
+  each_sample do |k, s|
+    s.keys.each do |f|
+      s[f] = get_index(s[f], f2bs[f])
+    end
+  end
+  
+end
+
+
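The boundary computation above can be isolated in a few lines. In this sketch `equal_frequency_boundaries` is a hypothetical helper name, and the guard on `ns` is an addition that is not in the listed source: as written there, `ns` appears to round to zero when `n_interval` is much larger than the number of values, which would make the modulo raise `ZeroDivisionError`.

```ruby
# Sketch of the boundary computation in discretize_equal_frequency! above.
# equal_frequency_boundaries is a hypothetical helper name; the guard on ns
# is an addition that is not in the documented source.
def equal_frequency_boundaries(values, n_interval)
  n_interval = 1 if n_interval < 1 # at least one interval
  fvs = values.sort
  ns = (fvs.size.to_f / n_interval).round # samples per interval
  ns = 1 if ns < 1 # avoid modulo by zero when n_interval far exceeds the sample count

  bs = []
  fvs.each_with_index do |v, i|
    # place a boundary halfway between the last value of one interval
    # and the first value of the next
    bs << (v + fvs[i + 1]) / 2.0 if (i + 1) % ns == 0 && (i + 1) < fvs.size
  end
  bs << fvs.max + 1.0 # rightmost boundary
  bs
end

p equal_frequency_boundaries([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], 3) # [2.5, 4.5, 7.0]
```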
+ +
+

+ + - (Object) discretize_equal_width!(n_interval) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

discretize by equal-width intervals

+ + +
+
+
+

Parameters:

+
    + +
  • + + n_interval + + + (Integer) + + + + — +

    desired number of intervals

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/discretizer.rb', line 10
+
+def discretize_equal_width!(n_interval)
+  n_interval = 1 if n_interval < 1 # at least one interval
+  
+  # first determine min and max for each feature
+  f2min_max = {}
+  each_feature do |f|
+    fvs = get_feature_values(f)
+    f2min_max[f] = [fvs.min, fvs.max]
+  end
+  
+  # then discretize
+  each_sample do |k, s|
+    s.keys.each do |f|
+      min_v, max_v = f2min_max[f]
+      if min_v == max_v
+        wn = 0
+      else
+        wn = ((s[f]-min_v)*n_interval.to_f / (max_v-min_v)).to_i
+      end
+      
+      s[f] = (wn<n_interval) ? wn : n_interval-1
+    end
+  end
+  
+end
+
+
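The bin index used by the equal-width method above is a single formula plus a clamp. The sketch below pulls it out into `equal_width_bin`, a hypothetical helper name that is not part of the gem, and applies it to a handful of made-up values.

```ruby
# Sketch of the bin-index formula used by discretize_equal_width! above.
# equal_width_bin is a hypothetical helper name, not part of the gem.
def equal_width_bin(v, min_v, max_v, n_interval)
  n_interval = 1 if n_interval < 1 # at least one interval
  return 0 if min_v == max_v       # constant feature: everything in bin 0

  wn = ((v - min_v) * n_interval.to_f / (max_v - min_v)).to_i
  wn < n_interval ? wn : n_interval - 1 # the maximum falls into the last bin
end

values = [4.3, 5.0, 5.8, 6.4, 7.9]
bins = values.map { |v| equal_width_bin(v, values.min, values.max, 3) }
p bins # [0, 0, 1, 1, 2]
```

The clamp matters only for the maximum value, whose raw index would otherwise equal `n_interval`.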
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/FRank.html b/doc/FRank.html new file mode 100644 index 0000000..cd6221a --- /dev/null +++ b/doc/FRank.html @@ -0,0 +1,504 @@ + + + + + + Module: FRank + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Module: FRank + + + +

+ +
+ + + + + + + + +
Defined in:
+
lib/frank/base.rb,
+ lib/frank/ensemble.rb,
lib/frank/algo_discrete/GMean.rb,
lib/frank/algo_discrete/Power.rb,
lib/frank/algo_discrete/Random.rb,
lib/frank/algo_discrete/Recall.rb,
lib/frank/algo_mixed/base_mixed.rb,
lib/frank/algo_discrete/Accuracy.rb,
lib/frank/algo_discrete/Relief_d.rb,
lib/frank/algo_continuous/TScore.rb,
lib/frank/algo_discrete/F1Measure.rb,
lib/frank/algo_discrete/OddsRatio.rb,
lib/frank/algo_continuous/PMetric.rb,
lib/frank/algo_discrete/GiniIndex.rb,
lib/frank/algo_discrete/Precision.rb,
lib/frank/algo_discrete/ReliefF_d.rb,
lib/frank/algo_continuous/Relief_c.rb,
lib/frank/algo_continuous/ReliefF_c.rb,
lib/frank/algo_discrete/Sensitivity.rb,
lib/frank/algo_discrete/Specificity.rb,
lib/frank/algo_discrete/McNemarsTest.rb,
lib/frank/algo_discrete/base_discrete.rb,
lib/frank/algo_discrete/ChiSquaredTest.rb,
lib/frank/algo_discrete/PhiCoefficient.rb,
lib/frank/algo_discrete/GSSCoefficient.rb,
lib/frank/algo_discrete/InformationGain.rb,
lib/frank/algo_discrete/AccuracyBalanced.rb,
lib/frank/algo_discrete/ProbabilityRatio.rb,
lib/frank/algo_discrete/FishersExactTest.rb,
lib/frank/algo_discrete/DocumentFrequency.rb,
lib/frank/algo_discrete/InformationGain_d.rb,
lib/frank/algo_discrete/MutualInformation.rb,
lib/frank/algo_continuous/base_continuous.rb,
lib/frank/algo_discrete/OddsRatioNumerator.rb,
lib/frank/algo_discrete/BiNormalSeparation.rb,
lib/frank/algo_discrete/CorrelationCoefficient.rb,
lib/frank/algo_discrete/MatthewsCorrelationCoefficient.rb
+
+ +
+
+ +

Overview

+
+

FRank: a ruby package for feature selection and ranking

+ + +
+
+
+ + +

Defined Under Namespace

+

+ + + + + Classes: Accuracy, AccuracyBalanced, Base, BaseContinuous, BaseDiscrete, BaseMixed, BiNormalSeparation, ChiSquaredTest, CorrelationCoefficient, DocumentFrequency, Ensemble, F1Measure, FishersExactTest, GMean, GSSCoefficient, GiniIndex, InformationGain, InformationGain_d, MatthewsCorrelationCoefficient, McNemarsTest, MutualInformation, OddsRatio, OddsRatioNumerator, PMetric, Power, Precision, ProbabilityRatio, Random, ReliefF_c, ReliefF_d, Relief_c, Relief_d, Sensitivity, Specificity, TScore + + +

+ +

Constant Summary

+ +
+ +
GM = +
+
+

shortcut so that you can use FRank::GM instead of FRank::GMean

+ + +
+
+
+ + +
+
+
GMean
+ +
Recall = +
+
+

Recall, also known as Sensitivity. +shortcut so that you can use FRank::Recall

+ + +
+
+
+ + +
+
+
Sensitivity
+ +
Acc = +
+
+

shortcut so that you can use FRank::Acc instead of FRank::Accuracy

+ + +
+
+
+ + +
+
+
Accuracy
+ +
TS = +
+
+

shortcut so that you can use FRank::TS instead of FRank::TScore

+ + +
+
+
+ + +
+
+
TScore
+ +
F1 = +
+
+

shortcut so that you can use FRank::F1 instead of FRank::F1Measure

+ + +
+
+
+ + +
+
+
F1Measure
+ +
Odd = +
+
+

shortcut so that you can use FRank::Odd instead of FRank::OddsRatio

+ + +
+
+
+ + +
+
+
OddsRatio
+ +
PM = +
+
+

shortcut so that you can use FRank::PM instead of FRank::PMetric

+ + +
+
+
+ + +
+
+
PMetric
+ +
GI = +
+
+

shortcut so that you can use FRank::GI instead of FRank::GiniIndex

+ + +
+
+
+ + +
+
+
GiniIndex
+ +
SN = +
+
+

shortcut so that you can use FRank::SN instead of FRank::Sensitivity

+ + +
+
+
+ + +
+
+
Sensitivity
+ +
SP = +
+
+

shortcut so that you can use FRank::SP instead of FRank::Specificity

+ + +
+
+
+ + +
+
+
Specificity
+ +
MNT = +
+
+

shortcut so that you can use FRank::MNT instead of FRank::McNemarsTest

+ + +
+
+
+ + +
+
+
McNemarsTest
+ +
CHI = +
+
+

shortcut so that you can use FRank::CHI instead of FRank::ChiSquaredTest

+ + +
+
+
+ + +
+
+
ChiSquaredTest
+ +
PHI = +
+
+

Phi coefficient, also known as Matthews correlation coefficient. +shortcut so that you can use FRank::PHI

+ + +
+
+
+ + +
+
+
MatthewsCorrelationCoefficient
+ +
GSS = +
+
+

shortcut so that you can use FRank::GSS instead of FRank::GSSCoefficient

+ + +
+
+
+ + +
+
+
GSSCoefficient
+ +
IG = +
+
+

shortcut so that you can use FRank::IG instead of FRank::InformationGain

+ + +
+
+
+ + +
+
+
InformationGain
+ +
Acc2 = +
+
+

shortcut so that you can use FRank::Acc2 instead of FRank::AccuracyBalanced

+ + +
+
+
+ + +
+
+
AccuracyBalanced
+ +
PR = +
+
+

shortcut so that you can use FRank::PR instead of FRank::ProbabilityRatio

+ + +
+
+
+ + +
+
+
ProbabilityRatio
+ +
FET = +
+
+

shortcut so that you can use FRank::FET instead of FRank::FishersExactTest

+ + +
+
+
+ + +
+
+
FishersExactTest
+ +
DF = +
+
+

shortcut so that you can use FRank::DF instead of FRank::DocumentFrequency

+ + +
+
+
+ + +
+
+
DocumentFrequency
+ +
IG_d = +
+
+

shortcut so that you can use FRank::IG_d instead of FRank::InformationGain_d

+ + +
+
+
+ + +
+
+
InformationGain_d
+ +
MI = +
+
+

shortcut so that you can use FRank::MI instead of FRank::MutualInformation

+ + +
+
+
+ + +
+
+
MutualInformation
+ +
OddN = +
+
+

shortcut so that you can use FRank::OddN instead of FRank::OddsRatioNumerator

+ + +
+
+
+ + +
+
+
OddsRatioNumerator
+ +
BNS = +
+
+

shortcut so that you can use FRank::BNS instead of FRank::BiNormalSeparation

+ + +
+
+
+ + +
+
+
BiNormalSeparation
+ +
CC = +
+
+

shortcut so that you can use FRank::CC instead of FRank::CorrelationCoefficient

+ + +
+
+
+ + +
+
+
CorrelationCoefficient
+ +
MCC = +
+
+

shortcut so that you can use FRank::MCC instead of FRank::MatthewsCorrelationCoefficient

+ + +
+
+
+ + +
+
+
MatthewsCorrelationCoefficient
+ +
+ + + + + + + + + + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Accuracy.html b/doc/FRank/Accuracy.html new file mode 100644 index 0000000..99e38cf --- /dev/null +++ b/doc/FRank/Accuracy.html @@ -0,0 +1,168 @@ + + + + + + Class: FRank::Accuracy + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Accuracy + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Accuracy.rb
+ +
+
+ +

Overview

+
+

Accuracy (Acc)

+ +
          tp+tn          A+D
+Acc = ------------- = ---------
+       tp+fn+tn+fp     A+B+C+D
+
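As a worked instance of the formula above, the sketch below computes accuracy from the four cells of a 2x2 contingency table. The method name `accuracy` and the counts are made up for illustration; the gem computes the score per feature and class through its own internals, which are not shown here.

```ruby
# Sketch: accuracy from the four cells of a 2x2 contingency table.
# The method name and the counts below are made up for illustration.
def accuracy(tp, fn, tn, fp)
  total = tp + fn + tn + fp
  return 0.0 if total.zero? # no samples: define the score as 0.0
  (tp + tn).to_f / total
end

puts accuracy(40, 10, 45, 5) # (40 + 45) / 100 = 0.85
```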
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/AccuracyBalanced.html b/doc/FRank/AccuracyBalanced.html new file mode 100644 index 0000000..363bf80 --- /dev/null +++ b/doc/FRank/AccuracyBalanced.html @@ -0,0 +1,169 @@ + + + + + + Class: FRank::AccuracyBalanced + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::AccuracyBalanced + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/AccuracyBalanced.rb
+ +
+
+ +

Overview

+
+

Accuracy Balanced (Acc2)

+ +
Acc2 = |tpr - fpr| = |A/(A+C) - B/(B+D)|
+
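The formula above can be checked by hand with a short sketch. Reading it against the usual definitions of true and false positive rate, A is taken as true positives, B as false positives, C as false negatives and D as true negatives. The method name `acc2` and the zero-denominator handling are assumptions for illustration.

```ruby
# Sketch: Acc2 = |tpr - fpr| = |A/(A+C) - B/(B+D)|.
# a = true positives, b = false positives, c = false negatives, d = true negatives.
# The zero-denominator fallback to 0.0 is an assumption.
def acc2(a, b, c, d)
  tpr = (a + c).zero? ? 0.0 : a.to_f / (a + c)
  fpr = (b + d).zero? ? 0.0 : b.to_f / (b + d)
  (tpr - fpr).abs
end

# tpr = 40/50 = 0.8, fpr = 5/50 = 0.1, so the score is about 0.7
puts acc2(40, 5, 10, 45)
```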
+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Base.html b/doc/FRank/Base.html new file mode 100644 index 0000000..661f7bb --- /dev/null +++ b/doc/FRank/Base.html @@ -0,0 +1,1891 @@ + + + + + + Class: FRank::Base + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Base + + + +

+ +
+ +
Inherits:
+
+ Object + +
    +
  • Object
  • + + + +
+ show all + +
+ + + + + + +
Includes:
+
FileIO
+ + + + + +
Defined in:
+
lib/frank/base.rb
+ +
+
+ +

Overview

+
+

base ranking algorithm

+ + +
+
+
+ + +
+

Direct Known Subclasses

+

BaseContinuous, BaseDiscrete, BaseMixed, Ensemble

+
+ + + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Base) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 13
+
+def initialize(data=nil)
+  @data = data
+  @opts = {} # store non-data information
+end
+
+
+ +
+ + +
+

Instance Method Details

+ + +
+

+ + - (Object) each_class + + + +

+
+

iterator for each class

+ +
e.g.
+self.each_class do |k|
+  puts k
+end
+
+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 27
+
+def each_class
+  if not block_given?
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "block must be given!"
+  else
+    get_classes.each { |k| yield k }
+  end
+end
+
+
+ +
+

+ + - (Object) each_feature + + + +

+
+

iterator for each feature

+ +
e.g.
+self.each_feature do |f|
+  puts f
+end
+
+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 45
+
+def each_feature
+  if not block_given?
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "block must be given!"
+  else
+    get_features.each { |f| yield f }
+  end
+end
+
+
+ +
+

+ + - (Object) each_sample + + + +

+
+

iterator for each sample with class label

+ +
e.g.
+self.each_sample do |k, s|
+  print k
+  s.each { |f, v| print " #{v}" }
+  puts
+end
+
+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 65
+
+def each_sample
+  if not block_given?
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          " block must be given!"
+  else      
+    get_data.each do |k, samples|
+      samples.each { |s| yield k, s }
+    end
+  end
+end
+
+
+ +
+

+ + - (Object) get_classes + + + +

+
+

get classes

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 78
+
+def get_classes
+  @classes ||= @data.keys
+end
+
+
+ +
+

+ + - (Object) get_data + + + +

+
+

get data

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 130
+
+def get_data
+  @data
+end
+
+
+ +
+

+ + - (Hash) get_feature_ranks + + + +

+
+

get the ranked features based on their best scores

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    feature ranks

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 236
+
+def get_feature_ranks
+  return @ranks if @ranks # already done
+  
+  scores = get_feature_scores
+  
+  # get the ranked features
+  @ranks = {} # feature => rank
+  
+  # the larger, the better
+  sorted_features = scores.keys.sort do |x,y|
+    scores[y][:BEST] <=> scores[x][:BEST]
+  end
+  
+  sorted_features.each_with_index do |sf, si|
+    @ranks[sf] = si+1
+  end
+  
+  @ranks
+end
+
+
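The ranking step above reduces to sorting features by their best score in descending order. The sketch below runs the same logic on a hand-written scores hash; the feature names, class names and score values are made up for illustration.

```ruby
# Sketch of the ranking step in get_feature_ranks above, on a hand-written
# scores hash. The feature names and score values are made up for illustration.
scores = {
  f1: { a: 0.10, b: 0.30 },
  f2: { a: 0.80, b: 0.20 },
  f3: { a: 0.05, b: 0.50 }
}

# best score per feature: the larger, the better
scores.each { |_f, ks| ks[:BEST] = ks.values.max }

# rank 1 goes to the feature with the largest best score
ranks = {}
sorted_features = scores.keys.sort { |x, y| scores[y][:BEST] <=> scores[x][:BEST] }
sorted_features.each_with_index { |f, i| ranks[f] = i + 1 }

ranks.each { |f, r| puts "#{f} => #{r}" }
```

Here `f2` ranks first (best score 0.80), followed by `f3` (0.50) and `f1` (0.30).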
+ +
+

+ + - (Hash) get_feature_scores + + + +

+
+

get scores of all features for all classes

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    { feature => +{ class1 => score1, class2 => score2, :BEST => score_best } }

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 206
+
+def get_feature_scores
+  return @scores if @scores # already done
+  
+  each_feature do |f|
+    calc_contribution(f)
+  end
+  
+  # best score for feature
+  @scores.each do |f, ks|
+    # the larger, the better
+    @scores[f][:BEST] = ks.values.max
+  end
+  #@scores.each { |x,v| puts "#{x} => #{v[:BEST]}" }
+  
+  @scores
+end
+
+
+ +
+

+ + - (Object) get_feature_values(f) + + + +

+
+

get feature values

+ + +
+
+
+

Parameters:

+
    + +
  • + + f + + + (Symbol) + + + + — +

    feature of interest

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 105
+
+def get_feature_values(f)
+  @fvs ||= {}
+  
+  if not @fvs.has_key? f
+    @fvs[f] = []
+    each_sample do |k, s|
+      @fvs[f] << s[f] if s.has_key? f
+    end
+  end
+  
+  @fvs[f]
+end
+
+
+ +
+

+ + - (Object) get_features + + + +

+
+

get unique features

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 95
+
+def get_features
+  @features ||= @data.map { |x| x[1].map { |y| y.keys } }.flatten.uniq
+end
+
+
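The accessors above imply a particular data layout: a hash from class label to an array of samples, where each sample is a hash from feature to value. The sketch below builds a tiny instance of that layout with made-up labels and values, then derives the classes, unique features and sample count the same way the listed methods do.

```ruby
# Sketch of the data layout implied by each_sample and get_features above:
# { class_label => [ { feature => value, ... }, ... ] }. Values are made up.
data = {
  c1: [{ f1: 1, f2: 0 }, { f1: 0 }], # a sample may omit a feature (missing value)
  c2: [{ f2: 1, f3: 1 }]
}

classes = data.keys
features = data.map { |_k, samples| samples.map { |s| s.keys } }.flatten.uniq
sample_size = data.values.flatten.size

p classes     # class labels
p features    # unique features across all samples
p sample_size # total number of samples
```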
+ +
+

+ + - (Object) get_opt(key) + + + +

+
+

get non-data information

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 149
+
+def get_opt(key)
+  @opts.has_key?(key) ? @opts[key] : nil
+end
+
+
+ +
+

+ + - (Object) get_sample_size + + + +

+
+

number of samples

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 161
+
+def get_sample_size
+  @sz ||= get_data.values.flatten.size
+end
+
+
+ +
+
+
+

print feature ranks

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 191
+
+def print_feature_ranks
+  ranks = get_feature_ranks
+  
+  ranks.each do |f, r|
+    puts "#{f} => #{r}"
+  end
+end
+
+
+ +
+
+
+

print feature scores

+ + +
+
+
+

Parameters:

+
    + +
  • + + kclass + + + (String) + + + (defaults to: nil) + + + — +

    class of interest

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 171
+
+def print_feature_scores(feat=nil, kclass=nil)
+  scores = get_feature_scores
+  
+  scores.each do |f, ks|
+    next if feat and feat != f
+    
+    print "#{f} =>"
+    ks.each do |k, s|
+      if kclass
+        print " #{k}->#{s}" if k == kclass
+      else
+        print " #{k}->#{s}"
+      end
+    end
+    puts
+  end
+end
+
+
+ +
+

+ + - (Hash) select_data_by_rank!(criterion, my_ranks = nil) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

reconstruct data by rank

+ + +
+
+
+

Parameters:

+
    + +
  • + + criterion + + + (String) + + + + — +

    valid criterion can be '>11', '>= 10', '==1', '<=10' or '<20'

    +
    + +
  • + +
  • + + my_ranks + + + (Hash) + + + (defaults to: nil) + + + — +

    user customized feature ranks

    +
    + +
  • + +
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    data after feature selection

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 298
+
+def select_data_by_rank!(criterion, my_ranks=nil)
+  # user ranks or internal ranks
+  ranks = my_ranks || get_feature_ranks
+  
+  my_data = {}
+  
+  each_sample do |k, s|
+    my_data[k] ||= []
+    my_s = {}
+    
+    s.each do |f,v|
+      my_s[f] = v if eval("#{ranks[f]} #{criterion}")
+    end
+    
+    my_data[k] << my_s if not my_s.empty?
+  end
+  
+  set_data(my_data)
+end
+
+
+ +
+

+ + - (Hash) select_data_by_score!(criterion, my_scores = nil) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

reconstruct data with feature scores satisfying cutoff

+ + +
+
+
+

Parameters:

+
    + +
  • + + criterion + + + (String) + + + + — +

    valid criterion can be '>0.5', '>= 0.4', '==2', '<=1' or '<0.2'

    +
    + +
  • + +
  • + + my_scores + + + (Hash) + + + (defaults to: nil) + + + — +

    user customized feature scores

    +
    + +
  • + +
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    data after feature selection

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 267
+
+def select_data_by_score!(criterion, my_scores=nil)
+  # user scores or internal scores
+  scores = my_scores || get_feature_scores
+  
+  my_data = {}
+  
+  each_sample do |k, s|
+    my_data[k] ||= []
+    my_s = {}
+    
+    s.each do |f, v|
+      my_s[f] = v if eval("#{scores[f][:BEST]} #{criterion}")
+    end
+    
+    my_data[k] << my_s if not my_s.empty?
+  end
+      
+  set_data(my_data)
+end
+
+
+ +
+
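The criterion is a String such as '>0.5' that the listing above evaluates with eval. A minimal sketch of the same filtering without eval might look as follows; `parse_criterion` and `select_by_score` are hypothetical helpers, and the accepted operator set is assumed from the documented examples.

```ruby
# Split a criterion such as '>= 0.4' into an operator symbol and a number.
def parse_criterion(criterion)
  m = criterion.strip.match(/\A(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\z/)
  raise ArgumentError, "invalid criterion: #{criterion}" unless m
  [m[1].to_sym, m[2].to_f]
end

# Keep only features whose :BEST score satisfies the criterion.
# Samples left without any feature are dropped, as in the listing above.
def select_by_score(data, scores, criterion)
  op, cutoff = parse_criterion(criterion)
  selected = {}
  data.each do |klass, samples|
    kept = samples.map { |s| s.select { |f, _| scores[f][:BEST].send(op, cutoff) } }
    selected[klass] = kept.reject { |s| s.empty? }
  end
  selected
end

data   = { :c1 => [{ :f1 => 1, :f2 => 2 }], :c2 => [{ :f2 => 3 }] }
scores = { :f1 => { :BEST => 0.9 }, :f2 => { :BEST => 0.1 } }
p select_by_score(data, scores, '>0.5')
```

Parsing the operator explicitly rejects anything that is not a plain comparison, which matters when the criterion comes from user input.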

+ + - (Object) set_classes(classes) + + + +

+
+

set classes

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 84
+
+def set_classes(classes)
+  if classes and classes.class == Array
+    @classes = classes
+  else
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "classes must be an Array object!"
+  end
+end
+
+
+ +
+

+ + - (Object) set_data(data) + + + +

+
+

set data

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 135
+
+def set_data(data)
+  if data and data.class == Hash
+    @data = data
+    # clear
+    @classes, @features, @fvs = nil, nil, nil
+    @scores, @ranks, @sz = nil, nil, nil
+  else
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "data must be a Hash object!"
+  end
+end
+
+
+ +
+

+ + - (Object) set_feature_score(f, k, s) + + + +

+
+

set feature (f) score (s) for class (k)

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 225
+
+def set_feature_score(f, k, s)
+  @scores ||= {}
+  @scores[f] ||= {}
+  @scores[f][k] = s
+end
+
+
+ +
+

+ + - (Object) set_features(features) + + + +

+
+

set features

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 119
+
+def set_features(features)
+  if features and features.class == Array
+    @features = features
+  else
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "features must be an Array object!"
+  end
+end
+
+
+ +
+

+ + - (Object) set_opt(key, value) + + + +

+
+

set non-data information as a key-value pair

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/base.rb', line 155
+
+def set_opt(key, value)
+  @opts[key] = value
+end
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/BaseContinuous.html b/doc/FRank/BaseContinuous.html new file mode 100644 index 0000000..0710019 --- /dev/null +++ b/doc/FRank/BaseContinuous.html @@ -0,0 +1,250 @@ + + + + + + Class: FRank::BaseContinuous + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::BaseContinuous + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + +
Includes:
+
Discretilizer, Normalizer
+ + + + + +
Defined in:
+
lib/frank/algo_continuous/base_continuous.rb
+ +
+
+ +

Overview

+
+

base ranking algorithm for handling continuous features

+ + +
+
+
+ + +
+

Direct Known Subclasses

+

PMetric, ReliefF_c, Relief_c, TScore

+
+ + + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (BaseContinuous) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/algo_continuous/base_continuous.rb', line 17
+
+def initialize(data=nil)
+  super(data)
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/BaseDiscrete.html b/doc/FRank/BaseDiscrete.html new file mode 100644 index 0000000..7a0032a --- /dev/null +++ b/doc/FRank/BaseDiscrete.html @@ -0,0 +1,246 @@ + + + + + + Class: FRank::BaseDiscrete + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::BaseDiscrete + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/base_discrete.rb
+ +
+
+ +

Overview

+
+

base ranking algorithm for handling discrete features

+ +
2 x 2 contingency table
+
+      c   c'
+    ---------
+ f  | A | B | A+B
+    |---|---| 
+ f' | C | D | C+D
+    ---------
+     A+C B+D  N = A+B+C+D
+
+ P(f)     = (A+B)/N
+ P(f')    = (C+D)/N
+ P(c)     = (A+C)/N
+ P(c')    = (B+D)/N
+ P(f,c)   = A/N
+ P(f,c')  = B/N
+ P(f',c)  = C/N
+ P(f',c') = D/N
+
+ + +
+
+
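The four cells of the contingency table can be counted directly from the data. This is a minimal sketch, assuming the layout class label => array of sample Hashes and assuming that a feature counts as present (f) when the sample Hash has that key; `contingency_counts` is a hypothetical helper and the library may count differently.

```ruby
# Count A, B, C, D for feature f and class c.
#   A: f present, class c      B: f present, other class
#   C: f absent,  class c      D: f absent,  other class
def contingency_counts(data, f, c)
  a = b = c_count = d = 0
  data.each do |klass, samples|
    samples.each do |s|
      present = s.has_key?(f)
      if klass == c && present
        a += 1
      elsif klass == c
        c_count += 1
      elsif present
        b += 1
      else
        d += 1
      end
    end
  end
  { :A => a, :B => b, :C => c_count, :D => d }
end

data = {
  :pos => [{ :f => 1 }, { :f => 1 }, { :g => 1 }],
  :neg => [{ :f => 1 }, { :g => 1 }, { :g => 1 }, {}]
}
t = contingency_counts(data, :f, :pos)
n = t.values.inject(0) { |sum, v| sum + v }
puts "P(f) = #{(t[:A] + t[:B]).to_f / n}, P(c) = #{(t[:A] + t[:C]).to_f / n}"
```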
+ + +
+

Direct Known Subclasses

+

Accuracy, AccuracyBalanced, BiNormalSeparation, ChiSquaredTest, CorrelationCoefficient, DocumentFrequency, F1Measure, FishersExactTest, GMean, GSSCoefficient, GiniIndex, InformationGain, InformationGain_d, MatthewsCorrelationCoefficient, McNemarsTest, MutualInformation, OddsRatio, OddsRatioNumerator, Power, Precision, ProbabilityRatio, Random, ReliefF_d, Relief_d, Sensitivity, Specificity

+
+ + + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (BaseDiscrete) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/algo_discrete/base_discrete.rb', line 29
+
+def initialize(data=nil)
+  super(data)
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/BaseMixed.html b/doc/FRank/BaseMixed.html new file mode 100644 index 0000000..b12c1ed --- /dev/null +++ b/doc/FRank/BaseMixed.html @@ -0,0 +1,222 @@ + + + + + + Class: FRank::BaseMixed + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::BaseMixed + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_mixed/base_mixed.rb
+ +
+
+ +

Overview

+
+

base class for handling features of mixed data

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (BaseMixed) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/algo_mixed/base_mixed.rb', line 10
+
+def initialize(data=nil)
+  super(data)
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/BiNormalSeparation.html b/doc/FRank/BiNormalSeparation.html new file mode 100644 index 0000000..66d06eb --- /dev/null +++ b/doc/FRank/BiNormalSeparation.html @@ -0,0 +1,191 @@ + + + + + + Class: FRank::BiNormalSeparation + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::BiNormalSeparation + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + +
Includes:
+
Rubystats
+ + + + + +
Defined in:
+
lib/frank/algo_discrete/BiNormalSeparation.rb
+ +
+
+ +

Overview

+
+

Bi-Normal Separation (BNS)

+ +
BNS = |F'(tpr) - F'(fpr)|
+
+where F' is the inverse cumulative distribution function of the standard normal distribution
+R executable is required to calculate qnorm, i.e. F'(x)
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification + and Rubystats

+ + +
+
+
+ + +
+

Constant Summary

+ + + + +


+ +

Constants included + from Rubystats

+

Rubystats::MAX_VALUE, Rubystats::SQRT2, Rubystats::SQRT2PI, Rubystats::TWO_PI

+ + + + + + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/ChiSquaredTest.html b/doc/FRank/ChiSquaredTest.html new file mode 100644 index 0000000..3180c84 --- /dev/null +++ b/doc/FRank/ChiSquaredTest.html @@ -0,0 +1,269 @@ + + + + + + Class: FRank::ChiSquaredTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::ChiSquaredTest + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/ChiSquaredTest.rb
+ +
+
+ +

Overview

+
+

Chi-Squared test (CHI)

+ +
             N * ( P(f,c) * P(f',c') - P(f,c') * P(f',c) )^2
+ CHI(f,c) = -------------------------------------------------
+                      P(f) * P(f') * P(c) * P(c')
+
+                   N * (A*D - B*C)^2
+          = -------------------------------
+             (A+B) * (C+D) * (A+C) * (B+D)
+
+ +

suitable for large samples in which +none of the values of (A, B, C, D) is less than 5

+ +

ref: Wikipedia + and A Comparative Study on Feature Selection Methods for + Drug Discovery

+ + +
+
+
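The count-based form of CHI can be computed directly from the four cells. The following is a minimal sketch; `chi_squared` is a hypothetical helper, the Yates variant uses the standard form N * (|A*D - B*C| - N/2)^2 in the numerator, and returning 0.0 for an empty margin is an assumption made here to avoid division by zero.

```ruby
# Chi-squared statistic of a 2 x 2 table, optionally with
# Yates's continuity correction.
def chi_squared(a, b, c, d, yates = false)
  n = a + b + c + d
  denom = (a + b) * (c + d) * (a + c) * (b + d)
  return 0.0 if denom.zero?            # assumption: degenerate table scores 0

  diff = (a * d - b * c).abs.to_f
  diff -= n / 2.0 if yates             # continuity correction
  n * diff**2 / denom
end

puts chi_squared(20, 10, 10, 20)        # uncorrected
puts chi_squared(20, 10, 10, 20, true)  # with Yates's correction
```

The corrected statistic is never larger than the uncorrected one when |A*D - B*C| is at least N/2, which is why the correction is considered conservative.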
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (ChiSquaredTest) initialize(correction = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + correction + + + (Boolean) + + + (defaults to: nil) + + + — +

    whether to apply Yates's continuity correction
    +:yates, apply Yates's continuity correction; any other value disables it

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/algo_discrete/ChiSquaredTest.rb', line 30
+
+def initialize(correction=nil, data=nil)
+  super(data)
+  @correction = (correction==:yates) ? true : false
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/CorrelationCoefficient.html b/doc/FRank/CorrelationCoefficient.html new file mode 100644 index 0000000..c4fbd7b --- /dev/null +++ b/doc/FRank/CorrelationCoefficient.html @@ -0,0 +1,172 @@ + + + + + + Class: FRank::CorrelationCoefficient + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::CorrelationCoefficient + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/CorrelationCoefficient.rb
+ +
+
+ +

Overview

+
+

Correlation Coefficient (CC), a variant of CHI, +which can be viewed as a one-sided chi-squared metric

+ +
                  sqrt(N) * (A*D - B*C)
+CC(f,c) = --------------------------------------
+           sqrt( (A+B) * (C+D) * (A+C) * (B+D) )
+
+ +

ref: Optimally Combining Positive and Negative Features for + Text Categorization

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/DocumentFrequency.html b/doc/FRank/DocumentFrequency.html new file mode 100644 index 0000000..8b17f5f --- /dev/null +++ b/doc/FRank/DocumentFrequency.html @@ -0,0 +1,169 @@ + + + + + + Class: FRank::DocumentFrequency + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::DocumentFrequency + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/DocumentFrequency.rb
+ +
+
+ +

Overview

+
+

Document Frequency (DF)

+ +
DF = tp+fp = (A+B)
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Ensemble.html b/doc/FRank/Ensemble.html new file mode 100644 index 0000000..bd893ac --- /dev/null +++ b/doc/FRank/Ensemble.html @@ -0,0 +1,911 @@ + + + + + + Class: FRank::Ensemble + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Ensemble + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/ensemble.rb
+ +
+
+ +

Overview

+
+

select feature by an ensemble of ranking algorithms

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Ensemble) initialize(*algos) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + rankers + + + (Array) + + + + — +

    multiple feature ranking algorithms

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 10
+
+def initialize(*algos)
+  super(nil)
+  
+  @algos = []
+  algos.each do |r|
+    @algos << r
+  end
+end
+
+
+ +
+ + +
+

Instance Method Details

+ + +
+

+ + - (Object) by_ave(arr) + + + +

+
+

by average value of an array

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 125
+
+def by_ave(arr)
+  arr.ave if arr.class == Array
+end
+
+
+ +
+

+ + - (Object) by_max(arr) + + + +

+
+

by max value of an array

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 137
+
+def by_max(arr)
+  arr.max if arr.class == Array
+end
+
+
+ +
+

+ + - (Object) by_min(arr) + + + +

+
+

by min value of an array

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 131
+
+def by_min(arr)
+  arr.min if arr.class == Array
+end
+
+
+ +
+

+ + - (Object) ensemble_by_rank(by_what = method(:by_min)) + + + +

+
+

ensemble based on rank

+ + +
+
+
+

Parameters:

+
    + +
  • + + by_what + + + (Method) + + + (defaults to: method(:by_min)) + + + — +

    criterion by which the ensemble +rank is derived from those of the individual algorithms
    +allowed values are:
    +method(:by_min) # by min rank
    +method(:by_max) # by max rank
    +method(:by_ave) # by ave rank

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 102
+
+def ensemble_by_rank(by_what=method(:by_min))
+  ranks = {}
+       
+  each_feature do |f|
+    ranks[f] = by_what.call(
+      @algos.collect { |r| r.get_feature_ranks[f] }
+    )
+  end
+  
+  new_ranks = {}
+  
+  sorted_features = ranks.keys.sort do |x, y|
+    ranks[x] <=> ranks[y]
+  end
+  sorted_features.each_with_index do |sf, si|
+    new_ranks[sf] = si+1
+  end
+  
+  @ranks = new_ranks
+end
+
+
+ +
+
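The rank-based consensus can be illustrated without any ranker objects. This is a minimal sketch, assuming each algorithm has already produced a Hash of feature => rank; `consensus_ranks` is a hypothetical helper, and the tie-break on the feature name is an addition made here so the output is deterministic (the listing above leaves tie order unspecified).

```ruby
# Combine per-algorithm ranks into one value per feature, then re-rank
# the features 1..n by that combined value (smaller is better).
def consensus_ranks(rank_tables, combine = :min)
  combined = {}
  rank_tables.first.keys.each do |f|
    ranks = rank_tables.map { |t| t[f] }
    combined[f] = case combine
                  when :min then ranks.min
                  when :max then ranks.max
                  else ranks.inject(:+).to_f / ranks.size   # average
                  end
  end

  sorted = combined.keys.sort_by { |f| [combined[f], f.to_s] }
  Hash[sorted.each_with_index.map { |f, i| [f, i + 1] }]
end

t1 = { :f1 => 1, :f2 => 2, :f3 => 3 }
t2 = { :f1 => 3, :f2 => 1, :f3 => 2 }
p consensus_ranks([t1, t2], :ave)
p consensus_ranks([t1, t2], :min)
```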

+ + - (Object) ensemble_by_score(by_what = method(:by_max), norm = :min_max) + + + +

+
+ +
+ Note: +

scores from different algos are usually incompatible with +each other, so they must be normalized first

+
+
+ +

ensemble based on score

+ + +
+
+
+

Parameters:

+
    + +
  • + + by_what + + + (Method) + + + (defaults to: method(:by_max)) + + + — +

    criterion by which the ensemble +score is derived from those of the individual algorithms
    +allowed values are:
    +receiver.method(:by_min) # by min score
    +receiver.method(:by_max) # by max score
    +receiver.method(:by_ave) # by ave score

    +
    + +
  • + +
  • + + norm + + + (Symbol) + + + (defaults to: :min_max) + + + — +

    normalization
    +:min_max, score scaled to [0, 1]
    +:zscore, score converted to zscore

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 70
+
+def ensemble_by_score(by_what=method(:by_max), norm=:min_max)
+  @algos.each do |r|
+    if norm == :min_max
+      normalize_min_max!(r)
+    elsif norm == :zscore
+      normalize_zscore!(r)
+    else
+      abort "[#{__FILE__}@#{__LINE__}]: "+
+          "invalid normalizer, only :min_max and :zscore supported!"
+    end
+  end
+  
+  @scores = {}
+  
+  each_feature do |f|
+    @scores[f] = {}
+    @scores[f][:BEST] = by_what.call(
+      @algos.collect { |r| r.get_feature_scores[f][:BEST] }
+    )
+  end      
+end
+
+
+ +
+
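The reason for normalizing before combining is easiest to see on a small example. The following is a minimal sketch, assuming each algorithm yields a Hash of feature => best score; `min_max` and `consensus_scores` are hypothetical helpers, and mapping a constant score table to 0.0 is an assumption made here.

```ruby
# Scale one algorithm's scores into [0, 1].
def min_max(scores)
  lo, hi = scores.values.minmax
  span = (hi - lo).to_f
  Hash[scores.map { |f, s| [f, span.zero? ? 0.0 : (s - lo) / span] }]
end

# Normalize every table, then take the per-feature maximum
# (the by_max criterion).
def consensus_scores(score_tables)
  normalized = score_tables.map { |t| min_max(t) }
  Hash[normalized.first.keys.map { |f| [f, normalized.map { |t| t[f] }.max] }]
end

t1 = { :f1 => 10, :f2 => 20, :f3 => 30 }   # scores on a 10..30 scale
t2 = { :f1 => 9,  :f2 => 1,  :f3 => 5 }    # scores on a 1..9 scale
p consensus_scores([t1, t2])
```

Without normalization the first table would dominate every maximum simply because its scale is larger.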

+ + - (Object) get_feature_ranks + + + +

+
+

override get_feature_ranks

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 47
+
+def get_feature_ranks
+  return @ranks if @ranks
+  
+  abort "[#{__FILE__}@#{__LINE__}]: "+
+          "please call one consensus ranking method first!"
+end
+
+
+ +
+

+ + - (Object) get_feature_scores + + + +

+
+

override get_feature_scores

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 36
+
+def get_feature_scores
+  return @scores if @scores
+  
+  abort "[#{__FILE__}@#{__LINE__}]: "+
+          "please call one consensus scoring method first!"
+end
+
+
+ +
+

+ + - (Object) set_data(data) + + + +

+
+ +
+ Note: +

all algos share the same data structure

+
+
+ +

override set_data

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/frank/ensemble.rb', line 25
+
+def set_data(data)
+  @data = data
+  @algos.each do |r|
+    r.set_data(data)
+  end
+end
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/F1Measure.html b/doc/FRank/F1Measure.html new file mode 100644 index 0000000..6499964 --- /dev/null +++ b/doc/FRank/F1Measure.html @@ -0,0 +1,175 @@ + + + + + + Class: FRank::F1Measure + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::F1Measure + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/F1Measure.rb
+ +
+
+ +

Overview

+
+

F1-Measure (F1)

+ +
      2 * recall * precision
+F1 = ------------------------
+         recall + precision
+
+           2 * tp               2 * A
+   = ------------------- = --------------
+      tp + fn + tp + fp     A + C + A + B
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
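The two forms of F1 given above are equivalent, which a short check makes concrete. This is a minimal sketch; `f1_measure` is a hypothetical helper working on the contingency counts A (tp), B (fp) and C (fn), and returning 0.0 for an all-zero table is an assumption made here.

```ruby
# F1 from contingency counts: 2A / (2A + B + C).
def f1_measure(a, b, c)
  denom = 2 * a + b + c
  denom.zero? ? 0.0 : 2.0 * a / denom
end

a, b, c = 8, 2, 4
precision = a.to_f / (a + b)
recall    = a.to_f / (a + c)
puts f1_measure(a, b, c)
puts 2 * recall * precision / (recall + precision)   # same value, up to rounding
```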
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/FishersExactTest.html b/doc/FRank/FishersExactTest.html new file mode 100644 index 0000000..f380eef --- /dev/null +++ b/doc/FRank/FishersExactTest.html @@ -0,0 +1,191 @@ + + + + + + Class: FRank::FishersExactTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::FishersExactTest + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + +
Includes:
+
Rubystats
+ + + + + +
Defined in:
+
lib/frank/algo_discrete/FishersExactTest.rb
+ +
+
+ +

Overview

+
+

(two-sided) Fisher's Exact Test (FET)

+ +
     (A+B)! * (C+D)! * (A+C)! * (B+D)!
+p =  -----------------------------------
+          A! * B! * C! * D! * N!
+
+for FET, the smaller, the better, but we intentionally negate it
+so that the larger is always the better (consistent with other algorithms)
+
+ +

ref: Wikipedia and Rubystats

+ + +
+
+
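The probability of a single 2 x 2 table with fixed margins is (A+B)! (C+D)! (A+C)! (B+D)! / (A! B! C! D! N!). The following is a minimal sketch of that one term using exact integer arithmetic; `factorial` and `table_probability` are hypothetical helpers. It is not the full two-sided test, which additionally sums this probability over every table with the same margins that is at least as extreme.

```ruby
# n! with exact integers (Ruby promotes to big integers automatically).
def factorial(n)
  (1..n).inject(1) { |acc, k| acc * k }
end

# Hypergeometric probability of observing exactly this table,
# given its row and column totals.
def table_probability(a, b, c, d)
  n = a + b + c + d
  num = factorial(a + b) * factorial(c + d) * factorial(a + c) * factorial(b + d)
  den = factorial(a) * factorial(b) * factorial(c) * factorial(d) * factorial(n)
  Rational(num, den).to_f
end

puts table_probability(1, 9, 11, 3)
```

Building the ratio as a Rational before converting to Float avoids overflow and rounding in the intermediate factorials.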
+ + +
+

Constant Summary

+ + + + +


+ +

Constants included + from Rubystats

+

Rubystats::MAX_VALUE, Rubystats::SQRT2, Rubystats::SQRT2PI, Rubystats::TWO_PI

+ + + + + + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/GMean.html b/doc/FRank/GMean.html new file mode 100644 index 0000000..d192e2e --- /dev/null +++ b/doc/FRank/GMean.html @@ -0,0 +1,170 @@ + + + + + + Class: FRank::GMean + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::GMean + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/GMean.rb
+ +
+
+ +

Overview

+
+

GMean (GM)

+ +
GM = sqrt(Sensitivity * Specificity)
+
+                 TP*TN                     A*D
+   = sqrt(------------------) = sqrt(---------------)
+           (TP+FN) * (TN+FP)          (A+C) * (B+D)
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/GSSCoefficient.html b/doc/FRank/GSSCoefficient.html new file mode 100644 index 0000000..200614f --- /dev/null +++ b/doc/FRank/GSSCoefficient.html @@ -0,0 +1,175 @@ + + + + + + Class: FRank::GSSCoefficient + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::GSSCoefficient + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/GSSCoefficient.rb
+ +
+
+ +

Overview

+
+

GSS coefficient (GSS), a simplified variant of Chi-Squared +proposed by Galavotti

+ +
GSS(f,c) = P(f,c) * P(f',c') - P(f,c') * P(f',c)
+
+         = A/N * D/N - B/N * C/N
+
+ +

suitable for large samples in which +none of the values of (A, B, C, D) is less than 5

+ +

ref: A Comparative Study on Feature Selection Methods for Drug + Discovery

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/GiniIndex.html b/doc/FRank/GiniIndex.html new file mode 100644 index 0000000..e0e3b7d --- /dev/null +++ b/doc/FRank/GiniIndex.html @@ -0,0 +1,172 @@ + + + + + + Class: FRank::GiniIndex + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::GiniIndex + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/GiniIndex.rb
+ +
+
+ +

Overview

+
+

Gini Index (GI), generalized for multi-class problem

+ +
GI(f) = 1 - sigma(c)(P(c|f)^2)
+
+ +

for GI, the smaller, the better, but we intentionally negate it +so that the larger is always the better (consistent with other algorithms)

+ +

ref: Advancing Feature Selection Research - + ASU Feature Selection Repository

+ + +
+
+
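The multi-class Gini Index can be computed from per-class counts of the samples that contain the feature. This is a minimal sketch, assuming the layout class label => array of sample Hashes and assuming that a sample "has" f when the key is present; `gini_index` is a hypothetical helper and returns the raw (un-negated) value, with 0.0 for a feature that never occurs as a further assumption.

```ruby
# GI(f) = 1 - sum over classes c of P(c|f)^2
def gini_index(data, f)
  counts = data.map { |_, samples| samples.count { |s| s.has_key?(f) } }
  total = counts.inject(0) { |sum, n| sum + n }
  return 0.0 if total.zero?            # assumption: unseen feature scores 0

  1.0 - counts.inject(0.0) { |sum, n| sum + (n.to_f / total)**2 }
end

data = {
  :c1 => [{ :f => 1 }, { :f => 1 }, { :f => 1 }],
  :c2 => [{ :f => 1 }, { :g => 1 }]
}
puts gini_index(data, :f)   # P(c1|f) = 0.75, P(c2|f) = 0.25
puts gini_index(data, :g)   # g occurs in one class only
```

A feature that occurs in a single class gives 0, the purest case; the library negates the value so that larger is better.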
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/InformationGain.html b/doc/FRank/InformationGain.html new file mode 100644 index 0000000..d3205ee --- /dev/null +++ b/doc/FRank/InformationGain.html @@ -0,0 +1,172 @@ + + + + + + Class: FRank::InformationGain + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::InformationGain + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/InformationGain.rb
+ +
+
+ +

Overview

+
+

InformationGain (IG), generalized for multi-class problem

+ +
IG(f) = -1 * sigma(c)(P(c)logP(c))
+        + P(f)sigma(c)(P(c|f)logP(c|f))
+        + P(f')sigma(c)(P(c|f')logP(c|f'))
+
+where c = c, c'
+
+ +

ref: A Comparative Study on Feature Selection in Text Categorization

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/InformationGain_d.html b/doc/FRank/InformationGain_d.html new file mode 100644 index 0000000..e0fb68d --- /dev/null +++ b/doc/FRank/InformationGain_d.html @@ -0,0 +1,173 @@ + + + + + + Class: FRank::InformationGain_d + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::InformationGain_d + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/InformationGain_d.rb
+ +
+
+ +

Overview

+
+

Information Gain for feature with discrete data (IG_d)

+ +
IG_d(c,f) = H(c) - H(c|f)
+
+where H(c) = -1 * sigma_i (P(ci) logP(ci))
+      H(c|f) = sigma_j (P(fj)*H(c|fj))
+      H(c|fj) = -1 * sigma_k (P(ck|fj) logP(ck|fj))
+
+ +

ref: Using Information Gain to Analyze and Fine Tune + the Performance of Supply Chain Trading Agents
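The decomposition IG_d(c,f) = H(c) - H(c|f) can be sketched as follows. This is an illustration under assumptions, not the FRank code: the helper names `entropy` and `information_gain_d`, and the input format (an array of `[feature_value, class_label]` pairs), are hypothetical.

```ruby
# Hypothetical helpers, not part of the FRank API.

# Shannon entropy (base 2) of an array of labels: H = -sigma_i P(i) log2 P(i)
def entropy(labels)
  n = labels.size.to_f
  counts = labels.group_by(&:itself).values.map(&:size)
  counts.sum { |k| -(k / n) * Math.log2(k / n) }
end

# pairs: Array of [feature_value, class_label]
def information_gain_d(pairs)
  n = pairs.size.to_f
  h_c = entropy(pairs.map { |_, c| c })

  # H(c|f) = sigma_j P(fj) * H(c|fj)
  groups = pairs.group_by { |f, _| f }.values
  h_c_given_f = groups.sum { |g| (g.size / n) * entropy(g.map { |_, c| c }) }

  h_c - h_c_given_f
end
```

With two balanced classes, a feature whose value determines the class yields 1 bit, and a feature independent of the class yields 0.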

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/MatthewsCorrelationCoefficient.html b/doc/FRank/MatthewsCorrelationCoefficient.html new file mode 100644 index 0000000..225862f --- /dev/null +++ b/doc/FRank/MatthewsCorrelationCoefficient.html @@ -0,0 +1,174 @@ + + + + + + Class: FRank::MatthewsCorrelationCoefficient + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::MatthewsCorrelationCoefficient + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + +
    +
  • Object
  • + + + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/MatthewsCorrelationCoefficient.rb
+ +
+
+ +

Overview

+
+

Matthews Correlation Coefficient (MCC), also known as Phi coefficient

+ +
                       tp*tn - fp*fn
+MCC = ---------------------------------------------- = PHI = sqrt(CHI/N)
+       sqrt((tp+fp) * (tp+fn) * (tn+fp) * (tn+fn) )
+
+                     A*D - B*C
+    = -------------------------------------
+      sqrt((A+B) * (A+C) * (B+D) * (C+D))
+
+ +

ref: Wikipedia
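A small sketch of the formula above, taking A = tp, B = fp, C = fn, D = tn. The helper names and the choice to return 0 when a margin is empty are assumptions for illustration, not FRank behavior. It also shows that sqrt(CHI/N) recovers only the magnitude of MCC, because the sign of A*D - B*C is lost when squaring.

```ruby
# Hypothetical helpers, not part of the FRank API.
def mcc(a, b, c, d)
  denom = Math.sqrt((a + b) * (a + c) * (b + d) * (c + d))
  return 0.0 if denom.zero? # assumption: score 0 when any margin is empty

  (a * d - b * c) / denom
end

# Chi-squared statistic of the same 2x2 table.
def chi_squared(a, b, c, d)
  n = a + b + c + d
  margins = (a + b) * (a + c) * (b + d) * (c + d)
  return 0.0 if margins.zero?

  n * (a * d - b * c)**2 / margins.to_f
end

a, b, c, d = 8, 2, 3, 7
n = a + b + c + d
phi_magnitude = Math.sqrt(chi_squared(a, b, c, d) / n)
puts mcc(a, b, c, d).abs - phi_magnitude # difference is at rounding level
```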

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/McNemarsTest.html b/doc/FRank/McNemarsTest.html new file mode 100644 index 0000000..18fbde1 --- /dev/null +++ b/doc/FRank/McNemarsTest.html @@ -0,0 +1,262 @@ + + + + + + Class: FRank::McNemarsTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::McNemarsTest + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/McNemarsTest.rb
+ +
+
+ +

Overview

+
+

McNemar's test (MN), based on Chi-Squared test

+ +
            (B-C)^2
+MN(f, c) = ---------
+             B+C
+
+ +

suitable for large samples and B+C >= 25

+ +

ref: Wikipedia
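The statistic can be sketched as below. The uncorrected branch follows the formula above directly. The corrected branch uses the commonly cited form (|B-C| - 1)^2 / (B+C); whether FRank applies exactly this form is not shown here, so treat it as an assumption, along with the helper name `mcnemar` and the `yates:` keyword.

```ruby
# Hypothetical helper, not part of the FRank API.
# b, c: the two discordant cell counts of the 2x2 contingency table.
def mcnemar(b, c, yates: false)
  return 0.0 if (b + c).zero? # assumption: no discordant pairs => score 0

  diff = (b - c).abs
  # Assumed continuity correction, clamped so it never goes negative.
  diff = [diff - 1, 0].max if yates

  diff**2 / (b + c).to_f
end

puts mcnemar(30, 10)              # uncorrected statistic
puts mcnemar(30, 10, yates: true) # with the assumed continuity correction
```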

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (McNemarsTest) initialize(correction = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + correction + + + (Boolean) + + + (defaults to: nil) + + + — +

    continuity correction to apply; pass :yates to enable Yates's continuity correction (any other value, including the default nil, disables it)

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+22
+23
+24
+25
+
+
# File 'lib/frank/algo_discrete/McNemarsTest.rb', line 22
+
+def initialize(correction=nil, data=nil)
+  super(data)
+  @correction = (correction==:yates) ? true : false
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/MutualInformation.html b/doc/FRank/MutualInformation.html new file mode 100644 index 0000000..149aa3c --- /dev/null +++ b/doc/FRank/MutualInformation.html @@ -0,0 +1,175 @@ + + + + + + Class: FRank::MutualInformation + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::MutualInformation + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/MutualInformation.rb
+ +
+
+ +

Overview

+
+

Mutual Information (MI)

+ +
                  P(f, c)
+MI(f,c) = log2 -------------
+                P(f) * P(c)
+
+                    A * N
+        = log2 ---------------
+                (A+B) * (A+C)
+
+ +

ref: A Comparative Study on Feature Selection Methods for Drug + Discovery
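A sketch of the count-based form above, assuming the usual 2x2 layout in which A+B counts the samples containing f and A+C counts the samples in class c. The helper name is hypothetical and no guards are included for empty cells.

```ruby
# Hypothetical helper, not part of the FRank API.
# a: samples with f in class c       b: samples with f outside class c
# c: samples without f in class c    d: samples without f outside class c
def mutual_information(a, b, c, d)
  n = a + b + c + d
  # P(f,c) / (P(f) * P(c)) = (A/N) / ((A+B)/N * (A+C)/N) = A*N / ((A+B)*(A+C))
  Math.log2(a * n / ((a + b) * (a + c)).to_f)
end
```

The score is 0 when f and c are independent and positive when they co-occur more often than chance would predict.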

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/OddsRatio.html b/doc/FRank/OddsRatio.html new file mode 100644 index 0000000..93f598f --- /dev/null +++ b/doc/FRank/OddsRatio.html @@ -0,0 +1,176 @@ + + + + + + Class: FRank::OddsRatio + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::OddsRatio + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/OddsRatio.rb
+ +
+
+ +

Overview

+
+

Odds Ratio (Odd)

+ +
           P(f|c) * (1 - P(f|c'))     tpr * (1-fpr)
+Odd(f,c) = ----------------------- = ---------------
+           (1 - P(f|c)) * P(f|c')     (1-tpr) * fpr
+
+            A*D
+         = -----
+            B*C
+
+ +

ref: Wikipedia and An extensive empirical study of feature selection + metrics for text classification and Optimally Combining Positive + and Negative Features for Text Categorization
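The two forms of the formula above (rates versus raw counts) can be checked against each other with a short sketch. The helper names are hypothetical, and tpr = A/(A+C), fpr = B/(B+D) are taken from the count notation used throughout these pages.

```ruby
# Hypothetical helpers, not part of the FRank API.

# Count form: A*D / (B*C)
def odds_ratio(a, b, c, d)
  (a * d) / (b * c).to_f
end

# Rate form: tpr*(1-fpr) / ((1-tpr)*fpr)
def odds_ratio_from_rates(a, b, c, d)
  tpr = a / (a + c).to_f
  fpr = b / (b + d).to_f
  (tpr * (1 - fpr)) / ((1 - tpr) * fpr)
end

puts odds_ratio(8, 2, 2, 8)
puts odds_ratio_from_rates(8, 2, 2, 8) # agrees up to floating-point rounding
```

Note that the count form divides by B*C, so a table with B = 0 or C = 0 needs separate handling in practice; this sketch does not address that case.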

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/OddsRatioNumerator.html b/doc/FRank/OddsRatioNumerator.html new file mode 100644 index 0000000..250bfe1 --- /dev/null +++ b/doc/FRank/OddsRatioNumerator.html @@ -0,0 +1,173 @@ + + + + + + Class: FRank::OddsRatioNumerator + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::OddsRatioNumerator + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/OddsRatioNumerator.rb
+ +
+
+ +

Overview

+
+

Odds Ratio Numerator (OddN)

+ +
OddN(f,c) = P(f|c) * (1 - P(f|c')) =  tpr * (1-fpr)
+
+              A           B           A*D
+          = ---- * (1 - ----) = ---------------
+             A+C         B+D     (A+C) * (B+D)
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/PMetric.html b/doc/FRank/PMetric.html new file mode 100644 index 0000000..b035700 --- /dev/null +++ b/doc/FRank/PMetric.html @@ -0,0 +1,197 @@ + + + + + + Class: FRank::PMetric + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::PMetric + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_continuous/PMetric.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

PM applicable only to two-class problems

+
+
+ +

P-Metric (PM) for continuous features

+ +
            |u1 - u2|
+PM(f) = -----------------
+         sigma1 + sigma2
+
+ +

ref: Filter versus wrapper gene selection approaches
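A sketch of the formula above for two classes, where u and sigma are the per-class mean and standard deviation of the feature. The helper names are hypothetical, and the use of the sample standard deviation (dividing by n - 1) is an assumption; the source does not say which estimator FRank uses.

```ruby
# Hypothetical helpers, not part of the FRank API.
def mean(xs)
  xs.sum / xs.size.to_f
end

# Assumption: sample standard deviation (n - 1 in the denominator).
def sample_std(xs)
  m = mean(xs)
  Math.sqrt(xs.sum { |x| (x - m)**2 } / (xs.size - 1))
end

# xs1, xs2: values of one feature in class 1 and class 2.
def p_metric(xs1, xs2)
  (mean(xs1) - mean(xs2)).abs / (sample_std(xs1) + sample_std(xs2))
end

puts p_metric([1, 2, 3], [4, 5, 6])
```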

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseContinuous

+

#initialize

+ + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseContinuous

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Power.html b/doc/FRank/Power.html new file mode 100644 index 0000000..5076a55 --- /dev/null +++ b/doc/FRank/Power.html @@ -0,0 +1,262 @@ + + + + + + Class: FRank::Power + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Power + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Power.rb
+ +
+
+ +

Overview

+
+

Power (pow)

+ +
Pow = (1-fpr)^k - (1-tpr)^k
+
+    = (1-B/(B+D))^k - (1-A/(A+C))^k
+
+    = (D/(B+D))^k - (C/(A+C))^k
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification
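The last line of the formula above translates directly into a short sketch. The helper name `power_score` is hypothetical; the default k = 5 matches the constructor shown on this page.

```ruby
# Hypothetical helper, not part of the FRank API.
# Pow = (D/(B+D))^k - (C/(A+C))^k, i.e. (1-fpr)^k - (1-tpr)^k
def power_score(a, b, c, d, k = 5)
  (d / (b + d).to_f)**k - (c / (a + c).to_f)**k
end

puts power_score(8, 2, 2, 8)    # k = 5
puts power_score(8, 2, 2, 8, 1) # k = 1 reduces to tpr - fpr
```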

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Power) initialize(k = 5, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + k + + + (Integer) + + + (defaults to: 5) + + + — +

    exponent k in the Pow formula

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+24
+25
+26
+27
+
+
# File 'lib/frank/algo_discrete/Power.rb', line 24
+
+def initialize(k=5, data=nil)
+  super(data)
+  @k = k
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Precision.html b/doc/FRank/Precision.html new file mode 100644 index 0000000..d95165d --- /dev/null +++ b/doc/FRank/Precision.html @@ -0,0 +1,168 @@ + + + + + + Class: FRank::Precision + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Precision + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Precision.rb
+ +
+
+ +

Overview

+
+

Precision

+ +
              TP        A
+Precision = ------- = -----
+             TP+FP     A+B
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/ProbabilityRatio.html b/doc/FRank/ProbabilityRatio.html new file mode 100644 index 0000000..98460d4 --- /dev/null +++ b/doc/FRank/ProbabilityRatio.html @@ -0,0 +1,173 @@ + + + + + + Class: FRank::ProbabilityRatio + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::ProbabilityRatio + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/ProbabilityRatio.rb
+ +
+
+ +

Overview

+
+

Probability Ratio (PR)

+ +
PR = tpr / fpr
+
+      A/(A+C)    A * (B+D)
+   = -------- = -----------
+      B/(B+D)    (A+C) * B
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Random.html b/doc/FRank/Random.html new file mode 100644 index 0000000..708d72c --- /dev/null +++ b/doc/FRank/Random.html @@ -0,0 +1,259 @@ + + + + + + Class: FRank::Random + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Random + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Random.rb
+ +
+
+ +

Overview

+
+

Random (Rand), of no practical use but can serve as a baseline

+ +

Rand = random numbers within [0, 1)

+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Random) initialize(seed = nil, data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+

Parameters:

+
    + +
  • + + seed + + + (Integer) + + + (defaults to: nil) + + + — +

    seed for the random number generator, provided for reproducible results; if omitted, srand is not called and Ruby's default seeding is used

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+22
+23
+24
+25
+
+
# File 'lib/frank/algo_discrete/Random.rb', line 22
+
+def initialize(seed=nil, data=nil)
+  super(data)
+  srand(seed) if seed
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/ReliefF_c.html b/doc/FRank/ReliefF_c.html new file mode 100644 index 0000000..35872e2 --- /dev/null +++ b/doc/FRank/ReliefF_c.html @@ -0,0 +1,319 @@ + + + + + + Class: FRank::ReliefF_c + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::ReliefF_c + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_continuous/ReliefF_c.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

applicable to multi-class problem with missing data

+
+
+ +

extended Relief algorithm for continuous feature (ReliefF_c)

+ +

ref: Estimating Attributes: Analysis and Extensions of RELIEF

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (ReliefF_c) initialize(m = nil, k = 10, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples used to estimate feature contributions; at most the number of training samples

    +
    + +
  • + +
  • + + k + + + (Integer) + + + (defaults to: 10) + + + — +

    number of nearest neighbors (k)

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+23
+24
+25
+26
+27
+
+
# File 'lib/frank/algo_continuous/ReliefF_c.rb', line 23
+
+def initialize(m=nil, k=10, data=nil)
+  super(data)
+  @m = m # nil (default) means use all samples
+  @k = (k || 10)  # default 10
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/ReliefF_d.html b/doc/FRank/ReliefF_d.html new file mode 100644 index 0000000..2d91a7c --- /dev/null +++ b/doc/FRank/ReliefF_d.html @@ -0,0 +1,299 @@ + + + + + + Class: FRank::ReliefF_d + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::ReliefF_d + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/ReliefF_d.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

applicable to multi-class problem with missing data

+
+
+ +

extended Relief algorithm for discrete feature (ReliefF_d)

+ +

ref: Estimating Attributes: Analysis and Extensions of RELIEF

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (ReliefF_d) initialize(m = nil, k = 10, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples used to estimate feature contributions; at most the number of training samples

    +
    + +
  • + +
  • + + k + + + (Integer) + + + (defaults to: 10) + + + — +

    number of nearest neighbors (k)

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+22
+23
+24
+25
+26
+
+
# File 'lib/frank/algo_discrete/ReliefF_d.rb', line 22
+
+def initialize(m=nil, k=10, data=nil)
+  super(data)
+  @m = m # nil (default) means use all samples
+  @k = (k || 10)  # default 10
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Relief_c.html b/doc/FRank/Relief_c.html new file mode 100644 index 0000000..d49095f --- /dev/null +++ b/doc/FRank/Relief_c.html @@ -0,0 +1,301 @@ + + + + + + Class: FRank::Relief_c + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Relief_c + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_continuous/Relief_c.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

Relief applicable only to two-class problem without missing data

+
+
+ +

Relief algorithm for continuous feature (Relief_c)

+ +

ref: The Feature Selection Problem: Traditional Methods + and a New Algorithm

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Relief_c) initialize(m = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples used to estimate feature contributions; at most the number of training samples

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+23
+24
+25
+26
+
+
# File 'lib/frank/algo_continuous/Relief_c.rb', line 23
+
+def initialize(m=nil, data=nil)
+  super(data)
+  @m = m # default use all samples
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Relief_d.html b/doc/FRank/Relief_d.html new file mode 100644 index 0000000..4f746d5 --- /dev/null +++ b/doc/FRank/Relief_d.html @@ -0,0 +1,281 @@ + + + + + + Class: FRank::Relief_d + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Relief_d + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Relief_d.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

Relief applicable only to two-class problem without missing data

+
+
+ +

Relief algorithm for discrete feature (Relief_d)

+ +

ref: The Feature Selection Problem: Traditional Methods + and a New Algorithm

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Relief_d) initialize(m = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples used to estimate feature contributions; at most the number of training samples

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+23
+24
+25
+26
+
+
# File 'lib/frank/algo_discrete/Relief_d.rb', line 23
+
+def initialize(m=nil, data=nil)
+  super(data)
+  @m = m # default use all samples
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Sensitivity.html b/doc/FRank/Sensitivity.html new file mode 100644 index 0000000..df62ded --- /dev/null +++ b/doc/FRank/Sensitivity.html @@ -0,0 +1,168 @@ + + + + + + Class: FRank::Sensitivity + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Sensitivity + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Sensitivity.rb
+ +
+
+ +

Overview

+
+

Sensitivity (SN), also known as Recall

+ +
        TP        A
+SN  = ------- = -----
+       TP+FN     A+C
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/Specificity.html b/doc/FRank/Specificity.html new file mode 100644 index 0000000..0393cf2 --- /dev/null +++ b/doc/FRank/Specificity.html @@ -0,0 +1,168 @@ + + + + + + Class: FRank::Specificity + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::Specificity + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_discrete/Specificity.rb
+ +
+
+ +

Overview

+
+

Specificity (SP)

+ +
        TN        D
+SP  = ------- = -----
+       TN+FP     B+D
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FRank/TScore.html b/doc/FRank/TScore.html new file mode 100644 index 0000000..6ce3387 --- /dev/null +++ b/doc/FRank/TScore.html @@ -0,0 +1,197 @@ + + + + + + Class: FRank::TScore + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FRank::TScore + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/frank/algo_continuous/TScore.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

TS is applicable only to two-class problems

+
+
+ +

t-score (TS) based on Student's t-test for continuous features

+ +
                       |u1 - u2|
+TS(f) = --------------------------------------------
+         sqrt((n1*sigma1^2 + n2*sigma2^2)/(n1+n2))
+
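The TS formula above can be sketched as follows, taking u1 and u2 as the per-class means and n1 and n2 as the per-class sample counts. This is an illustration under stated assumptions, not the library's implementation: the helper names are hypothetical, and sigma^2 is assumed to be the population variance (divided by n), which the documentation does not specify.

```ruby
# Hypothetical helpers (not part of the library).
def mean(xs)
  xs.sum.to_f / xs.size
end

# Assumption: population variance, i.e. divide by n rather than n - 1.
def variance(xs)
  m = mean(xs)
  xs.sum { |x| (x - m)**2 } / xs.size
end

# TS(f) = |u1 - u2| / sqrt((n1*sigma1^2 + n2*sigma2^2) / (n1 + n2))
def t_score(xs1, xs2)
  n1 = xs1.size
  n2 = xs2.size
  pooled = (n1 * variance(xs1) + n2 * variance(xs2)) / (n1 + n2)
  # Assumption: score 0.0 when both classes have zero variance.
  return 0.0 if pooled.zero?
  (mean(xs1) - mean(xs2)).abs / Math.sqrt(pooled)
end

puts t_score([1, 2, 3], [4, 5, 6])
```

Here `xs1` and `xs2` hold the values of one feature in each of the two classes.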
+ +

ref: Filter versus wrapper gene selection approaches

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseContinuous

+

#initialize

+ + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FRank::BaseContinuous

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector.html b/doc/FSelector.html new file mode 100644 index 0000000..70bf6ab --- /dev/null +++ b/doc/FSelector.html @@ -0,0 +1,502 @@ + + + + + + Module: FSelector + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Module: FSelector + + + +

+ +
+ + + + + + + + +
Defined in:
+
lib/fselector.rb,
+ lib/fselector/base.rb,
lib/fselector/ensemble.rb,
lib/fselector/base_discrete.rb,
lib/fselector/base_continuous.rb,
lib/fselector/algo_discrete/GMean.rb,
lib/fselector/algo_discrete/Power.rb,
lib/fselector/algo_discrete/Random.rb,
lib/fselector/algo_discrete/Accuracy.rb,
lib/fselector/algo_continuous/TScore.rb,
lib/fselector/algo_discrete/Relief_d.rb,
lib/fselector/algo_continuous/PMetric.rb,
lib/fselector/algo_discrete/GiniIndex.rb,
lib/fselector/algo_discrete/ReliefF_d.rb,
lib/fselector/algo_discrete/OddsRatio.rb,
lib/fselector/algo_discrete/Precision.rb,
lib/fselector/algo_discrete/F1Measure.rb,
lib/fselector/algo_continuous/Relief_c.rb,
lib/fselector/algo_discrete/Specificity.rb,
lib/fselector/algo_discrete/Sensitivity.rb,
lib/fselector/algo_continuous/ReliefF_c.rb,
lib/fselector/algo_discrete/McNemarsTest.rb,
lib/fselector/algo_discrete/ChiSquaredTest.rb,
lib/fselector/algo_discrete/GSSCoefficient.rb,
lib/fselector/algo_discrete/InformationGain.rb,
lib/fselector/algo_discrete/FishersExactTest.rb,
lib/fselector/algo_discrete/ProbabilityRatio.rb,
lib/fselector/algo_discrete/AccuracyBalanced.rb,
lib/fselector/algo_discrete/DocumentFrequency.rb,
lib/fselector/algo_discrete/MutualInformation.rb,
lib/fselector/algo_discrete/OddsRatioNumerator.rb,
lib/fselector/algo_discrete/BiNormalSeparation.rb,
lib/fselector/algo_discrete/CorrelationCoefficient.rb,
lib/fselector/algo_discrete/MatthewsCorrelationCoefficient.rb
+
+ +
+
+ +

Overview

+
+

FSelector: a Ruby gem for feature selection and ranking

+ + +
+
+
+ + +

Defined Under Namespace

+

+ + + + + Classes: Accuracy, AccuracyBalanced, Base, BaseContinuous, BaseDiscrete, BiNormalSeparation, ChiSquaredTest, CorrelationCoefficient, DocumentFrequency, Ensemble, F1Measure, FishersExactTest, GMean, GSSCoefficient, GiniIndex, InformationGain, MatthewsCorrelationCoefficient, McNemarsTest, MutualInformation, OddsRatio, OddsRatioNumerator, PMetric, Power, Precision, ProbabilityRatio, Random, ReliefF_c, ReliefF_d, Relief_c, Relief_d, Sensitivity, Specificity, TScore + + +

+ +

Constant Summary

+ +
+ +
VERSION = +
+
+

module version

+ + +
+
+
+ + +
+
+
'0.1.0'
+ +
GM = +
+
+

shortcut so that you can use FSelector::GM instead of FSelector::GMean

+ + +
+
+
+ + +
+
+
GMean
+ +
Acc = +
+
+

shortcut so that you can use FSelector::Acc instead of FSelector::Accuracy

+ + +
+
+
+ + +
+
+
Accuracy
+ +
TS = +
+
+

shortcut so that you can use FSelector::TS instead of FSelector::TScore

+ + +
+
+
+ + +
+
+
TScore
+ +
PM = +
+
+

shortcut so that you can use FSelector::PM instead of FSelector::PMetric

+ + +
+
+
+ + +
+
+
PMetric
+ +
GI = +
+
+

shortcut so that you can use FSelector::GI instead of FSelector::GiniIndex

+ + +
+
+
+ + +
+
+
GiniIndex
+ +
Odd = +
+
+

shortcut so that you can use FSelector::Odd instead of FSelector::OddsRatio

+ + +
+
+
+ + +
+
+
OddsRatio
+ +
F1 = +
+
+

shortcut so that you can use FSelector::F1 instead of FSelector::F1Measure

+ + +
+
+
+ + +
+
+
F1Measure
+ +
SP = +
+
+

shortcut so that you can use FSelector::SP instead of FSelector::Specificity

+ + +
+
+
+ + +
+
+
Specificity
+ +
SN = +
+
+

shortcut so that you can use FSelector::SN instead of FSelector::Sensitivity

+ + +
+
+
+ + +
+
+
Sensitivity
+ +
Recall = +
+
+

Sensitivity, also known as Recall

+ + +
+
+
+ + +
+
+
Sensitivity
+ +
MNT = +
+
+

shortcut so that you can use FSelector::MNT instead of FSelector::McNemarsTest

+ + +
+
+
+ + +
+
+
McNemarsTest
+ +
CHI = +
+
+

shortcut so that you can use FSelector::CHI instead of FSelector::ChiSquaredTest

+ + +
+
+
+ + +
+
+
ChiSquaredTest
+ +
GSS = +
+
+

shortcut so that you can use FSelector::GSS instead of FSelector::GSSCoefficient

+ + +
+
+
+ + +
+
+
GSSCoefficient
+ +
IG = +
+
+

shortcut so that you can use FSelector::IG instead of FSelector::InformationGain

+ + +
+
+
+ + +
+
+
InformationGain
+ +
FET = +
+
+

shortcut so that you can use FSelector::FET instead of FSelector::FishersExactTest

+ + +
+
+
+ + +
+
+
FishersExactTest
+ +
PR = +
+
+

shortcut so that you can use FSelector::PR instead of FSelector::ProbabilityRatio

+ + +
+
+
+ + +
+
+
ProbabilityRatio
+ +
Acc2 = +
+
+

shortcut so that you can use FSelector::Acc2 instead of FSelector::AccuracyBalanced

+ + +
+
+
+ + +
+
+
AccuracyBalanced
+ +
DF = +
+
+

shortcut so that you can use FSelector::DF instead of FSelector::DocumentFrequency

+ + +
+
+
+ + +
+
+
DocumentFrequency
+ +
MI = +
+
+

shortcut so that you can use FSelector::MI instead of FSelector::MutualInformation

+ + +
+
+
+ + +
+
+
MutualInformation
+ +
OddN = +
+
+

shortcut so that you can use FSelector::OddN instead of FSelector::OddsRatioNumerator

+ + +
+
+
+ + +
+
+
OddsRatioNumerator
+ +
BNS = +
+
+

shortcut so that you can use FSelector::BNS instead of FSelector::BiNormalSeparation

+ + +
+
+
+ + +
+
+
BiNormalSeparation
+ +
CC = +
+
+

shortcut so that you can use FSelector::CC instead of FSelector::CorrelationCoefficient

+ + +
+
+
+ + +
+
+
CorrelationCoefficient
+ +
MCC = +
+
+

shortcut so that you can use FSelector::MCC instead of FSelector::MatthewsCorrelationCoefficient

+ + +
+
+
+ + +
+
+
MatthewsCorrelationCoefficient
+ +
PHI = +
+
+

Matthews Correlation Coefficient (MCC), also known as Phi coefficient

+ + +
+
+
+ + +
+
+
MatthewsCorrelationCoefficient
+ +
+ + + + + + + + + + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Accuracy.html b/doc/FSelector/Accuracy.html new file mode 100644 index 0000000..ae524b0 --- /dev/null +++ b/doc/FSelector/Accuracy.html @@ -0,0 +1,168 @@ + + + + + + Class: FSelector::Accuracy + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Accuracy + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Accuracy.rb
+ +
+
+ +

Overview

+
+

Accuracy (Acc)

+ +
          tp+tn          A+D
+Acc = ------------- = ---------
+       tp+fn+tn+fp     A+B+C+D
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/AccuracyBalanced.html b/doc/FSelector/AccuracyBalanced.html new file mode 100644 index 0000000..fe2bf3a --- /dev/null +++ b/doc/FSelector/AccuracyBalanced.html @@ -0,0 +1,169 @@ + + + + + + Class: FSelector::AccuracyBalanced + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::AccuracyBalanced + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/AccuracyBalanced.rb
+ +
+
+ +

Overview

+
+

Accuracy Balanced (Acc2)

+ +
Acc2 = |tpr - fpr| = |A/(A+C) - B/(B+D)|
+
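A short sketch of the Acc2 formula, assuming the counts A, B, C and D come from the 2 x 2 contingency table used by the discrete algorithms. The function name and the handling of empty columns are assumptions for illustration only.

```ruby
# Hypothetical helper (not part of the library):
# Acc2 = |tpr - fpr| = |A/(A+C) - B/(B+D)|
def accuracy_balanced(a, b, c, d)
  # Assumption: a rate is 0.0 when its column is empty.
  tpr = (a + c).zero? ? 0.0 : a.to_f / (a + c)
  fpr = (b + d).zero? ? 0.0 : b.to_f / (b + d)
  (tpr - fpr).abs
end

puts accuracy_balanced(30, 10, 10, 50)
```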
+ +

ref: An extensive empirical study of feature selection metrics for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Base.html b/doc/FSelector/Base.html new file mode 100644 index 0000000..8a5cd0b --- /dev/null +++ b/doc/FSelector/Base.html @@ -0,0 +1,1891 @@ + + + + + + Class: FSelector::Base + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Base + + + +

+ +
+ +
Inherits:
+
+ Object + +
    +
  • Object
  • + + + +
+ show all + +
+ + + + + + +
Includes:
+
FileIO
+ + + + + +
Defined in:
+
lib/fselector/base.rb
+ +
+
+ +

Overview

+
+

base ranking algorithm

+ + +
+
+
+ + +
+

Direct Known Subclasses

+

BaseContinuous, BaseDiscrete, Ensemble

+
+ + + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Base) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+13
+14
+15
+16
+
+
# File 'lib/fselector/base.rb', line 13
+
+def initialize(data=nil)
+  @data = data
+  @opts = {} # store non-data information
+end
+
+
+ +
+ + +
+

Instance Method Details

+ + +
+

+ + - (Object) each_class + + + +

+
+

iterator for each class

+ +
e.g.
+self.each_class do |k|
+  puts k
+end
+
+ + +
+
+
+ + +
+ + + + +
+
+
+
+27
+28
+29
+30
+31
+32
+33
+34
+
+
# File 'lib/fselector/base.rb', line 27
+
+def each_class
+  if not block_given?
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "block must be given!"
+  else
+    get_classes.each { |k| yield k }
+  end
+end
+
+
+ +
+

+ + - (Object) each_feature + + + +

+
+

iterator for each feature

+ +
e.g.
+self.each_feature do |f|
+  puts f
+end
+
+ + +
+
+
+ + +
+ + + + +
+
+
+
+45
+46
+47
+48
+49
+50
+51
+52
+
+
# File 'lib/fselector/base.rb', line 45
+
+def each_feature
+  if not block_given?
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "block must be given!"
+  else
+    get_features.each { |f| yield f }
+  end
+end
+
+
+ +
+

+ + - (Object) each_sample + + + +

+
+

iterator for each sample with class label

+ +
e.g.
+self.each_sample do |k, s|
+  print k
+  s.each { |f, v| print " #{v}" }
+  puts
+end
+
+ + +
+
+
+ + +
+ + + + +
+
+
+
+65
+66
+67
+68
+69
+70
+71
+72
+73
+74
+
+
# File 'lib/fselector/base.rb', line 65
+
+def each_sample
+  if not block_given?
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          " block must be given!"
+  else      
+    get_data.each do |k, samples|
+      samples.each { |s| yield k, s }
+    end
+  end
+end
+
+
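The iterators above imply a particular data layout: a Hash that maps each class label to an Array of samples, where each sample is a Hash of feature => value. The sketch below reproduces the traversal of `each_sample` on a standalone Hash so the layout is easier to see. The labels and feature names are made up, and using Symbols as keys is an assumption.

```ruby
# Assumed layout: { class_label => [ { feature => value, ... }, ... ] }
data = {
  :c1 => [{ :f1 => 1, :f2 => 2 }, { :f1 => 3 }],
  :c2 => [{ :f2 => 5 }]
}

# Same traversal order as each_sample: class by class, sample by sample.
pairs = []
data.each do |k, samples|
  samples.each { |s| pairs << [k, s] }
end

pairs.each do |k, s|
  puts "#{k}: " + s.map { |f, v| "#{f}=#{v}" }.join(' ')
end
```

Samples need not contain every feature, which is why `get_feature_values` checks `has_key?` before collecting a value.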
+ +
+

+ + - (Object) get_classes + + + +

+
+

get classes

+ + +
+
+
+ + +
+ + + + +
+
+
+
+78
+79
+80
+
+
# File 'lib/fselector/base.rb', line 78
+
+def get_classes
+  @classes ||= @data.keys
+end
+
+
+ +
+

+ + - (Object) get_data + + + +

+
+

get data

+ + +
+
+
+ + +
+ + + + +
+
+
+
+130
+131
+132
+
+
# File 'lib/fselector/base.rb', line 130
+
+def get_data
+  @data
+end
+
+
+ +
+

+ + - (Hash) get_feature_ranks + + + +

+
+

get the ranked features based on their best scores

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    feature ranks

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+236
+237
+238
+239
+240
+241
+242
+243
+244
+245
+246
+247
+248
+249
+250
+251
+252
+253
+254
+
+
# File 'lib/fselector/base.rb', line 236
+
+def get_feature_ranks
+  return @ranks if @ranks # already done
+  
+  scores = get_feature_scores
+  
+  # get the ranked features
+  @ranks = {} # feature => rank
+  
+  # the larger, the better
+  sorted_features = scores.keys.sort do |x,y|
+    scores[y][:BEST] <=> scores[x][:BEST]
+  end
+  
+  sorted_features.each_with_index do |sf, si|
+    @ranks[sf] = si+1
+  end
+  
+  @ranks
+end
+
+
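To make the ranking step concrete, here is a standalone sketch that applies the same "larger is better" ordering to a hand-written scores Hash. The scores are invented for illustration, and `sort_by` is used in place of the block form of `sort` shown in the listing.

```ruby
# Invented scores in the documented shape:
# { feature => { class1 => score1, class2 => score2, :BEST => score_best } }
scores = {
  :f1 => { :c1 => 0.2, :c2 => 0.9, :BEST => 0.9 },
  :f2 => { :c1 => 0.5, :c2 => 0.4, :BEST => 0.5 },
  :f3 => { :c1 => 0.7, :c2 => 0.1, :BEST => 0.7 }
}

# Sort features by descending :BEST score, then assign ranks starting at 1.
sorted = scores.keys.sort_by { |f| -scores[f][:BEST] }
ranks = {}
sorted.each_with_index { |f, i| ranks[f] = i + 1 }

ranks.each { |f, r| puts "#{f} => #{r}" }
```

With these scores, f1 ranks first, f3 second and f2 third.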
+ +
+

+ + - (Hash) get_feature_scores + + + +

+
+

get scores of all features for all classes

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    { feature => { class1 => score1, class2 => score2, :BEST => score_best } }

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+206
+207
+208
+209
+210
+211
+212
+213
+214
+215
+216
+217
+218
+219
+220
+221
+
+
# File 'lib/fselector/base.rb', line 206
+
+def get_feature_scores
+  return @scores if @scores # already done
+  
+  each_feature do |f|
+    calc_contribution(f)
+  end
+  
+  # best score for feature
+  @scores.each do |f, ks|
+    # the larger, the better
+    @scores[f][:BEST] = ks.values.max
+  end
+  #@scores.each { |x,v| puts "#{x} => #{v[:BEST]}" }
+  
+  @scores
+end
+
+
+ +
+

+ + - (Object) get_feature_values(f) + + + +

+
+

get feature values

+ + +
+
+
+

Parameters:

+
    + +
  • + + f + + + (Symbol) + + + + — +

    feature of interest

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+105
+106
+107
+108
+109
+110
+111
+112
+113
+114
+115
+116
+
+
# File 'lib/fselector/base.rb', line 105
+
+def get_feature_values(f)
+  @fvs ||= {}
+  
+  if not @fvs.has_key? f
+    @fvs[f] = []
+    each_sample do |k, s|
+      @fvs[f] << s[f] if s.has_key? f
+    end
+  end
+  
+  @fvs[f]
+end
+
+
+ +
+

+ + - (Object) get_features + + + +

+
+

get unique features

+ + +
+
+
+ + +
+ + + + +
+
+
+
+95
+96
+97
+
+
# File 'lib/fselector/base.rb', line 95
+
+def get_features
+  @features ||= @data.map { |x| x[1].map { |y| y.keys } }.flatten.uniq
+end
+
+
+ +
+

+ + - (Object) get_opt(key) + + + +

+
+

get non-data information

+ + +
+
+
+ + +
+ + + + +
+
+
+
+149
+150
+151
+
+
# File 'lib/fselector/base.rb', line 149
+
+def get_opt(key)
+  @opts.has_key?(key) ? @opts[key] : nil
+end
+
+
+ +
+

+ + - (Object) get_sample_size + + + +

+
+

number of samples

+ + +
+
+
+ + +
+ + + + +
+
+
+
+161
+162
+163
+
+
# File 'lib/fselector/base.rb', line 161
+
+def get_sample_size
+  @sz ||= get_data.values.flatten.size
+end
+
+
+ +
+
+
+

print feature ranks

+ + +
+
+
+ + +
+ + + + +
+
+
+
+191
+192
+193
+194
+195
+196
+197
+
+
# File 'lib/fselector/base.rb', line 191
+
+def print_feature_ranks
+  ranks = get_feature_ranks
+  
+  ranks.each do |f, r|
+    puts "#{f} => #{r}"
+  end
+end
+
+
+ +
+
+
+

print feature scores

+ + +
+
+
+

Parameters:

+
    + +
  • + + kclass + + + (String) + + + (defaults to: nil) + + + — +

    class of interest

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+171
+172
+173
+174
+175
+176
+177
+178
+179
+180
+181
+182
+183
+184
+185
+186
+187
+
+
# File 'lib/fselector/base.rb', line 171
+
+def print_feature_scores(feat=nil, kclass=nil)
+  scores = get_feature_scores
+  
+  scores.each do |f, ks|
+    next if feat and feat != f
+    
+    print "#{f} =>"
+    ks.each do |k, s|
+      if kclass
+        print " #{k}->#{s}" if k == kclass
+      else
+        print " #{k}->#{s}"
+      end
+    end
+    puts
+  end
+end
+
+
+ +
+

+ + - (Hash) select_data_by_rank!(criterion, my_ranks = nil) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

reconstruct data by rank

+ + +
+
+
+

Parameters:

+
    + +
  • + + criterion + + + (String) + + + + — +

    valid criterion can be '>11', '>= 10', '==1', '<=10' or '<20'

    +
    + +
  • + +
  • + + my_ranks + + + (Hash) + + + (defaults to: nil) + + + — +

    user customized feature ranks

    +
    + +
  • + +
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    data after feature selection

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+298
+299
+300
+301
+302
+303
+304
+305
+306
+307
+308
+309
+310
+311
+312
+313
+314
+315
+316
+
+
# File 'lib/fselector/base.rb', line 298
+
+def select_data_by_rank!(criterion, my_ranks=nil)
+  # user ranks or internal ranks
+  ranks = my_ranks || get_feature_ranks
+  
+  my_data = {}
+  
+  each_sample do |k, s|
+    my_data[k] ||= []
+    my_s = {}
+    
+    s.each do |f,v|
+      my_s[f] = v if eval("#{ranks[f]} #{criterion}")
+    end
+    
+    my_data[k] << my_s if not my_s.empty?
+  end
+  
+  set_data(my_data)
+end
+
+
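The listing above passes the criterion string to `eval`, so a criterion from an untrusted source could run arbitrary Ruby. One possible alternative, sketched here as an assumption rather than as the library's behavior, is to parse the operator and the number explicitly. The names `CRITERION_RE` and `satisfies?` are hypothetical.

```ruby
# Hypothetical helper (not part of the library): check a value against a
# criterion such as '>11', '>= 10', '==1', '<=10' or '<0.2' without eval.
CRITERION_RE = /\A\s*(>=|<=|==|>|<)\s*(-?\d+(?:\.\d+)?)\s*\z/

def satisfies?(value, criterion)
  m = CRITERION_RE.match(criterion)
  raise ArgumentError, "invalid criterion: #{criterion.inspect}" unless m
  # m[1] is the comparison operator, m[2] the numeric threshold.
  value.public_send(m[1], m[2].to_f)
end

puts satisfies?(3, '<= 10')
puts satisfies?(11, '>11')
```

The same helper would cover both rank and score criteria, since each is a comparison operator followed by a number.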
+ +
+

+ + - (Hash) select_data_by_score!(criterion, my_scores = nil) + + + +

+
+ +
+ Note: +

data structure will be altered

+
+
+ +

reconstruct data with feature scores satisfying cutoff

+ + +
+
+
+

Parameters:

+
    + +
  • + + criterion + + + (String) + + + + — +

    valid criterion can be '>0.5', '>= 0.4', '==2', '<=1' or '<0.2'

    +
    + +
  • + +
  • + + my_scores + + + (Hash) + + + (defaults to: nil) + + + — +

    user customized feature scores

    +
    + +
  • + +
+ +

Returns:

+
    + +
  • + + + (Hash) + + + + — +

    data after feature selection

    +
    + +
  • + +
+ +
+ + + + +
+
+
+
+267
+268
+269
+270
+271
+272
+273
+274
+275
+276
+277
+278
+279
+280
+281
+282
+283
+284
+285
+
+
# File 'lib/fselector/base.rb', line 267
+
+def select_data_by_score!(criterion, my_scores=nil)
+  # user scores or internal scores
+  scores = my_scores || get_feature_scores
+  
+  my_data = {}
+  
+  each_sample do |k, s|
+    my_data[k] ||= []
+    my_s = {}
+    
+    s.each do |f, v|
+      my_s[f] = v if eval("#{scores[f][:BEST]} #{criterion}")
+    end
+    
+    my_data[k] << my_s if not my_s.empty?
+  end
+      
+  set_data(my_data)
+end
+
+
+ +
+

+ + - (Object) set_classes(classes) + + + +

+
+

set classes

+ + +
+
+
+ + +
+ + + + +
+
+
+
+84
+85
+86
+87
+88
+89
+90
+91
+
+
# File 'lib/fselector/base.rb', line 84
+
+def set_classes(classes)
+  if classes and classes.class == Array
+    @classes = classes
+  else
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "classes must be an Array object!"
+  end
+end
+
+
+ +
+

+ + - (Object) set_data(data) + + + +

+
+

set data

+ + +
+
+
+ + +
+ + + + +
+
+
+
+135
+136
+137
+138
+139
+140
+141
+142
+143
+144
+145
+
+
# File 'lib/fselector/base.rb', line 135
+
+def set_data(data)
+  if data and data.class == Hash
+    @data = data
+    # clear
+    @classes, @features, @fvs = nil, nil, nil
+    @scores, @ranks, @sz = nil, nil, nil
+  else
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "data must be a Hash object!"
+  end
+end
+
+
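`set_data` resets every memoized value because the getters cache their results with `||=`. Without the reset they would keep returning values computed from the old data. The toy class below isolates that pattern. The class name `SampleCounter` is hypothetical and only the sample-size cache is modeled.

```ruby
# Hypothetical toy class illustrating memoization plus cache invalidation.
class SampleCounter
  def initialize(data)
    @data = data
  end

  # Computed once, then served from the cache.
  def sample_size
    @sz ||= @data.values.flatten.size
  end

  # Replacing the data must clear the cache.
  def set_data(data)
    @data = data
    @sz = nil
  end
end

counter = SampleCounter.new({ :c1 => [{ :f1 => 1 }, { :f1 => 2 }] })
puts counter.sample_size
counter.set_data({ :c1 => [{ :f1 => 1 }] })
puts counter.sample_size
```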
+ +
+

+ + - (Object) set_feature_score(f, k, s) + + + +

+
+

set feature (f) score (s) for class (k)

+ + +
+
+
+ + +
+ + + + +
+
+
+
+225
+226
+227
+228
+229
+
+
# File 'lib/fselector/base.rb', line 225
+
+def set_feature_score(f, k, s)
+  @scores ||= {}
+  @scores[f] ||= {}
+  @scores[f][k] = s
+end
+
+
+ +
+

+ + - (Object) set_features(features) + + + +

+
+

set features

+ + +
+
+
+ + +
+ + + + +
+
+
+
+119
+120
+121
+122
+123
+124
+125
+126
+
+
# File 'lib/fselector/base.rb', line 119
+
+def set_features(features)
+  if features and features.class == Array
+    @features = features
+  else
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "features must be an Array object!"
+  end
+end
+
+
+ +
+

+ + - (Object) set_opt(key, value) + + + +

+
+

set non-data information as a key-value pair

+ + +
+
+
+ + +
+ + + + +
+
+
+
+155
+156
+157
+
+
# File 'lib/fselector/base.rb', line 155
+
+def set_opt(key, value)
+  @opts[key] = value
+end
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/BaseContinuous.html b/doc/FSelector/BaseContinuous.html new file mode 100644 index 0000000..50f1cb8 --- /dev/null +++ b/doc/FSelector/BaseContinuous.html @@ -0,0 +1,250 @@ + + + + + + Class: FSelector::BaseContinuous + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::BaseContinuous + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + +
Includes:
+
Discretilizer, Normalizer
+ + + + + +
Defined in:
+
lib/fselector/base_continuous.rb
+ +
+
+ +

Overview

+
+

base ranking algorithm for handling continuous features

+ + +
+
+
+ + +
+

Direct Known Subclasses

+

PMetric, ReliefF_c, Relief_c, TScore

+
+ + + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (BaseContinuous) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+17
+18
+19
+
+
# File 'lib/fselector/base_continuous.rb', line 17
+
+def initialize(data=nil)
+  super(data)
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/BaseDiscrete.html b/doc/FSelector/BaseDiscrete.html new file mode 100644 index 0000000..2116fd0 --- /dev/null +++ b/doc/FSelector/BaseDiscrete.html @@ -0,0 +1,246 @@ + + + + + + Class: FSelector::BaseDiscrete + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::BaseDiscrete + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/base_discrete.rb
+ +
+
+ +

Overview

+
+

base ranking algorithm for handling discrete features

+ +
2 x 2 contingency table
+
+      c   c'
+    ---------
+ f  | A | B | A+B
+    |---|---| 
+ f' | C | D | C+D
+    ---------
+     A+C B+D  N = A+B+C+D
+
+ P(f)     = (A+B)/N
+ P(f')    = (C+D)/N
+ P(c)     = (A+C)/N
+ P(c')    = (B+D)/N
+ P(f,c)   = A/N
+ P(f,c')  = B/N
+ P(f',c)  = C/N
+ P(f',c') = D/N
+
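The counts A, B, C and D in the table above can be tallied directly from the class => samples Hash. The sketch below assumes that a feature counts as present (f) when the sample Hash has that key and as absent (f') otherwise. The function name `contingency` is hypothetical and the library may derive these counts differently.

```ruby
# Hypothetical helper (not part of the library): returns [A, B, C, D]
# for feature f and class klass, treating every other class as c'.
def contingency(data, f, klass)
  a = b = c = d = 0
  data.each do |k, samples|
    samples.each do |s|
      present = s.key?(f)
      if k == klass
        if present
          a += 1   # f and c
        else
          c += 1   # f' and c
        end
      else
        if present
          b += 1   # f and c'
        else
          d += 1   # f' and c'
        end
      end
    end
  end
  [a, b, c, d]
end

data = {
  :pos => [{ :f1 => 1 }, { :f1 => 1 }, { :f2 => 1 }],
  :neg => [{ :f1 => 1 }, { :f2 => 1 }, { :f2 => 1 }, {}]
}
a, b, c, d = contingency(data, :f1, :pos)
puts "A=#{a} B=#{b} C=#{c} D=#{d}"
```

For this data the counts are A=2, B=1, C=1, D=3, so N = 7.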
+ + +
+
+
+ + +
+

Direct Known Subclasses

+

Accuracy, AccuracyBalanced, BiNormalSeparation, ChiSquaredTest, CorrelationCoefficient, DocumentFrequency, F1Measure, FishersExactTest, GMean, GSSCoefficient, GiniIndex, InformationGain, MatthewsCorrelationCoefficient, McNemarsTest, MutualInformation, OddsRatio, OddsRatioNumerator, Power, Precision, ProbabilityRatio, Random, ReliefF_d, Relief_d, Sensitivity, Specificity

+
+ + + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (BaseDiscrete) initialize(data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+ + +
+ + + + +
+
+
+
+29
+30
+31
+
+
# File 'lib/fselector/base_discrete.rb', line 29
+
+def initialize(data=nil)
+  super(data)
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/BiNormalSeparation.html b/doc/FSelector/BiNormalSeparation.html new file mode 100644 index 0000000..edf9a31 --- /dev/null +++ b/doc/FSelector/BiNormalSeparation.html @@ -0,0 +1,191 @@ + + + + + + Class: FSelector::BiNormalSeparation + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::BiNormalSeparation + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + +
Includes:
+
Rubystats
+ + + + + +
Defined in:
+
lib/fselector/algo_discrete/BiNormalSeparation.rb
+ +
+
+ +

Overview

+
+

Bi-Normal Separation (BNS)

+ +
BNS = |F'(tpr) - F'(fpr)|
+
+where F' is normal inverse cumulative distribution function
+R executable is required to calculate qnorm, i.e. F'(x)
+
+ +

ref: An extensive empirical study of feature selection metrics for text classification and Rubystats

+ + +
+
+
+ + +
+

Constant Summary

+ + + + +

+ +

Constants included from Rubystats

+

Rubystats::MAX_VALUE, Rubystats::SQRT2, Rubystats::SQRT2PI, Rubystats::TWO_PI

+ + + + + + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/ChiSquaredTest.html b/doc/FSelector/ChiSquaredTest.html new file mode 100644 index 0000000..a96140c --- /dev/null +++ b/doc/FSelector/ChiSquaredTest.html @@ -0,0 +1,269 @@ + + + + + + Class: FSelector::ChiSquaredTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::ChiSquaredTest + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/ChiSquaredTest.rb
+ +
+
+ +

Overview

+
+

Chi-Squared test (CHI)

+ +
             N * ( P(f,c) * P(f',c') - P(f,c') * P(f',c) )^2
+ CHI(f,c) = -------------------------------------------------
+                      P(f) * P(f') * P(c) * P(c')
+
+                   N * (A*D - B*C)^2
+          = -------------------------------
+             (A+B) * (C+D) * (A+C) * (B+D)
+
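The count-based form of CHI above translates into a few lines of Ruby. This is a sketch under stated assumptions, not the library's code: the function name and the `yates` flag are hypothetical, and the corrected variant uses the textbook form N * (|A*D - B*C| - N/2)^2 over the same denominator, clamped at zero.

```ruby
# Hypothetical helper (not part of the library).
# CHI = N * (A*D - B*C)^2 / ((A+B) * (C+D) * (A+C) * (B+D))
def chi_squared(a, b, c, d, yates = false)
  n = a + b + c + d
  denom = (a + b) * (c + d) * (a + c) * (b + d)
  # Assumption: score 0.0 when any marginal total is zero.
  return 0.0 if denom.zero?
  diff = (a * d - b * c).abs.to_f
  # Assumption: textbook Yates correction, never letting the term go negative.
  diff = [diff - n / 2.0, 0.0].max if yates
  n * diff**2 / denom
end

puts chi_squared(20, 10, 10, 20)
puts chi_squared(20, 10, 10, 20, true)
```

The corrected score is never larger than the uncorrected one, which matters most when cell counts are small.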
+ +

suitable for large samples where none of the values (A, B, C, D) is below 5

+ +

ref: Wikipedia and A Comparative Study on Feature Selection Methods for Drug Discovery

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (ChiSquaredTest) initialize(correction = nil, data = nil) + + + +

+
+

create a new ChiSquaredTest object, optionally with Yates's continuity correction

+ + +
+
+
+

Parameters:

+
    + +
  • + + correction + + + (Boolean) + + + (defaults to: nil) + + + — +

    pass :yates to apply Yates's continuity correction;
    any other value (including the default nil) leaves it off

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+30
+31
+32
+33
+
+
# File 'lib/fselector/algo_discrete/ChiSquaredTest.rb', line 30
+
+def initialize(correction=nil, data=nil)
+  super(data)
+  @correction = (correction == :yates)
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/CorrelationCoefficient.html b/doc/FSelector/CorrelationCoefficient.html new file mode 100644 index 0000000..8e8df6c --- /dev/null +++ b/doc/FSelector/CorrelationCoefficient.html @@ -0,0 +1,172 @@ + + + + + + Class: FSelector::CorrelationCoefficient + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::CorrelationCoefficient + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/CorrelationCoefficient.rb
+ +
+
+ +

Overview

+
+

Correlation Coefficient (CC), a variant of CHI, which can be viewed as a one-sided chi-squared metric

+ +
                  sqrt(N) * (A*D - B*C)
+CC(f,c) = --------------------------------------
+           sqrt( (A+B) * (C+D) * (A+C) * (B+D) )
+
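A small sketch of the CC formula, again assuming integer counts from the 2 x 2 contingency table. The function name and the zero-denominator fallback are assumptions. Unlike CHI, the result keeps the sign of A*D - B*C, which is what makes the metric one-sided.

```ruby
# Hypothetical helper (not part of the library).
# CC = sqrt(N) * (A*D - B*C) / sqrt((A+B) * (C+D) * (A+C) * (B+D))
def correlation_coefficient(a, b, c, d)
  n = a + b + c + d
  denom = (a + b) * (c + d) * (a + c) * (b + d)
  # Assumption: score 0.0 when any marginal total is zero.
  return 0.0 if denom.zero?
  Math.sqrt(n) * (a * d - b * c) / Math.sqrt(denom)
end

puts correlation_coefficient(20, 10, 10, 20)
puts correlation_coefficient(10, 20, 20, 10)
```

Squaring the result gives the uncorrected CHI value for the same counts.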
+ +

ref: Optimally Combining Positive and Negative Features for Text Categorization

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/DocumentFrequency.html b/doc/FSelector/DocumentFrequency.html new file mode 100644 index 0000000..a9f5116 --- /dev/null +++ b/doc/FSelector/DocumentFrequency.html @@ -0,0 +1,169 @@ + + + + + + Class: FSelector::DocumentFrequency + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::DocumentFrequency + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/DocumentFrequency.rb
+ +
+
+ +

Overview

+
+

Document Frequency (DF)

+ +
DF = tp+fp = (A+B)
+
+ +

ref: An extensive empirical study of feature selection metrics for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Ensemble.html b/doc/FSelector/Ensemble.html new file mode 100644 index 0000000..effbd4f --- /dev/null +++ b/doc/FSelector/Ensemble.html @@ -0,0 +1,913 @@ + + + + + + Class: FSelector::Ensemble + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Ensemble + + + +

+ +
+ +
Inherits:
+
+ Base + +
    +
  • Object
  • + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/ensemble.rb
+ +
+
+ +

Overview

+
+

select features by an ensemble of ranking algorithms

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Ensemble) initialize(*algos) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + algos + + + (Array) + + + + — +

    multiple feature ranking algorithms

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 10
+
+def initialize(*algos)
+  super(nil)
+  
+  @algos = []
+  algos.each do |r|
+    @algos << r
+  end
+end
+
+
+ +
+ + +
+

Instance Method Details

+ + +
+

+ + - (Object) by_ave(arr) + + + +

+
+

by average value of an array

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 126
+
+def by_ave(arr)
+  arr.ave if arr.class == Array
+end
+
+
+ +
+

+ + - (Object) by_max(arr) + + + +

+
+

by max value of an array

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 138
+
+def by_max(arr)
+  arr.max if arr.class == Array
+end
+
+
+ +
+

+ + - (Object) by_min(arr) + + + +

+
+

by min value of an array

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 132
+
+def by_min(arr)
+  arr.min if arr.class == Array
+end
+
+
+ +
+

+ + - (Object) ensemble_by_rank(by_what = method(:by_min)) + + + +

+
+

ensemble based on rank

+ + +
+
+
+

Parameters:

+
    + +
  • + + by_what + + + (Method) + + + (defaults to: method(:by_min)) + + + — +

    criterion by which the ensemble +rank is derived from the ranks given by the individual algorithms
    +allowed values are:
    +method(:by_min) # by min rank
    +method(:by_max) # by max rank
    +method(:by_ave) # by ave rank

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 103
+
+def ensemble_by_rank(by_what=method(:by_min))
+  ranks = {}
+       
+  each_feature do |f|
+    ranks[f] = by_what.call(
+      @algos.collect { |r| r.get_feature_ranks[f] }
+    )
+  end
+  
+  new_ranks = {}
+  
+  sorted_features = ranks.keys.sort do |x, y|
+    ranks[x] <=> ranks[y]
+  end
+  sorted_features.each_with_index do |sf, si|
+    new_ranks[sf] = si+1
+  end
+  
+  @ranks = new_ranks
+end
+
+
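The rank-based consensus can be sketched without the FSelector classes. This is a simplified illustration: the hash-based input, the block argument, and the tie-breaking rule are assumptions, whereas the listing above reads ranks from each algorithm object through get_feature_ranks.

```ruby
# rank_lists: one { feature => rank } hash per algorithm, where rank 1 is best.
# The block decides how a feature's individual ranks are merged (default: min).
def ensemble_by_rank(rank_lists, &combine)
  combine ||= ->(ranks) { ranks.min }
  features = rank_lists.first.keys

  merged = {}
  features.each do |f|
    merged[f] = combine.call(rank_lists.map { |ranks| ranks[f] })
  end

  # Re-rank from 1..n; ties keep the original feature order (an assumption).
  ordered = features.each_with_index.sort_by { |f, i| [merged[f], i] }.map(&:first)
  new_ranks = {}
  ordered.each_with_index { |f, i| new_ranks[f] = i + 1 }
  new_ranks
end

# Average-rank criterion, analogous to by_ave.
BY_AVE = ->(ranks) { ranks.inject(:+).to_f / ranks.size }
```

Calling `ensemble_by_rank(lists, &BY_AVE)` switches from the minimum to the average criterion.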
+ +
+

+ + - (Object) ensemble_by_score(by_what = method(:by_max), norm = :min_max) + + + +

+
+ +
+ Note: +

scores from different algos are usually incompatible with +each other, so they must be normalized first

+
+
+ +

ensemble based on score

+ + +
+
+
+

Parameters:

+
    + +
  • + + by_what + + + (Method) + + + (defaults to: method(:by_max)) + + + — +

    criterion by which the ensemble +score is derived from the scores given by the individual algorithms
    +allowed values are:
    +receiver.method(:by_min) # by min score
    +receiver.method(:by_max) # by max score
    +receiver.method(:by_ave) # by ave score

    +
    + +
  • + +
  • + + norm + + + (Symbol) + + + (defaults to: :min_max) + + + — +

    normalization
    +:min_max, score scaled to [0, 1]
    +:zscore, score converted to zscore

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 71
+
+def ensemble_by_score(by_what=method(:by_max), norm=:min_max)
+  @algos.each do |r|
+    if norm == :min_max
+      normalize_min_max!(r)
+    elsif norm == :zscore
+      normalize_zscore!(r)
+    else
+      abort "[#{__FILE__}@#{__LINE__}]: "+
+          "invalid normalizer, only :min_max and :zscore supported!"
+    end
+  end
+  
+  @scores = {}
+  
+  each_feature do |f|
+    @scores[f] = {}
+    @scores[f][:BEST] = by_what.call(
+      @algos.collect { |r| r.get_feature_scores[f][:BEST] }
+    )
+  end      
+end
+
+
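The score-based consensus depends on normalizing each algorithm's scores before merging them. The sketch below shows the :min_max case on plain hashes. It is an illustration under assumptions: the function names and the flat { feature => score } layout are hypothetical, while the listing above stores scores under a :BEST key.

```ruby
# Scale one algorithm's scores into [0, 1].
def min_max_normalize(scores)
  lo, hi = scores.values.minmax
  span = (hi - lo).to_f
  normalized = {}
  scores.each { |f, s| normalized[f] = span.zero? ? 0.0 : (s - lo) / span }
  normalized
end

# score_lists: one { feature => score } hash per algorithm.
# The block merges a feature's normalized scores (default: max).
def ensemble_by_score(score_lists, &combine)
  combine ||= ->(scores) { scores.max }
  normalized = score_lists.map { |scores| min_max_normalize(scores) }
  merged = {}
  normalized.first.each_key do |f|
    merged[f] = combine.call(normalized.map { |scores| scores[f] })
  end
  merged
end
```

Without normalization, an algorithm whose scores span hundreds would always dominate one whose scores lie in [0, 10].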
+ +
+

+ + - (Object) get_feature_ranks + + + +

+
+

override get_feature_ranks

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 48
+
+def get_feature_ranks
+  return @ranks if @ranks
+  
+  abort "[#{__FILE__}@#{__LINE__}]: "+
+          "please call one consensus ranking method first!"
+end
+
+
+ +
+

+ + - (Object) get_feature_scores + + + +

+
+

override get_feature_scores

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 37
+
+def get_feature_scores
+  return @scores if @scores
+  
+  abort "[#{__FILE__}@#{__LINE__}]: "+
+          "please call one consensus scoring method first!"
+end
+
+
+ +
+

+ + - (Object) set_data(data) + + + +

+
+ +
+ Note: +

all algos share the same data structure

+
+
+ +

override set_data

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/ensemble.rb', line 25
+
+def set_data(data)
+  super
+  
+  @algos.each do |r|
+    r.set_data(data)
+  end
+end
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/F1Measure.html b/doc/FSelector/F1Measure.html new file mode 100644 index 0000000..1a1df53 --- /dev/null +++ b/doc/FSelector/F1Measure.html @@ -0,0 +1,175 @@ + + + + + + Class: FSelector::F1Measure + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::F1Measure + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/F1Measure.rb
+ +
+
+ +

Overview

+
+

F1-Measure (F1)

+ +
      2 * recall * precision
+F1 = ------------------------
+         recall + precision
+
+           2 * tp               2 * A
+   = ------------------- = --------------
+      tp + fn + tp + fp     A + C + A + B
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification
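Both forms of the F1 formula can be checked against each other. The helpers below are illustrative only (names and the zero guard are assumptions), with A = tp, B = fp and C = fn.

```ruby
# F1 directly from counts: 2A / (A + C + A + B).
def f1_measure(a, b, c)
  denom = 2 * a + b + c
  return 0.0 if denom.zero? # assumption: no positives at all scores 0
  2.0 * a / denom
end

# F1 as the harmonic mean of recall and precision.
def f1_from_rates(a, b, c)
  precision = a.to_f / (a + b)
  recall = a.to_f / (a + c)
  2 * recall * precision / (recall + precision)
end
```

The count-based form avoids the intermediate divisions, so it stays defined when precision or recall alone would divide by zero.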

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/FishersExactTest.html b/doc/FSelector/FishersExactTest.html new file mode 100644 index 0000000..c5f3222 --- /dev/null +++ b/doc/FSelector/FishersExactTest.html @@ -0,0 +1,191 @@ + + + + + + Class: FSelector::FishersExactTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::FishersExactTest + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + +
Includes:
+
Rubystats
+ + + + + +
Defined in:
+
lib/fselector/algo_discrete/FishersExactTest.rb
+ +
+
+ +

Overview

+
+

(two-sided) Fisher's Exact Test (FET)

+ +
     (A+B)! * (C+D)! * (A+C)! * (B+D)!
+p =  -----------------------------------
+          N! * A! * B! * C! * D!
+
+where N = A+B+C+D
+
+for FET, the smaller, the better, but we intentionally negate it
+so that the larger is always the better (consistent with other algorithms)
+
+ +

ref: Wikipedia and Rubystats
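A self-contained sketch of the two-sided test follows. It does not use Rubystats; all function names are hypothetical. The point probability is the standard hypergeometric form, with N! (N = A+B+C+D) in the denominator, computed through log-factorials to avoid overflow. The two-sided p-value sums the probabilities of all tables with the same margins that are no more likely than the observed one.

```ruby
# log(n!) via the log-gamma function; Math.lgamma returns [value, sign].
def log_factorial(n)
  Math.lgamma(n + 1).first
end

# Probability of one 2x2 table given fixed row and column totals.
def table_probability(a, b, c, d)
  n = a + b + c + d
  log_p = log_factorial(a + b) + log_factorial(c + d) +
          log_factorial(a + c) + log_factorial(b + d) -
          log_factorial(n) - log_factorial(a) - log_factorial(b) -
          log_factorial(c) - log_factorial(d)
  Math.exp(log_p)
end

# Two-sided p-value: sum over tables at most as probable as the observed one.
def fisher_two_sided(a, b, c, d)
  observed = table_probability(a, b, c, d)
  row1 = a + b
  col1 = a + c
  n = a + b + c + d
  lo = [0, row1 + col1 - n].max
  hi = [row1, col1].min
  (lo..hi).map { |x| table_probability(x, row1 - x, col1 - x, n - row1 - col1 + x) }
          .select { |p| p <= observed * (1 + 1e-7) } # small tolerance for float ties
          .inject(0.0, :+)
end
```

A ranking score in the spirit of the note above would then be `-fisher_two_sided(a, b, c, d)`, so that larger is better.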

+ + +
+
+
+ + +
+

Constant Summary

+ + + + +


+ +

Constants included + from Rubystats

+

Rubystats::MAX_VALUE, Rubystats::SQRT2, Rubystats::SQRT2PI, Rubystats::TWO_PI

+ + + + + + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/GMean.html b/doc/FSelector/GMean.html new file mode 100644 index 0000000..445e781 --- /dev/null +++ b/doc/FSelector/GMean.html @@ -0,0 +1,170 @@ + + + + + + Class: FSelector::GMean + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::GMean + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/GMean.rb
+ +
+
+ +

Overview

+
+

GMean (GM)

+ +
GM = sqrt(Sensitivity * Specificity)
+
+                 TP*TN                     A*D
+   = sqrt(------------------) = sqrt(---------------)
+           (TP+FN) * (TN+FP)          (A+C) * (B+D)
+
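As a quick illustration of the GM formula (a sketch with an assumed function name and an assumed fallback of 0 when a class is empty):

```ruby
# GM = sqrt(sensitivity * specificity), with A = TP, B = FP, C = FN, D = TN.
def g_mean(a, b, c, d)
  positives = a + c
  negatives = b + d
  return 0.0 if positives.zero? || negatives.zero? # assumption for empty classes
  Math.sqrt((a.to_f / positives) * (d.to_f / negatives))
end
```

Because the two rates are multiplied, a feature scores well only if it does well on both classes, which makes GM less sensitive to class imbalance than accuracy.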
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/GSSCoefficient.html b/doc/FSelector/GSSCoefficient.html new file mode 100644 index 0000000..cf781c6 --- /dev/null +++ b/doc/FSelector/GSSCoefficient.html @@ -0,0 +1,175 @@ + + + + + + Class: FSelector::GSSCoefficient + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::GSSCoefficient + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/GSSCoefficient.rb
+ +
+
+ +

Overview

+
+

GSS coefficient (GSS), a simplified variant of Chi-Squared +proposed by Galavotti

+ +
GSS(f,c) = P(f,c) * P(f',c') - P(f,c') * P(f',c)
+
+         = A/N * D/N - B/N * C/N
+
+ +

suitable for large samples where +none of the values (A, B, C, D) is less than 5

+ +

ref: A Comparative Study on Feature Selection Methods for Drug + Discovery
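The GSS formula translates directly into code. The function below is a minimal sketch; its name and the handling of an empty table are assumptions.

```ruby
# GSS(f,c) = A/N * D/N - B/N * C/N
def gss(a, b, c, d)
  n = (a + b + c + d).to_f
  return 0.0 if n.zero? # assumption: empty table scores 0
  (a / n) * (d / n) - (b / n) * (c / n)
end
```

The result equals (A*D - B*C) / N^2, i.e. the numerator of the chi-squared style metrics without their normalizing denominator.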

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/GiniIndex.html b/doc/FSelector/GiniIndex.html new file mode 100644 index 0000000..901d24a --- /dev/null +++ b/doc/FSelector/GiniIndex.html @@ -0,0 +1,172 @@ + + + + + + Class: FSelector::GiniIndex + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::GiniIndex + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/GiniIndex.rb
+ +
+
+ +

Overview

+
+

Gini Index (GI), generalized for multi-class problems

+ +
GI(f) = 1 - sigma(c)(P(c|f)^2)
+
+ +

for GI, the smaller, the better, but we intentionally negate it +so that the larger is always the better (consistent with other algorithms)

+ +

ref: Advancing Feature Selection Research - + ASU Feature Selection Repository
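A minimal sketch of the GI formula, assuming the input is a hypothetical hash of class counts among the samples that carry feature f (how FSelector itself estimates P(c|f) is not shown in this page):

```ruby
# GI(f) = 1 - sum over classes of P(c|f)^2
# class_counts: { class_label => number of samples with feature f in that class }
def gini_index(class_counts)
  total = class_counts.values.inject(0, :+).to_f
  return 0.0 if total.zero? # assumption: feature never observed
  1.0 - class_counts.values.map { |k| (k / total)**2 }.inject(0.0, :+)
end
```

A pure feature (all samples in one class) gives 0, so negating the value, as described above, ranks pure features highest.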

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/InformationGain.html b/doc/FSelector/InformationGain.html new file mode 100644 index 0000000..70bc20b --- /dev/null +++ b/doc/FSelector/InformationGain.html @@ -0,0 +1,173 @@ + + + + + + Class: FSelector::InformationGain + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::InformationGain + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/InformationGain.rb
+ +
+
+ +

Overview

+
+

Information Gain for features with discrete data (IG)

+ +
IG_d(c,f) = H(c) - H(c|f)
+
+where H(c) = -1 * sigma_i (P(ci) logP(ci))
+      H(c|f) = sigma_j (P(fj)*H(c|fj))
+      H(c|fj) = -1 * sigma_k (P(ck|fj) logP(ck|fj))
+
+ +

ref: Using Information Gain to Analyze and Fine Tune + the Performance of Supply Chain Trading Agents
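The three entropy terms can be sketched as follows. The nested-hash input and the function names are hypothetical, and base-2 logarithms are an assumption, since the formula above leaves the base unspecified.

```ruby
# Shannon entropy (in bits) of a list of counts; zero counts contribute nothing.
def entropy(counts)
  total = counts.inject(0, :+).to_f
  counts.reject(&:zero?).map { |k|
    prob = k / total
    -prob * Math.log2(prob)
  }.inject(0.0, :+)
end

# table: { feature_value => { class_label => count } }
def information_gain(table)
  class_totals = Hash.new(0)
  table.each_value { |by_class| by_class.each { |c, k| class_totals[c] += k } }
  n = class_totals.values.inject(0, :+).to_f

  h_c = entropy(class_totals.values)                # H(c)
  h_c_given_f = table.values.map { |by_class|       # H(c|f)
    n_j = by_class.values.inject(0, :+)
    (n_j / n) * entropy(by_class.values)            # P(fj) * H(c|fj)
  }.inject(0.0, :+)

  h_c - h_c_given_f
end
```

A feature that splits the classes perfectly removes all uncertainty, so its gain equals H(c); a feature independent of the class has a gain of 0.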

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/MatthewsCorrelationCoefficient.html b/doc/FSelector/MatthewsCorrelationCoefficient.html new file mode 100644 index 0000000..065d253 --- /dev/null +++ b/doc/FSelector/MatthewsCorrelationCoefficient.html @@ -0,0 +1,174 @@ + + + + + + Class: FSelector::MatthewsCorrelationCoefficient + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::MatthewsCorrelationCoefficient + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + +
    +
  • Object
  • + + + + + + + +
+ show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/MatthewsCorrelationCoefficient.rb
+ +
+
+ +

Overview

+
+

Matthews Correlation Coefficient (MCC)

+ +
                       tp*tn - fp*fn
+MCC = ---------------------------------------------- = PHI = sqrt(CHI/N)
+       sqrt((tp+fp) * (tp+fn) * (tn+fp) * (tn+fn) )
+
+                     A*D - B*C
+    = -------------------------------------
+      sqrt((A+B) * (A+C) * (B+D) * (C+D))
+
+ +

ref: Wikipedia

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/McNemarsTest.html b/doc/FSelector/McNemarsTest.html new file mode 100644 index 0000000..37d4630 --- /dev/null +++ b/doc/FSelector/McNemarsTest.html @@ -0,0 +1,262 @@ + + + + + + Class: FSelector::McNemarsTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::McNemarsTest + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/McNemarsTest.rb
+ +
+
+ +

Overview

+
+

McNemar's test (MN), based on Chi-Squared test

+ +
            (B-C)^2
+MN(f, c) = ---------
+             B+C
+
+ +

suitable for large samples and B+C >= 25

+ +

ref: Wikipedia
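A sketch of the statistic with an optional continuity correction. The `continuity` argument is an assumption: this page only names a :yates option and does not show the corrected formula. A value of 0.5 is the amount commonly attributed to Yates, and 1.0 gives the Edwards variant.

```ruby
# MN = (|B - C| - continuity)^2 / (B + C); continuity 0.0 means uncorrected.
def mcnemar(b, c, continuity = 0.0)
  return 0.0 if (b + c).zero? # assumption: no discordant pairs scores 0
  diff = [(b - c).abs - continuity, 0.0].max
  diff**2 / (b + c)
end
```

Only the discordant counts B and C enter the statistic, which is why the note above places the sample-size requirement on B+C.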

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (McNemarsTest) initialize(correction = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + correction + + + (Symbol) + + + (defaults to: nil) + + + — +

    continuity correction to apply; omit or pass nil for none
    +:yates, Yates's continuity correction

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_discrete/McNemarsTest.rb', line 22
+
+def initialize(correction=nil, data=nil)
+  super(data)
+  @correction = (correction==:yates) ? true : false
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/MutualInformation.html b/doc/FSelector/MutualInformation.html new file mode 100644 index 0000000..d6d9a6b --- /dev/null +++ b/doc/FSelector/MutualInformation.html @@ -0,0 +1,175 @@ + + + + + + Class: FSelector::MutualInformation + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::MutualInformation + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/MutualInformation.rb
+ +
+
+ +

Overview

+
+

Mutual Information (MI)

+ +
                  P(f, c)
+MI(f,c) = log2 -------------
+                P(f) * P(c)
+
+                    A * N
+        = log2 ---------------
+                (A+B) * (A+C)
+
+ +

ref: A Comparative Study on Feature Selection Methods for Drug + Discovery
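The count-based form of MI can be sketched as below. The function name and the treatment of A = 0 are assumptions.

```ruby
# MI(f,c) = log2( A * N / ((A+B) * (A+C)) )
def mutual_information(a, b, c, d)
  n = a + b + c + d
  return -Float::INFINITY if a.zero? # log2(0); the feature never occurs with the class
  Math.log2(a * n / ((a + b) * (a + c)).to_f)
end
```

A value of 0 means the feature and the class are independent, positive values mean they co-occur more often than chance, and negative values less often.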

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/OddsRatio.html b/doc/FSelector/OddsRatio.html new file mode 100644 index 0000000..bc6caa1 --- /dev/null +++ b/doc/FSelector/OddsRatio.html @@ -0,0 +1,176 @@ + + + + + + Class: FSelector::OddsRatio + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::OddsRatio + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/OddsRatio.rb
+ +
+
+ +

Overview

+
+

Odds Ratio (Odd)

+ +
           P(f|c) * (1 - P(f|c'))     tpr * (1-fpr)
+Odd(f,c) = ----------------------- = ---------------
+           (1 - P(f|c)) * P(f|c')     (1-tpr) * fpr
+
+            A*D
+         = -----
+            B*C
+
+ +

ref: Wikipedia and An extensive empirical study of feature selection + metrics for text classification and Optimally Combining Positive + and Negative Features for Text Categorization
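Both forms of the odds ratio are sketched below with hypothetical function names. No smoothing is applied, so the result is Infinity or NaN when B or C is zero; whether FSelector guards against that case is not shown here.

```ruby
# Odd(f,c) = (A*D) / (B*C)
def odds_ratio(a, b, c, d)
  (a * d).to_f / (b * c)
end

# The same quantity from the true and false positive rates.
def odds_ratio_from_rates(a, b, c, d)
  tpr = a.to_f / (a + c)
  fpr = b.to_f / (b + d)
  tpr * (1 - fpr) / ((1 - tpr) * fpr)
end
```

Values above 1 mean the feature is more likely to appear in the class than outside it.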

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/OddsRatioNumerator.html b/doc/FSelector/OddsRatioNumerator.html new file mode 100644 index 0000000..52c4c6f --- /dev/null +++ b/doc/FSelector/OddsRatioNumerator.html @@ -0,0 +1,173 @@ + + + + + + Class: FSelector::OddsRatioNumerator + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::OddsRatioNumerator + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/OddsRatioNumerator.rb
+ +
+
+ +

Overview

+
+

Odds Ratio Numerator (OddN)

+ +
OddN(f,c) = P(f|c) * (1 - P(f|c')) =  tpr * (1-fpr)
+
+              A           B           A*D
+          = ---- * (1 - ----) = ---------------
+             A+C         B+D     (A+C) * (B+D)
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/PMetric.html b/doc/FSelector/PMetric.html new file mode 100644 index 0000000..b8b1e99 --- /dev/null +++ b/doc/FSelector/PMetric.html @@ -0,0 +1,197 @@ + + + + + + Class: FSelector::PMetric + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::PMetric + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_continuous/PMetric.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

PM is applicable only to two-class problems

+
+
+ +

P-Metric (PM) for continuous features

+ +
            |u1 - u2|
+PM(f) = -----------------
+         sigma1 + sigma2
+
+ +

ref: Filter versus wrapper gene selection approaches
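A minimal sketch of PM for one feature, given its values in each of the two classes. The helper names are hypothetical, and the use of the sample standard deviation (n - 1 in the denominator) is an assumption, since this page does not say which estimator is used.

```ruby
def mean(values)
  values.inject(0.0, :+) / values.size
end

# Sample standard deviation (assumption: n - 1 in the denominator).
def std_dev(values)
  m = mean(values)
  Math.sqrt(values.map { |x| (x - m)**2 }.inject(0.0, :+) / (values.size - 1))
end

# PM(f) = |u1 - u2| / (sigma1 + sigma2)
def p_metric(class1_values, class2_values)
  denom = std_dev(class1_values) + std_dev(class2_values)
  return 0.0 if denom.zero? # assumption: constant feature scores 0
  (mean(class1_values) - mean(class2_values)).abs / denom
end
```

The score grows as the class means move apart relative to the spread within each class.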

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseContinuous

+

#initialize

+ + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseContinuous

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Power.html b/doc/FSelector/Power.html new file mode 100644 index 0000000..2fc0dd5 --- /dev/null +++ b/doc/FSelector/Power.html @@ -0,0 +1,262 @@ + + + + + + Class: FSelector::Power + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Power + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Power.rb
+ +
+
+ +

Overview

+
+

Power (pow)

+ +
Pow = (1-fpr)^k - (1-tpr)^k
+
+    = (1-B/(B+D))^k - (1-A/(A+C))^k
+
+    = (D/(B+D))^k - (C/(A+C))^k
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification
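The Power formula in code, as a sketch with an assumed function name. The default k = 5 mirrors the constructor default documented on this page.

```ruby
# Pow = (1 - fpr)^k - (1 - tpr)^k
def power(a, b, c, d, k = 5)
  tpr = a.to_f / (a + c)
  fpr = b.to_f / (b + d)
  (1 - fpr)**k - (1 - tpr)**k
end
```

With k = 1 the score reduces to tpr - fpr; larger k values increasingly favor features with a low false positive rate.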

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Power) initialize(k = 5, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + k + + + (Integer) + + + (defaults to: 5) + + + — +

    power

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_discrete/Power.rb', line 24
+
+def initialize(k=5, data=nil)
+  super(data)
+  @k = k
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Precision.html b/doc/FSelector/Precision.html new file mode 100644 index 0000000..dded2b0 --- /dev/null +++ b/doc/FSelector/Precision.html @@ -0,0 +1,168 @@ + + + + + + Class: FSelector::Precision + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Precision + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Precision.rb
+ +
+
+ +

Overview

+
+

Precision

+ +
              TP        A
+Precision = ------- = -----
+             TP+FP     A+B
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/ProbabilityRatio.html b/doc/FSelector/ProbabilityRatio.html new file mode 100644 index 0000000..95d9aaa --- /dev/null +++ b/doc/FSelector/ProbabilityRatio.html @@ -0,0 +1,173 @@ + + + + + + Class: FSelector::ProbabilityRatio + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::ProbabilityRatio + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/ProbabilityRatio.rb
+ +
+
+ +

Overview

+
+

Probability Ratio (PR)

+ +
PR = tpr / fpr
+
+      A/(A+C)    A * (B+D)
+   = -------- = -----------
+      B/(B+D)    (A+C) * B
+
+ +

ref: An extensive empirical study of feature selection metrics + for text classification
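A short sketch of PR using the simplified count form (the function name is an assumption). No guard is applied, so B = 0 yields Infinity, or NaN when A is also 0.

```ruby
# PR = tpr / fpr = A * (B+D) / ((A+C) * B)
def probability_ratio(a, b, c, d)
  a * (b + d) / ((a + c) * b).to_f
end
```

Unlike the odds ratio, PR ignores how often the feature is absent, so it rewards features that are rare outside the class.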

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Random.html b/doc/FSelector/Random.html new file mode 100644 index 0000000..ed75d24 --- /dev/null +++ b/doc/FSelector/Random.html @@ -0,0 +1,259 @@ + + + + + + Class: FSelector::Random + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Random + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Random.rb
+ +
+
+ +

Overview

+
+

Random (Rand), no practical use but can be used as a baseline

+ +

Rand = random numbers within [0, 1)

+ +

ref: An extensive empirical study of feature selection metrics + for text classification

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Random) initialize(seed = nil, data = nil) + + + +

+
+

initialize from an existing data structure

+ + +
+
+
+

Parameters:

+
    + +
  • + + seed + + + (Integer) + + + (defaults to: nil) + + + — +

    seed for random number +generator. provided for reproducible results, +otherwise use current time as seed

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_discrete/Random.rb', line 22
+
+def initialize(seed=nil, data=nil)
+  super(data)
+  srand(seed) if seed
+end
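The effect of the seed can be illustrated with a small standalone sketch. It only relies on Ruby's built-in `Kernel#srand` and `Kernel#rand`; the helper `random_scores` is a hypothetical stand-in for the scoring step and is not FSelector code.

```ruby
# Hypothetical helper: assign each feature a random score in [0, 1).
# Seeding the global generator first makes the scores reproducible.
def random_scores(features, seed = nil)
  srand(seed) if seed
  features.each_with_object({}) { |f, scores| scores[f] = rand }
end

first  = random_scores([:f1, :f2, :f3], 42)
second = random_scores([:f1, :f2, :f3], 42)
```

With the same seed both calls produce identical scores, which is what makes a random baseline repeatable across runs.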
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/ReliefF_c.html b/doc/FSelector/ReliefF_c.html new file mode 100644 index 0000000..eda1d66 --- /dev/null +++ b/doc/FSelector/ReliefF_c.html @@ -0,0 +1,319 @@ + + + + + + Class: FSelector::ReliefF_c + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::ReliefF_c + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_continuous/ReliefF_c.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

applicable to multi-class problems with missing data

+
+
+ +

extended Relief algorithm for continuous feature (ReliefF_c)

+ +

ref: Estimating Attributes: Analysis and Extensions of RELIEF

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (ReliefF_c) initialize(m = nil, k = 10, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples to be used +for estimating feature contribution. max can be +the number of training samples

    +
    + +
  • + +
  • + + k + + + (Integer) + + + (defaults to: 10) + + + — +

    number of nearest neighbors (k)

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/ReliefF_c.rb', line 23
+
+def initialize(m=nil, k=10, data=nil)
+  super(data)
+  @m = m # nil => use all samples
+  @k = (k || 10)  # default 10
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/ReliefF_d.html b/doc/FSelector/ReliefF_d.html new file mode 100644 index 0000000..122f346 --- /dev/null +++ b/doc/FSelector/ReliefF_d.html @@ -0,0 +1,299 @@ + + + + + + Class: FSelector::ReliefF_d + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::ReliefF_d + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/ReliefF_d.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

applicable to multi-class problems with missing data

+
+
+ +

extended Relief algorithm for discrete feature (ReliefF_d)

+ +

ref: Estimating Attributes: Analysis and Extensions of RELIEF

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (ReliefF_d) initialize(m = nil, k = 10, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples to be used +for estimating feature contribution. max can be +the number of training samples

    +
    + +
  • + +
  • + + k + + + (Integer) + + + (defaults to: 10) + + + — +

    number of nearest neighbors (k)

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_discrete/ReliefF_d.rb', line 22
+
+def initialize(m=nil, k=10, data=nil)
+  super(data)
+  @m = m # nil => use all samples
+  @k = (k || 10)  # default 10
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Relief_c.html b/doc/FSelector/Relief_c.html new file mode 100644 index 0000000..089502a --- /dev/null +++ b/doc/FSelector/Relief_c.html @@ -0,0 +1,301 @@ + + + + + + Class: FSelector::Relief_c + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Relief_c + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_continuous/Relief_c.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

Relief is applicable only to two-class problems without missing data

+
+
+ +

Relief algorithm for continuous feature (Relief_c)

+ +

ref: The Feature Selection Problem: Traditional Methods + and a New Algorithm

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Relief_c) initialize(m = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples to be used +for estimating feature contribution. max can be +the number of training samples

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/Relief_c.rb', line 23
+
+def initialize(m=nil, data=nil)
+  super(data)
+  @m = m # default use all samples
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Relief_d.html b/doc/FSelector/Relief_d.html new file mode 100644 index 0000000..724c523 --- /dev/null +++ b/doc/FSelector/Relief_d.html @@ -0,0 +1,281 @@ + + + + + + Class: FSelector::Relief_d + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Relief_d + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Relief_d.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

Relief is applicable only to two-class problems without missing data

+
+
+ +

Relief algorithm for discrete feature (Relief_d)

+ +

ref: The Feature Selection Problem: Traditional Methods + and a New Algorithm

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + + + + + + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +
+

+ + - (Relief_d) initialize(m = nil, data = nil) + + + +

+
+

new()

+ + +
+
+
+

Parameters:

+
    + +
  • + + m + + + (Integer) + + + (defaults to: nil) + + + — +

    number of samples to be used +for estimating feature contribution. max can be +the number of training samples

    +
    + +
  • + +
  • + + data + + + (Hash) + + + (defaults to: nil) + + + — +

    existing data structure

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_discrete/Relief_d.rb', line 23
+
+def initialize(m=nil, data=nil)
+  super(data)
+  @m = m # default use all samples
+end
+
+
+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Sensitivity.html b/doc/FSelector/Sensitivity.html new file mode 100644 index 0000000..638b50d --- /dev/null +++ b/doc/FSelector/Sensitivity.html @@ -0,0 +1,168 @@ + + + + + + Class: FSelector::Sensitivity + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Sensitivity + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Sensitivity.rb
+ +
+
+ +

Overview

+
+

Sensitivity (SN)

+ +
        TP        A
+SN = ------- = -----
+       TP+FN     A+C
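As a rough illustration, SN can be computed directly from data in the layout the FileIO readers build (a Hash mapping each class label to an Array of feature Hashes). The sketch assumes a sample "has" a feature when the feature key is present in its Hash; that convention and the method name `sensitivity` are assumptions made for this example only.

```ruby
# Toy data: { class_label => [ { feature => value }, ... ] }
data = {
  :pos => [{ :f1 => 1 }, { :f1 => 1, :f2 => 1 }, { :f2 => 1 }],
  :neg => [{ :f2 => 1 }, { :f1 => 1 }]
}

# Hypothetical helper, not FSelector's implementation.
def sensitivity(data, pos, f)
  a = data[pos].count { |s| s.has_key?(f) }  # A: positives with the feature (TP)
  c = data[pos].size - a                     # C: positives without it (FN)
  a.to_f / (a + c)
end

sn = sensitivity(data, :pos, :f1)
```

Here two of the three positive samples contain `:f1`, so SN is 2/3.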
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/Specificity.html b/doc/FSelector/Specificity.html new file mode 100644 index 0000000..345dd1b --- /dev/null +++ b/doc/FSelector/Specificity.html @@ -0,0 +1,168 @@ + + + + + + Class: FSelector::Specificity + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::Specificity + + + +

+ +
+ +
Inherits:
+
+ BaseDiscrete + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_discrete/Specificity.rb
+ +
+
+ +

Overview

+
+

Specificity (SP)

+ +
        TN        D
+SP  = ------- = -----
+       TN+FP     B+D
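A minimal sketch of the same ratio in Ruby, taking the two counts directly. The method name `specificity` is hypothetical; B is read as the number of negative-class samples with the feature (FP) and D as the number without it (TN), following the formula above.

```ruby
# Hypothetical helper, not FSelector's implementation.
# b: negative-class samples with the feature    (FP)
# d: negative-class samples without the feature (TN)
def specificity(b, d)
  d.to_f / (b + d)
end

sp = specificity(10, 40)
```

The conversion to Float avoids Ruby's integer division, which would otherwise truncate the result to 0.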
+
+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseDiscrete

+

#initialize

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseDiscrete

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FSelector/TScore.html b/doc/FSelector/TScore.html new file mode 100644 index 0000000..c7284e1 --- /dev/null +++ b/doc/FSelector/TScore.html @@ -0,0 +1,197 @@ + + + + + + Class: FSelector::TScore + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: FSelector::TScore + + + +

+ +
+ +
Inherits:
+
+ BaseContinuous + + + show all + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/algo_continuous/TScore.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

TS applicable only to two-class problems

+
+
+ +

t-score (TS) based on Student's t-test for continuous features

+ +
                       |u1 - u2|
+TS(f) = --------------------------------------------
+         sqrt((n1*sigma1^2 + n2*sigma2^2)/(n1+n2))
+
+ +

ref: Filter versus wrapper gene selection approaches
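The formula can be sketched as follows for one feature, given its values in each of the two classes. The helper names are hypothetical, and the variance here divides by n (population variance); whether FSelector uses n or n-1 is not stated in this page, so treat that choice as an assumption.

```ruby
# Hypothetical helpers, not FSelector's implementation.
def mean(xs)
  xs.inject(0.0) { |sum, x| sum + x } / xs.size
end

# population variance (divides by n); an assumption for this sketch
def variance(xs)
  m = mean(xs)
  xs.inject(0.0) { |sum, x| sum + (x - m)**2 } / xs.size
end

# xs1, xs2: values of one feature in class 1 and class 2
def t_score(xs1, xs2)
  n1, n2 = xs1.size, xs2.size
  pooled = (n1 * variance(xs1) + n2 * variance(xs2)) / (n1 + n2)
  (mean(xs1) - mean(xs2)).abs / Math.sqrt(pooled)
end

ts = t_score([1.0, 3.0], [5.0, 7.0])
```

In this toy case the class means are 2 and 6 and both variances are 1, so the score is 4.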

+ + +
+
+
+ + +
+ + + + + + + + + + + + +

Method Summary

+ +

Methods inherited from BaseContinuous

+

#initialize

+ + + + + + + + +

Methods included from Discretilizer

+

#discretize_chimerge!, #discretize_equal_frequency!, #discretize_equal_width!

+ + + + + + + + +

Methods included from Normalizer

+

#normalize_log!, #normalize_min_max!, #normalize_zscore!

+ + + + + + + + +

Methods inherited from Base

+

#each_class, #each_feature, #each_sample, #get_classes, #get_data, #get_feature_ranks, #get_feature_scores, #get_feature_values, #get_features, #get_opt, #get_sample_size, #initialize, #print_feature_ranks, #print_feature_scores, #select_data_by_rank!, #select_data_by_score!, #set_classes, #set_data, #set_feature_score, #set_features, #set_opt

+ + + + + + + + +

Methods included from FileIO

+

#data_from_csv, #data_from_libsvm, #data_from_random, #data_from_weka, #data_to_csv, #data_to_libsvm, #data_to_weka

+
+

Constructor Details

+ +

This class inherits a constructor from FSelector::BaseContinuous

+ +
+ + +
+ + + + + \ No newline at end of file diff --git a/doc/FileIO.html b/doc/FileIO.html new file mode 100644 index 0000000..5a131f2 --- /dev/null +++ b/doc/FileIO.html @@ -0,0 +1,1472 @@ + + + + + + Module: FileIO + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Module: FileIO + + + +

+ +
+ + + + + + + +
Included in:
+
FSelector::Base
+ + + +
Defined in:
+
lib/fselector/fileio.rb
+ +
+
+ +

Overview

+
+ +
+ Note: +

class labels and features are treated as symbols, +e.g. length => :length

+
+
+ +

read and write various file formats

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + +
+

Instance Method Details

+ + +
+

+ + - (Object) data_from_csv(fname = :stdin) + + + +

+
+ +
+ Note: +

missing values allowed

+
+
+ +

read from csv

+ +

file should have the format with the first two rows +specifying features and their data types e.g.
+feat1,feat2,...,featn
+data_type1,data_type2,...,data_typen

+ +

and the remaining rows showing data e.g.
+class_label,feat_value1,feat_value2,...,feat_valuen
+...

+ +

allowed data types are:
+INTEGER, REAL, NUMERIC, CONTINUOUS, STRING, NOMINAL, CATEGORICAL
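To make the layout concrete, here is a small standalone sketch that parses a CSV string in this format into a Hash of class label to feature Hashes. It is a simplified illustration, not the library's reader: the names `CSV_TEXT` and `parse_csv` are hypothetical, and unknown data types are simply kept as strings instead of aborting.

```ruby
require 'stringio'

CSV_TEXT = <<EOS
f1,f2,f3
INTEGER,REAL,NOMINAL
c1,1,0.5,red
c2,2,,blue
EOS

# Hypothetical, simplified reader for the documented layout.
def parse_csv(io)
  feats = io.gets.chomp.split(',').map { |f| f.to_sym }   # 1st row: feature names
  types = io.gets.chomp.split(',').map { |t| t.upcase }   # 2nd row: data types
  data = {}
  io.each_line do |ln|
    label, *values = ln.chomp.split(',')
    sample = {}
    values.each_with_index do |v, i|
      next if v.empty?   # empty field => missing value
      sample[feats[i]] =
        case types[i]
        when 'INTEGER' then v.to_i
        when 'REAL', 'NUMERIC', 'CONTINUOUS' then v.to_f
        else v           # STRING, NOMINAL, CATEGORICAL stay strings
        end
    end
    (data[label.to_sym] ||= []) << sample
  end
  data
end

data = parse_csv(StringIO.new(CSV_TEXT))
```

Note that the header rows list only the features, while each data row starts with the class label, so data rows have one more field than the header.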

+ + +
+
+
+

Parameters:

+
    + +
  • + + fname + + + (String) + + + (defaults to: :stdin) + + + — +

    file to read from
    +:stdin => read from standard input instead of file

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 146
+
+def data_from_csv(fname=:stdin)
+  data = {}
+  
+  if fname == :stdin
+    ifs = $stdin
+  elsif not File.exist? fname
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "File '#{fname}' does not exist!"
+  else
+    ifs = File.open(fname)
+  end
+  
+  first_row, second_row = true, true
+  feats, types = [], []
+  
+  ifs.each_line do |ln|
+    if first_row # first row
+      first_row = false
+      *feats = ln.chomp.split(/,/).to_sym
+    elsif second_row # second row
+      second_row = false
+      *types = ln.chomp.split(/,/)
+      if types.size == feats.size
+        types.each_with_index do |t, i|
+          set_opt(feats[i], t.upcase) # record data type
+        end
+      else
+        abort "[#{__FILE__}@#{__LINE__}]: "+
+              "1st and 2nd row must have same fields"
+      end
+    else # data rows
+      label, *fvs = ln.chomp.split(/,/)
+      label = label.to_sym
+      data[label] = [] if not data.has_key? label
+      
+      fs = {}
+      fvs.each_with_index do |v, i|
+        next if v.empty? # missing value
+        data_type = get_opt(feats[i])
+        if data_type == 'INTEGER'
+          v = v.to_i
+        elsif ['REAL', 'NUMERIC', 'CONTINUOUS'].include? data_type
+          v = v.to_f
+        elsif ['STRING', 'NOMINAL', 'CATEGORICAL'].include? data_type
+          #
+        else
+          abort "[#{__FILE__}@#{__LINE__}]: "+
+                "please specify correct data type "+
+                "for each feature in the 2nd row"
+        end
+        
+        fs[feats[i]] = v
+      end
+      
+      data[label] << fs
+    end
+  end
+  
+  # close file
+  ifs.close if not ifs == $stdin
+  
+  set_data(data)
+end
+
+
+ +
+

+ + - (Object) data_from_libsvm(fname = :stdin) + + + +

+
+

read from libsvm

+ +

file has the following format
++1 2:1 4:1 ...
+-1 3:1 4:1 ...
+....
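A single line of this format can be taken apart as in the sketch below. It follows the same idea as the reader documented on this page (label first, then `feature:value` pairs, both turned into symbols and floats), but `parse_libsvm_line` is a hypothetical standalone helper, not part of FSelector.

```ruby
# Hypothetical helper: split one libsvm line into [label, features].
def parse_libsvm_line(ln)
  label, *pairs = ln.chomp.split(/\s+/)
  feats = {}
  pairs.each do |fv|
    f, v = fv.split(':')
    feats[f.to_sym] = v.to_f   # feature index becomes a symbol, value a Float
  end
  [label.to_sym, feats]
end

label, feats = parse_libsvm_line("+1 2:1 4:1\n")
```

Features that do not appear on a line are simply absent from the resulting Hash, which is how the sparse format represents zero values.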

+ + +
+
+
+

Parameters:

+
    + +
  • + + fname + + + (String) + + + (defaults to: :stdin) + + + — +

    file to read from
    +:stdin => read from standard input instead of file

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 67
+
+def data_from_libsvm(fname=:stdin)
+  data = {}
+  
+  if fname == :stdin
+    ifs = $stdin
+  elsif not File.exist? fname
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "File '#{fname}' does not exist!"
+  else
+    ifs = File.open(fname)
+  end
+  
+  ifs.each_line do |ln|
+    label, *features = ln.chomp.split(/\s+/)
+    label = label.to_sym
+    data[label] = [] if not data.has_key? label
+    
+    feats = {}
+    features.each do |fv|
+      f, v = fv.split(/:/)
+      feats[f.to_sym] = v.to_f
+    end
+    
+    data[label] << feats
+  end
+  
+  # close file
+  ifs.close if not ifs == $stdin
+  
+  set_data(data)
+end
+
+
+ +
+

+ + - (Object) data_from_random(nsample = 100, nclass = 2, nfeature = 10, ncategory = 2, allow_mv = true) + + + +

+
+

read from random data (for test)

+ + +
+
+
+

Parameters:

+
    + +
  • + + nsample + + + (Integer) + + + (defaults to: 100) + + + — +

    number of total samples

    +
    + +
  • + +
  • + + nclass + + + (Integer) + + + (defaults to: 2) + + + — +

    number of classes

    +
    + +
  • + +
  • + + nfeature + + + (Integer) + + + (defaults to: 10) + + + — +

    number of features

    +
    + +
  • + +
  • + + ncategory + + + (Integer) + + + (defaults to: 2) + + + — +

    number of categories for each feature
    +1 => binary feature with only one bit
    +>1 => discrete feature with multiple values
    +otherwise => continuous feature with value in the range of [0, 1)

    +
    + +
  • + +
  • + + allow_mv + + + (true, false) + + + (defaults to: true) + + + — +

    whether missing feature values are allowed or not

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 20
+
+def data_from_random(nsample=100, nclass=2, nfeature=10, ncategory=2, allow_mv=true)
+  data = {}
+
+  nsample.times do
+    k = "c#{rand(nclass)}".to_sym
+    
+    data[k] = [] if not data.has_key? k
+    
+    feats = {}
+    fs = (1..nfeature).to_a
+    
+    if allow_mv
+      (rand(nfeature)).times do
+        v = fs[rand(fs.size)]
+        fs.delete(v)
+      end
+    end
+    
+    fs.sort.each do |i|
+      f = "f#{i}".to_sym
+      if ncategory == 1
+        feats[f] = 1
+      elsif ncategory > 1
+        feats[f] = rand(ncategory)
+      else
+        feats[f] = rand
+      end
+    end
+    
+    data[k] << feats
+  end
+
+  set_data(data)
+end
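The three cases of `ncategory` can be isolated in a small sketch that mirrors the branch in the method above. The helper name `random_feature_value` is hypothetical and only exists for this illustration.

```ruby
# Hypothetical helper mirroring the ncategory branch shown above.
def random_feature_value(ncategory)
  if ncategory == 1
    1                 # binary feature: present means 1
  elsif ncategory > 1
    rand(ncategory)   # discrete feature: integer in 0...ncategory
  else
    rand              # continuous feature: float in [0, 1)
  end
end

binary     = random_feature_value(1)
discrete   = random_feature_value(3)
continuous = random_feature_value(0)
```

Passing 0 or a negative number for `ncategory` is therefore the way to request continuous features.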
+
+
+ +
+

+ + - (Object) data_from_weka(fname = :stdin, quote_char = '"') + + + +

+
+ +
+ Note: +

it's ok if a string contains spaces, as long as it is quoted by quote_char

+
+
+ +

read from WEKA ARFF file

+ + +
+
+
+

Parameters:

+
    + +
  • + + fname + + + (String) + + + (defaults to: :stdin) + + + — +

    file to read from
    +:stdin => read from standard input instead of file

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 257
+
+def data_from_weka(fname=:stdin, quote_char='"')
+  data = {}
+  
+  if fname == :stdin
+    ifs = $stdin
+  elsif not File.exist? fname
+    abort "[#{__FILE__}@#{__LINE__}]: "+
+          "File '#{fname}' does not exist!"
+  else
+    ifs = File.open(fname)
+  end
+  
+  features, classes, comments = [], [], []
+  has_class, has_data = false, false
+  
+  ifs.each_line do |ln|
+    next if ln.blank? # blank lines
+    
+    ln = ln.chomp
+    
+    # comment line
+    if ln.comment?('%')
+      comments << ln
+    # relation
+    elsif ln =~ /^@RELATION/i
+      tmp, relation = ln.split_me(/\s+/, quote_char)
+      set_opt('@RELATION', relation)
+    # class attribute
+    elsif ln =~ /^@ATTRIBUTE\s+class\s+{(.+)}/i
+      has_class = true
+      classes = $1.split_me(/,\s*/, quote_char).to_sym
+      classes.each { |k| data[k] = [] }
+    # feature attribute (nominal)
+    elsif ln =~ /^@ATTRIBUTE\s+(\S+)\s+{(.+)}/i
+      f = $1.to_sym
+      features << f
+      #$2.split_me(/,\s*/, quote_char) # feature nominal values
+      set_opt(f, 'NOMINAL')
+    # feature attribute (integer, real, numeric, string, date)
+    elsif ln =~ /^@ATTRIBUTE/i
+      tmp, v1, v2 = ln.split_me(/\s+/, quote_char)
+      f = v1.to_sym
+      features << f
+      set_opt(f, v2.upcase) # record feature data type
+    # data header
+    elsif ln =~ /^@DATA/i
+      has_data = true
+    # data
+    elsif has_data and has_class
+      # read data section
+      if ln =~ /^{(.+)}$/ # sparse ARFF
+        feats = $1.split_me(/,\s*/, quote_char)
+        label = feats.pop.split_me(/\s+/, quote_char)[1]
+        label = label.to_sym
+        
+        fs = {}
+        nonzero_fi = []
+        feats.each do |fi_fv|
+          fi, fv = fi_fv.split_me(/\s+/, quote_char)
+          fi = fi.to_i             
+          add_feature_weka(fs, features[fi], fv)
+          nonzero_fi << fi
+        end
+        
+        # feature with zero value
+        features.each_with_index do |f0, i|
+          add_feature_weka(fs, f0, 0) if not nonzero_fi.include?(i)
+        end
+        
+        data[label] << fs
+      else # regular ARFF
+        feats = ln.split_me(/,\s*/, quote_char)
+        label = feats.pop.to_sym
+        
+        fs = {}
+        feats.each_with_index do |fv, i|
+          add_feature_weka(fs, features[i], fv)
+        end
+        
+        data[label] << fs if label
+      end
+    else
+      next
+    end
+  end
+  
+  # close file
+  ifs.close if not ifs == $stdin
+  
+  set_data(data)
+  set_classes(classes)
+  set_features(features)
+  set_opt('COMMENTS', comments) if not comments.empty?
+end
+
+
+ +
+

+ + - (Object) data_to_csv(fname = :stdout) + + + +

+
+

write to csv

+ +

file has the format with the first two rows +specifying features and their data types +and the remaining rows showing data

+ + +
+
+
+

Parameters:

+
    + +
  • + + fname + + + (String) + + + (defaults to: :stdout) + + + — +

    file to write
    +:stdout => write to standard output instead of file

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 221
+
+def data_to_csv(fname=:stdout)
+  if fname == :stdout
+    ofs = $stdout
+  else
+    ofs = File.open(fname, 'w')
+  end
+   
+  ofs.puts get_features.join(',')
+  ofs.puts get_features.collect { |f| 
+    get_opt(f) || 'STRING'
+  }.join(',')
+  
+  each_sample do |k, s|
+    ofs.print "#{k}"
+    each_feature do |f|
+      if s.has_key? f
+        ofs.print ",#{s[f]}"
+      else
+        ofs.print ","
+      end
+    end
+    ofs.puts
+  end
+  
+  # close file
+  ofs.close if not ofs == $stdout    
+end
+
+
+ +
+

+ + - (Object) data_to_libsvm(fname = :stdout) + + + +

+
+

write to libsvm

+ + +
+
+
+

Parameters:

+
    + +
  • + + fname + + + (String) + + + (defaults to: :stdout) + + + — +

    file to write
    +:stdout => write to standard output instead of file

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 106
+
+def data_to_libsvm(fname=:stdout)
+  if fname == :stdout
+    ofs = $stdout
+  else
+    ofs = File.open(fname, 'w')
+  end
+  
+  each_sample do |k, s|
+    ofs.print "#{k} "
+    s.keys.sort { |x, y| x.to_s.to_i <=> y.to_s.to_i }.each do |i|
+      ofs.print " #{i}:#{s[i]}" if not s[i].zero?
+    end
+    ofs.puts
+  end
+  
+  # close file
+  ofs.close if not ofs == $stdout
+end
+
+
+ +
+

+ + - (Object) data_to_weka(fname = :stdout, format = nil) + + + +

+
+

write to WEKA ARFF file

+ + +
+
+
+

Parameters:

+
    + +
  • + + fname + + + (String) + + + (defaults to: :stdout) + + + — +

    file to write
    +:stdout => write to standard output instead of file

    +
    + +
  • + +
  • + + format + + + (Symbol) + + + (defaults to: nil) + + + — +

    sparse or regular ARFF
    +:sparse => sparse ARFF, otherwise regular ARFF

    +
    + +
  • + +
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/fileio.rb', line 361
+
+def data_to_weka(fname=:stdout, format=nil)
+  if fname == :stdout
+    ofs = $stdout
+  else
+    ofs = File.open(fname, 'w')
+  end
+  
+  # comments
+  comments = get_opt('COMMENTS')
+  if comments
+    ofs.puts comments.join("\n")
+    ofs.puts
+  end         
+  
+  # relation
+  relation = get_opt('@RELATION')
+  if relation
+    ofs.puts "@RELATION #{relation}"
+  else
+    ofs.puts "@RELATION data_gen_by_FSelector"
+  end
+  
+  ofs.puts
+  
+  # feature attribute
+  each_feature do |f|
+    ofs.print "@ATTRIBUTE #{f} "
+    type = get_opt(f)
+    if type
+      if type == 'NOMINAL'
+        ofs.puts "{#{get_feature_values(f).uniq.sort.join(',')}}"
+      else
+        ofs.puts type
+      end
+    else # treat all other data types as string
+      ofs.puts "STRING"
+    end
+  end
+  
+  # class attribute
+  ofs.puts "@ATTRIBUTE class {#{get_classes.join(',')}}"
+  
+  ofs.puts
+  
+  # data header
+  ofs.puts "@DATA"
+  each_sample do |k, s|
+    if format == :sparse # sparse ARFF
+      ofs.print "{"
+      get_features.each_with_index do |f, i|
+        if s.has_key? f
+          ofs.print "#{i} #{s[f]}," if not s[f].zero?
+        else # missing value
+          ofs.print "#{i} ?,"
+        end
+      end
+      ofs.print "#{get_features.size} #{k}"
+      ofs.puts "}"
+    else
+      each_feature do |f|
+        if s.has_key? f
+          ofs.print "#{s[f]},"
+        else # missing value
+          ofs.print "?,"
+        end
+      end
+      ofs.puts "#{k}"
+    end
+  end
+  
+  # close file
+  ofs.close if not ofs == $stdout
+end
+
+
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/Normalizer.html b/doc/Normalizer.html new file mode 100644 index 0000000..d117bb4 --- /dev/null +++ b/doc/Normalizer.html @@ -0,0 +1,381 @@ + + + + + + Module: Normalizer + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Module: Normalizer + + + +

+ +
+ + + + + + + +
Included in:
+
FSelector::BaseContinuous
+ + + +
Defined in:
+
lib/fselector/algo_continuous/normalizer.rb
+ +
+
+ +

Overview

+
+

normalize continuous feature

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary + (collapse) +

+ + + + + + +
+

Instance Method Details

+ + +
+

+ + - (Object) normalize_log!(base = 10) + + + +

+
+

log transformation, requires positive feature values

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/normalizer.rb', line 6
+
+def normalize_log!(base=10)
+  each_sample do |k, s|
+    s.keys.each do |f|
+      s[f] = Math.log(s[f], base) if s[f] > 0.0
+    end
+  end
+end
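The per-sample step can be tried in isolation with the sketch below, which applies the same rule to a single feature Hash. `log_normalize` is a hypothetical helper; the only library call it relies on is Ruby's two-argument `Math.log(value, base)`.

```ruby
# Hypothetical helper: log-transform the positive values of one sample.
def log_normalize(sample, base = 10)
  sample.keys.each do |f|
    # values that are not positive are left unchanged, as in the method above
    sample[f] = Math.log(sample[f], base) if sample[f] > 0.0
  end
  sample
end

s = log_normalize({ :f1 => 100.0, :f2 => 0.0 })
```

Because non-positive values are skipped rather than rejected, a transformed data set can mix log-scaled and raw values, which is worth checking for before relying on the result.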
+
+
+ +
+

+ + - (Object) normalize_min_max!(min = 0.0, max = 1.0) + + + +

+
+

scale to [min,max], max > min

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/normalizer.rb', line 16
+
+def normalize_min_max!(min=0.0, max=1.0)
+  # first determine min and max for each feature
+  f2min_max = {}
+       
+  each_feature do |f|
+    fvs = get_feature_values(f)
+    f2min_max[f] = [fvs.min, fvs.max]
+  end
+  
+  # then normalize
+  each_sample do |k, s|
+    s.keys.each do |f|
+      min_v, max_v = f2min_max[f]
+      s[f] = min + (s[f]-min_v) * (max-min) / (max_v-min_v)
+    end
+  end
+end
+
+
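The scaling formula in normalize_min_max! divides by (max_v - min_v), which is zero for a feature that takes a single value. The sketch below shows the same formula on a plain array with a guard for that case. The helper name `min_max_scale` and the choice to map a constant feature to `min` are assumptions for illustration, not behaviour of FSelector.

```ruby
# Hypothetical helper (not part of FSelector): scale values into [min, max].
# A constant feature has zero span; here it is mapped to `min` (an assumption).
def min_max_scale(values, min = 0.0, max = 1.0)
  lo, hi = values.minmax
  span = (hi - lo).to_f
  return values.map { min } if span.zero?

  values.map { |v| min + (v - lo) * (max - min) / span }
end

puts min_max_scale([2, 4, 6]).inspect
puts min_max_scale([0, 10], -1.0, 1.0).inspect
puts min_max_scale([3, 3]).inspect
```

Without such a guard, float input would yield NaN and integer input would raise ZeroDivisionError.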
+ +
+

+ + - (Object) normalize_zscore! + + + +

+
+

normalization by z-score: (value - mean) / sd, or 0.0 when sd is zero

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/algo_continuous/normalizer.rb', line 36
+
+def normalize_zscore!
+  # first determine mean and sd for each feature
+  f2mean_sd = {}
+  
+  each_feature do |f|
+    fvs = get_feature_values(f)
+    f2mean_sd[f] = fvs.mean, fvs.sd
+  end
+  
+  # then normalize
+  each_sample do |k, s|
+    s.keys.each do |f|
+      mean, sd = f2mean_sd[f]
+      if sd.zero?
+        s[f] = 0.0
+      else
+        s[f] = (s[f]-mean)/sd
+      end
+    end
+  end
+end
+
+
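The listing above relies on `mean` and `sd` methods on the array of feature values, whose definitions are not shown in this excerpt. The following self-contained sketch assumes the sample standard deviation (dividing by n - 1); whether FSelector uses that or the population form is an assumption here. The helper name `zscores` is hypothetical.

```ruby
# Hypothetical helper (not part of FSelector): z-score a list of values.
# Assumes the sample standard deviation (n - 1 in the denominator).
def zscores(values)
  n = values.size
  return values.map { 0.0 } if n < 2

  mean = values.sum.to_f / n
  variance = values.sum { |v| (v - mean)**2 } / (n - 1)
  sd = Math.sqrt(variance)
  # Mirror the listing: a zero standard deviation maps every value to 0.0.
  values.map { |v| sd.zero? ? 0.0 : (v - mean) / sd }
end

puts zscores([1, 2, 3]).inspect
puts zscores([5, 5, 5]).inspect
```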
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/Rubystats.html b/doc/Rubystats.html new file mode 100644 index 0000000..03e902d --- /dev/null +++ b/doc/Rubystats.html @@ -0,0 +1,158 @@ + + + + + + Module: Rubystats + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Module: Rubystats + + + +

+ +
+ + + + + + + +
Included in:
+
FSelector::BiNormalSeparation, FSelector::FishersExactTest
+ + + +
Defined in:
+
lib/fselector/util.rb
+ +
+
+ +

Overview

+
+

adapted from the Ruby statistics libraries -- +Rubystats

+ +
    +
  • for Fisher's exact test (Rubystats::FishersExactTest.calculate()), +used by algo_binary/FishersExactTest.rb
  • +
  • for the inverse cumulative normal distribution function (Rubystats::NormalDistribution.get_icdf()), +used by algo_binary/BiNormalSeparation.rb. Note that the original get_icdf() function is +private, so it has to be made public; that is why the code is reproduced here.
  • +
+ + +
+
+
+ + +

Defined Under Namespace

+

+ + + + + Classes: FishersExactTest, NormalDistribution + + +

+ +

Constant Summary

+ +
+ +
MAX_VALUE = + +
+
1.2e290
+ +
SQRT2PI = + +
+
2.5066282746310005024157652848110452530069867406099
+ +
SQRT2 = + +
+
1.4142135623730950488016887242096980785696718753769
+ +
TWO_PI = + +
+
6.2831853071795864769252867665590057683943387987502
+ +
+ + + + + + + + + + +
+ + + + + \ No newline at end of file diff --git a/doc/Rubystats/FishersExactTest.html b/doc/Rubystats/FishersExactTest.html new file mode 100644 index 0000000..f89d293 --- /dev/null +++ b/doc/Rubystats/FishersExactTest.html @@ -0,0 +1,318 @@ + + + + + + Class: Rubystats::FishersExactTest + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: Rubystats::FishersExactTest + + + +

+ +
+ +
Inherits:
+
+ Object + +
    +
  • Object
  • + + + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/util.rb
+ +
+
+ +

Overview

+
+

Fisher's exact test calculator

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary +

+ + + + +
+

Constructor Details

+ +
+

+ + - (FishersExactTest) initialize + + + +

+
+

new()

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 147
+
+def initialize
+  @sn11    = 0.0
+  @sn1_    = 0.0
+  @sn_1    = 0.0
+  @sn      = 0.0
+  @sprob   = 0.0
+
+  @sleft   = 0.0
+  @sright  = 0.0 
+  @sless   = 0.0 
+  @slarg   = 0.0
+
+  @left    = 0.0
+  @right   = 0.0
+  @twotail = 0.0
+end
+
+
+ +
+ + +
+

Instance Method Details

+ + +
+

+ + - (Object) calculate(n11_, n12_, n21_, n22_) + + + +

+
+

Fisher's exact test

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 166
+
+def calculate(n11_,n12_,n21_,n22_)
+  n11_ *= -1 if n11_ < 0
+  n12_ *= -1 if n12_ < 0
+  n21_ *= -1 if n21_ < 0 
+  n22_ *= -1 if n22_ < 0 
+  n1_     = n11_ + n12_
+  n_1     = n11_ + n21_
+  n       = n11_ + n12_ + n21_ + n22_
+  prob    = exact(n11_,n1_,n_1,n)
+  left    = @sless
+  right   = @slarg
+  twotail = @sleft + @sright
+  twotail = 1 if twotail > 1
+  values_hash = { :left =>left, :right =>right, :twotail =>twotail }
+  return values_hash
+end
+
+
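The calculate method delegates to a private `exact` routine that is not shown in this excerpt. As an independent illustration of what the three returned p-values usually mean, here is a minimal sketch built on the hypergeometric distribution. All names (`log_choose`, `hypergeom`, `fisher_exact`) are hypothetical, and the two-tailed convention used (summing every table no more probable than the observed one) is a common choice that may or may not match the Rubystats code exactly.

```ruby
# Hypothetical sketch of Fisher's exact test for a 2x2 table
# [[n11, n12], [n21, n22]]; not the Rubystats implementation.

# log of the binomial coefficient C(n, k), via the log-gamma function
def log_choose(n, k)
  Math.lgamma(n + 1)[0] - Math.lgamma(k + 1)[0] - Math.lgamma(n - k + 1)[0]
end

# probability of seeing k in the top-left cell given fixed margins
def hypergeom(k, row1, col1, n)
  Math.exp(log_choose(col1, k) + log_choose(n - col1, row1 - k) - log_choose(n, row1))
end

def fisher_exact(n11, n12, n21, n22)
  row1 = n11 + n12
  col1 = n11 + n21
  n = row1 + n21 + n22
  lo = [0, row1 + col1 - n].max
  hi = [row1, col1].min
  probs = (lo..hi).map { |k| [k, hypergeom(k, row1, col1, n)] }
  observed = hypergeom(n11, row1, col1, n)

  left = probs.select { |k, _| k <= n11 }.sum { |_, pr| pr }
  right = probs.select { |k, _| k >= n11 }.sum { |_, pr| pr }
  # small relative tolerance so that ties with the observed table are included
  twotail = probs.select { |_, pr| pr <= observed * (1 + 1e-7) }.sum { |_, pr| pr }
  { left: left, right: right, twotail: [twotail, 1.0].min }
end

puts fisher_exact(3, 1, 1, 3).inspect
```

Working in log space keeps the binomial coefficients from overflowing for large counts, at the cost of a little floating-point error.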
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/Rubystats/NormalDistribution.html b/doc/Rubystats/NormalDistribution.html new file mode 100644 index 0000000..8bc0d79 --- /dev/null +++ b/doc/Rubystats/NormalDistribution.html @@ -0,0 +1,461 @@ + + + + + + Class: Rubystats::NormalDistribution + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: Rubystats::NormalDistribution + + + +

+ +
+ +
Inherits:
+
+ Object + +
    +
  • Object
  • + + + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/util.rb
+ +
+
+ +

Overview

+
+

Normal distribution

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary +

+ + + + +
+

Constructor Details

+ +
+

+ + - (NormalDistribution) initialize(mu = 0.0, sigma = 1.0) + + + +

+
+

Constructs a normal distribution (defaults to zero mean and +unity variance)

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 321
+
+def initialize(mu=0.0, sigma=1.0)
+  @mean = mu
+  if sigma <= 0.0
+    raise ArgumentError, "sigma must be positive"
+  end
+  @stdev = sigma
+  @variance = sigma**2
+  @pdf_denominator = SQRT2PI * Math.sqrt(@variance)
+  @cdf_denominator = SQRT2   * Math.sqrt(@variance)
+end
+
+
+ +
+ + +
+

Instance Method Details

+ + +
+

+ + - (Object) get_cdf(x) + + + +

+
+

Obtain single CDF value +Returns the probability that a stochastic variable x is less than X, +i.e. P(x<X)

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 344
+
+def get_cdf(x)
+  complementary_error( -(x - @mean) / @cdf_denominator) / 2
+end
+
+
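The expression in get_cdf is the standard identity relating the normal CDF to the complementary error function. A minimal standalone sketch follows, using Ruby's built-in `Math.erfc` in place of the library's private `complementary_error` helper, which is not shown in this excerpt. The function name `normal_cdf` is hypothetical.

```ruby
# Hypothetical helper: normal CDF via the complementary error function,
# CDF(x) = erfc(-(x - mu) / (sigma * sqrt(2))) / 2
def normal_cdf(x, mu = 0.0, sigma = 1.0)
  Math.erfc(-(x - mu) / (sigma * Math.sqrt(2.0))) / 2.0
end

puts normal_cdf(0.0)
puts normal_cdf(1.96)
```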
+ +
+

+ + - (Object) get_icdf(p) + + + +

+
+

Obtain single inverse CDF value. +Returns the value X for which P(x<X) = p.

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 351
+
+def get_icdf(p)
+  check_range(p)
+  if p == 0.0
+    return -MAX_VALUE
+  end
+  if p == 1.0
+    return MAX_VALUE
+  end
+  if p == 0.5
+    return @mean
+  end
+
+  mean_save = @mean
+  var_save = @variance
+  pdf_D_save = @pdf_denominator
+  cdf_D_save = @cdf_denominator
+  @mean = 0.0
+  @variance = 1.0
+  @pdf_denominator = Math.sqrt(TWO_PI)
+  @cdf_denominator = SQRT2
+  x = find_root(p, 0.0, -100.0, 100.0)
+  #scale back
+  @mean = mean_save
+  @variance = var_save
+  @pdf_denominator = pdf_D_save
+  @cdf_denominator = cdf_D_save
+  return x * Math.sqrt(@variance) + @mean
+end
+
+
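get_icdf solves for the quantile on the standard normal distribution with a root finder and then rescales the result by the standard deviation and mean. The private `find_root` routine is not shown in this excerpt, so the sketch below substitutes plain bisection on the same [-100, 100] bracket. The function name `normal_icdf` and the use of bisection are assumptions for illustration.

```ruby
# Hypothetical helper: inverse normal CDF by bisection on the standard
# normal, then rescaled to the requested mean and standard deviation.
def normal_icdf(p, mu = 0.0, sigma = 1.0)
  raise ArgumentError, "p must lie strictly between 0 and 1" unless p > 0.0 && p < 1.0

  lo = -100.0
  hi = 100.0
  100.times do
    mid = (lo + hi) / 2.0
    cdf = Math.erfc(-mid / Math.sqrt(2.0)) / 2.0
    if cdf < p
      lo = mid
    else
      hi = mid
    end
  end
  (lo + hi) / 2.0 * sigma + mu
end

puts normal_icdf(0.975)
puts normal_icdf(0.5, 10.0, 2.0)
```

Bisection is slower than a Newton-style solver but cannot diverge, which makes it a reasonable default for a monotone function like the CDF.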
+ +
+

+ + - (Object) get_pdf(x) + + + +

+
+

Obtain single PDF value. +Returns the probability density of a stochastic variable x at the value X +(a density rather than the probability P(x=X), which is zero for a continuous distribution)

+ + +
+
+
+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 336
+
+def get_pdf(x)
+  Math.exp( -((x-@mean)**2) / (2 * @variance)) / @pdf_denominator
+end
+
+
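The value computed by get_pdf is a density, so it can exceed 1 when the standard deviation is small. A minimal standalone sketch of the same formula, with a hypothetical function name `normal_pdf`:

```ruby
# Hypothetical helper: normal density
# f(x) = exp(-(x - mu)^2 / (2 * sigma^2)) / (sqrt(2 * pi) * sigma)
def normal_pdf(x, mu = 0.0, sigma = 1.0)
  Math.exp(-((x - mu)**2) / (2.0 * sigma**2)) / (Math.sqrt(2.0 * Math::PI) * sigma)
end

puts normal_pdf(0.0)            # peak of the standard normal density
puts normal_pdf(0.0, 0.0, 0.1)  # a narrow distribution has a peak above 1
```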
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/String.html b/doc/String.html new file mode 100644 index 0000000..ac45887 --- /dev/null +++ b/doc/String.html @@ -0,0 +1,430 @@ + + + + + + Class: String + + — Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Class: String + + + +

+ +
+ +
Inherits:
+
+ Object + +
    +
  • Object
  • + + + +
+ + + + + + + + + +
Defined in:
+
lib/fselector/util.rb
+ +
+
+ +

Overview

+
+

add functions to String class

+ + +
+
+
+ + +
+ + + + + + + +

+ Instance Method Summary +

+ + + + + + +
+

Instance Method Details

+ + +
+

+ + - (Boolean) blank? + + + +

+
+

blank line?

+ + +
+
+
+ +

Returns:

+
    + +
  • + + + (Boolean) + + + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 91
+
+def blank?
+  return self =~ /^\s*$/
+end
+
+
+ +
+

+ + - (Boolean) comment?(char = '#') + + + +

+
+

comment line?

+ + +
+
+
+

Parameters:

+
    + +
  • + + char + + + (String) + + + (defaults to: '#') + + + — +

    line beginning char

    +
    + +
  • + +
+ +

Returns:

+
    + +
  • + + + (Boolean) + + + +
  • + +
+ +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 85
+
+def comment?(char='#')
+  return self =~ /^#{char}/
+end
+
+
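Both blank? and comment? are documented as returning a Boolean, but `=~` returns a match position or nil, and comment? interpolates `char` into the pattern unescaped. The sketch below shows one possible stricter variant as standalone functions. The names `blank_line?` and `comment_line?` are hypothetical, and `String#match?` assumes Ruby 2.4 or later.

```ruby
# Hypothetical helpers (not part of FSelector) that return true/false.

# true when the whole string is empty or whitespace only
def blank_line?(line)
  line.match?(/\A\s*\z/)
end

# true when the string starts with the comment marker; the marker is
# escaped so that characters such as '*' are treated literally
def comment_line?(line, char = '#')
  line.match?(/\A#{Regexp.escape(char)}/)
end

puts blank_line?("  \t\n")
puts comment_line?("# a comment")
puts comment_line?("* starred", "*")
```

Anchoring with `\A` and `\z` instead of `^` and `$` tests the whole string rather than any single line inside it, which matters only if the input can contain embedded newlines.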
+ +
+

+ + - (Array<String>) split_me(delim_regex, quote_char = "'") + + + +

+
+

Enhanced String.split with a quote char: a substring enclosed in a pair of quote chars +is treated as a whole even if it contains a match for the split regular expression. This is especially +useful for parsing a CSV file that contains commas inside a double-quoted string, +e.g. 'a,"b, c",d'.split_me(/,/, '"') => ['a', '"b, c"', 'd']

+ + +
+
+
+

Parameters:

+
    + +
  • + + delim_regex + + + (Regex) + + + + — +

    regular expression for split

    +
    + +
  • + +
  • + + quote_char + + + (String) + + + + — +

    quote char such as ' and "

    +
    + +
  • + +
+ +

Returns:

+ + +
+ + + + +
+
+
+
+
+
# File 'lib/fselector/util.rb', line 107
+
+def split_me(delim_regex, quote_char="'")
+  d, q = delim_regex, quote_char
+  if not self.count(q) % 2 == 0
+    $stderr.puts "unpaired char of #{q} found, return nil"
+    return nil
+  end
+  self.split(/#{d.source} (?=(?:[^#{q}]* #{q} [^#{q}]* #{q})* [^#{q}]*$) /x)
+end
+
+
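The lookahead in split_me accepts a delimiter only when an even number of quote characters follows it, which means the delimiter sits outside any quoted run. Here is a minimal standalone sketch of the same idea. The function name `quote_aware_split` is hypothetical, and escaping the quote character with `Regexp.escape` is an addition not present in the listing above.

```ruby
# Hypothetical helper: split on delim_regex, ignoring delimiters that
# fall inside a pair of quote characters. Returns nil on unpaired quotes.
def quote_aware_split(str, delim_regex, quote_char = "'")
  return nil unless str.count(quote_char).even?

  q = Regexp.escape(quote_char)
  # the delimiter must be followed by zero or more complete quoted pairs
  # and then no further quote characters up to the end of the string
  str.split(/#{delim_regex.source}(?=(?:[^#{q}]*#{q}[^#{q}]*#{q})*[^#{q}]*$)/)
end

puts quote_aware_split('a,"b, c",d', /,/, '"').inspect
```

Note that the quote characters themselves stay in the returned fields; a caller who wants bare values would need to strip them afterwards.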
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/_index.html b/doc/_index.html new file mode 100644 index 0000000..d4f355d --- /dev/null +++ b/doc/_index.html @@ -0,0 +1,499 @@ + + + + + + Documentation by YARD 0.7.5 + + + + + + + + + + + + + + + + + + + + + + +

Documentation by YARD 0.7.5

+
+

Alphabetic Index

+ +

File Listing

+ + +
+

Namespace Listing A-Z

+ + + + + + + + +
+ + + + + + + + + + + + + + +
    +
  • E
  • +
      + +
    • + Ensemble + + (FSelector) + +
    • + +
    +
+ + + + + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + +
    +
  • T
  • +
      + +
    • + TScore + + (FSelector) + +
    • + +
    +
+ +
+ +
+ +
+ + + + + \ No newline at end of file diff --git a/doc/class_list.html b/doc/class_list.html new file mode 100644 index 0000000..4f9fe84 --- /dev/null +++ b/doc/class_list.html @@ -0,0 +1,47 @@ + + + + + + + + + + + + + + + + + + + + +
+

Class List

+ + + + +
+ + diff --git a/doc/css/common.css b/doc/css/common.css new file mode 100644 index 0000000..cf25c45 --- /dev/null +++ b/doc/css/common.css @@ -0,0 +1 @@ +/* Override this file with custom rules */ \ No newline at end of file diff --git a/doc/css/full_list.css b/doc/css/full_list.css new file mode 100644 index 0000000..3c03296 --- /dev/null +++ b/doc/css/full_list.css @@ -0,0 +1,55 @@ +body { + margin: 0; + font-family: "Lucida Sans", "Lucida Grande", Verdana, Arial, sans-serif; + font-size: 13px; + height: 101%; + overflow-x: hidden; +} + +h1 { padding: 12px 10px; padding-bottom: 0; margin: 0; font-size: 1.4em; } +.clear { clear: both; } +#search { position: absolute; right: 5px; top: 9px; padding-left: 24px; } +#content.insearch #search, #content.insearch #noresults { background: url(data:image/gif;base64,R0lGODlhEAAQAPYAAP///wAAAPr6+pKSkoiIiO7u7sjIyNjY2J6engAAAI6OjsbGxjIyMlJSUuzs7KamppSUlPLy8oKCghwcHLKysqSkpJqamvT09Pj4+KioqM7OzkRERAwMDGBgYN7e3ujo6Ly8vCoqKjY2NkZGRtTU1MTExDw8PE5OTj4+PkhISNDQ0MrKylpaWrS0tOrq6nBwcKysrLi4uLq6ul5eXlxcXGJiYoaGhuDg4H5+fvz8/KKiohgYGCwsLFZWVgQEBFBQUMzMzDg4OFhYWBoaGvDw8NbW1pycnOLi4ubm5kBAQKqqqiQkJCAgIK6urnJyckpKSjQ0NGpqatLS0sDAwCYmJnx8fEJCQlRUVAoKCggICLCwsOTk5ExMTPb29ra2tmZmZmhoaNzc3KCgoBISEiIiIgAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAACH/C05FVFNDQVBFMi4wAwEAAAAh/hpDcmVhdGVkIHdpdGggYWpheGxvYWQuaW5mbwAh+QQJCAAAACwAAAAAEAAQAAAHaIAAgoMgIiYlg4kACxIaACEJCSiKggYMCRselwkpghGJBJEcFgsjJyoAGBmfggcNEx0flBiKDhQFlIoCCA+5lAORFb4AJIihCRbDxQAFChAXw9HSqb60iREZ1omqrIPdJCTe0SWI09GBACH5BAkIAAAALAAAAAAQABAAAAdrgACCgwc0NTeDiYozCQkvOTo9GTmDKy8aFy+NOBA7CTswgywJDTIuEjYFIY0JNYMtKTEFiRU8Pjwygy4ws4owPyCKwsMAJSTEgiQlgsbIAMrO0dKDGMTViREZ14kYGRGK38nHguHEJcvTyIEAIfkECQgAAAAsAAAAABAAEAAAB2iAAIKDAggPg4iJAAMJCRUAJRIqiRGCBI0WQEEJJkWDERkYAAUKEBc4Po1GiKKJHkJDNEeKig4URLS0ICImJZAkuQAhjSi/wQyNKcGDCyMnk8u5rYrTgqDVghgZlYjcACTA1sslvtHRgQAh+QQJCAAAACwAAAAAEAAQAAAHZ4AAgoOEhYaCJSWHgxGDJCQARAtOUoQRGRiFD0kJUYWZhUhKT1OLhR8wB
aaFBzQ1NwAlkIszCQkvsbOHL7Y4q4IuEjaqq0ZQD5+GEEsJTDCMmIUhtgk1lo6QFUwJVDKLiYJNUd6/hoEAIfkECQgAAAAsAAAAABAAEAAAB2iAAIKDhIWGgiUlh4MRgyQkjIURGRiGGBmNhJWHm4uen4ICCA+IkIsDCQkVACWmhwSpFqAABQoQF6ALTkWFnYMrVlhWvIKTlSAiJiVVPqlGhJkhqShHV1lCW4cMqSkAR1ofiwsjJyqGgQAh+QQJCAAAACwAAAAAEAAQAAAHZ4AAgoOEhYaCJSWHgxGDJCSMhREZGIYYGY2ElYebi56fhyWQniSKAKKfpaCLFlAPhl0gXYNGEwkhGYREUywag1wJwSkHNDU3D0kJYIMZQwk8MjPBLx9eXwuETVEyAC/BOKsuEjYFhoEAIfkECQgAAAAsAAAAABAAEAAAB2eAAIKDhIWGgiUlh4MRgyQkjIURGRiGGBmNhJWHm4ueICImip6CIQkJKJ4kigynKaqKCyMnKqSEK05StgAGQRxPYZaENqccFgIID4KXmQBhXFkzDgOnFYLNgltaSAAEpxa7BQoQF4aBACH5BAkIAAAALAAAAAAQABAAAAdogACCg4SFggJiPUqCJSWGgkZjCUwZACQkgxGEXAmdT4UYGZqCGWQ+IjKGGIUwPzGPhAc0NTewhDOdL7Ykji+dOLuOLhI2BbaFETICx4MlQitdqoUsCQ2vhKGjglNfU0SWmILaj43M5oEAOwAAAAAAAAAAAA==) no-repeat center left; } +#full_list { padding: 0; list-style: none; margin-left: 0; } +#full_list ul { padding: 0; } +#full_list li { padding: 5px; padding-left: 12px; margin: 0; font-size: 1.1em; list-style: none; } +#noresults { padding: 7px 12px; } +#content.insearch #noresults { margin-left: 7px; } +ul.collapsed ul, ul.collapsed li { display: none; } +ul.collapsed.search_uncollapsed { display: block; } +ul.collapsed.search_uncollapsed li { display: list-item; } +li a.toggle { cursor: default; position: relative; left: -5px; top: 4px; text-indent: -999px; width: 10px; height: 9px; margin-left: -10px; display: block; float: left; background: 
url(data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAAK8AAACvABQqw0mAAAABx0RVh0U29mdHdhcmUAQWRvYmUgRmlyZXdvcmtzIENTM5jWRgMAAAAVdEVYdENyZWF0aW9uIFRpbWUAMy8xNC8wOeNZPpQAAAE2SURBVDiNrZTBccIwEEXfelIAHUA6CZ24BGaWO+FuzZAK4k6gg5QAdGAq+Bxs2Yqx7BzyL7Llp/VfzZeQhCTc/ezuGzKKnKSzpCxXJM8fwNXda3df5RZETlIt6YUzSQDs93sl8w3wBZxCCE10GM1OcWbWjB2mWgEH4Mfdyxm3PSepBHibgQE2wLe7r4HjEidpnXMYdQPKEMJcsZ4zs2POYQOcaPfwMVOo58zsAdMt18BuoVDPxUJRacELbXv3hUIX2vYmOUvi8C8ydz/ThjXrqKqqLbDIAdsCKBd+Wo7GWa7o9qzOQHVVVXeAbs+yHHCH4aTsaCOQqunmUy1yBUAXkdMIfMlgF5EXLo2OpV/c/Up7jG4hhHcYLgWzAZXUc2b2ixsfvc/RmNNfOXD3Q/oeL9axJE1yT9IOoUu6MGUkAAAAAElFTkSuQmCC) no-repeat bottom left; } +li.collapsed a.toggle { opacity: 0.5; cursor: default; background-position: top left; } +li { color: #888; cursor: pointer; } +li.deprecated { text-decoration: line-through; font-style: italic; } +li.r1 { background: #f0f0f0; } +li.r2 { background: #fafafa; } +li:hover { background: #ddd; } +li small:before { content: "("; } +li small:after { content: ")"; } +li small.search_info { display: none; } +a:link, a:visited { text-decoration: none; color: #05a; } +li.clicked { background: #05a; color: #ccc; } +li.clicked a:link, li.clicked a:visited { color: #eee; } +li.clicked a.toggle { opacity: 0.5; background-position: bottom right; } +li.collapsed.clicked a.toggle { background-position: top right; } +#search input { border: 1px solid #bbb; -moz-border-radius: 3px; -webkit-border-radius: 3px; } +#nav { margin-left: 10px; font-size: 0.9em; display: none; color: #aaa; } +#nav a:link, #nav a:visited { color: #358; } +#nav a:hover { background: transparent; color: #5af; } + +.frames #content h1 { margin-top: 0; } +.frames li { white-space: nowrap; cursor: normal; } +.frames li small { display: block; font-size: 0.8em; } +.frames li small:before { content: ""; } +.frames li small:after { content: ""; } +.frames li small.search_info { display: none; } +.frames #search { width: 170px; position: 
static; margin: 3px; margin-left: 10px; font-size: 0.9em; color: #888; padding-left: 0; padding-right: 24px; } +.frames #content.insearch #search { background-position: center right; } +.frames #search input { width: 110px; } +.frames #nav { display: block; } + +#full_list.insearch li { display: none; } +#full_list.insearch li.found { display: list-item; padding-left: 10px; } +#full_list.insearch li a.toggle { display: none; } +#full_list.insearch li small.search_info { display: block; } diff --git a/doc/css/style.css b/doc/css/style.css new file mode 100644 index 0000000..c8ff2bf --- /dev/null +++ b/doc/css/style.css @@ -0,0 +1,322 @@ +body { + padding: 0 20px; + font-family: "Lucida Sans", "Lucida Grande", Verdana, Arial, sans-serif; + font-size: 13px; +} +body.frames { padding: 0 5px; } +h1 { font-size: 25px; margin: 1em 0 0.5em; padding-top: 4px; border-top: 1px dotted #d5d5d5; } +h1.noborder { border-top: 0px; margin-top: 0; padding-top: 4px; } +h1.title { margin-bottom: 10px; } +h1.alphaindex { margin-top: 0; font-size: 22px; } +h2 { + padding: 0; + padding-bottom: 3px; + border-bottom: 1px #aaa solid; + font-size: 1.4em; + margin: 1.8em 0 0.5em; +} +h2 small { font-weight: normal; font-size: 0.7em; display: block; float: right; } +.clear { clear: both; } +.inline { display: inline; } +.inline p:first-child { display: inline; } +.docstring h1, .docstring h2, .docstring h3, .docstring h4 { padding: 0; border: 0; border-bottom: 1px dotted #bbb; } +.docstring h1 { font-size: 1.2em; } +.docstring h2 { font-size: 1.1em; } +.docstring h3, .docstring h4 { font-size: 1em; border-bottom: 0; padding-top: 10px; } +.summary_desc .object_link, .docstring .object_link { font-family: monospace; } +.rdoc-term { padding-right: 25px; font-weight: bold; } +.rdoc-list p { margin: 0; padding: 0; margin-bottom: 4px; } + +/* style for