<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
<title>
File: README
— FSelector Documentation
</title>
<link rel="stylesheet" href="css/style.css" type="text/css" media="screen" charset="utf-8" />
<link rel="stylesheet" href="css/common.css" type="text/css" media="screen" charset="utf-8" />
<script type="text/javascript" charset="utf-8">
relpath = '';
if (relpath != '') relpath += '/';
</script>
<script type="text/javascript" charset="utf-8" src="js/jquery.js"></script>
<script type="text/javascript" charset="utf-8" src="js/app.js"></script>
</head>
<body>
<script type="text/javascript" charset="utf-8">
if (window.top.frames.main) document.body.className = 'frames';
</script>
<div id="header">
<div id="menu">
<a href="_index.html" title="Index">Index</a> »
<span class="title">File: README</span>
<div class="noframes"><span class="title">(</span><a href="." target="_top">no frames</a><span class="title">)</span></div>
</div>
<div id="search">
<a id="class_list_link" href="#">Class List</a>
<a id="method_list_link" href="#">Method List</a>
<a id="file_list_link" href="#">File List</a>
</div>
<div class="clear"></div>
</div>
<iframe id="search_frame"></iframe>
<div id="content"><div id='filecontents'><h1>FSelector: a Ruby gem for feature selection</h1>
<p><strong>Home</strong>: <a href="https://rubygems.org/gems/fselector">https://rubygems.org/gems/fselector</a><br>
<strong>Source Code</strong>: <a href="https://github.com/need47/fselector">https://github.com/need47/fselector</a><br>
<strong>Documentation</strong>: <a href="http://rubydoc.info/gems/fselector/frames">http://rubydoc.info/gems/fselector/frames</a><br>
<strong>Publication</strong>: <a href="http://bioinformatics.oxfordjournals.org/content/28/21/2851">Bioinformatics, 2012, 28, 2851-2852</a><br>
<strong>Author</strong>: Tiejun Cheng<br>
<strong>Email</strong>: <a href="mailto:[email protected]">[email protected]</a><br>
<strong>Copyright</strong>: 2012<br>
<strong>License</strong>: MIT License<br>
<strong>Latest Version</strong>: 1.4.0<br>
<strong>Release Date</strong>: 2012-11-05</p>
<h2>Synopsis</h2>
<p>FSelector is a Ruby gem that aims to integrate various feature
selection algorithms and related functions into a single
package. Feel free to contact me (<a href="mailto:[email protected]">[email protected]</a>) if you'd like to
contribute your own algorithms or report a bug. FSelector allows users
to perform feature selection with either a single algorithm or an
ensemble of multiple algorithms. It also supports other common tasks, including
normalization and discretization of continuous data and replacement of
missing feature values according to a chosen criterion. FSelector takes a
full-feature data set in CSV, LibSVM or WEKA file format and
outputs a reduced data set containing only the selected subset of features, which
can later be used as input for machine learning software
such as LibSVM and WEKA. FSelector, as a collection of filter methods,
does not implement any classifier such as support vector machines or
random forests. See below for a list of FSelector's features,
<a href="file.ChangeLog.html" title="ChangeLog">ChangeLog</a> for updates, and <a href="file.HowToContribute.html" title="HowToContribute">HowToContribute</a> if you want
to contribute.</p>
<h2>Feature List</h2>
<p><strong>1. supported input/output file types</strong></p>
<ul>
<li>csv</li>
<li>libsvm</li>
<li>weka ARFF</li>
<li>on-line dataset in one of the above three formats (read only)</li>
<li>random data (read only, for test purpose)</li>
</ul>
<p><strong>2. available feature selection/ranking algorithms</strong></p>
<pre class="code ruby"><code>algorithm                        shortcut    algo_type  applicability  feature_type
--------------------------------------------------------------------------------------------------
Accuracy                         Acc         weighting  multi-class    discrete
AccuracyBalanced                 Acc2        weighting  multi-class    discrete
BiNormalSeparation               BNS         weighting  multi-class    discrete
CFS_d                            CFS_d       searching  multi-class    discrete
ChiSquaredTest                   CHI         weighting  multi-class    discrete
CorrelationCoefficient           CC          weighting  multi-class    discrete
DocumentFrequency                DF          weighting  multi-class    discrete
F1Measure                        F1          weighting  multi-class    discrete
FishersExactTest                 FET         weighting  multi-class    discrete
FastCorrelationBasedFilter       FCBF        searching  multi-class    discrete
GiniIndex                        GI          weighting  multi-class    discrete
GMean                            GM          weighting  multi-class    discrete
GSSCoefficient                   GSS         weighting  multi-class    discrete
InformationGain                  IG          weighting  multi-class    discrete
INTERACT                         INTERACT    searching  multi-class    discrete
JMeasure                         JM          weighting  multi-class    discrete
KLDivergence                     KLD         weighting  multi-class    discrete
MatthewsCorrelationCoefficient   MCC, PHI    weighting  multi-class    discrete
McNemarsTest                     MNT         weighting  multi-class    discrete
OddsRatio                        OR          weighting  multi-class    discrete
OddsRatioNumerator               ORN         weighting  multi-class    discrete
PhiCoefficient                   PHI         weighting  multi-class    discrete
Power                            Power       weighting  multi-class    discrete
Precision                        Precision   weighting  multi-class    discrete
ProbabilityRatio                 PR          weighting  multi-class    discrete
Recall                           Recall      weighting  multi-class    discrete
Relief_d                         Relief_d    weighting  two-class      discrete
ReliefF_d                        ReliefF_d   weighting  multi-class    discrete
Sensitivity                      SN, Recall  weighting  multi-class    discrete
Specificity                      SP          weighting  multi-class    discrete
SymmetricalUncertainty           SU          weighting  multi-class    discrete
BetweenWithinClassesSumOfSquare  BSS_WSS     weighting  multi-class    continuous
CFS_c                            CFS_c       searching  multi-class    continuous
FTest                            FT          weighting  multi-class    continuous
KS_CCBF                          KS_CCBF     searching  multi-class    continuous
KSTest                           KST         weighting  two-class      continuous
PMetric                          PM          weighting  two-class      continuous
Relief_c                         Relief_c    weighting  two-class      continuous
ReliefF_c                        ReliefF_c   weighting  multi-class    continuous
TScore                           TS          weighting  two-class      continuous
WilcoxonRankSum                  WRS         weighting  two-class      continuous
LasVegasFilter                   LVF         searching  multi-class    discrete, continuous, mixed
LasVegasIncremental              LVI         searching  multi-class    discrete, continuous, mixed
Random                           Rand        weighting  multi-class    discrete, continuous, mixed
RandomSubset                     RandS       searching  multi-class    discrete, continuous, mixed
</code></pre>
<p><strong>note for feature selection interface:</strong><br>
there are two types of filter algorithms: filter_by_feature_weighting and filter_by_feature_searching </p>
<ul>
<li>for the former: use either <strong>select_feature_by_score!</strong> or <strong>select_feature_by_rank!</strong></li>
<li>for the latter: use <strong>select_feature!</strong></li>
</ul>
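<p>The two selection calls for weighting algorithms can be illustrated in plain Ruby. This is a minimal sketch that does not use FSelector itself; the feature names, scores, threshold and top_k are made-up values for illustration only.</p>
<pre class="code ruby"><code># Hypothetical per-feature scores, as a weighting algorithm might assign them.
scores = { f1: 0.42, f2: 0.003, f3: 0.17, f4: 0.08 }

# Selection by score: keep every feature whose score exceeds a threshold.
threshold = 0.01
by_score = scores.select { |_name, score| score > threshold }.keys

# Selection by rank: order by descending score and keep the top k features.
top_k = 2
by_rank = scores.sort_by { |_name, score| -score }.first(top_k).map { |name, _score| name }

p by_score
p by_rank
</code></pre>
<p>A searching algorithm, by contrast, evaluates candidate feature subsets rather than scoring features one by one, so there is presumably no per-feature score or rank to cut on; it is paired with select_feature! instead.</p>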
<p><strong>3. feature selection approaches</strong></p>
<ul>
<li>by a single algorithm</li>
<li>by multiple algorithms in a tandem manner</li>
<li>by multiple algorithms in an ensemble manner (sharing the same feature selection interface as a single algorithm)</li>
</ul>
<p><strong>4. available normalization and discretization algorithms for continuous features</strong></p>
<pre class="code ruby"><code>algorithm                        note
---------------------------------------------------------------------------------------
normalize_by_log!                normalize by logarithmic transformation
normalize_by_min_max!            normalize by scaling into [min, max]
normalize_by_zscore!             normalize by converting into zscore
discretize_by_equal_width!       discretize by equal width among intervals
discretize_by_equal_frequency!   discretize by equal frequency among intervals
discretize_by_ChiMerge!          discretize by ChiMerge algorithm
discretize_by_Chi2!              discretize by Chi2 algorithm
discretize_by_MID!               discretize by Multi-Interval Discretization algorithm
discretize_by_TID!               discretize by Three-Interval Discretization algorithm
</code></pre>
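<p>To make the simpler transformations in the table concrete, here is a plain-Ruby sketch of min-max scaling, z-score conversion and equal-width discretization on a single feature column. It follows the textbook definitions and is not taken from FSelector's source; the helper names, the use of the sample standard deviation, and the bin labels 0..n_bins-1 are assumptions made for this example.</p>
<pre class="code ruby"><code># A hypothetical continuous feature column.
values = [2.0, 4.0, 6.0, 8.0, 10.0]

# Min-max scaling into [lo, hi].
def min_max(values, lo = 0.0, hi = 1.0)
  min, max = values.minmax
  span = (max - min).to_f
  return values.map { lo } if span.zero? # constant column
  values.map { |v| lo + (v - min) * (hi - lo) / span }
end

# Z-score: subtract the mean, then divide by the sample standard deviation.
# Assumes at least two values that are not all identical.
def zscore(values)
  n = values.size
  mean = values.sum.to_f / n
  sd = Math.sqrt(values.sum { |v| (v - mean)**2 } / (n - 1))
  values.map { |v| (v - mean) / sd }
end

# Equal-width discretization into n_bins intervals, labelled 0..n_bins-1.
def equal_width(values, n_bins)
  min, max = values.minmax
  width = (max - min).to_f / n_bins
  return values.map { 0 } if width.zero? # constant column
  # The maximum falls on the upper edge, so clamp it into the last bin.
  values.map { |v| [((v - min) / width).floor, n_bins - 1].min }
end

p min_max(values)
p zscore(values)
p equal_width(values, 2)
</code></pre>
<p>The supervised methods in the table (ChiMerge, Chi2, MID, TID) also use the class labels to choose interval boundaries, so they are not covered by this sketch.</p>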
<p><strong>5. available algorithms for replacing missing feature values</strong></p>
<pre class="code ruby"><code>algorithm                     note                                    feature_type
---------------------------------------------------------------------------------------------
replace_by_fixed_value!       replace by a fixed value                discrete, continuous
replace_by_mean_value!        replace by mean feature value           continuous
replace_by_median_value!      replace by median feature value         continuous
replace_by_knn_value!         replace by weighted knn feature value   continuous
replace_by_most_seen_value!   replace by most seen feature value      discrete
</code></pre>
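<p>Two of these strategies can be sketched in a few lines of plain Ruby. The example below is only an illustration of the idea, not FSelector's code: it assumes a missing value is represented as nil in a single feature column, the helper names are hypothetical, and it relies on tally, which needs Ruby 2.7 or later.</p>
<pre class="code ruby"><code># Replace nil with the mean of the observed values (continuous feature).
def replace_by_mean(values)
  observed = values.compact
  mean = observed.sum.to_f / observed.size
  values.map { |v| v.nil? ? mean : v }
end

# Replace nil with the most frequently observed value (discrete feature).
# On a tie, the value seen first wins.
def replace_by_most_seen(values)
  most_seen = values.compact.tally.max_by { |_value, count| count }.first
  values.map { |v| v.nil? ? most_seen : v }
end

p replace_by_mean([1.0, nil, 3.0])
p replace_by_most_seen(['a', nil, 'b', 'a'])
</code></pre>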
<h2>Installing</h2>
<p>To install FSelector, use the following command:</p>
<pre class="code ruby"><code>$ gem install fselector
</code></pre>
<p><strong>note:</strong> Since version 0.5.0, FSelector has used the RinRuby gem (<a href="http://rinruby.ddahl.org">http://rinruby.ddahl.org</a>)
as a seamless bridge to the statistical routines in R (<a href="http://www.r-project.org">http://www.r-project.org</a>),
which greatly expands the range of algorithms FSelector can include, especially those relying
on statistical tests. For this to work, please install R beforehand. RinRuby should be
installed automatically with FSelector by the above command.</p>
<h2>Usage</h2>
<p><strong>1. feature selection by a single algorithm</strong></p>
<pre class="code ruby"><code><span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>fselector</span><span class='tstring_end'>'</span></span>
<span class='comment'># use InformationGain (IG) as a feature selection algorithm
</span><span class='id identifier rubyid_r1'>r1</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>IG</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span>
<span class='comment'># read from random data (or csv, libsvm, weka ARFF file)
</span><span class='comment'># no. of samples: 100
</span><span class='comment'># no. of classes: 2
</span><span class='comment'># no. of features: 15
</span><span class='comment'># no. of possible values for each feature: 3
</span><span class='comment'># allow missing values: true
</span><span class='id identifier rubyid_r1'>r1</span><span class='period'>.</span><span class='id identifier rubyid_data_from_random'>data_from_random</span><span class='lparen'>(</span><span class='int'>100</span><span class='comma'>,</span> <span class='int'>2</span><span class='comma'>,</span> <span class='int'>15</span><span class='comma'>,</span> <span class='int'>3</span><span class='comma'>,</span> <span class='kw'>true</span><span class='rparen'>)</span>
<span class='comment'># number of features before feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>"</span><span class='tstring_content'> # features (before): </span><span class='tstring_end'>"</span></span><span class='op'>+</span> <span class='id identifier rubyid_r1'>r1</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># select the top-ranked features with scores >0.01
</span><span class='id identifier rubyid_r1'>r1</span><span class='period'>.</span><span class='id identifier rubyid_select_feature_by_score!'>select_feature_by_score!</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>>0.01</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span>
<span class='comment'># number of features after feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>"</span><span class='tstring_content'> # features (after): </span><span class='tstring_end'>"</span></span><span class='op'>+</span> <span class='id identifier rubyid_r1'>r1</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># you can also use a second algorithm for further feature selection
</span><span class='comment'># e.g. use the ChiSquaredTest (CHI) with Yates' continuity correction
</span><span class='comment'># initialize from r1's data
</span><span class='id identifier rubyid_r2'>r2</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>CHI</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='symbol'>:yates</span><span class='comma'>,</span> <span class='id identifier rubyid_r1'>r1</span><span class='period'>.</span><span class='id identifier rubyid_get_data'>get_data</span><span class='rparen'>)</span>
<span class='comment'># number of features before feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>"</span><span class='tstring_content'> # features (before): </span><span class='tstring_end'>"</span></span><span class='op'>+</span> <span class='id identifier rubyid_r2'>r2</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># select the top-ranked 3 features
</span><span class='id identifier rubyid_r2'>r2</span><span class='period'>.</span><span class='id identifier rubyid_select_feature_by_rank!'>select_feature_by_rank!</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'><=3</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span>
<span class='comment'># number of features after feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>"</span><span class='tstring_content'> # features (after): </span><span class='tstring_end'>"</span></span><span class='op'>+</span> <span class='id identifier rubyid_r2'>r2</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># save data to standard output as a weka ARFF file (sparse format)
</span><span class='comment'># with selected features only
</span><span class='id identifier rubyid_r2'>r2</span><span class='period'>.</span><span class='id identifier rubyid_data_to_weka'>data_to_weka</span><span class='lparen'>(</span><span class='symbol'>:stdout</span><span class='comma'>,</span> <span class='symbol'>:sparse</span><span class='rparen'>)</span>
</code></pre>
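<p>For readers who want to see what the score used in the first step measures, the following plain-Ruby sketch computes information gain for one discrete feature from its textbook definition, IG = H(class) - H(class | feature). It is an illustration only and not FSelector's implementation; the helper names and the tiny data set are invented, missing values are not handled, and tally needs Ruby 2.7 or later.</p>
<pre class="code ruby"><code># Shannon entropy (in bits) of a list of discrete labels.
def entropy(labels)
  n = labels.size.to_f
  labels.tally.values.sum { |count| -(count / n) * Math.log2(count / n) }
end

# Information gain of a discrete feature with respect to the class labels:
# the class entropy minus the class entropy remaining within each feature value.
def information_gain(feature_values, labels)
  n = labels.size.to_f
  groups = feature_values.zip(labels).group_by { |value, _label| value }
  conditional = groups.values.sum do |pairs|
    (pairs.size / n) * entropy(pairs.map { |_value, label| label })
  end
  entropy(labels) - conditional
end

labels      = %w[yes yes no no]
informative = %w[a a b b] # perfectly predicts the class
useless     = %w[a b a b] # independent of the class

ig_informative = information_gain(informative, labels)
ig_useless     = information_gain(useless, labels)
puts ig_informative
puts ig_useless
</code></pre>
<p>With two balanced classes, the perfectly predictive feature should score one bit and the independent one zero, so a threshold such as the 0.01 used above would keep the first and drop the second.</p>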
<p><strong>2. feature selection by an ensemble of multiple feature selectors</strong></p>
<pre class="code ruby"><code><span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>fselector</span><span class='tstring_end'>'</span></span>
<span class='comment'># example 1
</span><span class='comment'>#
</span>
<span class='comment'># creating an ensemble of feature selectors by using
</span><span class='comment'># a single feature selection algorithm (INTERACT)
</span><span class='comment'># by instance perturbation (e.g. random sampling)
</span>
<span class='comment'># test for the type of feature subset selection algorithms
</span><span class='id identifier rubyid_r'>r</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>INTERACT</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='float'>0.0001</span><span class='rparen'>)</span>
<span class='comment'># an ensemble of 40 feature selectors with 90% data by random sampling
</span><span class='id identifier rubyid_re'>re</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>EnsembleSingle</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_r'>r</span><span class='comma'>,</span> <span class='int'>40</span><span class='comma'>,</span> <span class='float'>0.90</span><span class='comma'>,</span> <span class='symbol'>:random_sampling</span><span class='rparen'>)</span>
<span class='comment'># read SPECT data set (under the test/ directory)
</span><span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_data_from_csv'>data_from_csv</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>test/SPECT_train.csv</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span>
<span class='comment'># number of features before feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> # features (before): </span><span class='tstring_end'>'</span></span> <span class='op'>+</span> <span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># only features with an above-average count across the ensemble are selected
</span><span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_select_feature!'>select_feature!</span>
<span class='comment'># number of features after feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> # features (after): </span><span class='tstring_end'>'</span></span> <span class='op'>+</span> <span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># example 2
</span><span class='comment'>#
</span>
<span class='comment'># creating an ensemble of feature selectors by using
</span><span class='comment'># two feature selection algorithms: InformationGain (IG) and Relief_d.
</span><span class='comment'># note: can be 2+ algorithms, as long as they are of the same type,
</span><span class='comment'># either filter_by_feature_weighting or filter_by_feature_searching
</span>
<span class='comment'># test for the type of feature weighting algorithms
</span><span class='id identifier rubyid_r1'>r1</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>IG</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span>
<span class='id identifier rubyid_r2'>r2</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>Relief_d</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='int'>10</span><span class='rparen'>)</span>
<span class='comment'># an ensemble of two feature selectors
</span><span class='id identifier rubyid_re'>re</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>EnsembleMultiple</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span><span class='lparen'>(</span><span class='id identifier rubyid_r1'>r1</span><span class='comma'>,</span> <span class='id identifier rubyid_r2'>r2</span><span class='rparen'>)</span>
<span class='comment'># read random discrete data (containing missing values)
</span><span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_data_from_random'>data_from_random</span><span class='lparen'>(</span><span class='int'>100</span><span class='comma'>,</span> <span class='int'>2</span><span class='comma'>,</span> <span class='int'>15</span><span class='comma'>,</span> <span class='int'>3</span><span class='comma'>,</span> <span class='kw'>true</span><span class='rparen'>)</span>
<span class='comment'># replace missing values because Relief_d
</span><span class='comment'># does not allow missing values
</span><span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_replace_by_most_seen_value!'>replace_by_most_seen_value!</span>
<span class='comment'># number of features before feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> # features (before): </span><span class='tstring_end'>'</span></span> <span class='op'>+</span> <span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># based on the max feature score (z-score standardized) among
</span><span class='comment'># an ensemble of feature selectors
</span><span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_ensemble_by_score'>ensemble_by_score</span><span class='lparen'>(</span><span class='symbol'>:by_max</span><span class='comma'>,</span> <span class='symbol'>:by_zscore</span><span class='rparen'>)</span>
<span class='comment'># select the top-ranked 3 features
</span><span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_select_feature_by_rank!'>select_feature_by_rank!</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'><=3</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span>
<span class='comment'># number of features after feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> # features (after): </span><span class='tstring_end'>'</span></span> <span class='op'>+</span> <span class='id identifier rubyid_re'>re</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
</code></pre>
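<p>The "above average count" rule mentioned in example 1 can be pictured as a simple vote. The sketch below is plain Ruby written for illustration, under the assumption that each ensemble member returns a feature subset and that a feature is kept when it is chosen more often than the average feature; the subsets are invented, and FSelector's actual bookkeeping may differ. It uses tally, which needs Ruby 2.7 or later.</p>
<pre class="code ruby"><code># Hypothetical subsets returned by four selectors, each run on a different sample.
subsets = [
  %i[f1 f2 f3],
  %i[f1 f3],
  %i[f1 f4],
  %i[f1 f3 f5]
]

# Count how often each feature was selected across the ensemble.
counts = subsets.flatten.tally

# Keep the features selected more often than the average count.
average = counts.values.sum.to_f / counts.size
selected = counts.select { |_feature, count| count > average }.keys

p counts
p selected
</code></pre>
<p>Here the five features are chosen ten times in total, so the average count is 2.0, and only f1 (4 times) and f3 (3 times) pass.</p>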
<p><strong>3. feature selection after discretization</strong></p>
<pre class="code ruby"><code><span class='id identifier rubyid_require'>require</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>fselector</span><span class='tstring_end'>'</span></span>
<span class='comment'># the Information Gain (IG) algorithm requires data with discrete features
</span><span class='id identifier rubyid_r'>r</span> <span class='op'>=</span> <span class='const'>FSelector</span><span class='op'>::</span><span class='const'>IG</span><span class='period'>.</span><span class='id identifier rubyid_new'>new</span>
<span class='comment'># but the Iris data set contains continuous features
</span><span class='id identifier rubyid_r'>r</span><span class='period'>.</span><span class='id identifier rubyid_data_from_url'>data_from_url</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'>http://repository.seasr.org/Datasets/UCI/arff/iris.arff</span><span class='tstring_end'>'</span></span><span class='comma'>,</span> <span class='symbol'>:weka</span><span class='rparen'>)</span>
<span class='comment'># let's first discretize it by ChiMerge algorithm at alpha=0.10
</span><span class='comment'># then perform feature selection as usual
</span><span class='id identifier rubyid_r'>r</span><span class='period'>.</span><span class='id identifier rubyid_discretize_by_ChiMerge!'>discretize_by_ChiMerge!</span><span class='lparen'>(</span><span class='float'>0.10</span><span class='rparen'>)</span>
<span class='comment'># number of features before feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> # features (before): </span><span class='tstring_end'>'</span></span> <span class='op'>+</span> <span class='id identifier rubyid_r'>r</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
<span class='comment'># select the top-ranked feature
</span><span class='id identifier rubyid_r'>r</span><span class='period'>.</span><span class='id identifier rubyid_select_feature_by_rank!'>select_feature_by_rank!</span><span class='lparen'>(</span><span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'><=1</span><span class='tstring_end'>'</span></span><span class='rparen'>)</span>
<span class='comment'># number of features after feature selection
</span><span class='id identifier rubyid_puts'>puts</span> <span class='tstring'><span class='tstring_beg'>'</span><span class='tstring_content'> # features (after): </span><span class='tstring_end'>'</span></span> <span class='op'>+</span> <span class='id identifier rubyid_r'>r</span><span class='period'>.</span><span class='id identifier rubyid_get_features'>get_features</span><span class='period'>.</span><span class='id identifier rubyid_size'>size</span><span class='period'>.</span><span class='id identifier rubyid_to_s'>to_s</span>
</code></pre>
<p><strong>4. see more examples in test_*.rb under the test/ directory</strong></p>
<h2>How to contribute</h2>
<p>Check <a href="file.HowToContribute.html" title="HowToContribute">HowToContribute</a> to see how to write your own feature selection algorithms and/or contribute to FSelector.</p>
<h2>Change Log</h2>
<p>A <a href="file.ChangeLog.html" title="ChangeLog">ChangeLog</a> is available from version 0.5.0 onward to reflect
what's new and what's changed.</p>
<h2>Copyright</h2>
<p>FSelector © 2012 by <a href="mailto:[email protected]">Tiejun Cheng</a>.
FSelector is licensed under the MIT license. Please see the <a href="file.LICENSE.html" title="LICENSE">LICENSE</a> for
more information.</p>
</div></div>
<div id="footer">
Generated on Mon Nov 5 11:19:43 2012 by
<a href="http://yardoc.org" title="Yay! A Ruby Documentation Tool" target="_parent">yard</a>
0.7.5 (ruby-1.9.3).
</div>
</body>
</html>