Commit

Merge pull request #283 from PolyMathOrg/add_shuffling
Add shuffling
jecisc authored Dec 14, 2023
2 parents f085adf + c4df4c1 commit a37feeb
Showing 17 changed files with 542 additions and 404 deletions.
2 changes: 1 addition & 1 deletion src/DataFrame-Tests/DataFrameAggrGroupTest.class.st
@@ -4,7 +4,7 @@ Class {
#instVars : [
'df'
],
#category : #'DataFrame-Tests-Core'
#category : 'DataFrame-Tests-Core'
}

{ #category : #running }
2 changes: 1 addition & 1 deletion src/DataFrame-Tests/DataFrameHeadTailTest.class.st
@@ -5,7 +5,7 @@ Class {
'df',
'series'
],
#category : #'DataFrame-Tests-Core'
#category : 'DataFrame-Tests-Core'
}

{ #category : #running }
2 changes: 1 addition & 1 deletion src/DataFrame-Tests/DataFrameInternalTest.class.st
@@ -4,7 +4,7 @@ Class {
#instVars : [
'df'
],
#category : #'DataFrame-Tests-Core'
#category : 'DataFrame-Tests-Core'
}

{ #category : #running }
2 changes: 1 addition & 1 deletion src/DataFrame-Tests/DataFrameStatsTest.class.st
@@ -4,7 +4,7 @@ Class {
#instVars : [
'df'
],
#category : #'DataFrame-Tests-Core'
#category : 'DataFrame-Tests-Core'
}

{ #category : #running }
83 changes: 83 additions & 0 deletions src/DataFrame-Tests/DataFrameTest.class.st
@@ -7,6 +7,62 @@ Class {
#category : #'DataFrame-Tests-Core'
}

{ #category : #private }
DataFrameTest >> expectedShuffledDataFrameWithSeedOne [
"Shuffling with a fixed random seed is deterministic: given the same seed and the same random number generator implementation, the sequence of random numbers, and therefore the shuffled order, is the same every time. The result is not stable across all Pharo versions, however, because the generator implementation changed.
Pharo 12 uses primitive 231, whereas previous Pharo versions used a native implementation in privateNextSeed.
The change was introduced in this commit: https://github.com/pharo-project/pharo/commit/bf22496dbd0996ee470c9f85a7cc076e01dff57f
So we answer a different data frame depending on the Pharo version, because the resulting order differs between the two implementations."

| expected |
expected := SystemVersion current major >= 12
ifTrue: [
(DataFrame withRows: #(
('Barcelona' 1.609 true)
('London' 8.788 false)
('Dubai' 2.789 true))
rowNames: #( 'A' 'C' 'B'))
yourself ]
ifFalse: [
(DataFrame withRows: #(
('Dubai' 2.789 true)
('London' 8.788 false)
('Barcelona' 1.609 true))
rowNames: #('B' 'C' 'A'))
yourself ].
expected columnNames: #( 'City' 'Population' 'BeenThere' ).
^ expected

]

{ #category : #private }
DataFrameTest >> expectedShuffledDataFrameWithSeedTwo [
"Shuffling with a fixed random seed is deterministic: given the same seed and the same random number generator implementation, the sequence of random numbers, and therefore the shuffled order, is the same every time. The result is not stable across all Pharo versions, however, because the generator implementation changed.
Pharo 12 uses primitive 231, whereas previous Pharo versions used a native implementation in privateNextSeed.
The change was introduced in this commit: https://github.com/pharo-project/pharo/commit/bf22496dbd0996ee470c9f85a7cc076e01dff57f
So we answer a different data frame depending on the Pharo version, because the resulting order differs between the two implementations."

| expected |
expected := SystemVersion current major >= 12
ifTrue: [
(DataFrame withRows: #(
('Dubai' 2.789 true)
('Barcelona' 1.609 true)
('London' 8.788 false) )
rowNames: #('B' 'A' 'C'))
yourself ]
ifFalse: [
(DataFrame withRows: #(
('London' 8.788 false)
('Barcelona' 1.609 true)
('Dubai' 2.789 true))
rowNames: #('C' 'A' 'B'))
yourself ].
expected columnNames: #( 'City' 'Population' 'BeenThere' ).
^ expected

]
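
The method comments above rely on seeded shuffling being repeatable within one image. The following Playground snippet is a minimal sketch of that idea, not the DataFrame implementation: it assumes `Random class >> seed:` and `SequenceableCollection >> shuffleBy:` as found in recent Pharo images, and the name `shuffledIndicesFor` is hypothetical. It shuffles row indices rather than the rows themselves, so rows and row names are reordered together.

```smalltalk
"Sketch only: reorder rows and row names through a seeded shuffle of indices.
 Assumes Random class>>seed: and SequenceableCollection>>shuffleBy: exist in the image."
| rows rowNames shuffledIndicesFor firstRun secondRun sameOrder shuffledRows shuffledNames |
rows := #( #('Barcelona' 1.609 true) #('Dubai' 2.789 true) #('London' 8.788 false) ).
rowNames := #( 'A' 'B' 'C' ).

"Build a fresh, mutable index array on each call so the literal arrays are never modified."
shuffledIndicesFor := [ :seed |
	(1 to: rows size) asArray
		shuffleBy: (Random seed: seed);
		yourself ].

"Two runs with the same seed in the same image give the same order."
firstRun := shuffledIndicesFor value: 1.
secondRun := shuffledIndicesFor value: 1.
sameOrder := firstRun = secondRun.

"Apply the one permutation to both rows and names so they stay aligned."
shuffledRows := firstRun collect: [ :i | rows at: i ].
shuffledNames := firstRun collect: [ :i | rowNames at: i ].
```

The concrete order is deliberately not shown here: as the comments above explain, it depends on the generator implementation of the Pharo version in use, which is why the tests keep one expected data frame per implementation.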

{ #category : #running }
DataFrameTest >> setUp [

@@ -920,6 +976,19 @@ DataFrameTest >> testColumns [
self assert: df columns equals: expectedCollection
]

{ #category : #tests }
DataFrameTest >> testColumnsAllBut [

| expectedDataFrame |
expectedDataFrame := DataFrame withRows: #( #( 'Barcelona' 1.609 ) #( 'Dubai' 2.789 ) #( 'London' 8.788 ) ).
expectedDataFrame rowNames: #( 'A' 'B' 'C' ).
expectedDataFrame columnNames: #( 'City' 'Population').

self
assert: (df columnsAllBut: #(BeenThere))
equals: expectedDataFrame
]

{ #category : #tests }
DataFrameTest >> testColumnsAt [

@@ -5164,6 +5233,20 @@ DataFrameTest >> testSelectEmptyDataFrame [
self assert: actual equals: expected
]

{ #category : #tests }
DataFrameTest >> testShuffledWithSeed [

| expected |

expected := self expectedShuffledDataFrameWithSeedOne.
self assert: (df shuffleWithSeed: 1) equals: expected.

expected := self expectedShuffledDataFrameWithSeedTwo.
self assert: (df shuffleWithSeed: 2) equals: expected.


]

{ #category : #tests }
DataFrameTest >> testSortBy [

@@ -1,7 +1,7 @@
Class {
#name : #DataPearsonCorrelationMethodTest,
#superclass : #TestCase,
#category : #'DataFrame-Tests-Math'
#category : 'DataFrame-Tests-Math'
}

{ #category : #tests }
2 changes: 1 addition & 1 deletion src/DataFrame-Tests/DataSeriesTest.class.st
@@ -5,7 +5,7 @@ Class {
'series',
'keyArray'
],
#category : #'DataFrame-Tests-Core'
#category : 'DataFrame-Tests-Core'
}

{ #category : #running }
8 changes: 4 additions & 4 deletions src/DataFrame/Array.extension.st
@@ -1,6 +1,6 @@
Extension { #name : #Array }
Extension { #name : 'Array' }

{ #category : #'*DataFrame' }
{ #category : '*DataFrame' }
Array >> calculateDataType [

| types |
@@ -18,7 +18,7 @@ Array >> calculateDataType [
^ UndefinedObject
]

{ #category : #'*DataFrame' }
{ #category : '*DataFrame' }
Array >> leastCommonSuperclassOf: firstClass and: secondClass [
"Determines the closest element of class hierarchy which is the common ancestor of two given classes"

@@ -40,7 +40,7 @@ Array >> leastCommonSuperclassOf: firstClass and: secondClass [
^ Object
]

{ #category : #'*DataFrame' }
{ #category : '*DataFrame' }
Array >> sortIfPossible [
"Sort if possible"

4 changes: 2 additions & 2 deletions src/DataFrame/Behavior.extension.st
@@ -1,6 +1,6 @@
Extension { #name : #Behavior }
Extension { #name : 'Behavior' }

{ #category : #'*DataFrame-Core-Base' }
{ #category : '*DataFrame-Core-Base' }
Behavior >> inheritsFromOrEqualTo: aClass [
"Answer whether the argument, aClass, is equal to the receiver or belongs to its superclass chain."

14 changes: 7 additions & 7 deletions src/DataFrame/Collection.extension.st
@@ -1,12 +1,12 @@
Extension { #name : #Collection }
Extension { #name : 'Collection' }

{ #category : #'*DataFrame-Core-Base' }
{ #category : '*DataFrame-Core-Base' }
Collection >> ** arg [

^ self raisedTo: arg
]

{ #category : #'*DataFrame' }
{ #category : '*DataFrame' }
Collection >> asDataFrame [

| numberOfRows numberOfColumns dataFrame |
@@ -31,26 +31,26 @@ Collection >> asDataFrame [
^ dataFrame
]

{ #category : #'*DataFrame-Core-Base' }
{ #category : '*DataFrame-Core-Base' }
Collection >> asDataSeries [

^ DataSeries newFrom: self
]

{ #category : #'*DataFrame-Core-Base' }
{ #category : '*DataFrame-Core-Base' }
Collection >> closeTo: aCollection [

^ (self - aCollection) inject: true into: [ :accum :each |
accum and: (each closeTo: 0) ]
]

{ #category : #'*DataFrame-Core-Base' }
{ #category : '*DataFrame-Core-Base' }
Collection >> variance [

^ self stdev squared
]

{ #category : #'*DataFrame' }
{ #category : '*DataFrame' }
Collection >> withSeries: aDataSeries collect: twoArgBlock [
"Collect and return the result of evaluating twoArgBlock with corresponding elements from this collection and aDataSeries."
| result |
10 changes: 6 additions & 4 deletions src/DataFrame/DataCorrelationMethod.class.st
@@ -1,10 +1,12 @@
Class {
#name : #DataCorrelationMethod,
#superclass : #Object,
#category : #'DataFrame-Math'
#name : 'DataCorrelationMethod',
#superclass : 'Object',
#category : 'DataFrame-Math',
#package : 'DataFrame',
#tag : 'Math'
}

{ #category : #comparing }
{ #category : 'comparing' }
DataCorrelationMethod class >> between: x and: y [
"Calculate the correlation coefficient between two data series"
self subclassResponsibility