Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Symbol column indexing #377

Merged
merged 61 commits into from
Oct 24, 2018
Merged

Symbol column indexing #377

merged 61 commits into from
Oct 24, 2018

Conversation

iblislin
Copy link
Collaborator

@iblislin iblislin commented Oct 6, 2018

ref #262

  • add deprecation
  • support getproperty
  • support Base.propertynames
  • Update doc

src/utilities.jl Outdated
find_dupes_index(A::Vector{Symbol}) =
@inbounds [i for i in eachindex(A) if A[i] ∈ A[1:i-1]]

@memoize function gen_colnames(n::Integer)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you seen a notable performance improvement when using @memoize?

Copy link
Collaborator Author

@iblislin iblislin Oct 6, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeah, this function is quite slow. (or my algo is bad, is there an better algo?

Orig:

julia> @btime TimeSeries.gen_colnames(5)
  899.902 ns (9 allocations: 384 bytes) 
5-element Array{Symbol,1}:
 :A
 :B
 :C
 :D
 :E

vs @memoize

julia> @btime TimeSeries.gen_colnames(5)
  92.643 ns (2 allocations: 32 bytes)
5-element Array{Symbol,1}:
 :A
 :B
 :C
 :D
 :E

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting. I'll take a look at the algorithm itself in more detail later, this is fine in the meantime I suppose.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hmm...
How does DataFrames.jl generate the default column names?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IIRC it's just x appended with sequential numbers, i.e. x1, x2, ...

test/combine.jl Outdated
@test_throws ErrorException merge(cl, op, colnames=[:a, :b, :c])

for mode ∈ [:inner, :left, :right, :outer]
@test merge(cl, ohlc[:High, :Low], mode, colnames=[:a, :b, :c]).colnames == [:a, :b, :c]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be the Julia standard 4 spaces for indentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@codecov-io
Copy link

codecov-io commented Oct 7, 2018

Codecov Report

Merging #377 into master will not change coverage.
The diff coverage is 100%.

Impacted file tree graph

@@           Coverage Diff           @@
##           master     #377   +/-   ##
=======================================
  Coverage   99.11%   99.11%           
=======================================
  Files           9        9           
  Lines         340      340           
=======================================
  Hits          337      337           
  Misses          3        3
Impacted Files Coverage Δ
src/combine.jl 98% <100%> (-0.44%) ⬇️
src/apply.jl 100% <100%> (ø) ⬆️
src/modify.jl 100% <100%> (ø) ⬆️
src/split.jl 100% <100%> (ø) ⬆️
src/readwrite.jl 100% <100%> (ø) ⬆️
src/utilities.jl 100% <100%> (ø) ⬆️
src/timearray.jl 98.03% <100%> (-0.2%) ⬇️
src/basemisc.jl 100% <100%> (ø) ⬆️
src/broadcast.jl 100% <100%> (ø) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 5ece205...1afaadd. Read the comment docs.

@iblislin
Copy link
Collaborator Author

iblislin commented Oct 7, 2018

What... the coverage is 99.x% ⁉️

@iblislin iblislin changed the title WIP: Symbol column indexing Symbol column indexing Oct 7, 2018
@iblislin
Copy link
Collaborator Author

iblislin commented Oct 7, 2018

ready for review.

@femtotrader
Copy link
Contributor

LGTM

@femtotrader
Copy link
Contributor

But doc should also be updated accordingly

@iblislin
Copy link
Collaborator Author

iblislin commented Oct 7, 2018

Ah, sure.
But I want to make another PRs for doc stuffs, including sort out docstring from current doc pages.

@femtotrader
Copy link
Contributor

As API is changing (access column using Symbol instead of String) I think it's always safer to document changes and after (in an other PR) improve doc.
Because we don't know how long it will last to create the second PR... and during this delay doc and code will be inconsistent.
That's just my opinion. But code LGTM! You did a great work!

src/timearray.jl Outdated
fs = fieldnames(T)

quote
(c ∈ $fs) ? getfield(ta, c) : ta[c]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If a user tries to call a column values, for example, this will be incredibly odd behavior. Either replace all field accesses within this package to use getfield explicitly, or don't support getproperty. (I'd vote for the latter.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay, I need more time work on this issue.
There are tons of statements like ta.values in the test cases.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, every use of ta.values would need to be changed to getfield(ta, :values).

Copy link
Collaborator Author

@iblislin iblislin Oct 9, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will change them to our helper function vaules(ta), timestamp(ta), colnames(ta) and meta(ta).

src/utilities.jl Outdated
if n == 1
cnames[d] = Symbol(cnames[d], "_$n")
else
s = cnames[d] |> string
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why change the function call to a pipe? It just makes it harder to read.

src/utilities.jl Outdated
findcol(ta::AbstractTimeSeries, s::Symbol) = findcol(colnames(ta), s)

@inline function findcol(cols::Vector{Symbol}, s::Symbol)
i = findfirst(isequal(s), cols)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Incorrect indentation

iblislin added a commit to JuliaQuant/MarketData.jl that referenced this pull request Oct 9, 2018
Since we are moving toward the symbol indexing.
Ref JuliaStats/TimeSeries.jl#377
iblislin added a commit to JuliaQuant/MarketData.jl that referenced this pull request Oct 11, 2018
* Deprecate the helper functions

Since we are moving toward the symbol indexing.
Ref JuliaStats/TimeSeries.jl#377

* Update NEWS.md

* travis: add 1.0 to build matrix

- update doc deploy script
@iblislin
Copy link
Collaborator Author

test cases and doc updated.

@iblislin
Copy link
Collaborator Author

Rebased. good to go?

rename(cl, "New Close")
rename(cl, ["New Close"])
rename(ohlc, ["New Open", "New High", "New Low", "New Close"])
rename(cl, :Close′)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the ' here intentional?

Copy link
Contributor

@femtotrader femtotrader Oct 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(not ') is used several times in https://github.com/JuliaStats/TimeSeries.jl/blob/master/src/broadcast.jl

I was surprised by the use of such a character also which can't be found easily on my AZERTY keyboard

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Those render exactly the same for me. Weird.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not naming simply :NewClose (or :Close2)?

What are the english name of , ' and `

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See screenshot to see how it's rendering here (France)
capture d ecran 2018-10-16 a 22 39 24

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure about the first one, the second is "apostrophe," and the third is "backtick" (assuming that's rendering for me properly).

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's :Close\prime<tab>

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not naming simply :NewClose (or :Close2)?

well, I personally love the unicode support in Julia.
I always write code (in my research project) like x = 1; x\prime<tab> = x + 42
or
x\vec<tab> ...

julia> x = 1
1

julia> x′ = x + 42
43

julia> function ϕ()
        42
       end 
ϕ (generic function with 1 method)

julia> :Close₂                                                   

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I see. I was on Linux before when it rendered like an apostrophe for me, but on macOS they're distinct.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ararslan maybe you should open an issue on your Linux distribution bug tracker because that's quite problematic to have 2 differents characters rendered the same.

## Fields getter functions

There are four field getter functions exported.
They are named as same as the field names.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Field names are effectively an implementation detail, so I don't know that it's worth mentioning that. Good to document these functions though.

Copy link
Collaborator Author

@iblislin iblislin Oct 17, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the field name is revealed to user already here:
http://juliastats.github.io/TimeSeries.jl/latest/timearray.html

So I added that section.

src/combine.jl Outdated
colnames = Symbol.(colnames)
end

meta = if _meta(ta1) == _meta(ta2) && meta isa Nothing
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

meta === nothing is typically preferred to meta isa Nothing. In particular, === makes the compiler happy. (Though isa is still better than == in this case.)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ha, okay 5d1dcc7

src/combine.jl Outdated
else
meta = meta
meta
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

src/utilities.jl Outdated
if n == 1
cnames[d] = Symbol(cnames[d], "_$n")
else
s = string(cnames[d])
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Indentation

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ararslan
Copy link
Member

Looks good to me for the most part, just a few comments. Nice work here!

@iblislin
Copy link
Collaborator Author

One more thing to be discussed:
This warning will pop up during recompilation. Is it safe to ignore it ?

julia> using TimeSeries
[ Info: Recompiling stale cache file /home/iblis/.julia/compiled/v1.0/TimeSeries/dmqCl.ji for TimeSeries [9e3dc215-6440-5c97-bce1-76c03772f85e]
WARNING: eval from module Memoize to TimeSeries:
Expr(:const, Symbol("##gen_colnames_memoized_cache") = Expr(:call, :IdDict))
  ** incremental compilation may be broken for this module **

WARNING: eval from module Memoize to TimeSeries:
Expr(:const, Symbol("##carry_char_memoized_cache") = Expr(:call, :IdDict))
  ** incremental compilation may be broken for this module **

@iblislin
Copy link
Collaborator Author

I decided to drop Memoize.jl finally

@iblislin iblislin merged commit b98c968 into master Oct 24, 2018
@iblislin iblislin deleted the ib/symbol branch October 24, 2018 02:05
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants