forked from JuliaLang/julia
Statistical Programming
smc77 edited this page Mar 31, 2012 · 8 revisions
Language-level inspiration:
Packages for inspiration:
Here is a likely-incomplete list of early requirements to get to a stage where basic linear models could be easily built in Julia. Some are specific to statistical programming, while others are language-general.
- New data types that support `NA`. They might be called `IntData`, `NumData`, `BoolData`, `StrData`, etc. Issue #470.
- An updated testing framework to better allow test-driven development. Issue #8.
- A `FactorData` type, supporting optionally ordered enumerations with `NA`s.
- Either named arguments with defaults (e.g., `f(a, b, q=7, x="hi")`) or some alternative approach to passing options to functions. Issue #485.
- A `DataFrame` (or maybe `DataTable` is a better name) type, made of heterogeneous `*Data` columns, complete with rownames and colnames. We should find out more about what John Chambers thinks about `data.frame`s in S/R and how they could be done better. We should also look at the `data.table` implementation and at what Pandas is doing.
- The power of reshape2 is severely limited by the asymmetric treatment of row and column variables in a `data.frame`. The new data type should treat column and row variables symmetrically, and maybe a better name would be `data.matrix` or even `data.array`. A related limitation of R's `data.frame` is that all values in a column must have the same type. Pandas corrected for this issue in its implementation of the data frame by treating rows and columns symmetrically.
- A deep dive into the core libraries of R and Pandas, and maybe other languages, to learn from previous mistakes and develop a clean, modern, orthogonal set of methods for data manipulation. For the love of god, please let Julia not have a broken `sample()` function like R's...
- Formulas will probably be explicitly quoted expressions in Julia, à la `lm(:(y ~ x), dat)`. So we just need a set of conventions (and maybe an extra operator or two).
- `csvread()` and `dlmread()` only generate matrices. There should be similar functions that read into `DataFrame`s, as well as functions that write them back out.
- `model.matrix` and related equivalent methods on formulas.
- A pure-Julia implementation of `lm()`.
- Packages/Libraries/Gems/whatever.
- Date/Time types, inspired by Joda Time (Java) and Lubridate (R)
- ggplot-like functionality in a core library
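
To make the `NA`-aware column and factor ideas from the list more concrete, here is a minimal sketch in current Julia syntax. Everything in it is an assumption made for illustration: the names `DataVec`, `FactorVec`, `mean_skipna`, and `factor` are hypothetical, the mask-based representation is just one possible design, and `nothing` stands in for `NA` in the input labels. It also shows a keyword argument with a default (`ordered=false`), as asked for in the list.

```julia
# Hypothetical NA-aware column: the values plus a mask flagging NA entries.
struct DataVec{T}
    values::Vector{T}
    isna::Vector{Bool}
    function DataVec{T}(values, isna) where {T}
        length(values) == length(isna) || error("values and mask must have equal length")
        return new{T}(values, isna)
    end
end

# Convenience constructor: no entry is NA.
DataVec(values::Vector{T}) where {T} = DataVec{T}(values, fill(false, length(values)))

# Mean over the non-NA entries only.
function mean_skipna(d::DataVec{<:Number})
    kept = d.values[.!d.isna]
    isempty(kept) && error("all entries are NA")
    return sum(kept) / length(kept)
end

# Hypothetical factor column: integer codes into `levels`; code 0 means NA.
struct FactorVec
    levels::Vector{String}
    codes::Vector{Int}
    ordered::Bool
end

# Build a factor from labels, where `nothing` marks an NA entry.
function factor(labels; ordered::Bool=false)
    levels = sort(unique(String[l for l in labels if l !== nothing]))
    codes = [l === nothing ? 0 : findfirst(==(l), levels) for l in labels]
    return FactorVec(levels, codes, ordered)
end

# Usage
d = DataVec{Float64}([1.0, 2.0, 30.0], [false, false, true])
m = mean_skipna(d)                            # 1.5: the third entry is skipped
f = factor(["lo", "hi", nothing, "lo"])       # levels sorted alphabetically
```

A separate mask keeps `values` a plain typed array, at the cost of one extra `Bool` per entry; a sentinel-value design would be another option.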
Please add or edit this list as thinking evolves!
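
As a rough illustration of how the quoted-formula, `model.matrix`, and pure-Julia `lm()` items could fit together, here is a small sketch. It is not a proposed API: `parse_formula`, `rhs_terms`, `model_matrix`, and `lm_sketch` are hypothetical names, a plain `Dict` of columns stands in for the `DataFrame` type, and only additive (`+`) numeric terms are handled. It relies on `:(y ~ x)` parsing as an ordinary call to `~`, which is the case in Julia 1.x.

```julia
using LinearAlgebra  # least-squares solve via `\` for rectangular matrices

# Hypothetical helper: collect the predictor symbols joined by `+`.
rhs_terms(s::Symbol) = [s]
function rhs_terms(e::Expr)
    (e.head == :call && e.args[1] == :+) || error("this sketch only handles `+` terms")
    return reduce(vcat, map(rhs_terms, e.args[2:end]))
end

# Split a quoted formula such as :(y ~ x1 + x2) into response and predictors.
function parse_formula(f::Expr)
    (f.head == :call && length(f.args) == 3 && f.args[1] == :~) ||
        error("expected a formula like :(y ~ x)")
    return f.args[2], rhs_terms(f.args[3])
end

# Stand-in for model.matrix: an intercept column, then one column per predictor.
# `data` is assumed to map column names (Symbols) to numeric vectors.
function model_matrix(predictors, data)
    n = length(data[first(predictors)])
    return hcat(ones(n), (Float64.(data[p]) for p in predictors)...)
end

# Ordinary least squares: returns the intercept followed by one slope per predictor.
function lm_sketch(f::Expr, data)
    response, predictors = parse_formula(f)
    X = model_matrix(predictors, data)
    y = Float64.(data[response])
    return X \ y
end

# Usage: the data satisfy y = 1 + 2x exactly.
dat = Dict(:x => [1.0, 2.0, 3.0, 4.0], :y => [3.0, 5.0, 7.0, 9.0])
coef = lm_sketch(:(y ~ x), dat)   # approximately [1.0, 2.0]
```

Keeping the formula as a plain `Expr` means no new syntax is needed; the conventions mentioned in the list would live entirely in a function like `parse_formula`.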