`size`, `reshape` not consistent #22665

bjarthur · 2017-07-03T12:40:41Z

if size returns a tuple of an array's dimensions, then why is the function which changes an array's dimensions not called resize? size and reshape are borrowed from matlab. numpy uses shape and reshape. the latter make a lot more sense to me, despite my being a recovering matlab user. shall we consider changing? happy to submit a PR if there is a consensus. some discussion here.


Gnimuc · 2017-07-03T14:10:32Z

-1, since reshape (to make into a different shape) is a stricter operation than resize (to alter the size of something):

julia> reshape(rand(2,2), 3, 1)
ERROR: DimensionMismatch("new dimensions (3,1) must be consistent with array size 4")

julia> resize!([1,2,3,4], 3)
3-element Array{Int64,1}:
 1
 2
 3

andyferris · 2017-07-04T10:28:35Z

If we had multidimensional resizable (as in re-length-able) arrays, then a multidimensional resize method would indeed make some sense. But we can't change their "total" size (length), only their shape.

If we decide to go with non-length-changing arrays, then we can still support reshape, as a view...
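As an illustration of the "reshape as a view" point, here is a minimal sketch using a plain `Array` (the variable names are just for this example). It assumes current `Base` behaviour, where the reshaped array shares its data with the original:

```julia
# A 6-element vector and a 2x3 reshaping of it.
v = collect(1:6)
m = reshape(v, 2, 3)

# Same number of elements; only the factoring into dimensions differs.
same_length = length(v) == length(m)

# For a plain Array the reshaped array shares memory with the original,
# so a write through one is visible through the other.
m[1, 1] = 100
shared = v[1] == 100
```

Both `same_length` and `shared` come out `true` here, which is the sense in which `reshape` changes the shape but never the total length.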

timholy · 2017-07-04T11:16:44Z

I think the core suggestion here is to rename size to shape. This makes sense to me. size is less important anyway given that indices is what really matters 😄.

timholy · 2017-07-04T11:18:26Z

Linking to #20402 so this gets more eyes before anyone goes to great effort.

yuyichao · 2017-07-04T11:38:56Z

shape(a) sounds fine, not so much for shape(a, 1) though....

bjarthur · 2017-07-04T11:53:08Z

renaming size to shape is not just the core suggestion, it's the only suggestion. thanks @timholy for clarifying my verbosity.

seems to me this would be relatively straightforward. just a cut and paste with a deprecation, no?

tknopp · 2017-07-04T12:56:26Z

If we think this through to the end, then length should be renamed to size, since resize! modifies the array size.

yuyichao has a point, though: size(a, d) gives us the size of the d-th dimension.

Personally I don't think all this is worth it. Naming is something that at some point gets subjective and IMHO reaching 100% consistency is close to impossible.

StefanKarpinski · 2017-07-11T14:09:25Z

If we were to move to shape instead of size we should at the same time move to an API where the range of valid values is returned. At which point, we'd probably also want a function that returns the length of each of those index collections – maybe we could call it size.

bjarthur · 2017-07-11T14:56:05Z

@StefanKarpinski we already have indices, which returns the range of valid values.

i'm surprised to see so much resistance to this idea. is it just that change is bad / a lot of work and this syntax is too entrenched? we're not at 1.0 yet.

would there be less opposition to change reshape to resize? then we'd have size, resize, and resize!. methods to the latter could eventually be added to handle N-D arrays. i'd be happy with that too. anything to make the nomenclature consistent.

JeffBezanson · 2017-07-11T15:34:30Z

There are two key properties here: the number of elements, and how they are factored into dimensions. size is a bit ambiguous and could refer to either, but shape definitely only refers to the second property (do we agree shape(a, 1) probably doesn't make sense?). The defining feature of reshape is that it can only change the factoring into dimensions, not the number of elements. So I think its name should stay. There is never going to be a reshape!, since that would require changing the type of an array (we also want to avoid mutating dimension sizes in general).
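The two properties can be seen side by side in a small sketch (names are illustrative only). It also shows why an in-place `reshape!` would be awkward: the number of dimensions is part of the array's type, so changing it means changing the type.

```julia
a = zeros(Int, 4)        # one dimension, four elements
b = reshape(a, 2, 2)     # two dimensions, the same four elements

# Property 1: the number of elements is unchanged by reshape.
same_count = length(a) == length(b)

# Property 2: the factoring into dimensions differs,
# and its product always equals the element count.
factoring_a = size(a)    # (4,)
factoring_b = size(b)    # (2, 2)
consistent = prod(factoring_b) == length(b)

# The dimension count is a type parameter, so a and b have different types.
different_types = typeof(a) != typeof(b)
```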

One issue we're up against here is that length is a standard term for the number of items in an array, but relength is not a word. Sometimes you just have to live with things like that. Maybe we could use length!? Though a bit weird since it's not a verb.

bjarthur · 2017-07-14T11:44:51Z

@JeffBezanson matlab uses length to mean the number of elements in the longest dimension. it doesn't make sense; just wanted to point it out. [edit: perhaps the person who coined this usage was a woodworker, where length, width, and depth refer to the longest, 2nd longest, and shortest dimensions of a board, respectively]

to me, in plain layman terms, vectors have lengths, matrices have areas, and 3-D arrays have volumes. i guess i tend to think in physical terms, as if they represented a space in the real world. again, to me, a general term for a scalar quantity that includes length, area, and volume is size (not length), meaning how big it is, that is how many elements it has.

in julia sizeof currently means the number of bytes an array consumes, and length means the number of elements. two entirely different words for functions that return the same thing but in different units. really?? i like @tknopp 's suggestion that length be renamed to size. my mother would understand terminology like that.

tknopp · 2017-07-14T14:21:15Z

the STL (C++) also uses size. The suggestion length -> size and size -> shape seems to make things indeed a little bit more consistent.

The question is whether it's worth the effort. On the other hand: Now or never... :-)

StefanKarpinski · 2017-07-18T18:17:59Z

I don't think we have enough deprecation cycles to make that change happen in 1.0.

JeffBezanson · 2017-07-18T18:49:04Z

> in julia sizeof currently means the number of bytes an array consumes, and length means the number of elements. two entirely different words for functions that return the same thing but in different units. really??

No. length is the abstract number of elements in anything iterable, while sizeof refers to the concrete memory representation. You can have an object whose size is 16 bytes but that iterates millions of elements, or whose size is 1GB but that iterates one or zero elements. These are in no way the same concept.
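A small sketch of the first case, an object with a tiny memory representation that iterates many elements. It assumes an integer range such as `1:1_000_000`, which stores only its two endpoints:

```julia
r = 1:1_000_000

# The range stores just its start and stop, so its memory size is two Ints...
bytes_in_range = sizeof(r)

# ...while iteration yields a million elements.
items_in_range = length(r)

# For a dense vector the two happen to be proportional, which is
# where the "same thing in different units" impression comes from.
x = zeros(Int, 4)
proportional = sizeof(x) == length(x) * sizeof(Int)
```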

I think renaming size to shape is reasonable; there is precedent for that name. But I'm not sure it's worth it. size is also an increasingly problematic function for e.g. OffsetArrays. Not sure if it's necessarily related, but helps motivate some kind of shake-up there.

Sacha0 · 2017-07-18T18:59:06Z

A more descriptive name for sizeof would be nice, e.g. memsize. Best!

bjarthur · 2017-07-21T10:57:08Z

@StefanKarpinski not enough deprecation cycles? how is the change proposed here (size to shape, length to size, and, for good measure, sizeof to memsize) different in this regard than anything that's proposed in #20402 ? your roadmap talk is not posted yet, but if it includes even just one more release (0.7) before 1.0, then making the change now and including a deprecation should suffice, no?

JeffBezanson · 2017-07-21T14:14:49Z

I really don't want to rename length. If we didn't allow growing collections, then I assume there'd be nothing wrong with using length. So maybe we should rename resize!? Also, length is not tied to arrays; it is much more general than that. It refers to the number of elements in a sequence (you might even say the "length" of a sequence), and arrays happen to be able to implement that interface.

StefanKarpinski · 2017-07-21T19:58:37Z

@bjarthur: a chain of renames requires a deprecation cycle for each link, and we only have one left. Specifically, we can't deprecate length to size until one cycle after we've deprecated size to shape – even just one cycle is kind of dangerous since deprecations often don't get caught until the function is deleted entirely.

bjarthur · 2017-07-22T22:32:54Z

resize! has an if/else/end block with entirely different code to handle lengthening and shortening. not surprising. what about splitting it into two functions: lengthen! and shorten!? upside is that this terminology would then be consistent with retaining length (as opposed to renaming to size). downside is that the logic about which to use would then have to be hoisted to the user. for the life of me i can't think of a verb which encompasses both lengthen and shorten and is specific to altering a dimension.

re. renaming sizeof to something more descriptive, @Sacha0 suggested memsize, and here i put forth footprint. is that too slang-y?

size to shape, resize! to lengthen! and shorten!, and sizeof to either memsize or footprint could all be done at once i believe.

yuyichao · 2017-07-22T22:38:18Z

> splitting it into two functions: lengthen! and shorten!?

NO. Don't do that! It's not even useful for performance since LLVM can fold that branch easily.

yuyichao · 2017-07-22T22:42:01Z

> memsize, and here i put forth footprint

memsize is OK, although more ambiguous than sizeof, which has a well-accepted meaning from C. footprint is wrong since the return value is not the memory footprint of the object; that's what summarysize estimates.
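To make the distinction concrete, here is a hedged sketch comparing `sizeof` with `Base.summarysize` on a vector of vectors (the exact byte counts depend on platform and Julia version, so none are shown):

```julia
nested = [[1, 2, 3], [4, 5, 6]]

# sizeof only counts the outer array's own element storage,
# which here holds references to the inner vectors.
direct = sizeof(nested)

# Base.summarysize estimates the memory reachable from the object,
# so it also accounts for the inner vectors.
total = Base.summarysize(nested)

footprint_is_larger = total > direct
```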

StefanKarpinski · 2017-07-24T19:41:36Z

> for the life of me i can't think of a verb which encompasses both lengthen and shorten and is specific to altering a dimension.

"resize"?

bjarthur · 2017-07-31T11:31:37Z

there are three things wrong with the following interface to arrays:

| | query | alter |
|---|---|---|
| tuple | `size` | `reshape!` |
| scalar | `length` | `resize!` |

the first is that the same root word, "size", is used to query an array and return a tuple of dimensions, as well as to alter an array with a scalar input (the diagonal elements in the 2x2 table above).

the second and third are that the same root word is not shared for tuple inputs and outputs, and scalar input and outputs (the columns in each row).

a single change, size to shape, would fix the first and second. we can't seem to agree on how to fix the third. but 2 of 3 ain't bad. can we please proceed with at least this?

JeffBezanson · 2017-07-31T15:09:42Z

What about shape(a, 2)? That seems awkward to me. Maybe length(a, 2) makes sense? For querying a single dimension "what is the size of the dimension" seems like how I would usually say it.

bjarthur · 2017-07-31T20:43:45Z

i'd suggest deprecating the 2-input method of shape, and using shape(a)[2] instead. they both produce the same llvm and native code, so there is no performance penalty.

mbauman · 2017-07-31T20:46:01Z

Just to fill in your table a bit more:

| | query | alter |
|---|---|---|
| tuple of indices | `indices` | `reshape` |
| indices of dimension `d` | `indices(A, d)` | - |
| tuple of dimension sizes | `size` | `reshape` |
| size of dimension `d` | `size(A, d)` | - |
| number of iterated elements | `length` | `resize!` |

Personally, I don't find this all that dreadful. Note that length is more strongly tied to iteration than it is to arrays specifically, and resize! only works for vectors, where size(A) == (size(A, 1),).

The only refactoring that I could really see is size(A) → shape(A), but I don't think asking for the "shape" of a given dimension makes all that much sense… so you'd probably also want size(A, d) → length(A, d). Then we'd probably get the complaint that shape(A) and length(A, d) is inconsistent with indices(A) and indices(A,d).
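The last row of the table, where `resize!` changes `length` and applies to vectors only, can be sketched as follows. The matrix case is shown as a caught error, on the assumption that `Base` defines no `resize!` method for matrices:

```julia
v = [1, 2, 3, 4]

resize!(v, 2)                 # shrink: keeps the leading elements
after_shrink = copy(v)

resize!(v, 5)                 # grow: new trailing elements are uninitialized
after_grow_length = length(v)

# A Matrix has no resize! method, so the call is expected to fail.
matrix_error = try
    resize!(zeros(2, 2), 3)
    nothing
catch err
    err
end
```

Only the leading elements of `v` are meaningful after growing; the rest hold arbitrary values until assigned.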

tknopp · 2017-07-31T21:19:54Z

@bjarthur size(a)[2] and size(a,2) are not redundant. Try calling both with a vector. This behavior is pretty useful.
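A short sketch of the difference for a vector: the two-argument form accepts a dimension beyond `ndims`, while indexing into the tuple does not.

```julia
v = [1, 2, 3]

# Dimensions beyond ndims(v) are treated as having size 1.
trailing = size(v, 2)

# size(v) is the 1-tuple (3,), so asking for its second entry is out of bounds.
indexing_error = try
    size(v)[2]
catch err
    err
end
```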

tknopp · 2017-07-31T21:42:41Z

Thinking more from the linguistic point of view I would second @JeffBezanson and even say that shape(a) is not as specific as size(a). Just for instance the shape of a matrix could also be a property describing if it is a diagonal, or upper diagonal matrix, etc.

bjarthur · 2017-08-01T11:15:32Z

i would've expected size([1,2,3],2) to return 0 if it didn't throw an error. is there an issue i can read about the design decision here?

JeffBezanson · 2017-08-01T14:33:51Z

If [1,2,3] conceptually had size 3x0x...x0 then it would have zero total elements (the product of the dimension sizes).
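The same argument in code, as a minimal sketch: padding the size of a vector with trailing dimensions of 1 keeps the product equal to the element count, and matches what `reshape` to a 3x1x1 array reports.

```julia
v = [1, 2, 3]

# Query three dimensions of a one-dimensional array.
padded = ntuple(d -> size(v, d), 3)

# The product of the dimension sizes still equals the number of elements.
product_matches = prod(padded) == length(v)

# Reshaping to 3x1x1 is allowed precisely because the product is unchanged.
t = reshape(v, 3, 1, 1)
same_as_reshape = size(t) == padded
```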

StefanKarpinski · 2017-08-01T20:58:18Z

I support having a shape function, but as I said before, it should support arrays with arbitrary index offsets, which essentially means that it's a rename/rebrand of indices, not size. Moreover, it makes sense to allow reshape to change the index offsets of dimensions, not just their size. In fact, OffsetArrays does precisely this. In this context, changing indices to shape would make more sense – and that's something I already proposed above. Then shape and reshape would match.

From this point of view, size(A) would really just be a shorthand for map(length, shape(A)) and size(A, d) would be shorthand for length(shape(A, d)). I could get behind changing the spelling of size(A, d) to length(A, d) but calling size(A) is still a really common and useful idiom and that can't be shoehorned into length since length(A) means something quite different.
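These identities can be checked today with `axes`, the function the later comments in this thread use for the tuple of index ranges. In this sketch `axes` stands in for the proposed `shape`, which is not a `Base` function:

```julia
A = zeros(5, 2, 3)

# One index range per dimension.
ranges = axes(A)

# size(A) as map(length, shape(A)), with axes in the role of shape.
size_from_ranges = map(length, ranges)
tuple_identity = size_from_ranges == size(A)

# size(A, d) as length(shape(A, d)).
single_identity = length(axes(A, 2)) == size(A, 2)
```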

I would also point out that some of these things are just vagaries of English, which we're inheriting. To change the length of something, you "resize" it, you don't "relength" it – that's just not a word. Moreover, even though asking for the length of an individual dimension makes sense, asking for the length of an entire array and getting back a tuple of dimension lengths does not make sense, so swapping length and size would not be good despite the length vs resize! mismatch.

JeffBezanson · 2017-08-01T21:13:41Z

I would actually like to rename indices anyway, since to me it sounds like it would be a collection of the indices themselves, i.e. for i in indices(a) sounds like it would be similar to for i in eachindex(a).
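The naming concern can be illustrated with `axes`, which the later comments in this thread use for the per-dimension ranges, alongside `eachindex`. Iterating the former yields one range per dimension, not the indices themselves:

```julia
a = [10 20; 30 40]

# Iterating the per-dimension ranges gives two items, one per dimension.
per_dimension = [r for r in axes(a)]

# eachindex iterates the indices of the elements themselves.
element_indices = collect(eachindex(a))
```

For this 2x2 matrix, `per_dimension` has two entries while `element_indices` has four.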

ttparker · 2017-12-26T17:45:32Z

I bet this suggestion will be unpopular, but one possible way to reduce the unfortunate linguistic similarity between the conceptually distinct functions size and resize would be to simply rename size to sizes, since it returns a tuple of dimension sizes. While sizes(A, d) perhaps sounds less natural than either size(A, d) or length(A, d), I think it certainly sounds more natural than shape(A, d). (Never too early to start thinking about version 2.0 ...)

mkitti · 2022-04-06T17:29:17Z

Since this has been added to the 2.0 milestone, could we describe how we plan to address this in the context of modern Julia?

index is now eachindex I believe.

Should we add an alias shape(obj) = size(obj)?
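A sketch of what such an alias could look like in user code. `shape` here is hypothetical and not part of `Base`; the two-argument method is included only to show the form the thread found awkward.

```julia
# Hypothetical alias, not a Base function: forwards to size.
shape(obj) = size(obj)

# Two-argument form, mirroring size(obj, d).
shape(obj, d) = size(obj, d)

m = zeros(2, 3)
whole = shape(m)
second = shape(m, 2)
```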

StefanKarpinski · 2022-04-07T18:56:43Z

I have an idle thought for Julia 2.0, which would be to do away with implicit array bases altogether. Not sure if it's a good idea or not, but it would basically entail making the built-in arrays OffsetArrays and instead of giving dimension sizes in APIs, you would give dimension ranges, e.g. 1:n or 0:n-1. In that case it would make more sense to ask for the shape of an array than its size, i.e. shape(a) would return a tuple of index ranges.

All that said, I just don't think the slight linguistic mismatch between size and reshape is awful or worth making a breaking change for unless there were other major benefits to be had.

mkitti · 2022-04-07T19:05:24Z

> instead of giving dimension sizes in APIs, you would give dimension ranges, e.g. 1:n or 0:n-1. In that case it would make more sense to ask for the shape of an array than its size, i.e. shape(a) would return a tuple of index ranges.

It sounds like this could be prototyped in OffsetArrays before reaching Base.

mkitti · 2022-04-07T20:48:03Z

> Not sure if it's a good idea or not, but it would basically entail making the built-in arrays OffsetArrays and instead of giving dimension sizes in APIs, you would give dimension ranges, e.g. 1:n or 0:n-1

Isn't this just axes https://docs.julialang.org/en/v1/base/arrays/#Base.axes-Tuple{Any} ?

julia> A = Array{Int}(undef, 5,2,3); axes(A)
(Base.OneTo(5), Base.OneTo(2), Base.OneTo(3))

julia> using OffsetArrays

julia> O = OffsetArray(A, (-1, -2, -3)); axes(O)
(OffsetArrays.IdOffsetRange(values=0:4, indices=0:4), OffsetArrays.IdOffsetRange(values=-1:0, indices=-1:0), OffsetArrays.IdOffsetRange(values=-2:0, indices=-2:0))

julia> collect.(axes(O))
([0, 1, 2, 3, 4], [-1, 0], [-2, -1, 0])

timholy mentioned this issue Jul 4, 2017

API consistency review #20402

Closed

19 tasks

quinnj mentioned this issue Sep 8, 2017

Deprecate length, nrow, and ncol on DataFrames in favor of size. Fixe… JuliaData/DataFrames.jl#1224

Closed

MasonProtter mentioned this issue Jun 15, 2020

Rename typeof() to type() for 2.0? #35808

Closed

ViralBShah added this to the 2.0 milestone Mar 11, 2022

brenhinkeller added speculative Whether the change will be implemented is speculative breaking This change will break code labels Nov 21, 2022

jariji mentioned this issue Sep 8, 2023

2.0: Slicer function names #51252

Open

nsajko added the arrays [a, r, r, a, y, s] label Aug 16, 2024
