size, reshape not consistent #22665

Open
bjarthur opened this issue Jul 3, 2017 · 37 comments
Labels: arrays, breaking (this change will break code), speculative (whether the change will be implemented is speculative)

@bjarthur
Contributor

bjarthur commented Jul 3, 2017

if size returns a tuple of an array's dimensions, then why is the function which changes an array's dimensions not called resize? size and reshape are borrowed from matlab. numpy uses shape and reshape. the latter make a lot more sense to me, despite being a recovering matlab user. shall we consider changing? happy to submit a PR if there is a consensus. some discussion here.

@Gnimuc
Contributor

Gnimuc commented Jul 3, 2017

-1, since reshape (to make into a different shape) is a stricter operation than resize (to alter the size of something):

julia> reshape(rand(2,2), 3, 1)
ERROR: DimensionMismatch("new dimensions (3,1) must be consistent with array size 4")

julia> resize!([1,2,3,4], 3)
3-element Array{Int64,1}:
 1
 2
 3

@andyferris
Member

If we had multidimensional resizable (as in re-length-able) arrays, then a multidimensional resize method would indeed make some sense. But we can't change their "total" size (length), only their shape.

If we decide to go with non-length-changing arrays, then we can still support reshape, as a view...
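
A small sketch of the point above, assuming a current Julia 1.x session (the variable names are illustrative and not from the thread): reshape on a plain Array changes only how the elements are factored into dimensions, and the result shares memory with the original instead of copying it.

```julia
A = collect(1:6)        # Vector{Int} with 6 elements
B = reshape(A, 2, 3)    # the same 6 elements, factored as a 2×3 matrix

B[1, 1] = 100           # write through the reshaped array ...
println(A[1])           # ... and the change is visible in A

# reshape never changes the total number of elements.
same_length = length(A) == length(B)
println(same_length)
```

Because the two arrays alias the same data, this behaves like a view of the original vector under a different shape.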

@timholy
Member

timholy commented Jul 4, 2017

I think the core suggestion here is to rename size to shape. This makes sense to me. size is less important anyway given that indices is what really matters 😄.

@timholy
Member

timholy commented Jul 4, 2017

Linking to #20402 so this gets more eyes before anyone goes to great effort.

@timholy mentioned this issue Jul 4, 2017
@yuyichao
Contributor

yuyichao commented Jul 4, 2017

shape(a) sounds fine, not so much for shape(a, 1) though....

@bjarthur
Contributor Author

bjarthur commented Jul 4, 2017

renaming size to shape is not just the core suggestion, it's the only suggestion. thanks @timholy for clarifying my verbosity.

seems to me this would be relatively straightforward. just a cut and paste with a deprecation, no?

@tknopp
Contributor

tknopp commented Jul 4, 2017

If we think this through to the end, then length should be renamed to size, since resize! modifies the array size.

yuyichao has a point, though: size(a,d) gives us the size of the d-th dimension.

Personally I don't think all this is worth it. Naming is something that at some point gets subjective and IMHO reaching 100% consistency is close to impossible.

@StefanKarpinski
Member

If we were to move to shape instead of size we should at the same time move to an API where the range of valid values is returned. At which point, we'd probably also want a function that returns the length of each of those index collections – maybe we could call it size.

@bjarthur
Contributor Author

@StefanKarpinski we already have indices, which returns the range of valid values.

i'm surprised to see so much resistance to this idea. is it just that change is bad / a lot of work and this syntax is too entrenched? we're not at 1.0 yet.

would there be less opposition to change reshape to resize? then we'd have size, resize, and resize!. methods to the latter could eventually be added to handle N-D arrays. i'd be happy with that too. anything to make the nomenclature consistent.

@JeffBezanson
Member

There are two key properties here: the number of elements, and how they are factored into dimensions. size is a bit ambiguous and could refer to either, but shape definitely only refers to the second property (do we agree shape(a, 1) probably doesn't make sense?). The defining feature of reshape is that it can only change the factoring into dimensions, not the number of elements. So I think its name should stay. There is never going to be a reshape!, since that would require changing the type of an array (we also want to avoid mutating dimension sizes in general).

One issue we're up against here is that length is a standard term for the number of items in an array, but relength is not a word. Sometimes you just have to live with things like that. Maybe we could use length!? Though a bit weird since it's not a verb.

@bjarthur
Contributor Author

bjarthur commented Jul 14, 2017

@JeffBezanson matlab uses length to mean the number of elements in the longest dimension. it doesn't make sense; just wanted to point it out. [edit: perhaps the person who coined this usage was a woodworker, where length, width, and depth refer to the longest, 2nd longest, and shortest dimensions of a board, respectively]

to me, in plain layman terms, vectors have lengths, matrices have areas, and 3-D arrays have volumes. i guess i tend to think in physical terms, as if they represented a space in the real world. again, to me, a general term for a scalar quantity that includes length, area, and volume is size (not length), meaning how big it is, that is how many elements it has.

in julia sizeof currently means the number of bytes an array consumes, and length means the number of elements. two entirely different words for functions that return the same thing but in different units. really?? i like @tknopp 's suggestion that length be renamed to size. my mother would understand terminology like that.

@tknopp
Contributor

tknopp commented Jul 14, 2017

the STL (C++) also uses size. The suggestion length -> size and size -> shape seems to make things indeed a little bit more consistent.

The question is whether it's worth the effort. On the other hand: Now or never... :-)

@StefanKarpinski
Member

I don't think we have enough deprecation cycles to make that change happen in 1.0.

@JeffBezanson
Member

JeffBezanson commented Jul 18, 2017

> in julia sizeof currently means the number of bytes an array consumes, and length means the number of elements. two entirely different words for functions that return the same thing but in different units. really??

No. length is the abstract number of elements in anything iterable, while sizeof refers to the concrete memory representation. You can have an object whose size is 16 bytes but that iterates millions of elements, or whose size is 1GB but that iterates one or zero elements. These are in no way the same concept.

I think renaming size to shape is reasonable; there is precedent for that name. But I'm not sure it's worth it. size is also an increasingly problematic function for e.g. OffsetArrays. Not sure if it's necessarily related, but helps motivate some kind of shake-up there.
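
The distinction between length and sizeof drawn above can be checked with a short sketch. It assumes a current Julia 1.x session; the variable names are illustrative and not from the thread. A range stores only its endpoints, so its memory size is tiny compared to the number of elements it iterates, while a dense array's byte count scales with its element type.

```julia
r = 1:1_000_000                 # UnitRange{Int}: stores only start and stop
println(length(r))              # number of elements produced by iteration
println(sizeof(r))              # bytes of the two stored Int fields

w = zeros(Float64, 1024)        # dense array of 1024 Float64 values
println(length(w))              # element count
println(sizeof(w))              # element count times sizeof(Float64)

x = 3.0                         # a scalar iterates exactly one element
println((length(x), sizeof(x)))
```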

@Sacha0
Member

Sacha0 commented Jul 18, 2017

A more descriptive name for sizeof would be nice, e.g. memsize. Best!

@bjarthur
Contributor Author

@StefanKarpinski not enough deprecation cycles? how is the change proposed here (size to shape, length to size, and, for good measure, sizeof to memsize) different in this regard than anything that's proposed in #20402 ? your roadmap talk is not posted yet, but if it includes even just one more release (0.7) before 1.0, then making the change now and including a deprecation should suffice, no?

@JeffBezanson
Member

I really don't want to rename length. If we didn't allow growing collections, then I assume there'd be nothing wrong with using length. So maybe we should rename resize!? Also, length is not tied to arrays; it is much more general than that. It refers to the number of elements in a sequence (you might even say the "length" of a sequence), and arrays happen to be able to implement that interface.

@StefanKarpinski
Member

StefanKarpinski commented Jul 21, 2017

@bjarthur: a chain of renames requires a deprecation cycle for each link, and we only have one left. Specifically, we can't deprecate length to size until one cycle after we've deprecated size to shape – even just one cycle is kind of dangerous since deprecations often don't get caught until the function is deleted entirely.

@bjarthur
Contributor Author

resize! has an if/else/end block with entirely different code to handle lengthening and shortening. not surprising. what about splitting it into two functions: lengthen! and shorten!? upside is that this terminology would then be consistent with retaining length (as opposed to renaming to size). downside is that the logic about which to use would then have to be hoisted to the user. for the life of me i can't think of a verb which encompasses both lengthen and shorten and is specific to altering a dimension.

re. renaming sizeof to something more descriptive, @Sacha0 suggested memsize, and here i put forth footprint. is that too slang-y?

size to shape, resize! to lengthen! and shorten!, and sizeof to either memsize or footprint could all be done at once i believe.

@yuyichao
Contributor

> splitting it into two functions: lengthen! and shorten!?

NO. Don't do that! It's not even useful for performance since LLVM can fold that branch easily.

@yuyichao
Contributor

yuyichao commented Jul 22, 2017

> memsize, and here i put forth footprint

memsize is OK, although more ambiguous than sizeof, which has a well-accepted meaning from C. footprint is wrong since the return value is not the memory footprint of the object; that's what summarysize estimates.

@StefanKarpinski
Member

> for the life of me i can't think of a verb which encompasses both lengthen and shorten and is specific to altering a dimension.

"resize"?  

@bjarthur
Contributor Author

bjarthur commented Jul 31, 2017

there are three things wrong with the following interface to arrays:

|        | query  | alter    |
|--------|--------|----------|
| tuple  | size   | reshape! |
| scalar | length | resize!  |

the first is that the same root word, "size", is used to query an array and return a tuple of dimensions, as well as to alter an array with a scalar input (the diagonal elements in the 2x2 table above).

the second and third are that the same root word is not shared for tuple inputs and outputs, and scalar input and outputs (the columns in each row).

a single change, size to shape, would fix the first and second. we can't seem to agree on how to fix the third. but 2 of 3 ain't bad. can we please proceed with at least this?

@JeffBezanson
Member

What about shape(a, 2)? That seems awkward to me. Maybe length(a, 2) makes sense? For querying a single dimension "what is the size of the dimension" seems like how I would usually say it.

@bjarthur
Contributor Author

i'd suggest deprecating the 2-input method of shape, and use shape(a)[2] instead. they both produce the same llvm and native code, so there is no performance penalty.

@mbauman
Member

mbauman commented Jul 31, 2017

Just to fill in your table a bit more:

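|                             | query         | alter   |
|-----------------------------|---------------|---------|
| tuple of indices            | indices       | reshape |
| indices of dimension d      | indices(A, d) | -       |
| tuple of dimension sizes    | size          | reshape |
| size of dimension d         | size(A, d)    | -       |
| number of iterated elements | length        | resize! |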

Personally, I don't find this all that dreadful. Note that length is more strongly tied to iteration than it is to arrays specifically, and resize! only works for vectors, where size(A) == (size(A, 1),).

The only refactoring that I could really see is size(A) → shape(A), but I don't think asking for the "shape" of a given dimension makes all that much sense… so you'd probably also want size(A, d) → length(A, d). Then we'd probably get the complaint that shape(A) and length(A, d) is inconsistent with indices(A) and indices(A,d).

@tknopp
Contributor

tknopp commented Jul 31, 2017

@bjarthur size(a)[2] and size(a,2) are not redundant. Try calling both with a vector. This behavior is pretty useful.
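
A minimal sketch of the difference, assuming a current Julia 1.x session (the variable names are illustrative and not from the thread): size(v, d) treats dimensions beyond ndims(v) as having size 1, whereas size(v)[d] indexes into the tuple and fails for such d.

```julia
v = [1, 2, 3]

println(size(v))        # the tuple of dimension sizes of a vector
trailing = size(v, 2)   # trailing dimension of a vector
println(trailing)

# Indexing the size tuple out of range raises an error instead.
threw = try
    size(v)[2]
    false
catch err
    err isa BoundsError
end
println(threw)

# Treating trailing dimensions as 1 keeps the product equal to length(v).
consistent = prod(size(v, d) for d in 1:3) == length(v)
println(consistent)
```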

@tknopp
Contributor

tknopp commented Jul 31, 2017

Thinking more from the linguistic point of view I would second @JeffBezanson and even say that shape(a) is not as specific as size(a). Just for instance the shape of a matrix could also be a property describing if it is a diagonal, or upper diagonal matrix, etc.

@bjarthur
Contributor Author

bjarthur commented Aug 1, 2017

i would've expected size([1,2,3],2) to return 0 if it didn't throw an error. is there an issue i can read about the design decision here?

@JeffBezanson
Member

If [1,2,3] conceptually had size 3x0x...x0 then it would have zero total elements (the product of the dimension sizes).

@StefanKarpinski
Member

StefanKarpinski commented Aug 1, 2017

I support having a shape function, but as I said before, it should support arrays with arbitrary index offsets, which essentially means that it's a rename/rebrand of indices, not size. Moreover, it makes sense to allow reshape to change the index offsets of dimensions, not just their size. In fact, OffsetArrays does precisely this. In this context, changing indices to shape would make more sense – and that's something I already proposed above. Then shape and reshape would match.

From this point of view, size(A) would really just be a shorthand for map(length, shape(A)) and size(A, d) would be shorthand for length(shape(A, d)). I could get behind changing the spelling of size(A, d) to length(A, d) but calling size(A) is still a really common and useful idiom and that can't be shoehorned into length since length(A) means something quite different.

I would also point out that some of these things are just vagaries of English, which we're inheriting. To change the length of something, you "resize" it, you don't "relength" it – that's just not a word. Moreover, even though asking for the length of an individual dimension makes sense, asking for the length of an entire array and getting back a tuple of dimension lengths does not make sense, so swapping length and size would not be good despite the length vs resize! mismatch.
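
The shorthand relationship described in this comment can be sketched in current Julia, with one substitution that is an editorial assumption and not part of the proposal: there is no shape function in Base, so the sketch uses axes (the name that replaced indices in Julia 0.7) in the role the comment gives to shape. Variable names are illustrative.

```julia
A = rand(5, 2, 3)

# axes(A) returns one index range per dimension.
ranges = axes(A)

# size(A) agrees with the length of each index range ...
sizes_match = size(A) == map(length, ranges)

# ... and size(A, d) agrees with the length of the d-th range.
dim_match = all(size(A, d) == length(axes(A, d)) for d in 1:ndims(A))

println(sizes_match)
println(dim_match)
```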

@JeffBezanson
Member

I would actually like to rename indices anyway, since to me it sounds like it would be a collection of the indices themselves, i.e. for i in indices(a) sounds like it would be similar to for i in eachindex(a).

@ttparker

I bet this suggestion will be unpopular, but one possible way to reduce the unfortunate linguistic similarity between the conceptually distinct functions size and resize would be to simply rename size to sizes, since it returns a tuple of dimension sizes. While sizes(A, d) perhaps sounds less natural than either size(A, d) or length(A, d), I think it certainly sounds more natural than shape(A, d). (Never too early to start thinking about version 2.0 ...)

@mkitti
Contributor

mkitti commented Apr 6, 2022

Since this has been added to the 2.0 milestone, could we describe how we plan to address this in the context of modern Julia?

index is now eachindex I believe.

Should we add an alias shape(obj) = size(obj)?

@StefanKarpinski
Member

I have an idle thought for Julia 2.0, which would be to do away with implicit array bases altogether. Not sure if it's a good idea or not, but it would basically entail making the built-in arrays OffsetArrays and instead of giving dimension sizes in APIs, you would give dimension ranges, e.g. 1:n or 0:n-1. In that case it would make more sense to ask for the shape of an array than its size, i.e. shape(a) would return a tuple of index ranges.

All that said, I just don't think the slight linguistic mismatch between size and reshape is awful or worth making a breaking change for unless there were other major benefits to be had.

@mkitti
Contributor

mkitti commented Apr 7, 2022

> instead of giving dimension sizes in APIs, you would give dimension ranges, e.g. 1:n or 0:n-1. In that case it would make more sense to ask for the shape of an array than its size, i.e. shape(a) would return a tuple of index ranges.

It sounds like this could be prototyped in OffsetArrays before reaching Base.

@mkitti
Contributor

mkitti commented Apr 7, 2022

> Not sure if it's a good idea or not, but it would basically entail making the built-in arrays OffsetArrays and instead of giving dimension sizes in APIs, you would give dimension ranges, e.g. 1:n or 0:n-1

Isn't this just axes https://docs.julialang.org/en/v1/base/arrays/#Base.axes-Tuple{Any} ?

julia> A = Array{Int}(undef, 5,2,3); axes(A)
(Base.OneTo(5), Base.OneTo(2), Base.OneTo(3))

julia> using OffsetArrays

julia> O = OffsetArray(A, (-1, -2, -3)); axes(O)
(OffsetArrays.IdOffsetRange(values=0:4, indices=0:4), OffsetArrays.IdOffsetRange(values=-1:0, indices=-1:0), OffsetArrays.IdOffsetRange(values=-2:0, indices=-2:0))

julia> collect.(axes(O))
([0, 1, 2, 3, 4], [-1, 0], [-2, -1, 0])

@brenhinkeller added the speculative and breaking labels Nov 21, 2022
@nsajko added the arrays label Aug 16, 2024