syntax: separate array concatenation from array construction #7128

Closed
StefanKarpinski opened this issue Jun 5, 2014 · 146 comments
Labels
breaking This change will break code speculative Whether the change will be implemented is speculative
Milestone

Comments

@StefanKarpinski
Member

Much gnashing of teeth derives from the overlap between syntax for array literal construction and array concatenation in Julia – largely inherited from Matlab. Perhaps we should just use a different syntax for block matrix construction entirely. One thought would be this:

| a b
  c d |

This has the advantage of being pretty terse and lightweight. For example, the current idiom of expanding a range into an array is [1:10] which would become |1:10| while [1:10] would construct a one-element array of type UnitRange{Int}.
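For readers on Julia 1.x, the two meanings described above ended up with different spellings inside the same square brackets, rather than with `|...|`. A minimal sketch using only Base; the variable names are illustrative:

```julia
# Construction: a single element in brackets is stored as-is, not expanded.
r = [1:10]          # 1-element Vector{UnitRange{Int64}}

# Concatenation: a trailing semicolon requests vcat, which expands the range.
v = [1:10;]         # 10-element Vector{Int64}

# collect is the explicit way to expand a range without special syntax.
c = collect(1:10)
```

Here `v == c` holds, while `r` has length 1 and its only element is the range itself.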

@tkelman
Contributor

tkelman commented Jun 5, 2014

If breaking changes are being considered for this, I'll put in a vote for using an explicit delimiter here. Comma, semicolon, whatever. Not whitespace.

@StefanKarpinski
Member Author

Well, this would free up the ability to use an explicit delimiter instead of requiring using whitespace. It is still sometimes nice to use whitespace, although I guess we could avoid that since it's a bit of a parsing nightmare.
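A small illustration of why whitespace as a separator is awkward to parse, assuming current Julia behavior: the same characters give different shapes depending only on spacing around the minus sign.

```julia
a = [1 - 2]   # spaces on both sides: binary minus, 1-element Vector [-1]
b = [1 -2]    # space only before: two elements, 1×2 Matrix [1 -2]
c = [1-2]     # no spaces: binary minus again, 1-element Vector [-1]
```

With explicit delimiters, `[1, -2]` and `[1 - 2]` would not be distinguished by spacing alone.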

@JeffBezanson
Member

dup of:
#3737
#2488

related syntax discussion:
#6960

related:
#6491

@tkelman
Contributor

tkelman commented Jun 5, 2014

Inconvenience of one extra character per element for explicit delimiters is minor compared to the parsing inconsistency IMO

@StefanKarpinski
Member Author

It's not really a perfect dup of either of those, although it is related.

@JeffBezanson
Member

#3737 is an exact dup. Even the title is almost the same.

@jiahao
Member

jiahao commented Jun 5, 2014

|...| looks a bit too much like a determinant for my comfort. In the OP, I read | a b; c d | as a*d-c*b the first time round.

@quinnj
Member

quinnj commented Jun 5, 2014

As mentioned elsewhere, the {...} syntax could be made available since it's completely redundant anyway (can always use Any[...] instead). I like the thought of making curly braces actually useful!

@kmsquire
Member

+1 for making {...} concatenation (and making [ ] equivalent to Any[ ] instead of None[ ]).

@pao
Member

pao commented Jun 18, 2014

Consolidating from #7293, I'd like to see ways of constructing both N-vectors and Nx1 matrices, since we're distinguishing them.

@nalimilan
Member

Reclaiming {...} may be a good solution. [...] could actually keep its concatenating behavior, and {...} would become the non-concatenating version. This would be minimally disruptive since {...} is used less often, and it doesn't concatenate at the moment. (Plus, I find it more "intuitive" than the reversed roles, not sure why...)

@pao
Member

pao commented Jun 18, 2014

Plus, I find it more "intuitive" than the reversed roles, not sure why...

Perhaps it reminds you of C++11 initializer lists?

@nalimilan
Member

Perhaps it reminds you of C++11 initializer lists?

I guess not, I don't even know what they are! I've not updated my knowledge of C++ in the last decade...

@MikeInnes
Member

{...} syntax could be made available since it's completely redundant anyway

FWIW, as someone who does more general-purpose work with Julia I disagree that the {...} syntax isn't useful. I can definitely understand that perspective from those who mostly write performance- (and therefore type-) sensitive code, but when you don't want to think about the type system at all it's great to have a convenient escape hatch.

(And redundant ≠ useless – all of the current array syntax is essentially a syntactical convenience, but often those are important)

@quinnj
Member

quinnj commented Sep 19, 2014

I like the suggestion of using x = [] as shorthand for x = Any[], and
having to explicitly say x = None[] to create a None array. Then we can
recoup {...} for something else and there's no convenience lost.
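A sketch of how this suggestion looks in Julia 1.x, where (as far as I know) `[]` does produce an `Any` array and `None` is spelled `Union{}`. The variable names are illustrative:

```julia
e = []                    # empty Vector{Any}
m = Any[1, "two", 3.0]    # explicit Any element type, the {...} replacement
n = [1, 2.0]              # without the prefix, elements are promoted: Vector{Float64}
u = Union{}[]             # Union{} is the later name for None
```

A literal with mixed, non-promotable elements such as `[1, "two"]` also falls back to `Vector{Any}`, so the `Any` prefix is only needed to suppress promotion.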


@MikeInnes
Member

[] = Any[] would definitely make sense, but is somewhat separate – I often have to write array/dict literals that contain elements. Presumably, recouping {} would mean I'd have to write Any[...].

In general Julia does really well with its "never mention types when you don’t feel like it" philosophy – I really think it would be a shame to lose that.

@PythonNut
Contributor

How crazy would it be for |...| to stand for absolute value? Somehow x = | x - 1 | looks really nice to me, but on the other hand, it might be a parsing nightmare.

@pao
Member

pao commented Oct 18, 2014

I'm going to go with pretty crazy. Which norm would you want (plenty to choose from)? Are norms commonly needed enough to deserve their own syntax?

@PythonNut
Contributor

I feared as much, just wanted to bounce it off of sane people. Probably the sqrt(dot(x,x)) norm for 1d data structures and friends (abs(x) for scalars, det(x) for matrices?). I imagine norms are used quite a bit for vectors, although the norm is technically || x || for them.
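To make the "which norm" question concrete, here is a short sketch using the `LinearAlgebra` standard library as it exists in Julia 1.x. The example vector is an arbitrary choice:

```julia
using LinearAlgebra

x = [3.0, 4.0]

n2    = norm(x)            # Euclidean (2-)norm, the default
n_dot = sqrt(dot(x, x))    # the same quantity written out by hand
n1    = norm(x, 1)         # 1-norm: sum of absolute values
ninf  = norm(x, Inf)       # infinity norm: largest absolute value
```

The three norms give 5, 7 and 4 for this vector, which is why a single `|x|` syntax would have to pick one meaning arbitrarily.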

@tonyhffong

wouldn't standalone | or || create lots of insane parsing edge cases? Does 2 | x | in REPL mean 2 bit-or x bit-or something else to be followed in the next line, or 2 * norm( x )? Compound brackets with directional hints such as [| x |], {| x |}, |: x :|, etc may make a bit more sense, in danger of creating line noise. Or we start to use unicode brackets ⟦ ⟧ ⟨ ⟩ ⟪ ⟫

@PythonNut
Contributor

That would apply to the original proposal as well, right? Of those, |: x :| looks cleanest.

So is {...} going to be the new Matrix magic, or the type signature of a Tuple?

@JeffBezanson
Member

I don't think using | | as brackets is going to happen. However it would be great to have a general approach to using more kinds of brackets --- a standard way to parse ⟦x⟧, x⟦i⟧, etc. x[] as getindex does not have an obvious generalization.

@tonyhffong

Quite right... we have the current behavior at the parsing level, before even getting to what they do

type   with prefix    without prefix
[]     ref           vcat (maybe hcat), comprehension
{}     curly         cell1d, comprehension (and in 0.4: tuple type?)
()     call          tuple, or nothing (AST abstracted it away)

(Wow the brackets wear so many hats. I'm not sure I have all of them listed.)

New brackets could follow a simplified pattern, mapping into refdoublesquare and catdoublesquare for the prefixed and non-prefixed cases. Then we decide what they do: set, norm, matrix notation, dictionary shorthand, etc.

@JeffBezanson
Member

The unicode brackets I'd be content just to parse for now. But some alternative ascii brackets like [| |] are a real possibility for use in Base.

@JeffBezanson
Member

I disagree that we should claim not to have space-sensitive syntax, and then actually have it within a special kind of string that we tell people to use. You still have space-sensitive syntax, plus the added problems of putting code in strings. This is an important language construct for us, so we ought to be able to parse it with just our normal parser. This makes life easier for writers of other macros or code analysis tools. If some syntax is worth having, it's worth having in the actual parser.

@Jakki42

Jakki42 commented Jul 18, 2015

One of the things which attracted me to Julia was the cleanliness of the language - it's pretty and (mostly) easy to comprehend even for someone who is not a scientist and whose programming background is in K&R C with little actual programming over the last couple of decades. What however is not pretty and clean at all is putting code into strings. For me it makes code hard to read - that's just the way my brain is wired. Likewise using white spaces as separators is just hideous from a readability point of view - I'd certainly prefer to see , and ; as mandatory separators. I'm also not sure it's a good idea to use the macro approach as macros tend to be a slight turn off for highly incompetent folks like myself, but it's certainly better from an approachability point of view than the string thing.

@SimonDanisch
Contributor

@Jakki42, would you think the string solution is not nice because it usually does not offer sensible syntax highlighting? In other words, would you be okay with the solution if we offer specialized syntax highlighting for it?

@mbauman
Member

mbauman commented Jul 18, 2015

String macros are like the wild west. The macro could do absolutely anything. Which is awesome and powerful, but also can be crazy and hard to reason about.

Lots of folks have a rightfully-learned aversion to code in strings. And they're right. Code has no place in a string. It requires runtime parsing and evaluation, which is slow and a performance trap in almost all languages since there's no way for a static analyzer to reason about what the evaluation might result in.

But string macros don't necessarily return code in a string. They can implement their own parsing rules and place the resulting expression directly into the surrounding code before it's compiled. There could potentially be no difference between the text of a source file and the text within a string macro. It's all just parsed text. As a toy example, here's a string macro that simply transforms its whitespace separated contents into an addition between all of them:

julia> macro add_str(ex)
          toks = map(parse, split(ex)) # This could be more robust, but works as a simple example
          esc(Expr(:call, :+, toks...))
       end

julia> x = 2; y = 3;
       add"x y 2x*y sin(rand())"
17.775332435685616

julia> macroexpand(:(add"x y 2x*y sin(rand())"))
:(x + y + (2x) * y + sin(rand()))

This means they can implement DSLs, add rich text markup, precompile regular expressions, execute C++ code, and more. A string macro's contents can be entirely data, or it can have its own interpolation rules (with code demarcated by $ or \(…) or any rules it wants), or it can be entirely Julia code in the same scope as its surrounding code, returning a parsed Julian expression.
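The toy macro above was written for the Julia of 2015. On Julia 1.x the string parser lives at `Meta.parse`, so a roughly equivalent sketch would look like the following. The `rand()` term is dropped here only so the result is deterministic:

```julia
# String macro: parse each whitespace-separated token as Julia code
# and splice the pieces into a single `+` call in the caller's scope.
macro add_str(s)
    toks = map(Meta.parse, split(s))   # not robust: tokens must not contain spaces
    esc(Expr(:call, :+, toks...))
end

x = 2; y = 3
total = add"x y 2x*y"   # expands to x + y + (2x) * y
```

With `x = 2` and `y = 3` this evaluates to 2 + 3 + 12 = 17. The parsing still happens at macro-expansion time, not at run time.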

After all this, I'm still with @JeffBezanson. If this is an important enough construct to be included in the standard library, it should be a first-class parsed syntax. All the other string macros included in base currently follow the "entirely data" or "data with interpolation" semantics — adding this would cause lots of confusion, I think. The only way I could see a consistent story here is if we decide that backtick string macros (#12139) should always contain in-scope Julian code and all other string macros should be predominantly string data (which may happen to be code for another language… which is unfortunate for Cmd).

@Jakki42

Jakki42 commented Jul 19, 2015

@SimonDanisch - for me it's not a syntax highlighting issue, but the messiness of the syntax itself - it's confusing and unclear to me, a deviation from how the rest of the syntax looks, like from a different language - I can not quickly glance through it, but my old and slow brain needs plenty of extra effort to understand what's going on. Maybe it is my background but to me [ ], ( ), { } clearly encapsulate something, make a unit or block of something and " " and ' ' always just seem like a string or character with no programmatic meaning inside. While < > also form a nice opening and closing, I'm so old that to my brain they do not equate to anything else but bigger or smaller than symbols and extra effort is needed to understand if they were to have something meaningful inside. And whitespace - to me it's always just something not relevant to syntax, parsed out.

Anyhow, please keep in mind that I am a low end low priority user, certainly not a member of the main target audience groups of the language :-)

@dcarrera

I find the whole notion of string macros for doing math slightly horrifying. This is terrible syntax. It is not merely ugly and messy; it is ad-hoc and inconsistent. One principle of design is that similar things should look similar, and different things should look different. String macros should typically do string-like things. For example, PyPlot uses L"..." for LaTeX strings. String-like syntax should be used for string-like objects. By the same principle, collections of things like tuples and arrays should use [ ], ( ), { }.

Another big problem with string macros, which was raised earlier, is that language syntax should be parsed by the parser. Macros force ad-hoc sub-languages inside Julia with their own alternate parsers. This is bad for tools that need to parse Julia (syntax highlighting was mentioned) and it is just bad design because macros in general are problematic. My biggest pet peeve with Julia is that @sprintf is a macro and not a function. Other languages manage to make sprintf functions, why can't Julia? So my view is that we need fewer macros in the core language, not more.
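On the `@sprintf` point, later Julia versions (1.6 and up, to my knowledge) expose a function-based path in the `Printf` standard library next to the macro. A minimal sketch; the format string and value are arbitrary:

```julia
using Printf

# Macro form: the format string must be a literal, checked at expansion time.
s_macro = @sprintf("%.2f", 3.14159)

# Function form: the format can be built at run time and reused.
fmt = Printf.Format("%.2f")
s_func = Printf.format(fmt, 3.14159)
```

Both calls produce the same string. The macro remains because it can validate and specialize on a literal format before the program runs.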

@bermanmaxim

@dcarrera your question about the @sprintf macro was very nicely addressed here on stackoverflow by @StefanKarpinski.

@StefanKarpinski
Member Author

We've addressed much of the core issue here by changing the meaning of [a,b,c] to always do array construction. The whitespace-sensitive syntax remains an issue, but not a huge one. In 1.x we can revisit this and try out new syntaxes for this, and eventually could remove the whitespace sensitive syntax in 2.0, but this isn't going to happen for 1.0.
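A short sketch of the resulting behavior in Julia 1.x, with illustrative variable names: commas construct, while semicolons and whitespace still concatenate.

```julia
a = [1, 2]
b = [3, 4]

nested  = [a, b]   # construction: 2-element Vector{Vector{Int64}}
stacked = [a; b]   # vertical concatenation: [1, 2, 3, 4]
side    = [a b]    # horizontal concatenation (whitespace): 2×2 Matrix
```

So `[a, b]` keeps `a` and `b` as elements, whereas `[a; b]` and `[a b]` are the remaining whitespace- and semicolon-based concatenation forms.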

@andyferris
Member

Just wanted to register my strong preference that none of the array literals (including matrices) do any form of concatenation. The concatenation behavior feels like an unnecessary MATLAB-ism to me. If we're building LinAlg to be block-matrix friendly, why not let [A B; C D] make a block matrix?

Regarding syntax changes, the syntax that appeals most to me is the one mentioned by @yuyichao, with commas and semicolons and no whitespace sensitivity. I can see this generalizing to arbitrary dimensions by having other symbols like double-semicolon ;;, being whitespace invariant, allowing row matrices vs vector, etc.

# vector
[1, 2, 3, 4]

# matrix
[1, 2;
 3, 4]

# 3D array
[1, 2;
 3, 4;;

 5, 6;
 7, 8]

The only thing is I sometimes wonder if it would be nicer to do this in storage order, not "looks like linear algebra" order. Otherwise [1, 2, 3, 4;] is a 1x4 matrix, not a 4x1 matrix; it seems odd that the last semicolon can make such a drastic change.

The final thing I'd love is a literal for 0-dimensional arrays, as I find StaticArrays.Scalar extremely useful to control broadcast, etc. The possibility here is [a] makes a zero-D array and [a,] makes a 1D vector. I admit some users might find that obnoxious, but it has some logic to it (with , being the first dimensional separator and following the same rule as trailing ;, etc, plus we have to treat length-1 tuples with a trailing , as in (a,), so there is a precedent).
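For comparison, a sketch of what is available without the comma-based proposal, assuming Julia 1.7 or later for the repeated-semicolon syntax. Note that this shipped syntax differs from the proposal above: whitespace still separates columns, and `;;;` (not `;;`) marks the third dimension.

```julia
# 3D literal with repeated semicolons (Julia ≥ 1.7).
A = [1 2; 3 4;;; 5 6; 7 8]                    # 2×2×2 Array{Int64, 3}

# The same array built with an explicit function call, which works on older versions too.
B = cat([1 2; 3 4], [5 6; 7 8]; dims=3)

# There is no 0-dimensional literal; fill with no dimensions is the usual spelling.
z = fill(42)                                  # 0-dimensional Array{Int64, 0}
```

`z[]` reads the single element back out of the 0-dimensional array.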

@nalimilan
Member

+1 for the rules proposed by @andyferris, except for the 0D array part ([a,] sounds too cumbersome).

@fredrikekre
Member

On the other hand [a,] would be consistent with length 1 tuples (a,).

@cdsousa
Contributor

cdsousa commented Dec 4, 2017

Besides all the issues, whitespace sensitivity is so damn nicely lightweight 😢

[1 2 3
 4 5 6
 7 8 9]

vs

[1, 2, 3;
 4, 5, 6;
 7, 8, 9]

@tpapp
Contributor

tpapp commented Feb 6, 2018

Related to this issue: the construction of n x 1 matrices comes up often (eg this and this just in the past month). Until this issue is resolved, would it make sense to make a FAQ item for this?
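As a possible starting point for such a FAQ entry, here are a few ways to get an n x 1 matrix. The last form assumes Julia 1.7 or later; the others should work on any 1.x release:

```julia
v = [1, 2, 3]

m1 = reshape(v, :, 1)   # 3×1 Matrix that shares memory with v
m2 = hcat(v)            # 3×1 Matrix, a copy of v
m3 = [1; 2; 3;;]        # 3×1 Matrix literal via a trailing ;; (Julia ≥ 1.7)
```

All three have size `(3, 1)`, whereas `v` itself has size `(3,)`.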

@vtjnash
Member

vtjnash commented Dec 13, 2023

There is no compelling reason to make a breaking change now
