Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixed slicing in Python interface #202

Merged
merged 3 commits into from
Dec 6, 2016
Merged

Fixed slicing in Python interface #202

merged 3 commits into from
Dec 6, 2016

Conversation

mitchellstern
Copy link
Contributor

@mitchellstern mitchellstern commented Nov 29, 2016

Issue:

In the Expression class, the most recent version of __getitem__ (introduced in this commit) accepted either an integer index (calling pick on that index) or a sequence of indices (calling pickrange on the first two indices). While selecting a single element from a matrix or higher-dimensional tensor using a sequence of indices would be a useful future addition, the semantics of pickrange are those of slicing rather than element selection.

Fix:

The __getitem__ method was modified to accept either an integer index as before or a slice object. Slice features that are currently unsupported give rise to exceptions with informative error messages.

Previous behavior:

>>> import dynet as dy
>>> v = dy.inputVector([0,1,2,3])

Single indices worked correctly:

>>> v[0].value()
0.0
>>> v[3].value()
3.0

Slices were not accepted:

>>> v[0:3].value()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_dynet.pyx", line 629, in _dynet.Expression.__getitem__ (_dynet.cpp:17694)
TypeError: 'slice' object has no attribute '__getitem__'

Index sequences were accepted with unexpected behavior:

>>> v[0,3].value()
[0.0, 1.0, 2.0]
>>> v[0,3,4].value()
[0.0, 1.0, 2.0]
>>> v[0,3,4,"test"].value()
[0.0, 1.0, 2.0]

New behavior:

>>> import dynet as dy
>>> v = dy.inputVector([0,1,2,3])

Single indices are accepted correctly:

>>> v[0].value()
0.0
>>> v[3].value()
3.0

Index sequences are no longer accepted:

>>> v[0,3].value()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_dynet.pyx", line 626, in _dynet.Expression.__getitem__ (_dynet.cpp:17660)
    assert isinstance(index, (int, slice))
AssertionError

Simple slices are accepted:

>>> v[0:3].value()
[0.0, 1.0, 2.0]

Informative error messages are given when attempting to use default endpoints or step sizes:

>>> v[:].value()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_dynet.pyx", line 631, in _dynet.Expression.__getitem__ (_dynet.cpp:17764)
    raise ValueError("Default start and stop indices not yet supported.")
ValueError: Default start and stop indices not yet supported.

>>> v[0:].value()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_dynet.pyx", line 631, in _dynet.Expression.__getitem__ (_dynet.cpp:17764)
    raise ValueError("Default start and stop indices not yet supported.")
ValueError: Default start and stop indices not yet supported.

>>> v[:3].value()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_dynet.pyx", line 631, in _dynet.Expression.__getitem__ (_dynet.cpp:17764)
    raise ValueError("Default start and stop indices not yet supported.")
ValueError: Default start and stop indices not yet supported.

>>> v[0:3:2].value()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "_dynet.pyx", line 633, in _dynet.Expression.__getitem__ (_dynet.cpp:17800)
    raise ValueError("Step sizes not yet supported.")
ValueError: Step sizes not yet supported.

Copy link
Contributor

@yoavg yoavg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi,
This looks good, but a few comments:

  • how about setting missing start and end values to 0 and the last element, respectively? (or just missing start to 0, in case getting the last element is hard to get)
  • do you support negative indices? if not, document it.
  • can you please run this through cython -a and see if any operation is particularly costly (yellow)?

Thanks!

@mitchellstern
Copy link
Contributor Author

mitchellstern commented Nov 30, 2016

Having the start index default to 0 would be easy to add in.

A default end index and negative indexing could also be supported, but require knowing the size of the vector. The current code suggests that this could be obtained in either of the following ways:

  • self.c().dim().size()
  • self.c().dim().rows()

For vectors, size and rows are the same, but if you anticipate extending indexing and slicing to matrices later on, then rows might be the more appropriate quantity here. Do you have a preference between the two? Also, do you have a sense of whether one of the above calls would be sufficiently fast, or whether there is a more direct way to get size or rows for an Expression in Python?

I'll take a look at the output of cython -a and see if the code can be optimized accordingly.

Thanks!

Edit: Size may not be appropriate when the batch dimension is greater than 1, so we probably want to use rows.

@mitchellstern
Copy link
Contributor Author

The latest commit includes support for negative indices and default slice arguments. All operations now use C integers so that the necessary comparisons and index adjustments are fast. In addition, the __getitem__ method will now only produce valid calls to pick and pickrange, providing more informative error messages and avoiding memory overflows that can occur when invalid arguments are given to pick or pickrange directly.

Running cython -a --cplus _dynet.pyx indicates that the only expensive operations involve constructing error messages and throwing errors; in light of the use of C integers, all normal control flow should be efficient.

@yoavg
Copy link
Contributor

yoavg commented Dec 6, 2016

Looks great, thanks!

@yoavg yoavg merged commit e6a5878 into clab:master Dec 6, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants