IME support in actions #1683
Comments
More detail on The documentation for the win32 API has some details here: https://docs.microsoft.com/en-us/windows/win32/intl/composition-string and the proposed values here largely match those in that API.
The Browser Testing and Tools Working Group just discussed this issue. The full IRC log of that discussion:
<AutomatedTester> topic: Actions IME support
<AutomatedTester> github https://github.com//issues/1683
<AutomatedTester> github: https://github.com//issues/1683
<AutomatedTester> jgraham (IRC): This is also relevant to webdriver classic
<AutomatedTester> ... the issue here is a proposal on how to handle ime input in webdriver
<AutomatedTester> ... IME is input method editor
<AutomatedTester> ... it is commonly used in languages where you can't type the characters directly
<AutomatedTester> ... [describes examples]
<AutomatedTester> ... there are a lot of web compat issues in editor libraries because they can't test IME
<AutomatedTester> ... [describes input breakage in Gecko]
<AutomatedTester> ... for those who have heard of Interop 22... part of that is working on interop in input
<AutomatedTester> ... in webdriver, the lowest level inputs is actions that allows you to send through the keyboard, pointer events and so on
<AutomatedTester> ... with IME you press a key and that intercepts and a different event is fired. e.g. A would change it to the keycode and then do composition
<AutomatedTester> ... the webpage gets composition events
<AutomatedTester> ... [explains different composition methods]
<AutomatedTester> ... the proposal is we add a new input type called IME
<AutomatedTester> ... this has 2 actions, `compositionUpdate`
<AutomatedTester> ... the other action is `compositionEnd`
<AutomatedTester> ... so the webdriver specific thing that's not clear how these things hook together
<AutomatedTester> ... [explains IME and Keyboard]
<AutomatedTester> q+
<jgraham> q?
<BrandonWalderman> q+
<jgraham> ack
<jgraham> AutomatedTester: Historically WebDriver (Selenium) had IME support built in. It was handled by actions trying to inject directly into the event queue. There was special C++ code required to handle it. That's why we didn't do this and focused on US keyboard input. We did allow actions to handle sending specific unicode characters so you could input final composed characters.
<jgraham> AutomatedTester: Required specific install on the machine.
<jgraham> AutomatedTester: Is it easier to implement now?
<jgraham> AutomatedTester: High level actions seem OK, but is it implementaale?
<AutomatedTester> ack next
<jgraham> s/aa/ab/
<AutomatedTester> jgraham (IRC): this is a case benefits being supported directly in the browser
<AutomatedTester> ... the proposal is it is at the moment... it's a mid layer proposal
<AutomatedTester> ... we won't go to the OS IME
<AutomatedTester> ... we will provide enough data to the browser so it could inject the relevant events
<AutomatedTester> ... this should be implementable and it can be implemented in gecko
<LanWei> q+
<AutomatedTester> ... [explains how we need to maintain some states]
<AutomatedTester> ack next
<AutomatedTester> Brandon Walderman: I support this feature request
<AutomatedTester> ... we had an intern do some of this in Chromium for CDP
<AutomatedTester> ... the building blocks are already in chromium so it's a case of adding this to chromedriver
<AutomatedTester> ack next
<AutomatedTester> Lan Wei: I was working on the actions implementation
<AutomatedTester> ... we have looked at this and it's very hard to implement
<AutomatedTester> ... could you explain the client API
<AutomatedTester> jgraham (IRC): so from the point of view of webdriver user
<AutomatedTester> ... it doesn't ever interact with an IME on the machine
<AutomatedTester> ... we will emulate it
<karlcow> q+ to ask about gecko on different platforms
<AutomatedTester> Lan Wei: do you have language type as an input
<AutomatedTester> jgraham (IRC): the proposal doesnt have a way to handle any configurations... e.g. different IMEs handle different combinations to get a different order of events
<AutomatedTester> Lan Wei: do you have any plan on when we want to work on this API?
<AutomatedTester> jgraham (IRC): since this is part of Interop 2022, there is pressure to get this done quickly
<AutomatedTester> ... we would love feedback now
<AutomatedTester> q?
<AutomatedTester> ack next
<Zakim> karlcow, you wanted to ask about gecko on different platforms
<AutomatedTester> karlcow (IRC): I wanted to ask jgraham (IRC) ...do we need a different test per platform?
<AutomatedTester> jgraham (IRC): if platform IMEs handle things different then a test per platform?
<AutomatedTester> karlcow (IRC): how do you make this universal?
<AutomatedTester> jgraham (IRC): this is very hard...
<AutomatedTester> ... it won't adress all cases but it's an improvement since we have zero way to test
<AutomatedTester> q?
<AutomatedTester> Break for 15 minutes
A list of the test scenarios this proposal is trying to solve would help in understanding the scope. IME testing can be very large, but knowing the minimum viable context would make it easier to evaluate the effort required to implement this.
Not directly addressing @karlcow's request yet, but some additional context: This proposal is intended to work at the level of providing a "virtual IME" whose state is entirely under the control of the test author. The intent is roughly that, if we imagine a general data flow of the form physical input device → OS input handling → IME → browser application, this replaces everything to the left of "browser application" with "virtual IME". So the maximum amount of flexibility we could aim for is to be able to replicate any possible sequence of IME messages/events that the browser could get from the operating system.
In practice, of course, this API is not OS-specific and is somewhat higher level, so it's not reasonable to expect to be able to simulate every possible case in the browser IME handling. But one way to judge whether we're likely to meet the testing requirements for webapps is to verify whether there are any codepaths on the browser side that are commonly triggered by real IMEs but could not be triggered by this API.
What is definitively out of scope is being able to invoke specific real IMEs, or simulate input at the OS/hardware level. Although those things do have significant advantages in some cases, they require a very different approach from the current WebDriver virtual input handling.
WebDriver is currently unable to simulate the action of an IME (input method editor) in user input.
IMEs are widely used, particularly when inputting scripts that have far more characters than there are keys on a keyboard. That means it's impossible to use WebDriver to adequately test how web applications behave when inputting these scripts. Behaviour in the face of IME input is also an interoperability problem for web browsers, and fixing it is seen as a high-priority area for the web.
Conceptually an IME sits between the physical input layer and the application. Typically the IME is activated with some device input, e.g. pressing a key on the keyboard. Once triggered, the IME generates a candidate composed string that may be updated based on further input, and is at some point committed. During this time, the composed string is typically displayed in the application, but styled in a way that makes it distinct from the final input. There is also typically IME-specific UI to suggest different possible completions, but this is quite platform-specific and will be considered out of scope.
In WebDriver the low-level input handling is done through the actions API. This models user input as a set of virtual input devices, each of which has an internal state. At each point in time ("tick") an input device can either do nothing ("pause"), or can have an associated action that updates its internal state and causes the relevant events to be emitted to content (e.g. a `keyDown` action on a `key` input device will update the internal WebDriver state to signify that the key is depressed, and emit a `keydown` event to content).
For a given input on a given device the IME can do nothing (i.e. just let the event pass through) or can intercept the event, update its internal state, and cause different events to be emitted instead. For example, consider pressing the "a" key. In the absence of an IME this will cause a `keydown` event with `keyCode` `65`, a `keypress` event, possibly various `input` events, and finally a `keyup` event also with `keyCode` `65`. However if the IME is activated, we get a `keydown` event with `keyCode` `229`, a `compositionstart` event, a `compositionupdate` event with `data` corresponding to the current IME input selection, `input` events, and finally a `keyup` event with `keyCode` `65`. Note in this example that the content never sees a `keydown` event with `keyCode` `65`: the fact that the IME intercepted the event changes the key events visible on the page.
Later an input (or something not visible to the web page) may cause the composition to be committed, which corresponds to a `compositionend` event.
IMEs can apply to non-key input, e.g. handwriting recognition is a form of IME that depends on pointer input. An IME may also depend on multiple kinds of input.
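
To make the difference in the keyboard example concrete, here is a minimal Python sketch that lists the DOM events content would see for one press and release of a key, with and without an IME intercepting it. The `Event` class and the `key_press_events` helper are hypothetical names introduced only for illustration, and the "possibly various `input` events" are simplified to a single `input` event.

```python
from dataclasses import dataclass
from typing import List, Optional

# keyCode reported on keydown when an IME intercepts the key (from the text above).
IME_KEYCODE = 229


@dataclass(frozen=True)
class Event:
    """Hypothetical record of a DOM event seen by content."""
    type: str
    key_code: Optional[int] = None
    data: Optional[str] = None


def key_press_events(key_code: int, composition: Optional[str] = None) -> List[Event]:
    """Events for one key press and release.

    composition=None models "no IME"; otherwise it is the composition
    string the IME produces when it intercepts the key.
    """
    if composition is None:
        return [
            Event("keydown", key_code=key_code),
            Event("keypress"),
            Event("input"),  # simplification: a single input event
            Event("keyup", key_code=key_code),
        ]
    return [
        Event("keydown", key_code=IME_KEYCODE),  # real keyCode is hidden
        Event("compositionstart"),
        Event("compositionupdate", data=composition),
        Event("input"),
        Event("keyup", key_code=key_code),  # keyup still reports the real key
    ]


plain = key_press_events(65)
with_ime = key_press_events(65, composition="abc")
```

Comparing `plain` and `with_ime` shows the point made above: with the IME active there is no `keydown` carrying `keyCode` `65` anywhere in the sequence.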
In terms of the implementation inside the WebDriver spec, the obvious thing would be to add IME as a new kind of input source for actions. However, the fact that it's a layer between the "physical" input devices and the application makes this more complex; to handle cases like "key is pressed and intercepted by IME, other events happen, key is released" we need to a) specify which other inputs in a given tick are being intercepted by the IME and b) handle the IME-generated state changes after all other inputs (maybe even right at the end of the tick: for something like `pointerMove`, which can be spread out into multiple events over time, it's not clear how things should work).
So a possible proposal is as follows:
We add a new input source type `ime`. That has internal state which is the current composition string.
The `ime` input source has two associated actions: `compositionUpdate` and `compositionEnd`.
`compositionUpdate` is the main action for updating the composed string. It has the following properties:
- `data` - A string containing the updated value of the composition string. If this is null (or the empty string?) we end the composition.
- `clauses` (optional) - These represent sub-parts of the composition string. Each clause has a `length` and a `type`. The lengths must add up to the total length of `data`. Suggested values of `type` are “`caret`”, “`rawInput`”, “`converted`”, “`notConverted`”, “`targetConverted`” (TODO: clarify the semantics of these). In addition, formatting hints may be specified according to how the IME would like the range to be handled. These are `underlineColor`, `underlineStyle`, `backgroundColor`, `textColor`. If this is omitted it's assumed that there's a single clause (TODO: details)
- `handles` (optional) - The input source id for the input source that caused this change in the IME state. If this is provided the internal state of the referenced input source is updated, but the DOM events emitted are those appropriate to the IME instead (e.g. for a keyboard the `keyCode` property becomes `229`). If this property is omitted the update to the IME state is not connected to any application-visible input source change (this corresponds to the situation where e.g. the user clicks on a composition string option in a window outside their browser window).
A `compositionEnd` action causes the composed string to be emitted. It has the following properties:
- `data` (optional) - The final composed string to insert. If omitted this is given by the `data` property of the previous `compositionUpdate` action.
- (optional) - As for `compositionUpdate`, if committing the composition happens in response to a content-visible input action, this is a reference to the device id for that action.
An example of what it looks like on the wire when we press "a" on the keyboard, the IME generates a composed string "abc", it gets updated to "ABC" by something outside the browser, and it's committed with the space key:
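
A rough Python sketch of how such a payload might be assembled follows. Everything in it is an assumption layered on the proposal above: the `ime` source type and its actions do not exist in any shipped WebDriver implementation, the helper names are hypothetical, the property name `handles` on `compositionEnd` is assumed by analogy with `compositionUpdate`, the tick layout is one possible reading of the scenario, and clause lengths are compared against Python's `len()` because the proposal does not say which length unit is meant.

```python
import json


def composition_update(data, clauses=None, handles=None):
    """Build a proposed compositionUpdate action (hypothetical helper)."""
    action = {"type": "compositionUpdate", "data": data}
    if clauses is not None:
        # The proposal requires clause lengths to add up to the length of data.
        total = sum(clause["length"] for clause in clauses)
        if total != len(data):
            raise ValueError(f"clause lengths sum to {total}, expected {len(data)}")
        action["clauses"] = clauses
    if handles is not None:
        action["handles"] = handles  # id of the input source being intercepted
    return action


def composition_end(data=None, handles=None):
    """Build a proposed compositionEnd action (hypothetical helper)."""
    action = {"type": "compositionEnd"}
    if data is not None:
        action["data"] = data
    if handles is not None:
        action["handles"] = handles  # assumed property name
    return action


PAUSE = {"type": "pause"}

# One list entry per tick; both sources must cover the same number of ticks.
payload = {
    "actions": [
        {
            "type": "key",
            "id": "keyboard",
            "actions": [
                {"type": "keyDown", "value": "a"},  # intercepted by the IME
                {"type": "keyUp", "value": "a"},
                PAUSE,                              # change comes from outside
                {"type": "keyDown", "value": " "},  # space commits
                {"type": "keyUp", "value": " "},
            ],
        },
        {
            "type": "ime",  # proposed source type, not in the current spec
            "id": "ime",
            "actions": [
                composition_update(
                    "abc",
                    clauses=[{"length": 3, "type": "rawInput"}],
                    handles="keyboard",
                ),
                PAUSE,
                composition_update("ABC"),  # no handles: not tied to any input
                composition_end(handles="keyboard"),
                PAUSE,
            ],
        },
    ]
}

wire = json.dumps(payload, indent=2)
print(wire)
```

Omitting `data` from the final `compositionEnd` relies on the rule above that the committed string defaults to the `data` of the previous `compositionUpdate`, which is "ABC" here.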