Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Support of escape sequences #125

Merged
merged 2 commits into from
May 29, 2015
Merged

Conversation

ruben-ayrapetyan
Copy link
Contributor

Support of escape sequences with the exception of '\0' character and cases that depend on Unicode support.

Related issue: #50

@ruben-ayrapetyan ruben-ayrapetyan added normal parser Related to the JavaScript parser development Feature implementation labels May 28, 2015
@ruben-ayrapetyan ruben-ayrapetyan added this to the Core ECMA features milestone May 28, 2015
@ruben-ayrapetyan ruben-ayrapetyan changed the title Support escape sequences Support of escape sequences May 28, 2015
@galpeter
Copy link
Contributor

Would it be possible to add intentionally malformed escape sequences for the test cases?

@ruben-ayrapetyan
Copy link
Contributor Author

Currently, we could add the test cases with syntax errors to tests/jerry/fail.
Later, after eval and syntax error feedback would be supported, we could also implement them like the following:

try
{
 eval ('"\u;012"');
 assert (false);
} catch (e)
{
   assert (e instanceof SyntaxError);
}

* false - otherwise.
*/
static bool
is_line_terminator (ecma_char_t c) /**< character */
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We need this check too in the RegExp engine. Please declare it in a header, so we don't have to duplicate it. :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@sand1k
Copy link
Contributor

sand1k commented May 29, 2015

LGTM.

@egavrin
Copy link
Contributor

egavrin commented May 29, 2015

@galpeter please check the PR.

// WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
// See the License for the specific language governing permissions and
// limitations under the License.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should also add test cases for different string escapes: \t \n \r \b ...
as described in the http://www.ecma-international.org/ecma-262/5.1/#sec-7.8.4 SingleEscapeCharacter rule.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok.

@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-support-escape-sequences branch from 24b7d67 to d379aff Compare May 29, 2015 09:18
@ruben-ayrapetyan
Copy link
Contributor Author

@galpeter, @LaszloLango, I've updated pull request.

@ruben-ayrapetyan ruben-ayrapetyan assigned galpeter and unassigned sand1k May 29, 2015
…ings contained in source code buffer) to lexer token.

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-support-escape-sequences branch from d379aff to 8dd45eb Compare May 29, 2015 10:46
break;
}

converted_char = (ecma_char_t) char_code;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving discussion from commit notes (d379aff) to the pull request note.

@galpeter: This will convert the unit32_t to unit8_t or unit16_t is this really what we wanted? I know that we don't have full unicode support for know. Maybe a comment to describe why is this good?

@ruben-ayrapetyan
In CONFIG_ECMA_CHAR_ASCII mode this would convert to uint8_t, ignoring high part of unicode byte pair;
and in CONFIG_ECMA_CHAR_UTF16 mode this would convert to uint16_t.
Would the comment with the text above be ok?
Maybe it is better to use uint16_t instead of uint32_t here?

@galpeter
Will this change after we have full unicode support?

@ruben-ayrapetyan
The function's code would definitely be changed, but the idea probably would be the same, i.e.:
ecma_char_t would be uint8_t for ascii mode, and uint16_t - for utf16.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Then the comment is ok & we should use uint16_t

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. I'll add the comment and change the type according to your request.

@galpeter
Copy link
Contributor

Other parts looks good to me.

…UL>") character and cases that depend on Unicode support.

JerryScript-DCO-1.0-Signed-off-by: Ruben Ayrapetyan [email protected]
@ruben-ayrapetyan ruben-ayrapetyan force-pushed the Ruben-support-escape-sequences branch from 8dd45eb to 8b28cac Compare May 29, 2015 11:03
@ruben-ayrapetyan ruben-ayrapetyan assigned egavrin and unassigned galpeter May 29, 2015
@egavrin
Copy link
Contributor

egavrin commented May 29, 2015

Ok, make push

@ruben-ayrapetyan ruben-ayrapetyan merged commit 8b28cac into master May 29, 2015
@egavrin egavrin deleted the Ruben-support-escape-sequences branch May 29, 2015 11:42
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
development Feature implementation normal parser Related to the JavaScript parser
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants