Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add SubStation Alpha (SSA) support #3060

Merged
merged 20 commits into from
Feb 1, 2021

Conversation

avelad
Copy link
Member

@avelad avelad commented Dec 23, 2020

Add SubStation Alpha (SSA) support

Format info: https://en.wikipedia.org/wiki/SubStation_Alpha

Note: SSA and ASS are supported

@avelad
Copy link
Member Author

avelad commented Dec 23, 2020

Before merging this, it is necessary to previously accept #3036 so that I can add a part that is the same for the conversion to WebVTT in src=.

@avelad
Copy link
Member Author

avelad commented Dec 29, 2020

Merge ready, it is now possible to review this PR.

@avelad
Copy link
Member Author

avelad commented Jan 12, 2021

@joeyparrish , do you think you could find a time this week to review this?

/** @type {!Array.<!shaka.extern.Cue>} */
const cues = [];
const parts = str.split(/\r?\n\s*\r?\n/);
for (const part of parts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it would be easier to understand if you split the raw parsing and the name handling into separate parts. For example, have a parse method that produces a nested object map:

{
  "Script Info": {
    "Title": "Foo",
    "Original Script": ""
  },
  "V4+ Styles": {
    "Format": "..."
  }
}

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to think of a way to make it easier, and hope to come back tomorrow with a different implementation.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting a more generic parser that didn't check for tag names. For example, something like:

const ret = {};
let section = null;
for (const line of lines) {
  const match = /\[([^\]]+)\]/.exec(line);
  if (match) {
    section = match[1];
    ret[section] = {};
  } else if (/\s*;/.test(line)) {
    // Skip comments.
  } else {
    const parts = line.split(':');
    ret[section][parts[0].trim()] = parts.slice(1).join(':').trim();
    // You'll need to special-case Dialogue to support multiple cues.
  }
}

I would also suggest putting this in another function to convert the data buffer to the object. Then you can just use data['V4 Styles'] or data['Events'] later.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that it is a popular format, what is the advantage of doing what you propose? At this time, what is supported are known tags and those that we add to sections. Adding unknown sections I don't think has any advantage ... Sorry if I didn't understand you.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code "adds" unknown sections too by ignoring them in the if statements below. I was suggesting we separate the text parsing from the handling of the sections. Like you would split XML parsing from the reading of the elements. By splitting them up, you can focus on reading the tag names without seeing all the regex and string parsing.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, I still don't see it :(. Right now with the last changes the events and styles are already separated, and also the parsing of them is made independent. Adding a generic parse here does not add much since it is a known format.In your example above, script information is added to the object that is not really useful.
On the other hand, even if I do a generic parse, I still have to know the tags.
The execution time of my proposal and yours should be similar.

lib/player.js Show resolved Hide resolved
test/text/ssa_text_parser_unit.js Show resolved Hide resolved
@avelad
Copy link
Member Author

avelad commented Jan 21, 2021

@TheModMaker I have applied all your comments. Can you review again?

lib/text/ssa_text_parser.js Outdated Show resolved Hide resolved
@avelad
Copy link
Member Author

avelad commented Jan 21, 2021

Ready for review!

@@ -33,16 +33,16 @@ shaka.text.LrcTextParser = class {
* @export
*/
parseMedia(data, time) {
return shaka.text.LrcTextParser.getCues_(data, time.segmentEnd);
return shaka.text.LrcTextParser.getCues(data, time.segmentEnd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since this just forwards, I suggest just using parseMedia directly.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We cannot do this because getCues is static and is necessary in the player.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of my generic suggestion was to avoid checking if (mimeType == 'foo') and instead use the parser interface directly. Doing something like this:

const factory = TextEngine.findParser(mimeType);
if (factory) {
    const obj = factory();
    cues = obj.parseMedia(data);
}

But you don't have to do that now. You could do it in another PR or I could do it instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm going to address it in another PR, I keep it on my radar. Thanks!

lib/text/lrc_text_parser.js Outdated Show resolved Hide resolved
lib/text/web_vtt_generator.js Show resolved Hide resolved
lib/text/ssa_text_parser.js Outdated Show resolved Hide resolved
/** @type {!Array.<!shaka.extern.Cue>} */
const cues = [];
const parts = str.split(/\r?\n\s*\r?\n/);
for (const part of parts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was suggesting a more generic parser that didn't check for tag names. For example, something like:

const ret = {};
let section = null;
for (const line of lines) {
  const match = /\[([^\]]+)\]/.exec(line);
  if (match) {
    section = match[1];
    ret[section] = {};
  } else if (/\s*;/.test(line)) {
    // Skip comments.
  } else {
    const parts = line.split(':');
    ret[section][parts[0].trim()] = parts.slice(1).join(':').trim();
    // You'll need to special-case Dialogue to support multiple cues.
  }
}

I would also suggest putting this in another function to convert the data buffer to the object. Then you can just use data['V4 Styles'] or data['Events'] later.

@avelad avelad requested a review from TheModMaker January 25, 2021 16:04
@joeyparrish joeyparrish added this to the v3.1 milestone Jan 25, 2021
lib/text/ssa_text_parser.js Outdated Show resolved Hide resolved
lib/text/ssa_text_parser.js Outdated Show resolved Hide resolved
lib/text/ssa_text_parser.js Outdated Show resolved Hide resolved
lib/text/ssa_text_parser.js Outdated Show resolved Hide resolved
/** @type {!Array.<!shaka.extern.Cue>} */
const cues = [];
const parts = str.split(/\r?\n\s*\r?\n/);
for (const part of parts) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Your code "adds" unknown sections too by ignoring them in the if statements below. I was suggesting we separate the text parsing from the handling of the sections. Like you would split XML parsing from the reading of the elements. By splitting them up, you can focus on reading the tag names without seeing all the regex and string parsing.

@@ -33,16 +33,16 @@ shaka.text.LrcTextParser = class {
* @export
*/
parseMedia(data, time) {
return shaka.text.LrcTextParser.getCues_(data, time.segmentEnd);
return shaka.text.LrcTextParser.getCues(data, time.segmentEnd);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part of my generic suggestion was to avoid checking if (mimeType == 'foo') and instead use the parser interface directly. Doing something like this:

const factory = TextEngine.findParser(mimeType);
if (factory) {
    const obj = factory();
    cues = obj.parseMedia(data);
}

But you don't have to do that now. You could do it in another PR or I could do it instead.

@avelad avelad requested a review from TheModMaker January 27, 2021 19:16
@shaka-bot
Copy link
Collaborator

Test Failure:

Generating Closure dependencies...
Linting JavaScript...

/var/lib/jenkins/workspace/Manual PR Test (local-tests)/lib/text/ssa_text_parser.js
272:11  error  'alpha' is never reassigned. Use 'const' instead  prefer-const

✖ 1 problem (1 error, 0 warnings)
1 error and 0 warnings potentially fixable with the `--fix` option.

END-BUILD: FAILURE
Build step 'Execute shell' marked build as failure

@avelad
Copy link
Member Author

avelad commented Feb 1, 2021

Fixed lint error. Sorry....

@shaka-bot
Copy link
Collaborator

All tests passed!

@michellezhuogg michellezhuogg merged commit 0845843 into shaka-project:master Feb 1, 2021
@avelad avelad deleted the substation-alpha branch February 1, 2021 21:11
shaka-bot pushed a commit that referenced this pull request Feb 3, 2021
1. Added hdr as a property in stream when constructing.
Fixes build failure from commit
7137286 .
PR #3116
Issue #2813

2. Fixed the test error from commit
d3640d1 .
PR #3044
Issue #3029

3. Fixed the new line with no other arguments from commit
0845843 .
PR #3060

Change-Id: I5833e49c1a95172742c4ec820960c9c5a7bf0cca
@github-actions github-actions bot added the status: archived Archived and locked; will not be updated label Jul 25, 2023
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jul 25, 2023
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
status: archived Archived and locked; will not be updated
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants