Andrew Mayers edited this page Mar 23, 2014 · 3 revisions

#Superscript Parsing Performance

I have been working on a project that receives many strings from an existing API. Most of these strings could include superscript and/or subscript text, which the API marks with the HTML <sup> and <sub> tags. I hadn't needed to display superscript or subscript text in iOS before this project, so I researched how others had implemented it. The two most common approaches are to use the dedicated Unicode super/subscript characters, or to build an NSAttributedString and set the super/subscript ranges manually.

Enter iOS 7's new initWithData:options:documentAttributes:error: initializer for NSAttributedString. By using [[NSAttributedString alloc] initWithData:[string dataUsingEncoding:NSUnicodeStringEncoding] options:@{NSDocumentTypeDocumentAttribute:NSHTMLTextDocumentType} documentAttributes:nil error:nil] the process can be automated, which is ideal for working with dynamic data. I set up the app to take advantage of this, but quickly noticed performance problems. In my case, I had approximately 1000 strings to display in a table. Some strings were only a few words long, while others ran to several paragraphs. Because the data was displayed in a table, I needed to calculate the cell heights, which meant calling boundingRectWithSize:options:context: on each row's attributed string. That in turn meant creating every attributed string up front. With this configuration, it took approximately 40 seconds on an iPod Touch just to initialize the attributed strings. Another issue was that the work had to be executed on the main thread, according to Apple's documentation:

The HTML importer should not be called from a background thread (that is, the options dictionary includes NSDocumentTypeDocumentAttribute with a value of NSHTMLTextDocumentType). It will try to synchronize with the main thread, fail, and time out.

Any app that blocks the main thread for a significant amount of time gives a poor user experience and risks being killed by the system watchdog. An alternative solution was needed.
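As a rough sketch of the setup described above (not the project's actual code), the two steps were building each attributed string with the HTML importer and then measuring it for the table. The function names HTMLAttributedString and RowHeightForAttributedString, and the choice of drawing options, are assumptions made for illustration.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical helper: builds an attributed string with the HTML importer.
// Must be called on the main thread, per Apple's documentation.
static NSAttributedString *HTMLAttributedString(NSString *html, NSError **error) {
    NSData *data = [html dataUsingEncoding:NSUnicodeStringEncoding];
    if (!data) {
        return nil;
    }
    NSDictionary *options = @{NSDocumentTypeDocumentAttribute: NSHTMLTextDocumentType};
    return [[NSAttributedString alloc] initWithData:data
                                            options:options
                                 documentAttributes:NULL
                                              error:error];
}

// Hypothetical helper: height needed to draw the string at a given width.
static CGFloat RowHeightForAttributedString(NSAttributedString *text, CGFloat width) {
    CGRect bounds = [text boundingRectWithSize:CGSizeMake(width, CGFLOAT_MAX)
                                       options:NSStringDrawingUsesLineFragmentOrigin
                                       context:nil];
    // Round up so the last line is not clipped.
    return (CGFloat)ceil(CGRectGetHeight(bounds));
}
```

Passing an NSError pointer rather than nil makes import failures visible while debugging. Every row pays for both calls on the main thread, which is where the 40 seconds went.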

###Solution

It was clear that all of the delay was caused by initWithData:options:documentAttributes:error:. Because the implementation details of that method are private, the only way I could improve the performance was to create my own parser, which I have outlined below:

+ (NSAttributedString *)JAM_AttributedStringFromString:(NSString *)string withMainFont:(UIFont *)mainFont superscriptFont:(UIFont *)superscriptFont subscriptFont:(UIFont *)subscriptFont {
	// kCTSuperscriptAttributeName requires #import <CoreText/CoreText.h>.
	NSMutableArray *superscriptRanges = [NSMutableArray new];
	NSMutableArray *subscriptRanges = [NSMutableArray new];

	// Match <sup>/<sub> and </sup>/</sub>, ignoring case.
	NSString *openingScriptPattern = @"(<su[bp]>)";
	NSString *closingScriptPattern = @"(</su[bp]>)";

	NSRegularExpression *scriptRegex = [NSRegularExpression regularExpressionWithPattern:openingScriptPattern options:NSRegularExpressionCaseInsensitive error:nil];
	NSRegularExpression *closingScriptRegex = [NSRegularExpression regularExpressionWithPattern:closingScriptPattern options:NSRegularExpressionCaseInsensitive error:nil];

	NSRange range = NSMakeRange(0, string.length);
	NSTextCheckingResult *openingResult = [scriptRegex firstMatchInString:string options:0 range:range];
	NSTextCheckingResult *closingResult = [closingScriptRegex firstMatchInString:string options:0 range:range];

	while (openingResult && closingResult && openingResult.range.location != NSNotFound && closingResult.range.location != NSNotFound) {
		// Malformed input: a closing tag ahead of the opening tag would make the length below underflow.
		if (closingResult.range.location < NSMaxRange(openingResult.range)) {
			break;
		}

		// Range the tagged text will occupy once the opening tag has been removed.
		NSRange finalScriptRange = NSMakeRange(openingResult.range.location, closingResult.range.location - openingResult.range.location - openingResult.range.length);
		// Compare case-insensitively, because the regex also matches <SUB>.
		NSString *openingTag = [string substringWithRange:openingResult.range];
		if ([openingTag caseInsensitiveCompare:@"<sub>"] == NSOrderedSame) {
			[subscriptRanges addObject:[NSValue valueWithRange:finalScriptRange]];
		} else {
			[superscriptRanges addObject:[NSValue valueWithRange:finalScriptRange]];
		}

		// Remove the opening tag, then the closing tag at its shifted position.
		string = [string stringByReplacingCharactersInRange:openingResult.range withString:@""];
		NSRange closingRange = NSMakeRange(closingResult.range.location - openingResult.range.length, closingResult.range.length);
		string = [string stringByReplacingCharactersInRange:closingRange withString:@""];

		range = NSMakeRange(0, string.length);
		openingResult = [scriptRegex firstMatchInString:string options:0 range:range];
		closingResult = [closingScriptRegex firstMatchInString:string options:0 range:range];
	}

	NSMutableAttributedString *attributedString = [[NSMutableAttributedString alloc] initWithString:string attributes:@{NSFontAttributeName:mainFont}];
	for (NSValue *value in superscriptRanges) {
		NSRange range = [value rangeValue];
		[attributedString addAttributes:@{NSFontAttributeName:superscriptFont, (NSString *)kCTSuperscriptAttributeName:@(1)} range:range];
	}

	for (NSValue *value in subscriptRanges) {
		NSRange range = [value rangeValue];
		[attributedString addAttributes:@{NSFontAttributeName:subscriptFont, (NSString *)kCTSuperscriptAttributeName:@(-1)} range:range];
	}

	return attributedString;
}

In addition to the performance issues, I was dissatisfied with the way super/subscript text was displayed by initWithData:options:documentAttributes:error:. The super/subscript text was shifted up or down but not resized. This doesn't match how super/subscript text is usually rendered, and it makes any line containing both superscript and subscript roughly twice the height of a normal line. I consider this excessive, so my category lets the consumer specify separate fonts for the superscript, subscript and body text.
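One way a caller might choose those fonts is to scale the body font down. The helper name ScriptFontForMainFont and the idea of a single scale factor are assumptions for illustration; the category itself accepts any fonts.

```objective-c
#import <UIKit/UIKit.h>

// Hypothetical helper: derives a smaller font for super/subscript text.
// The scale factor is an assumption; pick whatever looks right for the typeface.
static UIFont *ScriptFontForMainFont(UIFont *mainFont, CGFloat scale) {
    return [mainFont fontWithSize:mainFont.pointSize * scale];
}
```

For example, with a 17 pt body font, `ScriptFontForMainFont(mainFont, 0.65)` gives a font of roughly 11 pt, which could be passed as both superscriptFont and subscriptFont to JAM_AttributedStringFromString:withMainFont:superscriptFont:subscriptFont:.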

###Performance Comparison

I set up a test project in this repo to quantify the performance difference between my implementation and Apple's. It runs four differently sized strings through various numbers of iterations. I ran these tests on two devices, an iPhone 5s and a 5th generation iPod Touch, both running iOS 7.1.
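A timing loop of this kind could look like the sketch below. This is not the test project's actual harness; the function name ElapsedMilliseconds and the use of CFAbsoluteTimeGetCurrent are assumptions.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical helper: runs `work` the given number of times and
// returns the elapsed wall-clock time in milliseconds.
static double ElapsedMilliseconds(NSUInteger iterations, void (^work)(void)) {
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    for (NSUInteger i = 0; i < iterations; i++) {
        // Drain temporaries on each pass so they do not pile up across iterations.
        @autoreleasepool {
            work();
        }
    }
    return (CFAbsoluteTimeGetCurrent() - start) * 1000.0;
}
```

Each table cell would then correspond to one call, with the block wrapping either initWithData:options:documentAttributes:error: or the category method for a given test string.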

Test String sizes
  • Small = 36 characters
  • Medium = 157 characters
  • Large = 3176 characters
  • Massive = 15736 characters

| Test | iPhone 5s initWithData (ms) | iPhone 5s JAM_AttributedString (ms) | iPod Touch initWithData (ms) | iPod Touch JAM_AttributedString (ms) | Times Improvement |
| ------- | ---- | ---- | ---- | ---- | ---- |
| 1 small | 124 | 1 | 27 | 1 | 74.5 |
| 10 small | 73 | 1 | 222 | 6 | 54.0 |
| 100 small | 794 | 13 | 2393 | 61 | 49.2 |
| 1000 small | 11383 | 147 | 39671 | 634 | 69.0 |
| 1 medium | 11 | 1 | 78 | 1 | 43.5 |
| 10 medium | 75 | 2 | 227 | 11 | 28.1 |
| 100 medium | 836 | 22 | 2463 | 116 | 28.6 |
| 1000 medium | 11592 | 221 | 41443 | 1180 | 42.8 |
| 1 large | 27 | 13 | 83 | 89 | 0.5 |
| 10 large | 1025 | 141 | 845 | 899 | 3.1 |
| 100 large | 683 | 1349 | 8689 | 9008 | -0.3 |
| 1000 large | 32496 | 13186 | N/A\* | N/A\* | 0.7 |
| 1 massive | 1350 | 331 | 2069 | 2197 | 1.5 |

\*The iPod Touch results for the 1000 large test are N/A because the device could not complete that test within its memory limits. Also, many of the 1 ms results were actually reported as 0; I rounded them up to 1 because the work took some amount of time.
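The Times Improvement column is not defined in the text. The figures are consistent with the following reading, which is an inference from the table rather than something stated by the author: for each device $d$, take the relative speedup of the category method over initWithData, then average the two devices.

$$
\text{Times Improvement} = \frac{1}{2} \sum_{d \in \{\text{iPhone 5s},\ \text{iPod Touch}\}} \frac{t_d^{\text{initWithData}} - t_d^{\text{JAM}}}{t_d^{\text{JAM}}}
$$

For the 1000 small row this gives $(11383 - 147)/147 \approx 76.4$ and $(39671 - 634)/634 \approx 61.6$, whose mean is about $69.0$, matching the table. Under this reading a negative value means the category method was slower on average, and the 0.7 in the 1000 large row matches only if the missing iPod Touch result is counted as 0.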

In my specific use case, my attributed string generation times decreased from approximately 40 seconds to around 1 second, which, coupled with the fact that this processing can be done on a background thread, is a significant improvement on the system's current implementation. I encourage you to try this implementation in your projects and see if your performance improves as dramatically as mine did. If you find any issues or have any suggestions, feel free to submit pull requests or file issues.
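Moving the work off the main thread could be sketched as follows. The names JAMStringBuilder and BuildAttributedStringsInBackground are hypothetical, and the builder block stands in for a call to the category method so that the sketch stays self-contained.

```objective-c
#import <Foundation/Foundation.h>

// Hypothetical block type: turns one raw API string into an attributed string.
typedef NSAttributedString *(^JAMStringBuilder)(NSString *rawString);

// Hypothetical helper: builds every attributed string on a background queue,
// then hands the results back on the main queue.
static void BuildAttributedStringsInBackground(NSArray *rawStrings,
                                               JAMStringBuilder builder,
                                               void (^completion)(NSArray *attributedStrings)) {
    dispatch_async(dispatch_get_global_queue(DISPATCH_QUEUE_PRIORITY_DEFAULT, 0), ^{
        NSMutableArray *results = [NSMutableArray arrayWithCapacity:rawStrings.count];
        for (NSString *rawString in rawStrings) {
            NSAttributedString *built = builder(rawString);
            // Keep the results aligned with the input even if a build fails.
            [results addObject:built ?: [[NSAttributedString alloc] initWithString:rawString]];
        }
        dispatch_async(dispatch_get_main_queue(), ^{
            completion([results copy]);
        });
    });
}
```

In an app, the builder block would call JAM_AttributedStringFromString:withMainFont:superscriptFont:subscriptFont: with fonts created beforehand, and the completion block would store the results and reload the table.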
