Failed assertion 'tream' == line when parsing this PDF #50
Comments
I would need to check the specific PDF to see what is wrong, but in any
case this method would not work for extracting the text: text is not
contained in the resources block but in the page's content stream.
I will look into the assertion, though, as that is unexpected. It will
probably take a few days.
On Sat, 6 Jan 2024, 22:53 asedilloglatt wrote:
I am trying to search the text of the attached PDF:
3776_A_City_Council_24-01-09_Notice_and_Agenda.pdf
<https://github.com/NicolaVerbeeck/dart_pdf_reader/files/13851461/3776_A_City_Council_24-01-09_Notice_and_Agenda.pdf>
Here is the function I wrote to search:
static Future<bool> pdfContainsText(
    String pdfUriString, String searchText) async {
  if (pdfUriString.isEmpty || searchText.isEmpty) {
    return true;
  }
  print(pdfUriString);
  return await http
      .get(Uri.parse(pdfUriString))
      .then((response) {
        print(response.contentLength);
        return PDFParser(ByteStream(response.bodyBytes)).parse();
      })
      .then((doc) => doc.catalog)
      .then((catalog) => catalog.getPages())
      .then((pages) async {
        String allText = '';
        final firstPage = pages.getPageAtIndex(0);
        PDFPageNode? node = firstPage;
        while (node != null) {
          print(await node.resources);
          allText += (await node.resources).toString();
          node = node.parent;
        }
        return allText;
      })
      .then((pdfTextContent) => pdfTextContent.contains(searchText));
}
I get this error when I run my code:
Unhandled exception:
'package:dart_pdf_reader/src/parser/pdf_object_parser.dart': Failed assertion: line 259 pos 12: ''tream' == line': is not true.
#0 _AssertionError._doThrowNew (dart:core-patch/errors_patch.dart:51:61)
#1 _AssertionError._throwNew (dart:core-patch/errors_patch.dart:40:5)
#2 PDFObjectParser._parseStream (package:dart_pdf_reader/src/parser/pdf_object_parser.dart:259:12)
#3 PDFObjectParser.parse (package:dart_pdf_reader/src/parser/pdf_object_parser.dart:29:28)
#4 parseTrailer (package:dart_pdf_reader/src/parser/pdf_parser.dart:63:10)
NicolaVerbeeck added a commit that referenced this issue Jan 7, 2024
NicolaVerbeeck added a commit that referenced this issue Jan 7, 2024
Release 0.5.1 will fix parsing of this PDF. As I said, though, the text is not contained in the resources, which hold things like fonts and images.
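To illustrate the point about content streams: a page's text is drawn by text-showing operators such as `Tj` and `TJ` inside the page's content stream, while the resources dictionary only names the fonts and images those operators refer to. The sketch below is plain Dart and does not use any dart_pdf_reader API. The helper names (`extractShownText`, `_readLiteral`) and the sample stream are hypothetical. It assumes the stream has already been decoded (decompressed) and that strings are simple literals in a single-byte encoding; hex strings, comments, custom font encodings, and CID fonts are not handled.

```dart
const _delimiters = '()[]<>/';

bool _isWhitespace(String c) =>
    c == ' ' || c == '\n' || c == '\r' || c == '\t';

/// Reads the literal string that starts at `stream[start] == '('`, appends
/// its content to [out], and returns the index just past the closing ')'.
int _readLiteral(String stream, int start, List<String> out) {
  final sb = StringBuffer();
  var depth = 1; // PDF literal strings may contain balanced parentheses
  var i = start + 1;
  while (i < stream.length && depth > 0) {
    final c = stream[i];
    if (c == '\\' && i + 1 < stream.length) {
      // Simplification: keep the escaped character as-is (no \n, \ddd, ...).
      sb.write(stream[i + 1]);
      i += 2;
      continue;
    }
    if (c == '(') depth++;
    if (c == ')') depth--;
    if (depth > 0) sb.write(c);
    i++;
  }
  out.add(sb.toString());
  return i;
}

/// Returns the strings drawn by the text-showing operators (Tj, TJ, ', ")
/// in an already-decoded content stream. Illustration only.
List<String> extractShownText(String stream) {
  final shown = <String>[];
  final operands = <String>[]; // string operands waiting for their operator
  var i = 0;
  while (i < stream.length) {
    final c = stream[i];
    if (c == '(') {
      i = _readLiteral(stream, i, operands);
    } else if (c == '<') {
      // Hex string or dictionary start: skipped in this sketch.
      final close = stream.indexOf('>', i);
      i = close < 0 ? stream.length : close + 1;
    } else if (_isWhitespace(c) || '[]>)'.contains(c)) {
      i++;
    } else {
      // A name (/F1), a number, or an operator keyword.
      final start = i++;
      while (i < stream.length &&
          !_isWhitespace(stream[i]) &&
          !_delimiters.contains(stream[i])) {
        i++;
      }
      final token = stream.substring(start, i);
      final isOperand = token.startsWith('/') || num.tryParse(token) != null;
      if (isOperand) continue;
      if (const {'Tj', 'TJ', "'", '"'}.contains(token) &&
          operands.isNotEmpty) {
        shown.add(operands.join());
      }
      operands.clear(); // any operator consumes the operands before it
    }
  }
  return shown;
}

void main() {
  // Hypothetical decoded content stream for a single page.
  const sample = '''
BT
/F1 12 Tf
72 720 Td
(City Council) Tj
0 -14 Td
[(Notice ) -20 (and Agenda)] TJ
ET
''';
  final text = extractShownText(sample).join(' ');
  print(text); // City Council Notice and Agenda
  print(text.contains('Agenda')); // true
}
```

Note that `/F1` in the sample is only a reference into the resources dictionary; the words themselves appear as operands of `Tj` and `TJ`, which is why calling `toString()` on the resources cannot find them.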