Missing white spaces #5448
Comments
you can try to decrease the params |
I've tried 0.8, 1, 1.2, 1.8, 2, and the default 1.5 for det_db_unclip_ratio. The problem is that each value works for some images and spoils the output for others. [Images omitted: a clean input image and its outputs for det_db_unclip_ratio = 0.8, 1, 1.2, 1.5, and 2.]
|
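A minimal sketch of how the comparison above could be automated instead of inspected by eye. It assumes the PaddleOCR 2.x Python API, where `ocr.ocr(path, cls=True)` returns one list per image whose entries look like `[box, (text, score)]`; other releases may return a different shape. The helper names `count_spaces`, `sweep_unclip_ratio`, and `make_paddle_runner` are hypothetical, and counting spaces is only a rough proxy for output quality.

```python
from typing import Callable, Dict, Iterable, List


def count_spaces(lines: Iterable[str]) -> int:
    """Total number of space characters across the recognized text lines."""
    return sum(line.count(" ") for line in lines)


def sweep_unclip_ratio(
    run_ocr: Callable[[str, float], List[str]],
    image_path: str,
    ratios: Iterable[float],
) -> Dict[float, int]:
    """Run OCR once per ratio and record how many spaces each run produced."""
    return {ratio: count_spaces(run_ocr(image_path, ratio)) for ratio in ratios}


def make_paddle_runner() -> Callable[[str, float], List[str]]:
    """Build a runner backed by PaddleOCR (assumption: 2.x API is installed)."""
    # Imported lazily so the helpers above can be used without PaddleOCR.
    from paddleocr import PaddleOCR

    def run(image_path: str, ratio: float) -> List[str]:
        ocr = PaddleOCR(
            use_angle_cls=True,
            lang="en",
            use_space_char=True,
            det_db_unclip_ratio=ratio,
        )
        result = ocr.ocr(image_path, cls=True)
        # Assumption: result[0] holds the first image's lines, or None if empty.
        page = result[0] or []
        return [entry[1][0] for entry in page]

    return run


# Example usage (requires PaddleOCR and a real image; the path is a placeholder):
# runner = make_paddle_runner()
# print(sweep_unclip_ratio(runner, "sample.png", [0.8, 1.0, 1.2, 1.5, 2.0]))
```

Passing the OCR call in as a function keeps the sweep logic independent of the engine, so the same loop can compare other parameters such as use_dilation.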
@esraa-abdelmaksoud Thanks for using our products and for the valuable feedback. The space problem is something we hope to focus on in the next version. You can also try to set the params … If that still doesn't solve your problem, please look forward to our next version upgrade. |
Unfortunately, it didn't help enough and removed a good deal of the text. However, I would like to thank you for your great work. I'm looking forward to trying your next version. Thanks! :) |
The new version (2.5) does a much better job at handling white spaces, but the issue is still not resolved completely. There are situations where white spaces are missing. I have tried different values of |
Hi, I am facing the same issue. Using det_db_unclip_ratio and use_dilation=True helps to some extent. However, it affects other results. Is there any way to improve it? |
@tink2123 I am facing the same issue with using |
I used PaddleOCR with these parameters, yielding favorable outcomes in space detection, even for more challenging cases. PaddleOCR(use_angle_cls=True, lang='en', ocr_version='PP-OCRv4', use_space_char=True) |
I solved the issue by adding more data with white spaces, such as images with multiple words containing white spaces between them. Then, I fine-tuned the model and now it works great. |
That is great! My question is: is it really necessary to use the whole original dataset for fine-tuning when we just want to improve a few characters? |
Hello @nam-leduc, adding the original data can be very useful when fine-tuning your model, because it lets the model learn from varied text: different fonts, text sizes, and image effects such as blurriness. This can significantly improve accuracy. However, if you are fine-tuning for a specific use case, such as a particular data format or font, you can leave out the original dataset. The model will then be more accurate on data similar to what it was fine-tuned on, since it only learned from that limited data.
|
Hello @asif-ca |
@phuchm It is important to note that white space issues typically occur during the recognition phase rather than in text detection. To address this, it is recommended to fine-tune the recognizer model with additional data that includes white spaces. For example, using images containing white spaces between words can improve the model's ability to detect and recognize white spaces. |
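A minimal sketch of how a fine-tuning label list along these lines might be assembled. It assumes the tab-separated `image_path<TAB>label` layout used for PaddleOCR recognition label files, keeps every sample whose transcription has an inner space, and adds a random fraction of the original data. The helper names, the file names in the usage note, and the `original_fraction` knob are hypothetical; nothing in this thread prescribes a particular mixing ratio.

```python
import random
from typing import List, Tuple

Sample = Tuple[str, str]  # (image path, transcription)


def has_inner_space(label: str) -> bool:
    """True if the transcription has a space between non-space characters."""
    return " " in label.strip()


def mix_samples(
    original: List[Sample],
    spaced: List[Sample],
    original_fraction: float,
    seed: int = 0,
) -> List[Sample]:
    """Keep all whitespace-rich samples plus a random fraction of the original data."""
    rng = random.Random(seed)
    keep = round(len(original) * original_fraction)
    mixed = rng.sample(original, keep)
    mixed += [sample for sample in spaced if has_inner_space(sample[1])]
    rng.shuffle(mixed)
    return mixed


def to_label_lines(samples: List[Sample]) -> List[str]:
    """Format samples as 'path<TAB>label' lines, one per training image."""
    return [f"{path}\t{label}" for path, label in samples]
```

For example, `to_label_lines(mix_samples(original, spaced, 0.5))` would produce lines ready to be written to a label file. Setting `original_fraction` to 0 corresponds to fine-tuning only on the new data, which the comment above suggests is acceptable for a narrow use case.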
@asif-ca would you be able to share some of the datasets you used to fine-tune? |
Thank you, this works well. |
For interleaved Chinese and English text, the spaces are still missing... |
The spacing issue is still not solved. |
When will PP-OCRv4 be available for "latin" (multilingual)? The spacing situation is very limiting. |
I trained a Thai language model, but PaddleOCR doesn't support Thai yet. How can I use the trained Thai model directly? |
System Environment: Windows 10
Version: v2.4
Related components: PP-OCR
Command Code:
PaddleOCR(use_angle_cls=True, lang='en', ocr_version='PP-OCR', use_space_char=True)
Hello,
Before anything, I'd like to say thank you for the great effort you exerted in the creation of this work.
I'm facing a problem that the OCR engine misses white spaces many times even though I'm setting use_space_char to true. The following are some of the images with the output.
In the following image, all spaces in the first line are missing:
In this image, lines 4, 6, and 7 are missing spaces.
Is there any configuration I can do to overcome this issue?