-
Notifications
You must be signed in to change notification settings - Fork 87
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
recognition error improvement #28
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for your pull request, now your proposal is clear. During the review I was thinking about your solution and finally, it seems for me like a workaround for the real problem. In my opinion the main problem is the accuracy of the trained model. With this solution we only put a threshold above the recognized classes in order to hide the possibly wrong values. Tiny-yolov2 is not so accurate, by using another model we can avoid this workaround.
Thank you very much for your contribution!
@@ -45,11 +45,21 @@ | |||
private BorderedText borderedText; | |||
private long lastProcessingTimeMs; | |||
|
|||
// shimatani | |||
private String lastRecognizedClass = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We could create a new type to wrap these values. I'm not sure that the best solution is to declare several helper variables at the beginning of the file. Should all of them be class member?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lastRecognizedClass, matchCount, lastResult, and nowResult were made class members because it is necessary to refer to the previously held value each time the run () function is called in the background.
tts can be declared in private functions.
Since masg1 and msg2 need two to display in two lines and I can not think of a way to return two as the return value of the makeTts () function, I made them class members.
@@ -45,11 +45,21 @@ | |||
private BorderedText borderedText; | |||
private long lastProcessingTimeMs; | |||
|
|||
// shimatani | |||
private String lastRecognizedClass = ""; | |||
private String nowRecognizedClass = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nowRecognizedClass
looks strange, probably currentRecognizedClass
would be better. Probably, lastRecognizedClass and currentRecognizedClass could be stored in a single structure or array.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It turns out that the currentRecognizedClass is better.
By the way, I reviewed the logic of continuous match.
In the last time, even if it was judged as continuous match, the display was not updated and the utterance was not made if it was the same as the previous display result, but this time, even if it is the same as the previous display result, the display is updated every time it is judged as continuous match I would like to make the smartphone speak.
The run () function is now as follows: I reduced the if statement.
private void run () {
final long startTime = SystemClock.uptimeMillis ();
final List <Recognition> results = recognizer.recognizeImage (croppedBitmap);
// lastProcessingTimeMs = SystemClock.uptimeMillis ()-startTime;
overlayView.setResults (results);
// shimatani
Log.d (LOGGING_TAG, "Debug: runInBackground ()");
if (! (results.isEmpty ())) {
nowResult = results.get (0) .getTitle ();
Log.d (LOGGING_TAG, String.format ("Find:% s", nowResult));
if (lastResult.equals (nowResult)) {
matchCount ++;
if (matchCount> 3) {
// match 4 times
matchCount = 0;
Log.d (LOGGING_TAG, "Match 4 times:" + nowResult);
lastRecognizedClass = nowResult;
String tts = makeTts (nowResult);
Log.d (LOGGING_TAG, "makeTts ():" + tts);
if (! (tts.equals (""))) {
speak2 (results, tts);
tts = "";
}
} else {
// nothing to do
}
} else {
matchCount = 0;
}
lastResult = nowResult;
Log.d (LOGGING_TAG, String.format ("matchCount:% d", matchCount));
}
requestRender ();
computing = false;
}
|
||
private String makeTts(String nowRecognizedClass) { | ||
switch (nowRecognizedClass) { | ||
case "C型リモコン": // 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please convert kanji characters to English ones to become readable for everybody.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Display the trash bin number and talk the displayed contents.
case "C-type remote control": // 0
msgInf1 = "C-type remote control case gabage number is 13,";
msgInf2 = "The printed circuit board is number 19";
break;
case "Hands and leather glove": // 1
msgInf1 = "Garmour, cut resistant, skinned";
msgInf2 = "No. 7";
break;
case "marker pen ": // 2
msgInf1 = "marker, fluorescence, water, oil, correction pen,";
msgInf2 = "A mechanical pencil, a ballpoint pen is number 13.";
break;
case "G-type remote control": // 3
msgInf1 = "G-type remote control case garbage number is 13,";
msgInf2 = "The printed circuit board is number 19";
break;
case "E-type remote control": // 4
msgInf1 = "E-type remote control case garbage number is 13,";
msgInf2 = "The printed circuit board is number 19";
break;
case "Button battery": // 5
msgInf1 = "The button battery that does not contain mercury is number 23.";
msgInf2 = "The battery is also number 23";
break;
case "dry cell": // 6
msgInf1 = "The dry cell is number 23";
msgInf2 = "The button battery that does not contain mercury is also number 23.";
break;
case "tie wrap": // 7
msgInf1 = "White tie wrap is 10";
msgInf2 = "A non-white tie wrap is number 12.";
break;
case "PP band": // 8
msgInf1 = "PP band is number 9";
msgInf2 = "";
break;
default:
msgInf1 = "";
msgInf2 = "";
break;
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It depends on very individual circumstances.
private String lastRecognizedClass = ""; | ||
private String nowRecognizedClass = ""; | ||
private String tts = ""; | ||
private String msgInf1 = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What kind of information is represented in the msgInf1
and msgInf2
variables?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because the descriptive text has become long.
The second line in the image below is msgInf1
and the third line is msgInf2
.
https://github.com/keides2/android-yolo-v2/blob/master/sample/%E2%98%85%E7%B2%98%E7%9D%80%E3%83%86%E3%83%BC%E3%83%97%E3%81%A8%E3%83%9C%E3%82%BF%E3%83%B3%E9%9B%BB%E6%B1%A02.png
private String msgInf2 = ""; | ||
private int matchCount = 0; | ||
private String lastResult = ""; | ||
private String nowResult = ""; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nowResult
looks strange, currentResult
would be better name. Probably, lastResult and currentResult could be stored in the same structure (class, array,...)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is fine as you say.
|
||
String msgInf = makeTts(nowRecognizedClass); | ||
|
||
lines.add(msgInf1); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we need msgInf1 and msgInf2?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is because the descriptive text has become long.
The second line of the display in the lower screen is msg1, the third line is msg2.
|
||
tts = makeTts(nowRecognizedClass); | ||
Log.d(LOGGING_TAG, "makeTts(): " + tts); | ||
if (!(tts.equals(""))) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If the default value for tts is null then you don't need to check its value with equals
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Depending on the specifications of the makeTts()
function, the return value may be null
. In this case, I made it so as not to talk. (I made sure that the speak2()
function does not get an error)
tts = makeTts(nowRecognizedClass); | ||
Log.d(LOGGING_TAG, "makeTts(): " + tts); | ||
if (!(tts.equals(""))) { | ||
speak2(results, tts); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
speak2
method name looks very strange. Is there speak1, speak2, speak3...? What is the difference between the speak methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have prepared speak2 () function to speak two lines of message in Japanese.
I thought it would be better to change the name of the function.
TextToSpeechActivity.java
protected void speak(List<Recognition> results) {
if (!(results.isEmpty() || lastRecognizedClass.equals(results.get(0).getTitle()))) {
lastRecognizedClass = results.get(0).getTitle();
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
textToSpeech.speak(lastRecognizedClass, TextToSpeech.QUEUE_FLUSH, null, null);
} else {
textToSpeech.speak(lastRecognizedClass, TextToSpeech.QUEUE_FLUSH, null);
}
Log.d(LOGGING_TAG, "speak: " + lastRecognizedClass); // shimatani
}
}
protected void speak2(List<Recognition> results, String tts) {
this.tts = tts;
Log.d(LOGGING_TAG, "speak2 entry: " + this.tts);
if (!(results.isEmpty() || this.tts.equals(results.get(0).getTitle()))) {
if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
textToSpeech.speak(this.tts, TextToSpeech.QUEUE_FLUSH, null, null);
} else {
textToSpeech.speak(this.tts, TextToSpeech.QUEUE_FLUSH, null);
}
Log.d(LOGGING_TAG, "speak2: " + this.tts); // shimatani
}
}
} | ||
} | ||
} else { | ||
// nothing to do |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So you do not even have to leave here this else
clause.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I followed the old Japanese C language coding standards. I also think that it is unnecessary now.
|
||
if (!(results.isEmpty()) && lastResult.equals(nowResult)) { | ||
matchCount++; | ||
if (matchCount > 3) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks for me more like a workaround to have a threshold above the recognized classes. The main problem is either the model is not well trained thus it often gives you bad results or the threshold is not properly set in your configuration file.
Another question is why you chose the 3 magic number as comparison value?
Unfortunately, I cannot consider these changes as a good solution for the real problem.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As you say, I think this is not an essential improvement.
I think that it is originally to make a model with high accuracy, but I have taken easier improvement measures than improving the accuracy of the model.
During operation I noticed that this application was immediately judging if it detected an object even before holding it over the object.
Therefore, I tried to judge it as an object (speaking) when it matched four times (the value for comparison is 3).
I will reselect the images for learning and try to improve the accuracy of the model. Please let me know if you know any other effective method.
Hi, |
Yes, that easily can happen because Zoltan is a very common hungarian person name. There are many Zoltans also in Romania because Transilvania was part of Hungary before the Second World War, so there are many hungarian people in Transilvania as well. Enjoy your trip in Budapest, that is a very beautiful city. BTW. what are you doing in Budapest? |
see #18 (comment)