Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

recognition error improvement #28

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

keides2
Copy link
Collaborator

@keides2 keides2 commented Jul 26, 2019

@keides2 keides2 changed the title recognization error improvement recognition error improvement Jul 26, 2019
Copy link
Owner

@szaza szaza left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your pull request, now your proposal is clear. During the review I was thinking about your solution and finally, it seems for me like a workaround for the real problem. In my opinion the main problem is the accuracy of the trained model. With this solution we only put a threshold above the recognized classes in order to hide the possibly wrong values. Tiny-yolov2 is not so accurate, by using another model we can avoid this workaround.
Thank you very much for your contribution!

@@ -45,11 +45,21 @@
private BorderedText borderedText;
private long lastProcessingTimeMs;

// shimatani
private String lastRecognizedClass = "";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We could create a new type to wrap these values. I'm not sure that the best solution is to declare several helper variables at the beginning of the file. Should all of them be class member?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lastRecognizedClass, matchCount, lastResult, and nowResult were made class members because it is necessary to refer to the previously held value each time the run () function is called in the background.

tts can be declared in private functions.

Since masg1 and msg2 need two to display in two lines and I can not think of a way to return two as the return value of the makeTts () function, I made them class members.

@@ -45,11 +45,21 @@
private BorderedText borderedText;
private long lastProcessingTimeMs;

// shimatani
private String lastRecognizedClass = "";
private String nowRecognizedClass = "";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nowRecognizedClass looks strange, probably currentRecognizedClass would be better. Probably, lastRecognizedClass and currentRecognizedClass could be stored in a single structure or array.

Copy link
Collaborator Author

@keides2 keides2 Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out that the currentRecognizedClass is better.
By the way, I reviewed the logic of continuous match.
In the last time, even if it was judged as continuous match, the display was not updated and the utterance was not made if it was the same as the previous display result, but this time, even if it is the same as the previous display result, the display is updated every time it is judged as continuous match I would like to make the smartphone speak.

The run () function is now as follows: I reduced the if statement.

    private void run () {
        final long startTime = SystemClock.uptimeMillis ();
        final List <Recognition> results = recognizer.recognizeImage (croppedBitmap);
        // lastProcessingTimeMs = SystemClock.uptimeMillis ()-startTime;
        overlayView.setResults (results);

        // shimatani
        Log.d (LOGGING_TAG, "Debug: runInBackground ()");

        if (! (results.isEmpty ())) {
            nowResult = results.get (0) .getTitle ();
            Log.d (LOGGING_TAG, String.format ("Find:% s", nowResult));

            if (lastResult.equals (nowResult)) {
                matchCount ++;
                if (matchCount> 3) {
                    // match 4 times
                    matchCount = 0;

                    Log.d (LOGGING_TAG, "Match 4 times:" + nowResult);
                    lastRecognizedClass = nowResult;

                    String tts = makeTts (nowResult);
                    Log.d (LOGGING_TAG, "makeTts ():" + tts);
                    if (! (tts.equals (""))) {
                        speak2 (results, tts);
                        tts = "";
                    }
                } else {
                    // nothing to do
                }
            } else {
                matchCount = 0;
            }
            lastResult = nowResult;
            Log.d (LOGGING_TAG, String.format ("matchCount:% d", matchCount));
        }

        requestRender ();
        computing = false;
    }


private String makeTts(String nowRecognizedClass) {
switch (nowRecognizedClass) {
case "C型リモコン": // 0
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please convert kanji characters to English ones to become readable for everybody.

Copy link
Collaborator Author

@keides2 keides2 Jul 30, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Display the trash bin number and talk the displayed contents.

            case "C-type remote control": // 0
                msgInf1 = "C-type remote control case gabage number is 13,";
                msgInf2 = "The printed circuit board is number 19";
                break;
            case "Hands and leather glove": // 1
                msgInf1 = "Garmour, cut resistant, skinned";
                msgInf2 = "No. 7";
                break;
            case "marker pen ": // 2
                msgInf1 = "marker, fluorescence, water, oil, correction pen,";
                msgInf2 = "A mechanical pencil, a ballpoint pen is number 13.";
                break;
            case "G-type remote control": // 3
                msgInf1 = "G-type remote control case garbage number is 13,";
                msgInf2 = "The printed circuit board is number 19";
                break;
            case "E-type remote control": // 4
                msgInf1 = "E-type remote control case garbage number is 13,";
                msgInf2 = "The printed circuit board is number 19";
                break;
            case "Button battery": // 5
                msgInf1 = "The button battery that does not contain mercury is number 23.";
                msgInf2 = "The battery is also number 23";
                break;
            case "dry cell": // 6
                msgInf1 = "The dry cell is number 23";
                msgInf2 = "The button battery that does not contain mercury is also number 23.";
                break;
            case "tie wrap": // 7
                msgInf1 = "White tie wrap is 10";
                msgInf2 = "A non-white tie wrap is number 12.";
                break;
            case "PP band": // 8
                msgInf1 = "PP band is number 9";
                msgInf2 = "";
                break;
            default:
                msgInf1 = "";
                msgInf2 = "";
                break;

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on very individual circumstances.

private String lastRecognizedClass = "";
private String nowRecognizedClass = "";
private String tts = "";
private String msgInf1 = "";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What kind of information is represented in the msgInf1 and msgInf2 variables?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because the descriptive text has become long.
The second line in the image below is msgInf1 and the third line is msgInf2.
https://github.com/keides2/android-yolo-v2/blob/master/sample/%E2%98%85%E7%B2%98%E7%9D%80%E3%83%86%E3%83%BC%E3%83%97%E3%81%A8%E3%83%9C%E3%82%BF%E3%83%B3%E9%9B%BB%E6%B1%A02.png

private String msgInf2 = "";
private int matchCount = 0;
private String lastResult = "";
private String nowResult = "";
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nowResult looks strange, currentResult would be better name. Probably, lastResult and currentResult could be stored in the same structure (class, array,...)

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is fine as you say.


String msgInf = makeTts(nowRecognizedClass);

lines.add(msgInf1);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need msgInf1 and msgInf2?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is because the descriptive text has become long.
The second line of the display in the lower screen is msg1, the third line is msg2.

img


tts = makeTts(nowRecognizedClass);
Log.d(LOGGING_TAG, "makeTts(): " + tts);
if (!(tts.equals(""))) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the default value for tts is null then you don't need to check its value with equals.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Depending on the specifications of the makeTts()function, the return value may be null. In this case, I made it so as not to talk. (I made sure that the speak2() function does not get an error)

tts = makeTts(nowRecognizedClass);
Log.d(LOGGING_TAG, "makeTts(): " + tts);
if (!(tts.equals(""))) {
speak2(results, tts);
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

speak2 method name looks very strange. Is there speak1, speak2, speak3...? What is the difference between the speak methods?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have prepared speak2 () function to speak two lines of message in Japanese.
I thought it would be better to change the name of the function.

TextToSpeechActivity.java

    protected void speak(List<Recognition> results) {
        if (!(results.isEmpty() || lastRecognizedClass.equals(results.get(0).getTitle()))) {
            lastRecognizedClass = results.get(0).getTitle();
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
                textToSpeech.speak(lastRecognizedClass, TextToSpeech.QUEUE_FLUSH, null, null);
            } else {
                textToSpeech.speak(lastRecognizedClass, TextToSpeech.QUEUE_FLUSH, null);
            }
            Log.d(LOGGING_TAG, "speak: " + lastRecognizedClass);    // shimatani
        }
    }

    protected void speak2(List<Recognition> results, String tts) {
        this.tts = tts;
        Log.d(LOGGING_TAG, "speak2 entry: " + this.tts);
        if (!(results.isEmpty() || this.tts.equals(results.get(0).getTitle()))) {
            if (Build.VERSION.SDK_INT >= Build.VERSION_CODES.LOLLIPOP) {
                textToSpeech.speak(this.tts, TextToSpeech.QUEUE_FLUSH, null, null);
            } else {
                textToSpeech.speak(this.tts, TextToSpeech.QUEUE_FLUSH, null);
            }
            Log.d(LOGGING_TAG, "speak2: " + this.tts);    // shimatani
        }
    }

}
}
} else {
// nothing to do
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So you do not even have to leave here this else clause.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I followed the old Japanese C language coding standards. I also think that it is unnecessary now.


if (!(results.isEmpty()) && lastResult.equals(nowResult)) {
matchCount++;
if (matchCount > 3) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks for me more like a workaround to have a threshold above the recognized classes. The main problem is either the model is not well trained thus it often gives you bad results or the threshold is not properly set in your configuration file.
Another question is why you chose the 3 magic number as comparison value?
Unfortunately, I cannot consider these changes as a good solution for the real problem.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As you say, I think this is not an essential improvement.

I think that it is originally to make a model with high accuracy, but I have taken easier improvement measures than improving the accuracy of the model.

During operation I noticed that this application was immediately judging if it detected an object even before holding it over the object.
Therefore, I tried to judge it as an object (speaking) when it matched four times (the value for comparison is 3).

I will reselect the images for learning and try to improve the accuracy of the model. Please let me know if you know any other effective method.

@keides2
Copy link
Collaborator Author

keides2 commented Nov 3, 2019

Hi,
Excuse me for a personal topic. I am currently traveling to Central Europe and will go to Budapest tomorrow. By the way, the name of the bus driver is zoltan!

@szaza
Copy link
Owner

szaza commented Nov 3, 2019

Yes, that easily can happen because Zoltan is a very common hungarian person name. There are many Zoltans also in Romania because Transilvania was part of Hungary before the Second World War, so there are many hungarian people in Transilvania as well. Enjoy your trip in Budapest, that is a very beautiful city. BTW. what are you doing in Budapest?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants