Fix AMO calibration 10 #529

Closed
valegagge opened this issue Oct 25, 2024 · 16 comments
@valegagge
Member

valegagge commented Oct 25, 2024

During the experiments on the robot, we experienced a lot of issues related to the AMO calibration 10.

In this issue we want to address two of these problems.

Problem 1: after the second run of YRI the joint calibrated without reaching the hw-stop

We managed to reproduce the problem on the AMO setup; see here.

For those who don't have access to that repo, I report here what we noticed:

When calibrating our joint with calibration type 10, the first time I calibrate it the joint reaches the hard stop. However, if I close yarprobotinterface and restart it, the joint does not reach the hard-stop position, yet it is reported as correctly calibrated, which makes the offset and the zero position completely wrong. To prevent this, we need to switch the boards off and on again. Moreover, I can sometimes work around the problem by just increasing the PWM passed as a parameter to the calibrator.

Problem 2: the joint calibrates, but could not reach the startup position.

For example, on the robot, we noticed that the knee reached the hw-stop (leg fully extended) but it could not reach the startup position (bent-leg). We still aren't able to reproduce this behavior on the setup.

TIPS - How to fix the bug

With @MSECode we have already browsed the code and, with the help of @ale-git, we understood that we need to set the correct value of this threshold.

It is important to note that this type of calibration is also used for the calibration of motors that need a different threshold value.

We don't know how this bug affects the calibration behaviour, but for sure it is a bug and we need to fix it.

DoD

We want to:

  1. be able to find a procedure to reproduce the issues
  2. fix the issue
  3. test the fw on the setup and on the robot
  4. produce a PR

cc @MSECode

@MSECode
Contributor

MSECode commented Nov 11, 2024

Today we ran some tests calibrating the joint with the AMO encoder configured for calibration type 10.
Specifically, we focused on reproducing bug number 1, which occurs when we run the calibration without restarting the boards.
As previously observed, the joint got calibrated without reaching the hard-stop limit, and therefore with the zero at a position different from the desired one.
The raw data we collected show exactly what we described in the bug. As you can see in the first graph, we reach the hard-stop limit, which in our case is set to -100 degrees, and after that we move to the zero position. Moreover, the joint values given by the controller follow the raw values given by the encoder. However, one thing that is not totally clear is the offset between raw and processed values, which is different from the 100 degrees we are setting.
Then, in the second graph, which shows the joint calibrating after restarting just the yri and not the boards, we can see that:

  • first of all, the processed joint values are not following the raw values in the first second (basically while getting the joint calibrated)
  • then, it is not clear why the joint values start from ~60 degrees, which is a random offset different both from the position reached by the joint before closing and from anything we set at configuration level

First calibration --> Hard limit reached but need to check why we have that offset on the raw values

Image

Image

Second calibration --> Hard limit not reached but joint calibrated and random offsets

Image

Image

@MSECode MSECode closed this as completed Nov 11, 2024
@MSECode MSECode reopened this Nov 11, 2024
@MSECode
Contributor

MSECode commented Nov 11, 2024

Then, I noticed that the zero value of each AbsEncoder instance is not reset to 0 in the init; it is only initialized in the AbsEncoder constructor. I'm not sure whether that was an error or a deliberate choice made for a particular reason.
Anyway, resetting it to zero removed the offset we initially had on the joint position when restarting the yri without restarting the boards.
To verify that, I also tried moving the joint to ~-70 degrees before closing and letting it go to the park position.
From the images you can see the difference:

| zero not reset in init() | zero reset and moved to -70 | zero reset and run after moved to -70 |
| --- | --- | --- |
| Image | Image | Image |

However, it is still unclear why the joint position does not get updated while the raw position is changing when we restart just the yri. It is probably because we are still not resetting some other variables, perhaps some flags?

@valegagge
Member Author

Good catch @MSECode !

> However, it is still unclear why the joint position does not get updated while the raw position is changing when restarting just the yri. Probably is still due to the fact that we are not resetting some other variables, probably some flags?

Probably yes! Something related to the abs_encoder calibration flag... this is just a guess.

@MSECode
Contributor

MSECode commented Nov 13, 2024

Hi @valegagge,
since it is quite cumbersome to debug the code using the debugger and breakpoints, because the yri thread stops when we hit a breakpoint, I've added some debug messages to investigate the parts of the code I was interested in.
Debugging that code, I realized that the first time we start the yri (basically at each board restart) we can read the joint position converted to iCubDegrees in the DO phase of the controller, specifically in the method JointSet_do_odometry, both during (in green) and after (in blue) calibration.

Image

If instead we restart the yri without switching the boards off and on, it is clear from the logs underlined in this image that the joint position read by the odometry is always zero. We start reading it again only after the calibration phase which, as in the other tests, does not extend to the hard-stop position.
On the contrary, the raw position read in s_eo_motioncontrol_updatedPositionsFromEncoders is always available.

Image

Thus, this test confirms what we observed the other day in the dumped data.

After some other tests, I noticed that when the boards are not restarted, the method AbsEncoder_position_init_aea keeps being called even after the calibration procedure ends. In this way the value of the distance that we need for estimating the position gets cleared, and I suppose that this is the core of the problem. I'll leave here the yri logs that show the problem.
In test2 you have more debug logs that highlight the method call that causes the problem.

log-calibration-10-noboardrestart-test2.log
log-calib-10-restart-test2.log
log-calibration-10-noboardrestart-test1.log
log-calib-10-restart-test1.log

@valegagge
Member Author

valegagge commented Nov 15, 2024

For the second problem: we have some logs of the robot here.

@MSECode
Contributor

MSECode commented Nov 15, 2024

After some tests and extra debug lines, we understood what is most likely the core of the problem, i.e. why the calibration process sometimes fails and the joint does not reach the hard-stop position.
I'd like to anticipate that there are 2 other things that we have not fully understood yet and that need more investigation. Anyway, they seem to be less important and do not block the completion of the calibration procedure.
Regarding the core problem, we found that at some restarts (the undesired thing is that this is not completely deterministic, which is one of the small problems I was talking about) the initialization of the AbsEncoder object takes longer than the time the EMS, actually the motion control, needs to receive the command to start the calibration. This leads to an incorrect initialization of the AbsEncoder, which probably cannot update the joint position properly, so the joint fails to reach the hard-stop position.
Now, I'll add some more details about the whole pipeline I'm referring to.
The calibration command is received on the fw by the motion-control service from a callback function triggered by the parametricCalibratorEth at these lines: https://github.com/robotology/icub-main/blob/master/src/libraries/icubmod/parametricCalibratorEth/parametricCalibratorEth.cpp#L668-L674.
Using the debug trace window of the Keil debugger and the embot::core functions, I found that the calibration command is received by the motion control ~170ms after the motion-control service is started. One part of the problem is related to this timing, since I've seen that the whole initialization procedure of the AbsEncoder, done in this method: https://github.com/robotology/icub-firmware/blob/master/emBODY/eBcode/arch-arm/embobj/plus/mc/AbsEncoder.c#L309-L341, might take more than that time. If that happens, the calibration procedure finishes but sets the zero of the joint to an undesired position.
The questions now are the following:

  1. Why doesn't the initialization of the AbsEncoder have a fixed execution time? Because we wait for at least 3 perfectly equal values of the raw position before declaring that we are reading valid data, and we set the not_initialized flag to FALSE only after this happens. During this time the joint is not moving, but the readings can naturally fluctuate and, since we are not currently using any threshold to accept the 3 consecutive positions as valid, it might take some time to get them. Thus, we can think of introducing a sort of threshold that tolerates those variations while still rejecting possible spikes.
  2. Why does the motion control start the calibration procedure without having everything initialized? Because there is no check either in icub-main or in the firmware. So we can think of introducing it.
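
To make the first point concrete, here is a minimal sketch (not firmware code) that counts how many samples are consumed before a run of consecutive readings is accepted, once with exact equality and once with a tolerance. The function name `samples_until_stable`, the pairwise comparison with the previous reading and the sample values are all assumptions made for illustration; the real check in AbsEncoder.c may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of the firmware.
 * Returns how many samples are consumed before `needed` consecutive
 * readings each differ from the previous one by at most `tolerance`
 * (tolerance 0 means exact equality). Returns 0 if that never happens. */
size_t samples_until_stable(const int32_t *raw, size_t n,
                            int32_t tolerance, unsigned needed)
{
    assert(raw != NULL && tolerance >= 0 && needed >= 1);

    if (n >= 1 && needed == 1) {
        return 1;
    }

    unsigned run = 1; /* the first sample starts a run of length 1 */
    for (size_t i = 1; i < n; ++i) {
        int32_t d = raw[i] - raw[i - 1];
        if (d < 0) {
            d = -d;
        }
        /* extend the run if the reading is close enough, else restart it */
        run = (d <= tolerance) ? run + 1 : 1;
        if (run >= needed) {
            return i + 1;
        }
    }
    return 0;
}
```

With a slightly noisy, made-up sequence such as `{1000, 1001, 1000, 1002, 1001, 1001, 1001, 1000}`, exact equality needs 7 samples to find 3 equal readings, while a tolerance of 2 accepts the run after 3 samples. This is the kind of non-deterministic delay described above.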

To test my hypothesis I added a huge delay of 3 seconds before sending the calibration command in icub-main, just to be sure that the AbsEncoder was already instantiated and to have a fixed time for checking the logs and dumped values.
I'll add here some of the logs I've collected from the trace window in Keil. If you look at this debug line: DebugCode: So now o->state.bits.not_initialized is, you can see that the relative time between the first time we enter the method AbsEncoder_position_init_aea and the moment we set o->state.bits.not_initialized to FALSE changes, and sometimes it can be higher than the 170ms I was talking about. (To check this time I computed the difference in the logs between the timestamp of the log eo_motioncontrol_Start and DebugCode: Called AbsEncoder_start_hard_stop_calibrate at relative timestamp. Since those timestamps are relative to the start of the debug session, you just need to do some math.)

As a side note, when the calibration did not complete correctly the joint also skipped the parking at the end of the session.
Then, the smaller problems I was referring to are the following:

  • why did we decide to accept at least 3 values, and not fewer or more, and why did we never think about adding a threshold to discard just the spikes?
  • why does it sometimes take much longer than other times to find those equal values? (Sometimes one order of magnitude longer)
  • why is there no check by the calibrator that everything has already started?
  • why have we never experienced this (or so we think) on the robot, which uses calibration 10 as well?
  • why doesn't the joint reach the hard-stop position when the calibration starts before the initialization ends, if the condition for completing calibration type 10 is that the joint, i.e. the encoder, is still? I assumed that the joint has to move until it reaches the hard-stop position, i.e. until it is still and the encoder position, whatever it is, stays the same, independently of the initialization. Because of that, I would read a wrong position but I should reach the hard-stop anyway.

For example, as you can see in these logs, why does the controller consider the joint calibrated at this point, given that:

  1. the position is still changing
  2. we are not initialized yet
  3. we did not reach the hard-stop

Image

Image

Finally, if you look at the logs, the positions we read in the initialization phase (which are in iCubDegrees) show very small variations, which tells us that the encoder is, in the end, reading fine.

The different sessions are divided by some white lines.

some-logs-for-calibration10-starting.log

So this is my analysis. We can now discuss how to make the code more robust and how to correctly manage the state machine of the calibration, also thinking about doing the same for other calibration types and/or encoders.
This is the branch where I've added the debug lines, if needed: https://github.com/MSECode/icub-firmware/tree/feature/sendFullResAmoRawValue

cc: @valegagge @maggia80

@maggia80
Contributor

@MSECode thanks for the explanation. I would certainly add a threshold. There is already a threshold parameter used for the home position, we could use the same principle, inserting the threshold in one of the not-used parameters of the calibrator.

Image

@valegagge
Member Author

Hi @MSECode ,
as already discussed f2f, and as @maggia80 suggested, you can use a threshold instead of the 3 exact values. I think you can use the tolerance parameter of the encoder configuration, which you already have in the AbsEncoder object during the init phase, since the MC has certainly already received the configuration parameters.

If during the tests you notice that the init phase needs a different tolerance value, we can add it to the calibration parameters, but that would sound a little strange to me.

In addition, the dead_zone parameter is also in the configuration file, but it is related to the PID and, therefore, isn't suitable for this case.

Finally, I think it could be interesting to understand why @ale-git implemented the check on the exact values instead of using a threshold.

@MSECode
Contributor

MSECode commented Nov 18, 2024

That's good.
I was thinking of using the tolerance as well. Moreover, it is probably better to use a parameter that is related to the encoder rather than to the calibration type.
For the doubts I still have, and for more insights about the procedure, we can have a talk with @ale-git.

@MSECode
Contributor

MSECode commented Nov 18, 2024

> As a side note I would say that when the calibration was not completing correctly the joint also was skipping the parking at the stop of the session. Then, the smaller problems I was referring to are the following:
>
>   • why we have decided to accept at least 3 values and not less or more and why we never thought about adding a threshold to discard just the spikes
>   • why sometimes is taking much more time than other times to find those equal values? (One order of magnitude sometimes)
>   • why there's no check done by the calibrator that everything has already started
>   • why we never experienced that (or we think it is like this) on the robot that using calibration 10 as well
>   • why the joint does not reach the hard-stop position when the calibration starts before the initialization ends if the condition for completing the calibration type 10 is to have the joint, i.e. the encoder still. I supposed that I need to move until I reach hard stop position, i.e. I'm still and the encoder position, whatever it is, is always the same, independently of the initialization. Then, because of that, I would read wrongly but I should anyway reach the hard-stop.

Considering the problems raised in this comment, after a brief discussion with @valegagge we chose to add some more debug lines to understand a couple more things. Specifically, we realized from the trace logs of the previous runs that, when the calibration failed, the failure happened more or less exactly 1 second after the reception of the command from the callback. Considering that our time_window is 1000ms long, I focused on checking the values of delta and partial_space when the method AbsEncoder_is_still returns TRUE.
As one can imagine, we discovered that we actually return TRUE after 1 second, and therefore believe we have reached the hard-stop, because position_sure, delta and thus partial_space are all zero. This happens because, if we fail to initialize the AbsEncoder before receiving the calibration command, we do not update the position_sure value and thus delta remains zero, which will always be smaller than partial_space, making the method AbsEncoder_is_still return TRUE.
With this we clarify the following doubt:

  • why the joint does not reach the hard-stop position when the calibration starts before the initialization ends if the condition for completing the calibration type 10 is to have the joint, i.e. the encoder still. I supposed that I need to move until I reach hard stop position, i.e. I'm still and the encoder position, whatever it is, is always the same, independently of the initialization. Then, because of that, I would read wrongly but I should anyway reach the hard-stop.

From these logs you can see how the method exits with TRUE:

DebugCode: In AbsEncoder is STILL: 1 at time: S11:m610:u766 with position_sure: 0, delta: 0 and partial_space: 0
DebugCode: In AbsEncoder is STILL: 1 at time: S11:m610:u766 with position_sure: 0, delta: 0 and partial_space: 0
[DEBUG] (EO? tsk11 @S11:m610:u944)-> {0x4000001 p16 0x0001, p64 0x0000000000000001, dev 0, adr 0}: DEBUG: tag01. INFO = SET CONTROLMODE

Furthermore, as you can see from these lines:

    if (o->state.bits.not_initialized)
    {
        AbsEncoder_position_init(o, position);
        o->velocity = 0;
        return;
    }

and

    int16_t delta = position - o->position_sure;

as long as we are not initialized we skip the updates of the value of position_sure. Thus, this finally explains the errors we were observing.
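
The false "still" detection can be reproduced with a small, simplified model. This is only a sketch under stated assumptions: the `EncModel` struct, the `enc_update`, `enc_is_still` and `ms_until_reported_still` functions, the way `partial_space` is accumulated and the speed of 20 iCubDegrees per ms are hypothetical; only the early return on `not_initialized`, the 1000ms time window and the 12000 iCubDegrees space window come from this thread.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified, hypothetical model of the encoder state (not the firmware struct). */
typedef struct {
    bool     not_initialized;
    int32_t  position_sure;  /* last accepted position */
    int32_t  partial_space;  /* distance travelled in the current window */
    uint32_t partial_timer;  /* ms elapsed in the current window */
} EncModel;

/* Mirrors the early return quoted above: nothing is updated
 * while the encoder is not initialized. */
void enc_update(EncModel *e, int32_t position)
{
    assert(e != NULL);
    if (e->not_initialized) {
        return;
    }
    int32_t delta = position - e->position_sure;
    e->position_sure = position;
    e->partial_space += (delta < 0) ? -delta : delta;
}

/* Assumed stillness check: once per time window, report "still" if the
 * travelled distance stayed below the space window. */
bool enc_is_still(EncModel *e, uint32_t time_window_ms, int32_t space_window)
{
    assert(e != NULL);
    if (++e->partial_timer < time_window_ms) {
        return false;
    }
    bool still = e->partial_space < space_window;
    e->partial_timer = 0;
    e->partial_space = 0;
    return still;
}

/* Simulates a joint moving at constant speed, one sample per ms. Returns the
 * ms at which it is first reported still, or 0 if that never happens. */
uint32_t ms_until_reported_still(bool initialized, int32_t step_per_ms,
                                 uint32_t duration_ms)
{
    EncModel e = { .not_initialized = !initialized };
    int32_t position = 0;
    for (uint32_t t = 1; t <= duration_ms; ++t) {
        position += step_per_ms;
        enc_update(&e, position);
        if (enc_is_still(&e, 1000u, 12000)) {
            return t;
        }
    }
    return 0;
}
```

In this model a moving joint with an uninitialized encoder is reported still exactly at the end of the first 1000ms window, which matches the 1 second delay seen in the trace logs, while the same motion with an initialized encoder is never reported still.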

@MSECode
Contributor

MSECode commented Nov 19, 2024

After discussing the open points with @ale-git and @valegagge, we clarified all the doubts that were previously left open and, by applying some changes to the code, we now have a procedure for calibration type 10 that always works, whether or not we restart the boards after the de-initialization of the services.
The changes we made to the fw can be summarized in the following points:

  • first of all, the controller does not start moving the joint to the hard-stop for its calibration until the AbsEncoder is initialized. Therefore, even if the calibration command is received at a certain point from icub-main (we left that part of the state machine as it is), the controller will continue to loop in the wait_for_calibration_10 phase until the AbsEncoder is initialized
  • having the AbsEncoder initialized means that we were able to read at least 10 consecutive valid positions, where a position is considered valid when the delta between the previous and the current position is smaller than the minimum tolerance, which is defined here: https://github.com/robotology/robots-configuration/blob/devel/ergoCubSN002/hardware/motorControl/left_arm-eb2-j0_1-mc_service.xml#L43 (that parameter was already available in the fw and we decided to use it since it does not depend on the calibration itself but on the encoder. This is better since the same calibration can be done with different types of encoder)
  • since the check between previous and current position is now done using the minimum tolerance instead of looking for exact equality between those positions, we decided to raise the minimum number of consecutive valid positions from 3 to 10 (note that a new position is calculated at each loop of the controller, thus every 1ms in the latest configuration)
  • when checking if the encoder reached the hard-stop limit, we reduced the time window from 1 second to 200ms so that we can be faster, and the space window from 12000 iCubDegrees to 1000 iCubDegrees, which is a more reasonable value and closer to the tolerance.
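
The first three points can be sketched roughly as follows. This is an illustrative model, not the code of the PR: the `EncInit` struct and the functions `enc_init_step`, `enc_feed` and `may_start_hard_stop_search` are hypothetical names, and the tolerance value used in the usage note is made up. Only the rule "10 consecutive positions whose delta from the previous one is smaller than the tolerance" and the idea of waiting for the encoder before moving come from the description above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define VALID_SAMPLES_NEEDED 10u /* raised from 3 to 10, as described above */

/* Hypothetical initialization state, not the firmware struct. */
typedef struct {
    bool     not_initialized;
    bool     has_last;       /* false until the first sample is stored */
    int32_t  last_position;
    unsigned valid_count;    /* consecutive valid positions seen so far */
} EncInit;

/* One controller loop: a position is valid when it differs from the
 * previous one by less than `tolerance`; a spike restarts the count. */
void enc_init_step(EncInit *e, int32_t position, int32_t tolerance)
{
    assert(e != NULL && tolerance > 0);
    if (!e->not_initialized) {
        return;
    }
    if (e->has_last) {
        int32_t d = position - e->last_position;
        if (d < 0) {
            d = -d;
        }
        e->valid_count = (d < tolerance) ? e->valid_count + 1 : 0;
    }
    e->last_position = position;
    e->has_last = true;
    if (e->valid_count >= VALID_SAMPLES_NEEDED) {
        e->not_initialized = false;
    }
}

/* Feeds samples until the encoder is initialized; returns how many samples
 * were consumed, or 0 if initialization did not complete. */
size_t enc_feed(EncInit *e, const int32_t *samples, size_t n, int32_t tolerance)
{
    assert(e != NULL && samples != NULL);
    for (size_t i = 0; i < n; ++i) {
        enc_init_step(e, samples[i], tolerance);
        if (!e->not_initialized) {
            return i + 1;
        }
    }
    return 0;
}

/* Gate for the calibration state machine: the command alone is not enough,
 * the encoder must be initialized before the joint starts moving. */
bool may_start_hard_stop_search(const EncInit *e, bool command_received)
{
    assert(e != NULL);
    return command_received && !e->not_initialized;
}
```

With an assumed tolerance of 5, eleven quiet samples are enough to initialize the encoder in this model, whereas a single spike restarts the count and delays the start of the hard-stop search, without ever letting the calibration begin on an uninitialized encoder.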

I've tested the modified firmware on the bench setup we are using for tests and it is working well.
I'll add here some logs showing the behaviour of the fw during 2 runs made without restarting the boards and a graph showing how the joint position during calibration now updates following the raw positions. (The small displacement you notice between the raw positions (rescaled to degrees) and the converted positions obtained from the telemetry is due to the fact that, since we are using the AMO encoder as an incremental encoder, I had some flickering in reading the raw position at the zero we decided to set. Therefore, that small error is carried through the whole conversion done in MATLAB.)

Image

log-calibration10-fixed.log

@valegagge
Member Author

Hi @MSECode ,
maybe it is worth explaining how you calculate the joint position from raw value.

In addition please add the script you use to plot the data in the code branch of https://github.com/icub-tech-iit/study-encoders/.

thanks

@MSECode
Contributor

MSECode commented Nov 22, 2024

> Hi @MSECode , maybe it is worth explaining how you calculate the joint position from raw value.
>
> In addition please add the script you use to plot the data in the code branch of https://github.com/icub-tech-iit/study-encoders/.
>
> thanks

So, the green dashed line in the graph, which is the raw position of the encoder rescaled to degrees, has been calculated with this formula:

    -(wrapTo180((aux_raw(:) * 360) / (resolution_amo * amo_sectors) - offset_realDegrees_amo)) - startup_position_threshold

where we basically first rescaled the values to degrees by dividing by the full resolution of the AMO encoder (encoder resolution, i.e. 2^14, times the number of sectors, i.e. 64) and then wrapped those values between -180 and +180 (this is done because we are using those limits in the configuration of the joint for our experiments). Then we removed startup_position_threshold, which is the raw position, rescaled to degrees, of the joint when the setup is started (this is needed since we are using the AMO as an incremental encoder for these experiments).
Finally, the minus sign at the front compensates for the fact that the encoder has a negative resolution in the configuration, so that the joint position is seen as positive with respect to the kinematics of the system.
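
For readers who don't use MATLAB, the same conversion can be written as a short C sketch. The function names `wrap_to_180` and `amo_raw_to_joint_degrees` are hypothetical, and the loop-based wrap is only an approximation of MATLAB's wrapTo180 that is meant for inputs reasonably close to the [-180, 180] range.

```c
#include <assert.h>

/* Wraps an angle in degrees into [-180, 180]; like MATLAB's wrapTo180,
 * +180 and -180 are left unchanged. Loop-based, so intended for inputs
 * that are only a few turns away from the target range. */
double wrap_to_180(double deg)
{
    while (deg > 180.0) {
        deg -= 360.0;
    }
    while (deg < -180.0) {
        deg += 360.0;
    }
    return deg;
}

/* Converts a raw AMO reading to joint degrees following the formula above.
 * Parameter names are illustrative; all angles are in degrees. */
double amo_raw_to_joint_degrees(double raw,
                                double resolution_amo,
                                double amo_sectors,
                                double offset_real_degrees,
                                double startup_position_degrees)
{
    assert(resolution_amo > 0.0 && amo_sectors > 0.0);
    double deg = raw * 360.0 / (resolution_amo * amo_sectors) - offset_real_degrees;
    return -wrap_to_180(deg) - startup_position_degrees;
}
```

For example, with a resolution of 2^14 = 16384 and 64 sectors, a raw value of 262144 is a quarter of a turn, so with zero offset and zero startup position the function returns -90 degrees.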

@MSECode
Contributor

MSECode commented Nov 22, 2024

On the test bench we observed that everything works as expected. Therefore, I'm going to open a PR. However, I'll keep it in draft since we need to test the changes on the robot too. As a matter of fact, since we have modified some of the timings in the calibration phase and we have only tested them on a setup composed of a single joint, we need to check that the calibration sequence of the joints is still respected on the robot. That is a delicate part considering that, in order to minimize efforts on the links, we move the joints following a specific order.

@valegagge
Member Author

Hi @MSECode,
please open an issue to address the test on the robot and another one, with low priority, to study the change of the position value on reception of the trajectory command. I'm referring to this:
Image

Thanks

@MSECode
Contributor

MSECode commented Nov 26, 2024

Closing this issue.
Fixes described will be addressed in this issue where we are reporting the tests that are going to be carried out on the robot: #537
Problem 2 mentioned in the issue body will be instead addressed in this issue: #536
