Push Notifications are not being resent after I/O problems are resolved. #14

Closed
agarajh opened this issue Nov 11, 2013 · 13 comments

@agarajh

agarajh commented Nov 11, 2013

I disabled all internet connectivity. When I then try to push a notification, I get the following error:

2013-11-11 19:18:51,928 | push | DEBUG | ApnsClientThread
|ApnsClientThread-1 beginning connection process.
2013-11-11 19:18:51,939 | push | ERROR | ApnsClientThread
|ApnsClientThread-1 failed to connect to APNs gateway.
java.nio.channels.UnresolvedAddressException: null

Pushy then keeps trying to establish a connection to APNs indefinitely.

I then re-enabled the internet connection. Pushy now logs the following:

2013-11-11 19:18:51,939 | push | DEBUG | ApnsClientThread
|ApnsClientThread-1 beginning connection process.
2013-11-11 19:18:54,272 | push | DEBUG | ApnsClientThread
|ApnsClientThread-1 connected.
2013-11-11 19:18:54,273 | push | DEBUG | ApnsClientThread
|ApnsClientThread-1 waiting for TLS handshake.
2013-11-11 19:18:55,052 | push | DEBUG | ApnsClientThread
|ApnsClientThread-1 successfully completed TLS handshake.

So far everything works as expected: Pushy managed to reconnect to the APNs server once the internet connection was active again.

However, the push notification that was sent while the internet connection was down is not resent. It seems the push notification was never enqueued. I don't get any error; the old push notification is simply lost for good. All new push notification requests are sent to APNs successfully.

Thank you for your time. I hope I described the issue I am facing clearly enough.

Regards,
Henri

@jchambers
Owner

To clarify, the steps here are:

  1. Create a new PushManager.
  2. Enqueue a notification
  3. Start the PushManager with no network connection
  4. Open a network connection, allowing the PushManager to connect

Is that right? Also, which version of Pushy are you using?

Thanks!

@agarajh
Author

agarajh commented Nov 11, 2013

Hi, in my application I have a class which creates only ONE instance of Pushy PushManager for all the incoming requests.

Steps I am following:

1. Create a new PushManager.
2. Start the PushManager with no network connection. 
3. Then I enqueue a notification. 
4. Then open the network connection. 

Why should I start the PushManager after enqueuing a push notification?

I am using the 0.1.2.SNAPSHOT version, because 0.1.1 kills the main thread.

@jchambers
Owner

Thanks for the details!

Why should I start the PushManager after enqueuing a push notification?

No need to do that. Sorry; I didn't mean to be prescriptive. I was just wondering if I had the sequence right.

@agarajh
Author

agarajh commented Nov 11, 2013

I just noticed the following strange behaviour while testing:

  1. I start the backend server.
  2. I stop the internet connection.
  3. I send a notification.
  4. I reconnect. The notification IS resent to APNs.

Now:

  1. I stop the internet connection AGAIN.
  2. I send another notification.
  3. I reconnect. The notification is NOT resent. However, new requests are sent to APNs.

It seems notifications are only resent the first time a reconnection happens. From the second time on, old push notifications are no longer dequeued and resent.

@jchambers
Owner

I'm having a hard time reproducing this; could I trouble you to set your logging level to trace and post the log output as a gist?

@agarajh
Author

agarajh commented Nov 12, 2013

Hi, here is a gist with the log output at trace level:
https://gist.github.com/agarajh/57e20c55098aaad3d595

I tried the following:

  1. At 2013-11-12 11:11:50,023, with no internet connection, I sent a push notification with an invalid registration ID. I enabled the internet connection at 2013-11-12 11:11:51,733, and the notification was resent. At 2013-11-12 11:12:02,028 I received an INVALID_TOKEN message from APNs, so the resend works as expected.
  2. At 2013-11-12 11:12:28,338, with the internet connection disabled, I tried to send a second notification with a VALID device token. I enabled the internet connection at 2013-11-12 11:12:51,184. The push notification was not sent.

Best Regards, Henri

@jchambers
Owner

Please pardon the delay in getting back to this issue. It appears that the reproduction case here is perhaps a bit simpler than initially reported:

  1. Establish a connection to the APNs gateway.
  2. Disable the internet connection. We'll still think the connection is open, and the write (which only goes to the OS-controlled outbound buffer, not necessarily to the wire) will succeed as long as the buffer isn't flooded.
  3. Close the connection. The data written to the buffer, but not actually sent, appears to be lost.
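The effect in step 2 can be illustrated locally with plain `java.net` sockets. This is a minimal sketch, not Pushy code: the class and method names are hypothetical, and a loopback peer that accepts the connection but never reads stands in for the unreachable gateway. It only shows that a successful write means "the OS accepted the bytes", nothing more.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

public class BufferedWriteDemo {

    /**
     * Writes a small payload to a peer that never reads it.
     * Returns true if the local write completed without an exception.
     */
    public static boolean writeToSilentPeer(byte[] payload) {
        // Loopback listener on an ephemeral port; it accepts but never reads.
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(server.getInetAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {

            OutputStream out = client.getOutputStream();
            out.write(payload); // hands the bytes to the OS send buffer
            out.flush();

            // Reaching this point proves only that the OS took the bytes.
            // The peer application has not read or acknowledged anything.
            return true;
        } catch (IOException e) {
            // A local failure, which is the only kind the writer can see.
            return false;
        }
    }
}
```

A small payload such as `new byte[64]` is written successfully even though the receiving side never consumes it, which matches the "successfully wrote notification" log line appearing while the network is down.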

@agarajh in the gist you posted, these two lines are right next to each other:

2013-11-12 11:12:28,434 | drivexone-push  | TRACE | ApnsClientThread | ApnsClientThread-1 successfully wrote notification 0
2013-11-12 11:12:51,161 | drivexone-push  | DEBUG | ApnsClientThread | ApnsClientThread-1 waiting for connection to close.

I recognize those are ~23 seconds apart; do you know what triggered the connection closure at 2013-11-12 11:12:51,161? In my testing, the message will still be sent if the internet connection is restored before the TCP connection is closed.

@jchambers
Owner

Also, @agarajh, as a point of clarification, is this a thing that happened to you in the wild, or were you just enabling/disabling your internet connection to simulate loss of connectivity?

@agarajh
Author

agarajh commented Nov 27, 2013

Hi,
exactly, I was just enabling and disabling the internet connection.


@jchambers
Owner

I've done some poking at this; I think we're doing all we can here, and this is a fundamentally unresolvable problem given the design of the APNs protocol. All we know -- and all we CAN know -- is that we successfully handed the notification off to the OS. Because APNs never acknowledges notifications in the affirmative, we have no way of telling the difference between a notification that was successfully handled by the APNs gateway and a notification that never even arrived at the gateway.

I'm afraid the best we can do for now is document the issue (see c8e7fa1). Please let me know if you feel that I've misunderstood the issue or if you think there's a better solution to be had. Thanks kindly, and I'm sorry I don't have something better to offer!

EDIT: To clarify, I don't think this is a bug unique to Pushy. As far as I can tell, any APNs client is likely to have this issue.
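To make the "never acknowledges in the affirmative" point concrete: as far as I recall from Apple's documentation for the legacy binary interface, the only thing the gateway ever sends back is a 6-byte error-response packet (command byte 8, a one-byte status, and the 4-byte identifier of the offending notification). The sketch below parses that packet under that assumption; the class name and constants are hypothetical and are not Pushy's API.

```java
import java.nio.ByteBuffer;

public class ApnsErrorResponse {

    /** Command byte of an error-response packet (assumed from the legacy protocol docs). */
    public static final int COMMAND_ERROR = 8;
    /** Status code for an invalid token (assumed from the legacy protocol docs). */
    public static final int STATUS_INVALID_TOKEN = 8;

    public final int status;
    public final int notificationId;

    private ApnsErrorResponse(int status, int notificationId) {
        this.status = status;
        this.notificationId = notificationId;
    }

    /** Parses a 6-byte packet: command (1), status (1), identifier (4, big-endian). */
    public static ApnsErrorResponse parse(byte[] packet) {
        if (packet.length != 6) {
            throw new IllegalArgumentException("expected 6 bytes, got " + packet.length);
        }
        ByteBuffer buffer = ByteBuffer.wrap(packet); // ByteBuffer is big-endian by default
        int command = buffer.get() & 0xFF;
        if (command != COMMAND_ERROR) {
            throw new IllegalArgumentException("unexpected command byte " + command);
        }
        int status = buffer.get() & 0xFF;
        int notificationId = buffer.getInt();
        return new ApnsErrorResponse(status, notificationId);
    }
}
```

There is no corresponding success packet to parse, which is why silence from the gateway cannot be told apart from a notification that never arrived.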

@jpswain

jpswain commented Mar 24, 2014

Hi @jchambers,

We are looking at using Pushy, and I wanted to confirm that I understand this thread.

Is the issue that @agarajh describes caused by Apple not acknowledging receipt of notifications at the application level, so that it is possible to hit an error while sending and end up not knowing which notifications to resend?

Please forgive me if this is a dumb question or I'm missing something. The main thing I want to clarify is this:
If I send all my notifications and do not catch any exceptions while enqueueing them, does that guarantee they were at least received by the APNS server without any TCP failures?

I was hoping to find clear evidence in the Pushy source code that if an error might have happened, it would definitely be thrown and propagated to the client code. For the two failures that @agarajh demonstrated in his gist, I saw that a Netty socket channel doConnect error was logged (which seems to imply that, at the very least, we should know when something might have gone wrong). But from looking at the source code (I have not experimented with Pushy hands-on yet), I don't see anywhere that Pushy informs the client code that notifications might have been lost.
I'm specifically looking at the code around this line:
log.error(String.format("%s failed to connect to APNs gateway.", this.getName()), connectFuture.cause());

Again, sorry if I've misunderstood, we are excited to try out your lib, but want to make sure we will be able to reliably confirm that no queued notifications are lost without us at least knowing that some notifications might have been lost.

Thanks!!
Jamie

@jchambers
Owner

If I send all my notifications and do not catch any exceptions while enqueueing them, does that guarantee they were at least received by the APNS server without any TCP failures?

Well, yes and no. It's true that we won't necessarily know in a timely manner what happened to a notification after it left our control. That said, Pushy offers the following guarantees:

  • In v0.2:
    • If a notification is enqueued for sending and the PushManager is shut down without a timeout, all notifications will be delivered to and processed by the APNs gateway, rejected by the APNs gateway, or returned by the PushManager's shutdown method in a list of unsent notifications.
  • In v0.3 (to be released soon):
    • If a notification is taken from the public queue and the PushManager is shut down without a timeout, all notifications will be either delivered to and processed by the APNs gateway or rejected by the APNs gateway.

In both cases, we're able to offer those guarantees because we close connections by sending known-bad notifications to the gateway. Because notifications are processed by the gateway in order, we know that rejecting our known-bad notification means that everything before that notification was processed successfully, while everything sent after that notification needs to be re-sent. Until the gateway rejects a notification, we don't know anything about the state of any sent notifications.
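The ordering argument above can be sketched as a simple partition. This is an illustration only, assuming notifications are tracked by integer identifiers in the order they were written to one connection; the class and field names are hypothetical and do not reflect Pushy's internals.

```java
import java.util.ArrayList;
import java.util.List;

public class RejectionPartition {

    /** Sent before the rejected notification: processed by the gateway. */
    public final List<Integer> processed = new ArrayList<>();
    /** Sent after the rejected notification: must be sent again. */
    public final List<Integer> toResend = new ArrayList<>();
    /** The notification the gateway rejected. */
    public Integer rejected;

    /**
     * Splits the identifiers written to one connection (in send order)
     * around the identifier named in the gateway's rejection.
     */
    public static RejectionPartition partition(List<Integer> sentInOrder, int rejectedId) {
        RejectionPartition result = new RejectionPartition();
        boolean seenRejected = false;
        for (int id : sentInOrder) {
            if (!seenRejected && id == rejectedId) {
                seenRejected = true;
                result.rejected = id;
            } else if (!seenRejected) {
                result.processed.add(id);
            } else {
                result.toResend.add(id);
            }
        }
        if (!seenRejected) {
            throw new IllegalArgumentException("rejected id was never sent on this connection");
        }
        return result;
    }
}
```

With sent identifiers 1 through 5 and a rejection for 3, identifiers 1 and 2 are known to have been processed, and 4 and 5 need to be re-sent. A deliberately bad notification written last therefore confirms everything before it.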

So, in short, we will eventually know what happened to everything that went out the door.

I don't see anywhere that Pushy informs the client code of the fact that notifications might have been lost.

That's true, but again, that's because we can't tell in any way except by closing the connection with a known-bad notification. Three things can happen when we try to send a notification:

  1. We fail to write the notification; this is a local problem, and we re-enqueue the notification to be sent again later. Callers are not informed when this happens because we handle retransmission internally and will give the notification back at shutdown time if we couldn't ever get it out the door.
  2. The notification successfully goes out the door and is processed by the APNs gateway after some unknown amount of time. We won't know this is true until some later notification is rejected by the gateway. Callers are never informed if this happens.
  3. The notification successfully goes out the door, but is rejected by the APNs gateway after some unknown amount of time. We won't know this is true until we receive the rejection from the gateway, but this tells us what happened to all prior and subsequent notifications on that connection. Callers are informed via a RejectedNotificationListener in this case.
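Case 1 above (a local write failure followed by re-enqueueing) might look roughly like the following. This is a hypothetical sketch, not Pushy's actual implementation: `RetryQueueSketch`, `Writer`, and `sendOnce` are made-up names, and notifications are plain strings for brevity.

```java
import java.io.IOException;
import java.util.Deque;

public class RetryQueueSketch {

    /** Stand-in for whatever writes a notification to the connection. */
    @FunctionalInterface
    public interface Writer {
        void write(String notification) throws IOException;
    }

    /**
     * Makes one pass over the queue. Notifications that fail to write
     * locally go back on the queue for a later attempt.
     * Returns the number of notifications written successfully.
     */
    public static int sendOnce(Deque<String> queue, Writer writer) {
        int written = 0;
        int pending = queue.size(); // fixed up front so re-enqueued items are not retried in this pass
        for (int i = 0; i < pending; i++) {
            String notification = queue.pollFirst();
            try {
                writer.write(notification);
                written++;
            } catch (IOException e) {
                // Local failure: keep the notification for a later pass.
                queue.addLast(notification);
            }
        }
        return written;
    }
}
```

Note that a successful `write` here only covers case 1. Cases 2 and 3 are decided later by the gateway, so nothing in this loop can report them.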

As for connection failure, I'd call your attention to #51. That will go into v0.3 (again, soon).

Hope that helps!

@jchambers
Owner

…and, to follow up, the case @agarajh ran into is a little weird. What happened there is that we got the notification out the door, but then got stuck in that "waiting for an unknown amount of time" phase. The shutdown in that case DID have a timeout, so the guarantees about notification state didn't apply.

Had the PushManager been shut down without a timeout, the shutdown attempt would have blocked until the internet connection was restored and the state of all notifications was known.
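The difference between the two shutdown modes can be modelled with a latch. This is a sketch under assumptions, not Pushy's shutdown code: the class and method names are hypothetical, and the latch stands for "the gateway has rejected the known-bad notification, so the state of everything sent is known".

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

public class ShutdownWaitSketch {

    // Released once the gateway's rejection of the known-bad notification arrives.
    private final CountDownLatch stateKnown = new CountDownLatch(1);

    /** Called when the rejection is received from the gateway. */
    public void onGatewayRejection() {
        stateKnown.countDown();
    }

    /** Shutdown without a timeout: blocks until notification state is known. */
    public void shutdown() {
        try {
            stateKnown.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
        }
    }

    /**
     * Shutdown with a timeout. Returns true only if notification state
     * became known in time; false means the guarantees do not apply.
     */
    public boolean shutdown(long timeout, TimeUnit unit) {
        try {
            return stateKnown.await(timeout, unit);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

In the scenario from this issue, the timed variant would return `false` while the network was down, which corresponds to the notification being lost without an error.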
