API usage by external users #1154

Open
tovari opened this issue Aug 10, 2021 · 5 comments

@tovari
tovari commented Aug 10, 2021

The implementation in #1026 makes it possible to track API usage per endpoint. However, the dashboard also counts the API calls made by the frontend, so external API usage cannot be tracked separately.
We also need the ability to track only the external API calls.

@gulfaraz
Member

I counted the API requests made from each IP address and found that all go-api requests come from the same IP address, 0.0.0.0.

[Screenshot from the Azure portal, 2021-08-20 15:12:24: column chart of request counts per client IP]
The above chart was created using the below query with data from the last 7 days.

requests
| summarize hits = count() by client_IP
| render columnchart
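
For readers unfamiliar with the query language, here is a rough Python sketch of what the `summarize hits = count() by client_IP` step computes. The sample rows and URLs are hypothetical stand-ins for the `requests` table, not real log data.

```python
from collections import Counter

# Hypothetical rows standing in for the Azure `requests` table.
# Only the client_IP column matters for this aggregation.
sample_requests = [
    {"client_IP": "0.0.0.0", "url": "/api/v2/example-a/"},
    {"client_IP": "0.0.0.0", "url": "/api/v2/example-b/"},
    {"client_IP": "0.0.0.0", "url": "/api/v2/example-a/"},
]


def hits_by_client_ip(rows):
    """Count requests per client_IP, like `summarize hits = count() by client_IP`."""
    return Counter(row["client_IP"] for row in rows)


hits = hits_by_client_ip(sample_requests)
for ip, count in hits.most_common():
    print(ip, count)
```

When every row carries the same masked address, the result collapses into a single bucket, which is what the chart shows.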

I investigated further by making some GET requests to generate sample logs. The current analytics system does capture some client details. The screenshots below show direct API hits made from my laptop in the Netherlands.

[Screenshot, 2021-08-20 15:08:42: sample request log entry]

There appears to be some masking going on which removes my info before the request reaches the Django app.

@batpad my guess is this is happening either in the docker network layer or the load balancer (as you suggested)

Alternatively, a less graceful approach is to explicitly tag each call made from the go-frontend in fetch. Then in the go-api we capture this as a custom dimension.
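
A minimal sketch of the go-api side of this tagging idea, assuming the frontend adds a custom header to every fetch call. The header name `X-Request-Source`, its value, and the `request_source` dimension are all hypothetical, chosen only for illustration.

```python
# Hypothetical header that the go-frontend would add to every fetch call.
FRONTEND_HEADER = "X-Request-Source"
FRONTEND_VALUE = "go-frontend"


def request_source(headers):
    """Return "frontend" if the request carries the frontend tag, else "external"."""
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    value = normalized.get(FRONTEND_HEADER.lower(), "")
    if value.strip().lower() == FRONTEND_VALUE:
        return "frontend"
    return "external"


def custom_dimensions(headers):
    """Build the custom dimension to attach to the request's telemetry."""
    return {"request_source": request_source(headers)}


print(custom_dimensions({"X-Request-Source": "go-frontend"}))
print(custom_dimensions({"User-Agent": "curl/7.79.1"}))
```

Any client can send the same header, so this would separate well-behaved frontend traffic from everything else rather than authenticate it.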

@nanometrenat
Contributor

FYI previous ticket #572 (comment) speaks to what logs are available on the Django servers - I don't have access to the IM mailbox anymore to check, but I'm pretty sure we managed to get a list of IP addresses from those logs at that time (Feb 2020) - the problem we had was that we couldn't differentiate between API calls from the user's browser (i.e. from using the site) vs API calls via other means.

@batpad
Collaborator

batpad commented Aug 23, 2021

@gulfaraz do we know exactly where these logs are derived from? This would make sense for logs that were being emitted by the Django app. However, in these cases where there's something masking the originating IP address, there "should" always be an X-Forwarded-For header added that contains the real IP address. From rough reading online, it seems like the Azure logs should use the X-Forwarded-For header to determine the actual Client IP when available, but of course, this is not working for us somehow.

This would take a bit more investigation - it could possibly be one of a few different things:

  • In the best case scenario, just a change in filter to use the X-Forwarded-For header to determine the actual Client IP
  • In the more likely case, the X-Forwarded-For header is either not being applied correctly, or being dropped by the web-server
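
As a rough illustration of the first case, this is how a filter could derive the client IP from X-Forwarded-For and fall back to the connecting address. It is a sketch, not the Azure or Django implementation; the function name and the example addresses are made up.

```python
def client_ip(headers, remote_addr):
    """Pick the originating client IP for a request.

    X-Forwarded-For is a comma-separated list; by convention the leftmost
    entry is the original client and later entries are intermediate proxies.
    """
    # HTTP header names are case-insensitive, so normalise before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    forwarded = normalized.get("x-forwarded-for", "")
    first_hop = forwarded.split(",")[0].strip()
    # Fall back to the socket peer address when the header is missing or empty.
    return first_hop or remote_addr


print(client_ip({"X-Forwarded-For": "203.0.113.7, 10.0.0.1"}, "10.0.0.1"))
print(client_ip({}, "10.0.0.1"))
```

The leftmost entry is supplied by the client and can be forged, so it is good enough for usage statistics but should not be trusted for access control.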

If the logs above are parsing the access logs generated by the gunicorn server running the application, it seems like it might require some config to get it to log the X-Forwarded-For IP rather than the proxy IP: https://docs.gunicorn.org/en/stable/deploy.html
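
If the gunicorn access log does turn out to be the source, a config along these lines would log the forwarded address in place of the proxy address. This is a sketch that assumes the app is started with a `gunicorn.conf.py` and that the proxy in front of it sets the header; neither is confirmed in this thread.

```python
# gunicorn.conf.py (sketch; assumes gunicorn loads this config file)

# Write the access log to stdout.
accesslog = "-"

# gunicorn's default format starts with %(h)s, the remote address, which is
# the proxy when the app sits behind one. %({x-forwarded-for}i)s logs the
# X-Forwarded-For request header instead.
access_log_format = (
    '%({x-forwarded-for}i)s %(l)s %(u)s %(t)s "%(r)s" %(s)s %(b)s "%(f)s" "%(a)s"'
)
```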

Not 100% sure of the best way to debug this - I guess a starting point would be knowing exactly where that chart is trying to read the Client IP from, and work backwards from there.

@gulfaraz
Member

do we know exactly where these logs are derived from?
a starting point would be knowing exactly where that chart is trying to read the Client IP from, and work backwards from there

Azure uses the API requests' IP address to find client_City, client_StateOrProvince, and client_CountryOrRegion using GeoLite2 from MaxMind.

This would make sense for logs that were being emitted by the Django app. However, in these cases where there's something masking the originating IP address, there "should" always be an X-Forwarded-For header added that contains the real IP address. From rough reading online, it seems like the Azure logs should use the X-Forwarded-For header to determine the actual Client IP when available, but of course, this is not working for us somehow.

Looks like the server drops the X-Forwarded-For header to maintain user privacy. The IP address isn't collected locally when the X-Forwarded-For header is set.

  • In the best case scenario, just a change in filter to use the X-Forwarded-For header to determine the actual Client IP
  • In the more likely case, the X-Forwarded-For header is either not being applied correctly, or being dropped by the web-server

Azure may be masking the IP address. I suggest disabling any masking on Azure's side before trying the above actions.

I tried to disable masking using these steps but I don't seem to have the required permissions.

@batpad
Collaborator

batpad commented Aug 27, 2021

@gulfaraz - this is some solid digging into this.

It would be nice to rule out Azure masking the IP address. It definitely seems like these logs are all coming from Azure rather than being parsed from logs emitted by the Django app, so I don't think this is a Django issue.

The Azure masking seems the most likely to me :( - if we can definitely rule out Azure masking the IP, then am happy to get on a call or so to try and delve into this more - definitely a mystery I'm quite interested in solving as well, thanks much for digging into this.
