Identify which node a response comes from #21282

Closed
john-thomas-dotcms opened this issue Nov 19, 2021 · 7 comments · Fixed by #21524

john-thomas-dotcms commented Nov 19, 2021

Is your feature request related to a problem? Please describe.

There are many types of support issues where identifying which node responded to a request is critical to resolving the issue (to identify a node which isn't responding correctly, identify out-of-sync nodes, etc.). Currently there is no standard way to do this; it's been done by allowing access to each node in the cluster via separate URLs, modifying load balancer configuration, and other methods, but none of these methods works well, and none of them work universally for all systems.

This all means that each time this information is needed, a custom solution has to be implemented (and then usually rolled back after the problem has been solved).

Describe the solution you'd like

We need a way for dotCMS to identify which node in a cluster each specific response is sent from. This is what I suggest, but there are multiple ways this could be implemented:

  • Add a response header which identifies which node the response was sent from.
  • Enable or disable the response header for all responses via a configuration property (for example, RESPONSE_HEADER_ADD_NODE_ID=true).
  • Allow enabling the response header in an individual request by adding a URL parameter (for example, ?nodeid=true).
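
As a rough illustration of how the two switches suggested above could combine, here is a minimal sketch in Python (chosen only for brevity; dotCMS itself is Java). The function name `should_add_node_header` and the precedence rule, where `?nodeid=true` turns the header on for a single request even when the property is off, are assumptions and not something this issue specifies.

```python
from urllib.parse import urlparse, parse_qs


def should_add_node_header(url: str, property_enabled: bool) -> bool:
    """Decide whether the node-ID header should be added to a response.

    property_enabled stands in for RESPONSE_HEADER_ADD_NODE_ID.
    """
    # The global property enables the header for every response.
    if property_enabled:
        return True
    # Otherwise, allow a per-request opt-in via ?nodeid=true (assumed precedence).
    query = parse_qs(urlparse(url).query)
    return any(value.lower() == "true" for value in query.get("nodeid", []))
```

With this precedence, the property acts as a cluster-wide default and the URL parameter is a per-request opt-in for troubleshooting.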

Describe alternatives you've considered

dotCMS Cloud Engineering has considered implementing something of this nature on the load balancer for all dotCMS Cloud customers. However, this isn't a workable option within our current infrastructure (ALB), and there are undoubtedly on-premise customers with similar limitations.

Other methods, such as bypassing the LB to access nodes directly, present performance and reliability issues, so they can't be implemented for any customers on more than a temporary (custom) basis.

Additional context

The identifier used in the header doesn't matter, as long as:

  • It's unique to each node.
  • It can be easily used to identify the node within the infrastructure.

Ideally, the node identifier should change if the server or container running on the node is changed (especially with k8s, where multiple containers may be started from the same image).

  • But if necessary, a unique ID (for the server/image) and a timestamp (for when the server/container was started) could both be used to ensure that each container is uniquely identified.
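
One way to picture the ID-plus-timestamp idea is the Python sketch below. The helper name `make_node_identifier`, the `-` separator, and the use of whole epoch seconds are all illustrative assumptions; the issue only says the two values could be used together.

```python
import time


def make_node_identifier(server_id: str, started_at: float) -> str:
    """Combine a stable server/image ID with the container start time."""
    # Whole seconds since the epoch are enough to tell restarts apart.
    return f"{server_id}-{int(started_at)}"


# Captured once at process start, so it stays constant for the container's lifetime.
PROCESS_STARTED_AT = time.time()
```

Two containers started from the same image would share `server_id` but differ in `started_at`, so their identifiers would differ as long as they did not start within the same second.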

The Site from the license screen seems to fit most of these needs, so that might be an appropriate identifier to use. If the Site doesn't change when k8s shuts down and restarts a container, then it might make sense to send a timestamp (or maybe Server ID) in addition to the Site.

jcastro-dotcms commented Nov 19, 2021

One of the customers requesting this feature: https://dotcms.zendesk.com/agent/tickets/105784

wezell commented Dec 7, 2021

We should send an x-dot-server header that also includes the shorty serverId. This can be disabled by setting the environment variable RESPONSE_HEADER_ADD_NODE_ID=false.

wezell commented Jan 11, 2022

So we should send 2 pieces of information, the shorty serverId and the node name, which is found here:

[Screenshot: Screen Shot 2022-01-11 at 11 15 39 AM]

  • The inclusion of the node name should be disable-able via config
  • We need to make sure that the node name is properly cached.

The header should look something like:
{nodeName} | {serverId}

x-dot-server: dotcms-node-1 | dde63dc7
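
For anyone consuming this header on the client side, a minimal Python sketch of formatting and parsing the `{nodeName} | {serverId}` value might look like the following. The names `ServerHeader`, `format_server_header`, and `parse_server_header` are hypothetical, and splitting on the last `|` is an assumption made so that surrounding whitespace is tolerated.

```python
from typing import NamedTuple


class ServerHeader(NamedTuple):
    node_name: str
    server_id: str


def format_server_header(node_name: str, server_id: str) -> str:
    """Build the header value in the '{nodeName} | {serverId}' shape."""
    return f"{node_name} | {server_id}"


def parse_server_header(value: str) -> ServerHeader:
    """Split a header value into node name and server ID."""
    # Split on the last '|' so the short server ID is always the final field.
    node_name, separator, server_id = value.rpartition("|")
    if not separator:
        raise ValueError(f"not a '{{nodeName}} | {{serverId}}' value: {value!r}")
    return ServerHeader(node_name.strip(), server_id.strip())
```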

@jdotcms jdotcms self-assigned this Jan 12, 2022
jdotcms added a commit that referenced this issue Jan 12, 2022

jdotcms commented Jan 12, 2022

Needs doc:

RESPONSE_HEADER_ADD_NODE_ID=false

This property omits the header from the HTTP response entirely.

RESPONSE_HEADER_ADD_NODE_ID_INCLUDE_NODE_NAME=false

This property omits the node name from the x-dot-server header. In that case, the response contains:

x-dot-server: unknown|{serverId}

wezell commented Jan 13, 2022

So, to be clear:
RESPONSE_HEADER_ADD_NODE_ID=false means no header will be added, and
RESPONSE_HEADER_ADD_NODE_ID_INCLUDE_NODE_NAME=false means the header will show only the nodeId and not the node name.
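
The behavior of the two properties, as described in this thread, can be summarized in a small Python sketch. This is not the dotCMS implementation (which is Java): the function name `server_header_value` is hypothetical, both flags are assumed to default to true because the thread only describes turning them off, and the `|` separator without spaces follows the `unknown|{serverId}` example above.

```python
from typing import Optional


def server_header_value(
    node_name: str,
    server_id: str,
    add_node_id: bool = True,        # stands in for RESPONSE_HEADER_ADD_NODE_ID
    include_node_name: bool = True,  # stands in for RESPONSE_HEADER_ADD_NODE_ID_INCLUDE_NODE_NAME
) -> Optional[str]:
    """Return the x-dot-server value, or None when the header is disabled."""
    if not add_node_id:
        # No header at all on the response.
        return None
    # With the node name excluded, the thread shows the placeholder "unknown".
    name = node_name if include_node_name else "unknown"
    return f"{name}|{server_id}"
```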

@erickgonzalez erickgonzalez linked a pull request Jan 18, 2022 that will close this issue
jdotcms added a commit that referenced this issue Jan 20, 2022
jdotcms added a commit that referenced this issue Jan 25, 2022
jdotcms added a commit that referenced this issue Jan 26, 2022
fmontes pushed a commit that referenced this issue Jan 28, 2022
* #21282 adding the metadata interceptor

* #21282 adding unit test

* #21282 feedback and unit test

* #21282 adding more doc

* #21282 unit test feedback

* #21282 adding the ability to show the node id by query string

* #21282 feedback done

* #21282 fixing unit test

* #21282 fixing a test again
fmontes pushed a commit that referenced this issue Feb 2, 2022
* Modify dotcmsReleaseVersion, coreWebReleaseVersion, webComponentsReleaseVersion and dot-cicd branch version

* Modify dotcmsReleaseVersion to 22.02, coreWebReleaseVersion, webComponentsReleaseVersion to rc and dot-cicd branch version to release-22.02

* Update branch in git submodule to release-22.02

* Enhance code freeze automated process

* #21522 change log to debug (#21523)

* #21434 the code was expecting always a String, but the value could be a number in case the type of the text is mark as a whole number or so, the toString handles both cases (#21541)

* fix: sort after clicking any available filter (#21542)

* #20774 If a user with restricted access views their folders in site browser first, they can see the Add Content button when they navigate to other folders (#21539)

* display add content button only when user has permission

* add spacing

* #21516 changes to file system metadata provider (#21517)

* #21516 changes to file system metadata provider

* fix test

Co-authored-by: nollymar <[email protected]>
Co-authored-by: Erick Gonzalez <[email protected]>

* Issue 20892 update docker clean (#21530)

* #20892 update docker clean

* #20892 getting APR to work

* #20892 newlines added

* #21454 cp log4j

* #21454 get custom starter loading

* #21454 get custom starter loading

* Printing out message when default starter is used

* Updating java-base version

* #20892 Fixing curl tests

* #20892 Fixing broken tests

* #20892 Fixing typo in felix load dir

* Setting DOT_CICD_BRANCH to release-22.02

Co-authored-by: Will Ezell <[email protected]>
Co-authored-by: nollymar <[email protected]>

* Updating java-base version

* #21565 removes osgi proxy servlet reference (#21566)

* #21565 removes osgi proxy servlet reference

* #21565 removing class, fixing red

* #21120 fix Time machine looks miss-aligned when you are using the system in spanish (#21563)

* #19236 UI: Scrolling error uploading multiple files

* #21223 Radio Button Field Transforms Values on Export

* #21204 [Site Copy] : Copying a Site randomly fails

* #21537 [Static Publishing] : Problem with multi-language contents

* #21445 support dotassets in graphql (#21573) (#21595)

* #21445 support dotassets in graphql

* #21445 Postman test

* #21298 We are not showing the progress bar when you are adding a new copied site

* Updating properties in docker compose examples (#21608)

* Updating properties

* Including a new example for PP

* Fixing conf error on PP example

Co-authored-by: nollymar <[email protected]>

* #21454 Copying server.xml to the right path (#21593)

* #21454 Copying server.xml to the right path

* #21454 server.xml was renamed

* #21454 Rolling back last change

* #21572 Adding support to gid=0

* #21454 Removing unused code

* #21572 Fixing group name

* Improving doc

Co-authored-by: nollymar <[email protected]>

* dotCMS/core #21436 Create a direct function to be called after angular confirmation dialog (#21615)

* #21122 Removing daisy diff (#21607)

* #21282 Identify which node a response comes from

* #21282 adding the metadata interceptor

* #21282 adding unit test

* #21282 feedback and unit test

* #21282 adding more doc

* #21282 unit test feedback

* #21282 adding the ability to show the node id by query string

* #21282 feedback done

* #21282 fixing unit test

* #21282 fixing a test again

* #21599 Push Publishing Workflow Dialog is missing timezoneId

* #21122 Replace What's changed with the htmldiff js library

* #21122 adding first draft of the page render versions

* #21122 moving to sequencial way to retrieve the pages

* #21122 refactoring

* #21122 adding unit test for diff

* #21122 adding unit test for diff

* #21122 adding unit test for diff

* test

* #21122 feedback done

Co-authored-by: Erick Gonzalez <[email protected]>

* #21446 IT failure ResetPasswordResourceIntegrationTest#test_resetPassword_success

* #21634 Unable to generate password for users (#21643)

* Issue 21488 support smtps in mailer class (#21625)

* #21488 Adding missing test in MainSuite

* #21488 Changing the method used to send emails. The previous one used smtp protocol by default

* #21488 Adding support for smtps

* #21488 The way we send messages was changed to support smtps

* #21488 Making code cleaner

* #21488 Sending the right host and port to get the connection

* #21488 Applying code review suggestions

Co-authored-by: nollymar <[email protected]>

* #21644  fix from email on recovery password (#21645)

* revert master values

Co-authored-by: victoralfaro-dotcms <[email protected]>
Co-authored-by: Will Ezell <[email protected]>
Co-authored-by: Jonathan <[email protected]>
Co-authored-by: Rafael Velazco <[email protected]>
Co-authored-by: nollymar <[email protected]>
Co-authored-by: Nollymar Longa <[email protected]>
Co-authored-by: alfredo-dotcms <[email protected]>
Co-authored-by: Daniel Silva <[email protected]>
Co-authored-by: Humberto Morera <[email protected]>
Co-authored-by: Fabrizzio Araya <[email protected]>

fmontes commented Feb 4, 2022

image

@bryanboza

Fixed, tested on release-22.02 // Docker // FF

By default, these are the results:
image

After turning off the following properties:

RESPONSE_HEADER_ADD_NODE_ID=false
RESPONSE_HEADER_ADD_NODE_ID_INCLUDE_NODE_NAME=false

The header is no longer included:
image
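
For repeating this kind of check across a cluster, a small Python sketch that tallies which node answered each request is shown here. It assumes the response headers have already been collected as dictionaries by whatever HTTP client is in use; the function name `count_responses_per_node` and the `<no header>` placeholder are hypothetical.

```python
from collections import Counter
from typing import Iterable, Mapping


def count_responses_per_node(responses: Iterable[Mapping[str, str]]) -> Counter:
    """Tally responses by their x-dot-server header value."""
    counts: Counter = Counter()
    for headers in responses:
        # HTTP header names are case-insensitive, so normalize before lookup.
        lowered = {name.lower(): value for name, value in headers.items()}
        counts[lowered.get("x-dot-server", "<no header>")] += 1
    return counts
```

A node that never appears in the tally, or one that appears alongside stale content, is a natural starting point when looking for an unresponsive or out-of-sync node.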

@wezell wezell closed this as completed Feb 8, 2022
fmontes pushed a commit that referenced this issue Feb 10, 2022
* Modify dotcmsReleaseVersion, coreWebReleaseVersion, webComponentsReleaseVersion and dot-cicd branch version

* Modify dotcmsReleaseVersion to 22.02, coreWebReleaseVersion, webComponentsReleaseVersion to rc and dot-cicd branch version to release-22.02

* Update branch in git submodule to release-22.02

* Enhance code freeze automated process

* #21522 change log to debug (#21523)

* #21434 the code was expecting always a String, but the value could be a number in case the type of the text is mark as a whole number or so, the toString handles both cases (#21541)

* fix: sort after clicking any available filter (#21542)

* #20774 If a user with restricted access views their folders in site browser first, they can see the Add Content button when they navigate to other folders (#21539)

* display add content button only when user has permission

* add spacing

* #21516 changes to file system metadata provider (#21517)

* #21516 changes to file system metadata provider

* fix test

Co-authored-by: nollymar <[email protected]>
Co-authored-by: Erick Gonzalez <[email protected]>

* Issue 20892 update docker clean (#21530)

* #20892 update docker clean

* #20892 getting APR to work

* #20892 newlines added

* #21454 cp log4j

* #21454 get custom starter loading

* #21454 get custom starter loading

* Printing out message when default starter is used

* Updating java-base version

* #20892 Fixing curl tests

* #20892 Fixing broken tests

* #20892 Fixing typo in felix load dir

* Setting DOT_CICD_BRANCH to release-22.02

Co-authored-by: Will Ezell <[email protected]>
Co-authored-by: nollymar <[email protected]>

* Updating java-base version

* #21565 removes osgi proxy servlet reference (#21566)

* #21565 removes osgi proxy servlet reference

* #21565 removing class, fixing red

* #21120 fix Time machine looks miss-aligned when you are using the system in spanish (#21563)

* #19236 UI: Scrolling error uploading multiple files

* #21223 Radio Button Field Transforms Values on Export

* #21204 [Site Copy] : Copying a Site randomly fails

* #21537 [Static Publishing] : Problem with multi-language contents

* #21445 support dotassets in graphql (#21573) (#21595)

* #21445 support dotassets in graphql

* #21445 Postman test

* #21298 We are not showing the progress bar when you are adding a new copied site

* Updating properties in docker compose examples (#21608)

* Updating properties

* Including a new example for PP

* Fixing conf error on PP example

Co-authored-by: nollymar <[email protected]>

* #21454 Copying server.xml to the right path (#21593)

* #21454 Copying server.xml to the right path

* #21454 server.xml was renamed

* #21454 Rolling back last change

* #21572 Adding support to gid=0

* #21454 Removing unused code

* #21572 Fixing group name

* Improving doc

Co-authored-by: nollymar <[email protected]>

* dotCMS/core #21436 Create a direct function to be called after angular confirmation dialog (#21615)

* #21122 Removing daisy diff (#21607)

* #21282 Identify which node a response comes from

* #21282 adding the metadata interceptor

* #21282 adding unit test

* #21282 feedback and unit test

* #21282 adding more doc

* #21282 unit test feedback

* #21282 adding the ability to show the node id by query string

* #21282 feedback done

* #21282 fixing unit test

* #21282 fixing a test again

* #21599 Push Publishing Workflow Dialog is missing timezoneId

* #21122 Replace What's changed with the htmldiff js library

* #21122 adding first draft of the page render versions

* #21122 moving to sequencial way to retrieve the pages

* #21122 refactoring

* #21122 adding unit test for diff

* #21122 adding unit test for diff

* #21122 adding unit test for diff

* test

* #21122 feedback done

Co-authored-by: Erick Gonzalez <[email protected]>

* #21446 IT failure ResetPasswordResourceIntegrationTest#test_resetPassword_success

* #21634 Unable to generate password for users (#21643)

* Issue 21488 support smtps in mailer class (#21625)

* #21488 Adding missing test in MainSuite

* #21488 Changing the method used to send emails. The previous one used smtp protocol by default

* #21488 Adding support for smtps

* #21488 The way we send messages was changed to support smtps

* #21488 Making code cleaner

* #21488 Sending the right host and port to get the connection

* #21488 Applying code review suggestions

Co-authored-by: nollymar <[email protected]>

* #21644  fix from email on recovery password (#21645)

* 20651 labels and docker fixes for Maintenance -> Tools (#21606)

* (#20651): Adding docker support for specific arch

* Adding feedback to refactor process execution to its own class. Docker images changes to include a postgres base files to be able to runa DB dump

* Allowing pg_dump only for postgres installations

* Allowing pg_dump only for postgres installations

* Removing unnecessary assertion

* #21557 SAML Warn message when you hit the dotCMS frontend

* #21628 Noisy log using edit mode

* #21628 change log to debug

* #21628 add property to turn off logging

* #21629 Bring Back do not update the content in edit mode.  (#21648)

* send save event when getting previous version

* set timeout

* #21656 Once a fileAsset is saved. the file itself can not be replaced using Save

* #21656 fix saved workflow image

* #21656 adding webp filter so images are smaller

* #21636 Generate a new starter for 22.02

* #21636 update starter

* #21636 update starter

* (#19967): updateing host inodes to new remote host identifier (#21680)

* revert master values

* Reinstate CMS_JAVA_OPTS (#21687)

* Reinstate CMS_JAVA_OPTS

We removed the `CMS_JAVA_OPTS` setting which was added last to the java options and allowed a user to override anything that was set before it.

* Removing CATALINA_OPTS from docker-compoe-examples

* Updating variable name used to configure smtp host

Co-authored-by: nollymar <[email protected]>

Co-authored-by: victoralfaro-dotcms <[email protected]>
Co-authored-by: Will Ezell <[email protected]>
Co-authored-by: Jonathan <[email protected]>
Co-authored-by: Rafael Velazco <[email protected]>
Co-authored-by: nollymar <[email protected]>
Co-authored-by: Nollymar Longa <[email protected]>
Co-authored-by: alfredo-dotcms <[email protected]>
Co-authored-by: Daniel Silva <[email protected]>
Co-authored-by: Humberto Morera <[email protected]>
Co-authored-by: Fabrizzio Araya <[email protected]>
@erickgonzalez erickgonzalez added LTS: Needs Backport Ticket that will be added to LTS and removed Next LTS Release labels Dec 16, 2022
@erickgonzalez erickgonzalez removed the LTS: Needs Backport Ticket that will be added to LTS label Apr 3, 2023