[WIP] Appliance systemd overhaul #1266

Closed
wants to merge 13 commits into from

@ghost commented Dec 18, 2017

Fixes #1215, fixes #1265, fixes #1267, and improves the reliability of our systemd configuration.

This change introduces a new systemd target, psc-ready.target, which component services that rely on PSC tokens can use as a Requires dependency.

A new service, vic-appliance-wait-psc-config.service, has been introduced. It mimics systemd's network wait-online service: the waiting unit only becomes active once a PSC configuration is present. This replaces the path units used in the VIC 1.3 OVA.

The changes also introduce another target, vic-appliance.target, which serves as the boot-level target, following systemd best practice.

@andrewtchin
Contributor

After rebooting, I got this from get_token:

Dec 18 21:27:42 Photon bash[1754]: [main] INFO com.vmware.vim.sso.client.impl.SecurityTokenServiceImpl - Successfully acquired token for user: {Name: engine-b8df7699-36fd-4d70-86e1-acfd59b9eeec, Domain: vsphere.local}
Dec 18 21:27:42 Photon bash[1754]: [13][I][2017-12-18T21:27:42.693Z][1][SsoPscCommand][getToken][Done! Token stored in '/etc/vmware/psc/engine/tokens.properties'.]
Dec 18 21:27:42 Photon bash[1754]: [0][I][2017-12-18T21:27:42.961Z][1][SsoPscCommand][getToken][Initializing...]
Dec 18 21:27:42 Photon bash[1754]: [1][I][2017-12-18T21:27:42.998Z][1][PscSettings][validateConfig][PSC configuration:]
Dec 18 21:27:42 Photon bash[1754]: [2][I][2017-12-18T21:27:42.998Z][1][PscSettings][validateConfig][- client: admiral]
Dec 18 21:27:42 Photon bash[1754]: [3][I][2017-12-18T21:27:42.998Z][1][PscSettings][validateConfig][- domainController: sc-rdops-vm09-dhcp-21-232.eng.vmware.com]
Dec 18 21:27:42 Photon bash[1754]: [4][I][2017-12-18T21:27:42.998Z][1][PscSettings][validateConfig][- domainControllerPort: 443]
Dec 18 21:27:42 Photon bash[1754]: [5][I][2017-12-18T21:27:42.998Z][1][PscSettings][validateConfig][- keystoreFile: /etc/vmware/psc/admiral/psc-config.keystore]
Dec 18 21:27:42 Photon bash[1754]: [6][I][2017-12-18T21:27:42.999Z][1][PscSettings][validateConfig][- resourceServer: rs_admiral]
Dec 18 21:27:42 Photon bash[1754]: [7][I][2017-12-18T21:27:42.999Z][1][PscSettings][validateConfig][- solutionUser: admiral-3991add5-c965-4ccb-80d4-29f2e84edec4]
Dec 18 21:27:42 Photon bash[1754]: [8][I][2017-12-18T21:27:42.999Z][1][PscSettings][validateConfig][- tenant: vsphere.local]
Dec 18 21:27:43 Photon bash[1754]: [9][I][2017-12-18T21:27:42.999Z][1][PscSettings][validateConfig][- version: 6.0]
Dec 18 21:27:43 Photon bash[1754]: [10][I][2017-12-18T21:27:43.130Z][1][SsoPscCommand][getToken][Getting SSO cert...]
Dec 18 21:27:43 Photon bash[1754]: [11][I][2017-12-18T21:27:43.290Z][1][SsoPscCommand][getToken][Building STSAdapter...]
Dec 18 21:27:43 Photon bash[1754]: [12][I][2017-12-18T21:27:43.795Z][1][SsoPscCommand][getToken][Getting SAML token...]
Dec 18 21:27:44 Photon bash[1754]: [main] WARN com.vmware.vim.sso.client.impl.SoapBindingImpl - Could not load VECS keystore: java.security.KeyStoreException: VKS not found
Dec 18 21:27:44 Photon bash[1754]: [main] INFO com.vmware.identity.token.impl.Util - Reading resources from zip file path=[/etc/vmware/admiral/admiral-auth-psc-1.2.0-SNAPSHOT-command.jar]
Dec 18 21:27:44 Photon bash[1754]: [main] INFO com.vmware.identity.token.impl.Util - Reading resources from decoded zip file path=[/etc/vmware/admiral/admiral-auth-psc-1.2.0-SNAPSHOT-command.jar]
Dec 18 21:27:45 Photon bash[1754]: [main] WARN com.vmware.vim.sso.client.impl.SiteAffinityServiceDiscovery - CDC not configured java.lang.NoClassDefFoundError: com/vmware/identity/cdc/CdcFactory
Dec 18 21:27:45 Photon bash[1754]: [main] INFO com.vmware.identity.token.impl.Util - Reading resources from zip file path=[/etc/vmware/admiral/admiral-auth-psc-1.2.0-SNAPSHOT-command.jar]
Dec 18 21:27:45 Photon bash[1754]: [main] INFO com.vmware.identity.token.impl.Util - Reading resources from decoded zip file path=[/etc/vmware/admiral/admiral-auth-psc-1.2.0-SNAPSHOT-command.jar]
Dec 18 21:27:45 Photon bash[1754]: [main] INFO com.vmware.identity.token.impl.SamlTokenImpl - SAML token for SubjectNameId [[email protected], format=http://schemas.xmlsoap.org/claims/UPN] successfully parsed from Element
Dec 18 21:27:45 Photon bash[1754]: [main] INFO com.vmware.vim.sso.client.impl.SecurityTokenServiceImpl - Successfully acquired token for user: {Name: admiral-3991add5-c965-4ccb-80d4-29f2e84edec4, Domain: vsphere.local}
Dec 18 21:27:45 Photon bash[1754]: [13][I][2017-12-18T21:27:45.670Z][1][SsoPscCommand][getToken][Done! Token stored in '/etc/vmware/psc/admiral/tokens.properties'.]
Dec 18 21:27:45 Photon systemd[1]: Started PSC Get Token.
-- Reboot --
Dec 18 21:32:54 Photon systemd[1]: Starting PSC Get Token...
Dec 18 21:32:56 Photon bash[223]: [0][I][2017-12-18T21:32:56.797Z][1][SsoPscCommand][getToken][Initializing...]
Dec 18 21:32:57 Photon bash[223]: [1][I][2017-12-18T21:32:57.005Z][1][PscSettings][validateConfig][PSC configuration:]
Dec 18 21:32:57 Photon bash[223]: [2][I][2017-12-18T21:32:57.006Z][1][PscSettings][validateConfig][- client: harbor]
Dec 18 21:32:57 Photon bash[223]: [3][I][2017-12-18T21:32:57.006Z][1][PscSettings][validateConfig][- domainController: sc-rdops-vm09-dhcp-21-232.eng.vmware.com]
Dec 18 21:32:57 Photon bash[223]: [4][I][2017-12-18T21:32:57.006Z][1][PscSettings][validateConfig][- domainControllerPort: 443]
Dec 18 21:32:57 Photon bash[223]: [5][I][2017-12-18T21:32:57.007Z][1][PscSettings][validateConfig][- keystoreFile: /etc/vmware/psc/harbor/psc-config.keystore]
Dec 18 21:32:57 Photon bash[223]: [6][I][2017-12-18T21:32:57.007Z][1][PscSettings][validateConfig][- resourceServer: rs_admiral]
Dec 18 21:32:57 Photon bash[223]: [7][I][2017-12-18T21:32:57.008Z][1][PscSettings][validateConfig][- solutionUser: harbor-a9c5575b-1ba2-4708-bd45-15d1384b9da9]
Dec 18 21:32:57 Photon bash[223]: [8][I][2017-12-18T21:32:57.008Z][1][PscSettings][validateConfig][- tenant: vsphere.local]
Dec 18 21:32:57 Photon bash[223]: [9][I][2017-12-18T21:32:57.008Z][1][PscSettings][validateConfig][- version: 6.0]
Dec 18 21:32:57 Photon bash[223]: [10][I][2017-12-18T21:32:57.657Z][1][SsoPscCommand][getToken][Getting SSO cert...]
Dec 18 21:33:51 Photon bash[223]: Exception in thread "main" java.net.UnknownHostException: sc-rdops-vm09-dhcp-21-232.eng.vmware.com
Dec 18 21:33:51 Photon bash[223]: at java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:184)
Dec 18 21:33:51 Photon bash[223]: at java.net.SocksSocketImpl.connect(SocksSocketImpl.java:392)
Dec 18 21:33:51 Photon bash[223]: at java.net.Socket.connect(Socket.java:589)
Dec 18 21:33:51 Photon bash[223]: at sun.security.ssl.SSLSocketImpl.connect(SSLSocketImpl.java:673)
Dec 18 21:33:51 Photon bash[223]: at com.vmware.admiral.auth.idm.psc.saml.sso.platform.security.SslCertificateResolver.connect(SslCertificateResolver.java:96)
Dec 18 21:33:51 Photon bash[223]: at com.vmware.admiral.auth.idm.psc.saml.util.SsoPscCommand.getSsoCertificate(SsoPscCommand.java:190)
Dec 18 21:33:51 Photon bash[223]: at com.vmware.admiral.auth.idm.psc.saml.util.SsoPscCommand.getToken(SsoPscCommand.java:158)
Dec 18 21:33:51 Photon bash[223]: at com.vmware.admiral.auth.idm.psc.util.PscCommand.execute(PscCommand.java:39)
Dec 18 21:33:51 Photon bash[223]: at com.vmware.admiral.auth.idm.psc.util.PscCommand.main(PscCommand.java:47)
Dec 18 21:33:51 Photon systemd[1]: get_token.service: Main process exited, code=exited, status=1/FAILURE
Dec 18 21:33:51 Photon systemd[1]: Failed to start PSC Get Token.
Dec 18 21:33:51 Photon systemd[1]: get_token.service: Unit entered failed state.
Dec 18 21:33:51 Photon systemd[1]: get_token.service: Failed with result 'exit-code'.

This caused Admiral and Harbor to fail to start.
I tried to reinitialize from the Getting Started page, but I think that might not regenerate the tokens:

Dec 18 21:36:27 Photon start_fileserver.sh[861]: time="2017-12-18T21:36:27Z" level=info msg="Skipping registering harbor with PSC since PSC config file is present"
Dec 18 21:36:27 Photon start_fileserver.sh[861]: time="2017-12-18T21:36:27Z" level=info msg="Skipping registering engine with PSC since PSC config file is present"
Dec 18 21:36:27 Photon start_fileserver.sh[861]: time="2017-12-18T21:36:27Z" level=info msg="Skipping registering admiral with PSC since PSC config file is present"

I ran systemctl restart get_token and it succeeded (can we add a "done" log message to the end of that script?).
Then I restarted admiral and harbor, and they seem to be fine.

So I think reinit needs to refresh the tokens and restart all dependent services.

@@ -1,11 +1,12 @@
[Unit]
Description=PSC Get Token
Documentation=http://github.com/vmware/vic-product/installer
Before=harbor_startup.service
Requires=vic-appliance-wait-psc-config.service
After=vic-vic-appliance-wait-psc-config.service
Contributor

typo

Author

Fixed.

[Unit]
Description=Wait for PSC token to be present.
Documentation=https://github.com/vmware/vic-product
Before=get_token.service
Contributor

Is anything upstream of this ordered After=network-online.target? I think the get_token failure I reported earlier comes from the network not being available to look up the PSC hostname.

Author

Ah, great point. Actually, it seems we should wait here for vic-appliance-ready.service, which would guarantee network connectivity.

@ghost requested a review from mdharamadas1 December 19, 2017 20:11
Contributor

@mdharamadas1 mdharamadas1 left a comment

Test changes look good!

*** Test Cases ***
Verify OVA services
Contributor

@morris-jason Just realized that you need to update 1-01-Install.md by removing the corresponding test steps.

Author

Done, thanks.

Documentation=http://github.com/vmware/vic-product/installer

[Path]
PathModified=/etc/vmware/psc/admiral/psc-config.properties
Contributor

Should this also watch the paths for harbor and engine? Unless that might trigger this multiple times.

@@ -89,11 +87,6 @@ func registerWithPSC(ctx context.Context) error {
// Register all VIC components with PSC
cmdName := "/usr/bin/java"
for _, client := range []string{"harbor", "engine", "admiral"} {
pscConfFile := filepath.Join(pscConfDir, client, pscConfFileName)
Contributor

When this is removed, what happens when the user clicks the Re-initialize the VIC appliance button? get_token.service would get activated; however, I don't see where we restart Admiral and Harbor. IIRC, at least Admiral needs to be restarted after re-registering with PSC; otherwise, the Admiral URL shows a blank/error page.

Contributor

If possible, I think the reinit button should refresh tokens and restart all services (maybe stopping the services before the refresh).

Contributor

Confirmed that reinit currently runs get_token but doesn't restart services.

Author

@ghost Dec 20, 2017

@anchal-agrawal @andrewtchin I've added a line to the get_token service to restart harbor and admiral after get_token.sh executes.

@@ -16,7 +16,6 @@
Documentation Test 3-01 Admiral UI
Resource ../../resources/Util.robot
Test Timeout 20 minutes
Test Setup Run Keyword Setup Base State
Contributor

Is this change expected, i.e., is SSO login no longer required?

Author

I was just testing changes. That one didn't help :)

Contributor

@anchal-agrawal anchal-agrawal left a comment

The change to restart Admiral and Harbor every hour is a big behavioral change and needs discussion. While we have to restart the services every time we register with PSC, we don't need to every time we run get_token. It's also not ideal to restart the services every hour, since users may be actively using them at the time.

[Service]
Type=oneshot
ExecStart=/usr/bin/bash /etc/vmware/psc/get_token.sh
ExecStartPost=/usr/bin/systemctl --no-block restart admiral harbor
Contributor

The get_token service runs every hour and this would restart Admiral and Harbor every time.

Contributor

If the services don't need to be restarted when the token is refreshed, maybe we can have a registration unit, separate from get_token, that restarts the services after registration?

Author

@andrewtchin That was my intent; I'll move this out, since it was only for testing. I'll mark this WIP and ping @anchal-agrawal when it's ready for final review. Sorry for the confusion.

@ghost changed the title from "Appliance systemd overhaul" to "[WIP] Appliance systemd overhaul" Jan 8, 2018
jsonmorris and others added 11 commits January 9, 2018 11:55
Minor unit name fixes

Revert bad changes to component services
Update test markdown

Fix bad test variable name

Fix broken keyword

Add extra time for selenium tests

Bump wait time some more

Sleep before adding default users
Dont block on harbor/admiral restart

Final fixes for startup units
Jason Morris added 2 commits January 9, 2018 12:22
@ghost closed this Jan 9, 2018
Author

ghost commented Jan 9, 2018

Closing this and opening #1266; hopefully Drone starts behaving.

@ghost mentioned this pull request Jan 9, 2018
This pull request was closed.
5 participants