Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iso seg ops files #1109

Merged
merged 11 commits into from
Oct 27, 2023
Merged

Iso seg ops files #1109

merged 11 commits into from
Oct 27, 2023

Conversation

reneighbor
Copy link
Member

@reneighbor reneighbor commented Jul 31, 2023

Please take a moment to review the questions before submitting the PR

WHAT is this change about?

This PR is updating the ops files for isolation cells and isolation segment routers. Previously, the ops files had been out of date and non-functional with the current version of cf-deployment.

We also move the files out of test, as they are not just used for testing. Lastly we removed a the ops file for a bosh lite environment, as bosh-lite is now deprecated.

What customer problem is being addressed? Use customer persona to define the problem e.g. Alana is unable to...

  • A resource for creating isolation segments and isolation segment routers.
  • A resource for TAS developers to test functionality on isolation segments

Please provide any contextual information.

https://www.pivotaltracker.com/story/show/183738848

Has a cf-deployment including this change passed cf-acceptance-tests?

  • YES
  • NO n/a because these ops files are not currently used in CATs.

Does this PR introduce a breaking change? Please take a moment to read through the examples before answering the question.

  • YES - please choose the category from below. Feel free to provide additional details.
  • NO

How should this change be described in cf-deployment release notes?

Update ops files for creating isolation cells and isolation segment routers.

Does this PR introduce a new BOSH release into the base cf-deployment.yml manifest or any ops-files?

  • YES - please specify
  • NO

Does this PR make a change to an experimental or GA'd feature/component?

  • experimental feature/component
  • GA'd feature/component

Please provide Acceptance Criteria for this change?

Target a cf-deployment environment:

  • bosh deploy cf.yml -o operations/add-persistent-isolation-segment-diego-cell.yml -o operations/add-persistent-isolation-segment-router.yml
  • cf create-isolation-segment persistent_isolation_segment
  • cf create-org o && cf target -o o
  • cf create-space s && cf target -s s
  • cf enable-org-isolation o persistent_isolation_segment
  • cf set-org-default-isolation-segment o persistent_isolation_segment
  • cf push dora ~/workspace/cf-acceptance-tests/assets/dora

Verifying isolated cell:

  • cf app dora --> [see that dora is in the persistent_isolation_segment iso seg]

Verifying isolated router:

  • bosh vms --> copy iso seg router IP
  • bosh ssh iso-seg-router
  • curl 10.0.1.18 -H "Host: dora.shamrockgreen.cf-app.com --> [confirm you get a response, "Hi, I'm Dora!"]

What is the level of urgency for publishing this change?

  • Urgent - unblocks current or future work
  • Slightly Less than Urgent

Tag your pair, your PM, and/or team!

@tas-runtime-ecosystem

@ctlong
Copy link
Member

ctlong commented Aug 1, 2023

@reneighbor we only accept PRs to the develop branch except in special cases. Can you please update the base for this PR to develop or explain why this is a special case.

Also, it appears from the description that the desired end result was to update certain ops files and move them out of the operations/test folder, but the final result of this PR is just to remove certain ops files. Are some changes still missing? Or is the PR description out-of-date?

@reneighbor reneighbor changed the base branch from main to develop August 1, 2023 20:23
@reneighbor
Copy link
Member Author

@ctlong Wow, thanks for the fast turnaround!! I'd forgotten to commit the files in their new location. Thanks.

Copy link
Member

@ctlong ctlong left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reneighbor the ops files tests will need to be updated, in the units/tests directory. Let me know if you need help with that.

Also, I took a look and some of these ops files are being used in CF-D CI... Can you elaborate more on why they don't work at present? I'm curious because our pipelines haven't had any problems related to these ops files as far as I can tell.

@reneighbor
Copy link
Member Author

@ctlong Updated tests and docs:

bosh: null | cf:  (stratus)
~/workspace/cf-deployment/units |iso-seg-ops-files U:2?:3| rc
$ ./test --run TestStandard/add-persistent-isolation-segment-diego-cell.yml
Starting unit tests...
ok      github.com/cf-deployment/units/tests/addons_test        0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/backup_and_restore_test    0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/experimental_test  0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/iaas_test  0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/inline_test        0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/standard_test      22.290s
ok      github.com/cf-deployment/units/tests/test_test  0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/semantic_test      0.003s [no tests to run]

bosh: null | cf:  (stratus)
~/workspace/cf-deployment/units |iso-seg-ops-files U:2?:3| rc
$ ./test --run TestStandard/add-persistent-isolation-segment-router.yml
Starting unit tests...
ok      github.com/cf-deployment/units/tests/addons_test        0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/backup_and_restore_test    0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/experimental_test  0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/iaas_test  0.004s [no tests to run]
ok      github.com/cf-deployment/units/tests/inline_test        0.004s [no tests to run]
ok      github.com/cf-deployment/units/tests/standard_test      17.311s
ok      github.com/cf-deployment/units/tests/test_test  0.003s [no tests to run]
ok      github.com/cf-deployment/units/tests/semantic_test      0.003s [no tests to run]

According to the story, we don't have tests that deploy an iso seg in the CF-D CI (or in the Runtime team's CI), hence the ops files have been allowed to go out of date.

@ctlong ctlong self-assigned this Aug 21, 2023
@ctlong
Copy link
Member

ctlong commented Aug 21, 2023

@reneighbor unless I'm misunderstanding something, the story doesn't appear to be accurate when it says "CI for cf-d isn't ensuring that any of our jobs can compile/deploy under isolation".

✅ CF-D CI does apply the following ops files in the fresh-luna fanout:

  • operations/test/add-persistent-isolation-segment-diego-cell.yml
  • operations/test/use-cflinuxfs4-compat-isolation-segment-diego-cell.yml
  • operations/test/add-persistent-isolation-segment-router.yml
  • operations/rename-isolation-segment-network.yml

I can understand that these ops files may be out-of-date and should be updated. It doesn't seem appropriate to call them non-functional though, since they're not failing deploys in CI.

@ctlong
Copy link
Member

ctlong commented Aug 21, 2023

Also CATs does include tests for isolation segments, so can you please validate that a CF-Deployment with your updated ops files applied does pass CATs with the following configuration options set:

  • "include_isolation_segments": true
  • "include_routing_isolation_segments": true

⚡ It would be awesome if y'all could write a new CATs test in one of those suites that fails with the current iteration of these ops files (in CF-D develop) so that we can catch regressions in these ops files in the future. Not gonna block this PR on that though.

@ctlong ctlong assigned reneighbor and unassigned ctlong Aug 21, 2023
@reneighbor
Copy link
Member Author

Hi @ctlong !

We discovered that the pre-existing isolation segment ops files don't install silk-cni but instead use a pre-silk networking layer. To prove this, we ran a test that pushes 2 apps on an isolation segment and attempts to run c2c between them.

With old ops files:

  [2023-09-25 22:56:03.01 (UTC)]> curl -k -H Expect: -s http://client-app.iso-seg.lightmediumorchid.cf-app.com/curl/10.254.0.10/8080
  {"stdout":"","stderr":"*   Trying 10.254.0.10:8080...\n  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0
 --:--:-- --:--:-- --:--:--     0* connect to 10.254.0.10 port 8080 failed: Connection refused\n* Failed to connect to 10.254.0.10 port 8080 after 0 ms: Connection refused\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n* Closing
 connection 0\ncurl: (7) Failed to connect to 10.254.0.10 port 8080 after 0 ms: Connection refused\n","return_code":7}{"stdout":"","stderr":"*   Trying 10.254.0.10:8080...\n  %!T(MISSING)otal    %!R(MISSING)eceived %!X(MISSING)ferd  Average Speed   Time    Time
  Time  Current\n                                 Dload  Upload   Total   Spent    Left  Speed\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* connect to 10.254.0.10 port 8080 failed: Connection refused\n* Failed to connect to 10
.254.0.10 port 8080 after 0 ms: Connection refused\n\r  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0\n* Closing connection 0\ncurl: (7) Failed to connect to 10.254.0.10 port 8080 after 0 ms: Connection refused\n","return_code":7}{ *
   Trying 10.254.0.10:8080...
  %!T(MISSING)otal    %!R(MISSING)eceived %!X(MISSING)ferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0* connect to 10.254.0.10 port 8080 failed: Connection refused
* Failed to connect to 10.254.0.10 port 8080 after 0 ms: Connection refused
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
* Closing connection 0
curl: (7) Failed to connect to 10.254.0.10 port 8080 after 0 ms: Connection refused
 7}  [FAILED] in [It] - /home/pivotal/workspace/cf-acceptance-tests/isolation_segments/isolation_segments.go:299 @ 09/25/23 22:56:04.02

With PR'd ops files, the test passes.

@ctlong ctlong assigned ctlong and unassigned reneighbor Sep 27, 2023
@ctlong
Copy link
Member

ctlong commented Oct 20, 2023

@reneighbor this mostly lgtm then. Thank you for responding to my questions.

The last thing I'm wondering is if the ops files should be moved out of operations/test... This section of the README explicitly calls out that some ops files were placed in that directory not because they're meant for testing, but rather as an example because the change they're trying to capture is not easily generalizable. What do you think?

@ctlong ctlong assigned reneighbor and unassigned ctlong Oct 20, 2023
@reneighbor
Copy link
Member Author

@ctlong Thanks a lot, and for real, thanks for being super careful with PR reviews.

Re moving the ops files outside of test. I hadn't seen the README. Hm, it seems like the main reason given for why the ops file is not generalizable is that it's not repeatable (I believe because the names and tags will be duplicated).

If that's the case, I think customers are smart enough to use the ops files as an example; for example, the windows2019-cell ops file is one that couldn't be repeatable, but it's outside of test.

It's a good reminder to remove that example from the README though, which I'll do; feel free LMK if you think the files should stay in test.

ctlong
ctlong previously approved these changes Oct 27, 2023
@ctlong ctlong assigned ctlong and unassigned reneighbor Oct 27, 2023
@ard-wg-gitbot
Copy link
Contributor

Hello friend, it looks like your pull request has failed one or more of our checks. Please take a look! 👀

ctlong
ctlong previously approved these changes Oct 27, 2023
@ard-wg-gitbot
Copy link
Contributor

Hello friend, it looks like your pull request has failed one or more of our checks. Please take a look! 👀

@ctlong ctlong merged commit 902b8b6 into cloudfoundry:develop Oct 27, 2023
moleske added a commit to cloudfoundry/capi-ci that referenced this pull request Oct 31, 2023
moleske added a commit to cloudfoundry/capi-ci that referenced this pull request Oct 31, 2023
moleske added a commit to cloudfoundry/cli that referenced this pull request Oct 31, 2023
moleske added a commit to cloudfoundry/cli that referenced this pull request Oct 31, 2023
moleske added a commit to cloudfoundry/cli that referenced this pull request Oct 31, 2023
ccjaimes pushed a commit to cloudfoundry/cli that referenced this pull request Oct 31, 2023
ccjaimes pushed a commit to cloudfoundry/cli that referenced this pull request Oct 31, 2023
ccjaimes pushed a commit to cloudfoundry/cli that referenced this pull request Oct 31, 2023
* Update path for add-persistent-isolation-segment-diego-cell.yml

- cloudfoundry/cf-deployment#1109 changed the path of the file to no longer include test

* Put back old test/add-persistent-isolation-segment-diego-cell.yml path

- older cf-d's still use it
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants