support sonic-mgmt multi-dut gen-mg for LC #3419

Closed
saravanansv wants to merge 184 commits from master-chassistestlc

@saravanansv saravanansv commented Apr 29, 2021

  1. Support adding inband data to the LC's minigraph.
    The VoqInbandInterfaces fields flow from testbed.yaml (fields voq_inband_intf, voq_inband_type, voq_inband_ip) to the lab/veos inventory files and then to the minigraph.

  2. System ports
    Data flows from:
    a. port_config.ini (new fields required: numVoq, coreId, corePortId; existing fields: name, speed)
    b. switchId = running asic_id count across all linecards in the chassis
    c. systemPortId = running system-port count across all linecards in the chassis
    Each DUT adds its own system ports to its own ansible_facts.
    config_sonic_basedon_testbed.yml loops through the system ports of all DUTs to create all_sysports.

  3. DeviceProperty
    <a:DeviceProperty>
      <a:Name>SwitchType</a:Name>
      <a:Reference i:nil="true"/>
      <a:Value>voq</a:Value>
    </a:DeviceProperty>
    <a:DeviceProperty>
      <a:Name>MaxCores</a:Name>
      <a:Reference i:nil="true"/>
      <a:Value>16</a:Value>
    </a:DeviceProperty>
    <a:DeviceProperty>
      <a:Name>SwitchId</a:Name>
      <a:Reference i:nil="true"/>
      <a:Value>0</a:Value>
    </a:DeviceProperty>

a. SwitchType and MaxCores flow directly from testbed.yaml to the inventory files and then to the minigraph.
b. start_switchid is calculated for each linecard by TestbedProcessing.py, based on the num_asics of the preceding linecards, and written into the inventory files.
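
The running counts described in items 2 and 3 can be sketched roughly as follows. This is only an illustration, not the code in this PR: the function names, the input shapes, and the choice to start both counters at 0 are assumptions.

```python
def assign_start_switch_ids(linecards):
    """Return {linecard name: start_switchid}.

    `linecards` is a list of (name, num_asics) tuples in chassis order.
    Each linecard starts where the ASICs of the previous linecards end.
    """
    start_ids = {}
    next_switch_id = 0  # assumption: numbering starts at 0
    for name, num_asics in linecards:
        start_ids[name] = next_switch_id
        next_switch_id += num_asics
    return start_ids


def build_all_sysports(linecards, ports_by_asic):
    """Flatten per-ASIC port lists into one chassis-wide system-port list.

    `ports_by_asic` maps a linecard name to a list holding one port-name
    list per ASIC (a hypothetical shape chosen for this sketch).
    """
    start_ids = assign_start_switch_ids(linecards)
    all_sysports = []
    for name, num_asics in linecards:
        for asic_index in range(num_asics):
            for port in ports_by_asic[name][asic_index]:
                all_sysports.append({
                    "name": "{}|asic{}|{}".format(name, asic_index, port),
                    "switch_id": start_ids[name] + asic_index,
                    # running count across every linecard in the chassis
                    "system_port_id": len(all_sysports),
                })
    return all_sysports
```

With two linecards of 2 and 1 ASICs, the second linecard would get start_switchid 2 under these assumptions.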

Description of PR

Summary:
Fixes # (issue)

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Approach

What is the motivation for this PR?

How did you do it?

How did you verify/test it?

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

@saravanansv saravanansv requested a review from a team as a code owner April 29, 2021 02:52
@lgtm-com

lgtm-com bot commented Apr 29, 2021

This pull request introduces 12 alerts when merging 2ad8b8c3b806ed2705e14e38648cb0a4fdb79338 into 7a30b8d - view on LGTM.com

new alerts:

  • 11 for Except block handles 'BaseException'
  • 1 for Unused local variable

@@ -48,7 +48,8 @@
 ### Here are the expectation of files of device port_config.ini located, in case changed please modify it here
 FILE_PATH = '/usr/share/sonic/device'
 PORTMAP_FILE = 'port_config.ini'
-ALLOWED_HEADER = ['name', 'lanes', 'alias', 'index', 'asic_port_name', 'role', 'speed']
+ALLOWED_HEADER = ['name', 'lanes', 'alias', 'index', 'asic_port_name', 'role', 'speed',
+                  'coreid', 'coreportid', 'numvoq']
Contributor

Adding the extra fields 'coreid', 'coreportid', and 'numvoq' to port_config.ini will require updating port_config.ini for all platforms across all vendors. Has this been discussed and approved in the SONiC chassis subgroup?

Would sonic-buildimage be compatible with these added fields in port_config.ini?

Contributor Author

When I emailed the SONiC chassis subgroup, there was no opposition.

asic_port_name is used only in the multi-asic-vs port_config.ini file
(e.g. device/virtual/x86_64-kvm_x86_64-r0/msft_multi_asic_vs/2/port_config.ini). I wanted to follow similar logic and add coreid, coreportid, and numvoq only to the port_config.ini of the applicable VoQ SKUs; they shouldn't be required in all SKUs.

Contributor

As long as compatibility is handled in sonic-buildimage - it is good.
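
One way to keep the new columns optional, as discussed in this thread, is to accept them in the header without requiring them. The sketch below is a hypothetical parser, not the port_alias.py implementation; it assumes the header is a `#` comment line and that column names are matched case-insensitively.

```python
ALLOWED_HEADER = ['name', 'lanes', 'alias', 'index', 'asic_port_name', 'role', 'speed',
                  'coreid', 'coreportid', 'numvoq']
VOQ_COLUMNS = ('coreid', 'coreportid', 'numvoq')


def parse_port_config(text):
    """Parse port_config.ini content into a list of {column: value} dicts."""
    header = None
    rows = []
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.startswith('#'):
            titles = [t.lower() for t in line.lstrip('#').split()]
            # the first comment line made up only of known column names is the header
            if header is None and titles and all(t in ALLOWED_HEADER for t in titles):
                header = titles
            continue
        if header is None:
            raise ValueError("port entry found before a recognised header line")
        rows.append(dict(zip(header, line.split())))
    return rows


def has_voq_columns(row):
    """True only when every VoQ column is present for this port."""
    return all(column in row for column in VOQ_COLUMNS)
```

A SKU whose port_config.ini lacks the three columns parses as before; only VoQ SKUs would produce rows for which `has_voq_columns` is true.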

try:  # get num_asics
    num_asic = dev.get("num_asic")
    if num_asic is not None:
        entry += "\tnum_asic=" + str(num_asic)
Contributor

I believe the variable name in the inventory/lab file should be 'num_asics', and not 'num_asic'

Contributor Author

The variable num_asics is already used in config_sonic_basedon_testbed.yml.
I can rename the new num_asic to num_asics if my diffs do not conflict with existing assumptions about num_asics.

Contributor Author

@SuvarnaMeenakshi should there be both num_asic and num_asics?
Most places use {{ num_asic }} and one place uses {{ num_asics }}.
Why do we need both? Should there be just one?

ansible/config_sonic_basedon_testbed.yml: port_alias: hwsku="{{ hwsku }}" num_asic="{{ num_asics }}"
ansible/roles/vm_set/tasks/kickstart_vm.yml: num_asic: "{{ hostvars[vm_name]['num_asics'] | default(1) }}"
ansible/roles/vm_set/tasks/kickstart_vm.yml: num_asic={{ num_asic }}
ansible/roles/vm_set/tasks/kickstart_vm.yml: num_asic={{ num_asic }}
ansible/roles/vm_set/tasks/start_sonic_vm.yml: num_asic: "{{ hostvars[dut_name]['num_asics'] | default(1) }}"
ansible/roles/vm_set/tasks/start_sonic_vm.yml: port_alias: hwsku={{ hostvars[dut_name].hwsku }} num_asic={{ num_asic }}
ansible/roles/vm_set/tasks/start_sonic_vm.yml: num_asic={{ num_asic }}
ansible/library/port_alias.py: port_alias: hwsku='ACS-MSN2700' num_asic=1
ansible/library/port_alias.py: num_asic=dict(type='int', required=False)

Contributor

The "num_asics" variable is defined in ansible/veos_vtb and ansible/lab.
https://github.com/Azure/sonic-mgmt/blob/master/ansible/veos_vtb#L133
When we run "gen-mg", we take the number of ASICs from the inventory file instead of getting it from the device.

Contributor Author

ansible/roles/vm_set/tasks/kickstart_vm.yml uses {{ num_asic }} and not {{ num_asics }}.
Where is num_asic coming from?

Contributor

https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/vm_set/tasks/kickstart_vm.yml#L53 - Here a variable "num_asic" is created, which takes its value from "num_asics" in the inventory file.

num_asic: "{{ hostvars[vm_name]['num_asics'] | default(1) }}"

The variable num_asic is referenced everywhere else in the file.
Ex: https://github.com/Azure/sonic-mgmt/blob/master/ansible/roles/vm_set/tasks/kickstart_vm.yml#L63 - here the num_asic variable defined on line 53 above is used.
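
For readers less familiar with Jinja2, the lookup on line 53 behaves roughly like the plain-Python sketch below: the inventory key is `num_asics`, and the local name `num_asic` falls back to 1 when the key is absent. The function name and the host names in the usage note are hypothetical.

```python
def resolve_num_asic(hostvars, host):
    """Approximate `hostvars[host]['num_asics'] | default(1)` for plain dicts."""
    # the inventory key is plural; the value returned here is what the
    # task file stores under the singular local name `num_asic`
    return int(hostvars[host].get("num_asics", 1))
```

For example, `resolve_num_asic({"vm-a": {}, "vm-b": {"num_asics": 4}}, "vm-a")` gives 1, while the same call for `"vm-b"` gives 4.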

Contributor Author

num_asics and num_asic are very similar, which leads to confusion.
Should num_asic be renamed to num_asic_vm in kickstart_vm.yml and num_asic_dut in start_sonic_vm.yml, or should we just use num_asics everywhere?

Contributor

We could use num_asics everywhere if that makes it easier, as the variable is used locally.

Contributor Author

I can rename all the num_asic occurrences to num_asics as part of #3245, or should that be done in a separate PR?

Contributor

@SuvarnaMeenakshi SuvarnaMeenakshi May 12, 2021

I suggest a separate PR if #3245 is not impacted apart from the config_sonic_basedon_testbed.yml change; otherwise you could use the same PR.

@saravanansv saravanansv force-pushed the master-chassistestlc branch from 2ad8b8c to 79dc080 Compare April 30, 2021 20:09
@lgtm-com

lgtm-com bot commented May 1, 2021

This pull request introduces 12 alerts when merging 79dc080336c17a2bcf7188a88ab79b7be3ab1053 into b1f3518 - view on LGTM.com

new alerts:

  • 12 for Except block handles 'BaseException'

@anshuv-mfst

Hi @yxieca - could you please assign MSFT reviewers, thanks.

@yxieca yxieca requested review from arlakshm and abdosi May 5, 2021 16:34
@yxieca
Collaborator

yxieca commented May 5, 2021

@arlakshm, @abdosi please review. Thanks!

@saravanansv saravanansv force-pushed the master-chassistestlc branch from 79dc080 to c14a2e5 Compare May 6, 2021 00:12
@lgtm-com

lgtm-com bot commented May 6, 2021

This pull request introduces 12 alerts when merging ba220a0b614e0b6ea09a31ef66fc17373c81d967 into 541056c - view on LGTM.com

new alerts:

  • 12 for Except block handles 'BaseException'

1. Support adding inband data to the LC's minigraph.
The VoqInbandInterfaces fields flow from testbed.yaml (fields voq_inband_intf, voq_inband_type, voq_inband_ip) to the lab/veos inventory files and then to the minigraph.

2. System ports

Data flows from:
a. port_config.ini (new fields required: numVoq, coreId, corePortId; existing fields: name, speed)
b. switchId = running asic_id count across all linecards in the chassis
c. systemPortId = running system-port count across all linecards in the chassis
Each DUT adds its own system ports to its own ansible_facts.
config_sonic_basedon_testbed.yml loops through the system ports of all DUTs to create all_sysports.
d. Added changes to port_alias to generate system ports for Recirc ports as well.

3. DeviceProperty
   <a:DeviceProperty>
     <a:Name>SwitchType</a:Name>
     <a:Reference i:nil="true"/>
     <a:Value>voq</a:Value>
   </a:DeviceProperty>
   <a:DeviceProperty>
     <a:Name>MaxCores</a:Name>
     <a:Reference i:nil="true"/>
     <a:Value>16</a:Value>
   </a:DeviceProperty>
   <a:DeviceProperty>
     <a:Name>SwitchId</a:Name>
     <a:Reference i:nil="true"/>
     <a:Value>0</a:Value>
   </a:DeviceProperty>

a. SwitchType and MaxCores flow directly from testbed.yaml to the inventory files and then to the minigraph.
b. start_switchid is calculated for each linecard by TestbedProcessing.py, based on the num_asics of the preceding linecards, and written into the inventory files.
@saravanansv saravanansv force-pushed the master-chassistestlc branch from ba220a0 to d9de985 Compare May 6, 2021 02:13
@lgtm-com

lgtm-com bot commented May 6, 2021

This pull request introduces 13 alerts and fixes 5 when merging 2e20093f1420bf6dea1f93844ff012d3a6ba3781 into 66bfe6b - view on LGTM.com

new alerts:

  • 12 for Except block handles 'BaseException'
  • 1 for Syntax error

fixed alerts:

  • 5 for Unused import

bingwang-ms and others added 11 commits May 6, 2021 12:05
…onic-net#3198)

This PR implements mux_cable flap counter in mux_simulator_server.
The flap counter can be retrieved and cleared by HTTP request.

Signed-off-by: bingwang <[email protected]>
…net#3074)

* Enhance dynamic buffer calculation test

Signed-off-by: Stephen Sun <[email protected]>

What is the motivation for this PR?
* Ensure the buffer pool size is recovered after all the configurations have been reverted
      - The buffer pool size is verified based on the value fetched from the switch at the beginning of the test. Afterwards, it will calculate the new buffer pool size according to the configuration change.
      - The configurations changed during the test will be restored at the end of the test and the switch will recalculate the buffer pool size according to the restored value. However, it can take a few seconds for buffer pool size to be recovered in rare conditions.
      - If the next test starts immediately after the previous one finishes, the buffer pool size may not be fully recovered, so the next test verifies the buffer pool size against a wrong value and fails.
* Adjust the buffer pool size calculation while toggling the shared headroom pool and there are 400G ports on the system.
* Add test case for removing PGs when a port is administratively down.
* Bug fixes:
     - in check_buffer_profiles_for_shp: the check passes only if all lossless profiles satisfy the condition
     - in QoS test, the shared headroom pool size should be fetched from APPL_DB or CONFIG_DB according to whether buffer information is in APPL_DB or CONFIG_DB

How did you do it?
* Ensure the buffer pool size is recovered after all the configurations have been reverted
      - Change the scope of the methods _check_pool_size and _get_pool_size_from_asic_db from check_pool_size to global so that they can be called by other methods.
      - Check the buffer pool size after all configuration changes have been restored at the end of each test.
* Adjust the buffer pool size calculation while toggling the shared headroom pool and there are 400G ports on the system.
      - Introduce a new method _fetch_size_difference_for_400g_ports to calculate the difference in buffer pool sizes between the modes with/without shared headroom pool enabled.
      - Take the difference into consideration when calculating the buffer pool size during toggling the shared headroom pool
* Add test case for removing PGs when a port is administratively down.
      - Introduce a new method test_port_admin_down for it.

How did you verify/test it?
Run the regression test.
Description of PR
Remove unused template file which is causing merge issues from windows
because of case insensitive duplicate file name presence

Approach
What is the motivation for this PR?
cleanup

How did you do it?
remove unused file

How did you verify/test it?
clone, find and grep

Co-authored-by: Rama Sasthri, Kristipati <[email protected]>
* This PR updates 'get_crm_facts' function.

The command 'crm show resource all' is not available for some time after config reload.
This PR adds a retry logic for handling this scenario.

Signed-off-by: bingwang <[email protected]>
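
The retry logic mentioned in this commit for `crm show resource all` can be pictured with a generic helper like the one below. It is a sketch under assumptions: the helper name, its parameters, and the fixed delay are illustrative and are not taken from `get_crm_facts`.

```python
import time


def call_with_retry(func, attempts=3, delay=1.0, sleep=time.sleep):
    """Call `func` until it succeeds or `attempts` calls have failed."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return func()
        except Exception as error:  # the command may not be available yet
            last_error = error
            if attempt < attempts - 1:
                sleep(delay)
    raise last_error
```

Passing `sleep` as a parameter keeps the helper easy to exercise in unit tests without real waiting.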
…-net#3111)

* Implement server -> standby ToR orchagent test cases.

Signed-off-by: bingwang <[email protected]>
What is the motivation for this PR?
XCVRD runs some methods which attempt to read from cable hardware. This hardware doesn't exist for mux simulator, so XCVRD writes many errors to the syslog.

How did you do it?
Implement mocks for all methods in sonic_y_cable to prevent errors from being generated.

How did you verify/test it?
Inject new mux_simulator_client to dual ToR device with changes from sonic-net/sonic-platform-common#181. Verify PMON and mux containers stay up, and show mux status output looks normal. Initiate CLI switchover and confirm show mux status output reflects switchover.

Signed-off-by: Lawrence Lee <[email protected]>
…as5835_54t model (sonic-net#2961)

What is the motivation for this PR?
Fix failed test cases in as5835_54t model.

How did you do it?
Check the output of the aclshow -a command and ensure that one or more rules are displayed and that the counter values are not N/A.

How did you verify/test it?
Run the modified code in as5835_54t model to ensure that the problem is fixed.
…run on chassis (sonic-net#2985)

What is the motivation for this PR?
Add new API test cases for the test plan introduced by PR sonic-net#2695 (refer to sections 2 to 6).
The existing API tests also need to run against a SONiC chassis, but all of them used duthost and hence always ran on only one DUT of the chassis, i.e. the supervisor or a line card. These tests were converted to use the enum_* fixtures per hwsku for DUT selection. Instead of gathering facts in a fixture, facts are now gathered within the compare function for the selected duthost.

How did you do it?
The following changes were made to the api helper files in the folder sonic-mgmt/tests/common/helpers/platform_api/
chassis.py:
add new api calls: get_module_index, get_supervisor_slot, get_my_slot, is_modular_chassis

fan_drawer.py:
add new api call: get_maximum_consumed_power

module.py:
add new api calls: get_description, get_slot, get_type, get_oper_status, get_midplane_ip,
is_midplane_reachable, get_maximum_consumed_power, reboot, set_admin_state

psu.py:
add new api calls: get_maximum_supplied_power, set_status_master_led, get_status_master_led

thermal.py:
add new api calls: get_minimum_recorded, get_maximum_recorded
Changes made to the api tests to support a SONiC chassis and to add tests for the new apis:

conftest.py:
changes to run the tests on a SONiC chassis
change platform_api_conn to support multi-DUT by using duthosts and enum_rand_one_per_hwsku_hostname to select the DUT at function level,
change from reading the IP from eth0 to using duthost.mgmt_ip to extract the IP address of the DUT

change start_platform_service to support multi-DUT by using duthosts and enum_rand_one_per_hwsku_hostname to select the DUT at function level,
change from reading the IP from eth0 to using duthost.mgmt_ip to extract the IP address of the DUT

change stop_platform_api_service to run on all DUTs where the service was started
For all test modules under api, these changes are common:
Remove the gather_facts fixture and get facts within compare_value_with_platform_facts per duthost from the test
Changes to run the tests on a SONiC chassis: replace duthost with duthosts and enum_rand_one_per_hwsku_hostname for the multi-DUT environment in all tests

other changes per module:

test_chassis.py:
add tests for new apis for chassis: get_module_index, get_supervisor_slot, get_my_slot, is_modular_chassis

test_component.py:
remove range(), since image_list is a list (previously: for image in range(image_list))

test_fan_drawer.py:
add test for new apis for fan_drawer: get_maximum_consumed_power

test_module.py
add tests for new apis for module_base: get_description, get_slot, get_type, get_oper_status, get_midplane_ip,
is_midplane_reachable, get_maximum_consumed_power, reboot, set_admin_state

test_psu.py
changes to run the tests on a SONiC chassis,
add tests for new apis: get_maximum_supplied_power, set_status_master_led, get_status_master_led

test_thermal.py
changes to run the tests on a SONiC chassis,
add tests for new apis for thermal: get_minimum_recorded, get_maximum_recorded

How did you verify/test it?
Validated the modified tests against chassis.
…et#3248)

The test_bgp_gr_helper.py script should be executable on the t0 topology too.
This change is to update its topology marker from t0 to any.

Signed-off-by: Xin Wang <[email protected]>
vaibhavhd and others added 19 commits May 6, 2021 12:08
While creating the tunnel, some params are mandatory, so creating the tunnel with only the first attribute could fail. It is therefore essential to apply the config in a single shot.
This fix uses configgen and takes JSON input to apply the config at once.
…onic-net#3411)

#### What is the motivation for this PR?

On pmon docker there are typically 2 thermalctl processes. One being the
parent process of the other. In order to restart the thermalctl process,
the existing code would assume that the process with the smallest pid is
the parent process. This is not always the case.

#### How did you do it?
A cleaner way is to use supervisorctl to restart the thermalctl processes:
```
docker exec -i pmon bash -c 'supervisorctl restart thermalctl'
```
Modified the function "restart_thermal_control_daemon" in thermal_control_test_helpers.py with the above methodology to restart the thermalctl processes in the pmon docker
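
A small helper can assemble the same command string for any container and supervisor program. This is a hedged sketch, not the body of `restart_thermal_control_daemon`; the helper name is hypothetical and it only builds the string, leaving execution to whatever shell runner the test uses.

```python
import shlex


def build_supervisor_restart_command(container, program):
    """Build a `docker exec` command that restarts one supervisor program."""
    inner = "supervisorctl restart {}".format(shlex.quote(program))
    # quote the inner command so bash -c receives it as a single argument
    return "docker exec -i {} bash -c {}".format(shlex.quote(container), shlex.quote(inner))
```

Calling `build_supervisor_restart_command("pmon", "thermalctl")` reproduces the command shown in the commit message.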
…net#3073)

What is the motivation for this PR?
Changes added to test_dip_sip to support multi asic platforms

How did you do it?
Change the test test_dip_sip to run on every ASIC connected to the external devices.

Signed-off-by: Arvindsrinivasan Lakshminarasimhan <[email protected]>
What is the motivation for this PR?
Change the tests in the file iface_namingmode.py to support multi asic

How did you do it?
Update the CLIs and redis-cmd to support multi asic platforms

Signed-off-by: Arvindsrinivasan Lakshminarasimhan <[email protected]>
…et#3412)

What is the motivation for this PR?
In common/plugins/pdu_controller/__init__.py, the pdu_controller fixture always assumes that PDU controller hosts exist in the inventory. This change handles the case where the PDU controller host list is empty.

How did you do it?
Add an if condition that handles an empty pdu_host_list when it is read from the inventory.

How did you verify/test it?
Tested that None is returned when the pdu_host list does not exist.
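
The guard described here might look like the sketch below. The function name and the assumption that the inventory stores a comma-separated `pdu_host` string are hypothetical; only the "return None when the list is empty" behaviour comes from the commit message.

```python
def get_pdu_host_list(inventory_vars):
    """Return the PDU host names for a DUT, or None when none are defined."""
    # assumption: the inventory holds a comma-separated string under "pdu_host"
    raw_value = inventory_vars.get("pdu_host") or ""
    pdu_host_list = [host.strip() for host in raw_value.split(",") if host.strip()]
    if not pdu_host_list:
        return None
    return pdu_host_list
```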
…ed encapsulated packets (sonic-net#3397)

* [vxlan/vnet] Enhance VNET test to verify source VxLAN port for received encapsulated packets

Signed-off-by: Andriy Yurkiv <[email protected]>
test_wr_arp fails for some platforms/images.
This is because the response packet in the Ferret script is generated for an incorrect destination port.
Check whether a vxlan_port is configured on the DUT. If so, use this port while building the packet. Otherwise, use the default value.
Verified on the 201911 and master images; test_wr_arp passed.
…c-net#3436)

When configuring MUX_CABLE table in config_db, all interfaces are configured individually (for ipv4, ipv6 and state), and this leads to overwriting these values iteratively.
Use sonic-cfggen to configure MUX_CABLE table at once with json file (which contains config for each interface).
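
Building one JSON document for the whole table, rather than writing each field separately, could look like the sketch below. The function name, the input shape, and the field names `server_ipv4` / `server_ipv6` are assumptions made for illustration; the commit message only says that ipv4, ipv6 and state are configured per interface.

```python
import json


def build_mux_cable_config(interfaces):
    """Return a JSON string holding the MUX_CABLE entries for all interfaces."""
    table = {}
    for name, attrs in interfaces.items():
        table[name] = {
            "server_ipv4": attrs["ipv4"],  # field names are assumed
            "server_ipv6": attrs["ipv6"],
            "state": attrs["state"],
        }
    return json.dumps({"MUX_CABLE": table}, indent=4, sort_keys=True)
```

The resulting file can then be handed to sonic-cfggen in a single invocation, so no interface entry overwrites another.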
* Loganalyzer needs to be disabled for reboot scenarios to avoid test flakiness

Signed-off-by: Neetha John <[email protected]>
* Add and delete routes

* Add interface test
…050CX3-32S-D48C8 (sonic-net#3153)

* Updated qos/mmu test parameters for TD3 in single and dual TOR setups
* fix snmp test phy_entity

the test "test_remove_insert_fan_and_check_fan_info" needs to pass the creds of all devices, rather than its own, to the function _check_psu_status_after_power_off.

Signed-off-by: Anton <[email protected]>

* fix system_health tests

the tests "test_device_checker" and "test_system_health_config" still use the method "_file_exist", which was removed in commit sonic-net@3012631
Approach
What is the motivation for this PR?
When the mux simulator sanity check failed, it was not triggering any recovery actions.

How did you do it?
Add the 'host' field to the check results dictionary
Add 'mux_simulator' to the list of checks that can trigger a config reload

How did you verify/test it?
Set the DUT to fail the mux simulator sanity check, and run a test and make sure the check triggers a config reload and passes on the second attempt.

Signed-off-by: Lawrence Lee <[email protected]>
…onic-net#3398)

What is the motivation for this PR?
This PR aims to test the feature of monitoring critical processes by Monit in 20191130 image.

How did you do it?
The logic of this script is:
Step 1: Manually generate the expected regex of alerting messages for critical processes of containers in each namespace
Step 2: Kill each critical process one by one of containers in namespaces
Step 3: Wait for 70 seconds and loganalyzer will check whether the alerting messages fired by Monit in syslog match the regex created at Step 1.
Step 4: Post-check and restart containers by running the command sudo config reload.

How did you verify/test it?
I tested this pytest script on physical devices: str-n3164-acs-2 (Multi-ASIC) and str-msn2700-03 (Single ASIC). For the device
str-n3164-acs-2, I tested the 20191130 image. For device str-msn2700-03, I tested 202012 and 20191130.70 images

Any platform specific information?
N/A
Approach
What is the motivation for this PR?
Add azure pipeline script to bring up multi-asic testbed and run kvm tests.
How did you do it?
Add azure pipeline script :
Bring up multi-asic KVM testbed with 201911 image as 201911 image is currently stable for multi-asic
Deploy minigraph
Run multi-asic KVM tests
…gy. (sonic-net#3162)

How did you do it?
Modified ansible/roles/vm_set/tasks/add-topo.yml, ansible/roles/vm_set/tasks/remove-topo.yml, ansible/roles/vm_set/tasks/main.yml to create and bind IxANVL container in a PTF32 topology with SONIC-VS DUT.
Added a new configuration - "ixanvl-vs-conf" in testbed.csv and testbed.yaml for this.

Found one issue during PTF32 topology deployment: the bind threw errors because this topology needs no VMs, so none were defined in the topology ymls. Fixed the error by making "vm_names" an optional argument in the
ansible/roles/vm_set/library/vm_topology.py main function, and made the subsequent changes in that file.

How did you verify/test it?
By running testbed-cli.sh script for add-topo and remove-topo for ixanvl-vs-conf configuration.
What is the motivation for this PR?
Need to make the FIB tests work against a T2 chassis.

How did you do it?
Even though a T2 chassis has multiple DUTs (linecards), there is a single PTF instance that holds all the injected PTF ports connecting to the front-panel ports of all the DUTs. For the FIB test, the source port and the expected receive ports come from this list of injected PTF ports.

In a T2 chassis, routes learnt on one linecard are distributed over the fabric to the other linecards, but the FIB on those other linecards points to the inband recycle port Ethernet-IB0. Therefore, when generating the fib_info_file for a linecard, if a route is learnt over the inband recycle port, we have to work out which other linecard the route was learnt on and use its front-panel ports as the outgoing ports for this route.

Since a route learnt across the fabric has an outgoing port on another linecard, and the route could be learnt from multiple linecards (as with routes announced from T1 VMs in a T2 topology), the src_mac validation in fib_test/hash_test has been modified: the verify_packet_any_port call no longer checks src_mac in the expected packet. Instead, the source MAC of the received packet is compared to the target_mac defined in ptf_test_port_map.json for the DUT that owns the rcvd_port; if it doesn't match, the test fails.
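
The inband-port resolution described above can be sketched as follows. This is an illustration only: the function name and the `fib_by_dut` shape are hypothetical and do not reflect how the fib_info_file is actually generated.

```python
INBAND_PORT = "Ethernet-IB0"


def resolve_outgoing_ports(prefix, dut, fib_by_dut):
    """Return (linecard, port) pairs a packet for `prefix` may leave on.

    `fib_by_dut` maps a linecard name to {prefix: [outgoing port names]}.
    """
    resolved = []
    for port in fib_by_dut[dut].get(prefix, []):
        if port != INBAND_PORT:
            # learnt locally: the front-panel port is used as is
            resolved.append((dut, port))
            continue
        # learnt over the fabric: use the front-panel ports of the
        # linecards that learnt the route directly
        for other_dut in sorted(fib_by_dut):
            if other_dut == dut:
                continue
            for other_port in fib_by_dut[other_dut].get(prefix, []):
                if other_port != INBAND_PORT:
                    resolved.append((other_dut, other_port))
    return resolved
```

Because the result records which linecard owns each port, a test could then look up the matching target_mac per DUT when validating a received packet.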

How did you verify/test it?
Tested against a T2 chassis.
What is the motivation for this PR?
SONiC already sets the ARP cache limit quite high, so a test shouldn't need to raise it unless the threshold is
actually lower than needed.

How did you do it?
Skip ARP cache limit setting if the current limit is higher than requested limit.
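
The skip condition amounts to a single comparison. In the sketch below the function name is hypothetical, and treating an equal limit as "high enough" is an assumption.

```python
def arp_limit_to_apply(current_limit, requested_limit):
    """Return the limit to configure, or None when no change is needed."""
    if current_limit >= requested_limit:
        return None  # the existing limit already covers the test's needs
    return requested_limit
```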

How did you verify/test it?
Run crm test and watch ARP limit never changes during tests.

Signed-off-by: Ying Xie [email protected]
@saravanansv
Contributor Author

Closed this PR.
Will use #3245 to have a common PR for both Fabric and LC gen-minigraph support for VoQ

@shubav
Contributor

shubav commented May 6, 2021

Does this mean that 3245 needs a relook for the fabric side too? Or is it just a matter of merging these changes into 3245? Just for my understanding. Thanks.

@saravanansv
Contributor Author

I'm waiting for all pending tests to finish in 3245.
After that, I will patch the Linecard diffs from here into 3245.
The existing fabric diffs in 3245 will not be affected and will continue to work.

@saravanansv
Contributor Author

3245 now has the linecard gen-mg changes with 1e0663a
