From 46606cb2d6089dc473025d681a45757343539c6b Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Mon, 26 Sep 2022 23:18:48 +0200
Subject: [PATCH 001/833] sparse: Add a guard for netinet/ip6.h header on
FreeBSD.
Same as arpa/inet.h, netinet/ip6.h on FreeBSD requires netinet/in.h
to be included first. So, adding a similar guard.
Also fixing one instance where this is not respected at the moment.
We do have FreeBSD CI these days, but it is still nice to have
a clearer error message.
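For illustration, the include order that the guard enforces:

    #include <netinet/in.h>    /* Must come first on FreeBSD. */
    #include <netinet/ip6.h>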
Fixes: b2befd5bb2db ("sparse: Add guards to prevent FreeBSD-incompatible #include order.")
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
include/sparse/netinet/ip6.h | 4 ++++
lib/netdev-offload-dpdk.c | 1 +
2 files changed, 5 insertions(+)
diff --git a/include/sparse/netinet/ip6.h b/include/sparse/netinet/ip6.h
index bfa637a4604..b2b6f47d9e2 100644
--- a/include/sparse/netinet/ip6.h
+++ b/include/sparse/netinet/ip6.h
@@ -18,6 +18,10 @@
#error "Use this header only with sparse. It is not a correct implementation."
#endif
+#ifndef NETINET_IN_H_INCLUDED
+#error "Must include before for FreeBSD support"
+#endif
+
#ifndef __NETINET_IP6_SPARSE
#define __NETINET_IP6_SPARSE 1
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index cceefbc5075..80a64a6cc06 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -17,6 +17,7 @@
#include
#include
+#include <netinet/in.h>
#include
#include
#include
From b8932f5b339c731ed8962316330fa770ec1b4f5b Mon Sep 17 00:00:00 2001
From: Mike Pattrick
Date: Tue, 27 Sep 2022 12:04:53 -0400
Subject: [PATCH 002/833] vconn: Allow ECONNREFUSED in refuse connection test.
The "tcp vconn - refuse connection" test may fail due to a Connection
Refused error. The network stack returns ECONNREFUSED for a connection
reset in the SYN_SENT state and EPIPE or ECONNRESET in all other
cases.
2022-09-19T17:45:48Z|00001|socket_util|INFO|0:127.0.0.1: listening on
port 34189
2022-09-19T17:45:48Z|00002|poll_loop|DBG|wakeup due to [POLLOUT][
POLLERR][POLLHUP] on fd 4 (127.0.0.1:47140<->) at ../lib/stream-fd.
c:153
test-vconn: unexpected vconn_connect() return value 111 (Connection
refused)
../../tests/vconn.at:21: exit code was 1, expected 0
530. vconn.at:21: 530. tcp vconn - refuse connection (vconn.at:21):
FAILED (vconn.at:21)
This was observed on a CI system and isn't a common case.
Acked-by: Eelco Chaudron
Signed-off-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
tests/test-vconn.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/tests/test-vconn.c b/tests/test-vconn.c
index fc8ce4a2c0e..96c89bd4e68 100644
--- a/tests/test-vconn.c
+++ b/tests/test-vconn.c
@@ -157,6 +157,7 @@ test_refuse_connection(struct ovs_cmdl_context *ctx)
error = vconn_connect_block(vconn, (TIMEOUT - 2) * 1000);
if (!strcmp(type, "tcp")) {
if (error != ECONNRESET && error != EPIPE && error != ETIMEDOUT
+ && error != ECONNREFUSED
#ifdef _WIN32
&& error != WSAECONNRESET
#endif
From 691c5a5defc4f67b0932c71d80a517c46c711859 Mon Sep 17 00:00:00 2001
From: Fengqi Li
Date: Fri, 30 Sep 2022 09:09:28 +0800
Subject: [PATCH 003/833] daemon-unix: Fix file descriptor leak when monitor
restarts child.
When a segmentation fault occurs in ovn-northd, the monitor tries to
restart the ovn-northd daemon process every 10 seconds.
Assume the following scenario: a segmentation fault occurs and the
ovn-northd daemon process does not restart properly each time.
New fds are created each time the ovn-northd daemon process is
restarted by the monitor process, but the old fd (fd[0]) owned by
the monitor process is never closed. One pipe leaks for each restart
of the ovn-northd daemon process, so after a long time file
descriptors are exhausted.
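For illustration, a minimal sketch of the leaking pattern (not the
actual OVS code; the helper names are invented):

    int fds[2];
    for (;;) {                          /* one iteration per restart   */
        pipe(fds);                      /* fresh fd pair every time    */
        start_child(fds[1]);            /* child writes backtrace here */
        wait_for_child_to_crash();
        log_received_backtrace(fds[0]);
        close(fds[0]);                  /* the close this patch adds;
                                         * without it one fd leaks per
                                         * restart                     */
    }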
Fixes: e2ed6fbeb18c ("fatal-signal: Catch SIGSEGV and print backtrace.")
Signed-off-by: Fengqi Li
Signed-off-by: Ilya Maximets
---
lib/daemon-unix.c | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lib/daemon-unix.c b/lib/daemon-unix.c
index 52f3d4bc635..1a7ba427d7a 100644
--- a/lib/daemon-unix.c
+++ b/lib/daemon-unix.c
@@ -396,6 +396,8 @@ monitor_daemon(pid_t daemon_pid)
}
log_received_backtrace(daemonize_fd);
+ close(daemonize_fd);
+ daemonize_fd = -1;
/* Throttle restarts to no more than once every 10 seconds. */
if (time(NULL) < last_restart + 10) {
From 6c47354069ef26a4e89fd3832e148ae86a57d44d Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 6 Oct 2022 22:06:18 +0200
Subject: [PATCH 004/833] AUTHORS: Add Fengqi Li.
Signed-off-by: Ilya Maximets
---
AUTHORS.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/AUTHORS.rst b/AUTHORS.rst
index f4184be8fc4..c13cf60c5e8 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -162,6 +162,7 @@ Ethan J. Jackson ejj@eecs.berkeley.edu
Ethan Rahn erahn@arista.com
Eziz Durdyyev ezizdurdy@gmail.com
Fabrizio D'Angelo fdangelo@redhat.com
+Fengqi Li lifengqi@inspur.com
Flavio Fernandes flavio@flaviof.com
Flavio Leitner fbl@redhat.com
Francesco Fusco ffusco@redhat.com
From 1a9482d53347de04be5ef1ac557cc0e33b5be1fb Mon Sep 17 00:00:00 2001
From: Timothy Redaelli
Date: Thu, 22 Sep 2022 15:40:32 +0200
Subject: [PATCH 005/833] dhparams: Fix .c file generation with OpenSSL >= 3.0.
Since OpenSSL upstream commit 1696b8909bbe
("Remove -C from dhparam,dsaparam,ecparam") "openssl dhparam" doesn't
support -C anymore.
This commit changes generate-dhparams-c to generate dhparams.c by parsing
"openssl dhparam -in "$1" -text -noout" output directly.
The generated file won't be used on OpenSSL >= 3.0, but it still
needs to be generated if OVS is built against OpenSSL < 3.0.
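For reference, the function added below parses text output roughly of
this shape (the exact layout differs between OpenSSL versions, which
is why both the "P:"/"G:" and "prime:"/"generator:" spellings are
handled; the hex bytes are placeholders):

    DH Parameters: (2048 bit)
    P:
        00:xx:xx:xx: ... :xx
    G:    2 (0x2)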
Signed-off-by: Timothy Redaelli
Signed-off-by: Ilya Maximets
---
build-aux/generate-dhparams-c | 79 +++++++++++++++++++++++++++++++----
1 file changed, 71 insertions(+), 8 deletions(-)
diff --git a/build-aux/generate-dhparams-c b/build-aux/generate-dhparams-c
index 1884c99e1f0..a80db6207c4 100755
--- a/build-aux/generate-dhparams-c
+++ b/build-aux/generate-dhparams-c
@@ -1,5 +1,74 @@
#! /bin/sh -e
+dhparam_to_c() {
+ local bits
+ local get_p=0
+ local line
+ local nl="
+"
+ local p
+ local i=0
+ while read -r line; do
+ case "$line" in
+ *"DH Parameters: "*)
+ bits=${line#*DH Parameters: (}
+ bits=${bits% bit)}
+ continue
+ ;;
+ "P:"|"prime:")
+ get_p=1
+ continue
+ ;;
+ "G: "*|"generator: "*)
+ g=${line#*(}
+ g=${g%)}
+ g=$(printf "0x%.2X" "$g")
+ continue
+ ;;
+ esac
+ if [ "$get_p" = 1 ]; then
+ IFS=":"
+ for x in $line; do
+ [ -z "$p" ] && [ "$x" = "00" ] && continue
+ [ $i -ge 10 ] && i=0
+ [ $i -eq 0 ] && p="$p$nl "
+ x=0x$x
+ p=$(printf "%s 0x%.2X," "$p" "$x")
+ i=$((i + 1))
+ done
+ unset IFS
+ fi
+ done <
From: Timothy Redaelli
Date: Thu, 22 Sep 2022 15:40:33 +0200
Subject: [PATCH 006/833] Add support for OpenSSL 3.0 functions.
In OpenSSL 3.0 some functions were deprecated and replaced.
This commit adds some #ifdefs to build without warnings on both
OpenSSL 1.x and OpenSSL 3.x.
For OpenSSL 3.x, the default built-in DH parameters are used (as
suggested by the SSL_CTX_set_dh_auto manpage).
Signed-off-by: Timothy Redaelli
Signed-off-by: Ilya Maximets
---
build-aux/generate-dhparams-c | 2 ++
lib/dhparams.c | 2 ++
lib/stream-ssl.c | 12 ++++++++++++
3 files changed, 16 insertions(+)
diff --git a/build-aux/generate-dhparams-c b/build-aux/generate-dhparams-c
index a80db6207c4..aca1dbca910 100755
--- a/build-aux/generate-dhparams-c
+++ b/build-aux/generate-dhparams-c
@@ -78,6 +78,7 @@ cat <<'EOF'
#include "lib/dhparams.h"
#include "openvswitch/util.h"
+#if OPENSSL_VERSION_NUMBER < 0x3000000fL
static int
my_DH_set0_pqg(DH *dh, BIGNUM *p, const BIGNUM **q OVS_UNUSED, BIGNUM *g)
{
@@ -93,3 +94,4 @@ my_DH_set0_pqg(DH *dh, BIGNUM *p, const BIGNUM **q OVS_UNUSED, BIGNUM *g)
EOF
dhparam_to_c lib/dh2048.pem
dhparam_to_c lib/dh4096.pem
+echo "#endif"
diff --git a/lib/dhparams.c b/lib/dhparams.c
index 85123863fc5..50209d5d813 100644
--- a/lib/dhparams.c
+++ b/lib/dhparams.c
@@ -6,6 +6,7 @@
#include "lib/dhparams.h"
#include "openvswitch/util.h"
+#if OPENSSL_VERSION_NUMBER < 0x3000000fL
static int
my_DH_set0_pqg(DH *dh, BIGNUM *p, const BIGNUM **q OVS_UNUSED, BIGNUM *g)
{
@@ -142,3 +143,4 @@ DH *get_dh4096(void)
}
return dh;
}
+#endif
diff --git a/lib/stream-ssl.c b/lib/stream-ssl.c
index f4fe3432e77..62da9febb66 100644
--- a/lib/stream-ssl.c
+++ b/lib/stream-ssl.c
@@ -193,7 +193,9 @@ static void ssl_clear_txbuf(struct ssl_stream *);
static void interpret_queued_ssl_error(const char *function);
static int interpret_ssl_error(const char *function, int ret, int error,
int *want);
+#if OPENSSL_VERSION_NUMBER < 0x3000000fL
static DH *tmp_dh_callback(SSL *ssl, int is_export OVS_UNUSED, int keylength);
+#endif
static void log_ca_cert(const char *file_name, X509 *cert);
static void stream_ssl_set_ca_cert_file__(const char *file_name,
bool bootstrap, bool force);
@@ -471,7 +473,11 @@ static char *
get_peer_common_name(const struct ssl_stream *sslv)
{
char *peer_name = NULL;
+#if OPENSSL_VERSION_NUMBER < 0x3000000fL
X509 *peer_cert = SSL_get_peer_certificate(sslv->ssl);
+#else
+ X509 *peer_cert = SSL_get1_peer_certificate(sslv->ssl);
+#endif
if (!peer_cert) {
return NULL;
}
@@ -1070,7 +1076,11 @@ do_ssl_init(void)
return ENOPROTOOPT;
}
SSL_CTX_set_options(ctx, SSL_OP_NO_SSLv2 | SSL_OP_NO_SSLv3);
+#if OPENSSL_VERSION_NUMBER < 0x3000000fL
SSL_CTX_set_tmp_dh_callback(ctx, tmp_dh_callback);
+#else
+ SSL_CTX_set_dh_auto(ctx, 1);
+#endif
SSL_CTX_set_mode(ctx, SSL_MODE_ENABLE_PARTIAL_WRITE);
SSL_CTX_set_mode(ctx, SSL_MODE_ACCEPT_MOVING_WRITE_BUFFER);
SSL_CTX_set_verify(ctx, SSL_VERIFY_PEER | SSL_VERIFY_FAIL_IF_NO_PEER_CERT,
@@ -1081,6 +1091,7 @@ do_ssl_init(void)
return 0;
}
+#if OPENSSL_VERSION_NUMBER < 0x3000000fL
static DH *
tmp_dh_callback(SSL *ssl OVS_UNUSED, int is_export OVS_UNUSED, int keylength)
{
@@ -1112,6 +1123,7 @@ tmp_dh_callback(SSL *ssl OVS_UNUSED, int is_export OVS_UNUSED, int keylength)
keylength);
return NULL;
}
+#endif
/* Returns true if SSL is at least partially configured. */
bool
From 0b21e234312ee25d52051375f2ca386212d4e609 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Fri, 1 Jul 2022 13:11:16 +0200
Subject: [PATCH 007/833] json: Fix deep copy of objects and arrays.
When reference counting for JSON objects was introduced, the
old json_clone() function became json_deep_clone(), but it
still calls the shallow json_clone() while cloning objects and
arrays, so it does not really produce a deep copy.
Fixing that by making the helper functions perform a deep copy
as well. There are no users of this functionality inside
OVS right now, but OVS exports this functionality externally.
'ovstest test-json' is extended to test both versions of a clone
on the provided inputs.
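A minimal sketch of the difference, assuming the public json.h API:

    struct json *orig = json_from_string("[{\"a\": 1}, [2, 3]]");
    struct json *shallow = json_clone(orig);    /* refcount bump; same nodes */
    struct json *deep = json_deep_clone(orig);  /* recursively copied nodes  */

    /* 'shallow' shares every node with 'orig'; after this fix, 'deep'
     * must not share any. */
    json_destroy(deep);
    json_destroy(shallow);
    json_destroy(orig);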
Fixes: 9854d473adea ("json: Use reference counting in JSON objects")
Acked-by: Dumitru Ceara
Signed-off-by: Ilya Maximets
---
lib/json.c | 16 +++---
tests/test-json.c | 124 ++++++++++++++++++++++++++++++++++++++++++++--
2 files changed, 128 insertions(+), 12 deletions(-)
diff --git a/lib/json.c b/lib/json.c
index 3267a619633..aded8bb0159 100644
--- a/lib/json.c
+++ b/lib/json.c
@@ -420,8 +420,8 @@ json_destroy_array(struct json_array *array)
free(array->elems);
}
-static struct json *json_clone_object(const struct shash *object);
-static struct json *json_clone_array(const struct json_array *array);
+static struct json *json_deep_clone_object(const struct shash *object);
+static struct json *json_deep_clone_array(const struct json_array *array);
/* Returns a deep copy of 'json'. */
struct json *
@@ -429,10 +429,10 @@ json_deep_clone(const struct json *json)
{
switch (json->type) {
case JSON_OBJECT:
- return json_clone_object(json->object);
+ return json_deep_clone_object(json->object);
case JSON_ARRAY:
- return json_clone_array(&json->array);
+ return json_deep_clone_array(&json->array);
case JSON_STRING:
return json_string_create(json->string);
@@ -464,7 +464,7 @@ json_nullable_clone(const struct json *json)
}
static struct json *
-json_clone_object(const struct shash *object)
+json_deep_clone_object(const struct shash *object)
{
struct shash_node *node;
struct json *json;
@@ -472,20 +472,20 @@ json_clone_object(const struct shash *object)
json = json_object_create();
SHASH_FOR_EACH (node, object) {
struct json *value = node->data;
- json_object_put(json, node->name, json_clone(value));
+ json_object_put(json, node->name, json_deep_clone(value));
}
return json;
}
static struct json *
-json_clone_array(const struct json_array *array)
+json_deep_clone_array(const struct json_array *array)
{
struct json **elems;
size_t i;
elems = xmalloc(array->n * sizeof *elems);
for (i = 0; i < array->n; i++) {
- elems[i] = json_clone(array->elems[i]);
+ elems[i] = json_deep_clone(array->elems[i]);
}
return json_array_create(elems, array->n);
}
diff --git a/tests/test-json.c b/tests/test-json.c
index a2f4332e77b..6cf5eb75def 100644
--- a/tests/test-json.c
+++ b/tests/test-json.c
@@ -34,8 +34,123 @@ static int pretty = 0;
* instead of exactly one object or array. */
static int multiple = 0;
+static void test_json_equal(const struct json *a, const struct json *b,
+ bool allow_the_same);
+
+static void
+test_json_equal_object(const struct shash *a, const struct shash *b,
+ bool allow_the_same)
+{
+ struct shash_node *a_node;
+
+ ovs_assert(allow_the_same || a != b);
+
+ if (a == b) {
+ return;
+ }
+
+ ovs_assert(shash_count(a) == shash_count(b));
+
+ SHASH_FOR_EACH (a_node, a) {
+ struct shash_node *b_node = shash_find(b, a_node->name);
+
+ ovs_assert(b_node);
+ test_json_equal(a_node->data, b_node->data, allow_the_same);
+ }
+}
+
+static void
+test_json_equal_array(const struct json_array *a, const struct json_array *b,
+ bool allow_the_same)
+{
+ ovs_assert(allow_the_same || a != b);
+
+ if (a == b) {
+ return;
+ }
+
+ ovs_assert(a->n == b->n);
+
+ for (size_t i = 0; i < a->n; i++) {
+ test_json_equal(a->elems[i], b->elems[i], allow_the_same);
+ }
+}
+
+static void
+test_json_equal(const struct json *a, const struct json *b,
+ bool allow_the_same)
+{
+ ovs_assert(allow_the_same || a != b);
+ ovs_assert(a && b);
+
+ if (a == b) {
+ ovs_assert(a->count > 1);
+ return;
+ }
+
+ ovs_assert(a->type == b->type);
+
+ switch (a->type) {
+ case JSON_OBJECT:
+ test_json_equal_object(a->object, b->object, allow_the_same);
+ return;
+
+ case JSON_ARRAY:
+ test_json_equal_array(&a->array, &b->array, allow_the_same);
+ return;
+
+ case JSON_STRING:
+ case JSON_SERIALIZED_OBJECT:
+ ovs_assert(a->string != b->string);
+ ovs_assert(!strcmp(a->string, b->string));
+ return;
+
+ case JSON_NULL:
+ case JSON_FALSE:
+ case JSON_TRUE:
+ return;
+
+ case JSON_INTEGER:
+ ovs_assert(a->integer == b->integer);
+ return;
+
+ case JSON_REAL:
+ ovs_assert(a->real == b->real);
+ return;
+
+ case JSON_N_TYPES:
+ default:
+ OVS_NOT_REACHED();
+ }
+}
+
+static void
+test_json_clone(struct json *json)
+{
+ struct json *copy, *deep_copy;
+
+ copy = json_clone(json);
+
+ ovs_assert(json_equal(json, copy));
+ test_json_equal(json, copy, true);
+ ovs_assert(json->count == 2);
+
+ json_destroy(copy);
+ ovs_assert(json->count == 1);
+
+ deep_copy = json_deep_clone(json);
+
+ ovs_assert(json_equal(json, deep_copy));
+ test_json_equal(json, deep_copy, false);
+ ovs_assert(json->count == 1);
+ ovs_assert(deep_copy->count == 1);
+
+ json_destroy(deep_copy);
+ ovs_assert(json->count == 1);
+}
+
static bool
-print_and_free_json(struct json *json)
+print_test_and_free_json(struct json *json)
{
bool ok;
if (json->type == JSON_STRING) {
@@ -47,6 +162,7 @@ print_and_free_json(struct json *json)
free(s);
ok = true;
}
+ test_json_clone(json);
json_destroy(json);
return ok;
}
@@ -89,7 +205,7 @@ parse_multiple(FILE *stream)
used += json_parser_feed(parser, &buffer[used], n - used);
if (used < n) {
- if (!print_and_free_json(json_parser_finish(parser))) {
+ if (!print_test_and_free_json(json_parser_finish(parser))) {
ok = false;
}
parser = NULL;
@@ -97,7 +213,7 @@ parse_multiple(FILE *stream)
}
}
if (parser) {
- if (!print_and_free_json(json_parser_finish(parser))) {
+ if (!print_test_and_free_json(json_parser_finish(parser))) {
ok = false;
}
}
@@ -150,7 +266,7 @@ test_json_main(int argc, char *argv[])
if (multiple) {
ok = parse_multiple(stream);
} else {
- ok = print_and_free_json(json_from_stream(stream));
+ ok = print_test_and_free_json(json_from_stream(stream));
}
fclose(stream);
From 96b26dce1da18f00dcad2e14bc058158fffa313f Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 1 Sep 2022 17:42:49 +0200
Subject: [PATCH 008/833] ofproto-dpif-upcall: Print more data on unassociated
datapath ports.
When OVS fails to find an OpenFlow port for a packet received
from the upcall it just prints the warning like this:
|INFO|received packet on unassociated datapath port N
However, during flow translation more information is available,
such as whether the recirculation id wasn't found or the packet
came from an unknown tunnel port. Printing that information might
be useful for understanding the origin of the problem.
Port translation functions already support extended error strings;
we just need to pass a variable in which to store them.
With the change the output may be:
|INFO|received packet on unassociated datapath port N
(no OpenFlow port for datapath port N)
or
|INFO|received packet on unassociated datapath port N
(no OpenFlow tunnel port for this packet)
or
|INFO|received packet on unassociated datapath port N
(no recirculation data for recirc_id M)
Unfortunately, there is no good way to trigger this code from
current unit tests.
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
ofproto/ofproto-dpif-upcall.c | 27 +++++++++++++++++++--------
ofproto/ofproto-dpif-xlate.c | 6 ++++--
ofproto/ofproto-dpif-xlate.h | 2 +-
3 files changed, 24 insertions(+), 11 deletions(-)
diff --git a/ofproto/ofproto-dpif-upcall.c b/ofproto/ofproto-dpif-upcall.c
index 7ad728adffd..ad96354966f 100644
--- a/ofproto/ofproto-dpif-upcall.c
+++ b/ofproto/ofproto-dpif-upcall.c
@@ -402,7 +402,8 @@ static int upcall_receive(struct upcall *, const struct dpif_backer *,
const struct dp_packet *packet, enum dpif_upcall_type,
const struct nlattr *userdata, const struct flow *,
const unsigned int mru,
- const ovs_u128 *ufid, const unsigned pmd_id);
+ const ovs_u128 *ufid, const unsigned pmd_id,
+ char **errorp);
static void upcall_uninit(struct upcall *);
static void udpif_flow_rebalance(struct udpif *udpif);
@@ -827,6 +828,7 @@ recv_upcalls(struct handler *handler)
struct upcall *upcall = &upcalls[n_upcalls];
struct flow *flow = &flows[n_upcalls];
unsigned int mru = 0;
+ char *errorp = NULL;
uint64_t hash = 0;
int error;
@@ -853,7 +855,7 @@ recv_upcalls(struct handler *handler)
error = upcall_receive(upcall, udpif->backer, &dupcall->packet,
dupcall->type, dupcall->userdata, flow, mru,
- &dupcall->ufid, PMD_ID_NULL);
+ &dupcall->ufid, PMD_ID_NULL, &errorp);
if (error) {
if (error == ENODEV) {
/* Received packet on datapath port for which we couldn't
@@ -864,8 +866,11 @@ recv_upcalls(struct handler *handler)
dupcall->key_len, NULL, 0, NULL, 0,
&dupcall->ufid, PMD_ID_NULL, NULL);
VLOG_INFO_RL(&rl, "received packet on unassociated datapath "
- "port %"PRIu32, flow->in_port.odp_port);
+ "port %"PRIu32"%s%s%s", flow->in_port.odp_port,
+ errorp ? " (" : "", errorp ? errorp : "",
+ errorp ? ")" : "");
}
+ free(errorp);
goto free_dupcall;
}
@@ -1151,7 +1156,8 @@ upcall_receive(struct upcall *upcall, const struct dpif_backer *backer,
const struct dp_packet *packet, enum dpif_upcall_type type,
const struct nlattr *userdata, const struct flow *flow,
const unsigned int mru,
- const ovs_u128 *ufid, const unsigned pmd_id)
+ const ovs_u128 *ufid, const unsigned pmd_id,
+ char **errorp)
{
int error;
@@ -1160,7 +1166,8 @@ upcall_receive(struct upcall *upcall, const struct dpif_backer *backer,
return EAGAIN;
} else if (upcall->type == MISS_UPCALL) {
error = xlate_lookup(backer, flow, &upcall->ofproto, &upcall->ipfix,
- &upcall->sflow, NULL, &upcall->ofp_in_port);
+ &upcall->sflow, NULL, &upcall->ofp_in_port,
+ errorp);
if (error) {
return error;
}
@@ -1168,7 +1175,11 @@ upcall_receive(struct upcall *upcall, const struct dpif_backer *backer,
struct ofproto_dpif *ofproto
= ofproto_dpif_lookup_by_uuid(&upcall->cookie.ofproto_uuid);
if (!ofproto) {
- VLOG_INFO_RL(&rl, "upcall could not find ofproto");
+ if (errorp) {
+ *errorp = xstrdup("upcall could not find ofproto");
+ } else {
+ VLOG_INFO_RL(&rl, "upcall could not find ofproto");
+ }
return ENODEV;
}
upcall->ofproto = ofproto;
@@ -1358,7 +1369,7 @@ upcall_cb(const struct dp_packet *packet, const struct flow *flow, ovs_u128 *ufi
atomic_read_relaxed(&enable_megaflows, &megaflow);
error = upcall_receive(&upcall, udpif->backer, packet, type, userdata,
- flow, 0, ufid, pmd_id);
+ flow, 0, ufid, pmd_id, NULL);
if (error) {
return error;
}
@@ -2154,7 +2165,7 @@ xlate_key(struct udpif *udpif, const struct nlattr *key, unsigned int len,
}
error = xlate_lookup(udpif->backer, &ctx->flow, &ofproto, NULL, NULL,
- ctx->netflow, &ofp_in_port);
+ ctx->netflow, &ofp_in_port, NULL);
if (error) {
return error;
}
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index ab6f39bb264..3b9b26da171 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1603,17 +1603,19 @@ xlate_lookup_ofproto(const struct dpif_backer *backer, const struct flow *flow,
* be taken.
*
* Returns 0 if successful, ENODEV if the parsed flow has no associated ofproto.
+ * Sets an extended error string to 'errorp'. Callers are responsible for
+ * freeing that string.
*/
int
xlate_lookup(const struct dpif_backer *backer, const struct flow *flow,
struct ofproto_dpif **ofprotop, struct dpif_ipfix **ipfix,
struct dpif_sflow **sflow, struct netflow **netflow,
- ofp_port_t *ofp_in_port)
+ ofp_port_t *ofp_in_port, char **errorp)
{
struct ofproto_dpif *ofproto;
const struct xport *xport;
- ofproto = xlate_lookup_ofproto_(backer, flow, ofp_in_port, &xport, NULL);
+ ofproto = xlate_lookup_ofproto_(backer, flow, ofp_in_port, &xport, errorp);
if (!ofproto) {
return ENODEV;
diff --git a/ofproto/ofproto-dpif-xlate.h b/ofproto/ofproto-dpif-xlate.h
index c1af477c496..05b46fb26b1 100644
--- a/ofproto/ofproto-dpif-xlate.h
+++ b/ofproto/ofproto-dpif-xlate.h
@@ -209,7 +209,7 @@ struct ofproto_dpif * xlate_lookup_ofproto(const struct dpif_backer *,
int xlate_lookup(const struct dpif_backer *, const struct flow *,
struct ofproto_dpif **, struct dpif_ipfix **,
struct dpif_sflow **, struct netflow **,
- ofp_port_t *ofp_in_port);
+ ofp_port_t *ofp_in_port, char **errorp);
const char *xlate_strerror(enum xlate_error error);
From ccd26e79e5d24dd19e59d53337b51ce167966530 Mon Sep 17 00:00:00 2001
From: Lin Huang
Date: Thu, 6 Oct 2022 15:11:08 +0800
Subject: [PATCH 009/833] ovs-tcpdump: Fix bond port unable to capture jumbo
frames.
Currently the ovs-tcpdump utility creates a tap port to capture the
frames of a bond port.
By default the utility creates that tap port with an MTU of 1500,
regardless of the member interfaces' MTU configuration. So, if a user
wants to capture packets from a bond port whose member interfaces have
an MTU larger than 1500, frames larger than 1500 bytes are never
captured.
This patch fixes the issue by checking the member interfaces' MTUs and
setting the maximal value on the tap port.
Acked-by: Aaron Conole
Signed-off-by: Lin Huang
Signed-off-by: Ilya Maximets
---
utilities/ovs-tcpdump.in | 7 +++++++
1 file changed, 7 insertions(+)
diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
index 7fd26e40557..e12bab88956 100755
--- a/utilities/ovs-tcpdump.in
+++ b/utilities/ovs-tcpdump.in
@@ -225,6 +225,13 @@ class OVSDB(object):
def interface_mtu(self, intf_name):
try:
intf = self._find_row_by_name('Interface', intf_name)
+ if intf is None:
+ mtu = 1500
+ port = self._find_row_by_name('Port', intf_name)
+ for intf in port.interfaces:
+ if mtu < intf.mtu[0]:
+ mtu = intf.mtu[0]
+ return mtu
return intf.mtu[0]
except Exception:
return None
From dc54104526030123fc8390e6106782c6a3aca2f3 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Mon, 10 Oct 2022 15:11:57 +0200
Subject: [PATCH 010/833] ovsdb: Fix race for datum JSON string reference
counter.
The compaction thread is supposed to not change anything in the
database it is working on, since the same data can be accessed by
the main thread at the same time. However, while converting database
rows to JSON objects, strings in the datum are cloned using
json_clone(), which is a shallow copy, and that changes the
reference counter of the JSON string object. If both the main
thread and the compaction thread clone/destroy the same object
at the same time, we may end up with a broken reference counter,
leading to a memory leak or use-after-free.
Adding a new argument to the database-to-JSON conversion to prevent
the use of shallow copies from the compaction thread. This way all
the database operations are truly read-only, avoiding the race.
'ovsdb_atom_to_json' and 'ovsdb_datum_to_json' are more widely used,
so creating separate variants of these functions instead of adding
a new argument, to avoid changing a lot of existing code.
Another solution might be to use atomic reference counters, but that
would require an API/ABI break, because the counter is exposed in
public headers. Also, we can not easily expose atomic functions, so
we would need to un-inline the reference counting, with the
associated performance cost.
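To illustrate the race, a simplified sketch of the relevant part:
json_clone() is only a reference-count bump on a shared, non-atomic
counter, so two threads that merely "read" the same datum still write
to shared state:

    /* Both the main thread and the compaction thread effectively do: */
    struct json *copy = json_clone(atom->s);   /* non-atomic s->count++ */
    /* ... serialize ... */
    json_destroy(copy);                        /* non-atomic s->count-- */

    /* Concurrent, unsynchronized ++/-- on the same counter can be
     * lost, leaving it too high (memory leak) or too low
     * (use-after-free). */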
Fixes: 3cd2cbd684e0 ("ovsdb: Prepare snapshot JSON in a separate thread.")
Reported-at: https://bugzilla.redhat.com/2133431
Acked-by: Dumitru Ceara
Signed-off-by: Ilya Maximets
---
lib/ovsdb-data.c | 56 ++++++++++++++++++++++++++++++++++++----------
lib/ovsdb-data.h | 2 ++
ovsdb/file.c | 34 ++++++++++++++++++++++------
ovsdb/file.h | 3 ++-
ovsdb/ovsdb-tool.c | 5 +++--
ovsdb/ovsdb.c | 7 ++++--
ovsdb/trigger.c | 2 +-
7 files changed, 84 insertions(+), 25 deletions(-)
diff --git a/lib/ovsdb-data.c b/lib/ovsdb-data.c
index 183e752583a..f18f74298f9 100644
--- a/lib/ovsdb-data.c
+++ b/lib/ovsdb-data.c
@@ -455,9 +455,15 @@ ovsdb_atom_from_json(union ovsdb_atom *atom,
/* Converts 'atom', of the specified 'type', to JSON format, and returns the
* JSON. The caller is responsible for freeing the returned JSON.
*
+ * If 'allow_shallow_copies' is false, deep copy of the string JSON object
+ * will be used. Useful when the same string object is accessed by multiple
+ * threads as deep copy will not change the reference counter of the original
+ * JSON string.
+ *
* Refer to RFC 7047 for the format of the JSON that this function produces. */
-struct json *
-ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type)
+static struct json *
+ovsdb_atom_to_json__(const union ovsdb_atom *atom, enum ovsdb_atomic_type type,
+ bool allow_shallow_copies)
{
switch (type) {
case OVSDB_TYPE_VOID:
@@ -473,7 +479,8 @@ ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type)
return json_boolean_create(atom->boolean);
case OVSDB_TYPE_STRING:
- return json_clone(atom->s);
+ return allow_shallow_copies ? json_clone(atom->s)
+ : json_deep_clone(atom->s);
case OVSDB_TYPE_UUID:
return wrap_json("uuid", json_string_create_nocopy(
@@ -485,6 +492,19 @@ ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type)
}
}
+struct json *
+ovsdb_atom_to_json(const union ovsdb_atom *atom, enum ovsdb_atomic_type type)
+{
+ return ovsdb_atom_to_json__(atom, type, true);
+}
+
+static struct json *
+ovsdb_atom_to_json_deep(const union ovsdb_atom *atom,
+ enum ovsdb_atomic_type type)
+{
+ return ovsdb_atom_to_json__(atom, type, false);
+}
+
static char *
ovsdb_atom_from_string__(union ovsdb_atom *atom,
union ovsdb_atom **range_end_atom,
@@ -1409,12 +1429,15 @@ ovsdb_unconstrained_datum_from_json(struct ovsdb_datum *datum,
static struct json *
ovsdb_base_to_json(const union ovsdb_atom *atom,
const struct ovsdb_base_type *base,
- bool use_row_names)
+ bool use_row_names,
+ bool allow_shallow_copies)
{
if (!use_row_names
|| base->type != OVSDB_TYPE_UUID
|| !base->uuid.refTableName) {
- return ovsdb_atom_to_json(atom, base->type);
+ return allow_shallow_copies
+ ? ovsdb_atom_to_json(atom, base->type)
+ : ovsdb_atom_to_json_deep(atom, base->type);
} else {
return json_array_create_2(
json_string_create("named-uuid"),
@@ -1425,7 +1448,8 @@ ovsdb_base_to_json(const union ovsdb_atom *atom,
static struct json *
ovsdb_datum_to_json__(const struct ovsdb_datum *datum,
const struct ovsdb_type *type,
- bool use_row_names)
+ bool use_row_names,
+ bool allow_shallow_copies)
{
if (ovsdb_type_is_map(type)) {
struct json **elems;
@@ -1435,14 +1459,15 @@ ovsdb_datum_to_json__(const struct ovsdb_datum *datum,
for (i = 0; i < datum->n; i++) {
elems[i] = json_array_create_2(
ovsdb_base_to_json(&datum->keys[i], &type->key,
- use_row_names),
+ use_row_names, allow_shallow_copies),
ovsdb_base_to_json(&datum->values[i], &type->value,
- use_row_names));
+ use_row_names, allow_shallow_copies));
}
return wrap_json("map", json_array_create(elems, datum->n));
} else if (datum->n == 1) {
- return ovsdb_base_to_json(&datum->keys[0], &type->key, use_row_names);
+ return ovsdb_base_to_json(&datum->keys[0], &type->key,
+ use_row_names, allow_shallow_copies);
} else {
struct json **elems;
size_t i;
@@ -1450,7 +1475,7 @@ ovsdb_datum_to_json__(const struct ovsdb_datum *datum,
elems = xmalloc(datum->n * sizeof *elems);
for (i = 0; i < datum->n; i++) {
elems[i] = ovsdb_base_to_json(&datum->keys[i], &type->key,
- use_row_names);
+ use_row_names, allow_shallow_copies);
}
return wrap_json("set", json_array_create(elems, datum->n));
@@ -1467,14 +1492,21 @@ struct json *
ovsdb_datum_to_json(const struct ovsdb_datum *datum,
const struct ovsdb_type *type)
{
- return ovsdb_datum_to_json__(datum, type, false);
+ return ovsdb_datum_to_json__(datum, type, false, true);
+}
+
+struct json *
+ovsdb_datum_to_json_deep(const struct ovsdb_datum *datum,
+ const struct ovsdb_type *type)
+{
+ return ovsdb_datum_to_json__(datum, type, false, false);
}
struct json *
ovsdb_datum_to_json_with_row_names(const struct ovsdb_datum *datum,
const struct ovsdb_type *type)
{
- return ovsdb_datum_to_json__(datum, type, true);
+ return ovsdb_datum_to_json__(datum, type, true, true);
}
static const char *
diff --git a/lib/ovsdb-data.h b/lib/ovsdb-data.h
index dcb62051358..f048a8cb03d 100644
--- a/lib/ovsdb-data.h
+++ b/lib/ovsdb-data.h
@@ -195,6 +195,8 @@ ovsdb_unconstrained_datum_from_json(struct ovsdb_datum *,
OVS_WARN_UNUSED_RESULT;
struct json *ovsdb_datum_to_json(const struct ovsdb_datum *,
const struct ovsdb_type *);
+struct json *ovsdb_datum_to_json_deep(const struct ovsdb_datum *,
+ const struct ovsdb_type *);
char *ovsdb_datum_from_string(struct ovsdb_datum *,
const struct ovsdb_type *, const char *,
diff --git a/ovsdb/file.c b/ovsdb/file.c
index ca80c282356..fdc289ad1b7 100644
--- a/ovsdb/file.c
+++ b/ovsdb/file.c
@@ -52,7 +52,8 @@ static void ovsdb_file_txn_init(struct ovsdb_file_txn *);
static void ovsdb_file_txn_add_row(struct ovsdb_file_txn *,
const struct ovsdb_row *old,
const struct ovsdb_row *new,
- const unsigned long int *changed);
+ const unsigned long int *changed,
+ bool allow_shallow_copies);
/* If set to 'true', file transactions will contain difference between
* datums of old and new rows and not the whole new datum for the column. */
@@ -361,12 +362,19 @@ ovsdb_file_change_cb(const struct ovsdb_row *old,
void *ftxn_)
{
struct ovsdb_file_txn *ftxn = ftxn_;
- ovsdb_file_txn_add_row(ftxn, old, new, changed);
+ ovsdb_file_txn_add_row(ftxn, old, new, changed, true);
return true;
}
+/* Converts the database into transaction JSON representation.
+ * If 'allow_shallow_copies' is false, makes sure that all the JSON
+ * objects in the resulted transaction JSON are separately allocated
+ * objects and not shallow clones of JSON objects already existing
+ * in the database. Useful when multiple threads are working on the
+ * same database object. */
struct json *
-ovsdb_to_txn_json(const struct ovsdb *db, const char *comment)
+ovsdb_to_txn_json(const struct ovsdb *db, const char *comment,
+ bool allow_shallow_copies)
{
struct ovsdb_file_txn ftxn;
@@ -378,7 +386,8 @@ ovsdb_to_txn_json(const struct ovsdb *db, const char *comment)
const struct ovsdb_row *row;
HMAP_FOR_EACH (row, hmap_node, &table->rows) {
- ovsdb_file_txn_add_row(&ftxn, NULL, row, NULL);
+ ovsdb_file_txn_add_row(&ftxn, NULL, row, NULL,
+ allow_shallow_copies);
}
}
@@ -426,7 +435,8 @@ static void
ovsdb_file_txn_add_row(struct ovsdb_file_txn *ftxn,
const struct ovsdb_row *old,
const struct ovsdb_row *new,
- const unsigned long int *changed)
+ const unsigned long int *changed,
+ bool allow_shallow_copies)
{
struct json *row;
@@ -451,10 +461,20 @@ ovsdb_file_txn_add_row(struct ovsdb_file_txn *ftxn,
if (old && use_column_diff) {
ovsdb_datum_diff(&datum, &old->fields[idx],
&new->fields[idx], type);
- column_json = ovsdb_datum_to_json(&datum, type);
+ if (allow_shallow_copies) {
+ column_json = ovsdb_datum_to_json(&datum, type);
+ } else {
+ column_json = ovsdb_datum_to_json_deep(&datum, type);
+ }
ovsdb_datum_destroy(&datum, type);
} else {
- column_json = ovsdb_datum_to_json(&new->fields[idx], type);
+ if (allow_shallow_copies) {
+ column_json = ovsdb_datum_to_json(
+ &new->fields[idx], type);
+ } else {
+ column_json = ovsdb_datum_to_json_deep(
+ &new->fields[idx], type);
+ }
}
if (!row) {
row = json_object_create();
diff --git a/ovsdb/file.h b/ovsdb/file.h
index be4f6ad27ca..ae90d4fe130 100644
--- a/ovsdb/file.h
+++ b/ovsdb/file.h
@@ -25,7 +25,8 @@ struct ovsdb_txn;
void ovsdb_file_column_diff_disable(void);
-struct json *ovsdb_to_txn_json(const struct ovsdb *, const char *comment);
+struct json *ovsdb_to_txn_json(const struct ovsdb *, const char *comment,
+ bool allow_shallow_copies);
struct json *ovsdb_file_txn_to_json(const struct ovsdb_txn *);
struct json *ovsdb_file_txn_annotate(struct json *, const char *comment);
struct ovsdb_error *ovsdb_file_txn_from_json(struct ovsdb *,
diff --git a/ovsdb/ovsdb-tool.c b/ovsdb/ovsdb-tool.c
index df2e373c3cd..60f353197bf 100644
--- a/ovsdb/ovsdb-tool.c
+++ b/ovsdb/ovsdb-tool.c
@@ -304,7 +304,7 @@ do_create_cluster(struct ovs_cmdl_context *ctx)
struct ovsdb *ovsdb = ovsdb_file_read(src_file_name, false);
char *comment = xasprintf("created from %s", src_file_name);
- data = ovsdb_to_txn_json(ovsdb, comment);
+ data = ovsdb_to_txn_json(ovsdb, comment, true);
free(comment);
schema = ovsdb_schema_clone(ovsdb->schema);
ovsdb_destroy(ovsdb);
@@ -359,7 +359,8 @@ write_standalone_db(const char *file_name, const char *comment,
error = ovsdb_log_write_and_free(log, ovsdb_schema_to_json(db->schema));
if (!error) {
- error = ovsdb_log_write_and_free(log, ovsdb_to_txn_json(db, comment));
+ error = ovsdb_log_write_and_free(log,
+ ovsdb_to_txn_json(db, comment, true));
}
ovsdb_log_close(log);
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index 8cbefbe3d21..1c011fab00d 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -585,7 +585,9 @@ compaction_thread(void *aux)
struct json *data;
VLOG_DBG("%s: Compaction thread started.", state->db->name);
- data = ovsdb_to_txn_json(state->db, "compacting database online");
+ data = ovsdb_to_txn_json(state->db, "compacting database online",
+ /* Do not allow shallow copies to avoid races. */
+ false);
state->data = json_serialized_object_create(data);
json_destroy(data);
@@ -633,7 +635,8 @@ ovsdb_snapshot(struct ovsdb *db, bool trim_memory OVS_UNUSED)
if (!applied_index) {
/* Parallel compaction is not supported for standalone databases. */
state = xzalloc(sizeof *state);
- state->data = ovsdb_to_txn_json(db, "compacting database online");
+ state->data = ovsdb_to_txn_json(db,
+ "compacting database online", true);
state->schema = ovsdb_schema_to_json(db->schema);
} else if (ovsdb_snapshot_ready(db)) {
xpthread_join(db->snap_state->thread, NULL);
diff --git a/ovsdb/trigger.c b/ovsdb/trigger.c
index 7d3003bca32..01bb80e282b 100644
--- a/ovsdb/trigger.c
+++ b/ovsdb/trigger.c
@@ -282,7 +282,7 @@ ovsdb_trigger_try(struct ovsdb_trigger *t, long long int now)
/* Make the new copy into a transaction log record. */
struct json *txn_json = ovsdb_to_txn_json(
- newdb, "converted by ovsdb-server");
+ newdb, "converted by ovsdb-server", true);
/* Propose the change. */
t->progress = ovsdb_txn_propose_schema_change(
From edeefe762331095574be64b238320f4e7cd4f637 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Wed, 12 Oct 2022 11:19:41 +0200
Subject: [PATCH 011/833] github: Update versions of action dependencies.
checkout@v2, cache@v2 and setup-python@v2 are using outdated Node.js 12
which is now deprecated in GHA [1], so these actions will stop working
soon.
Updating to the most recent major versions with Node.js 16. This stops
GHA from throwing warnings in every build.
While at it, also updating upload-artifact to a more recent version.
[1] https://github.blog/changelog/2022-09-22-github-actions-all-actions-will-begin-running-on-node16-instead-of-node12/
Acked-by: David Marchand
Signed-off-by: Ilya Maximets
---
.github/workflows/build-and-test.yml | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml
index 58ab85e5d7e..7baa914034a 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -96,7 +96,7 @@ jobs:
steps:
- name: checkout
- uses: actions/checkout@v2
+ uses: actions/checkout@v3
- name: update PATH
run: |
@@ -104,7 +104,7 @@ jobs:
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: set up python
- uses: actions/setup-python@v2
+ uses: actions/setup-python@v4
with:
python-version: '3.9'
@@ -120,7 +120,7 @@ jobs:
- name: cache
if: matrix.dpdk != '' || matrix.dpdk_shared != ''
- uses: actions/cache@v2
+ uses: actions/cache@v3
env:
matrix_key: ${{ matrix.dpdk }}${{ matrix.dpdk_shared }}
ci_key: ${{ hashFiles('dpdk-ci-signature') }}
@@ -156,7 +156,7 @@ jobs:
- name: upload logs on failure
if: failure() || cancelled()
- uses: actions/upload-artifact@v2
+ uses: actions/upload-artifact@v3
with:
name: logs-linux-${{ join(matrix.*, '-') }}
path: logs.tgz
@@ -175,13 +175,13 @@ jobs:
steps:
- name: checkout
- uses: actions/checkout@v2
+ uses: actions/checkout@v3
- name: update PATH
run: |
echo "$HOME/bin" >> $GITHUB_PATH
echo "$HOME/.local/bin" >> $GITHUB_PATH
- name: set up python
- uses: actions/setup-python@v2
+ uses: actions/setup-python@v4
with:
python-version: '3.9'
- name: install dependencies
@@ -192,7 +192,7 @@ jobs:
run: ./.ci/osx-build.sh
- name: upload logs on failure
if: failure()
- uses: actions/upload-artifact@v2
+ uses: actions/upload-artifact@v3
with:
name: logs-osx-clang---disable-ssl
path: config.log
@@ -217,7 +217,7 @@ jobs:
steps:
- name: checkout
- uses: actions/checkout@v2
+ uses: actions/checkout@v3
- name: update PATH
run: |
@@ -239,7 +239,7 @@ jobs:
run: ./.ci/linux-build.sh
- name: upload deb packages
- uses: actions/upload-artifact@v2
+ uses: actions/upload-artifact@v3
with:
name: deb-packages-${{ matrix.dpdk }}-dpdk
path: '/home/runner/work/ovs/*.deb'
From 6f535383948664794ceccf5471e6d77000478877 Mon Sep 17 00:00:00 2001
From: Ben Pfaff
Date: Fri, 7 Jun 2019 16:28:24 -0700
Subject: [PATCH 012/833] ofproto-dpif-xlate: Do not use zero-weight buckets in
select groups.
The OpenFlow specification says that buckets in select groups with a weight
of zero should not be selected, but the ofproto-dpif implementation could
select them in corner cases. This fixes the problem.
Reported-by: ychen
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2019-May/359349.html
Signed-off-by: Ben Pfaff
Signed-off-by: Ilya Maximets
---
ofproto/ofproto-dpif-xlate.c | 16 +++++++++++-----
1 file changed, 11 insertions(+), 5 deletions(-)
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 3b9b26da171..81deb72d91c 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -1924,8 +1924,8 @@ group_is_alive(const struct xlate_ctx *ctx, uint32_t group_id, int depth)
#define MAX_LIVENESS_RECURSION 128 /* Arbitrary limit */
static bool
-bucket_is_alive(const struct xlate_ctx *ctx,
- struct ofputil_bucket *bucket, int depth)
+bucket_is_alive(const struct xlate_ctx *ctx, const struct group_dpif *group,
+ const struct ofputil_bucket *bucket, int depth)
{
if (depth >= MAX_LIVENESS_RECURSION) {
xlate_report_error(ctx, "bucket chaining exceeded %d links",
@@ -1933,6 +1933,12 @@ bucket_is_alive(const struct xlate_ctx *ctx,
return false;
}
+ /* In "select" groups, buckets with weight 0 are not used.
+ * In other kinds of groups, weight does not matter. */
+ if (group->up.type == OFPGT11_SELECT && bucket->weight == 0) {
+ return false;
+ }
+
return (!ofputil_bucket_has_liveness(bucket)
|| (bucket->watch_port != OFPP_ANY
&& bucket->watch_port != OFPP_CONTROLLER
@@ -1973,7 +1979,7 @@ group_first_live_bucket(const struct xlate_ctx *ctx,
{
struct ofputil_bucket *bucket;
LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
- if (bucket_is_alive(ctx, bucket, depth)) {
+ if (bucket_is_alive(ctx, group, bucket, depth)) {
return bucket;
}
xlate_report_bucket_not_live(ctx, bucket);
@@ -1992,7 +1998,7 @@ group_best_live_bucket(const struct xlate_ctx *ctx,
struct ofputil_bucket *bucket;
LIST_FOR_EACH (bucket, list_node, &group->up.buckets) {
- if (bucket_is_alive(ctx, bucket, 0)) {
+ if (bucket_is_alive(ctx, group, bucket, 0)) {
uint32_t score =
(hash_int(bucket->bucket_id, basis) & 0xffff) * bucket->weight;
if (score >= best_score) {
@@ -4755,7 +4761,7 @@ pick_dp_hash_select_group(struct xlate_ctx *ctx, struct group_dpif *group)
for (int i = 0; i <= hash_mask; i++) {
struct ofputil_bucket *b =
group->hash_map[(dp_hash + i) & hash_mask];
- if (bucket_is_alive(ctx, b, 0)) {
+ if (bucket_is_alive(ctx, group, b, 0)) {
return b;
}
}
From 31db0e043119cf597d720d94f70ec19cf5b8b7d4 Mon Sep 17 00:00:00 2001
From: Yanqin Wei
Date: Mon, 18 Nov 2019 10:45:18 +0800
Subject: [PATCH 013/833] cmap: Add thread fence for slot update.
Bucket updates in the cmap lib are protected by a counter. But the
hash store can be reordered before the counter update. This patch
fixes the issue by adding a thread fence between them.
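For context, a simplified sketch of the protocol (the real reader in
lib/cmap.c is more elaborate; read_counter() is a stand-in here): the
counter acts like a seqlock, so the slot contents must not become
visible before the odd counter value does:

    /* Writer (as in this patch): */
    atomic_store_explicit(&b->counter, c + 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_release); /* hash/next stores must
                                                * not move above this  */
    ovsrcu_set(&b->nodes[i].next, node);
    b->hashes[i] = hash;
    atomic_store_explicit(&b->counter, c + 2, memory_order_release);

    /* Reader (sketch): retry if the counter is odd or changed. */
    do {
        c1 = read_counter(b);              /* acquire */
        hash = b->hashes[i];
        c2 = read_counter(b);              /* acquire */
    } while (OVS_UNLIKELY((c1 & 1) || c1 != c2));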
Reviewed-by: Ola Liljedahl
Reviewed-by: Gavin Hu
Signed-off-by: Yanqin Wei
Signed-off-by: Ilya Maximets
---
lib/cmap.c | 4 +++-
1 file changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/cmap.c b/lib/cmap.c
index c9eef3f4aea..8ca893b0b25 100644
--- a/lib/cmap.c
+++ b/lib/cmap.c
@@ -598,7 +598,9 @@ cmap_set_bucket(struct cmap_bucket *b, int i,
uint32_t c;
atomic_read_explicit(&b->counter, &c, memory_order_acquire);
- atomic_store_explicit(&b->counter, c + 1, memory_order_release);
+ atomic_store_explicit(&b->counter, c + 1, memory_order_relaxed);
+ /* Need to make sure setting hash is not moved up before counter update. */
+ atomic_thread_fence(memory_order_release);
ovsrcu_set(&b->nodes[i].next, node); /* Also atomic. */
b->hashes[i] = hash;
atomic_store_explicit(&b->counter, c + 2, memory_order_release);
From 76ab364ea8facd73366411916d7d0f5ff611daed Mon Sep 17 00:00:00 2001
From: Eli Britstein
Date: Wed, 31 Aug 2022 12:59:55 +0300
Subject: [PATCH 014/833] netdev-offload: Set 'miss_api_supported' to be under
netdev.
The cited commit introduced a flag at the dpif-netdev level to
optimize performance and avoid hw_miss_packet_recover() for devices
with no such support.
However, there is a race condition between traffic processing and
assigning a 'flow_api' object to the netdev. In that case, EOPNOTSUPP
is returned by netdev_hw_miss_packet_recover() in the netdev-offload.c
layer because 'flow_api' is not yet initialized. As a result, the flag
is falsely disabled, and subsequent packets won't be recovered, though
they should be.
In order to fix it, move the flag into the netdev-offload layer, to
avoid that race.
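For illustration, the race being fixed (an approximate timeline):

    PMD thread (old per-rxq flag)         offload initialization
    -----------------------------         ----------------------
    first packet arrives
    netdev_hw_miss_packet_recover()
      -> EOPNOTSUPP, because
         'flow_api' is not assigned yet   netdev_assign_flow_api()
    per-rxq flag cleared permanently        assigns 'flow_api'
    later packets: recovery skipped,
    even though the device supports it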
Fixes: 6e50c1651869 ("dpif-netdev: Avoid hw_miss_packet_recover() for devices with no support.")
Signed-off-by: Eli Britstein
Signed-off-by: Ilya Maximets
---
lib/dpif-netdev.c | 18 +++++++-----------
lib/netdev-offload.c | 28 +++++++++++++++++++++++-----
lib/netdev-offload.h | 2 ++
lib/netdev.c | 1 +
4 files changed, 33 insertions(+), 16 deletions(-)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index a45b460145c..2c08a71c8db 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -431,7 +431,6 @@ struct dp_netdev_rxq {
unsigned intrvl_idx; /* Write index for 'cycles_intrvl'. */
struct dp_netdev_pmd_thread *pmd; /* pmd thread that polls this queue. */
bool is_vhost; /* Is rxq of a vhost port. */
- bool hw_miss_api_supported; /* hw_miss_packet_recover() supported.*/
/* Counters of cycles spent successfully polling and processing pkts. */
atomic_ullong cycles[RXQ_N_CYCLES];
@@ -5416,7 +5415,6 @@ port_reconfigure(struct dp_netdev_port *port)
port->rxqs[i].port = port;
port->rxqs[i].is_vhost = !strncmp(port->type, "dpdkvhost", 9);
- port->rxqs[i].hw_miss_api_supported = true;
err = netdev_rxq_open(netdev, &port->rxqs[i].rx, i);
if (err) {
@@ -8034,17 +8032,15 @@ dp_netdev_hw_flow(const struct dp_netdev_pmd_thread *pmd,
#ifdef ALLOW_EXPERIMENTAL_API /* Packet restoration API required. */
/* Restore the packet if HW processing was terminated before completion. */
struct dp_netdev_rxq *rxq = pmd->ctx.last_rxq;
+ bool miss_api_supported;
- if (rxq->hw_miss_api_supported) {
+ atomic_read_relaxed(&rxq->port->netdev->hw_info.miss_api_supported,
+ &miss_api_supported);
+ if (miss_api_supported) {
int err = netdev_hw_miss_packet_recover(rxq->port->netdev, packet);
- if (err) {
- if (err != EOPNOTSUPP) {
- COVERAGE_INC(datapath_drop_hw_miss_recover);
- return -1;
- } else {
- /* API unsupported by the port; avoid subsequent calls. */
- rxq->hw_miss_api_supported = false;
- }
+ if (err && err != EOPNOTSUPP) {
+ COVERAGE_INC(datapath_drop_hw_miss_recover);
+ return -1;
}
}
#endif
diff --git a/lib/netdev-offload.c b/lib/netdev-offload.c
index 9fde5f7a95f..4592262bd34 100644
--- a/lib/netdev-offload.c
+++ b/lib/netdev-offload.c
@@ -183,6 +183,7 @@ netdev_assign_flow_api(struct netdev *netdev)
CMAP_FOR_EACH (rfa, cmap_node, &netdev_flow_apis) {
if (!rfa->flow_api->init_flow_api(netdev)) {
ovs_refcount_ref(&rfa->refcnt);
+ atomic_store_relaxed(&netdev->hw_info.miss_api_supported, true);
ovsrcu_set(&netdev->flow_api, rfa->flow_api);
VLOG_INFO("%s: Assigned flow API '%s'.",
netdev_get_name(netdev), rfa->flow_api->type);
@@ -191,6 +192,7 @@ netdev_assign_flow_api(struct netdev *netdev)
VLOG_DBG("%s: flow API '%s' is not suitable.",
netdev_get_name(netdev), rfa->flow_api->type);
}
+ atomic_store_relaxed(&netdev->hw_info.miss_api_supported, false);
VLOG_INFO("%s: No suitable flow API found.", netdev_get_name(netdev));
return -1;
@@ -322,12 +324,28 @@ int
netdev_hw_miss_packet_recover(struct netdev *netdev,
struct dp_packet *packet)
{
- const struct netdev_flow_api *flow_api =
- ovsrcu_get(const struct netdev_flow_api *, &netdev->flow_api);
+ const struct netdev_flow_api *flow_api;
+ bool miss_api_supported;
+ int rv;
+
+ atomic_read_relaxed(&netdev->hw_info.miss_api_supported,
+ &miss_api_supported);
+ if (!miss_api_supported) {
+ return EOPNOTSUPP;
+ }
+
+ flow_api = ovsrcu_get(const struct netdev_flow_api *, &netdev->flow_api);
+ if (!flow_api || !flow_api->hw_miss_packet_recover) {
+ return EOPNOTSUPP;
+ }
+
+ rv = flow_api->hw_miss_packet_recover(netdev, packet);
+ if (rv == EOPNOTSUPP) {
+ /* API unsupported by the port; avoid subsequent calls. */
+ atomic_store_relaxed(&netdev->hw_info.miss_api_supported, false);
+ }
- return (flow_api && flow_api->hw_miss_packet_recover)
- ? flow_api->hw_miss_packet_recover(netdev, packet)
- : EOPNOTSUPP;
+ return rv;
}
int
diff --git a/lib/netdev-offload.h b/lib/netdev-offload.h
index 180d3f95f06..edc843cd99a 100644
--- a/lib/netdev-offload.h
+++ b/lib/netdev-offload.h
@@ -20,6 +20,7 @@
#include "openvswitch/netdev.h"
#include "openvswitch/types.h"
+#include "ovs-atomic.h"
#include "ovs-rcu.h"
#include "ovs-thread.h"
#include "openvswitch/ofp-meter.h"
@@ -46,6 +47,7 @@ struct ovs_action_push_tnl;
/* Offload-capable (HW) netdev information */
struct netdev_hw_info {
bool oor; /* Out of Offload Resources ? */
+ atomic_bool miss_api_supported; /* hw_miss_packet_recover() supported.*/
int offload_count; /* Pending (non-offloaded) flow count */
int pending_count; /* Offloaded flow count */
OVSRCU_TYPE(void *) offload_data; /* Offload metadata. */
diff --git a/lib/netdev.c b/lib/netdev.c
index ce0d4117ac0..c797783782f 100644
--- a/lib/netdev.c
+++ b/lib/netdev.c
@@ -431,6 +431,7 @@ netdev_open(const char *name, const char *type, struct netdev **netdevp)
seq_read(netdev->reconfigure_seq);
ovsrcu_set(&netdev->flow_api, NULL);
netdev->hw_info.oor = false;
+ atomic_init(&netdev->hw_info.miss_api_supported, false);
netdev->node = shash_add(&netdev_shash, name, netdev);
/* By default enable one tx and rx queue per netdev. */
From 77f739914d406665dc17733a6cdd4fff9a80f7a3 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Tue, 27 Sep 2022 17:32:55 +0200
Subject: [PATCH 015/833] ofproto-dpif-xlate: Allow sample when no in_port.
OVN can (and indeed does) set in_port to OFPP_NONE during
the pipeline evaluation. If a sample action follows, it
will be incorrectly skipped.
Per-flow sampling version of:
f0a9000ca ofproto: Fix ipfix not always sampling on egress.
Signed-off-by: Adrian Moreno
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
ofproto/ofproto-dpif-xlate.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 81deb72d91c..5d2af93fa26 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -5699,7 +5699,7 @@ xlate_sample_action(struct xlate_ctx *ctx,
struct dpif_ipfix *ipfix = ctx->xbridge->ipfix;
bool emit_set_tunnel = false;
- if (!ipfix || ctx->xin->flow.in_port.ofp_port == OFPP_NONE) {
+ if (!ipfix) {
return;
}
From f7ae3f93c8511962c0198984004b7c10eb574c9c Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 6 Oct 2022 21:37:24 +0200
Subject: [PATCH 016/833] tests: Fix filtering of whole-second durations.
Current macros are unable to filter whole seconds, e.g. 'duration:6s'.
This is causing random test failures, most frequently in CirrusCI:
./dpif-netdev.at:370: ovs-ofctl -O OpenFlow13 meter-stats br0 | strip_timers
--- -
+++ /tmp/cirrus-ci-build/tests/testsuite.dir/at-groups/990/stdout
@@ -1,5 +1,5 @@
OFPST_METER reply (OF1.3) (xid=0x2):
-meter:1 flow_count:1 packet_in_count:10 byte_in_count:600 duration:0.0s bands:
+meter:1 flow_count:1 packet_in_count:10 byte_in_count:600 duration:6s bands:
Fix the sed matches to handle that scenario correctly.
The [0-9\.] is repeated twice because it is hard to write a shorter
portable version with sed.
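For example, with the updated expression both forms are normalized,
while the old expression (which required a '.') left the first one
untouched:

    duration:6s     ->  duration:0.0s
    duration:0.52s  ->  duration:0.0s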
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
tests/dpif-netdev.at | 10 +++++-----
tests/stp.at | 2 +-
2 files changed, 6 insertions(+), 6 deletions(-)
diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
index 3179e1645d8..6aff1eda7b0 100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -6,8 +6,8 @@ m4_divert_push([PREPARE_TESTS])
# that vary from one run to another (e.g., timing and bond actions).
strip_timers () {
sed '
- s/duration:[0-9]*\.[0-9]*/duration:0.0/
- s/used:[0-9]*\.[0-9]*/used:0.0/
+ s/duration:[0-9\.][0-9\.]*/duration:0.0/
+ s/used:[0-9\.][0-9\.]*/used:0.0/
'
}
@@ -15,7 +15,7 @@ strip_xout () {
sed '
s/mega_ufid:[-0-9a-f]* //
s/ufid:[-0-9a-f]* //
- s/used:[0-9]*\.[0-9]*/used:0.0/
+ s/used:[0-9\.][0-9\.]*/used:0.0/
s/actions:.*/actions: /
s/packets:[0-9]*/packets:0/
s/bytes:[0-9]*/bytes:0/
@@ -26,7 +26,7 @@ strip_xout_keep_actions () {
sed '
s/mega_ufid:[-0-9a-f]* //
s/ufid:[-0-9a-f]* //
- s/used:[0-9]*\.[0-9]*/used:0.0/
+ s/used:[0-9\.][0-9\.]*/used:0.0/
s/packets:[0-9]*/packets:0/
s/bytes:[0-9]*/bytes:0/
' | sort
@@ -51,7 +51,7 @@ filter_hw_packet_netdev_dummy () {
filter_flow_dump () {
grep 'flow_dump ' | sed '
s/.*flow_dump //
- s/used:[0-9]*\.[0-9]*/used:0.0/
+ s/used:[0-9\.][0-9\.]*/used:0.0/
' | sort | uniq
}
diff --git a/tests/stp.at b/tests/stp.at
index 7ddacfc3a0e..69475843e55 100644
--- a/tests/stp.at
+++ b/tests/stp.at
@@ -368,7 +368,7 @@ AT_CLEANUP
# Strips out uninteresting parts of flow output, as well as parts
# that vary from one run to another (e.g., timing and bond actions).
m4_define([STRIP_USED], [[sed '
- s/used:[0-9]*\.[0-9]*/used:0.0/
+ s/used:[0-9\.][0-9\.]*/used:0.0/
s/duration=[0-9.]*s*/duration=Xs/
s/idle_age=[0-9]*,/idle_age=X,/
']])
From 9c27bd230f7f108974157d858e71b3eda2139d08 Mon Sep 17 00:00:00 2001
From: Paolo Valerio
Date: Wed, 12 Oct 2022 16:36:13 +0200
Subject: [PATCH 017/833] ct-dpif: Replace ct_dpif_format_flags() with
format_flags_masked().
This patch removes ct_dpif_format_flags() in favor of the existing
format_flags_masked().
This has the extra bonus of showing keys with empty values as "key=0",
instead of showing "key=".
E.g., the following:
NEW tcp,orig=([...]),reply=([...]),id=1800618864,
status=CONFIRMED|SRC_NAT_DONE|DST_NAT_DONE,timeout=120,
protoinfo=(state_orig=SYN_SENT,state_reply=SYN_SENT,wscale_orig=7,
wscale_reply=0,flags_orig=WINDOW_SCALE|SACK_PERM,flags_reply=)
becomes:
NEW tcp,orig=([...]),reply=([...]),id=1800618864,
status=CONFIRMED|SRC_NAT_DONE|DST_NAT_DONE,timeout=120,
protoinfo=(state_orig=SYN_SENT,state_reply=SYN_SENT,wscale_orig=7,
wscale_reply=0,flags_orig=WINDOW_SCALE|SACK_PERM,flags_reply=0)
Signed-off-by: Paolo Valerio
Signed-off-by: Ilya Maximets
---
lib/ct-dpif.c | 76 ++++++++++++++++++++++++++-------------------------
lib/ct-dpif.h | 4 +++
2 files changed, 43 insertions(+), 37 deletions(-)
diff --git a/lib/ct-dpif.c b/lib/ct-dpif.c
index cfc2315e3dc..6f17a26b5f4 100644
--- a/lib/ct-dpif.c
+++ b/lib/ct-dpif.c
@@ -35,20 +35,11 @@ static void ct_dpif_format_counters(struct ds *,
const struct ct_dpif_counters *);
static void ct_dpif_format_timestamp(struct ds *,
const struct ct_dpif_timestamp *);
-static void ct_dpif_format_flags(struct ds *, const char *title,
- uint32_t flags, const struct flags *);
static void ct_dpif_format_protoinfo(struct ds *, const char *title,
const struct ct_dpif_protoinfo *,
bool verbose);
static void ct_dpif_format_helper(struct ds *, const char *title,
const struct ct_dpif_helper *);
-
-static const struct flags ct_dpif_status_flags[] = {
-#define CT_DPIF_STATUS_FLAG(FLAG) { CT_DPIF_STATUS_##FLAG, #FLAG },
- CT_DPIF_STATUS_FLAGS
-#undef CT_DPIF_STATUS_FLAG
- { 0, NULL } /* End marker. */
-};
/* Dumping */
@@ -275,6 +266,20 @@ ct_dpif_entry_uninit(struct ct_dpif_entry *entry)
}
}
+static const char *
+ct_dpif_status_flags(uint32_t flags)
+{
+ switch (flags) {
+#define CT_DPIF_STATUS_FLAG(FLAG) \
+ case CT_DPIF_STATUS_##FLAG: \
+ return #FLAG;
+ CT_DPIF_STATUS_FLAGS
+#undef CT_DPIF_TCP_FLAG
+ default:
+ return NULL;
+ }
+}
+
void
ct_dpif_format_entry(const struct ct_dpif_entry *entry, struct ds *ds,
bool verbose, bool print_stats)
@@ -305,8 +310,9 @@ ct_dpif_format_entry(const struct ct_dpif_entry *entry, struct ds *ds,
ds_put_format(ds, ",zone=%"PRIu16, entry->zone);
}
if (verbose) {
- ct_dpif_format_flags(ds, ",status=", entry->status,
- ct_dpif_status_flags);
+ format_flags_masked(ds, ",status", ct_dpif_status_flags,
+ entry->status, CT_DPIF_STATUS_MASK,
+ CT_DPIF_STATUS_MASK);
}
if (print_stats) {
ds_put_format(ds, ",timeout=%"PRIu32, entry->timeout);
@@ -415,28 +421,6 @@ ct_dpif_format_tuple(struct ds *ds, const struct ct_dpif_tuple *tuple)
}
}
-static void
-ct_dpif_format_flags(struct ds *ds, const char *title, uint32_t flags,
- const struct flags *table)
-{
- if (title) {
- ds_put_cstr(ds, title);
- }
- for (; table->name; table++) {
- if (flags & table->flag) {
- ds_put_format(ds, "%s|", table->name);
- }
- }
- ds_chomp(ds, '|');
-}
-
-static const struct flags tcp_flags[] = {
-#define CT_DPIF_TCP_FLAG(FLAG) { CT_DPIF_TCPF_##FLAG, #FLAG },
- CT_DPIF_TCP_FLAGS
-#undef CT_DPIF_TCP_FLAG
- { 0, NULL } /* End marker. */
-};
-
const char *ct_dpif_tcp_state_string[] = {
#define CT_DPIF_TCP_STATE(STATE) [CT_DPIF_TCPS_##STATE] = #STATE,
CT_DPIF_TCP_STATES
@@ -498,6 +482,20 @@ ct_dpif_format_protoinfo_tcp(struct ds *ds,
ct_dpif_format_enum(ds, "state=", tcp_state, ct_dpif_tcp_state_string);
}
+static const char *
+ct_dpif_tcp_flags(uint32_t flags)
+{
+ switch (flags) {
+#define CT_DPIF_TCP_FLAG(FLAG) \
+ case CT_DPIF_TCPF_##FLAG: \
+ return #FLAG;
+ CT_DPIF_TCP_FLAGS
+#undef CT_DPIF_TCP_FLAG
+ default:
+ return NULL;
+ }
+}
+
static void
ct_dpif_format_protoinfo_tcp_verbose(struct ds *ds,
const struct ct_dpif_protoinfo *protoinfo)
@@ -512,10 +510,14 @@ ct_dpif_format_protoinfo_tcp_verbose(struct ds *ds,
protoinfo->tcp.wscale_orig,
protoinfo->tcp.wscale_reply);
}
- ct_dpif_format_flags(ds, ",flags_orig=", protoinfo->tcp.flags_orig,
- tcp_flags);
- ct_dpif_format_flags(ds, ",flags_reply=", protoinfo->tcp.flags_reply,
- tcp_flags);
+
+ format_flags_masked(ds, ",flags_orig", ct_dpif_tcp_flags,
+ protoinfo->tcp.flags_orig, CT_DPIF_TCPF_MASK,
+ CT_DPIF_TCPF_MASK);
+
+ format_flags_masked(ds, ",flags_reply", ct_dpif_tcp_flags,
+ protoinfo->tcp.flags_reply, CT_DPIF_TCPF_MASK,
+ CT_DPIF_TCPF_MASK);
}
static void
diff --git a/lib/ct-dpif.h b/lib/ct-dpif.h
index b59cba962a7..2848549b0ba 100644
--- a/lib/ct-dpif.h
+++ b/lib/ct-dpif.h
@@ -103,6 +103,8 @@ enum ct_dpif_tcp_flags {
#undef CT_DPIF_TCP_FLAG
};
+#define CT_DPIF_TCPF_MASK ((CT_DPIF_TCPF_MAXACK_SET << 1) - 1)
+
extern const char *ct_dpif_sctp_state_string[];
#define CT_DPIF_SCTP_STATES \
@@ -173,6 +175,8 @@ enum ct_dpif_status_flags {
#undef CT_DPIF_STATUS_FLAG
};
+#define CT_DPIF_STATUS_MASK ((CT_DPIF_STATUS_UNTRACKED << 1) - 1)
+
struct ct_dpif_entry {
/* Const members. */
struct ct_dpif_tuple tuple_orig;
From ba9e387dc4f4acd8dd7ff9188296a4442d16576c Mon Sep 17 00:00:00 2001
From: Wilson Peng
Date: Tue, 25 Oct 2022 15:37:48 +0800
Subject: [PATCH 018/833] unaligned: Correct the stats of packet_count and
byte_count on Windows.
The stats (byte_count) are obtained via the function call
ofputil_decode_flow_stats_reply(), which for OpenFlow15 also calls
oxs_pull_entry__(). We found that on Windows the byte_count counter is
incorrect. For OpenFlow15 the byte_count is read via
ntohll(get_unaligned_be64(payload)).
Quoting the comments below from Ilya Maximets (thanks for the provided
solution and explanation):
static inline uint64_t get_unaligned_u64__(const uint64_t *p_)
...
return ntohll(((uint64_t) p[0] << 56)
| ((uint64_t) p[1] << 48)
| ((uint64_t) p[2] << 40)
| ((uint64_t) p[3] << 32)
| (p[4] << 24)
| (p[5] << 16)
| (p[6] << 8)
| p[7]);
And indeed the expression above has an issue with data types.
The problem is the (p[4] << 24) part. The p[4] itself has a type
'uint8_t' which is unsigned 8bit value. It is not enough to hold
the result of a left shift, so compiler automatically promotes it
to the 'int' by default. But it is *signed* 32bit value.
In your original report p[4] was equal to 0x81. After the left
shift it became 0x81000000. Looks correct, but the type is 'int'.
The next operation that we do is '|' with the previous shifted
bytes that were explicitly converted to uint64_t before the left
shift. So we have uint64_t | int. In this case the compiler needs
to extend the 'int' to 'uint64_t' before performing the operation.
And since the 'int' is signed and the sign bit happens to be set
in the 0x81000000, the sign extension is performed in order to
preserve the value. The result is 0xffffffff81000000. And that
is breaking everything else.
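To make the promotion visible in isolation, here is a small self-contained
C demonstration of the effect described above. It is an illustration only,
not OVS code (strictly speaking, shifting 0x81 into the sign bit of an
'int' is undefined behavior, but common compilers produce the sign-extended
result shown in the comments):

    #include <inttypes.h>
    #include <stdio.h>

    int main(void)
    {
        uint8_t p4 = 0x81;

        /* 'p4' is promoted to a signed 'int' before the shift; OR-ing the
         * result into a uint64_t then sign-extends it. */
        uint64_t bad  = (uint64_t) 0 | (p4 << 24);

        /* Casting to uint64_t before the shift keeps the value intact. */
        uint64_t good = (uint64_t) 0 | ((uint64_t) p4 << 24);

        printf("bad  = 0x%016" PRIx64 "\n", bad);   /* 0xffffffff81000000 */
        printf("good = 0x%016" PRIx64 "\n", good);  /* 0x0000000081000000 */
        return 0;
    }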
The test below shows that the n_bytes counter is incorrect when dumping
flows via OpenFlow15 with 'ovs-ofctl dump-flows'. With the patch,
get_unaligned_u64__() returns the correct value to the caller on Windows.
In the output below (obtained with the original ovs-ofctl.exe, without the
fix), n_bytes 2177130813 is incorrectly reported as 18446744071591715133
(0xFFFFFFFF81C4613D) when processing OpenFlow15; here p[4] on Windows
is 0x81.
ovs-ofctl.exe -O OpenFlow15 dump-flows nsx-managed | findstr 1516011
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=18446744071591715133,
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=18446744071591715133,
ovs-ofctl.exe -O OpenFlow10 dump-flows nsx-managed | findstr 1516011
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813,
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813,
ovs-ofctl.exe dump-flows nsx-managed | findstr 1516011
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813,
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813,
With the fix, the newly compiled ovs-ofctl1025.exe dumps the correct
n_bytes counter via OpenFlow15.
ovs-ofctl1025.exe -O OpenFlow15 dump-flows nsx-managed | findstr 1516011
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813,
cookie=<>, duration=<>s, table=4, n_packets=1516011, n_bytes=2177130813,
Fixes: afa3a93165f1 ("Add header for access to potentially unaligned data.")
Signed-off-by: Wilson Peng
Signed-off-by: Ilya Maximets
---
lib/unaligned.h | 13 +++++++------
1 file changed, 7 insertions(+), 6 deletions(-)
diff --git a/lib/unaligned.h b/lib/unaligned.h
index f40e4e10df6..15334e3c764 100644
--- a/lib/unaligned.h
+++ b/lib/unaligned.h
@@ -95,7 +95,7 @@ GCC_UNALIGNED_ACCESSORS(ovs_be64, be64);
static inline uint16_t get_unaligned_u16(const uint16_t *p_)
{
const uint8_t *p = (const uint8_t *) p_;
- return ntohs((p[0] << 8) | p[1]);
+ return ntohs(((uint16_t) p[0] << 8) | (uint16_t) p[1]);
}
static inline void put_unaligned_u16(uint16_t *p_, uint16_t x_)
@@ -110,7 +110,8 @@ static inline void put_unaligned_u16(uint16_t *p_, uint16_t x_)
static inline uint32_t get_unaligned_u32(const uint32_t *p_)
{
const uint8_t *p = (const uint8_t *) p_;
- return ntohl((p[0] << 24) | (p[1] << 16) | (p[2] << 8) | p[3]);
+ return ntohl(((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
+ ((uint32_t) p[2] << 8) | (uint32_t) p[3]);
}
static inline void put_unaligned_u32(uint32_t *p_, uint32_t x_)
@@ -131,10 +132,10 @@ static inline uint64_t get_unaligned_u64__(const uint64_t *p_)
| ((uint64_t) p[1] << 48)
| ((uint64_t) p[2] << 40)
| ((uint64_t) p[3] << 32)
- | (p[4] << 24)
- | (p[5] << 16)
- | (p[6] << 8)
- | p[7]);
+ | ((uint64_t) p[4] << 24)
+ | ((uint64_t) p[5] << 16)
+ | ((uint64_t) p[6] << 8)
+ | (uint64_t) p[7]);
}
static inline void put_unaligned_u64__(uint64_t *p_, uint64_t x_)
From 850e639021125c3646effa0eae9e422082ade2ca Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Tue, 25 Oct 2022 23:57:40 +0200
Subject: [PATCH 019/833] AUTHORS: Add Wilson Peng.
Signed-off-by: Ilya Maximets
---
AUTHORS.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/AUTHORS.rst b/AUTHORS.rst
index c13cf60c5e8..145387ce94f 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -460,6 +460,7 @@ Wei Yongjun yjwei@cn.fujitsu.com
Wenyu Zhang wenyuz@vmware.com
William Fulton
William Tu u9012063@gmail.com
+Wilson Peng pweisong@vmware.com
Xavier Simonart xsimonar@redhat.com
Xiao Liang shaw.leon@gmail.com
xu rong xu.rong@zte.com.cn
From 7a5ee32518dfd55dc207abaf92e1ae1c25b857cc Mon Sep 17 00:00:00 2001
From: Roi Dayan
Date: Sun, 23 Oct 2022 09:27:10 +0300
Subject: [PATCH 020/833] tc: On last action use drop action attribute instead
of pipe
OVN sets a ct drop rule with a ct clear action.
The OVS datapath behavior is that if there is no forward action,
the default is to drop.
The TC behavior is to continue with the next match.
Fix this by matching TC to the OVS behavior: set the attribute of the
last action to drop instead of pipe.
Also update lastused when parsing the ct action.
Example rule:
recirc_id(0x1),in_port(2),ct_state(+trk),eth(),eth_type(0x0800),ipv4(frag=no),
packets:82, bytes:8036, used:2.108s, actions:ct_clear
Reviewed-by: Maor Dickman
Signed-off-by: Roi Dayan
Signed-off-by: Simon Horman
---
lib/tc.c | 15 ++++++++++++++-
1 file changed, 14 insertions(+), 1 deletion(-)
diff --git a/lib/tc.c b/lib/tc.c
index 94044cde606..f8419e637b9 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1541,6 +1541,9 @@ static const struct nl_policy ct_policy[] = {
.optional = true, },
[TCA_CT_NAT_PORT_MAX] = { .type = NL_A_U16,
.optional = true, },
+ [TCA_CT_TM] = { .type = NL_A_UNSPEC,
+ .min_len = sizeof(struct tcf_t),
+ .optional = true, },
};
static int
@@ -1551,6 +1554,7 @@ nl_parse_act_ct(struct nlattr *options, struct tc_flower *flower)
struct tc_action *action;
const struct tc_ct *ct;
uint16_t ct_action = 0;
+ struct tcf_t tm;
if (!nl_parse_nested(options, ct_policy, ct_attrs,
ARRAY_SIZE(ct_policy))) {
@@ -1636,6 +1640,11 @@ nl_parse_act_ct(struct nlattr *options, struct tc_flower *flower)
}
action->type = TC_ACT_CT;
+ if (ct_attrs[TCA_CT_TM]) {
+ memcpy(&tm, nl_attr_get_unspec(ct_attrs[TCA_CT_TM], sizeof tm),
+ sizeof tm);
+ nl_parse_tcf(&tm, flower);
+ }
nl_parse_action_pc(ct->action, action);
return 0;
}
@@ -3126,7 +3135,11 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct tc_flower *flower)
uint32_t action_pc; /* Programmatic Control */
if (!action->jump_action) {
- action_pc = TC_ACT_PIPE;
+ if (i == flower->action_count - 1) {
+ action_pc = TC_ACT_SHOT;
+ } else {
+ action_pc = TC_ACT_PIPE;
+ }
} else if (action->jump_action == JUMP_ACTION_STOP) {
action_pc = TC_ACT_STOLEN;
} else {
From 743499607bdd0dcb3541a179ba2bb41ea10c4b3b Mon Sep 17 00:00:00 2001
From: Tianyu Yuan
Date: Wed, 12 Oct 2022 08:42:28 +0800
Subject: [PATCH 021/833] Revert "tc: Fix stats dump when using same meter
table"
This reverts commit dd9881ed55e6 ('tc: Fix stats dump when
using same meter table')
This patch doesn't solve the tc flow stats update issue and leads to
failures in the system-offloads-traffic testsuite: it only counts
packets surviving after the tc filter, rather than those hitting the
filter.
A follow-up patch will be posted to solve this flow stats update issue.
Signed-off-by: Tianyu Yuan
Acked-by: Ilya Maximets
Signed-off-by: Simon Horman
---
lib/tc.c | 11 -----------
1 file changed, 11 deletions(-)
diff --git a/lib/tc.c b/lib/tc.c
index f8419e637b9..3b591975b12 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1913,8 +1913,6 @@ nl_parse_single_action(struct nlattr *action, struct tc_flower *flower,
struct nlattr *act_cookie;
const char *act_kind;
struct nlattr *action_attrs[ARRAY_SIZE(act_policy)];
- int act_index = flower->action_count;
- bool is_meter = false;
int err = 0;
if (!nl_parse_nested(action, act_policy, action_attrs,
@@ -1952,7 +1950,6 @@ nl_parse_single_action(struct nlattr *action, struct tc_flower *flower,
nl_parse_act_ct(act_options, flower);
} else if (!strcmp(act_kind, "police")) {
nl_parse_act_police(act_options, flower);
- is_meter = tc_is_meter_index(flower->actions[act_index].police.index);
} else {
VLOG_ERR_RL(&error_rl, "unknown tc action kind: %s", act_kind);
err = EINVAL;
@@ -1967,14 +1964,6 @@ nl_parse_single_action(struct nlattr *action, struct tc_flower *flower,
flower->act_cookie.len = nl_attr_get_size(act_cookie);
}
- /* Skip the stats update when act_police is meter since there are always
- * some other actions following meter. For other potential kinds of
- * act_police actions, whose stats could not be skipped (e.g. filter has
- * only one police action), update the action stats to the flow rule. */
- if (is_meter) {
- return 0;
- }
-
return nl_parse_action_stats(action_attrs[TCA_ACT_STATS],
&flower->stats_sw, &flower->stats_hw, NULL);
}
From ffcb6f115fe5e00be3ca8fb9a940a3224e687e23 Mon Sep 17 00:00:00 2001
From: Baowen Zheng
Date: Fri, 30 Sep 2022 14:07:56 +0800
Subject: [PATCH 022/833] netdev-linux: Allow meter to work in tc software
datapath when tc-policy is specified
Add tc action flags when adding a police action to offload a meter table.
There is a restriction that the skip_sw/skip_hw flag must be the same for
a filter rule and the independently created tc actions that the rule uses.
If tc-policy is configured as skip_hw, the filter rule is created with the
skip_hw flag while the police action created for the meter table carries
no action flag, so adding the flower rule to the tc kernel subsystem fails.
To fix this, add the tc action flag when adding a police action to
offload a meter table, which allows the meter table to work in the tc
software datapath.
Fixes: 5c039ddc64ff ("netdev-linux: Add functions to manipulate tc police action")
Signed-off-by: Baowen Zheng
Acked-by: Ilya Maximets
Signed-off-by: Simon Horman
---
acinclude.m4 | 6 +++---
include/linux/pkt_cls.h | 11 +++++++----
lib/netdev-linux.c | 20 ++++++++++++++------
lib/tc.c | 21 +++++++++++++++++++++
lib/tc.h | 2 ++
5 files changed, 47 insertions(+), 13 deletions(-)
diff --git a/acinclude.m4 b/acinclude.m4
index ad07989ac29..aa9af55062f 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -163,10 +163,10 @@ dnl Configure Linux tc compat.
AC_DEFUN([OVS_CHECK_LINUX_TC], [
AC_COMPILE_IFELSE([
AC_LANG_PROGRAM([#include ], [
- int x = TCA_POLICE_PKTRATE64;
+ int x = TCA_ACT_FLAGS_SKIP_HW;
])],
- [AC_DEFINE([HAVE_TCA_POLICE_PKTRATE64], [1],
- [Define to 1 if TCA_POLICE_PKTRATE64 is available.])])
+ [AC_DEFINE([HAVE_TCA_ACT_FLAGS_SKIP_HW], [1],
+ [Define to 1 if TCA_ACT_FLAGS_SKIP_HW is available.])])
AC_CHECK_MEMBERS([struct tcf_t.firstuse], [], [], [#include ])
diff --git a/include/linux/pkt_cls.h b/include/linux/pkt_cls.h
index ba82e690eba..a8cd8db5bf8 100644
--- a/include/linux/pkt_cls.h
+++ b/include/linux/pkt_cls.h
@@ -1,7 +1,7 @@
#ifndef __LINUX_PKT_CLS_WRAPPER_H
#define __LINUX_PKT_CLS_WRAPPER_H 1
-#if defined(__KERNEL__) || defined(HAVE_TCA_POLICE_PKTRATE64)
+#if defined(__KERNEL__) || defined(HAVE_TCA_ACT_FLAGS_SKIP_HW)
#include_next
#else
@@ -21,9 +21,12 @@ enum {
__TCA_ACT_MAX
};
-#define TCA_ACT_FLAGS_NO_PERCPU_STATS 1 /* Don't use percpu allocator for
- * actions stats.
- */
+/* See other TCA_ACT_FLAGS_ * flags in include/net/act_api.h. */
+#define TCA_ACT_FLAGS_NO_PERCPU_STATS (1 << 0) /* Don't use percpu allocator for
+ * actions stats.
+ */
+#define TCA_ACT_FLAGS_SKIP_HW (1 << 1) /* don't offload action to HW */
+#define TCA_ACT_FLAGS_SKIP_SW (1 << 2) /* don't use action in SW */
#define TCA_ACT_MAX __TCA_ACT_MAX
#define TCA_OLD_COMPAT (TCA_ACT_MAX+1)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index cdc66246ced..7ea4070c23a 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2623,10 +2623,17 @@ tc_matchall_fill_police(uint32_t kbits_rate, uint32_t kbits_burst)
static void
nl_msg_act_police_start_nest(struct ofpbuf *request, uint32_t prio,
- size_t *offset, size_t *act_offset)
+ size_t *offset, size_t *act_offset,
+ bool single_action)
{
*act_offset = nl_msg_start_nested(request, prio);
nl_msg_put_string(request, TCA_ACT_KIND, "police");
+
+ /* If police action is added independently from filter, we need to
+ * add action flag according to tc-policy. */
+ if (single_action) {
+ nl_msg_put_act_tc_policy_flag(request);
+ }
*offset = nl_msg_start_nested(request, TCA_ACT_OPTIONS);
}
@@ -2642,7 +2649,7 @@ nl_msg_act_police_end_nest(struct ofpbuf *request, size_t offset,
static void
nl_msg_put_act_police(struct ofpbuf *request, struct tc_police *police,
uint64_t pkts_rate, uint64_t pkts_burst,
- uint32_t notexceed_act)
+ uint32_t notexceed_act, bool single_action)
{
size_t offset, act_offset;
uint32_t prio = 0;
@@ -2651,7 +2658,8 @@ nl_msg_put_act_police(struct ofpbuf *request, struct tc_police *police,
return;
}
- nl_msg_act_police_start_nest(request, ++prio, &offset, &act_offset);
+ nl_msg_act_police_start_nest(request, ++prio, &offset, &act_offset,
+ single_action);
if (police->rate.rate) {
tc_put_rtab(request, TCA_POLICE_RATE, &police->rate);
}
@@ -2698,7 +2706,7 @@ tc_add_matchall_policer(struct netdev *netdev, uint32_t kbits_rate,
basic_offset = nl_msg_start_nested(&request, TCA_OPTIONS);
action_offset = nl_msg_start_nested(&request, TCA_MATCHALL_ACT);
nl_msg_put_act_police(&request, &pol_act, kpkts_rate * 1000,
- kpkts_burst * 1000, TC_ACT_UNSPEC);
+ kpkts_burst * 1000, TC_ACT_UNSPEC, false);
nl_msg_end_nested(&request, action_offset);
nl_msg_end_nested(&request, basic_offset);
@@ -5667,7 +5675,7 @@ tc_add_policer(struct netdev *netdev, uint32_t kbits_rate,
police_offset = nl_msg_start_nested(&request, TCA_BASIC_ACT);
tc_policer_init(&tc_police, kbits_rate, kbits_burst);
nl_msg_put_act_police(&request, &tc_police, kpkts_rate * 1000ULL,
- kpkts_burst * 1000ULL, TC_ACT_UNSPEC);
+ kpkts_burst * 1000ULL, TC_ACT_UNSPEC, false);
nl_msg_end_nested(&request, police_offset);
nl_msg_end_nested(&request, basic_offset);
@@ -5702,7 +5710,7 @@ tc_add_policer_action(uint32_t index, uint32_t kbits_rate,
offset = nl_msg_start_nested(&request, TCA_ACT_TAB);
nl_msg_put_act_police(&request, &tc_police, pkts_rate, pkts_burst,
- TC_ACT_PIPE);
+ TC_ACT_PIPE, true);
nl_msg_end_nested(&request, offset);
error = tc_transact(&request, NULL);
diff --git a/lib/tc.c b/lib/tc.c
index 3b591975b12..4d7de8adde4 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -3810,3 +3810,24 @@ tc_set_policy(const char *policy)
VLOG_INFO("tc: Using policy '%s'", policy);
}
+
+void
+nl_msg_put_act_tc_policy_flag(struct ofpbuf *request)
+{
+ int flag = 0;
+
+ if (!request) {
+ return;
+ }
+
+ if (tc_policy == TC_POLICY_SKIP_HW) {
+ flag = TCA_ACT_FLAGS_SKIP_HW;
+ } else if (tc_policy == TC_POLICY_SKIP_SW) {
+ flag = TCA_ACT_FLAGS_SKIP_SW;
+ }
+
+ if (flag) {
+ struct nla_bitfield32 flags = { flag, flag };
+ nl_msg_put_unspec(request, TCA_ACT_FLAGS, &flags, sizeof flags);
+ }
+}
diff --git a/lib/tc.h b/lib/tc.h
index 2e64ad37259..161f438124b 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -399,4 +399,6 @@ int tc_parse_action_stats(struct nlattr *action,
int tc_dump_tc_action_start(char *name, struct nl_dump *dump);
int parse_netlink_to_tc_policer(struct ofpbuf *reply, uint32_t police_idx[]);
+void nl_msg_put_act_tc_policy_flag(struct ofpbuf *request);
+
#endif /* tc.h */
From 97873af3734a9300f5eb29f664513edc839cf88a Mon Sep 17 00:00:00 2001
From: Robin Jarry
Date: Wed, 27 Apr 2022 10:15:25 +0200
Subject: [PATCH 023/833] Documentation: Use new syntax for dpdk port
representors.
Since DPDK 21.05, the representor identifier now handles a relative VF
offset. The legacy representor ID seems to be valid only in certain cases
(the first dpdk port).
Link: https://github.com/DPDK/dpdk/commit/cebf7f17159a8
Signed-off-by: Robin Jarry
Signed-off-by: Ilya Maximets
---
Documentation/topics/dpdk/phy.rst | 12 ++++++------
lib/netdev-dpdk.c | 2 +-
2 files changed, 7 insertions(+), 7 deletions(-)
diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index 937f4c40e5a..8fc34a378cb 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -267,7 +267,7 @@ Representors are multi devices created on top of one PF.
For more information, refer to the `DPDK documentation`__.
-__ https://doc.dpdk.org/guides-21.11/prog_guide/switch_representation.html
+__ https://doc.dpdk.org/guides-21.11/prog_guide/switch_representation.html#port-representors
Prior to port representors there was a one-to-one relationship between the PF
and the eth device. With port representors the relationship becomes one PF to
@@ -287,18 +287,18 @@ address in devargs. For an existing bridge called ``br0`` and PCI address
When configuring a VF-based port, DPDK uses an extended devargs syntax which
has the following format::
- BDBF,representor=[]
+ BDBF,representor=
This syntax shows that a representor is an enumerated eth device (with
-a representor ID) which uses the PF PCI address.
-The following commands add representors 3 and 5 using PCI device address
+a representor identifier) which uses the PF PCI address.
+The following commands add representors of VF 3 and 5 using PCI device address
``0000:08:00.0``::
$ ovs-vsctl add-port br0 dpdk-rep3 -- set Interface dpdk-rep3 type=dpdk \
- options:dpdk-devargs=0000:08:00.0,representor=[3]
+ options:dpdk-devargs=0000:08:00.0,representor=vf3
$ ovs-vsctl add-port br0 dpdk-rep5 -- set Interface dpdk-rep5 type=dpdk \
- options:dpdk-devargs=0000:08:00.0,representor=[5]
+ options:dpdk-devargs=0000:08:00.0,representor=vf5
.. important::
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 0dd655507b5..d2eeb22ae37 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -1823,7 +1823,7 @@ static dpdk_port_t netdev_dpdk_get_port_by_devargs(const char *devargs)
}
/*
- * Normally, a PCI id (optionally followed by a representor number)
+ * Normally, a PCI id (optionally followed by a representor identifier)
* is enough for identifying a specific DPDK port.
* However, for some NICs having multiple ports sharing the same PCI
* id, using PCI id won't work then.
From 2db297ea37f4d6ec4bddb2a2db540339fa5af1df Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Wed, 2 Nov 2022 16:47:47 +0100
Subject: [PATCH 024/833] AUTHORS: Add Robin Jarry.
Signed-off-by: Ilya Maximets
---
AUTHORS.rst | 5 +++--
1 file changed, 3 insertions(+), 2 deletions(-)
diff --git a/AUTHORS.rst b/AUTHORS.rst
index 145387ce94f..f62840b1b36 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -365,9 +365,10 @@ Rich Lane rlane@bigswitch.com
Richard Oliver richard@richard-oliver.co.uk
Rishi Bamba rishi.bamba@tcs.com
Rob Adams readams@readams.net
-Robert Åkerblom-Andersson Robert.nr1@gmail.com
-Robert Wojciechowicz robertx.wojciechowicz@intel.com
Rob Hoes rob.hoes@citrix.com
+Robert Wojciechowicz robertx.wojciechowicz@intel.com
+Robert Åkerblom-Andersson Robert.nr1@gmail.com
+Robin Jarry rjarry@redhat.com
Rohith Basavaraja rohith.basavaraja@gmail.com
Roi Dayan roid@nvidia.com
Róbert Mulik robert.mulik@ericsson.com
From c98762d91b578b5d8290077af4de0b6e3d95c3ce Mon Sep 17 00:00:00 2001
From: Robin Jarry
Date: Thu, 1 Sep 2022 12:16:02 +0200
Subject: [PATCH 025/833] netdev-dpdk: Fix tx_dropped counters value.
Packets that could not be transmitted because the TXQ is full should be
taken into account in the global ovs_tx_failure_drops, as was the case
before commit 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit
path.").
netdev_dpdk_eth_tx_burst() returns the number of packets that were *not*
transmitted. Add that number to stats.tx_failure_drops and only include
the packets that were dropped in previous steps afterwards.
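As a quick sketch of the intended accounting (illustration only, with
hypothetical counts, not code from the patch):

    #include <stdint.h>

    /* 'batch_cnt' packets entered the send path, 'cnt' survived the common
     * preparation step, and 'not_sent' were rejected by the NIC TX queue. */
    static uint32_t
    count_drops(uint32_t batch_cnt, uint32_t cnt, uint32_t not_sent,
                uint32_t *tx_failure_drops)
    {
        *tx_failure_drops += not_sent;        /* Only TXQ-full drops. */
        return not_sent + (batch_cnt - cnt);  /* Total drops for the batch. */
    }

For example, with batch_cnt = 32, cnt = 30 and not_sent = 5, the fixed code
adds 5 to tx_failure_drops and reports 7 total drops.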
Fixes: 29b94e12d57d ("netdev-dpdk: Refactor the DPDK transmit path.")
Acked-by: Mike Pattrick
Signed-off-by: Robin Jarry
Signed-off-by: Ilya Maximets
---
lib/netdev-dpdk.c | 6 +++---
1 file changed, 3 insertions(+), 3 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index d2eeb22ae37..e4b3465e09b 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -2882,9 +2882,9 @@ netdev_dpdk_eth_send(struct netdev *netdev, int qid,
cnt = netdev_dpdk_common_send(netdev, batch, &stats);
- dropped = batch_cnt - cnt;
-
- dropped += netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt);
+ dropped = netdev_dpdk_eth_tx_burst(dev, qid, pkts, cnt);
+ stats.tx_failure_drops += dropped;
+ dropped += batch_cnt - cnt;
if (OVS_UNLIKELY(dropped)) {
struct netdev_dpdk_sw_stats *sw_stats = dev->sw_stats;
From eb86c28ddcdb7922974def08749076c8bf2c5635 Mon Sep 17 00:00:00 2001
From: Daniel Ding
Date: Tue, 13 Sep 2022 23:36:11 +0800
Subject: [PATCH 026/833] ovs-tcpdump: Cleanup mirror port on SIGHUP/SIGTERM.
If ovs-tcpdump receives a HUP or TERM signal, the mirror and the mirror
interface should be destroyed. This often happens when the controlling
terminal is closed, e.g. when an ssh session ends or when another user
terminates it with kill.
Acked-by: Mike Pattrick
Signed-off-by: Daniel Ding
Signed-off-by: Ilya Maximets
---
utilities/ovs-tcpdump.in | 40 ++++++++++++++++++++++------------------
1 file changed, 22 insertions(+), 18 deletions(-)
diff --git a/utilities/ovs-tcpdump.in b/utilities/ovs-tcpdump.in
index e12bab88956..a49ec9f9426 100755
--- a/utilities/ovs-tcpdump.in
+++ b/utilities/ovs-tcpdump.in
@@ -44,6 +44,7 @@ try:
from ovs import jsonrpc
from ovs.poller import Poller
from ovs.stream import Stream
+ from ovs.fatal_signal import add_hook
except Exception:
print("ERROR: Please install the correct Open vSwitch python support")
print(" libraries (version @VERSION@).")
@@ -412,6 +413,24 @@ def py_which(executable):
for path in os.environ["PATH"].split(os.pathsep))
+def teardown(db_sock, interface, mirror_interface, tap_created):
+ def cleanup_mirror():
+ try:
+ ovsdb = OVSDB(db_sock)
+ ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))
+ ovsdb.destroy_port(mirror_interface, ovsdb.port_bridge(interface))
+ if tap_created is True:
+ _del_taps[sys.platform](mirror_interface)
+ except Exception:
+ print("Unable to tear down the ports and mirrors.")
+ print("Please use ovs-vsctl to remove the ports and mirrors"
+ " created.")
+ print(" ex: ovs-vsctl --db=%s del-port %s" % (db_sock,
+ mirror_interface))
+
+ add_hook(cleanup_mirror, None, True)
+
+
def main():
rundir = os.environ.get('OVS_RUNDIR', '@RUNDIR@')
db_sock = 'unix:%s' % os.path.join(rundir, "db.sock")
@@ -496,6 +515,9 @@ def main():
print("ERROR: Mirror port (%s) exists for port %s." %
(mirror_interface, interface))
sys.exit(1)
+
+ teardown(db_sock, interface, mirror_interface, tap_created)
+
try:
ovsdb.make_port(mirror_interface, ovsdb.port_bridge(interface))
ovsdb.bridge_mirror(interface, mirror_interface,
@@ -503,12 +525,6 @@ def main():
mirror_select_all)
except OVSDBException as oe:
print("ERROR: Unable to properly setup the mirror: %s." % str(oe))
- try:
- ovsdb.destroy_port(mirror_interface, ovsdb.port_bridge(interface))
- if tap_created is True:
- _del_taps[sys.platform](mirror_interface)
- except Exception:
- pass
sys.exit(1)
ovsdb.close_idl()
@@ -525,18 +541,6 @@ def main():
if pipes.poll() is None:
pipes.terminate()
- ovsdb = OVSDB(db_sock)
- ovsdb.destroy_mirror(interface, ovsdb.port_bridge(interface))
- ovsdb.destroy_port(mirror_interface, ovsdb.port_bridge(interface))
- if tap_created is True:
- _del_taps[sys.platform](mirror_interface)
- except Exception:
- print("Unable to tear down the ports and mirrors.")
- print("Please use ovs-vsctl to remove the ports and mirrors created.")
- print(" ex: ovs-vsctl --db=%s del-port %s" % (db_sock,
- mirror_interface))
- sys.exit(1)
-
sys.exit(0)
From 46ab9d80c2ab8f13dfe2ba2a9700887cd4f7fc36 Mon Sep 17 00:00:00 2001
From: yangchang
Date: Fri, 14 Oct 2022 15:29:36 +0800
Subject: [PATCH 027/833] bond: Fix crash while logging not yet enabled member.
The log should be printed with the member name, not the active member
name, and the active member is not checked for NULL. If it is NULL, OVS
crashes with the following backtrace:
(gdb) bt
0 bond_check_admissibility (ofproto/bond.c:877)
1 is_admissible (ofproto/ofproto-dpif-xlate.c:2574)
2 xlate_normal (ofproto/ofproto-dpif-xlate.c:3027)
3 xlate_output_action (ofproto/ofproto-dpif-xlate.c:5284)
4 do_xlate_actions (ofproto/ofproto-dpif-xlate.c:6960)
5 xlate_actions (ofproto/ofproto-dpif-xlate.c:7924)
6 upcall_xlate (ofproto/ofproto-dpif-upcall.c:1237)
7 process_upcall (ofproto/ofproto-dpif-upcall.c:1456)
8 upcall_cb (ofproto/ofproto-dpif-upcall.c:1358)
9 dp_netdev_upcall (lib/dpif-netdev.c:7793)
10 handle_packet_upcall (lib/dpif-netdev.c:8255)
11 fast_path_processing (lib/dpif-netdev.c:8374)
12 dp_netdev_input__ (lib/dpif-netdev.c:8463)
13 dp_netdev_input (lib/dpif-netdev.c:8501)
14 dp_netdev_process_rxq_port (lib/dpif-netdev.c:5337)
15 pmd_thread_main (lib/dpif-netdev.c:6944)
16 ovsthread_wrapper (lib/ovs-thread.c:422)
17 ?? (/lib64/libpthread.so.0)
18 clone (/lib64/libc.so.6)
Fixes: 423416f58749 ("lacp: report desync in ovs threads enabling slave")
Signed-off-by: yangchang
Signed-off-by: Ilya Maximets
---
ofproto/bond.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/ofproto/bond.c b/ofproto/bond.c
index 47630a6b06a..cfdf44f8542 100644
--- a/ofproto/bond.c
+++ b/ofproto/bond.c
@@ -897,7 +897,7 @@ bond_check_admissibility(struct bond *bond, const void *member_,
if (!member->enabled && member->may_enable) {
VLOG_DBG_RL(&rl, "bond %s: member %s: "
"main thread has not yet enabled member",
- bond->name, bond->active_member->name);
+ bond->name, member->name);
}
goto out;
case LACP_CONFIGURED:
From 2158254fcbd97620151525a8aa91b0a040927690 Mon Sep 17 00:00:00 2001
From: Eelco Chaudron
Date: Tue, 18 Oct 2022 15:27:52 +0200
Subject: [PATCH 028/833] utilities: Add a GDB macro to dump any cmap
structure.
Add a new GDB macro called ovs_dump_cmap, which can be used to dump any
cmap structure. Some examples:
(gdb) ovs_dump_cmap &subtable->rules
(struct cmap *) 0x3e02758
(gdb) ovs_dump_cmap &subtable->rules "struct dpcls_rule" cmap_node
(struct dpcls_rule *) 0x3e02758
(gdb) ovs_dump_cmap &subtable->rules "struct dpcls_rule" cmap_node dump
(struct dpcls_rule *) 0x3e02758 =
{cmap_node = {next = {p = 0x0}}, mask = 0x3dfe100, flow = {hash = ...
Signed-off-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
utilities/gdb/ovs_gdb.py | 66 ++++++++++++++++++++++++++++++++++++++++
1 file changed, 66 insertions(+)
diff --git a/utilities/gdb/ovs_gdb.py b/utilities/gdb/ovs_gdb.py
index 763ece2a78d..7f63dd0d592 100644
--- a/utilities/gdb/ovs_gdb.py
+++ b/utilities/gdb/ovs_gdb.py
@@ -849,6 +849,71 @@ def invoke(self, arg, from_tty):
member).dereference()))
+#
+# Implements the GDB "ovs_dump_cmap" command
+#
+class CmdDumpCmap(gdb.Command):
+ """Dump all nodes of a given cmap
+ Usage:
+ ovs_dump_cmap {[] [] {dump}]}
+
+ For example dump all the rules in a dpcls_subtable:
+
+ (gdb) ovs_dump_cmap &subtable->rules
+ (struct cmap *) 0x3e02758
+
+ This is not very useful, so please use this with the container_of mode:
+
+ (gdb) ovs_dump_cmap &subtable->rules "struct dpcls_rule" cmap_node
+ (struct dpcls_rule *) 0x3e02758
+
+ Now you can manually use the print command to show the content, or use the
+ dump option to dump the structure for all nodes:
+
+ (gdb) ovs_dump_cmap &subtable->rules "struct dpcls_rule" cmap_node dump
+ (struct dpcls_rule *) 0x3e02758 =
+ {cmap_node = {next = {p = 0x0}}, mask = 0x3dfe100, flow = {hash = ...
+ """
+ def __init__(self):
+ super(CmdDumpCmap, self).__init__("ovs_dump_cmap",
+ gdb.COMMAND_DATA)
+
+ def invoke(self, arg, from_tty):
+ arg_list = gdb.string_to_argv(arg)
+ typeobj = None
+ member = None
+ dump = False
+
+ if len(arg_list) != 1 and len(arg_list) != 3 and len(arg_list) != 4:
+ print("usage: ovs_dump_cmap "
+ "{[] [] {dump}]}")
+ return
+
+ cmap = gdb.parse_and_eval(arg_list[0]).cast(
+ gdb.lookup_type('struct cmap').pointer())
+
+ if len(arg_list) >= 3:
+ typeobj = arg_list[1]
+ member = arg_list[2]
+ if len(arg_list) == 4 and arg_list[3] == "dump":
+ dump = True
+
+ for node in ForEachCMAP(cmap.dereference()):
+ if typeobj is None or member is None:
+ print("(struct cmap *) {}".format(node))
+ else:
+ print("({} *) {} {}".format(
+ typeobj,
+ container_of(node,
+ gdb.lookup_type(typeobj).pointer(), member),
+ "=" if dump else ""))
+ if dump:
+ print(" {}\n".format(container_of(
+ node,
+ gdb.lookup_type(typeobj).pointer(),
+ member).dereference()))
+
+
#
# Implements the GDB "ovs_dump_simap" command
#
@@ -1449,6 +1514,7 @@ def extract_pkt(self, pkt):
CmdDumpOfpacts()
CmdDumpOvsList()
CmdDumpPackets()
+CmdDumpCmap()
CmdDumpSimap()
CmdDumpSmap()
CmdDumpUdpifKeys()
From a1de888ab1a4a74dfa6a46b153184fc7dddce6eb Mon Sep 17 00:00:00 2001
From: Han Ding
Date: Wed, 19 Oct 2022 23:06:54 +0800
Subject: [PATCH 029/833] ofproto-dpif-xlate: Update tunnel neighbor when
receive gratuitous ARP.
OVS currently allows only an ARP reply whose destination address matches
one of the known xbridge addresses to update the tunnel neighbor cache. So
when OVS receives a gratuitous ARP from the underlay gateway, in which both
the source and destination addresses are the gateway IP, the tunnel
neighbor is not updated.
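For reference, a gratuitous ARP is an ARP request or reply in which the
sender and target protocol addresses are the same. A minimal check could
look like the sketch below (a hypothetical helper, assuming the usual OVS
flow encoding where the ARP opcode is kept in flow->nw_proto and the
SPA/TPA in flow->nw_src/nw_dst; it is not necessarily the is_garp() helper
used by this patch):

    /* Assumes OVS's "lib/flow.h" and "lib/packets.h" declarations. */
    static bool
    flow_is_gratuitous_arp(const struct flow *flow)
    {
        /* ARP ethertype, opcode request (1) or reply (2),
         * and sender IP equal to target IP. */
        return flow->dl_type == htons(ETH_TYPE_ARP)
               && (flow->nw_proto == 1 || flow->nw_proto == 2)
               && flow->nw_src == flow->nw_dst;
    }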
Fixes: ba07cf222a0c ("Handle gratuitous ARP requests and replies in tnl_arp_snoop()")
Fixes: 83c2757bd16e ("xlate: Move tnl_neigh_snoop() to terminate_native_tunnel()")
Acked-by: Paolo Valerio
Signed-off-by: Han Ding
Signed-off-by: Ilya Maximets
---
ofproto/ofproto-dpif-xlate.c | 14 +++++++++++---
tests/tunnel-push-pop.at | 20 ++++++++++++++++++++
2 files changed, 31 insertions(+), 3 deletions(-)
diff --git a/ofproto/ofproto-dpif-xlate.c b/ofproto/ofproto-dpif-xlate.c
index 5d2af93fa26..a9cf3cbee0b 100644
--- a/ofproto/ofproto-dpif-xlate.c
+++ b/ofproto/ofproto-dpif-xlate.c
@@ -4178,6 +4178,16 @@ xport_has_ip(const struct xport *xport)
return n_in6 ? true : false;
}
+static bool check_neighbor_reply(struct xlate_ctx *ctx, struct flow *flow)
+{
+ if (flow->dl_type == htons(ETH_TYPE_ARP) ||
+ flow->nw_proto == IPPROTO_ICMPV6) {
+ return is_neighbor_reply_correct(ctx, flow);
+ }
+
+ return false;
+}
+
static bool
terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
struct flow *flow, struct flow_wildcards *wc,
@@ -4198,9 +4208,7 @@ terminate_native_tunnel(struct xlate_ctx *ctx, const struct xport *xport,
/* If no tunnel port was found and it's about an ARP or ICMPv6 packet,
* do tunnel neighbor snooping. */
if (*tnl_port == ODPP_NONE &&
- (flow->dl_type == htons(ETH_TYPE_ARP) ||
- flow->nw_proto == IPPROTO_ICMPV6) &&
- is_neighbor_reply_correct(ctx, flow)) {
+ (check_neighbor_reply(ctx, flow) || is_garp(flow, wc))) {
tnl_neigh_snoop(flow, wc, ctx->xbridge->name,
ctx->xin->allow_side_effects);
} else if (*tnl_port != ODPP_NONE &&
diff --git a/tests/tunnel-push-pop.at b/tests/tunnel-push-pop.at
index 92eebba2eaa..013ecbcaa80 100644
--- a/tests/tunnel-push-pop.at
+++ b/tests/tunnel-push-pop.at
@@ -369,6 +369,26 @@ AT_CHECK([ovs-appctl tnl/neigh/show | grep br | sort], [0], [dnl
1.1.2.92 f8:bc:12:44:34:b6 br0
])
+dnl Receiving Gratuitous ARP request with correct VLAN id should alter tunnel neighbor cache
+AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:c8,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=10,pcp=7),encap(eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.92,op=1,sha=f8:bc:12:44:34:c8,tha=00:00:00:00:00:00))'])
+
+ovs-appctl time/warp 1000
+ovs-appctl time/warp 1000
+
+AT_CHECK([ovs-appctl tnl/neigh/show | grep br | sort], [0], [dnl
+1.1.2.92 f8:bc:12:44:34:c8 br0
+])
+
+dnl Receiving Gratuitous ARP reply with correct VLAN id should alter tunnel neighbor cache
+AT_CHECK([ovs-appctl netdev-dummy/receive p0 'recirc_id(0),in_port(1),eth(src=f8:bc:12:44:34:b2,dst=ff:ff:ff:ff:ff:ff),eth_type(0x8100),vlan(vid=10,pcp=7),encap(eth_type(0x0806),arp(sip=1.1.2.92,tip=1.1.2.92,op=2,sha=f8:bc:12:44:34:b2,tha=f8:bc:12:44:34:b2))'])
+
+ovs-appctl time/warp 1000
+ovs-appctl time/warp 1000
+
+AT_CHECK([ovs-appctl tnl/neigh/show | grep br | sort], [0], [dnl
+1.1.2.92 f8:bc:12:44:34:b2 br0
+])
+
dnl Receive ARP reply without VLAN header
AT_CHECK([ovs-vsctl set port br0 tag=0])
AT_CHECK([ovs-appctl tnl/neigh/flush], [0], [OK
From f1eb850aea833c5fd0cc106020184b0db63d7a30 Mon Sep 17 00:00:00 2001
From: Lin Huang
Date: Sun, 23 Oct 2022 12:58:55 +0800
Subject: [PATCH 030/833] mac-learning: Fix learned fdb entries not age out
issue.
After a user adds a static fdb entry, the get_lru() function always
returns that static entry, so normal fdb entries never age out through
mac_learning_run().
Fix the issue by modifying get_lru() to check the entry->expires field and
skip entries whose entry->expires is MAC_ENTRY_AGE_STATIC_ENTRY.
Also add a unit test for this.
Fixes: ccc24fc88d59 ("ofproto-dpif: APIs and CLI option to add/delete static fdb entry.")
Acked-by: Eelco Chaudron
Tested-by: Zhang Yuhuang
Signed-off-by: Lin Huang
Signed-off-by: Ilya Maximets
---
lib/mac-learning.c | 37 ++++++++++++++-----------------------
tests/ofproto-dpif.at | 23 +++++++++++++++++++++++
2 files changed, 37 insertions(+), 23 deletions(-)
diff --git a/lib/mac-learning.c b/lib/mac-learning.c
index a60794fb26e..5932e2709d0 100644
--- a/lib/mac-learning.c
+++ b/lib/mac-learning.c
@@ -176,12 +176,18 @@ get_lru(struct mac_learning *ml, struct mac_entry **e)
OVS_REQ_RDLOCK(ml->rwlock)
{
if (!ovs_list_is_empty(&ml->lrus)) {
- *e = mac_entry_from_lru_node(ml->lrus.next);
- return true;
- } else {
- *e = NULL;
- return false;
+ struct mac_entry *entry;
+
+ LIST_FOR_EACH (entry, lru_node, &ml->lrus) {
+ if (entry->expires != MAC_ENTRY_AGE_STATIC_ENTRY) {
+ *e = entry;
+ return true;
+ }
+ }
}
+
+ *e = NULL;
+ return false;
}
static unsigned int
@@ -618,25 +624,10 @@ mac_learning_expire(struct mac_learning *ml, struct mac_entry *e)
void
mac_learning_flush(struct mac_learning *ml)
{
- struct mac_entry *e, *first_static_mac = NULL;
-
- while (get_lru(ml, &e) && (e != first_static_mac)) {
-
- /* Static mac should not be evicted. */
- if (MAC_ENTRY_AGE_STATIC_ENTRY == e->expires) {
-
- /* Make note of first static-mac encountered, so that this while
- * loop will break on visting this mac again via get_lru(). */
- if (!first_static_mac) {
- first_static_mac = e;
- }
+ struct mac_entry *e;
- /* Remove from lru head and append it to tail. */
- ovs_list_remove(&e->lru_node);
- ovs_list_push_back(&ml->lrus, &e->lru_node);
- } else {
- mac_learning_expire(ml, e);
- }
+ while (get_lru(ml, &e)) {
+ mac_learning_expire(ml, e);
}
hmap_shrink(&ml->table);
}
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index 8e993c585ff..eb4cd189609 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -7287,6 +7287,29 @@ AT_CHECK([ovs-appctl coverage/read-counter mac_learning_static_none_move], [0],
OVS_VSWITCHD_STOP
AT_CLEANUP
+AT_SETUP([ofproto-dpif - static-mac learned mac age out])
+OVS_VSWITCHD_START([set bridge br0 fail-mode=standalone -- set bridge br0 other_config:mac-aging-time=5])
+add_of_ports br0 1 2
+
+dnl Add some static mac entries.
+AT_CHECK([ovs-appctl fdb/add br0 p1 0 50:54:00:00:01:01])
+AT_CHECK([ovs-appctl fdb/add br0 p2 0 50:54:00:00:02:02])
+
+dnl Generate some dynamic fdb entries on some ports.
+OFPROTO_TRACE([ovs-dummy], [in_port(1),eth(src=60:54:00:00:00:01)], [-generate], [100,2])
+OFPROTO_TRACE([ovs-dummy], [in_port(2),eth(src=60:54:00:00:00:02)], [-generate], [100,1])
+
+dnl Waiting for aging out.
+ovs-appctl time/warp 20000
+
+dnl Count number of static entries remaining.
+AT_CHECK_UNQUOTED([ovs-appctl fdb/stats-show br0 | grep expired], [0], [dnl
+ Total number of expired MAC entries : 2
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
AT_SETUP([ofproto-dpif - basic truncate action])
OVS_VSWITCHD_START
add_of_ports br0 1 2 3 4 5
From 0d0f282c19e1d83fd18529e225845560c6e830e4 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Tue, 25 Oct 2022 18:33:53 +0200
Subject: [PATCH 031/833] vswitch.xml: Fix the name of rstp-path-cost option.
For some reason it is documented as 'rstp-port-path-cost', while
the code and some other bits of documentation use 'rstp-path-cost'.
Fixes: 9efd308e957c ("Rapid Spanning Tree Protocol (IEEE 802.1D).")
Reviewed-by: David Marchand
Signed-off-by: Ilya Maximets
---
vswitchd/vswitch.xml | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 36388e3c42d..928821a8239 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -2350,7 +2350,7 @@
lowest port-id is elected as the root.
-
The port path cost. The Port's contribution, when it is
the Root Port, to the Root Path Cost for the Bridge. By default the
From 0bd4155f560fe5fb790b2f714d3682008b6ce736 Mon Sep 17 00:00:00 2001
From: Paolo Valerio
Date: Wed, 26 Oct 2022 10:44:09 +0200
Subject: [PATCH 032/833] odp-util: Add missing separator in
format_odp_conntrack_action().
If OVS_CT_ATTR_TIMEOUT is included, the resulting output is
the following:
actions:ct(commit,timeout=1nat(src=10.1.1.240))
Fix it by trivially adding a trailing ',' to timeout as well.
Signed-off-by: Paolo Valerio
Signed-off-by: Ilya Maximets
---
lib/odp-util.c | 2 +-
tests/odp.at | 2 ++
2 files changed, 3 insertions(+), 1 deletion(-)
diff --git a/lib/odp-util.c b/lib/odp-util.c
index ba5be4bb355..72e076e1c5b 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -1004,7 +1004,7 @@ format_odp_conntrack_action(struct ds *ds, const struct nlattr *attr)
ds_put_format(ds, "helper=%s,", helper);
}
if (timeout) {
- ds_put_format(ds, "timeout=%s", timeout);
+ ds_put_format(ds, "timeout=%s,", timeout);
}
if (nat) {
format_odp_ct_nat(ds, nat);
diff --git a/tests/odp.at b/tests/odp.at
index 7a1cf3b2ceb..88b7cfd917f 100644
--- a/tests/odp.at
+++ b/tests/odp.at
@@ -348,7 +348,9 @@ ct(commit,helper=tftp)
ct(commit,timeout=ovs_tp_1_tcp4)
ct(nat)
ct(commit,nat(src))
+ct(commit,timeout=ovs_tp_1_tcp4,nat(src))
ct(commit,nat(dst))
+ct(commit,timeout=ovs_tp_1_tcp4,nat(dst))
ct(commit,nat(src=10.0.0.240,random))
ct(commit,nat(src=10.0.0.240:32768-65535,random))
ct(commit,nat(dst=10.0.0.128-10.0.0.254,hash))
From fec5424aedc9a104013d85cdd4e7399e10777a8a Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Wed, 26 Oct 2022 15:40:28 +0200
Subject: [PATCH 033/833] tc: Fix misaligned writes while parsing pedit.
Offsets within the 'rewrite' action are not 4-byte aligned, so they have
to be accessed carefully.
SUMMARY: UndefinedBehaviorSanitizer: undefined-behavior lib/tc.c:1132:17 in
lib/tc.c:1132:17: runtime error: store to misaligned address 0x7fba215b2025
for type 'ovs_be32' (aka 'unsigned int'), which requires 4 byte alignment
0 0xd78857 in nl_parse_act_pedit lib/tc.c:1132:24
1 0xd68103 in nl_parse_single_action lib/tc.c:1936:15
2 0xd624ee in nl_parse_flower_actions lib/tc.c:2024:19
3 0xd624ee in nl_parse_flower_options lib/tc.c:2139:12
4 0xd5f082 in parse_netlink_to_tc_flower lib/tc.c:2187:12
5 0xd6a2a1 in tc_replace_flower lib/tc.c:3776:19
6 0xd2ae8f in netdev_tc_flow_put lib/netdev-offload-tc.c:2350:11
7 0x951d07 in netdev_flow_put lib/netdev-offload.c:318:14
8 0xcbb81a in parse_flow_put lib/dpif-netlink.c:2297:11
9 0xcbb81a in try_send_to_netdev lib/dpif-netlink.c:2384:15
10 0xcbb81a in dpif_netlink_operate lib/dpif-netlink.c:2455:23
11 0x8678ae in dpif_operate lib/dpif.c:1372:13
12 0x6bcc89 in handle_upcalls ofproto/ofproto-dpif-upcall.c:1674:5
13 0x6bcc89 in recv_upcalls ofproto/ofproto-dpif-upcall.c:905:9
14 0x6b7f9a in udpif_upcall_handler ofproto/ofproto-dpif-upcall.c:801:13
15 0xb54c5a in ovsthread_wrapper lib/ovs-thread.c:422:12
16 0x7fba2f2081ce in start_thread (/lib64/libpthread.so.0+0x81ce)
17 0x7fba2de39dd2 in clone (/lib64/libc.so.6+0x39dd2)
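The general pattern for such fixes is to replace the direct dereference of
a potentially misaligned pointer with an unaligned-safe read-modify-write.
A generic, self-contained sketch (plain memcpy instead of the OVS
get/put_unaligned_be32() helpers actually used by the patch):

    #include <stdint.h>
    #include <string.h>

    /* OR 'mask' into a 32-bit field that may sit at any byte offset. */
    static void
    or_into_unaligned_u32(void *dst, uint32_t mask)
    {
        uint32_t val;

        memcpy(&val, dst, sizeof val);   /* Unaligned-safe load. */
        val |= mask;
        memcpy(dst, &val, sizeof val);   /* Unaligned-safe store. */
    }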
Fixes: 8ada482bbe19 ("tc: Add header rewrite using tc pedit action")
Reviewed-by: Simon Horman
Signed-off-by: Ilya Maximets
---
lib/tc.c | 11 ++++++++---
1 file changed, 8 insertions(+), 3 deletions(-)
diff --git a/lib/tc.c b/lib/tc.c
index 4d7de8adde4..dce66ab0bd3 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1114,7 +1114,7 @@ nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
int diff = flower_off + (keys->off - mf);
ovs_be32 *dst = (void *) (rewrite_key + diff);
ovs_be32 *dst_m = (void *) (rewrite_mask + diff);
- ovs_be32 mask, mask_word, data_word;
+ ovs_be32 mask, mask_word, data_word, val;
uint32_t zero_bits;
mask_word = htonl(ntohl(keys->mask) << m->boundary_shift);
@@ -1129,8 +1129,13 @@ nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
mask &= htonl(UINT32_MAX << zero_bits);
}
- *dst_m |= mask;
- *dst |= data_word & mask;
+ val = get_unaligned_be32(dst_m);
+ val |= mask;
+ put_unaligned_be32(dst_m, val);
+
+ val = get_unaligned_be32(dst);
+ val |= data_word & mask;
+ put_unaligned_be32(dst, val);
}
}
From a3848d98e19479cf87cd2216fa606f51fdb32b52 Mon Sep 17 00:00:00 2001
From: Paolo Valerio
Date: Mon, 31 Oct 2022 16:57:33 +0100
Subject: [PATCH 034/833] conntrack: Show parent key if present.
Similarly to what happens when CTA_TUPLE_MASTER is present in a ct
netlink dump, add the ability to print out the parent key to the
userspace implementation as well.
Signed-off-by: Paolo Valerio
Signed-off-by: Ilya Maximets
---
lib/conntrack.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/conntrack.c b/lib/conntrack.c
index 13c5ab6283d..550b2be9b91 100644
--- a/lib/conntrack.c
+++ b/lib/conntrack.c
@@ -2647,6 +2647,10 @@ conn_to_ct_dpif_entry(const struct conn *conn, struct ct_dpif_entry *entry,
conn_key_to_tuple(&conn->key, &entry->tuple_orig);
conn_key_to_tuple(&conn->rev_key, &entry->tuple_reply);
+ if (conn->alg_related) {
+ conn_key_to_tuple(&conn->parent_key, &entry->tuple_parent);
+ }
+
entry->zone = conn->key.zone;
ovs_mutex_lock(&conn->lock);
From 02be2c318c8ef3255b62541ec3de53bd6f325c7a Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Mon, 31 Oct 2022 17:17:59 +0100
Subject: [PATCH 035/833] netdev-linux: Fix inability to apply QoS on ports
with custom qdiscs.
tc_del_qdisc() function only removes qdiscs with handle '1:0'. If for
some reason the interface has a qdisc with non-zero handle attached,
tc_del_qdisc() will not delete it and subsequent tc_install() will fail
to install a new qdisc.
The problem is that Libvirt by default is setting noqueue qdisc for all
tap interfaces it creates. This is done for performance reasons to
ensure lockless xmit.
The issue is causing non-working QoS in OpenStack setups since new
versions of Libvirt started to use OVS to configure it. In the past,
Libvirt configured TC on its own, bypassing OVS.
Remove the handle value from the deletion request, so any qdisc can be
removed. Change the error checking to also accept ENOENT, since that is
the error reported if only the default qdisc is present.
Alternative solution might be to use NLM_F_REPLACE, but that will be
a larger change with a potential need of refactoring.
Potential side effect of the change is that OVS may start removing
qdiscs that it didn't remove before. Though it's not a new issue and
'linux-noop' QoS type should be used for ports that OVS should not
touch. Otherwise, OVS owns qdiscs on all interfaces attached to it.
While at it, adding more logs as errors are not logged in any way
at the moment making the issue hard to debug.
Reported-at: https://bugzilla.redhat.com/2138339
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-October/052088.html
Reported-at: https://github.com/openvswitch/ovs-issues/issues/268
Suggested-by: Slawek Kaplonski
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
lib/netdev-linux.c | 13 +++++++++----
tests/system-traffic.at | 36 ++++++++++++++++++++++++++++++++++++
2 files changed, 45 insertions(+), 4 deletions(-)
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 7ea4070c23a..59e8dc0ae6c 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -2984,12 +2984,18 @@ netdev_linux_set_qos(struct netdev *netdev_,
/* Delete existing qdisc. */
error = tc_del_qdisc(netdev_);
if (error) {
+ VLOG_WARN_RL(&rl, "%s: Failed to delete existing qdisc: %s",
+ netdev_get_name(netdev_), ovs_strerror(error));
goto exit;
}
ovs_assert(netdev->tc == NULL);
/* Install new qdisc. */
error = new_ops->tc_install(netdev_, details);
+ if (error) {
+ VLOG_WARN_RL(&rl, "%s: Failed to install new qdisc: %s",
+ netdev_get_name(netdev_), ovs_strerror(error));
+ }
ovs_assert((error == 0) == (netdev->tc != NULL));
}
@@ -6143,13 +6149,12 @@ tc_del_qdisc(struct netdev *netdev_)
if (!tcmsg) {
return ENODEV;
}
- tcmsg->tcm_handle = tc_make_handle(1, 0);
tcmsg->tcm_parent = TC_H_ROOT;
error = tc_transact(&request, NULL);
- if (error == EINVAL) {
- /* EINVAL probably means that the default qdisc was in use, in which
- * case we've accomplished our purpose. */
+ if (error == EINVAL || error == ENOENT) {
+ /* EINVAL or ENOENT probably means that the default qdisc was in use,
+ * in which case we've accomplished our purpose. */
error = 0;
}
if (!error && netdev->tc) {
diff --git a/tests/system-traffic.at b/tests/system-traffic.at
index 731de439c7a..e5403519f2a 100644
--- a/tests/system-traffic.at
+++ b/tests/system-traffic.at
@@ -2080,6 +2080,42 @@ OVS_WAIT_UNTIL([cat p1.pcap | grep -E "0x0050: *2627 *2829 *2a2b *2c2d *2e2f *3
OVS_WAIT_UNTIL([cat p1.pcap | grep -E "0x0060: *3637" 2>&1 1>/dev/null])
+OVS_TRAFFIC_VSWITCHD_STOP
+AT_CLEANUP
+
+AT_BANNER([QoS])
+
+AT_SETUP([QoS - basic configuration])
+AT_SKIP_IF([test $HAVE_TC = no])
+OVS_TRAFFIC_VSWITCHD_START()
+
+ADD_NAMESPACES(at_ns0, at_ns1)
+
+ADD_VETH(p0, at_ns0, br0, "10.1.1.1/24")
+ADD_VETH(p1, at_ns1, br0, "10.1.1.2/24")
+
+dnl Adding a custom qdisc to ovs-p1, ovs-p0 will have the default qdisc.
+AT_CHECK([tc qdisc add dev ovs-p1 root noqueue])
+AT_CHECK([tc qdisc show dev ovs-p1 | grep -q noqueue])
+
+dnl Configure the same QoS for both ports.
+AT_CHECK([ovs-vsctl set port ovs-p0 qos=@qos -- set port ovs-p1 qos=@qos dnl
+ -- --id=@qos create qos dnl
+ type=linux-htb other-config:max-rate=3000000 queues:0=@queue dnl
+ -- --id=@queue create queue dnl
+ other_config:min-rate=2000000 other_config:max-rate=3000000 dnl
+ other_config:burst=3000000],
+ [ignore], [ignore])
+
+dnl Wait for qdiscs to be applied.
+OVS_WAIT_UNTIL([tc qdisc show dev ovs-p0 | grep -q htb])
+OVS_WAIT_UNTIL([tc qdisc show dev ovs-p1 | grep -q htb])
+
+dnl Check the configuration.
+m4_define([HTB_CONF], [rate 2Mbit ceil 3Mbit burst 375000b cburst 375000b])
+AT_CHECK([tc class show dev ovs-p0 | grep -q 'class htb .* HTB_CONF'])
+AT_CHECK([tc class show dev ovs-p1 | grep -q 'class htb .* HTB_CONF'])
+
OVS_TRAFFIC_VSWITCHD_STOP
AT_CLEANUP
From 235fc6f4c416f07ed3cc559c271641542eaf2e04 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Wed, 2 Nov 2022 23:45:04 +0100
Subject: [PATCH 036/833] AUTHORS: Add Daniel Ding.
Signed-off-by: Ilya Maximets
---
AUTHORS.rst | 1 +
1 file changed, 1 insertion(+)
diff --git a/AUTHORS.rst b/AUTHORS.rst
index f62840b1b36..7bb4e41a05d 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -117,6 +117,7 @@ Dan Wendlandt
Dan Williams dcbw@redhat.com
Daniel Alvarez dalvarez@redhat.com
Daniel Borkmann dborkman@redhat.com
+Daniel Ding zhihui.ding@easystack.cn
Daniel Hiltgen daniel@netkine.com
Daniel Roman
Daniele Di Proietto daniele.di.proietto@gmail.com
From 9a638044ecf26ef4fc3309b75be5aaf1280496bb Mon Sep 17 00:00:00 2001
From: Han Zhou
Date: Tue, 1 Nov 2022 21:09:07 -0700
Subject: [PATCH 037/833] ovsdb: transaction: Refactor assess_weak_refs.
The loops for adding weak refs are quite similar. Abstract them into a
function, which will be used by one more case later. The patch also
changes the txn_row argument to the source row.
Signed-off-by: Han Zhou
Signed-off-by: Ilya Maximets
---
ovsdb/transaction.c | 78 +++++++++++++++++++++------------------------
1 file changed, 36 insertions(+), 42 deletions(-)
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index bb997b45b5d..6796880561e 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -587,7 +587,7 @@ ovsdb_txn_update_weak_refs(struct ovsdb_txn *txn OVS_UNUSED,
}
static void
-add_weak_ref(struct ovsdb_txn_row *txn_row, const struct ovsdb_row *dst_,
+add_weak_ref(const struct ovsdb_row *src, const struct ovsdb_row *dst_,
struct ovs_list *ref_list,
const union ovsdb_atom *key, const union ovsdb_atom *value,
bool by_key, const struct ovsdb_column *column)
@@ -595,13 +595,13 @@ add_weak_ref(struct ovsdb_txn_row *txn_row, const struct ovsdb_row *dst_,
struct ovsdb_row *dst = CONST_CAST(struct ovsdb_row *, dst_);
struct ovsdb_weak_ref *weak;
- if (txn_row->new == dst) {
+ if (src == dst) {
return;
}
weak = xzalloc(sizeof *weak);
- weak->src_table = txn_row->new->table;
- weak->src = *ovsdb_row_get_uuid(txn_row->new);
+ weak->src_table = src->table;
+ weak->src = *ovsdb_row_get_uuid(src);
weak->dst_table = dst->table;
weak->dst = *ovsdb_row_get_uuid(dst);
ovsdb_type_clone(&weak->type, &column->type);
@@ -616,7 +616,7 @@ add_weak_ref(struct ovsdb_txn_row *txn_row, const struct ovsdb_row *dst_,
}
static void
-find_and_add_weak_ref(struct ovsdb_txn_row *txn_row,
+find_and_add_weak_ref(const struct ovsdb_row *src,
const union ovsdb_atom *key,
const union ovsdb_atom *value,
const struct ovsdb_column *column,
@@ -628,7 +628,7 @@ find_and_add_weak_ref(struct ovsdb_txn_row *txn_row,
: ovsdb_table_get_row(column->type.value.uuid.refTable, &value->uuid);
if (row) {
- add_weak_ref(txn_row, row, ref_list, key, value, by_key, column);
+ add_weak_ref(src, row, ref_list, key, value, by_key, column);
} else if (not_found) {
if (uuid_is_zero(by_key ? &key->uuid : &value->uuid)) {
*zero = true;
@@ -637,6 +637,31 @@ find_and_add_weak_ref(struct ovsdb_txn_row *txn_row,
}
}
+static void
+find_and_add_weak_refs(const struct ovsdb_row *src,
+ const struct ovsdb_datum *datum,
+ const struct ovsdb_column *column,
+ struct ovs_list *ref_list,
+ struct ovsdb_datum *not_found, bool *zero)
+{
+ unsigned int i;
+
+ if (ovsdb_base_type_is_weak_ref(&column->type.key)) {
+ for (i = 0; i < datum->n; i++) {
+ find_and_add_weak_ref(src, &datum->keys[i],
+ datum->values ? &datum->values[i] : NULL,
+ column, true, ref_list, not_found, zero);
+ }
+ }
+
+ if (ovsdb_base_type_is_weak_ref(&column->type.value)) {
+ for (i = 0; i < datum->n; i++) {
+ find_and_add_weak_ref(src, &datum->keys[i], &datum->values[i],
+ column, false, ref_list, not_found, zero);
+ }
+ }
+}
+
static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
{
@@ -678,7 +703,7 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
const struct ovsdb_column *column = node->data;
struct ovsdb_datum *datum = &txn_row->new->fields[column->index];
struct ovsdb_datum added, removed, deleted_refs;
- unsigned int orig_n, i;
+ unsigned int orig_n;
bool zero = false;
orig_n = datum->n;
@@ -712,23 +737,8 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
/* Checking added data and creating new references. */
ovsdb_datum_init_empty(&deleted_refs);
- if (ovsdb_base_type_is_weak_ref(&column->type.key)) {
- for (i = 0; i < added.n; i++) {
- find_and_add_weak_ref(txn_row, &added.keys[i],
- added.values ? &added.values[i] : NULL,
- column, true, &txn_row->added_refs,
- &deleted_refs, &zero);
- }
- }
-
- if (ovsdb_base_type_is_weak_ref(&column->type.value)) {
- for (i = 0; i < added.n; i++) {
- find_and_add_weak_ref(txn_row, &added.keys[i],
- &added.values[i],
- column, false, &txn_row->added_refs,
- &deleted_refs, &zero);
- }
- }
+ find_and_add_weak_refs(txn_row->new, &added, column,
+ &txn_row->added_refs, &deleted_refs, &zero);
if (deleted_refs.n) {
/* Removing all the references that doesn't point to valid rows. */
ovsdb_datum_sort_unique(&deleted_refs, &column->type);
@@ -741,24 +751,8 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
/* Creating refs that needs to be removed on commit. This includes
* both: the references that got directly removed from the datum and
* references removed due to deletion of a referenced row. */
- if (ovsdb_base_type_is_weak_ref(&column->type.key)) {
- for (i = 0; i < removed.n; i++) {
- find_and_add_weak_ref(txn_row, &removed.keys[i],
- removed.values
- ? &removed.values[i] : NULL,
- column, true, &txn_row->deleted_refs,
- NULL, NULL);
- }
- }
-
- if (ovsdb_base_type_is_weak_ref(&column->type.value)) {
- for (i = 0; i < removed.n; i++) {
- find_and_add_weak_ref(txn_row, &removed.keys[i],
- &removed.values[i],
- column, false, &txn_row->deleted_refs,
- NULL, NULL);
- }
- }
+ find_and_add_weak_refs(txn_row->new, &removed, column,
+ &txn_row->deleted_refs, NULL, NULL);
ovsdb_datum_destroy(&removed, &column->type);
if (datum->n != orig_n) {
From c8a08db101237b985c44f81b9a2dd09130c9c3cf Mon Sep 17 00:00:00 2001
From: Han Zhou
Date: Tue, 1 Nov 2022 21:09:08 -0700
Subject: [PATCH 038/833] ovsdb: transaction: Fix weak reference leak.
When a row is deleted, if the row has weak references to other rows, the
weak reference nodes attached to the destination rows (through
weak->dst_node hmap) are not destroyed.
Deleting weak references is properly handled when a row is modified. The
removed references are taken care of by:
1. assess_weak_refs() figures out the deleted references from the row
and adds them to txn_row->deleted_refs.
2. before commit, ovsdb_txn_update_weak_refs() finds the destination
row for each item in txn_row->deleted_refs (from step 1) and destroys
the corresponding weak references of the destination row.
However, when the row is deleted, step 1 in assess_weak_refs() is
missing. It directly returns without adding the deleted references to
txn_row->deleted_refs. So, the destination rows keep those weak
references even though the source side of the references has already
been deleted. As rows that originate weak references are created and
deleted, more and more of these useless weak reference structures
accumulate in memory and can stay there until the destination rows are
deleted. It is possible that a destination row is never deleted, in
which case the ovsdb-server memory keeps growing (although it is not
strictly a memory leak, because the structures are still referenced).
This problem impacts applications like the OVN SB DB: the memory grows
very fast in long-running deployments and eventually causes OOM.
This patch fixes it by generating deleted_refs for deleted rows in
assess_weak_refs().
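As a language-level illustration of the bookkeeping problem (a generic
sketch with made-up names, not OVSDB code): a reference that is registered
on both its source and its destination must be unlinked from the
destination when the source goes away, otherwise dead reference nodes pile
up on the destination side.

    #include <stdlib.h>

    struct row;

    struct ref {
        struct ref *next_in_dst;    /* Destination's list of incoming refs. */
        struct row *dst;
    };

    struct row {
        struct ref *incoming;       /* Refs pointing at this row. */
    };

    /* Delete a source row that owns 'n' outgoing refs: each ref must also
     * be unlinked from its destination, or the destination leaks nodes. */
    static void
    row_delete(struct row *src, struct ref **outgoing, size_t n)
    {
        for (size_t i = 0; i < n; i++) {
            struct ref **p = &outgoing[i]->dst->incoming;

            while (*p && *p != outgoing[i]) {
                p = &(*p)->next_in_dst;
            }
            if (*p) {
                *p = (*p)->next_in_dst;   /* Unlink from destination. */
            }
            free(outgoing[i]);
        }
        free(src);
    }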
Fixes: 4dbff9f0a685 ("ovsdb: transaction: Incremental reassessment of weak refs.")
Signed-off-by: Han Zhou
Signed-off-by: Ilya Maximets
---
ovsdb/transaction.c | 12 ++++++++++--
1 file changed, 10 insertions(+), 2 deletions(-)
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 6796880561e..5d7c70a51c0 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -666,7 +666,7 @@ static struct ovsdb_error * OVS_WARN_UNUSED_RESULT
assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
{
struct ovsdb_weak_ref *weak;
- struct ovsdb_table *table;
+ struct ovsdb_table *table = txn_row->table;
struct shash_node *node;
if (txn_row->old && !txn_row->new) {
@@ -688,6 +688,15 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
ovs_assert(ovs_list_is_empty(&weak->src_node));
ovs_list_insert(&src_txn_row->deleted_refs, &weak->src_node);
}
+
+ /* Creating refs that needs to be removed on commit. */
+ SHASH_FOR_EACH (node, &table->schema->columns) {
+ const struct ovsdb_column *column = node->data;
+ struct ovsdb_datum *datum = &txn_row->old->fields[column->index];
+
+ find_and_add_weak_refs(txn_row->old, datum, column,
+ &txn_row->deleted_refs, NULL, NULL);
+ }
}
if (!txn_row->new) {
@@ -698,7 +707,6 @@ assess_weak_refs(struct ovsdb_txn *txn, struct ovsdb_txn_row *txn_row)
return NULL;
}
- table = txn_row->table;
SHASH_FOR_EACH (node, &table->schema->columns) {
const struct ovsdb_column *column = node->data;
struct ovsdb_datum *datum = &txn_row->new->fields[column->index];
From 165edb9ae2f85f4904aac6bba8370a4a891a867b Mon Sep 17 00:00:00 2001
From: Ian Stokes
Date: Wed, 2 Nov 2022 18:47:02 +0000
Subject: [PATCH 039/833] ci: Update meson requirement for DPDK.
The current version of meson used for building DPDK is 0.49.2.
This restricts the required Python version to 3.9.
A recent change [1] in DPDK bumped requirements on meson to 0.53.2.
Update the version of meson used to build DPDK to 0.53.2 to remove the
restriction.
[1] https://git.dpdk.org/dpdk/commit/?id=909ad7b80e5e
Signed-off-by: Ian Stokes
Reviewed-by: David Marchand
---
.ci/linux-prepare.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.ci/linux-prepare.sh b/.ci/linux-prepare.sh
index 16a7aec0b5b..11d75a6d598 100755
--- a/.ci/linux-prepare.sh
+++ b/.ci/linux-prepare.sh
@@ -27,7 +27,7 @@ cd ..
pip3 install --disable-pip-version-check --user wheel
pip3 install --disable-pip-version-check --user \
flake8 'hacking>=3.0' netaddr pyparsing sphinx setuptools pyelftools
-pip3 install --user 'meson==0.49.2'
+pip3 install --user 'meson==0.53.2'
if [ "$M32" ]; then
# Installing 32-bit libraries.
From d77f93f363b7bb68186b432f579855b8a837d64e Mon Sep 17 00:00:00 2001
From: Roi Dayan
Date: Fri, 4 Nov 2022 15:06:03 +0200
Subject: [PATCH 040/833] tc: Pass tun_metadata by reference
Fix the Coverity 'big parameter passed by value' report:
CID 549858 (#1 of 1): Big parameter passed by value (PASS_BY_VALUE)
pass_by_value: Passing parameter metadata of type struct tun_metadata (size 272 bytes) by value,
which exceeds the medium threshold of 256 bytes
Signed-off-by: Roi Dayan
Signed-off-by: Simon Horman
---
lib/tc.c | 20 ++++++++++----------
1 file changed, 10 insertions(+), 10 deletions(-)
diff --git a/lib/tc.c b/lib/tc.c
index dce66ab0bd3..b9a0138459e 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -2501,13 +2501,13 @@ nl_msg_put_act_tunnel_key_release(struct ofpbuf *request)
static void
nl_msg_put_act_tunnel_geneve_option(struct ofpbuf *request,
- struct tun_metadata tun_metadata)
+ struct tun_metadata *tun_metadata)
{
const struct geneve_opt *opt;
size_t outer, inner;
int len, cnt = 0;
- len = tun_metadata.present.len;
+ len = tun_metadata->present.len;
if (!len) {
return;
}
@@ -2515,7 +2515,7 @@ nl_msg_put_act_tunnel_geneve_option(struct ofpbuf *request,
outer = nl_msg_start_nested(request, TCA_TUNNEL_KEY_ENC_OPTS);
while (len) {
- opt = &tun_metadata.opts.gnv[cnt];
+ opt = &tun_metadata->opts.gnv[cnt];
inner = nl_msg_start_nested(request, TCA_TUNNEL_KEY_ENC_OPTS_GENEVE);
nl_msg_put_be16(request, TCA_TUNNEL_KEY_ENC_OPT_GENEVE_CLASS,
@@ -2539,7 +2539,7 @@ nl_msg_put_act_tunnel_key_set(struct ofpbuf *request, bool id_present,
ovs_be32 ipv4_dst, struct in6_addr *ipv6_src,
struct in6_addr *ipv6_dst,
ovs_be16 tp_dst, uint8_t tos, uint8_t ttl,
- struct tun_metadata tun_metadata,
+ struct tun_metadata *tun_metadata,
uint8_t no_csum, uint32_t action_pc)
{
size_t offset;
@@ -3207,7 +3207,7 @@ nl_msg_put_flower_acts(struct ofpbuf *request, struct tc_flower *flower)
action->encap.tp_dst,
action->encap.tos,
action->encap.ttl,
- action->encap.data,
+ &action->encap.data,
action->encap.no_csum,
action_pc);
nl_msg_put_act_flags(request);
@@ -3379,20 +3379,20 @@ nl_msg_put_masked_value(struct ofpbuf *request, uint16_t type,
static void
nl_msg_put_flower_tunnel_opts(struct ofpbuf *request, uint16_t type,
- struct tun_metadata metadata)
+ struct tun_metadata *metadata)
{
struct geneve_opt *opt;
size_t outer, inner;
int len, cnt = 0;
- len = metadata.present.len;
+ len = metadata->present.len;
if (!len) {
return;
}
outer = nl_msg_start_nested(request, type);
while (len) {
- opt = &metadata.opts.gnv[cnt];
+ opt = &metadata->opts.gnv[cnt];
inner = nl_msg_start_nested(request, TCA_FLOWER_KEY_ENC_OPTS_GENEVE);
nl_msg_put_be16(request, TCA_FLOWER_KEY_ENC_OPT_GENEVE_CLASS,
@@ -3469,9 +3469,9 @@ nl_msg_put_flower_tunnel(struct ofpbuf *request, struct tc_flower *flower)
nl_msg_put_be32(request, TCA_FLOWER_KEY_ENC_KEY_ID, id);
}
nl_msg_put_flower_tunnel_opts(request, TCA_FLOWER_KEY_ENC_OPTS,
- flower->key.tunnel.metadata);
+ &flower->key.tunnel.metadata);
nl_msg_put_flower_tunnel_opts(request, TCA_FLOWER_KEY_ENC_OPTS_MASK,
- flower->mask.tunnel.metadata);
+ &flower->mask.tunnel.metadata);
}
#define FLOWER_PUT_MASKED_VALUE(member, type) \
From 6ccf8efffccbacd1d7caacbde37f6999a66b3867 Mon Sep 17 00:00:00 2001
From: Roi Dayan
Date: Fri, 4 Nov 2022 15:06:04 +0200
Subject: [PATCH 041/833] tc: Fix coverity dereference null return value
CID 550702 (#1 of 1): Dereference null return value (NULL_RETURNS)
7. dereference: Dereferencing a pointer that might be NULL ex_type when calling nl_attr_get_u16.
Signed-off-by: Roi Dayan
Signed-off-by: Simon Horman
---
lib/tc.c | 4 ++++
1 file changed, 4 insertions(+)
diff --git a/lib/tc.c b/lib/tc.c
index b9a0138459e..a66dc432f98 100644
--- a/lib/tc.c
+++ b/lib/tc.c
@@ -1087,6 +1087,10 @@ nl_parse_act_pedit(struct nlattr *options, struct tc_flower *flower)
}
ex_type = nl_attr_find_nested(nla, TCA_PEDIT_KEY_EX_HTYPE);
+ if (!ex_type) {
+ return EOPNOTSUPP;
+ }
+
type = nl_attr_get_u16(ex_type);
err = csum_update_flag(flower, type);
From 48a0adefae0a06a80be85dfe9adeb2ee2e51704a Mon Sep 17 00:00:00 2001
From: Roi Dayan
Date: Fri, 4 Nov 2022 15:06:05 +0200
Subject: [PATCH 042/833] dpif-netlink: Remove redundant null assignment
The assignment of the features pointer is not doing
anything and can be removed.
CC: Justin Pettit
Signed-off-by: Roi Dayan
Acked-by: Justin Pettit
Signed-off-by: Simon Horman
---
lib/dpif-netlink.c | 1 -
1 file changed, 1 deletion(-)
diff --git a/lib/dpif-netlink.c b/lib/dpif-netlink.c
index a620a6ec52d..026b0daa8d8 100644
--- a/lib/dpif-netlink.c
+++ b/lib/dpif-netlink.c
@@ -4105,7 +4105,6 @@ dpif_netlink_meter_get_features(const struct dpif *dpif_,
struct ofputil_meter_features *features)
{
if (probe_broken_meters(CONST_CAST(struct dpif *, dpif_))) {
- features = NULL;
return;
}
From c230c7579c14cbe5119df627f550a3db26391a39 Mon Sep 17 00:00:00 2001
From: Paul Blakey
Date: Wed, 2 Nov 2022 14:46:00 +0200
Subject: [PATCH 043/833] netdev-offload-tc: Reserve lower tc prios for ip
ethertypes
Currently the ethertype-to-prio hmap is static and the first ethertype
being used gets a lower priority. Usually there is an ARP request
before the IP traffic, so the ARP ethertype gets a lower tc priority
while the IP traffic protocol gets a higher one.
In this case IP traffic will go through more hops in tc and HW.
Instead, reserve the lower priorities for IP ethertypes.
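A rough sketch of the intended allocation, using the names from the diff
below (this is not a complete implementation of get_next_available_prio()):

    /* IPv4/IPv6 flows always land on the reserved low tc priorities when
     * multiple masks per priority are supported; every other ethertype
     * keeps the old incrementing allocation. */
    if (multi_mask_per_prio) {
        if (protocol == htons(ETH_P_IP)) {
            return TC_RESERVED_PRIORITY_IPV4;
        } else if (protocol == htons(ETH_P_IPV6)) {
            return TC_RESERVED_PRIORITY_IPV6;
        }
    }
    /* last_prio starts at TC_RESERVED_PRIORITY_MAX, so this allocation
     * begins above the reserved range. */
    return ++last_prio;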
Signed-off-by: Paul Blakey
Reviewed-by: Roi Dayan
Acked-by: Eelco Chaudron
Signed-off-by: Simon Horman
---
lib/netdev-offload-tc.c | 35 ++++++++++++++++++++++++++++-------
lib/tc.h | 2 ++
2 files changed, 30 insertions(+), 7 deletions(-)
diff --git a/lib/netdev-offload-tc.c b/lib/netdev-offload-tc.c
index f6f90a741fd..ce7f8ad9730 100644
--- a/lib/netdev-offload-tc.c
+++ b/lib/netdev-offload-tc.c
@@ -325,6 +325,28 @@ struct prio_map_data {
uint16_t prio;
};
+static uint16_t
+get_next_available_prio(ovs_be16 protocol)
+{
+ static uint16_t last_prio = TC_RESERVED_PRIORITY_MAX;
+
+ if (multi_mask_per_prio) {
+ if (protocol == htons(ETH_P_IP)) {
+ return TC_RESERVED_PRIORITY_IPV4;
+ } else if (protocol == htons(ETH_P_IPV6)) {
+ return TC_RESERVED_PRIORITY_IPV6;
+ }
+ }
+
+ /* last_prio can overflow if there will be many different kinds of
+ * flows which shouldn't happen organically. */
+ if (last_prio == UINT16_MAX) {
+ return TC_RESERVED_PRIORITY_NONE;
+ }
+
+ return ++last_prio;
+}
+
/* Get free prio for tc flower
* If prio is already allocated for mask/eth_type combination then return it.
* If not assign new prio.
@@ -336,11 +358,11 @@ get_prio_for_tc_flower(struct tc_flower *flower)
{
static struct hmap prios = HMAP_INITIALIZER(&prios);
static struct ovs_mutex prios_lock = OVS_MUTEX_INITIALIZER;
- static uint16_t last_prio = TC_RESERVED_PRIORITY_MAX;
size_t key_len = sizeof(struct tc_flower_key);
size_t hash = hash_int((OVS_FORCE uint32_t) flower->key.eth_type, 0);
struct prio_map_data *data;
struct prio_map_data *new_data;
+ uint16_t prio;
if (!multi_mask_per_prio) {
hash = hash_bytes(&flower->mask, key_len, hash);
@@ -359,21 +381,20 @@ get_prio_for_tc_flower(struct tc_flower *flower)
}
}
- if (last_prio == UINT16_MAX) {
- /* last_prio can overflow if there will be many different kinds of
- * flows which shouldn't happen organically. */
+ prio = get_next_available_prio(flower->key.eth_type);
+ if (prio == TC_RESERVED_PRIORITY_NONE) {
ovs_mutex_unlock(&prios_lock);
- return 0;
+ return prio;
}
new_data = xzalloc(sizeof *new_data);
memcpy(&new_data->mask, &flower->mask, key_len);
- new_data->prio = ++last_prio;
+ new_data->prio = prio;
new_data->protocol = flower->key.eth_type;
hmap_insert(&prios, &new_data->node, hash);
ovs_mutex_unlock(&prios_lock);
- return new_data->prio;
+ return prio;
}
static uint32_t
diff --git a/lib/tc.h b/lib/tc.h
index 161f438124b..a828fd3e3f1 100644
--- a/lib/tc.h
+++ b/lib/tc.h
@@ -49,6 +49,8 @@
enum tc_flower_reserved_prio {
TC_RESERVED_PRIORITY_NONE,
TC_RESERVED_PRIORITY_POLICE,
+ TC_RESERVED_PRIORITY_IPV4,
+ TC_RESERVED_PRIORITY_IPV6,
__TC_RESERVED_PRIORITY_MAX
};
#define TC_RESERVED_PRIORITY_MAX (__TC_RESERVED_PRIORITY_MAX -1)
From bb9fedb79af8df5f14922ae588866314a0e31bf5 Mon Sep 17 00:00:00 2001
From: Chaoyong He
Date: Wed, 20 Jul 2022 16:42:00 +0800
Subject: [PATCH 044/833] netdev-offload-dpdk: Enhance the support of tunnel
pop action
Populate the 'is_ipv6' field of 'struct rte_flow_tunnel', which can
be used in the implementation of tunnel pop action for DPDK PMD.
Fixes: be56e063d028 ("netdev-offload-dpdk: Support tunnel pop action.")
Signed-off-by: Chaoyong He
Reviewed-by: Louis Peens
Acked-by: Eli Britstein
Signed-off-by: Simon Horman
---
lib/netdev-offload-dpdk.c | 14 ++++++++++----
1 file changed, 10 insertions(+), 4 deletions(-)
diff --git a/lib/netdev-offload-dpdk.c b/lib/netdev-offload-dpdk.c
index 80a64a6cc06..38f00fd309e 100644
--- a/lib/netdev-offload-dpdk.c
+++ b/lib/netdev-offload-dpdk.c
@@ -1099,12 +1099,18 @@ vport_to_rte_tunnel(struct netdev *vport,
const struct netdev_tunnel_config *tnl_cfg;
memset(tunnel, 0, sizeof *tunnel);
+
+ tnl_cfg = netdev_get_tunnel_config(vport);
+ if (!tnl_cfg) {
+ return -1;
+ }
+
+ if (!IN6_IS_ADDR_V4MAPPED(&tnl_cfg->ipv6_dst)) {
+ tunnel->is_ipv6 = true;
+ }
+
if (!strcmp(netdev_get_type(vport), "vxlan")) {
tunnel->type = RTE_FLOW_ITEM_TYPE_VXLAN;
- tnl_cfg = netdev_get_tunnel_config(vport);
- if (!tnl_cfg) {
- return -1;
- }
tunnel->tp_dst = tnl_cfg->dst_port;
if (!VLOG_DROP_DBG(&rl)) {
ds_put_format(s_tnl, "flow tunnel create %d type vxlan; ",
From 62ac7b8a53506d910b787d2909fe8bbe9fd99855 Mon Sep 17 00:00:00 2001
From: Wilson Peng
Date: Wed, 9 Nov 2022 09:35:06 +0800
Subject: [PATCH 045/833] datapath-windows: Check the condition to reset pseudo
header checksum on Rx side
If an OVS node running on Windows processes a NAT action on the RX side, it
will reset the pseudo header checksum only if the L4 checksum is the same as
the pseudo header checksum calculated before the NAT action.
Without the fix, if the L4 header checksum is filled with a pseudo header
checksum (source IP, destination IP, protocol, TCP payload length + TCP
header length), OVS will still do the checksum update (replace some IP and
port fields and recalculate the checksum), which leads to an incorrect L4
header checksum.
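For reference, the pseudo header checksum mentioned above is just the
ones'-complement sum of the addresses, the protocol and the L4 length, left
uninverted so that the hardware can complete the real checksum later. A
minimal sketch with byte-order handling omitted; the driver itself relies on
its IPPseudoChecksum() helper and 'pseudo_header_sum' is only an illustration:

    #include <stdint.h>

    static uint16_t
    pseudo_header_sum(uint32_t src_ip, uint32_t dst_ip,
                      uint8_t proto, uint16_t l4_len)
    {
        uint32_t sum = 0;

        sum += (src_ip >> 16) + (src_ip & 0xffff);  /* Source address. */
        sum += (dst_ip >> 16) + (dst_ip & 0xffff);  /* Destination address. */
        sum += proto;                               /* Zero byte + protocol. */
        sum += l4_len;                              /* L4 header + payload. */

        while (sum >> 16) {                         /* Fold the carries. */
            sum = (sum & 0xffff) + (sum >> 16);
        }
        return (uint16_t) sum;                      /* Not inverted. */
    }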
Reported-at: https://github.com/openvswitch/ovs-issues/issues/265
Signed-off-by: Wilson Peng
Signed-off-by: Alin-Gabriel Serdean
---
datapath-windows/ovsext/Actions.c | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/datapath-windows/ovsext/Actions.c b/datapath-windows/ovsext/Actions.c
index 2f44086b469..97029b0f4e1 100644
--- a/datapath-windows/ovsext/Actions.c
+++ b/datapath-windows/ovsext/Actions.c
@@ -1514,6 +1514,8 @@ OvsUpdateAddressAndPort(OvsForwardingContext *ovsFwdCtx,
UINT16 *checkField = NULL;
BOOLEAN l4Offload = FALSE;
NDIS_TCP_IP_CHECKSUM_NET_BUFFER_LIST_INFO csumInfo;
+ UINT16 preNatPseudoChecksum = 0;
+ BOOLEAN preservePseudoChecksum = FALSE;
ASSERT(layers->value != 0);
@@ -1549,6 +1551,11 @@ OvsUpdateAddressAndPort(OvsForwardingContext *ovsFwdCtx,
* case, we only update the TTL.
*/
/*Only tx direction the checksum value will be reset to be PseudoChecksum*/
+ if (!isTx) {
+ preNatPseudoChecksum = IPPseudoChecksum(&ipHdr->saddr, &ipHdr->daddr,
+ tcpHdr ? IPPROTO_TCP : IPPROTO_UDP,
+ ntohs(ipHdr->tot_len) - ipHdr->ihl * 4);
+ }
if (isSource) {
addrField = &ipHdr->saddr;
@@ -1565,7 +1572,12 @@ OvsUpdateAddressAndPort(OvsForwardingContext *ovsFwdCtx,
((BOOLEAN)csumInfo.Receive.UdpChecksumSucceeded ||
(BOOLEAN)csumInfo.Receive.UdpChecksumFailed);
}
- if (isTx && l4Offload) {
+ if (!isTx && l4Offload) {
+ if (*checkField == preNatPseudoChecksum) {
+ preservePseudoChecksum = TRUE;
+ }
+ }
+ if (isTx && l4Offload || preservePseudoChecksum) {
*checkField = IPPseudoChecksum(&newAddr, &ipHdr->daddr,
tcpHdr ? IPPROTO_TCP : IPPROTO_UDP,
ntohs(ipHdr->tot_len) - ipHdr->ihl * 4);
@@ -1585,8 +1597,13 @@ OvsUpdateAddressAndPort(OvsForwardingContext *ovsFwdCtx,
((BOOLEAN)csumInfo.Receive.UdpChecksumSucceeded ||
(BOOLEAN)csumInfo.Receive.UdpChecksumFailed);
}
+ if (!isTx && l4Offload) {
+ if (*checkField == preNatPseudoChecksum) {
+ preservePseudoChecksum = TRUE;
+ }
+ }
- if (isTx && l4Offload) {
+ if (isTx && l4Offload || preservePseudoChecksum) {
*checkField = IPPseudoChecksum(&ipHdr->saddr, &newAddr,
tcpHdr ? IPPROTO_TCP : IPPROTO_UDP,
ntohs(ipHdr->tot_len) - ipHdr->ihl * 4);
@@ -1595,7 +1612,8 @@ OvsUpdateAddressAndPort(OvsForwardingContext *ovsFwdCtx,
if (*addrField != newAddr) {
UINT32 oldAddr = *addrField;
- if ((checkField && *checkField != 0) && (!l4Offload || !isTx)) {
+ if ((checkField && *checkField != 0) &&
+ (!l4Offload || (!isTx && !preservePseudoChecksum))) {
/* Recompute total checksum. */
*checkField = ChecksumUpdate32(*checkField, oldAddr,
newAddr);
@@ -1609,7 +1627,8 @@ OvsUpdateAddressAndPort(OvsForwardingContext *ovsFwdCtx,
}
if (portField && *portField != newPort) {
- if ((checkField) && (!l4Offload || !isTx)) {
+ if ((checkField) &&
+ (!l4Offload || (!isTx && !preservePseudoChecksum))) {
/* Recompute total checksum. */
*checkField = ChecksumUpdate16(*checkField, *portField,
newPort);
From 8b3c86897d6a114a099255997bb74f12a735d9fb Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Wed, 23 Nov 2022 22:23:37 +0100
Subject: [PATCH 046/833] learn: Fix parsing immediate value for a field match.
The value is right-justified after the string parsing with
parse_int_string(), i.e. it is in BE byte order and aligned
to the right side of the array.
For example, the 0x10011 value in a 4-byte field will look
like 0x00 0x01 0x00 0x11.
However, the value copy to the resulting ofpact is performed
from the start of that memory. So, in case the destination
size is smaller than the original field size, the wrong
part of the value will be copied.
In the 0x00 0x01 0x00 0x11 example above, if the copy is
performed to a 3-byte field, the first 3 bytes will be
copied, which are 0x00 0x01 0x00 instead of 0x01 0x00 0x11.
This leads to a problem where NXM_NX_REG3[0..16]=0x10011
turns into NXM_NX_REG3[0..16]=0x100 after the parsing.
Fix that by offsetting the starting position to the size
difference in bytes similarly to how it is done in
learn_parse_load_immediate().
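A standalone illustration of the offset copy, using the sizes from the
NXM_NX_REG3[0..16] example above (plain C, not OVS code):

    #include <stdint.h>
    #include <stdio.h>
    #include <string.h>

    int
    main(void)
    {
        /* 0x10011 parsed right-justified into a 4-byte big-endian buffer. */
        uint8_t imm[4] = { 0x00, 0x01, 0x00, 0x11 };
        unsigned int field_bytes = 4;       /* Size of NXM_NX_REG3. */
        unsigned int n_bits = 17;           /* Destination is reg3[0..16]. */
        unsigned int imm_bytes = (n_bits + 7) / 8;
        uint8_t dst[3];

        /* Copying from imm[0] would yield 00 01 00, i.e. 0x100; offsetting
         * by the size difference keeps the value intact. */
        memcpy(dst, &imm[field_bytes - imm_bytes], imm_bytes);
        printf("%02x %02x %02x\n", dst[0], dst[1], dst[2]);  /* 01 00 11 */
        return 0;
    }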
While at it, change &imm to imm.b in function calls that
expect byte arrays as arguments. The old way is technically
correct, but more error-prone.
The mf_write_subfield_value() call was also incorrect.
However, the 'match' variable is actually not used for
anything since validation checking was removed in commit
dd43a558597b ("Do not perform validation in learn_parse();").
So, just remove the call and the 'match' variable
entirely instead of fixing it.
Fixes: 21b2fa617126 ("ofp-parse: Allow match field names in actions and brackets in matches.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-discuss/2022-November/052100.html
Reported-by: Thomas Lee
Reviewed-by: Simon Horman
Signed-off-by: Ilya Maximets
---
lib/learn.c | 18 +++++++-----------
tests/learn.at | 4 ++--
2 files changed, 9 insertions(+), 13 deletions(-)
diff --git a/lib/learn.c b/lib/learn.c
index a40209ec0b8..a62add2fda0 100644
--- a/lib/learn.c
+++ b/lib/learn.c
@@ -241,7 +241,7 @@ static char * OVS_WARN_UNUSED_RESULT
learn_parse_spec(const char *orig, char *name, char *value,
const struct ofputil_port_map *port_map,
struct ofpact_learn_spec *spec,
- struct ofpbuf *ofpacts, struct match *match)
+ struct ofpbuf *ofpacts)
{
/* Parse destination and check prerequisites. */
struct mf_subfield dst;
@@ -275,14 +275,14 @@ learn_parse_spec(const char *orig, char *name, char *value,
} else {
char *tail;
/* Partial field value. */
- if (parse_int_string(value, (uint8_t *)&imm,
+ if (parse_int_string(value, imm.b,
dst.field->n_bytes, &tail)
|| *tail != 0) {
imm_error = xasprintf("%s: cannot parse integer value", orig);
}
if (!imm_error &&
- !bitwise_is_all_zeros(&imm, dst.field->n_bytes,
+ !bitwise_is_all_zeros(imm.b, dst.field->n_bytes,
dst.n_bits,
dst.field->n_bytes * 8 - dst.n_bits)) {
struct ds ds;
@@ -304,15 +304,13 @@ learn_parse_spec(const char *orig, char *name, char *value,
spec->src_type = NX_LEARN_SRC_IMMEDIATE;
- /* Update 'match' to allow for satisfying destination
- * prerequisites. */
- mf_write_subfield_value(&dst, &imm, match);
-
/* Push value last, as this may reallocate 'spec'! */
unsigned int imm_bytes = DIV_ROUND_UP(dst.n_bits, 8);
uint8_t *src_imm = ofpbuf_put_zeros(ofpacts,
OFPACT_ALIGN(imm_bytes));
- memcpy(src_imm, &imm, imm_bytes);
+
+ memcpy(src_imm, &imm.b[dst.field->n_bytes - imm_bytes],
+ imm_bytes);
free(error);
return NULL;
@@ -391,7 +389,6 @@ learn_parse__(char *orig, char *arg, const struct ofputil_port_map *port_map,
struct ofpbuf *ofpacts)
{
struct ofpact_learn *learn;
- struct match match;
char *name, *value;
learn = ofpact_put_LEARN(ofpacts);
@@ -400,7 +397,6 @@ learn_parse__(char *orig, char *arg, const struct ofputil_port_map *port_map,
learn->priority = OFP_DEFAULT_PRIORITY;
learn->table_id = 1;
- match_init_catchall(&match);
while (ofputil_parse_key_value(&arg, &name, &value)) {
if (!strcmp(name, "table")) {
if (!ofputil_table_from_string(value, table_map,
@@ -448,7 +444,7 @@ learn_parse__(char *orig, char *arg, const struct ofputil_port_map *port_map,
spec = ofpbuf_put_zeros(ofpacts, sizeof *spec);
error = learn_parse_spec(orig, name, value, port_map,
- spec, ofpacts, &match);
+ spec, ofpacts);
if (error) {
return error;
}
diff --git a/tests/learn.at b/tests/learn.at
index 5f1d6df9de4..d127fed3481 100644
--- a/tests/learn.at
+++ b/tests/learn.at
@@ -6,7 +6,7 @@ actions=learn()
actions=learn(send_flow_rem)
actions=learn(delete_learned)
actions=learn(send_flow_rem,delete_learned)
-actions=learn(NXM_OF_VLAN_TCI[0..11], NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], output:NXM_OF_IN_PORT[], load:10->NXM_NX_REG0[5..10])
+actions=learn(NXM_OF_VLAN_TCI[0..11], NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[], NXM_NX_REG3[3..19]=0x10011, output:NXM_OF_IN_PORT[], load:10->NXM_NX_REG0[5..10])
actions=learn(table=1,idle_timeout=10, hard_timeout=20, fin_idle_timeout=5, fin_hard_timeout=10, priority=10, cookie=0xfedcba9876543210, in_port=99,eth_dst=eth_src,load:in_port->reg1[16..31])
actions=learn(limit=4096)
actions=learn(limit=4096,result_dst=reg0[0])
@@ -18,7 +18,7 @@ OFPT_FLOW_MOD (xid=0x1): ADD actions=learn(table=1)
OFPT_FLOW_MOD (xid=0x2): ADD actions=learn(table=1,send_flow_rem)
OFPT_FLOW_MOD (xid=0x3): ADD actions=learn(table=1,delete_learned)
OFPT_FLOW_MOD (xid=0x4): ADD actions=learn(table=1,send_flow_rem,delete_learned)
-OFPT_FLOW_MOD (xid=0x5): ADD actions=learn(table=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],output:NXM_OF_IN_PORT[],load:0xa->NXM_NX_REG0[5..10])
+OFPT_FLOW_MOD (xid=0x5): ADD actions=learn(table=1,NXM_OF_VLAN_TCI[0..11],NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],NXM_NX_REG3[3..19]=0x10011,output:NXM_OF_IN_PORT[],load:0xa->NXM_NX_REG0[5..10])
OFPT_FLOW_MOD (xid=0x6): ADD actions=learn(table=1,idle_timeout=10,hard_timeout=20,fin_idle_timeout=5,fin_hard_timeout=10,priority=10,cookie=0xfedcba9876543210,in_port=99,NXM_OF_ETH_DST[]=NXM_OF_ETH_SRC[],load:NXM_OF_IN_PORT[]->NXM_NX_REG1[16..31])
OFPT_FLOW_MOD (xid=0x7): ADD actions=learn(table=1,limit=4096)
OFPT_FLOW_MOD (xid=0x8): ADD actions=learn(table=1,limit=4096,result_dst=NXM_NX_REG0[0])
From c6062d107716e6bf84f8106b0806ee73ba7207a3 Mon Sep 17 00:00:00 2001
From: David Marchand
Date: Wed, 9 Nov 2022 21:31:50 +0100
Subject: [PATCH 047/833] vswitchd: Publish per iface received multicast
packets.
The count of received multicast packets has been computed internally,
but not exposed to ovsdb. Fix this.
Signed-off-by: David Marchand
Acked-by: Mike Pattrick
Acked-by: Michael Santana
Signed-off-by: Ilya Maximets
---
vswitchd/bridge.c | 1 +
1 file changed, 1 insertion(+)
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index 25ce45e3dc1..d0667f229da 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -2619,6 +2619,7 @@ iface_refresh_stats(struct iface *iface)
IFACE_STAT(tx_512_to_1023_packets, "tx_512_to_1023_packets") \
IFACE_STAT(tx_1024_to_1522_packets, "tx_1024_to_1522_packets") \
IFACE_STAT(tx_1523_to_max_packets, "tx_1523_to_max_packets") \
+ IFACE_STAT(multicast, "rx_multicast_packets") \
IFACE_STAT(tx_multicast_packets, "tx_multicast_packets") \
IFACE_STAT(rx_broadcast_packets, "rx_broadcast_packets") \
IFACE_STAT(tx_broadcast_packets, "tx_broadcast_packets") \
From 2496d854326577a2d7ae94a86a085e8ae336302e Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Fri, 4 Nov 2022 15:25:42 +0100
Subject: [PATCH 048/833] rculist: Fix iteration macros.
Some rculist macros have no users and there are no unit tests
specific to that library either, so the broken code wasn't spotted
while updating to multi-variable iterators.
Fix multiple problems such as missing commas and parentheses and
incorrect variable and macro names.
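For context, the protected iterators are used like the other OVS list macros:
the iterator is a pointer to the struct that embeds the 'struct rculist'
member named by MEMBER. A hypothetical usage sketch (the struct, the list and
process_flow() are invented for the example):

    struct flow_entry {
        struct rculist node;        /* Element in 'flow_list'. */
        uint32_t hash;
    };

    struct flow_entry *entry;

    /* Writer-side iteration; the caller must hold whatever protects
     * 'flow_list' from concurrent modification. */
    RCULIST_FOR_EACH_PROTECTED (entry, node, &flow_list) {
        process_flow(entry);
    }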
Fixes: d293965d7b06 ("rculist: use multi-variable helpers for loop macros.")
Reported-by: Subrata Nath
Co-authored-by: Dumitru Ceara
Signed-off-by: Dumitru Ceara
Acked-by: Alin-Gabriel Serdean
Signed-off-by: Ilya Maximets
---
lib/rculist.h | 18 +++++++++---------
1 file changed, 9 insertions(+), 9 deletions(-)
diff --git a/lib/rculist.h b/lib/rculist.h
index c0d77acf943..9bb8cbf3eb2 100644
--- a/lib/rculist.h
+++ b/lib/rculist.h
@@ -380,18 +380,18 @@ rculist_is_singleton_protected(const struct rculist *list)
#define RCULIST_FOR_EACH_REVERSE_PROTECTED(ITER, MEMBER, RCULIST) \
for (INIT_MULTIVAR(ITER, MEMBER, (RCULIST)->prev, struct rculist); \
CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST)); \
- UPDATE_MULTIVAR(ITER, ITER_VAR(VAR).prev))
+ UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
#define RCULIST_FOR_EACH_REVERSE_PROTECTED_CONTINUE(ITER, MEMBER, RCULIST) \
for (INIT_MULTIVAR(ITER, MEMBER, (ITER)->MEMBER.prev, struct rculist); \
CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST)); \
- UPDATE_MULTIVAR(ITER, ITER_VAR(VAR).prev))
+ UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
#define RCULIST_FOR_EACH_PROTECTED(ITER, MEMBER, RCULIST) \
for (INIT_MULTIVAR(ITER, MEMBER, rculist_next_protected(RCULIST), \
struct rculist); \
CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST)); \
- UPDATE_MULTIVAR(ITER, rculist_next_protected(ITER_VAR(ITER))) \
+ UPDATE_MULTIVAR(ITER, rculist_next_protected(ITER_VAR(ITER)))) \
#define RCULIST_FOR_EACH_SAFE_SHORT_PROTECTED(ITER, MEMBER, RCULIST) \
for (INIT_MULTIVAR_SAFE_SHORT(ITER, MEMBER, \
@@ -399,18 +399,18 @@ rculist_is_singleton_protected(const struct rculist *list)
struct rculist); \
CONDITION_MULTIVAR_SAFE_SHORT(ITER, MEMBER, \
ITER_VAR(ITER) != (RCULIST), \
- ITER_NEXT_VAR(ITER) = rculist_next_protected(ITER_VAR(VAR))); \
- UPDATE_MULTIVAR_SHORT(ITER))
+ ITER_NEXT_VAR(ITER) = rculist_next_protected(ITER_VAR(ITER))); \
+ UPDATE_MULTIVAR_SAFE_SHORT(ITER))
#define RCULIST_FOR_EACH_SAFE_LONG_PROTECTED(ITER, NEXT, MEMBER, RCULIST) \
for (INIT_MULTIVAR_SAFE_LONG(ITER, NEXT, MEMBER, \
- rculist_next_protected(RCULIST) \
+ rculist_next_protected(RCULIST), \
struct rculist); \
- CONDITION_MULTIVAR_SAFE_LONG(VAR, NEXT, MEMBER \
+ CONDITION_MULTIVAR_SAFE_LONG(ITER, NEXT, MEMBER, \
ITER_VAR(ITER) != (RCULIST), \
- ITER_VAR(NEXT) = rculist_next_protected(ITER_VAR(VAR)), \
+ ITER_VAR(NEXT) = rculist_next_protected(ITER_VAR(ITER)), \
ITER_VAR(NEXT) != (RCULIST)); \
- UPDATE_MULTIVAR_LONG(ITER))
+ UPDATE_MULTIVAR_SAFE_LONG(ITER, NEXT))
#define RCULIST_FOR_EACH_SAFE_PROTECTED(...) \
OVERLOAD_SAFE_MACRO(RCULIST_FOR_EACH_SAFE_LONG_PROTECTED, \
From 5b06970e8eedd074dfa5a5405b8ada7435689fc8 Mon Sep 17 00:00:00 2001
From: Lin Huang
Date: Tue, 24 May 2022 21:04:32 +0800
Subject: [PATCH 049/833] ofp-msgs: Fix comment typo.
Fix comment typo.
Signed-off-by: Lin Huang
Acked-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
lib/ofp-msgs.c | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/ofp-msgs.c b/lib/ofp-msgs.c
index 93aa812978e..fdb89806480 100644
--- a/lib/ofp-msgs.c
+++ b/lib/ofp-msgs.c
@@ -148,7 +148,7 @@ struct raw_instance {
/* Information about a particular 'enum ofpraw'. */
struct raw_info {
/* All possible instantiations of this OFPRAW_* into OpenFlow headers. */
- struct raw_instance *instances; /* min_version - max_version + 1 elems. */
+ struct raw_instance *instances; /* max_version - min_version + 1 elems. */
uint8_t min_version;
uint8_t max_version;
From 22413fe8a83cc4e153fc35defc6f01f7dc5a21b5 Mon Sep 17 00:00:00 2001
From: yangchang
Date: Thu, 23 Jun 2022 18:32:06 +0800
Subject: [PATCH 050/833] lacp: Modify the comment misspelling.
Change 'negotations' to 'negotiations'.
Signed-off-by: yangchang
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
lib/lacp.h | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/lib/lacp.h b/lib/lacp.h
index 1ca06f762ba..5ba17c36a5c 100644
--- a/lib/lacp.h
+++ b/lib/lacp.h
@@ -24,7 +24,7 @@
/* LACP Protocol Implementation. */
enum lacp_status {
- LACP_NEGOTIATED, /* Successful LACP negotations. */
+ LACP_NEGOTIATED, /* Successful LACP negotiations. */
LACP_CONFIGURED, /* LACP is enabled but not negotiated. */
LACP_DISABLED /* LACP is not enabled. */
};
From 0937209fc7aca1107bb3f77cf1585799a086d065 Mon Sep 17 00:00:00 2001
From: David Marchand
Date: Thu, 25 Aug 2022 12:25:24 +0200
Subject: [PATCH 051/833] netdev-dpdk: Cleanup code when DPDK is disabled.
Remove one unused stub: netdev_dpdk_register() can't be called if DPDK
is disabled at build time.
Remove unneeded #ifdef in call to free_dpdk_buf.
Drop unneeded cast when calling free_dpdk_buf.
Acked-by: Sunil Pai G
Signed-off-by: David Marchand
Signed-off-by: Ilya Maximets
---
lib/dp-packet.c | 6 +-----
lib/dp-packet.h | 4 +---
lib/netdev-dpdk.h | 5 -----
3 files changed, 2 insertions(+), 13 deletions(-)
diff --git a/lib/dp-packet.c b/lib/dp-packet.c
index 4538d2a6148..61e405460a2 100644
--- a/lib/dp-packet.c
+++ b/lib/dp-packet.c
@@ -134,11 +134,7 @@ dp_packet_uninit(struct dp_packet *b)
if (b->source == DPBUF_MALLOC) {
free(dp_packet_base(b));
} else if (b->source == DPBUF_DPDK) {
-#ifdef DPDK_NETDEV
- /* If this dp_packet was allocated by DPDK it must have been
- * created as a dp_packet */
- free_dpdk_buf((struct dp_packet*) b);
-#endif
+ free_dpdk_buf(b);
} else if (b->source == DPBUF_AFXDP) {
free_afxdp_buf(b);
}
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index 55eeaab2ce8..a8ea5b40f71 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -247,9 +247,7 @@ dp_packet_delete(struct dp_packet *b)
{
if (b) {
if (b->source == DPBUF_DPDK) {
- /* If this dp_packet was allocated by DPDK it must have been
- * created as a dp_packet */
- free_dpdk_buf((struct dp_packet*) b);
+ free_dpdk_buf(b);
return;
}
diff --git a/lib/netdev-dpdk.h b/lib/netdev-dpdk.h
index 7d2f64af23e..5cd95d00f5a 100644
--- a/lib/netdev-dpdk.h
+++ b/lib/netdev-dpdk.h
@@ -150,11 +150,6 @@ netdev_dpdk_rte_flow_tunnel_item_release(
#else
-static inline void
-netdev_dpdk_register(const struct smap *ovs_other_config OVS_UNUSED)
-{
- /* Nothing */
-}
static inline void
free_dpdk_buf(struct dp_packet *buf OVS_UNUSED)
{
From 126e6046eb9592200bfca2218002b8256f92d617 Mon Sep 17 00:00:00 2001
From: David Marchand
Date: Thu, 25 Aug 2022 12:25:25 +0200
Subject: [PATCH 052/833] netdev-dpdk: Move DPDK netdev related configuration.
vhost-related configuration and per-port memory are netdev-dpdk
configuration items.
dpdk-stub.c and netdev-dpdk.c are never linked together, so we can move
those bits out of the generic dpdk code.
The dpdk_* accessors for those configuration items are then not needed
anymore and we can simply reference local variables.
Acked-by: Sunil Pai G
Signed-off-by: David Marchand
Signed-off-by: Ilya Maximets
---
lib/dpdk-stub.c | 24 -----------
lib/dpdk.c | 101 ---------------------------------------------
lib/dpdk.h | 4 --
lib/netdev-dpdk.c | 102 ++++++++++++++++++++++++++++++++++++++++++----
4 files changed, 94 insertions(+), 137 deletions(-)
diff --git a/lib/dpdk-stub.c b/lib/dpdk-stub.c
index 3eee1f485c0..58ebf6cb62c 100644
--- a/lib/dpdk-stub.c
+++ b/lib/dpdk-stub.c
@@ -49,30 +49,6 @@ dpdk_detach_thread(void)
{
}
-const char *
-dpdk_get_vhost_sock_dir(void)
-{
- return NULL;
-}
-
-bool
-dpdk_vhost_iommu_enabled(void)
-{
- return false;
-}
-
-bool
-dpdk_vhost_postcopy_enabled(void)
-{
- return false;
-}
-
-bool
-dpdk_per_port_memory(void)
-{
- return false;
-}
-
bool
dpdk_available(void)
{
diff --git a/lib/dpdk.c b/lib/dpdk.c
index d909974f91b..240babc03e6 100644
--- a/lib/dpdk.c
+++ b/lib/dpdk.c
@@ -19,7 +19,6 @@
#include
#include
-#include
#include
#include
@@ -47,40 +46,9 @@ VLOG_DEFINE_THIS_MODULE(dpdk);
static FILE *log_stream = NULL; /* Stream for DPDK log redirection */
-static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */
-static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */
-static bool vhost_postcopy_enabled = false; /* Status of vHost POSTCOPY
- * support. */
-static bool per_port_memory = false; /* Status of per port memory support */
-
/* Indicates successful initialization of DPDK. */
static atomic_bool dpdk_initialized = ATOMIC_VAR_INIT(false);
-static int
-process_vhost_flags(char *flag, const char *default_val, int size,
- const struct smap *ovs_other_config,
- char **new_val)
-{
- const char *val;
- int changed = 0;
-
- val = smap_get(ovs_other_config, flag);
-
- /* Process the vhost-sock-dir flag if it is provided, otherwise resort to
- * default value.
- */
- if (val && (strlen(val) <= size)) {
- changed = 1;
- *new_val = xstrdup(val);
- VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
- } else {
- VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
- *new_val = xstrdup(default_val);
- }
-
- return changed;
-}
-
static bool
args_contains(const struct svec *args, const char *value)
{
@@ -345,11 +313,9 @@ malloc_dump_stats_wrapper(FILE *stream)
static bool
dpdk_init__(const struct smap *ovs_other_config)
{
- char *sock_dir_subcomponent;
char **argv = NULL;
int result;
bool auto_determine = true;
- int err = 0;
struct ovs_numa_dump *affinity = NULL;
struct svec args = SVEC_EMPTY_INITIALIZER;
@@ -361,49 +327,6 @@ dpdk_init__(const struct smap *ovs_other_config)
rte_openlog_stream(log_stream);
}
- if (process_vhost_flags("vhost-sock-dir", ovs_rundir(),
- NAME_MAX, ovs_other_config,
- &sock_dir_subcomponent)) {
- struct stat s;
- if (!strstr(sock_dir_subcomponent, "..")) {
- vhost_sock_dir = xasprintf("%s/%s", ovs_rundir(),
- sock_dir_subcomponent);
-
- err = stat(vhost_sock_dir, &s);
- if (err) {
- VLOG_ERR("vhost-user sock directory '%s' does not exist.",
- vhost_sock_dir);
- }
- } else {
- vhost_sock_dir = xstrdup(ovs_rundir());
- VLOG_ERR("vhost-user sock directory request '%s/%s' has invalid"
- "characters '..' - using %s instead.",
- ovs_rundir(), sock_dir_subcomponent, ovs_rundir());
- }
- free(sock_dir_subcomponent);
- } else {
- vhost_sock_dir = sock_dir_subcomponent;
- }
-
- vhost_iommu_enabled = smap_get_bool(ovs_other_config,
- "vhost-iommu-support", false);
- VLOG_INFO("IOMMU support for vhost-user-client %s.",
- vhost_iommu_enabled ? "enabled" : "disabled");
-
- vhost_postcopy_enabled = smap_get_bool(ovs_other_config,
- "vhost-postcopy-support", false);
- if (vhost_postcopy_enabled && memory_locked()) {
- VLOG_WARN("vhost-postcopy-support and mlockall are not compatible.");
- vhost_postcopy_enabled = false;
- }
- VLOG_INFO("POSTCOPY support for vhost-user-client %s.",
- vhost_postcopy_enabled ? "enabled" : "disabled");
-
- per_port_memory = smap_get_bool(ovs_other_config,
- "per-port-memory", false);
- VLOG_INFO("Per port memory for DPDK devices %s.",
- per_port_memory ? "enabled" : "disabled");
-
svec_add(&args, ovs_get_program_name());
construct_dpdk_args(ovs_other_config, &args);
@@ -558,30 +481,6 @@ dpdk_init(const struct smap *ovs_other_config)
atomic_store_relaxed(&dpdk_initialized, enabled);
}
-const char *
-dpdk_get_vhost_sock_dir(void)
-{
- return vhost_sock_dir;
-}
-
-bool
-dpdk_vhost_iommu_enabled(void)
-{
- return vhost_iommu_enabled;
-}
-
-bool
-dpdk_vhost_postcopy_enabled(void)
-{
- return vhost_postcopy_enabled;
-}
-
-bool
-dpdk_per_port_memory(void)
-{
- return per_port_memory;
-}
-
bool
dpdk_available(void)
{
diff --git a/lib/dpdk.h b/lib/dpdk.h
index 64ebca47d6d..1b790e682e4 100644
--- a/lib/dpdk.h
+++ b/lib/dpdk.h
@@ -38,10 +38,6 @@ struct ovsrec_open_vswitch;
void dpdk_init(const struct smap *ovs_other_config);
bool dpdk_attach_thread(unsigned cpu);
void dpdk_detach_thread(void);
-const char *dpdk_get_vhost_sock_dir(void);
-bool dpdk_vhost_iommu_enabled(void);
-bool dpdk_vhost_postcopy_enabled(void);
-bool dpdk_per_port_memory(void);
bool dpdk_available(void);
void print_dpdk_version(void);
void dpdk_status(const struct ovsrec_open_vswitch *);
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index e4b3465e09b..339936b6e29 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -24,6 +24,7 @@
#include
#include
#include
+#include
#include
#include
@@ -78,6 +79,12 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
COVERAGE_DEFINE(vhost_tx_contention);
COVERAGE_DEFINE(vhost_notification);
+static char *vhost_sock_dir = NULL; /* Location of vhost-user sockets */
+static bool vhost_iommu_enabled = false; /* Status of vHost IOMMU support */
+static bool vhost_postcopy_enabled = false; /* Status of vHost POSTCOPY
+ * support. */
+static bool per_port_memory = false; /* Status of per port memory support */
+
#define DPDK_PORT_WATCHDOG_INTERVAL 5
#define OVS_CACHE_LINE_SIZE CACHE_LINE_SIZE
@@ -915,7 +922,7 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
struct dpdk_mp *dmp;
int ret = 0;
- bool per_port_mp = dpdk_per_port_memory();
+ bool per_port_mp = per_port_memory;
/* With shared memory we do not need to configure a mempool if the MTU
* and socket ID have not changed, the previous configuration is still
@@ -1379,7 +1386,7 @@ netdev_dpdk_vhost_construct(struct netdev *netdev)
/* Take the name of the vhost-user port and append it to the location where
* the socket is to be created, then register the socket.
*/
- dev->vhost_id = xasprintf("%s/%s", dpdk_get_vhost_sock_dir(), name);
+ dev->vhost_id = xasprintf("%s/%s", vhost_sock_dir, name);
dev->vhost_driver_flags &= ~RTE_VHOST_USER_CLIENT;
@@ -5102,12 +5109,12 @@ netdev_dpdk_vhost_client_reconfigure(struct netdev *netdev)
vhost_flags |= RTE_VHOST_USER_LINEARBUF_SUPPORT;
/* Enable IOMMU support, if explicitly requested. */
- if (dpdk_vhost_iommu_enabled()) {
+ if (vhost_iommu_enabled) {
vhost_flags |= RTE_VHOST_USER_IOMMU_SUPPORT;
}
/* Enable POSTCOPY support, if explicitly requested. */
- if (dpdk_vhost_postcopy_enabled()) {
+ if (vhost_postcopy_enabled) {
vhost_flags |= RTE_VHOST_USER_POSTCOPY_SUPPORT;
}
@@ -5389,8 +5396,18 @@ netdev_dpdk_rte_flow_tunnel_item_release(struct netdev *netdev,
#endif /* ALLOW_EXPERIMENTAL_API */
static void
-parse_user_mempools_list(const char *mtus)
+parse_mempool_config(const struct smap *ovs_other_config)
+{
+ per_port_memory = smap_get_bool(ovs_other_config,
+ "per-port-memory", false);
+ VLOG_INFO("Per port memory for DPDK devices %s.",
+ per_port_memory ? "enabled" : "disabled");
+}
+
+static void
+parse_user_mempools_list(const struct smap *ovs_other_config)
{
+ const char *mtus = smap_get(ovs_other_config, "shared-mempool-config");
char *list, *copy, *key, *value;
int error = 0;
@@ -5438,6 +5455,75 @@ parse_user_mempools_list(const char *mtus)
free(copy);
}
+static int
+process_vhost_flags(char *flag, const char *default_val, int size,
+ const struct smap *ovs_other_config,
+ char **new_val)
+{
+ const char *val;
+ int changed = 0;
+
+ val = smap_get(ovs_other_config, flag);
+
+ /* Process the vhost-sock-dir flag if it is provided, otherwise resort to
+ * default value.
+ */
+ if (val && (strlen(val) <= size)) {
+ changed = 1;
+ *new_val = xstrdup(val);
+ VLOG_INFO("User-provided %s in use: %s", flag, *new_val);
+ } else {
+ VLOG_INFO("No %s provided - defaulting to %s", flag, default_val);
+ *new_val = xstrdup(default_val);
+ }
+
+ return changed;
+}
+
+static void
+parse_vhost_config(const struct smap *ovs_other_config)
+{
+ char *sock_dir_subcomponent;
+
+ if (process_vhost_flags("vhost-sock-dir", ovs_rundir(),
+ NAME_MAX, ovs_other_config,
+ &sock_dir_subcomponent)) {
+ struct stat s;
+
+ if (!strstr(sock_dir_subcomponent, "..")) {
+ vhost_sock_dir = xasprintf("%s/%s", ovs_rundir(),
+ sock_dir_subcomponent);
+
+ if (stat(vhost_sock_dir, &s)) {
+ VLOG_ERR("vhost-user sock directory '%s' does not exist.",
+ vhost_sock_dir);
+ }
+ } else {
+ vhost_sock_dir = xstrdup(ovs_rundir());
+ VLOG_ERR("vhost-user sock directory request '%s/%s' has invalid"
+ "characters '..' - using %s instead.",
+ ovs_rundir(), sock_dir_subcomponent, ovs_rundir());
+ }
+ free(sock_dir_subcomponent);
+ } else {
+ vhost_sock_dir = sock_dir_subcomponent;
+ }
+
+ vhost_iommu_enabled = smap_get_bool(ovs_other_config,
+ "vhost-iommu-support", false);
+ VLOG_INFO("IOMMU support for vhost-user-client %s.",
+ vhost_iommu_enabled ? "enabled" : "disabled");
+
+ vhost_postcopy_enabled = smap_get_bool(ovs_other_config,
+ "vhost-postcopy-support", false);
+ if (vhost_postcopy_enabled && memory_locked()) {
+ VLOG_WARN("vhost-postcopy-support and mlockall are not compatible.");
+ vhost_postcopy_enabled = false;
+ }
+ VLOG_INFO("POSTCOPY support for vhost-user-client %s.",
+ vhost_postcopy_enabled ? "enabled" : "disabled");
+}
+
#define NETDEV_DPDK_CLASS_COMMON \
.is_pmd = true, \
.alloc = netdev_dpdk_alloc, \
@@ -5523,10 +5609,10 @@ static const struct netdev_class dpdk_vhost_client_class = {
void
netdev_dpdk_register(const struct smap *ovs_other_config)
{
- const char *mempoolcfg = smap_get(ovs_other_config,
- "shared-mempool-config");
+ parse_mempool_config(ovs_other_config);
+ parse_user_mempools_list(ovs_other_config);
+ parse_vhost_config(ovs_other_config);
- parse_user_mempools_list(mempoolcfg);
netdev_register_provider(&dpdk_class);
netdev_register_provider(&dpdk_vhost_class);
netdev_register_provider(&dpdk_vhost_client_class);
From d240f72ad2adb9932b59b8e01f47a93f76c5c93c Mon Sep 17 00:00:00 2001
From: David Marchand
Date: Thu, 25 Aug 2022 12:25:26 +0200
Subject: [PATCH 053/833] netdev-dpdk: Cleanup mempool selection code.
Propagating the per_port_memory value through DPDK netdev creation gives
the false impression that its value is somehow contextual to the creation.
On the contrary, this parameter is set once and for all at
OVS initialization time.
Simplify the code and directly access the local boolean.
Acked-by: Sunil Pai G
Signed-off-by: David Marchand
Signed-off-by: Ilya Maximets
---
lib/netdev-dpdk.c | 21 ++++++++++-----------
1 file changed, 10 insertions(+), 11 deletions(-)
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 339936b6e29..72e7a32688f 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -694,11 +694,11 @@ dpdk_mp_sweep(void) OVS_REQUIRES(dpdk_mp_mutex)
* calculating.
*/
static uint32_t
-dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu)
{
uint32_t n_mbufs;
- if (!per_port_mp) {
+ if (!per_port_memory) {
/* Shared memory are being used.
* XXX: this is a really rough method of provisioning memory.
* It's impossible to determine what the exact memory requirements are
@@ -729,7 +729,7 @@ dpdk_calculate_mbufs(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
}
static struct dpdk_mp *
-dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_mp_create(struct netdev_dpdk *dev, int mtu)
{
char mp_name[RTE_MEMPOOL_NAMESIZE];
const char *netdev_name = netdev_get_name(&dev->up);
@@ -754,7 +754,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
/* Get the size of each mbuf, based on the MTU */
mbuf_size = MTU_TO_FRAME_LEN(mtu);
- n_mbufs = dpdk_calculate_mbufs(dev, mtu, per_port_mp);
+ n_mbufs = dpdk_calculate_mbufs(dev, mtu);
do {
/* Full DPDK memory pool name must be unique and cannot be
@@ -840,7 +840,7 @@ dpdk_mp_create(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
}
static struct dpdk_mp *
-dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
+dpdk_mp_get(struct netdev_dpdk *dev, int mtu)
{
struct dpdk_mp *dmp, *next;
bool reuse = false;
@@ -848,7 +848,7 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
ovs_mutex_lock(&dpdk_mp_mutex);
/* Check if shared memory is being used, if so check existing mempools
* to see if reuse is possible. */
- if (!per_port_mp) {
+ if (!per_port_memory) {
/* If user has provided defined mempools, check if one is suitable
* and get new buffer size.*/
mtu = dpdk_get_user_adjusted_mtu(mtu, dev->requested_mtu,
@@ -867,7 +867,7 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
dpdk_mp_sweep();
if (!reuse) {
- dmp = dpdk_mp_create(dev, mtu, per_port_mp);
+ dmp = dpdk_mp_create(dev, mtu);
if (dmp) {
/* Shared memory will hit the reuse case above so will not
* request a mempool that already exists but we need to check
@@ -877,7 +877,7 @@ dpdk_mp_get(struct netdev_dpdk *dev, int mtu, bool per_port_mp)
* dmp to point to the existing entry and increment the refcount
* to avoid being freed at a later stage.
*/
- if (per_port_mp && rte_errno == EEXIST) {
+ if (per_port_memory && rte_errno == EEXIST) {
LIST_FOR_EACH (next, list_node, &dpdk_mp_list) {
if (dmp->mp == next->mp) {
rte_free(dmp);
@@ -922,17 +922,16 @@ netdev_dpdk_mempool_configure(struct netdev_dpdk *dev)
uint32_t buf_size = dpdk_buf_size(dev->requested_mtu);
struct dpdk_mp *dmp;
int ret = 0;
- bool per_port_mp = per_port_memory;
/* With shared memory we do not need to configure a mempool if the MTU
* and socket ID have not changed, the previous configuration is still
* valid so return 0 */
- if (!per_port_mp && dev->mtu == dev->requested_mtu
+ if (!per_port_memory && dev->mtu == dev->requested_mtu
&& dev->socket_id == dev->requested_socket_id) {
return ret;
}
- dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size), per_port_mp);
+ dmp = dpdk_mp_get(dev, FRAME_LEN_TO_MTU(buf_size));
if (!dmp) {
VLOG_ERR("Failed to create memory pool for netdev "
"%s, with MTU %d on socket %d: %s\n",
From b22c4d84038c3eceab9486984e601b2f979ebe6d Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Tue, 25 Oct 2022 18:37:41 +0200
Subject: [PATCH 054/833] netdev: Assume default link speed to be 10 Gbps
instead of 100 Mbps.
100 Mbps was a fair assumption 13 years ago. These days 10 Gbps seems
like a good value in case no other information is available.
The change mainly affects QoS which is currently limited to 100 Mbps if
the user didn't specify 'max-rate' and the card doesn't report the
speed or OVS doesn't have a predefined enumeration for the speed
reported by the NIC.
Calculation of the path cost for STP/RSTP is also affected if OVS is
unable to determine the link speed.
Lower link speed adapters are typically good at reporting their speed,
so the chances of overshooting should be low. But newer high-speed
adapters, for which there is no speed enumeration or which have some
other issue, will not suffer as much.
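Concretely, the fallback shows up in the QoS code as in the htb hunk below;
when netdev_features_to_bps() cannot map the feature bits to a known speed it
returns the supplied default, so an unspecified max-rate is now provisioned
at 10 Gbps (1 250 000 000 bytes/s) instead of 100 Mbps (12 500 000 bytes/s):

    /* 'current' is 0 when the link speed could not be determined. */
    hc->max_rate = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 8;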
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
NEWS | 4 ++++
include/openvswitch/netdev.h | 2 ++
lib/netdev-linux.c | 4 ++--
lib/rstp.c | 2 +-
lib/rstp.h | 2 +-
lib/stp.c | 4 ++--
tests/stp.at | 14 +++++++-------
tests/test-rstp.c | 7 +++++--
vswitchd/bridge.c | 4 ++--
vswitchd/vswitch.xml | 2 +-
10 files changed, 27 insertions(+), 18 deletions(-)
diff --git a/NEWS b/NEWS
index ff77ee404f3..3ae6882d551 100644
--- a/NEWS
+++ b/NEWS
@@ -23,6 +23,10 @@ Post-v3.0.0
bug and CVE fixes addressed since its release.
If a user wishes to benefit from these fixes it is recommended to use
DPDK 21.11.2.
+ - For the QoS max-rate and STP/RSTP path-cost configuration OVS now assumes
+ 10 Gbps link speed by default in case the actual link speed cannot be
+ determined. Previously it was 100 Mbps. Values can still be overridden
+ by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
v3.0.0 - 15 Aug 2022
diff --git a/include/openvswitch/netdev.h b/include/openvswitch/netdev.h
index 0c10f7b487c..cf48f86915f 100644
--- a/include/openvswitch/netdev.h
+++ b/include/openvswitch/netdev.h
@@ -121,6 +121,8 @@ enum netdev_features {
NETDEV_F_PAUSE_ASYM = 1 << 15, /* Asymmetric pause. */
};
+#define NETDEV_DEFAULT_BPS UINT64_C(10 * 1000 * 1000 * 1000)
+
int netdev_get_features(const struct netdev *,
enum netdev_features *current,
enum netdev_features *advertised,
diff --git a/lib/netdev-linux.c b/lib/netdev-linux.c
index 59e8dc0ae6c..f6d7a1b9743 100644
--- a/lib/netdev-linux.c
+++ b/lib/netdev-linux.c
@@ -4710,7 +4710,7 @@ htb_parse_qdisc_details__(struct netdev *netdev_,
netdev_linux_read_features(netdev);
current = !netdev->get_features_error ? netdev->current : 0;
- hc->max_rate = netdev_features_to_bps(current, 100 * 1000 * 1000) / 8;
+ hc->max_rate = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 8;
}
hc->min_rate = hc->max_rate;
hc->burst = 0;
@@ -5182,7 +5182,7 @@ hfsc_parse_qdisc_details__(struct netdev *netdev_, const struct smap *details,
netdev_linux_read_features(netdev);
current = !netdev->get_features_error ? netdev->current : 0;
- max_rate = netdev_features_to_bps(current, 100 * 1000 * 1000) / 8;
+ max_rate = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 8;
}
class->min_rate = max_rate;
diff --git a/lib/rstp.c b/lib/rstp.c
index 7e351bf32ff..2f01966f796 100644
--- a/lib/rstp.c
+++ b/lib/rstp.c
@@ -784,7 +784,7 @@ rstp_convert_speed_to_cost(unsigned int speed)
: speed >= 100 ? 200000 /* 100 Mb/s. */
: speed >= 10 ? 2000000 /* 10 Mb/s. */
: speed >= 1 ? 20000000 /* 1 Mb/s. */
- : RSTP_DEFAULT_PORT_PATH_COST; /* 100 Mb/s. */
+ : RSTP_DEFAULT_PORT_PATH_COST; /* 10 Gb/s. */
return value;
}
diff --git a/lib/rstp.h b/lib/rstp.h
index 39a13b58c1f..13af2019516 100644
--- a/lib/rstp.h
+++ b/lib/rstp.h
@@ -84,7 +84,7 @@ struct dp_packet;
/* Port path cost [Table 17-3] */
#define RSTP_MIN_PORT_PATH_COST 1
#define RSTP_MAX_PORT_PATH_COST 200000000
-#define RSTP_DEFAULT_PORT_PATH_COST 200000
+#define RSTP_DEFAULT_PORT_PATH_COST 2000
/* RSTP Bridge identifier [9.2.5]. Top four most significant bits are a
* priority value. The next most significant twelve bits are a locally
diff --git a/lib/stp.c b/lib/stp.c
index a869b5f390c..f37337992a3 100644
--- a/lib/stp.c
+++ b/lib/stp.c
@@ -313,7 +313,7 @@ stp_create(const char *name, stp_identifier bridge_id,
for (p = stp->ports; p < &stp->ports[ARRAY_SIZE(stp->ports)]; p++) {
p->stp = stp;
p->port_id = (stp_port_no(p) + 1) | (STP_DEFAULT_PORT_PRIORITY << 8);
- p->path_cost = 19; /* Recommended default for 100 Mb/s link. */
+ p->path_cost = 2; /* Recommended default for 10 Gb/s link. */
stp_initialize_port(p, STP_DISABLED);
}
ovs_refcount_init(&stp->ref_cnt);
@@ -989,7 +989,7 @@ stp_convert_speed_to_cost(unsigned int speed)
: speed >= 16 ? 62 /* 16 Mb/s. */
: speed >= 10 ? 100 /* 10 Mb/s. */
: speed >= 4 ? 250 /* 4 Mb/s. */
- : 19; /* 100 Mb/s (guess). */
+ : 2; /* 10 Gb/s (guess). */
ovs_mutex_unlock(&mutex);
return ret;
}
diff --git a/tests/stp.at b/tests/stp.at
index 69475843e55..a6b6465d12a 100644
--- a/tests/stp.at
+++ b/tests/stp.at
@@ -620,10 +620,10 @@ ovs-appctl time/stop
ovs-appctl time/warp 31000 1000
AT_CHECK([ovs-appctl stp/show br0 | grep p1], [0], [dnl
- p1 designated forwarding 19 128.1
+ p1 designated forwarding 2 128.1
])
AT_CHECK([ovs-appctl stp/show br0 | grep p2], [0], [dnl
- p2 designated forwarding 19 128.2
+ p2 designated forwarding 2 128.2
])
# add a stp port
@@ -637,10 +637,10 @@ ovs-appctl netdev-dummy/set-admin-state p3 down
# We should not show the p3 because its link-state is down
AT_CHECK([ovs-appctl stp/show br0 | grep p1], [0], [dnl
- p1 designated forwarding 19 128.1
+ p1 designated forwarding 2 128.1
])
AT_CHECK([ovs-appctl stp/show br0 | grep p2], [0], [dnl
- p2 designated forwarding 19 128.2
+ p2 designated forwarding 2 128.2
])
AT_CHECK([ovs-appctl stp/show br0 | grep p3], [1], [dnl
])
@@ -648,13 +648,13 @@ AT_CHECK([ovs-appctl stp/show br0 | grep p3], [1], [dnl
ovs-appctl netdev-dummy/set-admin-state p3 up
AT_CHECK([ovs-appctl stp/show br0 | grep p1], [0], [dnl
- p1 designated forwarding 19 128.1
+ p1 designated forwarding 2 128.1
])
AT_CHECK([ovs-appctl stp/show br0 | grep p2], [0], [dnl
- p2 designated forwarding 19 128.2
+ p2 designated forwarding 2 128.2
])
AT_CHECK([ovs-appctl stp/show br0 | grep p3], [0], [dnl
- p3 designated listening 19 128.3
+ p3 designated listening 2 128.3
])
diff --git a/tests/test-rstp.c b/tests/test-rstp.c
index 01aeaf84783..9c1026ec1a8 100644
--- a/tests/test-rstp.c
+++ b/tests/test-rstp.c
@@ -107,6 +107,8 @@ send_bpdu(struct dp_packet *pkt, void *port_, void *b_)
dp_packet_delete(pkt);
}
+#define RSTP_PORT_PATH_COST_100M 200000
+
static struct bridge *
new_bridge(struct test_case *tc, int id)
{
@@ -122,6 +124,7 @@ new_bridge(struct test_case *tc, int id)
for (i = 1; i < MAX_PORTS; i++) {
p = rstp_add_port(b->rstp);
rstp_port_set_aux(p, p);
+ rstp_port_set_path_cost(p, RSTP_PORT_PATH_COST_100M);
rstp_port_set_state(p, RSTP_DISABLED);
rstp_port_set_mac_operational(p, true);
}
@@ -544,8 +547,8 @@ test_rstp_main(int argc, char *argv[])
}
get_token();
- path_cost = match(":") ? must_get_int() :
- RSTP_DEFAULT_PORT_PATH_COST;
+ path_cost = match(":") ? must_get_int()
+ : RSTP_PORT_PATH_COST_100M;
if (port_no < bridge->n_ports) {
/* Enable port. */
reinitialize_port(p);
diff --git a/vswitchd/bridge.c b/vswitchd/bridge.c
index d0667f229da..bfb2adef1dd 100644
--- a/vswitchd/bridge.c
+++ b/vswitchd/bridge.c
@@ -1678,7 +1678,7 @@ port_configure_stp(const struct ofproto *ofproto, struct port *port,
unsigned int mbps;
netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
- mbps = netdev_features_to_bps(current, 100 * 1000 * 1000) / 1000000;
+ mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
port_s->path_cost = stp_convert_speed_to_cost(mbps);
}
@@ -1761,7 +1761,7 @@ port_configure_rstp(const struct ofproto *ofproto, struct port *port,
unsigned int mbps;
netdev_get_features(iface->netdev, &current, NULL, NULL, NULL);
- mbps = netdev_features_to_bps(current, 100 * 1000 * 1000) / 1000000;
+ mbps = netdev_features_to_bps(current, NETDEV_DEFAULT_BPS) / 1000000;
port_s->path_cost = rstp_convert_speed_to_cost(mbps);
}
diff --git a/vswitchd/vswitch.xml b/vswitchd/vswitch.xml
index 928821a8239..f9bdb2d92be 100644
--- a/vswitchd/vswitch.xml
+++ b/vswitchd/vswitch.xml
@@ -4776,7 +4776,7 @@ ovs-vsctl add-port br0 p0 -- set Interface p0 type=patch options:peer=p1 \
Maximum rate shared by all queued traffic, in bit/s. Optional. If not
specified, for physical interfaces, the default is the link rate. For
other interfaces or if the link rate cannot be determined, the default
- is currently 100 Mbps.
+ is currently 10 Gbps.
From 59e8cb8a053d50f49629be8b6fd614562d066404 Mon Sep 17 00:00:00 2001
From: Timothy Redaelli
Date: Mon, 14 Nov 2022 20:41:53 +0100
Subject: [PATCH 055/833] rhel: Move conf.db to /var/lib/openvswitch, using
symlinks.
conf.db is by default at /etc/openvswitch, but it should be at
/var/lib/openvswitch like on Debian or like ovnnb_db.db and ovnsb_db.db.
If conf.db already exists in /etc/openvswitch then it's moved to
/var/lib/openvswitch.
Symlinks are created for conf.db and .conf.db.~lock~ into /etc/openvswitch
for backward compatibility.
Reported-at: https://bugzilla.redhat.com/1830857
Reported-by: Yedidyah Bar David
Signed-off-by: Timothy Redaelli
Signed-off-by: Ilya Maximets
---
rhel/openvswitch-fedora.spec.in | 27 +++++++++++++++++++++++----
1 file changed, 23 insertions(+), 4 deletions(-)
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 67268cb7833..c21592e47cb 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -238,8 +238,6 @@ rm -rf $RPM_BUILD_ROOT/%{_datadir}/openvswitch/python/
install -d -m 0755 $RPM_BUILD_ROOT/%{_sharedstatedir}/openvswitch
-touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/conf.db
-touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/.conf.db.~lock~
touch $RPM_BUILD_ROOT%{_sysconfdir}/openvswitch/system-id.conf
install -p -m 644 -D selinux/openvswitch-custom.pp \
@@ -328,6 +326,27 @@ if [ $1 -eq 1 ]; then
fi
%endif
+# Ensure that /etc/openvswitch/conf.db links to /var/lib/openvswitch,
+# moving an existing file if there is one.
+#
+# Ditto for .conf.db.~lock~.
+for base in conf.db .conf.db.~lock~; do
+ new=/var/lib/openvswitch/$base
+ old=/etc/openvswitch/$base
+ if test -f $old && test ! -e $new; then
+ mv $old $new
+ fi
+ if test ! -e $old && test ! -h $old; then
+ ln -s $new $old
+ fi
+ touch $new
+%if %{with dpdk}
+ chown openvswitch:hugetlbfs $new
+%else
+ chown openvswitch:openvswitch $new
+%endif
+done
+
%if 0%{?systemd_post:1}
# This may not enable openvswitch service or do daemon-reload.
%systemd_post %{name}.service
@@ -413,8 +432,8 @@ fi
%endif
%dir %{_sysconfdir}/openvswitch
%{_sysconfdir}/openvswitch/default.conf
-%config %ghost %{_sysconfdir}/openvswitch/conf.db
-%ghost %{_sysconfdir}/openvswitch/.conf.db.~lock~
+%config %ghost %{_sharedstatedir}/openvswitch/conf.db
+%ghost %{_sharedstatedir}/openvswitch/.conf.db.~lock~
%config %ghost %{_sysconfdir}/openvswitch/system-id.conf
%config(noreplace) %{_sysconfdir}/sysconfig/openvswitch
%defattr(-,root,root)
From cd475f976512bd1ce3abaf325c835780c37d6386 Mon Sep 17 00:00:00 2001
From: Timothy Redaelli
Date: Wed, 12 May 2021 19:44:33 +0200
Subject: [PATCH 056/833] ovs-dpctl-top: Fix ovs-dpctl-top via pipe.
Currently it's not possible to use ovs-dpctl-top via a pipe (e.g.:
ovs-dpctl dump-flows | ovs-dpctl-top --script --verbose) since Python 3
doesn't allow opening a file (stdin in our case) in text mode with
buffering disabled.
This commit changes the behaviour in order to directly pass stdin to
flows_read instead of re-opening it without buffering.
Signed-off-by: Timothy Redaelli
Signed-off-by: Ilya Maximets
---
utilities/ovs-dpctl-top.in | 6 +-----
1 file changed, 1 insertion(+), 5 deletions(-)
diff --git a/utilities/ovs-dpctl-top.in b/utilities/ovs-dpctl-top.in
index fbe6e4f560a..2c1766eff5e 100755
--- a/utilities/ovs-dpctl-top.in
+++ b/utilities/ovs-dpctl-top.in
@@ -1236,11 +1236,7 @@ def flows_script(args):
if (args.flowFiles is None):
logging.info("reading flows from stdin")
- ihdl = os.fdopen(sys.stdin.fileno(), 'r', 0)
- try:
- flow_db = flows_read(ihdl, flow_db)
- finally:
- ihdl.close()
+ flow_db = flows_read(sys.stdin, flow_db)
else:
for flowFile in args.flowFiles:
logging.info("reading flows from %s", flowFile)
From 954ae38a12f0c0d7bab1334c9ba353da94de887c Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 24 Nov 2022 15:15:15 +0100
Subject: [PATCH 057/833] odp-util: Fix reporting unknown keys as keys with bad
length.
check_attr_len() currently reports all unknown keys as keys with bad
length. For example, IPv6 extension headers are printed out like this
in flow dumps:
eth_type(0x86dd),ipv6(...)
(bad key length 2, expected -1)(00 00/(bad mask length 2, expected -1)(00 00),
icmpv6(type=0/0,code=0/0)
However, since the key is unknown, the length check on it makes no
sense and should be ignored. This will allow the unknown key to be
caught later by the format_unknown_key() function and printed in a
more user-friendly way:
eth_type(0x86dd),ipv6(...),key32(00 00/00 00),icmpv6(type=0/0,code=0/0)
'32' here is the actual index of the key attribute, so we know
that it is unknown attribute #32 with the value/mask pair printed
out inside the parentheses.
Acked-by: Aaron Conole
Signed-off-by: Ilya Maximets
---
lib/odp-util.c | 11 +++++++++--
1 file changed, 9 insertions(+), 2 deletions(-)
diff --git a/lib/odp-util.c b/lib/odp-util.c
index 72e076e1c5b..5fc312f8c00 100644
--- a/lib/odp-util.c
+++ b/lib/odp-util.c
@@ -3594,9 +3594,16 @@ static bool
check_attr_len(struct ds *ds, const struct nlattr *a, const struct nlattr *ma,
const struct attr_len_tbl tbl[], int max_type, bool need_key)
{
+ uint16_t type = nl_attr_type(a);
int expected_len;
- expected_len = odp_key_attr_len(tbl, max_type, nl_attr_type(a));
+ if (type > max_type) {
+ /* Unknown attribute, can't check the length. */
+ return true;
+ }
+
+ expected_len = odp_key_attr_len(tbl, max_type, type);
+
if (expected_len != ATTR_LEN_VARIABLE &&
expected_len != ATTR_LEN_NESTED) {
@@ -3605,7 +3612,7 @@ check_attr_len(struct ds *ds, const struct nlattr *a, const struct nlattr *ma,
if (bad_key_len || bad_mask_len) {
if (need_key) {
- ds_put_format(ds, "key%u", nl_attr_type(a));
+ ds_put_format(ds, "key%u", type);
}
if (bad_key_len) {
ds_put_format(ds, "(bad key length %"PRIuSIZE", expected %d)(",
From 55b9507e6824b935ffa0205fc7c7bebfe4e54279 Mon Sep 17 00:00:00 2001
From: Numan Siddique
Date: Sun, 27 Nov 2022 22:56:13 -0500
Subject: [PATCH 058/833] ovsdb-idl: Add the support to specify the uuid for
row insert.
ovsdb-server allows OVSDB clients to specify the uuid for
row inserts [1].  Both the C IDL client library and the Python
IDL are missing this feature.  This patch adds that support.
In the C IDL, for each schema table a new function is generated -
<table>_insert_persist_uuid(txn, uuid) - which clients can use
to persist the uuid.
ovs-vsctl and other ctl derivatives now support the same
in the generic 'create' command with the option "--id=<uuid>".
In the Python IDL, the uuid to persist can be specified in
the Transaction.insert() function.
[1] - a529e3cd1f("ovsdb-server: Allow OVSDB clients to specify the UUID for inserted rows.")
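A rough Python IDL usage sketch (illustrative only, not part of the
patch; it assumes an already-connected ovs.db.idl.Idl instance 'idl'
and the "simple" table from tests/idltest.ovsschema):

    import uuid

    import ovs.db.idl

    # Ask ovsdb-server to keep this exact UUID instead of assigning one.
    ROW_UUID = uuid.UUID("c5cc12f8-eaa1-43a7-8a73-bccd18df2222")

    txn = ovs.db.idl.Transaction(idl)
    row = txn.insert(idl.tables["simple"], new_uuid=ROW_UUID,
                     persist_uuid=True)
    row.i = 2
    status = txn.commit_block()
    # On success the committed row keeps ROW_UUID; a duplicate UUID makes
    # the server reject the transaction.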
Acked-by: Adrian Moreno
Acked-by: Han Zhou
Acked-by: Terry Wilson
Signed-off-by: Numan Siddique
Signed-off-by: Ilya Maximets
---
NEWS | 3 ++
lib/db-ctl-base.c | 38 ++++++++++++------
lib/db-ctl-base.man | 5 ++-
lib/db-ctl-base.xml | 6 ++-
lib/ovsdb-idl-provider.h | 1 +
lib/ovsdb-idl.c | 85 +++++++++++++++++++++++++++++-----------
lib/ovsdb-idl.h | 3 ++
ovsdb/ovsdb-idlc.in | 15 +++++++
python/ovs/db/idl.py | 26 ++++++++----
tests/ovs-vsctl.at | 25 ++++++++++++
tests/ovsdb-idl.at | 58 +++++++++++++++++++++++++++
tests/test-ovsdb.c | 28 +++++++++++--
tests/test-ovsdb.py | 20 +++++++++-
13 files changed, 263 insertions(+), 50 deletions(-)
diff --git a/NEWS b/NEWS
index 3ae6882d551..f6caf1ca7f0 100644
--- a/NEWS
+++ b/NEWS
@@ -3,6 +3,9 @@ Post-v3.0.0
- ovs-appctl:
* "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
+ - OVSDB-IDL:
+ * Add the support to specify the persistent uuid for row insert in both
+ C and Python IDLs.
- Windows:
* Conntrack IPv6 fragment support.
- DPDK:
diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c
index bc85e992173..856832a04d2 100644
--- a/lib/db-ctl-base.c
+++ b/lib/db-ctl-base.c
@@ -1731,29 +1731,43 @@ cmd_create(struct ctl_context *ctx)
const struct ovsdb_idl_table_class *table;
const struct ovsdb_idl_row *row;
const struct uuid *uuid = NULL;
+ bool persist_uuid = false;
+ struct uuid uuid_;
int i;
ctx->error = get_table(table_name, &table);
if (ctx->error) {
return;
}
+
if (id) {
- struct ovsdb_symbol *symbol = NULL;
+ if (uuid_from_string(&uuid_, id)) {
+ uuid = &uuid_;
+ persist_uuid = true;
+ } else {
+ struct ovsdb_symbol *symbol = NULL;
- ctx->error = create_symbol(ctx->symtab, id, &symbol, NULL);
- if (ctx->error) {
- return;
- }
- if (table->is_root) {
- /* This table is in the root set, meaning that rows created in it
- * won't disappear even if they are unreferenced, so disable
- * warnings about that by pretending that there is a reference. */
- symbol->strong_ref = true;
+ ctx->error = create_symbol(ctx->symtab, id, &symbol, NULL);
+ if (ctx->error) {
+ return;
+ }
+ if (table->is_root) {
+ /* This table is in the root set, meaning that rows created in
+ * it won't disappear even if they are unreferenced, so disable
+ * warnings about that by pretending that there is a
+ * reference. */
+ symbol->strong_ref = true;
+ }
+ uuid = &symbol->uuid;
}
- uuid = &symbol->uuid;
}
- row = ovsdb_idl_txn_insert(ctx->txn, table, uuid);
+ if (persist_uuid) {
+ row = ovsdb_idl_txn_insert_persist_uuid(ctx->txn, table, uuid);
+ } else {
+ row = ovsdb_idl_txn_insert(ctx->txn, table, uuid);
+ }
+
for (i = 2; i < ctx->argc; i++) {
ctx->error = set_column(table, row, ctx->argv[i], ctx->symtab);
if (ctx->error) {
diff --git a/lib/db-ctl-base.man b/lib/db-ctl-base.man
index a529d8b4d3f..c8111c9efbe 100644
--- a/lib/db-ctl-base.man
+++ b/lib/db-ctl-base.man
@@ -203,7 +203,7 @@ Without \fB\-\-if-exists\fR, it is an error if \fIrecord\fR does not
exist. With \fB\-\-if-exists\fR, this command does nothing if
\fIrecord\fR does not exist.
.
-.IP "[\fB\-\-id=@\fIname\fR] \fBcreate\fR \fItable column\fR[\fB:\fIkey\fR]\fB=\fIvalue\fR..."
+.IP "[\fB\-\-id=(@\fIname\fR | \fIuuid\fR)] \fBcreate\fR \fItable column\fR[\fB:\fIkey\fR]\fB=\fIvalue\fR..."
Creates a new record in \fItable\fR and sets the initial values of
each \fIcolumn\fR. Columns not explicitly set will receive their
default values. Outputs the UUID of the new row.
@@ -212,6 +212,9 @@ If \fB@\fIname\fR is specified, then the UUID for the new row may be
referred to by that name elsewhere in the same \fB\*(PN\fR
invocation in contexts where a UUID is expected. Such references may
precede or follow the \fBcreate\fR command.
+.IP
+If a valid \fIuuid\fR is specified, then it is used as the UUID
+of the new row.
.
.RS
.IP "Caution (ovs-vsctl as example)"
diff --git a/lib/db-ctl-base.xml b/lib/db-ctl-base.xml
index f6efe98eaf0..27c999fe71f 100644
--- a/lib/db-ctl-base.xml
+++ b/lib/db-ctl-base.xml
@@ -310,7 +310,7 @@
- [--id=@
name] create
table column[:
key]=
value...
+ [--id=(@
name|uuid)] create
table column[:
key]=
value...
Creates a new record in table and sets the initial values of
@@ -323,6 +323,10 @@
invocation in contexts where a UUID is expected. Such references may
precede or follow the create
command.
+
+ If a valid uuid is specified, then it is used as the
+ UUID of the new row.
+
- Caution (ovs-vsctl as example)
-
diff --git a/lib/ovsdb-idl-provider.h b/lib/ovsdb-idl-provider.h
index 8797686f900..8d2b7d6b914 100644
--- a/lib/ovsdb-idl-provider.h
+++ b/lib/ovsdb-idl-provider.h
@@ -74,6 +74,7 @@ struct ovsdb_idl_row {
struct ovs_list dst_arcs; /* Backward arcs (ovsdb_idl_arc.dst_node). */
struct ovsdb_idl_table *table; /* Containing table. */
struct ovsdb_datum *old_datum; /* Committed data (null if orphaned). */
+ bool persist_uuid; /* Persist 'uuid' during insert txn if set. */
bool parsed; /* Whether the row is parsed. */
struct ovs_list reparse_node; /* Rows that needs to be re-parsed due to
* insertion of a referenced row. */
diff --git a/lib/ovsdb-idl.c b/lib/ovsdb-idl.c
index 99b58422eca..dbdfe45d87e 100644
--- a/lib/ovsdb-idl.c
+++ b/lib/ovsdb-idl.c
@@ -2855,11 +2855,14 @@ substitute_uuids(struct json *json, const struct ovsdb_idl_txn *txn)
row = ovsdb_idl_txn_get_row(txn, &uuid);
if (row && !row->old_datum && row->new_datum) {
- json_destroy(json);
-
- return json_array_create_2(
- json_string_create("named-uuid"),
- json_string_create_nocopy(ovsdb_data_row_name(&uuid)));
+ if (row->persist_uuid) {
+ return json;
+ } else {
+ json_destroy(json);
+ return json_array_create_2(
+ json_string_create("named-uuid"),
+ json_string_create_nocopy(ovsdb_data_row_name(&uuid)));
+ }
}
}
@@ -3284,9 +3287,19 @@ ovsdb_idl_txn_commit(struct ovsdb_idl_txn *txn)
any_updates = true;
- json_object_put(op, "uuid-name",
- json_string_create_nocopy(
- ovsdb_data_row_name(&row->uuid)));
+ char *uuid_json;
+ struct json *value;
+ if (row->persist_uuid) {
+ uuid_json = "uuid";
+ value = json_string_create_nocopy(
+ xasprintf(UUID_FMT, UUID_ARGS(&row->uuid)));
+ } else {
+ uuid_json = "uuid-name";
+ value = json_string_create_nocopy(
+ ovsdb_data_row_name(&row->uuid));
+ }
+
+ json_object_put(op, uuid_json, value);
insert = xmalloc(sizeof *insert);
insert->dummy = row->uuid;
@@ -3770,6 +3783,31 @@ ovsdb_idl_txn_delete(const struct ovsdb_idl_row *row_)
row->new_datum = NULL;
}
+static const struct ovsdb_idl_row *
+ovsdb_idl_txn_insert__(struct ovsdb_idl_txn *txn,
+ const struct ovsdb_idl_table_class *class,
+ const struct uuid *uuid,
+ bool persist_uuid)
+{
+ struct ovsdb_idl_row *row = ovsdb_idl_row_create__(class);
+
+ ovs_assert(uuid || !persist_uuid);
+ if (uuid) {
+ ovs_assert(!ovsdb_idl_txn_get_row(txn, uuid));
+ row->uuid = *uuid;
+ } else {
+ uuid_generate(&row->uuid);
+ }
+ row->persist_uuid = persist_uuid;
+ row->table = ovsdb_idl_table_from_class(txn->idl, class);
+ row->new_datum = xmalloc(class->n_columns * sizeof *row->new_datum);
+ hmap_insert(&row->table->rows, &row->hmap_node, uuid_hash(&row->uuid));
+ hmap_insert(&txn->txn_rows, &row->txn_node, uuid_hash(&row->uuid));
+ ovsdb_idl_add_to_indexes(row);
+
+ return row;
+}
+
/* Inserts and returns a new row in the table with the specified 'class' in the
* database with open transaction 'txn'.
*
@@ -3787,22 +3825,23 @@ ovsdb_idl_txn_insert(struct ovsdb_idl_txn *txn,
const struct ovsdb_idl_table_class *class,
const struct uuid *uuid)
{
- struct ovsdb_idl_row *row = ovsdb_idl_row_create__(class);
-
- if (uuid) {
- ovs_assert(!ovsdb_idl_txn_get_row(txn, uuid));
- row->uuid = *uuid;
- } else {
- uuid_generate(&row->uuid);
- }
-
- row->table = ovsdb_idl_table_from_class(txn->idl, class);
- row->new_datum = xmalloc(class->n_columns * sizeof *row->new_datum);
- hmap_insert(&row->table->rows, &row->hmap_node, uuid_hash(&row->uuid));
- hmap_insert(&txn->txn_rows, &row->txn_node, uuid_hash(&row->uuid));
- ovsdb_idl_add_to_indexes(row);
+ return ovsdb_idl_txn_insert__(txn, class, uuid, false);
+}
- return row;
+/* Inserts and returns a new row in the table with the specified 'class' in the
+ * database with open transaction 'txn'.
+ *
+ * The new row is assigned the specified UUID (which cannot be null).
+ *
+ * Usually this function is used indirectly through one of the
+ * "insert_persist_uuid" functions generated by ovsdb-idlc. */
+const struct ovsdb_idl_row *
+ovsdb_idl_txn_insert_persist_uuid(struct ovsdb_idl_txn *txn,
+ const struct ovsdb_idl_table_class *class,
+ const struct uuid *uuid)
+{
+ ovs_assert(uuid);
+ return ovsdb_idl_txn_insert__(txn, class, uuid, true);
}
static void
diff --git a/lib/ovsdb-idl.h b/lib/ovsdb-idl.h
index fbd9f671a20..9a3e19f2055 100644
--- a/lib/ovsdb-idl.h
+++ b/lib/ovsdb-idl.h
@@ -375,6 +375,9 @@ void ovsdb_idl_txn_delete(const struct ovsdb_idl_row *);
const struct ovsdb_idl_row *ovsdb_idl_txn_insert(
struct ovsdb_idl_txn *, const struct ovsdb_idl_table_class *,
const struct uuid *);
+const struct ovsdb_idl_row *ovsdb_idl_txn_insert_persist_uuid(
+ struct ovsdb_idl_txn *txn, const struct ovsdb_idl_table_class *class,
+ const struct uuid *uuid);
struct ovsdb_idl *ovsdb_idl_txn_get_idl (struct ovsdb_idl_txn *);
void ovsdb_idl_get_initial_snapshot(struct ovsdb_idl *);
diff --git a/ovsdb/ovsdb-idlc.in b/ovsdb/ovsdb-idlc.in
index 5a97a8ea3e1..9a54f06a191 100755
--- a/ovsdb/ovsdb-idlc.in
+++ b/ovsdb/ovsdb-idlc.in
@@ -362,6 +362,8 @@ struct %(s)s *%(s)s_cursor_data(struct ovsdb_idl_cursor *);
void %(s)s_init(struct %(s)s *);
void %(s)s_delete(const struct %(s)s *);
struct %(s)s *%(s)s_insert(struct ovsdb_idl_txn *);
+struct %(s)s *%(s)s_insert_persist_uuid(
+ struct ovsdb_idl_txn *txn, const struct uuid *uuid);
/* Returns true if the tracked column referenced by 'enum %(s)s_column_id' of
* the row referenced by 'struct %(s)s *' was updated since the last change
@@ -809,6 +811,19 @@ struct %(s)s *
return %(s)s_cast(ovsdb_idl_txn_insert(txn, &%(p)stable_%(tl)s, NULL));
}
+/* Inserts and returns a new row in the table "%(t)s" in the database
+ * with open transaction 'txn'.
+ *
+ * The new row is assigned the UUID specified in the 'uuid' parameter
+ * (which cannot be null). ovsdb-server will try to assign the same
+ * UUID when 'txn' is committed. */
+struct %(s)s *
+%(s)s_insert_persist_uuid(struct ovsdb_idl_txn *txn, const struct uuid *uuid)
+{
+ return %(s)s_cast(ovsdb_idl_txn_insert_persist_uuid(
+ txn, &%(p)stable_%(tl)s, uuid));
+}
+
bool
%(s)s_is_updated(const struct %(s)s *row, enum %(s)s_column_id column)
{
diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index 8e31e02d791..fe66402cff4 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -1223,7 +1223,7 @@ class Row(object):
d["a"] = "b"
row.mycolumn = d
"""
- def __init__(self, idl, table, uuid, data):
+ def __init__(self, idl, table, uuid, data, persist_uuid=False):
# All of the explicit references to self.__dict__ below are required
# to set real attributes with invoking self.__getattr__().
self.__dict__["uuid"] = uuid
@@ -1278,6 +1278,10 @@ def __init__(self, idl, table, uuid, data):
# in the dictionary are all None.
self.__dict__["_prereqs"] = {}
+ # Indicates if the specified 'uuid' should be used as the row uuid
+ # or let the server generate it.
+ self.__dict__["_persist_uuid"] = persist_uuid
+
def __lt__(self, other):
if not isinstance(other, Row):
return NotImplemented
@@ -1816,7 +1820,11 @@ def commit(self):
op = {"table": row._table.name}
if row._data is None:
op["op"] = "insert"
- op["uuid-name"] = _uuid_name_from_uuid(row.uuid)
+ if row._persist_uuid:
+ op["uuid"] = row.uuid
+ else:
+ op["uuid-name"] = _uuid_name_from_uuid(row.uuid)
+
any_updates = True
op_index = len(operations) - 1
@@ -2056,20 +2064,22 @@ def _write(self, row, column, datum):
row._mutations['_removes'].pop(column.name, None)
row._changes[column.name] = datum.copy()
- def insert(self, table, new_uuid=None):
+ def insert(self, table, new_uuid=None, persist_uuid=False):
"""Inserts and returns a new row in 'table', which must be one of the
ovs.db.schema.TableSchema objects in the Idl's 'tables' dict.
The new row is assigned a provisional UUID. If 'uuid' is None then one
is randomly generated; otherwise 'uuid' should specify a randomly
- generated uuid.UUID not otherwise in use. ovsdb-server will assign a
- different UUID when 'txn' is committed, but the IDL will replace any
- uses of the provisional UUID in the data to be to be committed by the
- UUID assigned by ovsdb-server."""
+ generated uuid.UUID not otherwise in use. If 'persist_uuid' is true
+ and 'new_uuid' is specified, IDL requests the ovsdb-server to assign
+ the same UUID, otherwise ovsdb-server will assign a different UUID when
+ 'txn' is committed and the IDL will replace any uses of the provisional
+ UUID in the data to be committed by the UUID assigned by
+ ovsdb-server."""
assert self._status == Transaction.UNCOMMITTED
if new_uuid is None:
new_uuid = uuid.uuid4()
- row = Row(self.idl, table, new_uuid, None)
+ row = Row(self.idl, table, new_uuid, None, persist_uuid=persist_uuid)
table.rows[row.uuid] = row
self._txn_rows[row.uuid] = row
return row
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index d6cd2c0849a..abf4fb9cf4e 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1710,3 +1710,28 @@ ingress_policing_kpkts_rate: 100
])
OVS_VSCTL_CLEANUP
AT_CLEANUP
+
+AT_SETUP([ovs-vsctl create bridge with uuid])
+AT_KEYWORDS([create bridge with uuid])
+OVS_VSCTL_SETUP
+
+AT_CHECK([ovs-vsctl --no-wait --id=c5cc12f8-eaa1-43a7-8a73-bccd18df1111 create bridge \
+name=tst0 -- add open . bridges c5cc12f8-eaa1-43a7-8a73-bccd18df1111], [0],[dnl
+c5cc12f8-eaa1-43a7-8a73-bccd18df1111
+])
+
+AT_CHECK([ovs-vsctl --no-wait --id=c5cc12f8-eaa1-43a7-8a73-bccd18df1111 create bridge \
+name=tst1 -- add open . bridges c5cc12f8-eaa1-43a7-8a73-bccd18df1111], [1], [ignore], [ignore])
+
+AT_CHECK([ovs-vsctl --no-wait --bare --columns _uuid,name list bridge], [0], [dnl
+c5cc12f8-eaa1-43a7-8a73-bccd18df1111
+tst0
+])
+
+ovs-vsctl --no-wait --id=@a create bridge \
+name=tst1 -- add open . bridges @a
+
+AT_CHECK([ovs-vsctl --no-wait --bare --columns _uuid,name list bridge tst1], [0], [ignore])
+
+OVS_VSCTL_CLEANUP
+AT_CLEANUP
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index 8e75d00d7cc..c2970984bae 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -2555,3 +2555,61 @@ OVSDB_CHECK_IDL_TRACK([track, insert and delete, refs to link2],
005: table link2: i=1 l1= uuid=<1>
006: done
]])
+
+m4_define([OVSDB_CHECK_IDL_PERS_UUID_INSERT_C],
+ [AT_SETUP([$1 - C])
+ AT_KEYWORDS([idl persistent uuid insert])
+ AT_CHECK([ovsdb_start_idltest "" "$abs_srcdir/idltest.ovsschema"])
+ AT_CHECK([test-ovsdb '-vPATTERN:console:test-ovsdb|%c|%m' -vjsonrpc -t10 idl unix:socket $2],
+ [0], [stdout], [stderr])
+ AT_CHECK([sort stdout],
+ [0], [$3])
+ AT_CHECK([grep $4 stderr], [0], [ignore])
+ OVSDB_SERVER_SHUTDOWN
+ AT_CLEANUP])
+
+m4_define([OVSDB_CHECK_IDL_PERS_UUID_INSERT_PY],
+ [AT_SETUP([$1 - Python3])
+ AT_KEYWORDS([idl persistent uuid insert])
+ AT_CHECK([ovsdb_start_idltest "" "$abs_srcdir/idltest.ovsschema"])
+ AT_CHECK([$PYTHON3 $srcdir/test-ovsdb.py -t10 idl $srcdir/idltest.ovsschema unix:socket $2],
+ [0], [stdout], [stderr])
+ AT_CHECK([sort stdout],
+ [0], [$3])
+ AT_CHECK([grep $4 stderr], [0], [ignore])
+ OVSDB_SERVER_SHUTDOWN
+ AT_CLEANUP])
+
+
+m4_define([OVSDB_CHECK_IDL_PERS_UUID_INSERT],
+ [OVSDB_CHECK_IDL_PERS_UUID_INSERT_C($@)
+ OVSDB_CHECK_IDL_PERS_UUID_INSERT_PY($@)])
+
+OVSDB_CHECK_IDL_PERS_UUID_INSERT([simple idl, persistent uuid insert],
+ [['insert_uuid c5cc12f8-eaa1-43a7-8a73-bccd18df2222 2, insert_uuid c5cc12f8-eaa1-43a7-8a73-bccd18df3333 3' \
+ 'insert_uuid c5cc12f8-eaa1-43a7-8a73-bccd18df4444 4, insert_uuid c5cc12f8-eaa1-43a7-8a73-bccd18df2222 5' \
+ 'insert_uuid c5cc12f8-eaa1-43a7-8a73-bccd18df4444 4' \
+ 'delete 2' \
+ 'insert_uuid c5cc12f8-eaa1-43a7-8a73-bccd18df2222 5'
+ ]],
+ [[000: empty
+001: commit, status=success
+002: table simple: i=2 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df2222
+002: table simple: i=3 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df3333
+003: commit, status=error
+004: table simple: i=2 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df2222
+004: table simple: i=3 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df3333
+005: commit, status=success
+006: table simple: i=2 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df2222
+006: table simple: i=3 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df3333
+006: table simple: i=4 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df4444
+007: commit, status=success
+008: table simple: i=3 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df3333
+008: table simple: i=4 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df4444
+009: commit, status=success
+010: table simple: i=3 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df3333
+010: table simple: i=4 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df4444
+010: table simple: i=5 r=0 b=false s= u=00000000-0000-0000-0000-000000000000 ia=[] ra=[] ba=[] sa=[] ua=[] uuid=c5cc12f8-eaa1-43a7-8a73-bccd18df2222
+011: done
+]],
+ [['This UUID would duplicate a UUID already present within the table or deleted within the same transaction']])
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index 5f7110f415f..84fe232765a 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -2400,7 +2400,7 @@ idltest_find_simple(struct ovsdb_idl *idl, int i)
return NULL;
}
-static void
+static bool
idl_set(struct ovsdb_idl *idl, char *commands, int step)
{
char *cmd, *save_ptr1 = NULL;
@@ -2458,6 +2458,19 @@ idl_set(struct ovsdb_idl *idl, char *commands, int step)
s = idltest_simple_insert(txn);
idltest_simple_set_i(s, atoi(arg1));
+ } else if (!strcmp(name, "insert_uuid")) {
+ struct idltest_simple *s;
+
+ if (!arg1 || !arg2) {
+ ovs_fatal(0, "\"insert_uuid\" command requires 2 arguments");
+ }
+
+ struct uuid s_uuid;
+ if (!uuid_from_string(&s_uuid, arg1)) {
+ ovs_fatal(0, "\"insert_uuid\" command requires valid uuid");
+ }
+ s = idltest_simple_insert_persist_uuid(txn, &s_uuid);
+ idltest_simple_set_i(s, atoi(arg2));
} else if (!strcmp(name, "delete")) {
const struct idltest_simple *s;
@@ -2522,7 +2535,7 @@ idl_set(struct ovsdb_idl *idl, char *commands, int step)
print_and_log("%03d: destroy", step);
ovsdb_idl_txn_destroy(txn);
ovsdb_idl_check_consistency(idl);
- return;
+ return true;
} else {
ovs_fatal(0, "unknown command %s", name);
}
@@ -2543,6 +2556,8 @@ idl_set(struct ovsdb_idl *idl, char *commands, int step)
ovsdb_idl_txn_destroy(txn);
ovsdb_idl_check_consistency(idl);
+
+ return (status != TXN_ERROR);
}
static const struct ovsdb_idl_table_class *
@@ -2777,7 +2792,14 @@ do_idl(struct ovs_cmdl_context *ctx)
update_conditions(idl, arg + strlen(cond_s));
print_and_log("%03d: change conditions", step++);
} else if (arg[0] != '[') {
- idl_set(idl, arg, step++);
+ if (!idl_set(idl, arg, step++)) {
+ /* If idl_set() returns false, then no transaction
+ * was sent to the server and most likely 'seqno'
+ * would remain the same. And the above 'Wait for update'
+ * for loop poll_block() would never return.
+ * So set seqno to 0. */
+ seqno = 0;
+ }
} else {
struct json *json = parse_json(arg);
substitute_uuids(json, symtab);
diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py
index 402cacbe9d7..cca1818ea3a 100644
--- a/tests/test-ovsdb.py
+++ b/tests/test-ovsdb.py
@@ -429,6 +429,14 @@ def notify(event, row, updates=None):
s = txn.insert(idl.tables["simple"])
s.i = int(args[0])
+ elif name == "insert_uuid":
+ if len(args) != 2:
+ sys.stderr.write('"insert_uuid" command requires 2 arguments\n')
+ sys.exit(1)
+
+ s = txn.insert(idl.tables["simple"], new_uuid=args[0],
+ persist_uuid=True)
+ s.i = int(args[1])
elif name == "delete":
if len(args) != 1:
sys.stderr.write('"delete" command requires 1 argument\n')
@@ -491,7 +499,7 @@ def notify(event, row, updates=None):
print("%03d: destroy" % step)
sys.stdout.flush()
txn.abort()
- return
+ return True
elif name == "linktest":
l1_0 = txn.insert(idl.tables["link1"])
l1_0.i = 1
@@ -615,6 +623,8 @@ def notify(event, row, updates=None):
sys.stdout.write("\n")
sys.stdout.flush()
+ return status != ovs.db.idl.Transaction.ERROR
+
def update_condition(idl, commands):
commands = commands[len("condition "):].split(";")
@@ -748,7 +758,13 @@ def mock_notify(event, row, updates=None):
sys.stdout.flush()
step += 1
elif not command.startswith("["):
- idl_set(idl, command, step)
+ if not idl_set(idl, command, step):
+ # If idl_set() returns false, then no transaction
+ # was sent to the server and most likely seqno
+ # would remain the same. And the above 'Wait for update'
+ # for loop poller.block() would never return.
+ # So set seqno to 0.
+ seqno = 0
step += 1
else:
json = ovs.json.from_string(command)
From a77c7796f23a76190b61e2109a009df980253b0f Mon Sep 17 00:00:00 2001
From: Ian Stokes
Date: Mon, 5 Dec 2022 21:31:10 +0000
Subject: [PATCH 059/833] dpdk: Update to use v22.11.1.
This commit adds support for DPDK v22.11.1. It includes the following
changes:
1. ci: Reduce DPDK compilation time.
2. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=316528
3. system-dpdk: Update vhost tests to be compatible with DPDK 22.07.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=311332
4. netdev-dpdk: Report device bus specific information.
5. netdev-dpdk: Drop reference to Rx header split.
http://patchwork.ozlabs.org/project/openvswitch/list/?series=321808
In addition, documentation was updated in this commit for use with
DPDK v22.11.1.
The Debian shared DPDK compilation test is removed as part of this patch
due to a packaging requirement. Once DPDK v22.11.1 is available in Debian
repositories, it should be re-enabled in OVS.
For credit, all authors of the original commits to 'dpdk-latest' with the
above changes have been added as co-authors of this commit.
Signed-off-by: David Marchand
Co-authored-by: David Marchand
Signed-off-by: Sunil Pai G
Co-authored-by: Sunil Pai G
Tested-by: Michael Phelan
Tested-by: Emma Finn
Signed-off-by: Ian Stokes
---
.ci/linux-build.sh | 7 ++-
.github/workflows/build-and-test.yml | 1 -
Documentation/faq/releases.rst | 2 +-
Documentation/intro/install/dpdk.rst | 16 ++---
Documentation/topics/dpdk/phy.rst | 8 +--
Documentation/topics/dpdk/vdev.rst | 2 +-
Documentation/topics/dpdk/vhost-user.rst | 2 +-
Documentation/topics/testing.rst | 2 +-
Documentation/topics/userspace-tso.rst | 2 +-
NEWS | 18 +-----
debian/control.in | 2 +-
lib/netdev-dpdk.c | 24 +++-----
rhel/openvswitch-fedora.spec.in | 2 +-
tests/system-dpdk.at | 78 ++++++++++++------------
14 files changed, 73 insertions(+), 93 deletions(-)
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 23c8bbb7aed..48510967238 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -160,6 +160,11 @@ function install_dpdk()
# meson verbose outputs.
DPDK_OPTS="$DPDK_OPTS -Ddeveloper_mode=disabled"
+ # OVS compilation and "normal" unit tests (run in the CI) do not depend on
+ # any DPDK driver being present.
+ # We can disable all drivers to save compilation time.
+ DPDK_OPTS="$DPDK_OPTS -Ddisable_drivers=*/*"
+
# Install DPDK using prefix.
DPDK_OPTS="$DPDK_OPTS --prefix=$(pwd)/build"
@@ -228,7 +233,7 @@ fi
if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then
if [ -z "$DPDK_VER" ]; then
- DPDK_VER="21.11.2"
+ DPDK_VER="22.11.1"
fi
install_dpdk $DPDK_VER
fi
diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml
index 7baa914034a..e08d7b1bac1 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -213,7 +213,6 @@ jobs:
matrix:
include:
- dpdk: no
- - dpdk: shared
steps:
- name: checkout
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index ac0001cd576..e19f54c8f01 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -233,7 +233,7 @@ Q: Are all the DPDK releases that OVS versions work with maintained?
The latest information about DPDK stable and LTS releases can be found
at `DPDK stable`_.
-.. _DPDK stable: http://doc.dpdk.org/guides-21.11/contributing/stable.html
+.. _DPDK stable: http://doc.dpdk.org/guides-22.11/contributing/stable.html
Q: I get an error like this when I configure Open vSwitch:
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index a284e68514c..e360ee83ddc 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -42,7 +42,7 @@ Build requirements
In addition to the requirements described in :doc:`general`, building Open
vSwitch with DPDK will require the following:
-- DPDK 21.11.2
+- DPDK 22.11.1
- A `DPDK supported NIC`_
@@ -59,8 +59,8 @@ vSwitch with DPDK will require the following:
Detailed system requirements can be found at `DPDK requirements`_.
-.. _DPDK supported NIC: https://doc.dpdk.org/guides-21.11/nics/index.html
-.. _DPDK requirements: https://doc.dpdk.org/guides-21.11/linux_gsg/sys_reqs.html
+.. _DPDK supported NIC: https://doc.dpdk.org/guides-22.11/nics/index.html
+.. _DPDK requirements: https://doc.dpdk.org/guides-22.11/linux_gsg/sys_reqs.html
.. _dpdk-install:
@@ -73,9 +73,9 @@ Install DPDK
#. Download the `DPDK sources`_, extract the file and set ``DPDK_DIR``::
$ cd /usr/src/
- $ wget https://fast.dpdk.org/rel/dpdk-21.11.2.tar.xz
- $ tar xf dpdk-21.11.2.tar.xz
- $ export DPDK_DIR=/usr/src/dpdk-stable-21.11.2
+ $ wget https://fast.dpdk.org/rel/dpdk-22.11.1.tar.xz
+ $ tar xf dpdk-22.11.1.tar.xz
+ $ export DPDK_DIR=/usr/src/dpdk-stable-22.11.1
$ cd $DPDK_DIR
#. Configure and install DPDK using Meson
@@ -121,7 +121,7 @@ Install DPDK
.. _DPDK sources: http://dpdk.org/rel
.. _DPDK documentation:
- https://doc.dpdk.org/guides-21.11/linux_gsg/build_dpdk.html
+ https://doc.dpdk.org/guides-22.11/linux_gsg/build_dpdk.html
Install OVS
~~~~~~~~~~~
@@ -722,7 +722,7 @@ Limitations
release notes`_.
.. _DPDK release notes:
- https://doc.dpdk.org/guides-21.11/rel_notes/release_21_11.html
+ https://doc.dpdk.org/guides-22.11/rel_notes/release_22_11.html
- Upper bound MTU: DPDK device drivers differ in how the L2 frame for a
given MTU value is calculated e.g. i40e driver includes 2 x vlan headers in
diff --git a/Documentation/topics/dpdk/phy.rst b/Documentation/topics/dpdk/phy.rst
index 8fc34a378cb..cb2d5bcb7b3 100644
--- a/Documentation/topics/dpdk/phy.rst
+++ b/Documentation/topics/dpdk/phy.rst
@@ -117,7 +117,7 @@ tool::
For more information, refer to the `DPDK documentation `__.
-.. _dpdk-drivers: https://doc.dpdk.org/guides-21.11/linux_gsg/linux_drivers.html
+.. _dpdk-drivers: https://doc.dpdk.org/guides-22.11/linux_gsg/linux_drivers.html
.. _dpdk-phy-multiqueue:
@@ -235,7 +235,7 @@ To hotplug a port with igb_uio in this case, DPDK must be configured to use
physical addressing for IOVA mode. For more information regarding IOVA modes
in DPDK please refer to the `DPDK IOVA Mode Detection`__.
-__ https://doc.dpdk.org/guides-21.11/prog_guide/env_abstraction_layer.html#iova-mode-detection
+__ https://doc.dpdk.org/guides-22.11/prog_guide/env_abstraction_layer.html#iova-mode-detection
To configure OVS DPDK to use physical addressing for IOVA::
@@ -267,7 +267,7 @@ Representors are multi devices created on top of one PF.
For more information, refer to the `DPDK documentation`__.
-__ https://doc.dpdk.org/guides-21.11/prog_guide/switch_representation.html#port-representors
+__ https://doc.dpdk.org/guides-22.11/prog_guide/switch_representation.html#port-representors
Prior to port representors there was a one-to-one relationship between the PF
and the eth device. With port representors the relationship becomes one PF to
@@ -401,7 +401,7 @@ in the ``options`` column of the ``Interface`` table.
kernel netdevice, and be inherited from it when Open vSwitch is restarted,
even if the options described in this section are unset from Open vSwitch.
-.. _bifurcated-drivers: https://doc.dpdk.org/guides-21.11/linux_gsg/linux_drivers.html#bifurcated-driver
+.. _bifurcated-drivers: https://doc.dpdk.org/guides-22.11/linux_gsg/linux_drivers.html#bifurcated-driver
- Configure the VF MAC address::
diff --git a/Documentation/topics/dpdk/vdev.rst b/Documentation/topics/dpdk/vdev.rst
index 97ac6d9a52a..3383afce562 100644
--- a/Documentation/topics/dpdk/vdev.rst
+++ b/Documentation/topics/dpdk/vdev.rst
@@ -63,4 +63,4 @@ run::
More information on the different types of virtual DPDK PMDs can be found in
the `DPDK documentation`__.
-__ https://doc.dpdk.org/guides-21.11/nics/overview.html
+__ https://doc.dpdk.org/guides-22.11/nics/overview.html
diff --git a/Documentation/topics/dpdk/vhost-user.rst b/Documentation/topics/dpdk/vhost-user.rst
index 8c233c1d305..3a5f5be9887 100644
--- a/Documentation/topics/dpdk/vhost-user.rst
+++ b/Documentation/topics/dpdk/vhost-user.rst
@@ -539,4 +539,4 @@ shown with::
Further information can be found in the
`DPDK documentation
-`__
+`__
diff --git a/Documentation/topics/testing.rst b/Documentation/topics/testing.rst
index 871ce5637d1..abccce1ee60 100644
--- a/Documentation/topics/testing.rst
+++ b/Documentation/topics/testing.rst
@@ -353,7 +353,7 @@ All tests are skipped if no hugepages are configured. User must look into the DP
manual to figure out how to `Configure hugepages`_.
The phy test will skip if no compatible physical device is available.
-.. _Configure hugepages: https://doc.dpdk.org/guides-21.11/linux_gsg/sys_reqs.html
+.. _Configure hugepages: https://doc.dpdk.org/guides-22.11/linux_gsg/sys_reqs.html
All the features documented under `Unit Tests`_ are available for the DPDK
testsuite.
diff --git a/Documentation/topics/userspace-tso.rst b/Documentation/topics/userspace-tso.rst
index 33a85965c19..5a43c2e86b8 100644
--- a/Documentation/topics/userspace-tso.rst
+++ b/Documentation/topics/userspace-tso.rst
@@ -46,7 +46,7 @@ datasheet for compatibility. Secondly, the NIC must have an associated DPDK
Poll Mode Driver (PMD) which supports `TSO`. For a list of features per PMD,
refer to the `DPDK documentation`__.
-__ https://doc.dpdk.org/guides-21.11/nics/overview.html
+__ https://doc.dpdk.org/guides-22.11/nics/overview.html
Enabling TSO
~~~~~~~~~~~~
diff --git a/NEWS b/NEWS
index f6caf1ca7f0..265375e1cb8 100644
--- a/NEWS
+++ b/NEWS
@@ -9,23 +9,7 @@ Post-v3.0.0
- Windows:
* Conntrack IPv6 fragment support.
- DPDK:
- * OVS validated with DPDK 21.11.2.
- DPDK 21.11.2 contains fixes for the following CVEs:
- CVE-2022-28199 cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-28199
- CVE-2022-2132 cve.mitre.org/cgi-bin/cvename.cgi?name=CVE-2022-2132
- A bug was introduced in DPDK 21.11.1 by the commit
- 01e3dee29c02 ("vhost: fix unsafe vring addresses modifications").
- This bug can cause a deadlock when vIOMMU is enabled and NUMA
- reallocation of the virtqueues happen.
- A fix has been posted and pushed to the DPDK 21.11 branch.
- It can be found here:
- https://patches.dpdk.org/project/dpdk/patch/20220725203206.427083-2-david.marchand@redhat.com/.
- If a user wishes to avoid the issue then it is recommended to use
- DPDK 21.11.0 until the release of DPDK 21.11.3.
- It should be noted that DPDK 21.11.0 does not benefit from the numerous
- bug and CVE fixes addressed since its release.
- If a user wishes to benefit from these fixes it is recommended to use
- DPDK 21.11.2.
+ * Add support for DPDK 22.11.1.
- For the QoS max-rate and STP/RSTP path-cost configuration OVS now assumes
10 Gbps link speed by default in case the actual link speed cannot be
determined. Previously it was 10 Mbps. Values can still be overridden
diff --git a/debian/control.in b/debian/control.in
index db52c8a99f0..19f590d0645 100644
--- a/debian/control.in
+++ b/debian/control.in
@@ -21,7 +21,7 @@ Build-Depends:
iproute2,
libcap-ng-dev,
libdbus-1-dev [amd64 i386 ppc64el arm64],
-# DPDK_NETDEV libdpdk-dev (>= 21.11) [amd64 i386 ppc64el arm64],
+# DPDK_NETDEV libdpdk-dev (>= 22.11) [amd64 i386 ppc64el arm64],
libnuma-dev [amd64 i386 ppc64el arm64],
libpcap-dev [amd64 i386 ppc64el arm64],
libssl-dev,
diff --git a/lib/netdev-dpdk.c b/lib/netdev-dpdk.c
index 72e7a32688f..fff57f78279 100644
--- a/lib/netdev-dpdk.c
+++ b/lib/netdev-dpdk.c
@@ -27,9 +27,10 @@
#include
#include
-#include
+#include
#include
#include
+#include
#include
#include
#include
@@ -166,7 +167,6 @@ typedef uint16_t dpdk_port_t;
static const struct rte_eth_conf port_conf = {
.rxmode = {
- .split_hdr_size = 0,
.offloads = 0,
},
.rx_adv_conf = {
@@ -3645,6 +3645,7 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
{
struct netdev_dpdk *dev = netdev_dpdk_cast(netdev);
struct rte_eth_dev_info dev_info;
+ const char *bus_info;
uint32_t link_speed;
uint32_t dev_flags;
@@ -3657,19 +3658,8 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
rte_eth_dev_info_get(dev->port_id, &dev_info);
link_speed = dev->link.link_speed;
dev_flags = *dev_info.dev_flags;
+ bus_info = rte_dev_bus_info(dev_info.device);
ovs_mutex_unlock(&dev->mutex);
- const struct rte_bus *bus;
- const struct rte_pci_device *pci_dev;
- uint16_t vendor_id = RTE_PCI_ANY_ID;
- uint16_t device_id = RTE_PCI_ANY_ID;
- bus = rte_bus_find_by_device(dev_info.device);
- if (bus && !strcmp(bus->name, "pci")) {
- pci_dev = RTE_DEV_TO_PCI(dev_info.device);
- if (pci_dev) {
- vendor_id = pci_dev->id.vendor_id;
- device_id = pci_dev->id.device_id;
- }
- }
ovs_mutex_unlock(&dpdk_mutex);
smap_add_format(args, "port_no", DPDK_PORT_ID_FMT, dev->port_id);
@@ -3693,8 +3683,10 @@ netdev_dpdk_get_status(const struct netdev *netdev, struct smap *args)
smap_add_format(args, "if_type", "%"PRIu32, IF_TYPE_ETHERNETCSMACD);
smap_add_format(args, "if_descr", "%s %s", rte_version(),
dev_info.driver_name);
- smap_add_format(args, "pci-vendor_id", "0x%x", vendor_id);
- smap_add_format(args, "pci-device_id", "0x%x", device_id);
+ smap_add_format(args, "bus_info", "bus_name=%s%s%s",
+ rte_bus_name(rte_dev_bus(dev_info.device)),
+ bus_info != NULL ? ", " : "",
+ bus_info != NULL ? bus_info : "");
/* Not all link speeds are defined in the OpenFlow specs e.g. 25 Gbps.
* In that case the speed will not be reported as part of the usual
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index c21592e47cb..4a3e6294bfb 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -71,7 +71,7 @@ BuildRequires: libcap-ng libcap-ng-devel
%endif
%if %{with dpdk}
BuildRequires: libpcap-devel numactl-devel
-BuildRequires: dpdk-devel >= 21.11
+BuildRequires: dpdk-devel >= 22.11
Provides: %{name}-dpdk = %{version}-%{release}
%endif
%if %{with afxdp}
diff --git a/tests/system-dpdk.at b/tests/system-dpdk.at
index fd7884e0f8c..8dc187a61d4 100644
--- a/tests/system-dpdk.at
+++ b/tests/system-dpdk.at
@@ -78,14 +78,14 @@ AT_CHECK([ovs-vsctl show], [], [stdout])
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -112,11 +112,11 @@ AT_CHECK([ovs-vsctl add-port br10 dpdkvhostuser0 -- set Interface dpdkvhostuser0
AT_CHECK([ovs-vsctl show], [], [stdout])
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user server: socket created" \
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostuser0) vhost-user server: socket created" \
ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "Socket $OVS_RUNDIR/dpdkvhostuser0 created for vhost-user port dpdkvhostuser0" \
ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: bind to $OVS_RUNDIR/dpdkvhostuser0" ovs-vswitchd.log], [],
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostuser0) binding succeeded" ovs-vswitchd.log], [],
[stdout])
dnl Set up namespaces
@@ -157,8 +157,8 @@ pkill -f -x -9 'tail -f /dev/null'
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuser0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: recvmsg failed@d
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostuser0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostuser0) recvmsg failed@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostuser0) failed to connect: No such file or directory@d
\@dpdkvhostuser ports are considered deprecated; please migrate to dpdkvhostuserclient ports.@d
\@failed to enumerate system datapaths: No such file or directory@d
])")
@@ -187,9 +187,9 @@ AT_CHECK([ovs-vsctl add-port br10 dpdkvhostuserclient0 -- set Interface \
AT_CHECK([ovs-vsctl show], [], [stdout])
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Set up namespaces
ADD_NAMESPACES(ns1, ns2)
@@ -229,8 +229,8 @@ pkill -f -x -9 'tail -f /dev/null'
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: recvmsg failed@d
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) recvmsg failed@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
\@dpdkvhostuser ports are considered deprecated; please migrate to dpdkvhostuserclient ports.@d
\@failed to enumerate system datapaths: No such file or directory@d
])")
@@ -304,14 +304,14 @@ AT_CHECK([ovs-vsctl list interface dpdkvhostuserclient0], [], [stdout])
AT_CHECK([grep -E 'ingress_policing_rate: 0' stdout], [], [stdout])
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -345,14 +345,14 @@ AT_CHECK([grep -E 'ingress_policing_rate: 0' stdout], [], [stdout])
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -385,14 +385,14 @@ AT_CHECK([ovs-vsctl list interface dpdkvhostuserclient0], [], [stdout])
AT_CHECK([grep -E 'ingress_policing_rate: 10000' stdout], [], [stdout])
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -448,9 +448,9 @@ AT_CHECK([ovs-appctl -t ovs-vswitchd qos/show dpdkvhostuserclient0], [], [stdout
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Fail if egress policer could not be created
AT_FAIL_IF([grep "Could not create rte meter for egress policer" ovs-vswitchd.log], [], [stdout])
@@ -465,7 +465,7 @@ AT_CHECK([grep -E 'QoS not configured on dpdkvhostuserclient0' stdout], [], [std
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -487,9 +487,9 @@ OVS_WAIT_UNTIL([ovs-vsctl set port dpdkvhostuserclient0 qos=@newqos -- --id=@new
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Check egress policer was not created
AT_CHECK([ovs-appctl -t ovs-vswitchd qos/show dpdkvhostuserclient0], [], [stdout])
@@ -498,7 +498,7 @@ AT_CHECK([grep -E 'QoS not configured on dpdkvhostuserclient0' stdout], [], [std
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
\@Could not create rte meter for egress policer@d
\@Failed to set QoS type egress-policer on port dpdkvhostuserclient0: Invalid argument@d
])")
@@ -522,9 +522,9 @@ OVS_WAIT_UNTIL([ovs-vsctl set port dpdkvhostuserclient0 qos=@newqos -- --id=@new
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Check egress policer was not created
AT_CHECK([ovs-appctl -t ovs-vswitchd qos/show dpdkvhostuserclient0], [], [stdout])
@@ -533,7 +533,7 @@ AT_CHECK([grep -E 'QoS not configured on dpdkvhostuserclient0' stdout], [], [std
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
\@Could not create rte meter for egress policer@d
\@Failed to set QoS type egress-policer on port dpdkvhostuserclient0: Invalid argument@d
])")
@@ -646,9 +646,9 @@ AT_CHECK([ovs-vsctl show], [], [stdout])
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Execute testpmd in background
on_exit "pkill -f -x -9 'tail -f /dev/null'"
@@ -675,7 +675,7 @@ pkill -f -x -9 'tail -f /dev/null'
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -703,9 +703,9 @@ AT_CHECK([ovs-vsctl show], [], [stdout])
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Execute testpmd in background
on_exit "pkill -f -x -9 'tail -f /dev/null'"
@@ -732,7 +732,7 @@ pkill -f -x -9 'tail -f /dev/null'
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
])")
AT_CLEANUP
dnl --------------------------------------------------------------------------
@@ -864,7 +864,7 @@ pkill -f -x -9 'tail -f /dev/null'
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
\@dpdkvhostuserclient0: unsupported MTU 9711@d
\@failed to set MTU for network device dpdkvhostuserclient0: Invalid argument@d
])")
@@ -894,9 +894,9 @@ AT_CHECK([ovs-vsctl show], [], [stdout])
sleep 2
dnl Parse log file
-AT_CHECK([grep "VHOST_CONFIG: vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) vhost-user client: socket created" ovs-vswitchd.log], [], [stdout])
AT_CHECK([grep "vHost User device 'dpdkvhostuserclient0' created in 'client' mode, using client socket" ovs-vswitchd.log], [], [stdout])
-AT_CHECK([grep "VHOST_CONFIG: $OVS_RUNDIR/dpdkvhostclient0: reconnecting..." ovs-vswitchd.log], [], [stdout])
+AT_CHECK([grep "VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) reconnecting..." ovs-vswitchd.log], [], [stdout])
dnl Execute testpmd in background
on_exit "pkill -f -x -9 'tail -f /dev/null'"
@@ -921,7 +921,7 @@ pkill -f -x -9 'tail -f /dev/null'
dnl Clean up
AT_CHECK([ovs-vsctl del-port br10 dpdkvhostuserclient0], [], [stdout], [stderr])
OVS_VSWITCHD_STOP("m4_join([], [SYSTEM_DPDK_ALLOWED_LOGS], [
-\@VHOST_CONFIG: failed to connect to $OVS_RUNDIR/dpdkvhostclient0: No such file or directory@d
+\@VHOST_CONFIG: ($OVS_RUNDIR/dpdkvhostclient0) failed to connect: No such file or directory@d
\@dpdkvhostuserclient0: unsupported MTU 67@d
\@failed to set MTU for network device dpdkvhostuserclient0: Invalid argument@d
])")
From b8bf410a5c94173da02279b369d75875c4035959 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Wed, 21 Sep 2022 22:50:49 +0200
Subject: [PATCH 060/833] db-ctl-base: Use partial map/set updates for last
add/set commands.
Currently, a command to add one item into a large set generates a
transaction with the full new content of that set plus a 'wait'
operation for the full old content of that set.  So, if we're adding
one new load-balancer into a load-balancer group in OVN using
ovn-nbctl, the transaction will include all the existing load-balancers
from that group twice.
The IDL supports partial updates for sets and maps.  The problem with
that is that changes are not visible to the IDL user until the
transaction is committed, which would cause problems for chained ctl
commands.  However, we can still optimize the very last command in the
list.  It makes sense to do so, since that is the common case for
manual invocations.
This updates the 'add' command, as well as 'set' for the case where
we're actually adding one new element to the map.
One downside is that we can't check the set size without examining it
and checking for duplicates, so in that case the transaction is sent
and the constraints are checked on the server side.
Not touching the 'remove' operation for now, since removals may have a
different type, e.g. when elements of a map are removed by key.  That
function will likely need to be fully re-written to accommodate all
the corner cases.
Acked-by: Dumitru Ceara
Signed-off-by: Ilya Maximets
---
lib/db-ctl-base.c | 87 +++++++++++++++++++++++++++++++------------
lib/db-ctl-base.h | 8 +++-
tests/ovs-vsctl.at | 6 ++-
utilities/ovs-vsctl.c | 7 ++--
vtep/vtep-ctl.c | 7 ++--
5 files changed, 83 insertions(+), 32 deletions(-)
diff --git a/lib/db-ctl-base.c b/lib/db-ctl-base.c
index 856832a04d2..134496ef3f6 100644
--- a/lib/db-ctl-base.c
+++ b/lib/db-ctl-base.c
@@ -75,7 +75,7 @@ static struct shash all_commands = SHASH_INITIALIZER(&all_commands);
static char *get_table(const char *, const struct ovsdb_idl_table_class **);
static char *set_column(const struct ovsdb_idl_table_class *,
const struct ovsdb_idl_row *, const char *,
- struct ovsdb_symbol_table *);
+ struct ovsdb_symbol_table *, bool use_partial_update);
static struct option *
@@ -1325,11 +1325,17 @@ cmd_find(struct ctl_context *ctx)
}
/* Sets the column of 'row' in 'table'. Returns NULL on success or a
- * malloc()'ed error message on failure. */
+ * malloc()'ed error message on failure.
+ *
+ * If 'use_partial_update' is true, then this function will try to use
+ * partial set/map updates, if possible. As a side effect, result will
+ * not be reflected in the IDL until the transaction is committed.
+ * The last access to a particular column is a good candidate to use
+ * this option. */
static char * OVS_WARN_UNUSED_RESULT
set_column(const struct ovsdb_idl_table_class *table,
const struct ovsdb_idl_row *row, const char *arg,
- struct ovsdb_symbol_table *symtab)
+ struct ovsdb_symbol_table *symtab, bool use_partial_update)
{
const struct ovsdb_idl_column *column;
char *key_string = NULL;
@@ -1352,7 +1358,7 @@ set_column(const struct ovsdb_idl_table_class *table,
if (key_string) {
union ovsdb_atom key, value;
- struct ovsdb_datum datum;
+ struct ovsdb_datum *datum;
if (column->type.value.type == OVSDB_TYPE_VOID) {
error = xasprintf("cannot specify key to set for non-map column "
@@ -1371,16 +1377,22 @@ set_column(const struct ovsdb_idl_table_class *table,
goto out;
}
- ovsdb_datum_init_empty(&datum);
- ovsdb_datum_add_unsafe(&datum, &key, &value, &column->type, NULL);
+ datum = xmalloc(sizeof *datum);
+ ovsdb_datum_init_empty(datum);
+ ovsdb_datum_add_unsafe(datum, &key, &value, &column->type, NULL);
ovsdb_atom_destroy(&key, column->type.key.type);
ovsdb_atom_destroy(&value, column->type.value.type);
- ovsdb_datum_union(&datum, ovsdb_idl_read(row, column),
- &column->type);
- ovsdb_idl_txn_verify(row, column);
- ovsdb_idl_txn_write(row, column, &datum);
+ if (use_partial_update) {
+ ovsdb_idl_txn_write_partial_map(row, column, datum);
+ } else {
+ ovsdb_datum_union(datum, ovsdb_idl_read(row, column),
+ &column->type);
+ ovsdb_idl_txn_verify(row, column);
+ ovsdb_idl_txn_write(row, column, datum);
+ free(datum);
+ }
} else {
struct ovsdb_datum datum;
@@ -1441,7 +1453,8 @@ cmd_set(struct ctl_context *ctx)
}
for (i = 3; i < ctx->argc; i++) {
- ctx->error = set_column(table, row, ctx->argv[i], ctx->symtab);
+ ctx->error = set_column(table, row, ctx->argv[i], ctx->symtab,
+ ctx->last_command);
if (ctx->error) {
return;
}
@@ -1479,7 +1492,7 @@ cmd_add(struct ctl_context *ctx)
const struct ovsdb_idl_column *column;
const struct ovsdb_idl_row *row;
const struct ovsdb_type *type;
- struct ovsdb_datum old;
+ struct ovsdb_datum new;
int i;
ctx->error = get_table(table_name, &table);
@@ -1503,7 +1516,13 @@ cmd_add(struct ctl_context *ctx)
}
type = &column->type;
- ovsdb_datum_clone(&old, ovsdb_idl_read(row, column));
+
+ if (ctx->last_command) {
+ ovsdb_datum_init_empty(&new);
+ } else {
+ ovsdb_datum_clone(&new, ovsdb_idl_read(row, column));
+ }
+
for (i = 4; i < ctx->argc; i++) {
struct ovsdb_type add_type;
struct ovsdb_datum add;
@@ -1514,23 +1533,41 @@ cmd_add(struct ctl_context *ctx)
ctx->error = ovsdb_datum_from_string(&add, &add_type, ctx->argv[i],
ctx->symtab);
if (ctx->error) {
- ovsdb_datum_destroy(&old, &column->type);
+ ovsdb_datum_destroy(&new, &column->type);
return;
}
- ovsdb_datum_union(&old, &add, type);
+ ovsdb_datum_union(&new, &add, type);
ovsdb_datum_destroy(&add, type);
}
- if (old.n > type->n_max) {
+
+ if (!ctx->last_command && new.n > type->n_max) {
ctl_error(ctx, "\"add\" operation would put %u %s in column %s of "
"table %s but the maximum number is %u",
- old.n,
+ new.n,
type->value.type == OVSDB_TYPE_VOID ? "values" : "pairs",
column->name, table->name, type->n_max);
- ovsdb_datum_destroy(&old, &column->type);
+ ovsdb_datum_destroy(&new, &column->type);
return;
}
- ovsdb_idl_txn_verify(row, column);
- ovsdb_idl_txn_write(row, column, &old);
+
+ if (ctx->last_command) {
+ /* Partial updates can only be made one by one. */
+ for (i = 0; i < new.n; i++) {
+ struct ovsdb_datum *datum = xmalloc(sizeof *datum);
+
+ ovsdb_datum_init_empty(datum);
+ ovsdb_datum_add_from_index_unsafe(datum, &new, i, type);
+ if (ovsdb_type_is_map(type)) {
+ ovsdb_idl_txn_write_partial_map(row, column, datum);
+ } else {
+ ovsdb_idl_txn_write_partial_set(row, column, datum);
+ }
+ }
+ ovsdb_datum_destroy(&new, &column->type);
+ } else {
+ ovsdb_idl_txn_verify(row, column);
+ ovsdb_idl_txn_write(row, column, &new);
+ }
invalidate_cache(ctx);
}
@@ -1769,7 +1806,7 @@ cmd_create(struct ctl_context *ctx)
}
for (i = 2; i < ctx->argc; i++) {
- ctx->error = set_column(table, row, ctx->argv[i], ctx->symtab);
+ ctx->error = set_column(table, row, ctx->argv[i], ctx->symtab, false);
if (ctx->error) {
return;
}
@@ -2620,7 +2657,8 @@ ctl_list_db_tables_usage(void)
/* Initializes 'ctx' from 'command'. */
void
ctl_context_init_command(struct ctl_context *ctx,
- struct ctl_command *command)
+ struct ctl_command *command,
+ bool last)
{
ctx->argc = command->argc;
ctx->argv = command->argv;
@@ -2629,6 +2667,7 @@ ctl_context_init_command(struct ctl_context *ctx,
ds_swap(&ctx->output, &command->output);
ctx->table = command->table;
ctx->try_again = false;
+ ctx->last_command = last;
ctx->error = NULL;
}
@@ -2640,7 +2679,7 @@ ctl_context_init(struct ctl_context *ctx, struct ctl_command *command,
void (*invalidate_cache_cb)(struct ctl_context *))
{
if (command) {
- ctl_context_init_command(ctx, command);
+ ctl_context_init_command(ctx, command, false);
}
ctx->idl = idl;
ctx->txn = txn;
@@ -2684,7 +2723,7 @@ ctl_set_column(const char *table_name, const struct ovsdb_idl_row *row,
if (error) {
return error;
}
- error = set_column(table, row, arg, symtab);
+ error = set_column(table, row, arg, symtab, false);
if (error) {
return error;
}
diff --git a/lib/db-ctl-base.h b/lib/db-ctl-base.h
index 284b573d0bc..ea7e97b7844 100644
--- a/lib/db-ctl-base.h
+++ b/lib/db-ctl-base.h
@@ -239,9 +239,15 @@ struct ctl_context {
/* A command may set this member to true if some prerequisite is not met
* and the caller should wait for something to change and then retry. */
bool try_again;
+
+ /* If set during the context initialization, command implementation
+ * may use optimizations that will leave database changes invisible
+ * to IDL, e.g. use partial set updates. */
+ bool last_command;
};
-void ctl_context_init_command(struct ctl_context *, struct ctl_command *);
+void ctl_context_init_command(struct ctl_context *, struct ctl_command *,
+ bool last);
void ctl_context_init(struct ctl_context *, struct ctl_command *,
struct ovsdb_idl *, struct ovsdb_idl_txn *,
struct ovsdb_symbol_table *,
diff --git a/tests/ovs-vsctl.at b/tests/ovs-vsctl.at
index abf4fb9cf4e..a92156f001c 100644
--- a/tests/ovs-vsctl.at
+++ b/tests/ovs-vsctl.at
@@ -1071,9 +1071,13 @@ AT_CHECK([RUN_OVS_VSCTL([set controller br1 'connection-mode=xyz'])],
AT_CHECK([RUN_OVS_VSCTL([set controller br1 connection-mode:x=y])],
[1], [], [ovs-vsctl: cannot specify key to set for non-map column connection_mode
])
-AT_CHECK([RUN_OVS_VSCTL([add bridge br1 datapath_id x y])],
+AT_CHECK([RUN_OVS_VSCTL([add bridge br1 datapath_id x y -- show])],
[1], [], [ovs-vsctl: "add" operation would put 2 values in column datapath_id of table Bridge but the maximum number is 1
])
+AT_CHECK([RUN_OVS_VSCTL([add bridge br1 datapath_id x y])], [1], [], [stderr])
+AT_CHECK([sed "/^.*|WARN|.*/d" < stderr], [0], [dnl
+ovs-vsctl: transaction error: {"details":"set must have 0 to 1 members but 2 are present","error":"syntax error","syntax":"[[\"set\",[\"x\",\"y\"]]]"}
+])
AT_CHECK([RUN_OVS_VSCTL([remove netflow `cat netflow-uuid` targets '"1.2.3.4:567"'])],
[1], [], [ovs-vsctl: "remove" operation would put 0 values in column targets of table NetFlow but the minimum number is 1
])
diff --git a/utilities/ovs-vsctl.c b/utilities/ovs-vsctl.c
index 1032089fc26..c1d47000616 100644
--- a/utilities/ovs-vsctl.c
+++ b/utilities/ovs-vsctl.c
@@ -2711,9 +2711,9 @@ post_db_reload_do_checks(const struct vsctl_context *vsctl_ctx)
static void
vsctl_context_init_command(struct vsctl_context *vsctl_ctx,
- struct ctl_command *command)
+ struct ctl_command *command, bool last_command)
{
- ctl_context_init_command(&vsctl_ctx->base, command);
+ ctl_context_init_command(&vsctl_ctx->base, command, last_command);
vsctl_ctx->verified_ports = false;
}
@@ -2859,7 +2859,8 @@ do_vsctl(const char *args, struct ctl_command *commands, size_t n_commands,
}
vsctl_context_init(&vsctl_ctx, NULL, idl, txn, ovs, symtab);
for (c = commands; c < &commands[n_commands]; c++) {
- vsctl_context_init_command(&vsctl_ctx, c);
+ vsctl_context_init_command(&vsctl_ctx, c,
+ c == &commands[n_commands - 1]);
if (c->syntax->run) {
(c->syntax->run)(&vsctl_ctx.base);
}
diff --git a/vtep/vtep-ctl.c b/vtep/vtep-ctl.c
index 99c4adcd53d..e5d99714dee 100644
--- a/vtep/vtep-ctl.c
+++ b/vtep/vtep-ctl.c
@@ -2207,9 +2207,9 @@ static const struct ctl_table_class tables[VTEPREC_N_TABLES] = {
static void
vtep_ctl_context_init_command(struct vtep_ctl_context *vtepctl_ctx,
- struct ctl_command *command)
+ struct ctl_command *command, bool last_command)
{
- ctl_context_init_command(&vtepctl_ctx->base, command);
+ ctl_context_init_command(&vtepctl_ctx->base, command, last_command);
vtepctl_ctx->verified_ports = false;
}
@@ -2304,7 +2304,8 @@ do_vtep_ctl(const char *args, struct ctl_command *commands,
}
vtep_ctl_context_init(&vtepctl_ctx, NULL, idl, txn, vtep_global, symtab);
for (c = commands; c < &commands[n_commands]; c++) {
- vtep_ctl_context_init_command(&vtepctl_ctx, c);
+ vtep_ctl_context_init_command(&vtepctl_ctx, c,
+ c == &commands[n_commands - 1]);
if (c->syntax->run) {
(c->syntax->run)(&vtepctl_ctx.base);
}
From 093915e04a978c3c37005968f2a4358ef24a2745 Mon Sep 17 00:00:00 2001
From: Daniel Ding
Date: Thu, 27 Oct 2022 14:01:13 +0800
Subject: [PATCH 061/833] vswitch.ovsschema: Set bfd_status to ephemeral.
When openvswitch is restarted, the stale bfd status is kept in the
database until ovs-vswitchd is running again. And if ovs-vswitchd has
a high workload, updating the bfd status is deferred, which is not
what we expect. Making the column ephemeral avoids persisting the
stale status.
Signed-off-by: Daniel Ding
Signed-off-by: Ilya Maximets
---
vswitchd/vswitch.ovsschema | 7 ++++---
1 file changed, 4 insertions(+), 3 deletions(-)
diff --git a/vswitchd/vswitch.ovsschema b/vswitchd/vswitch.ovsschema
index 4873cfde72d..1a49cdffea7 100644
--- a/vswitchd/vswitch.ovsschema
+++ b/vswitchd/vswitch.ovsschema
@@ -1,6 +1,6 @@
{"name": "Open_vSwitch",
- "version": "8.3.0",
- "cksum": "3781850481 26690",
+ "version": "8.3.1",
+ "cksum": "3012963480 26720",
"tables": {
"Open_vSwitch": {
"columns": {
@@ -280,7 +280,8 @@
"min": 0, "max": "unlimited"}},
"bfd_status": {
"type": {"key": "string", "value": "string",
- "min": 0, "max": "unlimited"}},
+ "min": 0, "max": "unlimited"},
+ "ephemeral": true},
"cfm_mpid": {
"type": {
"key": {"type": "integer"},
From e83dad6e53f3fe04ca9c4d6972fcaa7995de2ba2 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Fri, 25 Nov 2022 13:37:04 +0100
Subject: [PATCH 062/833] ovsdb: Count weak reference objects.
OVSDB creates a separate object for each weak reference in order to
track them, and there could be a significant number of these objects
in the database.
We also had problems with the number of these objects growing out of
bounds recently. So, adding them to the memory report seems to be
a good thing.
They are counted globally to cover all the copied instances in
transactions and in the transaction history (even though there should
be none).
It's also hard to count them per-database, because weak references
are stored on destination rows and can be destroyed either while
destroying the destination row or while removing the reference from
the source row. Also, not all the involved functions have direct
access to the database object. So, there is no single clear place
where counters should be updated.
Acked-by: Dumitru Ceara
Acked-by: Han Zhou
Signed-off-by: Ilya Maximets
---
ovsdb/ovsdb.c | 4 ++++
ovsdb/ovsdb.h | 4 ++++
ovsdb/row.c | 5 ++++-
ovsdb/transaction.c | 2 ++
4 files changed, 14 insertions(+), 1 deletion(-)
diff --git a/ovsdb/ovsdb.c b/ovsdb/ovsdb.c
index 1c011fab00d..11786f37660 100644
--- a/ovsdb/ovsdb.c
+++ b/ovsdb/ovsdb.c
@@ -43,6 +43,8 @@
#include "openvswitch/vlog.h"
VLOG_DEFINE_THIS_MODULE(ovsdb);
+size_t n_weak_refs = 0;
+
struct ovsdb_schema *
ovsdb_schema_create(const char *name, const char *version, const char *cksum)
{
@@ -546,6 +548,8 @@ ovsdb_get_memory_usage(const struct ovsdb *db, struct simap *usage)
if (db->storage) {
ovsdb_storage_get_memory_usage(db->storage, usage);
}
+
+ simap_put(usage, "n-weak-refs", n_weak_refs);
}
struct ovsdb_table *
diff --git a/ovsdb/ovsdb.h b/ovsdb/ovsdb.h
index d05e7c64a69..13d8bf407be 100644
--- a/ovsdb/ovsdb.h
+++ b/ovsdb/ovsdb.h
@@ -125,6 +125,10 @@ struct ovsdb {
struct ovsdb_compaction_state *snap_state;
};
+/* Total number of 'weak reference' objects in all databases
+ * and transactions. */
+extern size_t n_weak_refs;
+
struct ovsdb *ovsdb_create(struct ovsdb_schema *, struct ovsdb_storage *);
void ovsdb_destroy(struct ovsdb *);
diff --git a/ovsdb/row.c b/ovsdb/row.c
index 3f0bb8acf12..d7bfbdd365e 100644
--- a/ovsdb/row.c
+++ b/ovsdb/row.c
@@ -21,8 +21,9 @@
#include "openvswitch/dynamic-string.h"
#include "openvswitch/json.h"
-#include "ovsdb-error.h"
#include "openvswitch/shash.h"
+#include "ovsdb-error.h"
+#include "ovsdb.h"
#include "sort.h"
#include "table.h"
#include "util.h"
@@ -78,6 +79,7 @@ ovsdb_weak_ref_clone(struct ovsdb_weak_ref *src)
ovsdb_type_clone(&weak->type, &src->type);
weak->column_idx = src->column_idx;
weak->by_key = src->by_key;
+ n_weak_refs++;
return weak;
}
@@ -130,6 +132,7 @@ ovsdb_weak_ref_destroy(struct ovsdb_weak_ref *weak)
}
ovsdb_type_destroy(&weak->type);
free(weak);
+ n_weak_refs--;
}
struct ovsdb_row *
diff --git a/ovsdb/transaction.c b/ovsdb/transaction.c
index 5d7c70a51c0..03541af85d7 100644
--- a/ovsdb/transaction.c
+++ b/ovsdb/transaction.c
@@ -613,6 +613,8 @@ add_weak_ref(const struct ovsdb_row *src, const struct ovsdb_row *dst_,
weak->column_idx = column->index;
hmap_node_nullify(&weak->dst_node);
ovs_list_push_back(ref_list, &weak->src_node);
+
+ n_weak_refs++;
}
static void
From 6bc92db366631f0996c8cbae2d6e1263d437ce21 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 5 Dec 2022 09:41:21 +0100
Subject: [PATCH 063/833] rculist: Use rculist_back_protected to access prev.
The .prev member of a rculist should not be used directly by users
because it's not rcu-safe. A convenient fake mutex (rculist_fake_mutex)
helps ensure that, in conjunction with clang's thread safety
extensions.
Only writers with exclusive access to the rculist should access .prev,
via one of the provided *_protected() accessors.
Use rculist_back_protected() in REVERSE_PROTECTED iterators to avoid
clang compilation warnings.
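For illustration (made-up structure, not from this patch), a writer
that already holds exclusive access walks the list in reverse through
the iterator and never touches .prev directly; under clang's thread
safety analysis the caller would typically be annotated as holding
rculist_fake_mutex:

    #include "rculist.h"

    struct flow_entry {
        struct rculist node;        /* In a writer-owned rculist. */
        int priority;
    };

    static void
    bump_priorities(struct rculist *entries)
    {
        struct flow_entry *e;

        /* .prev is only read via the *_protected() accessor that the
         * iterator uses internally. */
        RCULIST_FOR_EACH_REVERSE_PROTECTED (e, node, entries) {
            e->priority++;
        }
    }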
Acked-by: Mike Pattrick
Signed-off-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
lib/rculist.h | 8 +++++---
1 file changed, 5 insertions(+), 3 deletions(-)
diff --git a/lib/rculist.h b/lib/rculist.h
index 9bb8cbf3eb2..6df963eb2b8 100644
--- a/lib/rculist.h
+++ b/lib/rculist.h
@@ -378,12 +378,14 @@ rculist_is_singleton_protected(const struct rculist *list)
UPDATE_MULTIVAR(ITER, rculist_next(ITER_VAR(ITER))))
#define RCULIST_FOR_EACH_REVERSE_PROTECTED(ITER, MEMBER, RCULIST) \
- for (INIT_MULTIVAR(ITER, MEMBER, (RCULIST)->prev, struct rculist); \
+ for (INIT_MULTIVAR(ITER, MEMBER, rculist_back_protected(RCULIST), \
+ struct rculist); \
CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST)); \
- UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
+ UPDATE_MULTIVAR(ITER, rculist_back_protected(ITER_VAR(ITER))))
#define RCULIST_FOR_EACH_REVERSE_PROTECTED_CONTINUE(ITER, MEMBER, RCULIST) \
- for (INIT_MULTIVAR(ITER, MEMBER, (ITER)->MEMBER.prev, struct rculist); \
+ for (INIT_MULTIVAR(ITER, MEMBER, rculist_back_protected(ITER->MEMBER), \
+ struct rculist); \
CONDITION_MULTIVAR(ITER, MEMBER, ITER_VAR(ITER) != (RCULIST)); \
UPDATE_MULTIVAR(ITER, ITER_VAR(ITER)->prev))
From 481555587f753d035d011712b4877a4300dbc9d9 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Tue, 6 Dec 2022 14:25:34 +0100
Subject: [PATCH 064/833] faq: Update some wording since kernel module is
already removed.
The kernel module was removed in the 3.0 release, but the FAQ page
still talks about it in the future tense.
Fixes: 3476bd3932b0 ("Documentation: Remove kernel module documentation.")
Reviewed-by: David Marchand
Signed-off-by: Ilya Maximets
---
Documentation/faq/releases.rst | 9 ++++-----
1 file changed, 4 insertions(+), 5 deletions(-)
diff --git a/Documentation/faq/releases.rst b/Documentation/faq/releases.rst
index e19f54c8f01..53ce230047c 100644
--- a/Documentation/faq/releases.rst
+++ b/Documentation/faq/releases.rst
@@ -88,11 +88,10 @@ Q: What Linux kernel versions does each Open vSwitch release work with?
RHEL and CentOS 7 3.10 based kernels since they have diverged from the
Linux kernel.org 3.10 kernels.
- Starting with Open vSwitch 2.15, building the Linux kernel module from
- the Open vSwitch source tree is deprecated. It will not be updated to
- support Linux versions later than 5.8. We will remove the kernel module
- source code from the Open vSwitch source tree for the Open vSwitch 3.0
- release.
+ Building the Linux kernel module from the Open vSwitch source tree was
+ deprecated starting with Open vSwitch 2.15. And the kernel module
+ source code was completely removed from the Open vSwitch source tree in
+ 3.0 release.
Q: Are all features available with all datapaths?
From 739bcf2263b3dfbc8a855c6e5b4a2b77742dd8db Mon Sep 17 00:00:00 2001
From: Emma Finn
Date: Tue, 6 Dec 2022 14:18:00 +0000
Subject: [PATCH 065/833] odp-execute: Fix ipv4 missing clearing of connection
tracking fields.
This patch adds clearing of the connection tracking fields to the
avx512 implementation of the ipv4 action. It also extends the
actions autovalidator to compare packet metadata.
Fixes: 92eb03f7b03a ("odp-execute: Add ISA implementation of set_masked IPv4 action")
Signed-off-by: Emma Finn
Signed-off-by: Ilya Maximets
---
lib/odp-execute-avx512.c | 2 ++
lib/odp-execute-private.c | 12 ++++++++++++
2 files changed, 14 insertions(+)
diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 6c77132516a..66b3998dabd 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -477,6 +477,8 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch,
th->tcp_csum = tcp_checksum;
}
+
+ pkt_metadata_init_conn(&packet->md);
}
/* Write back the modified IPv4 addresses. */
_mm256_mask_storeu_epi32((void *) nh, 0x1F, v_new_hdr);
diff --git a/lib/odp-execute-private.c b/lib/odp-execute-private.c
index f80ae5a239c..57be5cfe75a 100644
--- a/lib/odp-execute-private.c
+++ b/lib/odp-execute-private.c
@@ -229,6 +229,18 @@ action_autoval_generic(struct dp_packet_batch *batch, const struct nlattr *a)
}
}
+ /* Compare packet metadata. */
+ if (memcmp(&good_pkt->md, &test_pkt->md, sizeof good_pkt->md)) {
+ ds_put_format(&log_msg, "Autovalidation metadata failed\n");
+ ds_put_format(&log_msg, "Good packet metadata:\n");
+ ds_put_sparse_hex_dump(&log_msg, &good_pkt->md,
+ sizeof good_pkt->md, 0, false);
+ ds_put_format(&log_msg, "Test packet metadata:\n");
+ ds_put_sparse_hex_dump(&log_msg, &test_pkt->md,
+ sizeof test_pkt->md, 0, false);
+ failed = true;
+ }
+
if (failed) {
VLOG_ERR("Autovalidation of %s failed. Details:\n%s",
action_impls[impl].name, ds_cstr(&log_msg));
From a787fbbf9dd6a108a53053afb45fb59a0b58b514 Mon Sep 17 00:00:00 2001
From: Dumitru Ceara
Date: Tue, 13 Dec 2022 18:11:18 +0100
Subject: [PATCH 066/833] ovsdb-cs: Consider default conditions implicitly
acked.
When initializing a monitor table, the default monitor condition is
[True], which matches the behavior of the server (to send all rows of
that table). There's no need to include this default condition in the
initial monitor request, so we can consider it implicitly acked by the
server.
This fixes the incorrect (one too large) expected condition sequence
number reported by ovsdb_idl_set_condition() when the application is
trying to set a [True] condition for a new table.
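For reference, a hedged sketch (not taken from this patch) of the
client-side pattern affected: setting a [True] condition on a table and
returning the sequence number to wait for; 'idl' and 'tc' are assumed
to be a connected IDL and one of its table classes:

    #include "ovsdb-idl.h"

    static unsigned int
    set_true_condition(struct ovsdb_idl *idl,
                       const struct ovsdb_idl_table_class *tc)
    {
        struct ovsdb_idl_condition cond = OVSDB_IDL_CONDITION_INIT(&cond);
        unsigned int target;

        ovsdb_idl_condition_add_clause_true(&cond);
        /* Before this fix the returned target could be one higher than
         * any seqno the server would ack for this effectively unchanged
         * condition, so waiting for
         * ovsdb_idl_get_condition_seqno(idl) == target could stall. */
        target = ovsdb_idl_set_condition(idl, tc, &cond);
        ovsdb_idl_condition_destroy(&cond);
        return target;
    }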
Reported-by: Numan Siddique
Suggested-by: Ilya Maximets
Signed-off-by: Dumitru Ceara
Signed-off-by: Ilya Maximets
---
lib/ovsdb-cs.c | 2 +-
python/ovs/db/idl.py | 4 +-
tests/ovsdb-idl.at | 105 +++++++++++++++++++++++++++++--------------
tests/test-ovsdb.c | 38 ++++++++++++----
tests/test-ovsdb.py | 37 +++++++++++----
5 files changed, 133 insertions(+), 53 deletions(-)
diff --git a/lib/ovsdb-cs.c b/lib/ovsdb-cs.c
index a6fbd290c87..0fca03d7231 100644
--- a/lib/ovsdb-cs.c
+++ b/lib/ovsdb-cs.c
@@ -892,7 +892,7 @@ ovsdb_cs_db_get_table(struct ovsdb_cs_db *db, const char *table)
t = xzalloc(sizeof *t);
t->name = xstrdup(table);
- t->new_cond = json_array_create_1(json_boolean_create(true));
+ t->ack_cond = json_array_create_1(json_boolean_create(true));
hmap_insert(&db->tables, &t->hmap_node, hash);
return t;
}
diff --git a/python/ovs/db/idl.py b/python/ovs/db/idl.py
index fe66402cff4..9fc2159b04a 100644
--- a/python/ovs/db/idl.py
+++ b/python/ovs/db/idl.py
@@ -85,9 +85,9 @@ class Monitor(enum.IntEnum):
class ConditionState(object):
def __init__(self):
- self._ack_cond = None
+ self._ack_cond = [True]
self._req_cond = None
- self._new_cond = [True]
+ self._new_cond = None
def __iter__(self):
return iter([self._new_cond, self._req_cond, self._ack_cond])
diff --git a/tests/ovsdb-idl.at b/tests/ovsdb-idl.at
index c2970984bae..5a7e76eaa95 100644
--- a/tests/ovsdb-idl.at
+++ b/tests/ovsdb-idl.at
@@ -576,9 +576,9 @@ OVSDB_CHECK_IDL([simple idl, conditional, false condition],
"b": true}}]']],
[['condition simple []' \
'condition simple [true]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
004: done
]])
@@ -592,13 +592,40 @@ OVSDB_CHECK_IDL([simple idl, conditional, true condition],
"b": true}}]']],
[['condition simple []' \
'condition simple [true]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
004: done
]])
+dnl This test ensures that the first explicitly set monitor condition
+dnl is sent to the server.
+OVSDB_CHECK_IDL([simple idl, conditional, wait for condition],
+ [],
+ [['["idltest",
+ {"op": "insert",
+ "table": "simple",
+ "row": {"i": 1,
+ "r": 2.0,
+ "b": true}}]' \
+ 'condition simple [true]' \
+ '^["idltest",
+ {"op": "insert",
+ "table": "simple",
+ "row": {"i": 2,
+ "r": 4.0,
+ "b": true}}]']],
+ [[000: empty
+001: {"error":null,"result":[{"uuid":["uuid","<0>"]}]}
+002: table simple: i=1 r=2 b=true s= u=<1> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+003: simple: conditions unchanged
+004: {"error":null,"result":[{"uuid":["uuid","<2>"]}]}
+005: table simple: i=1 r=2 b=true s= u=<1> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
+005: table simple: i=2 r=4 b=true s= u=<1> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
+006: done
+]])
+
OVSDB_CHECK_IDL([simple idl, conditional, multiple clauses in condition],
[['["idltest",
{"op": "insert",
@@ -613,9 +640,9 @@ OVSDB_CHECK_IDL([simple idl, conditional, multiple clauses in condition],
"b": true}}]']],
[['condition simple []' \
'condition simple [["i","==",1],["i","==",2]]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
003: table simple: i=2 r=3 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
004: done
@@ -630,9 +657,9 @@ OVSDB_CHECK_IDL([simple idl, conditional, modify as insert due to condition],
"b": true}}]']],
[['condition simple []' \
'condition simple [["i","==",1]]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
004: done
]])
@@ -653,11 +680,11 @@ OVSDB_CHECK_IDL([simple idl, conditional, modify as delete due to condition],
"row": {"i": 2,
"r": 3.0,
"b": true}}]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
-004: change conditions
+004: simple: change conditions
005: empty
006: {"error":null,"result":[{"uuid":["uuid","<2>"]}]}
007: table simple: i=2 r=3 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<2>
@@ -688,14 +715,16 @@ OVSDB_CHECK_IDL([simple idl, conditional, multiple tables],
"table": "link2",
"row": {"i": 3},
"uuid-name": "row0"}]']],
- [[000: change conditions
+ [[000: link1: change conditions
+000: link2: change conditions
+000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
-004: change conditions
+004: link1: change conditions
005: table link1: i=0 k=0 ka=[] l2= uuid=<2>
005: table simple: i=1 r=2 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
-006: change conditions
+006: link2: change conditions
007: {"error":null,"result":[{"uuid":["uuid","<3>"]}]}
008: table link1: i=0 k=0 ka=[] l2= uuid=<2>
008: table link2: i=3 l1= uuid=<3>
@@ -1266,10 +1295,10 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, orphan weak refer
{"op": "delete",
"table": "simple6",
"where": []}]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: table simple6: inserted row: name=first_row weak_ref=[] uuid=<0>
001: table simple6: updated columns: name weak_ref
-002: change conditions
+002: simple: change conditions
003: table simple6: name=first_row weak_ref=[<1>] uuid=<0>
003: table simple: inserted row: i=0 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
003: table simple: updated columns: s
@@ -1308,19 +1337,19 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, orphan rows, cond
{"op": "delete",
"table": "simple6",
"where": []}]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: table simple6: inserted row: name=first_row weak_ref=[] uuid=<0>
001: table simple6: updated columns: name weak_ref
-002: change conditions
+002: simple: change conditions
003: table simple6: name=first_row weak_ref=[<1>] uuid=<0>
003: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
003: table simple: updated columns: s
-004: change conditions
+004: simple: change conditions
005: table simple6: name=first_row weak_ref=[] uuid=<0>
005: table simple: deleted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
005: table simple: inserted row: i=0 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
005: table simple: updated columns: s
-006: change conditions
+006: simple: change conditions
007: table simple6: name=first_row weak_ref=[<1>] uuid=<0>
007: table simple: deleted row: i=0 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
007: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
@@ -1362,14 +1391,14 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, references, condi
{"op": "delete",
"table": "simple6",
"where": []}]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: table simple6: inserted row: name=first_row weak_ref=[] uuid=<0>
001: table simple6: updated columns: name weak_ref
-002: change conditions
+002: simple: change conditions
003: table simple6: name=first_row weak_ref=[<1>] uuid=<0>
003: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
003: table simple: updated columns: s
-004: change conditions
+004: simple: change conditions
005: table simple6: name=first_row weak_ref=[<3>] uuid=<0>
005: table simple: deleted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
005: table simple: inserted row: i=1 r=0 b=false s=row1_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<3>
@@ -1405,7 +1434,8 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, references, singl
{"op": "insert",
"table": "simple",
"row": {"s": "row0_s"}}]']],
- [[000: change conditions
+ [[000: simple6: conditions unchanged
+000: simple: conditions unchanged
001: table simple6: inserted row: name=row0_s6 weak_ref=[<0>] uuid=<1>
001: table simple6: updated columns: name weak_ref
001: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
@@ -1447,7 +1477,8 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, weak references,
{"op": "insert",
"table": "simple",
"row": {"s": "row0_s"}}]']],
- [[000: change conditions
+ [[000: simple6: conditions unchanged
+000: simple: conditions unchanged
001: table simple6: inserted row: name=row0_s6 weak_ref=[<0>] uuid=<1>
001: table simple6: updated columns: name weak_ref
001: table simple: inserted row: i=0 r=0 b=false s=row0_s u=<2> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<0>
@@ -1487,7 +1518,9 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, strong references
{"op": "insert",
"table": "simple",
"row": {"s": "row0_s"}}]']],
- [[000: change conditions
+ [[000: simple3: conditions unchanged
+000: simple4: conditions unchanged
+000: simple: conditions unchanged
001: table simple3: inserted row: name=row0_s3 uset=[] uref=[<0>] uuid=<1>
001: table simple3: updated columns: name uref
001: table simple4: inserted row: name=row0_s4 uuid=<0>
@@ -1522,12 +1555,14 @@ OVSDB_CHECK_IDL_TRACK([track, simple idl, initially populated, strong references
{"op": "insert",
"table": "simple",
"row": {"s": "row0_s"}}]']],
- [[000: change conditions
+ [[000: simple3: conditions unchanged
+000: simple4: conditions unchanged
+000: simple: conditions unchanged
001: table simple3: inserted row: name=row0_s3 uset=[] uref=[<0>] uuid=<1>
001: table simple3: updated columns: name uref
001: table simple4: inserted row: name=row0_s4 uuid=<0>
001: table simple4: updated columns: name
-002: change conditions
+002: simple4: change conditions
003: table simple3: name=row0_s3 uset=[] uref=[] uuid=<1>
003: table simple4: deleted row: name=row0_s4 uuid=<0>
004: {"error":null,"result":[{"uuid":["uuid","<2>"]}]}
@@ -1558,10 +1593,12 @@ OVSDB_CHECK_IDL([simple idl, initially populated, strong references, conditional
{"op": "insert",
"table": "simple",
"row": {"s": "row0_s"}}]']],
- [[000: change conditions
+ [[000: simple3: conditions unchanged
+000: simple4: conditions unchanged
+000: simple: conditions unchanged
001: table simple3: name=row0_s3 uset=[] uref=[<0>] uuid=<1>
001: table simple4: name=row0_s4 uuid=<0>
-002: change conditions
+002: simple4: change conditions
003: table simple3: name=row0_s3 uset=[] uref=[] uuid=<1>
004: {"error":null,"result":[{"uuid":["uuid","<2>"]}]}
005: table simple3: name=row0_s3 uset=[] uref=[] uuid=<1>
@@ -2370,11 +2407,11 @@ OVSDB_CHECK_CLUSTER_IDL([simple idl, monitor_cond_since, cluster disconnect],
"table": "simple",
"where": [["i", "==", 1]],
"row": {"r": 2.0 }}]']],
- [[000: change conditions
+ [[000: simple: change conditions
001: empty
-002: change conditions
+002: simple: change conditions
003: table simple: i=2 r=1 b=true s= u=<0> ia=[] ra=[] ba=[] sa=[] ua=[] uuid=<1>
-004: change conditions
+004: simple: change conditions
005: reconnect
006: table simple
007: {"error":null,"result":[{"count":1}]}
diff --git a/tests/test-ovsdb.c b/tests/test-ovsdb.c
index 84fe232765a..1bc5ac17a01 100644
--- a/tests/test-ovsdb.c
+++ b/tests/test-ovsdb.c
@@ -2627,11 +2627,12 @@ parse_link2_json_clause(struct ovsdb_idl_condition *cond,
}
}
-static void
-update_conditions(struct ovsdb_idl *idl, char *commands)
+static unsigned int
+update_conditions(struct ovsdb_idl *idl, char *commands, int step)
{
- char *cmd, *save_ptr1 = NULL;
const struct ovsdb_idl_table_class *tc;
+ unsigned int next_cond_seqno = 0;
+ char *cmd, *save_ptr1 = NULL;
for (cmd = strtok_r(commands, ";", &save_ptr1); cmd;
cmd = strtok_r(NULL, ";", &save_ptr1)) {
@@ -2682,15 +2683,20 @@ update_conditions(struct ovsdb_idl *idl, char *commands)
unsigned int seqno = ovsdb_idl_get_condition_seqno(idl);
unsigned int next_seqno = ovsdb_idl_set_condition(idl, tc, &cond);
if (seqno == next_seqno ) {
- ovs_fatal(0, "condition unchanged");
+ print_and_log("%03d: %s: conditions unchanged",
+ step, table_name);
+ } else {
+ print_and_log("%03d: %s: change conditions", step, table_name);
}
unsigned int new_next_seqno = ovsdb_idl_set_condition(idl, tc, &cond);
if (next_seqno != new_next_seqno) {
ovs_fatal(0, "condition expected seqno changed");
}
+ next_cond_seqno = MAX(next_cond_seqno, next_seqno);
ovsdb_idl_condition_destroy(&cond);
json_destroy(json);
}
+ return next_cond_seqno;
}
static void
@@ -2699,6 +2705,7 @@ do_idl(struct ovs_cmdl_context *ctx)
struct test_ovsdb_pvt_context *pvt = ctx->pvt;
struct jsonrpc *rpc;
struct ovsdb_idl *idl;
+ unsigned int next_cond_seqno = 0;
unsigned int seqno = 0;
struct ovsdb_symbol_table *symtab;
size_t n_uuids = 0;
@@ -2735,8 +2742,8 @@ do_idl(struct ovs_cmdl_context *ctx)
const char remote_s[] = "set-remote ";
const char cond_s[] = "condition ";
if (ctx->argc > 2 && strstr(ctx->argv[2], cond_s)) {
- update_conditions(idl, ctx->argv[2] + strlen(cond_s));
- print_and_log("%03d: change conditions", step++);
+ next_cond_seqno =
+ update_conditions(idl, ctx->argv[2] + strlen(cond_s), step++);
i = 3;
} else {
i = 2;
@@ -2755,6 +2762,21 @@ do_idl(struct ovs_cmdl_context *ctx)
if (*arg == '+') {
/* The previous transaction didn't change anything. */
arg++;
+ } else if (*arg == '^') {
+ /* Wait for condition change to be acked by the server. */
+ arg++;
+ for (;;) {
+ ovsdb_idl_run(idl);
+ ovsdb_idl_check_consistency(idl);
+ if (ovsdb_idl_get_condition_seqno(idl) == next_cond_seqno) {
+ break;
+ }
+ jsonrpc_run(rpc);
+
+ ovsdb_idl_wait(idl);
+ jsonrpc_wait(rpc);
+ poll_block();
+ }
} else {
/* Wait for update. */
for (;;) {
@@ -2789,8 +2811,8 @@ do_idl(struct ovs_cmdl_context *ctx)
arg + strlen(remote_s),
ovsdb_idl_is_connected(idl) ? "true" : "false");
} else if (!strncmp(arg, cond_s, strlen(cond_s))) {
- update_conditions(idl, arg + strlen(cond_s));
- print_and_log("%03d: change conditions", step++);
+ next_cond_seqno = update_conditions(idl, arg + strlen(cond_s),
+ step++);
} else if (arg[0] != '[') {
if (!idl_set(idl, arg, step++)) {
/* If idl_set() returns false, then no transaction
diff --git a/tests/test-ovsdb.py b/tests/test-ovsdb.py
index cca1818ea3a..a841adba4e1 100644
--- a/tests/test-ovsdb.py
+++ b/tests/test-ovsdb.py
@@ -626,7 +626,8 @@ def notify(event, row, updates=None):
return status != ovs.db.idl.Transaction.ERROR
-def update_condition(idl, commands):
+def update_condition(idl, commands, step):
+ next_cond_seqno = 0
commands = commands[len("condition "):].split(";")
for command in commands:
command = command.split(" ")
@@ -637,7 +638,20 @@ def update_condition(idl, commands):
table = command[0]
cond = ovs.json.from_string(command[1])
- idl.cond_change(table, cond)
+ next_seqno = idl.cond_change(table, cond)
+ if idl.cond_seqno == next_seqno:
+ sys.stdout.write("%03d: %s: conditions unchanged\n" %
+ (step, table))
+ else:
+ sys.stdout.write("%03d: %s: change conditions\n" %
+ (step, table))
+ sys.stdout.flush()
+
+ assert next_seqno == idl.cond_change(table, cond), \
+ "condition expected seqno changed"
+ next_cond_seqno = max(next_cond_seqno, next_seqno)
+
+ return next_cond_seqno
def do_idl(schema_file, remote, *commands):
@@ -694,6 +708,7 @@ def do_idl(schema_file, remote, *commands):
else:
rpc = None
+ next_cond_seqno = 0
symtab = {}
seqno = 0
step = 0
@@ -717,9 +732,7 @@ def mock_notify(event, row, updates=None):
commands = list(commands)
if len(commands) >= 1 and "condition" in commands[0]:
- update_condition(idl, commands.pop(0))
- sys.stdout.write("%03d: change conditions\n" % step)
- sys.stdout.flush()
+ next_cond_seqno = update_condition(idl, commands.pop(0), step)
step += 1
for command in commands:
@@ -732,6 +745,16 @@ def mock_notify(event, row, updates=None):
if command.startswith("+"):
# The previous transaction didn't change anything.
command = command[1:]
+ elif command.startswith("^"):
+ # Wait for condition change to be acked by the server.
+ command = command[1:]
+ while idl.cond_seqno != next_cond_seqno and not idl.run():
+ rpc.run()
+
+ poller = ovs.poller.Poller()
+ idl.wait(poller)
+ rpc.wait(poller)
+ poller.block()
else:
# Wait for update.
while idl.change_seqno == seqno and not idl.run():
@@ -753,9 +776,7 @@ def mock_notify(event, row, updates=None):
step += 1
idl.force_reconnect()
elif "condition" in command:
- update_condition(idl, command)
- sys.stdout.write("%03d: change conditions\n" % step)
- sys.stdout.flush()
+ next_cond_seqno = update_condition(idl, command, step)
step += 1
elif not command.startswith("["):
if not idl_set(idl, command, step):
From 69e71bf791c89690e38afe3b7012066e5d64a129 Mon Sep 17 00:00:00 2001
From: Emma Finn
Date: Thu, 8 Dec 2022 15:59:35 +0000
Subject: [PATCH 067/833] odp-execute: Add check for L4 header size.
This patch adds a check for the L4 header size to the avx512
implementation of the ipv4 action.
Fixes: 92eb03f7b03a ("odp-execute: Add ISA implementation of set_masked IPv4 action")
Signed-off-by: Emma Finn
Signed-off-by: Ilya Maximets
---
lib/odp-execute-avx512.c | 6 ++++--
1 file changed, 4 insertions(+), 2 deletions(-)
diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 66b3998dabd..5207ece15d9 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -453,8 +453,9 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch,
uint16_t delta_checksum = avx512_ipv4_addr_csum_delta(v_packet,
v_new_hdr);
+ size_t l4_size = dp_packet_l4_size(packet);
- if (nh->ip_proto == IPPROTO_UDP) {
+ if (nh->ip_proto == IPPROTO_UDP && l4_size >= UDP_HEADER_LEN) {
/* New UDP checksum. */
struct udp_header *uh = dp_packet_l4(packet);
if (uh->udp_csum) {
@@ -468,7 +469,8 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch,
/* Insert new udp checksum. */
uh->udp_csum = udp_checksum;
}
- } else if (nh->ip_proto == IPPROTO_TCP) {
+ } else if (nh->ip_proto == IPPROTO_TCP &&
+ l4_size >= TCP_HEADER_LEN) {
/* New TCP checksum. */
struct tcp_header *th = dp_packet_l4(packet);
uint16_t old_tcp_checksum = ~th->tcp_csum;
From 1ea0fa4ad7dc2dbfdb1f221eff97efbf3e1af894 Mon Sep 17 00:00:00 2001
From: Timothy Redaelli
Date: Fri, 16 Dec 2022 16:29:46 +0100
Subject: [PATCH 068/833] rhel: Avoid creating an empty database file.
In 59e8cb8a053d ("rhel: Move conf.db to /var/lib/openvswitch, using symlinks.")
conf.db is created as an empty file in /var/lib/openvswitch if it
doesn't exist, but this prevents ovsdb-server from starting.
This commit changes the previous behaviour: instead of creating the
file, the owner of /var/lib/openvswitch is set to openvswitch:hugetlbfs,
if built with dpdk, or to openvswitch:openvswitch otherwise.
Fixes: 59e8cb8a053d ("rhel: Move conf.db to /var/lib/openvswitch, using symlinks.")
Reported-at: https://mail.openvswitch.org/pipermail/ovs-dev/2022-December/400045.html
Reported-by: Roi Dayan
Signed-off-by: Timothy Redaelli
Signed-off-by: Ilya Maximets
---
rhel/openvswitch-fedora.spec.in | 12 +++++-------
1 file changed, 5 insertions(+), 7 deletions(-)
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 4a3e6294bfb..17aab796fca 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -339,12 +339,6 @@ for base in conf.db .conf.db.~lock~; do
if test ! -e $old && test ! -h $old; then
ln -s $new $old
fi
- touch $new
-%if %{with dpdk}
- chown openvswitch:hugetlbfs $new
-%else
- chown openvswitch:openvswitch $new
-%endif
done
%if 0%{?systemd_post:1}
@@ -505,7 +499,11 @@ fi
%{_prefix}/lib/udev/rules.d/91-vfio.rules
%endif
%doc NOTICE README.rst NEWS rhel/README.RHEL.rst
-/var/lib/openvswitch
+%if %{with dpdk}
+%attr(750,openvswitch,hugetlbfs) /var/lib/openvswitch
+%else
+%attr(750,openvswitch,openvswitch) /var/lib/openvswitch
+%endif
%attr(750,root,root) /var/log/openvswitch
%ghost %attr(755,root,root) %{_rundir}/openvswitch
%ghost %attr(644,root,root) %{_rundir}/openvswitch.useropts
From bf8fa1fe414e92f8386ca2b7745822ced63385ee Mon Sep 17 00:00:00 2001
From: David Marchand
Date: Thu, 8 Dec 2022 09:06:58 +0100
Subject: [PATCH 069/833] dpdk: Fix typo in v22.11.1 tarball extract example.
There was a small typo that slipped in when updating to the v22.11.1 tag.
Fixes: a77c7796f23a ("dpdk: Update to use v22.11.1.")
Signed-off-by: David Marchand
Signed-off-by: Ilya Maximets
---
Documentation/intro/install/dpdk.rst | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/Documentation/intro/install/dpdk.rst b/Documentation/intro/install/dpdk.rst
index e360ee83ddc..63a0ebb23bb 100644
--- a/Documentation/intro/install/dpdk.rst
+++ b/Documentation/intro/install/dpdk.rst
@@ -74,7 +74,7 @@ Install DPDK
$ cd /usr/src/
$ wget https://fast.dpdk.org/rel/dpdk-22.11.1.tar.xz
- $ tar xf dpdk-22.11.tar.xz
+ $ tar xf dpdk-22.11.1.tar.xz
$ export DPDK_DIR=/usr/src/dpdk-stable-22.11.1
$ cd $DPDK_DIR
From 79e7756a5d9e10c18343096187744f95a793ccf8 Mon Sep 17 00:00:00 2001
From: Eelco Chaudron
Date: Wed, 7 Dec 2022 17:26:39 +0100
Subject: [PATCH 070/833] utilities: Add a GDB macro to dump hmap structures.
Add a new GDB macro called ovs_dump_hmap, which can be used to dump any
hmap structure. For example:
(gdb) ovs_dump_hmap "&'all_bridges.lto_priv.0'" "struct bridge" "node"
(struct bridge *) 0x55ec43069c70
(struct bridge *) 0x55ec430428a0
(struct bridge *) 0x55ec430a55f0
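For context, a small sketch (illustrative types, not OVS code) of the
layout the macro walks: a structure embedding a struct hmap_node, which
in-process code would traverse with HMAP_FOR_EACH while GDB walks the
same buckets via ovs_dump_hmap:

    #include "openvswitch/hmap.h"

    struct item {
        struct hmap_node node;      /* Embedded in some 'items' hmap. */
        int id;
    };

    static int
    count_items(const struct hmap *items)
    {
        const struct item *it;
        int n = 0;

        HMAP_FOR_EACH (it, node, items) {
            /* 'it' is the containing object, i.e. the pointer that
             * ovs_dump_hmap prints for each node. */
            n++;
        }
        return n;
    }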
Signed-off-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
utilities/gdb/ovs_gdb.py | 53 +++++++++++++++++++++++++++++++++++++++-
1 file changed, 52 insertions(+), 1 deletion(-)
diff --git a/utilities/gdb/ovs_gdb.py b/utilities/gdb/ovs_gdb.py
index 7f63dd0d592..982395dd1d2 100644
--- a/utilities/gdb/ovs_gdb.py
+++ b/utilities/gdb/ovs_gdb.py
@@ -30,6 +30,8 @@
# - ovs_dump_netdev_provider
# - ovs_dump_ovs_list {[] [] {dump}]}
# - ovs_dump_packets [tcpdump options]
+# - ovs_dump_cmap {[] [] {dump}]}
+# - ovs_dump_hmap {dump}
# - ovs_dump_simap
# - ovs_dump_smap
# - ovs_dump_udpif_keys {|} {short}
@@ -876,7 +878,7 @@ class CmdDumpCmap(gdb.Command):
"""
def __init__(self):
super(CmdDumpCmap, self).__init__("ovs_dump_cmap",
- gdb.COMMAND_DATA)
+ gdb.COMMAND_DATA)
def invoke(self, arg, from_tty):
arg_list = gdb.string_to_argv(arg)
@@ -914,6 +916,54 @@ def invoke(self, arg, from_tty):
member).dereference()))
+#
+# Implements the GDB "ovs_dump_hmap" command
+#
+class CmdDumpHmap(gdb.Command):
+ """Dump all nodes of a given hmap
+ Usage:
+ ovs_dump_hmap {dump}
+
+ For example dump all the bridges when the all_bridges variable is
+ optimized out due to LTO:
+
+ (gdb) ovs_dump_hmap "&'all_bridges.lto_priv.0'" "struct bridge" "node"
+ (struct bridge *) 0x55ec43069c70
+ (struct bridge *) 0x55ec430428a0
+ (struct bridge *) 0x55ec430a55f0
+
+ The 'dump' option will also include the full structure content in the
+ output.
+ """
+ def __init__(self):
+ super(CmdDumpHmap, self).__init__("ovs_dump_hmap",
+ gdb.COMMAND_DATA)
+
+ def invoke(self, arg, from_tty):
+ arg_list = gdb.string_to_argv(arg)
+ typeobj = None
+ member = None
+ dump = False
+
+ if len(arg_list) != 3 and len(arg_list) != 4:
+ print("usage: ovs_dump_hmap "
+ " {dump}")
+ return
+
+ hmap = gdb.parse_and_eval(arg_list[0]).cast(
+ gdb.lookup_type('struct hmap').pointer())
+
+ typeobj = arg_list[1]
+ member = arg_list[2]
+ if len(arg_list) == 4 and arg_list[3] == "dump":
+ dump = True
+
+ for node in ForEachHMAP(hmap.dereference(), typeobj, member):
+ print("({} *) {} {}".format(typeobj, node, "=" if dump else ""))
+ if dump:
+ print(" {}\n".format(node.dereference()))
+
+
#
# Implements the GDB "ovs_dump_simap" command
#
@@ -1515,6 +1565,7 @@ def extract_pkt(self, pkt):
CmdDumpOvsList()
CmdDumpPackets()
CmdDumpCmap()
+CmdDumpHmap()
CmdDumpSimap()
CmdDumpSmap()
CmdDumpUdpifKeys()
From c82f496c3b69a036432af7c79adbc00545621ed1 Mon Sep 17 00:00:00 2001
From: Eelco Chaudron
Date: Mon, 28 Nov 2022 09:53:30 +0100
Subject: [PATCH 071/833] dpif-netdev: Use unmasked key when adding datapath
flows.
The datapath supports installing wider flows, and OVS relies on
this behavior. For example if ipv4(src=1.1.1.1/192.0.0.0,
dst=1.1.1.2/192.0.0.0) exists, a wider flow (smaller mask) of
ipv4(src=192.1.1.1/128.0.0.0,dst=192.1.1.2/128.0.0.0) is allowed
to be added.
However, if we try to add a wildcard rule, the installation fails:
# ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
ipv4(src=1.1.1.1/192.0.0.0,dst=1.1.1.2/192.0.0.0,frag=no)" 2
# ovs-appctl dpctl/add-flow system@myDP "in_port(1),eth_type(0x0800), \
ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.2/0.0.0.0,frag=no)" 2
ovs-vswitchd: updating flow table (File exists)
The reason is that the key used to determine if the flow is already
present in the system uses the original key ANDed with the mask.
This results in the IP address not being part of the (miniflow) key,
i.e., being substituted with an all-zero value. When doing the actual
lookup, this results in the key wrongfully matching the first flow,
and therefore the flow does not get installed. The solution is to use
the unmasked key for the existence check, the same way this is handled
in the "slow" dpif_flow_put() case.
OVS relies on the fact that overlapping flows can exist if one is a
superset of the other. Note that this is only true when the same set
of actions is applied. This is due to how the revalidator process
works. During revalidation, OVS removes too generic flows from the
datapath to avoid incorrect matches but allows too narrow flows to
stay in the datapath to avoid the data plane disruption and also to
avoid constant flow deletions if the datapath ignores wildcards on
certain fields/bits. See flow_wildcards_has_extra() check in the
revalidate_ukey__() function.
The problem here is that we have a too narrow flow installed, and now
OpenFlow rules got changed, so the actual flow should be more generic.
Revalidators will not remove the narrow flow, and we will eventually get
an upcall on the packet that doesn't match the narrow flow, but we will
not be able to install a more generic flow because after masking with
the new wider mask, the key matches on the narrow flow, so we get EEXIST.
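To make the masking issue concrete, a standalone toy example (plain C,
not OVS code) showing that the two source addresses from the commands
above become indistinguishable once ANDed with their masks:

    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int
    main(void)
    {
        uint32_t src_a  = 0x01010101;   /* 1.1.1.1   */
        uint32_t mask_a = 0xc0000000;   /* 192.0.0.0 */
        uint32_t src_b  = 0xc0010101;   /* 192.1.1.1 */
        uint32_t mask_b = 0x00000000;   /* 0.0.0.0   */

        /* Both print 00000000: a lookup keyed only on masked values
         * cannot tell these flows apart, hence the false EEXIST. */
        printf("%08" PRIx32 " %08" PRIx32 "\n",
               src_a & mask_a, src_b & mask_b);
        return 0;
    }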
Fixes: beb75a40fdc2 ("userspace: Switching of L3 packets in L2 pipeline")
Signed-off-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
lib/dpif-netdev.c | 33 +++++++++++++++++++++++++++++----
tests/dpif-netdev.at | 14 ++++++++++++++
2 files changed, 43 insertions(+), 4 deletions(-)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 2c08a71c8db..9331f2cbac6 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -3320,6 +3320,28 @@ netdev_flow_key_init_masked(struct netdev_flow_key *dst,
(dst_u64 - miniflow_get_values(&dst->mf)) * 8);
}
+/* Initializes 'key' as a copy of 'flow'. */
+static inline void
+netdev_flow_key_init(struct netdev_flow_key *key,
+ const struct flow *flow)
+{
+ uint64_t *dst = miniflow_values(&key->mf);
+ uint32_t hash = 0;
+ uint64_t value;
+
+ miniflow_map_init(&key->mf, flow);
+ miniflow_init(&key->mf, flow);
+
+ size_t n = dst - miniflow_get_values(&key->mf);
+
+ FLOW_FOR_EACH_IN_MAPS (value, flow, key->mf.map) {
+ hash = hash_add64(hash, value);
+ }
+
+ key->hash = hash_finish(hash, n * 8);
+ key->len = netdev_flow_key_size(n);
+}
+
static inline void
emc_change_entry(struct emc_entry *ce, struct dp_netdev_flow *flow,
const struct netdev_flow_key *key)
@@ -4194,7 +4216,7 @@ static int
dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put)
{
struct dp_netdev *dp = get_dp_netdev(dpif);
- struct netdev_flow_key key, mask;
+ struct netdev_flow_key key;
struct dp_netdev_pmd_thread *pmd;
struct match match;
ovs_u128 ufid;
@@ -4243,9 +4265,12 @@ dpif_netdev_flow_put(struct dpif *dpif, const struct dpif_flow_put *put)
/* Must produce a netdev_flow_key for lookup.
* Use the same method as employed to create the key when adding
- * the flow to the dplcs to make sure they match. */
- netdev_flow_mask_init(&mask, &match);
- netdev_flow_key_init_masked(&key, &match.flow, &mask);
+ * the flow to the dplcs to make sure they match.
+ * We need to put in the unmasked key as flow_put_on_pmd() will first try
+ * to see if an entry exists doing a packet type lookup. As masked-out
+ * fields are interpreted as zeros, they could falsely match a wider IP
+ * address mask. Installation of the flow will use the match variable. */
+ netdev_flow_key_init(&key, &match.flow);
if (put->pmd_id == PMD_ID_NULL) {
if (cmap_count(&dp->poll_threads) == 0) {
diff --git a/tests/dpif-netdev.at b/tests/dpif-netdev.at
index 6aff1eda7b0..9af70a68d75 100644
--- a/tests/dpif-netdev.at
+++ b/tests/dpif-netdev.at
@@ -636,6 +636,20 @@ OVS_VSWITCHD_STOP(["/flow: in_port is not an exact match/d
/failed to put/d"])
AT_CLEANUP
+AT_SETUP([dpif-netdev - check dpctl/add-flow wider ip match])
+OVS_VSWITCHD_START(
+ [add-port br0 p1 \
+ -- set interface p1 type=dummy options:pstream=punix:$OVS_RUNDIR/p0.sock \
+ -- set bridge br0 datapath-type=dummy])
+
+AT_CHECK([ovs-appctl revalidator/pause])
+AT_CHECK([ovs-appctl dpctl/add-flow "in_port(1),eth_type(0x0800),ipv4(src=0.0.0.0/192.0.0.0,dst=0.0.0.0/192.0.0.0,frag=no)" "3"])
+AT_CHECK([ovs-appctl dpctl/add-flow "in_port(1),eth_type(0x0800),ipv4(src=192.1.1.1/0.0.0.0,dst=49.1.1.1/0.0.0.0,frag=no)" "3"])
+AT_CHECK([ovs-appctl revalidator/resume])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
# SEND_UDP_PKTS([p_name], [p_ofport])
#
# Sends 128 packets to port 'p_name' with different UDP destination ports.
From d34245ea150a7ae4dbae9e7fc37e3adfcbbf0bc6 Mon Sep 17 00:00:00 2001
From: Mike Pattrick
Date: Mon, 19 Dec 2022 08:38:38 -0500
Subject: [PATCH 072/833] ovs-ctl: Allow inclusion of hugepages in coredumps.
Add a new --dump-hugepages option to ovs-ctl to enable the inclusion
of hugepages in the core dump filter.
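For reference, a hedged sketch of what the option does under the hood
(semantics per the Linux core(5) man page, not code from this patch):
writing 0x7f to /proc/self/coredump_filter sets bits 0-6, which covers
anonymous and file-backed mappings, ELF headers, and both private and
shared hugetlb pages:

    #include <stdio.h>

    /* Returns 0 on success, -1 on failure. */
    static int
    enable_hugepage_coredumps(void)
    {
        FILE *f = fopen("/proc/self/coredump_filter", "w");

        if (!f) {
            return -1;
        }
        fputs("0x7f\n", f);
        return fclose(f) ? -1 : 0;
    }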
Reviewed-by: David Marchand
Signed-off-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
NEWS | 4 ++++
utilities/ovs-ctl.in | 15 +++++++++++----
2 files changed, 15 insertions(+), 4 deletions(-)
diff --git a/NEWS b/NEWS
index 265375e1cb8..95d82632f25 100644
--- a/NEWS
+++ b/NEWS
@@ -14,6 +14,10 @@ Post-v3.0.0
10 Gbps link speed by default in case the actual link speed cannot be
determined. Previously it was 10 Mbps. Values can still be overridden
by specifying 'max-rate' or '[r]stp-path-cost' accordingly.
+ - ovs-ctl:
+ * New option '--dump-hugepages' to include hugepages in core dumps. This
+ can assist with postmortem analysis involving DPDK, but may also produce
+ significantly larger core dump files.
v3.0.0 - 15 Aug 2022
diff --git a/utilities/ovs-ctl.in b/utilities/ovs-ctl.in
index eba9512fe8b..d9155258868 100644
--- a/utilities/ovs-ctl.in
+++ b/utilities/ovs-ctl.in
@@ -103,8 +103,13 @@ set_system_ids () {
action "Configuring Open vSwitch system IDs" "$@" $extra_ids
}
-check_force_cores () {
- if test X"$FORCE_COREFILES" = Xyes; then
+check_core_config () {
+ if test X"$DUMP_HUGEPAGES" = Xyes; then
+ echo 0x7f > /proc/self/coredump_filter
+ if test X"$FORCE_COREFILES" = Xyes; then
+ ulimit -c unlimited
+ fi
+ elif test X"$FORCE_COREFILES" = Xyes; then
ulimit -c 67108864
fi
}
@@ -116,7 +121,7 @@ del_transient_ports () {
}
do_start_ovsdb () {
- check_force_cores
+ check_core_config
if daemon_is_running ovsdb-server; then
log_success_msg "ovsdb-server is already running"
@@ -193,7 +198,7 @@ add_managers () {
}
do_start_forwarding () {
- check_force_cores
+ check_core_config
insert_mod_if_required || return 1
@@ -330,6 +335,7 @@ set_defaults () {
DAEMON_CWD=/
FORCE_COREFILES=yes
+ DUMP_HUGEPAGES=no
MLOCKALL=yes
SELF_CONFINEMENT=yes
MONITOR=yes
@@ -419,6 +425,7 @@ Other important options for "start", "restart" and "force-reload-kmod":
Less important options for "start", "restart" and "force-reload-kmod":
--daemon-cwd=DIR set working dir for OVS daemons (default: $DAEMON_CWD)
--no-force-corefiles do not force on core dumps for OVS daemons
+ --dump-hugepages include hugepages in core dumps
--no-mlockall do not lock all of ovs-vswitchd into memory
--ovsdb-server-priority=NICE set ovsdb-server's niceness (default: $OVSDB_SERVER_PRIORITY)
--ovsdb-server-options=OPTIONS additional options for ovsdb-server (example: '-vconsole:dbg -vfile:dbg')
From 0d23948a598ac609e9865174e0874e782a48d6a8 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 19:29:06 +0100
Subject: [PATCH 073/833] ovs-thread: Detect changes in number of CPUs.
Currently, things like the number of handler and revalidator threads are
calculated based on the number of available CPUs. However, this number
is considered static and only calculated once, hence ignoring events
such as CPUs being hotplugged, switched on/off, or the affinity mask
changing.
On the other hand, checking the number of available CPUs multiple times
per second seems like overkill.
Affinity should not change that often and, even if it does, the impact
of destroying and recreating all the threads so often is probably too
expensive a price to pay.
I tested the impact of updating the threads every 5 seconds and saw
an impact in the main loop duration of <1% and a worst-case scenario
impact in throughput of < 5% [1]. This patch sets the default period to
10 seconds just to be safer.
[1] Tested in the worst-case scenario of disabling the kernel cache
(other_config:flow-size=0), modifying ovs-vswitchd's affinity so that
the number of handlers goes up and down every 5 seconds, and
calculating the difference in netperf's ops/sec.
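For illustration, a minimal standalone Linux-only sketch (not the
patch's code) of deriving the number of CPUs currently available to the
process from its affinity mask:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int
    main(void)
    {
        long n = sysconf(_SC_NPROCESSORS_ONLN);

        if (n > 0) {
            cpu_set_t *set = CPU_ALLOC(n);
            size_t size = CPU_ALLOC_SIZE(n);

            if (set && !sched_getaffinity(0, size, set)) {
                /* Counts only CPUs the scheduler may run us on. */
                printf("available cpus: %d\n", CPU_COUNT_S(size, set));
            }
            CPU_FREE(set);
        }
        return 0;
    }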
Signed-off-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
NEWS | 2 ++
lib/ovs-thread.c | 67 +++++++++++++++++++++++++++++++-----------------
2 files changed, 45 insertions(+), 24 deletions(-)
diff --git a/NEWS b/NEWS
index 95d82632f25..c79d9f97dc4 100644
--- a/NEWS
+++ b/NEWS
@@ -1,5 +1,7 @@
Post-v3.0.0
--------------------
+ - ovs-vswitchd now detects changes in CPU affinity and adjusts the number
+ of handler and revalidator threads if necessary.
- ovs-appctl:
* "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
diff --git a/lib/ovs-thread.c b/lib/ovs-thread.c
index 78ed3e9707e..2d382f1e8bc 100644
--- a/lib/ovs-thread.c
+++ b/lib/ovs-thread.c
@@ -31,6 +31,7 @@
#include "openvswitch/poll-loop.h"
#include "seq.h"
#include "socket-util.h"
+#include "timeval.h"
#include "util.h"
#ifdef __CHECKER__
@@ -627,42 +628,60 @@ ovs_thread_stats_next_bucket(const struct ovsthread_stats *stats, size_t i)
}
-/* Returns the total number of cores available to this process, or 0 if the
- * number cannot be determined. */
-int
-count_cpu_cores(void)
+static int
+count_cpu_cores__(void)
{
- static struct ovsthread_once once = OVSTHREAD_ONCE_INITIALIZER;
- static long int n_cores;
+ long int n_cores;
- if (ovsthread_once_start(&once)) {
#ifndef _WIN32
- n_cores = sysconf(_SC_NPROCESSORS_ONLN);
+ n_cores = sysconf(_SC_NPROCESSORS_ONLN);
+#else
+ SYSTEM_INFO sysinfo;
+ GetSystemInfo(&sysinfo);
+ n_cores = sysinfo.dwNumberOfProcessors;
+#endif
#ifdef __linux__
- if (n_cores > 0) {
- cpu_set_t *set = CPU_ALLOC(n_cores);
+ if (n_cores > 0) {
+ cpu_set_t *set = CPU_ALLOC(n_cores);
- if (set) {
- size_t size = CPU_ALLOC_SIZE(n_cores);
+ if (set) {
+ size_t size = CPU_ALLOC_SIZE(n_cores);
- if (!sched_getaffinity(0, size, set)) {
- n_cores = CPU_COUNT_S(size, set);
- }
- CPU_FREE(set);
+ if (!sched_getaffinity(0, size, set)) {
+ n_cores = CPU_COUNT_S(size, set);
}
+ CPU_FREE(set);
}
-#endif
-#else
- SYSTEM_INFO sysinfo;
- GetSystemInfo(&sysinfo);
- n_cores = sysinfo.dwNumberOfProcessors;
-#endif
- ovsthread_once_done(&once);
}
-
+#endif
return n_cores > 0 ? n_cores : 0;
}
+/* It's unlikely that the available cpus change several times per second and
+ * even if it does, it's not needed (or desired) to react to such changes so
+ * quickly. */
+#define COUNT_CPU_UPDATE_TIME_MS 10000
+
+static struct ovs_mutex cpu_cores_mutex = OVS_MUTEX_INITIALIZER;
+
+/* Returns the current total number of cores available to this process, or 0
+ * if the number cannot be determined. */
+int
+count_cpu_cores(void)
+{
+ static long long int last_updated = 0;
+ long long int now = time_msec();
+ static int cpu_cores;
+
+ ovs_mutex_lock(&cpu_cores_mutex);
+ if (now - last_updated >= COUNT_CPU_UPDATE_TIME_MS) {
+ last_updated = now;
+ cpu_cores = count_cpu_cores__();
+ }
+ ovs_mutex_unlock(&cpu_cores_mutex);
+ return cpu_cores;
+}
+
/* Returns the total number of cores on the system, or 0 if the
* number cannot be determined. */
int
From 7490f281f09a8455c48e19b0cf1b99ab758ee4f4 Mon Sep 17 00:00:00 2001
From: Qian Chen
Date: Tue, 20 Dec 2022 09:36:08 -0500
Subject: [PATCH 074/833] lldp: Fix bugs when parsing malformed AutoAttach.
The OVS LLDP implementation includes support for the AutoAttach
standard, which the 'upstream' lldpd project does not include. As part
of adding this support, the message parsing for these TLVs did not
include proper length checks for the LLDP_TLV_AA_ELEMENT_SUBTYPE and
LLDP_TLV_AA_ISID_VLAN_ASGNS_SUBTYPE elements. The result is that a
message without a proper boundary causes an overread of memory and
leads to undefined results, including crashes or other unidentified
behavior.
The fix is to introduce proper bounds checking for these elements. A
unit test is also introduced to ensure that such malformed messages
keep being rejected in the future.
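As an illustration of the class of check that was missing (generic
sketch with made-up names, not the lldp.c code), a reader that refuses
to copy a fixed-size field when the remaining TLV payload is too small:

    #include <stdbool.h>
    #include <stddef.h>
    #include <string.h>

    static bool
    peek_bytes(const unsigned char **pos, size_t *left,
               void *dst, size_t n)
    {
        if (*left < n) {
            return false;           /* Malformed: TLV too short. */
        }
        memcpy(dst, *pos, n);
        *pos += n;
        *left -= n;
        return true;
    }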
Fixes: be53a5c447c3 ("auto-attach: Initial support for Auto-Attach standard")
Signed-off-by: Qian Chen
Co-authored-by: Aaron Conole
Signed-off-by: Aaron Conole
Signed-off-by: Ilya Maximets
---
lib/lldp/lldp.c | 2 ++
tests/ofproto-dpif.at | 19 +++++++++++++++++++
2 files changed, 21 insertions(+)
diff --git a/lib/lldp/lldp.c b/lib/lldp/lldp.c
index dfeb2a80024..6fdcfef5694 100644
--- a/lib/lldp/lldp.c
+++ b/lib/lldp/lldp.c
@@ -583,6 +583,7 @@ lldp_decode(struct lldpd *cfg OVS_UNUSED, char *frame, int s,
switch(tlv_subtype) {
case LLDP_TLV_AA_ELEMENT_SUBTYPE:
+ CHECK_TLV_SIZE(50, "ELEMENT");
PEEK_BYTES(&msg_auth_digest, sizeof msg_auth_digest);
aa_element_dword = PEEK_UINT32;
@@ -629,6 +630,7 @@ lldp_decode(struct lldpd *cfg OVS_UNUSED, char *frame, int s,
break;
case LLDP_TLV_AA_ISID_VLAN_ASGNS_SUBTYPE:
+ CHECK_TLV_SIZE(36, "ISID_VLAN_ASGNS");
PEEK_BYTES(&msg_auth_digest, sizeof msg_auth_digest);
/* Subtract off tlv type and length (2Bytes) + OUI (3B) +
diff --git a/tests/ofproto-dpif.at b/tests/ofproto-dpif.at
index eb4cd189609..fa6111c1ed2 100644
--- a/tests/ofproto-dpif.at
+++ b/tests/ofproto-dpif.at
@@ -62,6 +62,25 @@ AT_CHECK([ovs-appctl coverage/read-counter rev_reconfigure], [0], [dnl
OVS_VSWITCHD_STOP
AT_CLEANUP
+AT_SETUP([ofproto-dpif - malformed lldp autoattach tlv])
+OVS_VSWITCHD_START()
+add_of_ports br0 1
+
+dnl Enable lldp
+AT_CHECK([ovs-vsctl set interface p1 lldp:enable=true])
+
+dnl Send a malformed lldp packet
+packet="0180c200000ef6b426aa5f0088cc020704f6b426aa5f000403057632060200780c"dnl
+"5044454144424545464445414442454546444541444245454644454144424545464445414"dnl
+"4424545464445414442454546444541444245454644454144424545464445414442454546"dnl
+"4445414442454546fe0500040d0c010000"
+AT_CHECK([ovs-appctl netdev-dummy/receive p1 "$packet"], [0], [stdout])
+
+OVS_WAIT_UNTIL([grep -q "ISID_VLAN_ASGNS TLV too short" ovs-vswitchd.log])
+
+OVS_VSWITCHD_STOP(["/|WARN|ISID_VLAN_ASGNS TLV too short received on/d"])
+AT_CLEANUP
+
AT_SETUP([ofproto-dpif - active-backup bonding (with primary)])
dnl Create br0 with members p1, p2 and p7, creating bond0 with p1 and
From c1daeb4b41c48635032039cc556412c836d47c5d Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Tue, 20 Dec 2022 18:02:01 +0100
Subject: [PATCH 075/833] AUTHORS: Add Qian Chen.
Signed-off-by: Ilya Maximets
---
AUTHORS.rst | 3 ++-
1 file changed, 2 insertions(+), 1 deletion(-)
diff --git a/AUTHORS.rst b/AUTHORS.rst
index 7bb4e41a05d..2df76c56f11 100644
--- a/AUTHORS.rst
+++ b/AUTHORS.rst
@@ -350,8 +350,9 @@ Pim van den Berg pim@nethuis.nl
pritesh pritesh.kothari@cisco.com
Pravin B Shelar pshelar@ovn.org
Przemyslaw Szczerbik przemyslawx.szczerbik@intel.com
-Quentin Monnet quentin.monnet@6wind.com
+Qian Chen cq674350529@163.com
Qiuyu Xiao qiuyu.xiao.qyx@gmail.com
+Quentin Monnet quentin.monnet@6wind.com
Raju Subramanian
Rami Rosen ramirose@gmail.com
Ramu Ramamurthy ramu.ramamurthy@us.ibm.com
From a879beb4dbeed0376f12627cb7c6f71ba81bdb9e Mon Sep 17 00:00:00 2001
From: Emma Finn
Date: Thu, 8 Dec 2022 16:01:23 +0000
Subject: [PATCH 076/833] odp-execute: Add ISA implementation of set_masked
IPv6 action
This commit adds an AVX512 implementation of the ipv6_set_addrs action
as well as an AVX512 implementation of the corresponding L4 checksum
update.
Here are some relative performance numbers for this patch:
+-----------------------------+----------------+
| Actions | AVX with patch |
+-----------------------------+----------------+
| ipv6_src | 1.14x |
+-----------------------------+----------------+
| ipv6_src + ipv6_dst | 1.40x |
+-----------------------------+----------------+
| ipv6_label | 1.14x |
+-----------------------------+----------------+
| mod_ipv6 4 x field | 1.43x |
+-----------------------------+----------------+
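As a side note, a scalar sketch of the incremental L4 checksum update that the
vectorized code performs is shown below; it only illustrates the arithmetic
(undo the one's complement, add the address delta, fold the carries,
re-complement). It is not the OVS csum API, and apply_csum_delta() is a
hypothetical name.
    #include <stdint.h>
    /* Hedged scalar sketch: 'old_csum' is the checksum currently in the
     * packet, 'delta' is the folded 16-bit difference between the old and
     * new IPv6 addresses (what avx512_ipv6_addr_csum_delta() computes). */
    static uint16_t
    apply_csum_delta(uint16_t old_csum, uint16_t delta)
    {
        uint32_t sum = (uint16_t) ~old_csum;  /* Undo the one's complement. */
        sum += delta;                         /* Add the address delta. */
        sum = (sum & 0xffff) + (sum >> 16);   /* Fold the carry. */
        sum = (sum & 0xffff) + (sum >> 16);   /* Fold any second carry. */
        return (uint16_t) ~sum;               /* Re-complement. */
    }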
Signed-off-by: Emma Finn
Acked-by: Eelco Chaudron
Signed-off-by: Ian Stokes
---
lib/odp-execute-avx512.c | 222 ++++++++++++++++++++++++++++++++++++++
lib/odp-execute-private.c | 14 +++
lib/odp-execute-private.h | 1 +
lib/packets.c | 2 +-
lib/packets.h | 2 +
5 files changed, 240 insertions(+), 1 deletion(-)
diff --git a/lib/odp-execute-avx512.c b/lib/odp-execute-avx512.c
index 5207ece15d9..c28461ec1a0 100644
--- a/lib/odp-execute-avx512.c
+++ b/lib/odp-execute-avx512.c
@@ -20,6 +20,9 @@
#include
#include
+#include
+#include
+#include
#include "csum.h"
#include "dp-packet.h"
@@ -28,6 +31,7 @@
#include "odp-execute-private.h"
#include "odp-netlink.h"
#include "openvswitch/vlog.h"
+#include "packets.h"
VLOG_DEFINE_THIS_MODULE(odp_execute_avx512);
@@ -75,6 +79,26 @@ BUILD_ASSERT_DECL(offsetof(struct ovs_key_ipv4, ipv4_tos) +
MEMBER_SIZEOF(struct ovs_key_ipv4, ipv4_tos) ==
offsetof(struct ovs_key_ipv4, ipv4_ttl));
+BUILD_ASSERT_DECL(offsetof(struct ovs_key_ipv6, ipv6_src) +
+ MEMBER_SIZEOF(struct ovs_key_ipv6, ipv6_src) ==
+ offsetof(struct ovs_key_ipv6, ipv6_dst));
+
+BUILD_ASSERT_DECL(offsetof(struct ovs_key_ipv6, ipv6_dst) +
+ MEMBER_SIZEOF(struct ovs_key_ipv6, ipv6_dst) ==
+ offsetof(struct ovs_key_ipv6, ipv6_label));
+
+BUILD_ASSERT_DECL(offsetof(struct ovs_key_ipv6, ipv6_label) +
+ MEMBER_SIZEOF(struct ovs_key_ipv6, ipv6_label) ==
+ offsetof(struct ovs_key_ipv6, ipv6_proto));
+
+BUILD_ASSERT_DECL(offsetof(struct ovs_key_ipv6, ipv6_proto) +
+ MEMBER_SIZEOF(struct ovs_key_ipv6, ipv6_proto) ==
+ offsetof(struct ovs_key_ipv6, ipv6_tclass));
+
+BUILD_ASSERT_DECL(offsetof(struct ovs_key_ipv6, ipv6_tclass) +
+ MEMBER_SIZEOF(struct ovs_key_ipv6, ipv6_tclass) ==
+ offsetof(struct ovs_key_ipv6, ipv6_hlimit));
+
/* Array of callback functions, one for each masked operation. */
odp_execute_action_cb impl_set_masked_funcs[__OVS_KEY_ATTR_MAX];
@@ -487,6 +511,198 @@ action_avx512_ipv4_set_addrs(struct dp_packet_batch *batch,
}
}
+#if HAVE_AVX512VBMI
+static inline uint16_t ALWAYS_INLINE
+__attribute__((__target__("avx512vbmi")))
+avx512_ipv6_sum_header(__m512i ip6_header)
+{
+ __m256i v_zeros = _mm256_setzero_si256();
+ __m512i v_shuf_src_dst = _mm512_setr_epi64(0x01, 0x02, 0x03, 0x04,
+ 0xFF, 0xFF, 0xFF, 0xFF);
+
+ /* Shuffle ip6 src and dst to beginning of register. */
+ __m512i v_ip6_hdr_shuf = _mm512_permutexvar_epi64(v_shuf_src_dst,
+ ip6_header);
+
+ /* Extract ip6 src and dst into smaller 256-bit wide register. */
+ __m256i v_ip6_src_dst = _mm512_extracti64x4_epi64(v_ip6_hdr_shuf, 0);
+
+ /* These two shuffle masks, v_swap16a and v_swap16b, are to shuffle the
+ * src and dst fields and add padding after each 16-bit value for the
+ * following carry over addition. */
+ __m256i v_swap16a = _mm256_setr_epi16(0x0100, 0xFFFF, 0x0302, 0xFFFF,
+ 0x0504, 0xFFFF, 0x0706, 0xFFFF,
+ 0x0100, 0xFFFF, 0x0302, 0xFFFF,
+ 0x0504, 0xFFFF, 0x0706, 0xFFFF);
+ __m256i v_swap16b = _mm256_setr_epi16(0x0908, 0xFFFF, 0x0B0A, 0xFFFF,
+ 0x0D0C, 0xFFFF, 0x0F0E, 0xFFFF,
+ 0x0908, 0xFFFF, 0x0B0A, 0xFFFF,
+ 0x0D0C, 0xFFFF, 0x0F0E, 0xFFFF);
+ __m256i v_shuf_old1 = _mm256_shuffle_epi8(v_ip6_src_dst, v_swap16a);
+ __m256i v_shuf_old2 = _mm256_shuffle_epi8(v_ip6_src_dst, v_swap16b);
+
+ /* Add each part of the old and new headers together. */
+ __m256i v_delta = _mm256_add_epi32(v_shuf_old1, v_shuf_old2);
+
+ /* Perform horizontal add to go from 8x32-bits to 2x32-bits. */
+ v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
+ v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
+
+ /* Shuffle 32-bit value from 3rd lane into first lane for final
+ * horizontal add. */
+ __m256i v_swap32a = _mm256_setr_epi32(0x0, 0x4, 0xF, 0xF,
+ 0xF, 0xF, 0xF, 0xF);
+
+ v_delta = _mm256_permutexvar_epi32(v_swap32a, v_delta);
+ v_delta = _mm256_hadd_epi32(v_delta, v_zeros);
+ v_delta = _mm256_hadd_epi16(v_delta, v_zeros);
+
+ /* Extract delta value. */
+ return _mm256_extract_epi16(v_delta, 0);
+}
+
+static inline uint16_t ALWAYS_INLINE
+__attribute__((__target__("avx512vbmi")))
+avx512_ipv6_addr_csum_delta(__m512i old_header, __m512i new_header)
+{
+ uint16_t old_delta = avx512_ipv6_sum_header(old_header);
+ uint16_t new_delta = avx512_ipv6_sum_header(new_header);
+ uint32_t csum_delta = ((uint16_t) ~old_delta) + new_delta;
+
+ return ~csum_finish(csum_delta);
+}
+
+/* This function performs the same operation on each packet in the batch as
+ * the scalar odp_set_ipv6() function. */
+static void
+__attribute__((__target__("avx512vbmi")))
+action_avx512_set_ipv6(struct dp_packet_batch *batch, const struct nlattr *a)
+{
+ const struct ovs_key_ipv6 *key, *mask;
+ struct dp_packet *packet;
+
+ a = nl_attr_get(a);
+ key = nl_attr_get(a);
+ mask = odp_get_key_mask(a, struct ovs_key_ipv6);
+
+ /* Read the content of the key and mask in the respective registers. We
+ * only load the size of the actual structure, which is only 40 bytes. */
+ __m512i v_key = _mm512_maskz_loadu_epi64(0x1F, (void *) key);
+ __m512i v_mask = _mm512_maskz_loadu_epi64(0x1F, (void *) mask);
+
+ /* This shuffle mask v_shuffle, is to shuffle key and mask to match the
+ * ip6_hdr structure layout. */
+ static const uint8_t ip_shuffle_mask[64] = {
+ 0x20, 0x21, 0x22, 0x23, 0xFF, 0xFF, 0x24, 0x26,
+ 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
+ 0x08, 0x09, 0x0A, 0x0B, 0x0C, 0x0D, 0x0E, 0x0F,
+ 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
+ 0x18, 0x19, 0x1A, 0x1B, 0x1C, 0x1D, 0x1E, 0x1F,
+ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF, 0xFF,
+ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF,
+ 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0XFF, 0xFF
+ };
+
+ __m512i v_shuffle = _mm512_loadu_si512((void *) ip_shuffle_mask);
+
+ /* This shuffle is required for key and mask to match the layout of the
+ * ip6_hdr struct. */
+ __m512i v_key_shuf = _mm512_permutexvar_epi8(v_shuffle, v_key);
+ __m512i v_mask_shuf = _mm512_permutexvar_epi8(v_shuffle, v_mask);
+
+ /* Set the v_zero register to all zero's. */
+ const __m128i v_zeros = _mm_setzero_si128();
+
+ /* Set the v_all_ones register to all one's. */
+ const __m128i v_all_ones = _mm_cmpeq_epi16(v_zeros, v_zeros);
+
+ /* Load ip6 src and dst masks respectively into 128-bit wide registers. */
+ __m128i v_src = _mm_loadu_si128((void *) &mask->ipv6_src);
+ __m128i v_dst = _mm_loadu_si128((void *) &mask->ipv6_dst);
+
+ /* Perform a bitwise OR between src and dst registers. */
+ __m128i v_or = _mm_or_si128(v_src, v_dst);
+
+ /* Will return true if any bit has been set in v_or, else it will return
+ * false. */
+ bool do_checksum = !_mm_test_all_zeros(v_or, v_all_ones);
+
+ DP_PACKET_BATCH_FOR_EACH (i, packet, batch) {
+ struct ovs_16aligned_ip6_hdr *nh = dp_packet_l3(packet);
+
+ /* Load the 40 bytes of the IPv6 header. */
+ __m512i v_packet = _mm512_maskz_loadu_epi64(0x1F, (void *) nh);
+
+ /* AND the v_pkt_mask to the packet data (v_packet). */
+ __m512i v_pkt_masked = _mm512_andnot_si512(v_mask_shuf, v_packet);
+
+ /* OR the new addresses (v_key_shuf) with the masked packet addresses
+ * (v_pkt_masked). */
+ __m512i v_new_hdr = _mm512_or_si512(v_key_shuf, v_pkt_masked);
+
+ /* If ip6_src or ip6_dst has been modified, L4 checksum needs to be
+ * updated. */
+ uint8_t proto = 0;
+ bool rh_present;
+ bool do_csum = do_checksum;
+
+ rh_present = packet_rh_present(packet, &proto, &do_csum);
+
+ if (do_csum) {
+ size_t l4_size = dp_packet_l4_size(packet);
+ __m512i v_new_hdr_for_cksum = v_new_hdr;
+ uint16_t delta_checksum;
+
+ /* In case of routing header being present, checksum should not be
+ * updated for the destination address. */
+ if (rh_present) {
+ v_new_hdr_for_cksum = _mm512_mask_blend_epi64(0x18, v_new_hdr,
+ v_packet);
+ }
+
+ delta_checksum = avx512_ipv6_addr_csum_delta(v_packet,
+ v_new_hdr_for_cksum);
+
+ if (proto == IPPROTO_UDP && l4_size >= UDP_HEADER_LEN) {
+ struct udp_header *uh = dp_packet_l4(packet);
+
+ if (uh->udp_csum) {
+ uint16_t old_udp_checksum = ~uh->udp_csum;
+ uint32_t udp_checksum = old_udp_checksum + delta_checksum;
+
+ udp_checksum = csum_finish(udp_checksum);
+
+ if (!udp_checksum) {
+ udp_checksum = htons(0xffff);
+ }
+
+ uh->udp_csum = udp_checksum;
+ }
+ } else if (proto == IPPROTO_TCP && l4_size >= TCP_HEADER_LEN) {
+ struct tcp_header *th = dp_packet_l4(packet);
+ uint16_t old_tcp_checksum = ~th->tcp_csum;
+ uint32_t tcp_checksum = old_tcp_checksum + delta_checksum;
+
+ tcp_checksum = csum_finish(tcp_checksum);
+ th->tcp_csum = tcp_checksum;
+ } else if (proto == IPPROTO_ICMPV6 &&
+ l4_size >= sizeof(struct icmp6_header)) {
+ struct icmp6_header *icmp = dp_packet_l4(packet);
+ uint16_t old_icmp6_checksum = ~icmp->icmp6_cksum;
+ uint32_t icmp6_checksum = old_icmp6_checksum + delta_checksum;
+
+ icmp6_checksum = csum_finish(icmp6_checksum);
+ icmp->icmp6_cksum = icmp6_checksum;
+ }
+
+ pkt_metadata_init_conn(&packet->md);
+ }
+ /* Write back the modified IPv6 addresses. */
+ _mm512_mask_storeu_epi64((void *) nh, 0x1F, v_new_hdr);
+ }
+}
+#endif /* HAVE_AVX512VBMI */
+
static void
action_avx512_set_masked(struct dp_packet_batch *batch, const struct nlattr *a)
{
@@ -518,6 +734,12 @@ action_avx512_init(struct odp_execute_action_impl *self OVS_UNUSED)
impl_set_masked_funcs[OVS_KEY_ATTR_ETHERNET] = action_avx512_eth_set_addrs;
impl_set_masked_funcs[OVS_KEY_ATTR_IPV4] = action_avx512_ipv4_set_addrs;
+#if HAVE_AVX512VBMI
+ if (action_avx512vbmi_isa_probe()) {
+ impl_set_masked_funcs[OVS_KEY_ATTR_IPV6] = action_avx512_set_ipv6;
+ }
+#endif
+
return 0;
}
diff --git a/lib/odp-execute-private.c b/lib/odp-execute-private.c
index 57be5cfe75a..8b7a6b4ab0e 100644
--- a/lib/odp-execute-private.c
+++ b/lib/odp-execute-private.c
@@ -60,6 +60,20 @@ action_avx512_isa_probe(void)
#endif
+#if ACTION_IMPL_AVX512_CHECK && HAVE_AVX512VBMI
+bool
+action_avx512vbmi_isa_probe(void)
+{
+ return cpu_has_isa(OVS_CPU_ISA_X86_AVX512VBMI);
+}
+#else
+bool
+action_avx512vbmi_isa_probe(void)
+{
+ return false;
+}
+#endif
+
static struct odp_execute_action_impl action_impls[] = {
[ACTION_IMPL_AUTOVALIDATOR] = {
.available = false,
diff --git a/lib/odp-execute-private.h b/lib/odp-execute-private.h
index 940180c99f9..643f41c2a61 100644
--- a/lib/odp-execute-private.h
+++ b/lib/odp-execute-private.h
@@ -78,6 +78,7 @@ BUILD_ASSERT_DECL(ACTION_IMPL_AUTOVALIDATOR == 1);
#define ACTION_IMPL_BEGIN (ACTION_IMPL_AUTOVALIDATOR + 1)
bool action_avx512_isa_probe(void);
+bool action_avx512vbmi_isa_probe(void);
/* Odp execute init handles setting up the state of the actions functions at
* initialization time. It cannot return errors, as it must always succeed in
diff --git a/lib/packets.c b/lib/packets.c
index 1dcd4a6fcd2..06f516cb1af 100644
--- a/lib/packets.c
+++ b/lib/packets.c
@@ -1152,7 +1152,7 @@ packet_set_ipv4_addr(struct dp_packet *packet,
* segements_left > 0.
*
* This function assumes that L3 and L4 offsets are set in the packet. */
-static bool
+bool
packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr, bool *first_frag)
{
const struct ovs_16aligned_ip6_hdr *nh;
diff --git a/lib/packets.h b/lib/packets.h
index 5bdf6e4bbd9..8626aac8d53 100644
--- a/lib/packets.h
+++ b/lib/packets.h
@@ -1642,6 +1642,8 @@ void packet_put_ra_prefix_opt(struct dp_packet *,
ovs_be32 preferred_lifetime,
const ovs_be128 router_prefix);
uint32_t packet_csum_pseudoheader(const struct ip_header *);
+bool packet_rh_present(struct dp_packet *packet, uint8_t *nexthdr,
+ bool *first_frag);
void IP_ECN_set_ce(struct dp_packet *pkt, bool is_ipv6);
#define DNS_HEADER_LEN 12
From 363cc26839ed4587640620e80a226bf739f2257f Mon Sep 17 00:00:00 2001
From: Cian Ferriter
Date: Fri, 16 Sep 2022 10:12:04 +0000
Subject: [PATCH 077/833] dpif-netdev/dpcls: Specialize 8, 1 and 5, 2
signatures.
The subtable signatures being specialized here were found in an NVGRE
tunnel scenario.
Signed-off-by: Cian Ferriter
Acked-by: Sunil Pai G
Acked-by: Eelco Chaudron
Signed-off-by: Ian Stokes
---
lib/dpif-netdev-lookup-avx512-gather.c | 4 ++++
lib/dpif-netdev-lookup-generic.c | 4 ++++
2 files changed, 8 insertions(+)
diff --git a/lib/dpif-netdev-lookup-avx512-gather.c b/lib/dpif-netdev-lookup-avx512-gather.c
index 7d3d81151f1..b916b24875e 100644
--- a/lib/dpif-netdev-lookup-avx512-gather.c
+++ b/lib/dpif-netdev-lookup-avx512-gather.c
@@ -380,7 +380,9 @@ avx512_lookup_impl(struct dpcls_subtable *subtable,
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(8, 1)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 2)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
@@ -419,7 +421,9 @@ dpcls_subtable_avx512_gather_probe__(uint32_t u0_bits, uint32_t u1_bits,
CHECK_LOOKUP_FUNCTION(9, 4, use_vpop);
CHECK_LOOKUP_FUNCTION(9, 1, use_vpop);
+ CHECK_LOOKUP_FUNCTION(8, 1, use_vpop);
CHECK_LOOKUP_FUNCTION(5, 3, use_vpop);
+ CHECK_LOOKUP_FUNCTION(5, 2, use_vpop);
CHECK_LOOKUP_FUNCTION(5, 1, use_vpop);
CHECK_LOOKUP_FUNCTION(4, 1, use_vpop);
CHECK_LOOKUP_FUNCTION(4, 0, use_vpop);
diff --git a/lib/dpif-netdev-lookup-generic.c b/lib/dpif-netdev-lookup-generic.c
index 6c74ac3a1b7..76f92dd5e69 100644
--- a/lib/dpif-netdev-lookup-generic.c
+++ b/lib/dpif-netdev-lookup-generic.c
@@ -284,7 +284,9 @@ dpcls_subtable_lookup_generic(struct dpcls_subtable *subtable,
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 4)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(9, 1)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(8, 1)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 3)
+DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 2)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(5, 1)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 1)
DECLARE_OPTIMIZED_LOOKUP_FUNCTION(4, 0)
@@ -308,7 +310,9 @@ dpcls_subtable_generic_probe(uint32_t u0_bits, uint32_t u1_bits)
CHECK_LOOKUP_FUNCTION(9, 4);
CHECK_LOOKUP_FUNCTION(9, 1);
+ CHECK_LOOKUP_FUNCTION(8, 1);
CHECK_LOOKUP_FUNCTION(5, 3);
+ CHECK_LOOKUP_FUNCTION(5, 2);
CHECK_LOOKUP_FUNCTION(5, 1);
CHECK_LOOKUP_FUNCTION(4, 1);
CHECK_LOOKUP_FUNCTION(4, 0);
From 9855f35dd219f48ea274500a83bf27d63f679cc5 Mon Sep 17 00:00:00 2001
From: Cian Ferriter
Date: Fri, 16 Sep 2022 10:12:05 +0000
Subject: [PATCH 078/833] dpif-netdev/mfex: Add AVX512 NVGRE traffic profiles.
A typical NVGRE-encapsulated packet starts with the ETH/IP/GRE
protocols. Miniflow extract will parse just the ETH and IP headers. The
GRE header will be processed later as part of the pop action. Add
support for parsing the ETH/IP headers in this scenario.
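As a rough illustration (not the mfex SIMD code itself), the probe pattern
added by this patch boils down to recognizing a plain IPv4 outer header that
carries GRE, as in the hedged sketch below; outer_header_is_nvgre() is a
hypothetical name.
    #include <stdbool.h>
    #include <stdint.h>
    /* Hedged sketch: PATTERN_IPV4_NVGRE matches an IPv4 header whose
     * version/IHL byte is 0x45 and whose protocol is GRE (47, i.e. 0x2f);
     * the GRE header itself is left for later tunnel processing. */
    static bool
    outer_header_is_nvgre(uint8_t ihl_ver, uint8_t ip_proto)
    {
        return ihl_ver == 0x45 && ip_proto == 0x2f;
    }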
Signed-off-by: Cian Ferriter
Acked-by: Sunil Pai G
Acked-by: Eelco Chaudron
Signed-off-by: Ian Stokes
---
lib/dp-packet.h | 59 +++++++++++++++++++++++--------
lib/dpif-netdev-extract-avx512.c | 43 ++++++++++++++++++++--
lib/dpif-netdev-private-extract.c | 10 ++++++
lib/dpif-netdev-private-extract.h | 5 +++
4 files changed, 101 insertions(+), 16 deletions(-)
diff --git a/lib/dp-packet.h b/lib/dp-packet.h
index a8ea5b40f71..ed1e5b3f6d1 100644
--- a/lib/dp-packet.h
+++ b/lib/dp-packet.h
@@ -1087,8 +1087,29 @@ dp_packet_l4_checksum_bad(const struct dp_packet *p)
DP_PACKET_OL_RX_L4_CKSUM_BAD;
}
+static inline uint32_t ALWAYS_INLINE
+dp_packet_calc_hash_ipv4(const uint8_t *pkt, const uint16_t l3_ofs,
+ uint32_t hash)
+{
+ const void *ipv4_src = &pkt[l3_ofs + offsetof(struct ip_header, ip_src)];
+ const void *ipv4_dst = &pkt[l3_ofs + offsetof(struct ip_header, ip_dst)];
+ uint32_t ip_src, ip_dst;
+
+ memcpy(&ip_src, ipv4_src, sizeof ip_src);
+ memcpy(&ip_dst, ipv4_dst, sizeof ip_dst);
+
+ /* IPv4 Src and Dst. */
+ hash = hash_add(hash, ip_src);
+ hash = hash_add(hash, ip_dst);
+
+ /* IPv4 proto. */
+ hash = hash_add(hash, pkt[l3_ofs + offsetof(struct ip_header, ip_proto)]);
+
+ return hash;
+}
+
static inline void ALWAYS_INLINE
-dp_packet_update_rss_hash_ipv4_tcp_udp(struct dp_packet *packet)
+dp_packet_update_rss_hash_ipv4(struct dp_packet *packet)
{
if (dp_packet_rss_valid(packet)) {
return;
@@ -1096,26 +1117,36 @@ dp_packet_update_rss_hash_ipv4_tcp_udp(struct dp_packet *packet)
const uint8_t *pkt = dp_packet_data(packet);
const uint16_t l3_ofs = packet->l3_ofs;
- const void *ipv4_src = &pkt[l3_ofs + offsetof(struct ip_header, ip_src)];
- const void *ipv4_dst = &pkt[l3_ofs + offsetof(struct ip_header, ip_dst)];
+ uint32_t hash = 0;
+
+ /* IPv4 Src, Dst and proto. */
+ hash = dp_packet_calc_hash_ipv4(pkt, l3_ofs, hash);
+
+ hash = hash_finish(hash, 42);
+ dp_packet_set_rss_hash(packet, hash);
+}
+
+static inline void ALWAYS_INLINE
+dp_packet_update_rss_hash_ipv4_tcp_udp(struct dp_packet *packet)
+{
+ if (dp_packet_rss_valid(packet)) {
+ return;
+ }
+
+ const uint8_t *pkt = dp_packet_data(packet);
const void *l4_ports = &pkt[packet->l4_ofs];
- uint32_t ip_src, ip_dst, ports;
+ const uint16_t l3_ofs = packet->l3_ofs;
uint32_t hash = 0;
+ uint32_t ports;
- memcpy(&ip_src, ipv4_src, sizeof ip_src);
- memcpy(&ip_dst, ipv4_dst, sizeof ip_dst);
- memcpy(&ports, l4_ports, sizeof ports);
+ /* IPv4 Src, Dst and proto. */
+ hash = dp_packet_calc_hash_ipv4(pkt, l3_ofs, hash);
- /* IPv4 Src and Dst. */
- hash = hash_add(hash, ip_src);
- hash = hash_add(hash, ip_dst);
- /* IPv4 proto. */
- hash = hash_add(hash,
- pkt[l3_ofs + offsetof(struct ip_header, ip_proto)]);
/* L4 ports. */
+ memcpy(&ports, l4_ports, sizeof ports);
hash = hash_add(hash, ports);
- hash = hash_finish(hash, 42);
+ hash = hash_finish(hash, 42);
dp_packet_set_rss_hash(packet, hash);
}
diff --git a/lib/dpif-netdev-extract-avx512.c b/lib/dpif-netdev-extract-avx512.c
index 4afbed97eac..968845f2d3b 100644
--- a/lib/dpif-netdev-extract-avx512.c
+++ b/lib/dpif-netdev-extract-avx512.c
@@ -194,6 +194,7 @@ _mm512_maskz_permutexvar_epi8_selector(__mmask64 k_shuf, __m512i v_shuf,
#define PATTERN_IPV4_MASK PATTERN_IPV4_GEN(0xFF, 0xBF, 0xFF, 0xFF)
#define PATTERN_IPV4_UDP PATTERN_IPV4_GEN(0x45, 0, 0, 0x11)
#define PATTERN_IPV4_TCP PATTERN_IPV4_GEN(0x45, 0, 0, 0x06)
+#define PATTERN_IPV4_NVGRE PATTERN_IPV4_GEN(0x45, 0, 0, 0x2f)
#define PATTERN_TCP_GEN(data_offset) \
0, 0, 0, 0, /* sport, dport */ \
@@ -218,6 +219,12 @@ _mm512_maskz_permutexvar_epi8_selector(__mmask64 k_shuf, __m512i v_shuf,
NU, NU, NU, NU, NU, NU, NU, NU, 34, 35, 36, 37, NU, NU, NU, NU, /* TCP */ \
NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused. */
+#define PATTERN_IPV4_NVGRE_SHUFFLE \
+ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, NU, NU, /* Ether */ \
+ 26, 27, 28, 29, 30, 31, 32, 33, NU, NU, NU, NU, 20, 15, 22, 23, /* IPv4 */ \
+ NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused */\
+ NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, NU, /* Unused */
+
#define PATTERN_DT1Q_IPV4_UDP_SHUFFLE \
/* Ether (2 blocks): Note that *VLAN* type is written here. */ \
0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 16, 17, 0, 0, \
@@ -286,6 +293,9 @@ _mm512_maskz_permutexvar_epi8_selector(__mmask64 k_shuf, __m512i v_shuf,
#define KMASK_DT1Q_IPV6 0xFF0FULL
#define KMASK_IPV6_NOHDR 0x00FFULL
+#define PATTERN_IPV4_KMASK \
+ (KMASK_ETHER | (KMASK_IPV4 << 16))
+
#define PATTERN_IPV4_UDP_KMASK \
(KMASK_ETHER | (KMASK_IPV4 << 16) | (KMASK_UDP << 32))
@@ -332,6 +342,7 @@ _mm512_maskz_permutexvar_epi8_selector(__mmask64 k_shuf, __m512i v_shuf,
#define PKT_OFFSET_VLAN_IPV6_L4 (PKT_OFFSET_VLAN_L3 + IPV6_HEADER_LEN)
#define PKT_OFFSET_IPV6_L4 (PKT_OFFSET_L3 + IPV6_HEADER_LEN)
+#define PKT_MIN_ETH_IPV4 (ETH_HEADER_LEN + IP_HEADER_LEN)
#define PKT_MIN_ETH_IPV4_UDP (PKT_OFFSET_IPV4_L4 + UDP_HEADER_LEN)
#define PKT_MIN_ETH_VLAN_IPV4_UDP (PKT_OFFSET_VLAN_IPV4_L4 + UDP_HEADER_LEN)
#define PKT_MIN_ETH_IPV4_TCP (PKT_OFFSET_IPV4_L4 + TCP_HEADER_LEN)
@@ -352,8 +363,8 @@ _mm512_maskz_permutexvar_epi8_selector(__mmask64 k_shuf, __m512i v_shuf,
| MF_BIT(dl_dst) | MF_BIT(dl_src)| MF_BIT(dl_type))
#define MF_ETH_VLAN (MF_ETH | MF_BIT(vlans))
-#define MF_IPV4_UDP (MF_BIT(nw_src) | MF_BIT(ipv6_label) | MF_BIT(tp_src) | \
- MF_BIT(tp_dst))
+#define MF_IPV4 (MF_BIT(nw_src) | MF_BIT(ipv6_label))
+#define MF_IPV4_UDP (MF_IPV4 | MF_BIT(tp_src) | MF_BIT(tp_dst))
#define MF_IPV4_TCP (MF_IPV4_UDP | MF_BIT(tcp_flags) | MF_BIT(arp_tha.ea[2]))
#define MF_IPV6_UDP (MF_BIT(ipv6_label) | MF_WORD(ipv6_src, 2) | \
@@ -449,6 +460,7 @@ enum MFEX_PROFILES {
PROFILE_ETH_IPV6_TCP,
PROFILE_ETH_VLAN_IPV6_TCP,
PROFILE_ETH_VLAN_IPV6_UDP,
+ PROFILE_ETH_IPV4_NVGRE,
PROFILE_COUNT,
};
@@ -608,6 +620,21 @@ static const struct mfex_profile mfex_profiles[PROFILE_COUNT] =
},
.dp_pkt_min_size = PKT_MIN_ETH_VLAN_IPV6_UDP,
},
+
+ [PROFILE_ETH_IPV4_NVGRE] = {
+ .probe_mask.u8_data = { PATTERN_ETHERTYPE_MASK PATTERN_IPV4_MASK },
+ .probe_data.u8_data = { PATTERN_ETHERTYPE_IPV4 PATTERN_IPV4_NVGRE},
+
+ .store_shuf.u8_data = { PATTERN_IPV4_NVGRE_SHUFFLE },
+ .strip_mask.u8_data = { PATTERN_STRIP_IPV4_MASK },
+ .store_kmsk = PATTERN_IPV4_KMASK,
+
+ .mf_bits = { MF_ETH, MF_IPV4},
+ .dp_pkt_offs = {
+ 0, UINT16_MAX, PKT_OFFSET_L3, PKT_OFFSET_IPV4_L4,
+ },
+ .dp_pkt_min_size = PKT_MIN_ETH_IPV4,
+ },
};
/* IPv6 header helper function to fix TC, flow label and next header. */
@@ -959,6 +986,17 @@ mfex_avx512_process(struct dp_packet_batch *packets,
mfex_handle_ipv6_l4((void *)&pkt[58], &blocks[10]);
dp_packet_update_rss_hash_ipv6_tcp_udp(packet);
} break;
+
+ case PROFILE_ETH_IPV4_NVGRE: {
+ /* Handle dynamic l2_pad_size. */
+ uint32_t size_from_ipv4 = size - sizeof(struct eth_header);
+ struct ip_header *nh = (void *)&pkt[sizeof(struct eth_header)];
+ if (mfex_ipv4_set_l2_pad_size(packet, nh, size_from_ipv4, 0)) {
+ continue;
+ }
+ dp_packet_update_rss_hash_ipv4(packet);
+ } break;
+
default:
break;
};
@@ -1013,6 +1051,7 @@ DECLARE_MFEX_FUNC(ipv6_udp, PROFILE_ETH_IPV6_UDP)
DECLARE_MFEX_FUNC(ipv6_tcp, PROFILE_ETH_IPV6_TCP)
DECLARE_MFEX_FUNC(dot1q_ipv6_tcp, PROFILE_ETH_VLAN_IPV6_TCP)
DECLARE_MFEX_FUNC(dot1q_ipv6_udp, PROFILE_ETH_VLAN_IPV6_UDP)
+DECLARE_MFEX_FUNC(ip_nvgre, PROFILE_ETH_IPV4_NVGRE)
#endif /* __CHECKER__ */
#endif /* __x86_64__ */
diff --git a/lib/dpif-netdev-private-extract.c b/lib/dpif-netdev-private-extract.c
index 1a9b354201a..ded08fd3ef2 100644
--- a/lib/dpif-netdev-private-extract.c
+++ b/lib/dpif-netdev-private-extract.c
@@ -184,6 +184,16 @@ static struct dpif_miniflow_extract_impl mfex_impls[] = {
.extract_func = mfex_avx512_dot1q_ipv6_udp,
.name = "avx512_dot1q_ipv6_udp",
},
+#if HAVE_AVX512VBMI
+ [MFEX_IMPL_VBMI_IPv4_NVGRE] = {
+ .probe = mfex_avx512_vbmi_probe,
+ .extract_func = mfex_avx512_vbmi_ip_nvgre,
+ .name = "avx512_vbmi_ipv4_nvgre", },
+#endif
+ [MFEX_IMPL_IPv4_NVGRE] = {
+ .probe = mfex_avx512_probe,
+ .extract_func = mfex_avx512_ip_nvgre,
+ .name = "avx512_ipv4_nvgre", },
#endif
};
diff --git a/lib/dpif-netdev-private-extract.h b/lib/dpif-netdev-private-extract.h
index 8a7f9b01aff..48549beaa0e 100644
--- a/lib/dpif-netdev-private-extract.h
+++ b/lib/dpif-netdev-private-extract.h
@@ -117,6 +117,10 @@ enum dpif_miniflow_extract_impl_idx {
MFEX_IMPL_VBMI_DOT1Q_IPv6_UDP,
#endif
MFEX_IMPL_DOT1Q_IPv6_UDP,
+#if HAVE_AVX512VBMI
+ MFEX_IMPL_VBMI_IPv4_NVGRE,
+#endif
+ MFEX_IMPL_IPv4_NVGRE,
#endif
MFEX_IMPL_MAX
};
@@ -230,6 +234,7 @@ DECLARE_AVX512_MFEX_PROTOTYPE(ipv6_udp);
DECLARE_AVX512_MFEX_PROTOTYPE(ipv6_tcp);
DECLARE_AVX512_MFEX_PROTOTYPE(dot1q_ipv6_tcp);
DECLARE_AVX512_MFEX_PROTOTYPE(dot1q_ipv6_udp);
+DECLARE_AVX512_MFEX_PROTOTYPE(ip_nvgre);
#endif /* __x86_64__ */
From c627cfd9cb630c052285a540cd65dd809be0ea95 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:42 +0100
Subject: [PATCH 079/833] python: Fix datapath flow decoders.
Fix the following errors in odp decoding (a short decoder sketch follows the list):
- Missing push_mpls action
- Typos in collector_set_id, tp_src/tp_dst and csum
- Missing two fields in vxlan match
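The sketch referenced above, assuming only the ovs.flow helpers already used
elsewhere in this series, shows how such a nested decoder is wired; it is an
illustration, not new functionality:
    from ovs.flow.decoders import decode_int
    from ovs.flow.kv import KVDecoders, nested_kv_decoder
    # Hedged sketch of the push_mpls/add_mpls decoder added by this patch.
    mpls_decoder = nested_kv_decoder(KVDecoders({
        "label": decode_int,
        "tc": decode_int,
        "ttl": decode_int,
        "bos": decode_int,
        "eth_type": decode_int,
    }))
    # mpls_decoder("label=100,tc=0,ttl=64,bos=1,eth_type=0x8847")
    # -> {'label': 100, 'tc': 0, 'ttl': 64, 'bos': 1, 'eth_type': 34887}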
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
python/ovs/flow/odp.py | 27 ++++++++++++++++++++++-----
1 file changed, 22 insertions(+), 5 deletions(-)
diff --git a/python/ovs/flow/odp.py b/python/ovs/flow/odp.py
index 87a3bae2f9a..3bc3aec8e00 100644
--- a/python/ovs/flow/odp.py
+++ b/python/ovs/flow/odp.py
@@ -225,7 +225,7 @@ def _action_decoders_args():
KVDecoders(
{
"probability": decode_int,
- "collector_sed_id": decode_int,
+ "collector_set_id": decode_int,
"obs_domain_id": decode_int,
"obs_point_id": decode_int,
"output_port": decode_default,
@@ -303,6 +303,21 @@ def _action_decoders_args():
),
"pop_nsh": decode_flag,
"tnl_pop": decode_int,
+ "pop_mpls": KVDecoders({"eth_type": decode_int}),
+ **dict.fromkeys(
+ ["push_mpls", "add_mpls"],
+ nested_kv_decoder(
+ KVDecoders(
+ {
+ "label": decode_int,
+ "tc": decode_int,
+ "ttl": decode_int,
+ "bos": decode_int,
+ "eth_type": decode_int,
+ }
+ )
+ ),
+ ),
"ct_clear": decode_flag,
"ct": nested_kv_decoder(
KVDecoders(
@@ -412,7 +427,7 @@ def _tnl_action_decoder_args():
{
"src": decode_int,
"dst": decode_int,
- "dsum": Mask16,
+ "csum": Mask16,
}
)
),
@@ -499,8 +514,8 @@ def _field_decoders_args():
"src": IPMask,
"dst": IPMask,
"proto": Mask8,
- "tcp_src": Mask16,
- "tcp_dst": Mask16,
+ "tp_src": Mask16,
+ "tp_dst": Mask16,
}
)
),
@@ -541,6 +556,8 @@ def _field_decoders_args():
"vxlan": nested_kv_decoder(
KVDecoders(
{
+ "flags": decode_int,
+ "vni": decode_int,
"gbp": nested_kv_decoder(
KVDecoders(
{
@@ -548,7 +565,7 @@ def _field_decoders_args():
"flags": Mask8,
}
)
- )
+ ),
}
)
),
From 3648fec08f15b3f2cc37cd4b85eaccb773d1f444 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:43 +0100
Subject: [PATCH 080/833] python: Include aliases in ofp_fields.py.
We currently auto-generate a dictionary of field names and decoders.
However, sometimes fields can be specified by their canonical NXM or
OXM names.
Modify gen_ofp_field_decoders to also generate a dictionary of aliases
so it's easy to map OXM/NXM names to their fields and decoding
information.
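A hedged sketch of how the generated alias table is meant to be consumed is
shown below; decoder_for() is a hypothetical helper, while field_decoders and
field_aliases are the dictionaries emitted by this script:
    from ovs.flow.ofp_fields import field_aliases, field_decoders
    def decoder_for(name):
        """Resolve a canonical field name or an NXM/OXM alias to a decoder."""
        return field_decoders.get(field_aliases.get(name, name))
    # decoder_for("NXM_OF_IP_SRC") and decoder_for("nw_src") are expected to
    # resolve to the same decoder once the aliases are generated.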
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
build-aux/gen_ofp_field_decoders | 15 +++++++++++++++
1 file changed, 15 insertions(+)
diff --git a/build-aux/gen_ofp_field_decoders b/build-aux/gen_ofp_field_decoders
index 96f99e860f7..0b797ee8c8c 100755
--- a/build-aux/gen_ofp_field_decoders
+++ b/build-aux/gen_ofp_field_decoders
@@ -22,12 +22,16 @@ def main():
fields = extract_fields.extract_ofp_fields(args.metaflow)
field_decoders = {}
+ aliases = {}
for field in fields:
decoder = get_decoder(field)
field_decoders[field.get("name")] = decoder
if field.get("extra_name"):
field_decoders[field.get("extra_name")] = decoder
+ for nxm in field.get("OXM", []):
+ aliases[nxm[1]] = field.get("name")
+
code = """
# This file is auto-generated. Do not edit!
@@ -35,14 +39,25 @@ from ovs.flow import decoders
field_decoders = {{
{decoders}
+}}
+
+field_aliases = {{
+{aliases}
}}""".format(
decoders="\n".join(
[
" '{name}': {decoder},".format(name=name, decoder=decoder)
for name, decoder in field_decoders.items()
]
+ ),
+ aliases="\n".join(
+ [
+ " '{alias}': '{name}',".format(name=name, alias=alias)
+ for alias, name in aliases.items()
+ ]
)
)
+
print(code)
From fe204743cbc609dc5dfefd1437fc058b7ad3ca52 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:44 +0100
Subject: [PATCH 081/833] python: Add explicit decoders for all ofp actions.
We were silently relying on the default decoder to handle some ofp
actions, which happened to yield reasonable string values.
To be safer and more robust, add an explicit decoder for all missing
actions.
This patch also reworks the learn action decoding to make it more
explicit and to verify that all fields specified in the learn action
are actually valid.
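A hedged sketch of the validation this relies on is shown below;
check_known_field() is a hypothetical helper mirroring the check that
decode_field() now performs:
    from ovs.flow.kv import ParseError
    from ovs.flow.ofp_fields import field_aliases, field_decoders
    def check_known_field(spec):
        """Accept 'NXM_OF_IP_SRC[]'-style specs only for known fields."""
        name = spec.strip("]\n\r").split("[")[0]
        if name not in field_decoders and name not in field_aliases:
            raise ParseError("Field not supported: {}".format(name))
        return name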
Signed-off-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
python/ovs/flow/kv.py | 13 +++---
python/ovs/flow/ofp.py | 50 ++++++++++++++-------
python/ovs/flow/ofp_act.py | 85 +++++++++++++++++++++++++-----------
python/ovs/tests/test_ofp.py | 6 +--
4 files changed, 105 insertions(+), 49 deletions(-)
diff --git a/python/ovs/flow/kv.py b/python/ovs/flow/kv.py
index cceb95e4387..383d7ee7878 100644
--- a/python/ovs/flow/kv.py
+++ b/python/ovs/flow/kv.py
@@ -87,10 +87,11 @@ class KVDecoders(object):
Args:
decoders (dict): Optional; A dictionary of decoders indexed by keyword.
- default (callable): Optional; A decoder used if a match is not found in
- configured decoders. If not provided, the default behavior is to
- try to decode the value into an integer and, if that fails,
- just return the string as-is.
+ default (callable): Optional; A function to use if a match is not
+ found in configured decoders. If not provided, the default behavior
+ is to try to decode the value into an integer and, if that fails,
+ just return the string as-is. The function must accept the key
+ and the value and return the decoded (key, value) tuple back.
default_free (callable): Optional; The decoder used if a match is not
found in configured decoders and it's a free value (e.g:
a value without a key) Defaults to returning the free value as
@@ -100,7 +101,7 @@ class KVDecoders(object):
def __init__(self, decoders=None, default=None, default_free=None):
self._decoders = decoders or dict()
- self._default = default or decode_default
+ self._default = default or (lambda k, v: (k, decode_default(v)))
self._default_free = default_free or self._default_free_decoder
def decode(self, keyword, value_str):
@@ -126,7 +127,7 @@ def decode(self, keyword, value_str):
return keyword, value
else:
if value_str:
- return keyword, self._default(value_str)
+ return self._default(keyword, value_str)
else:
return self._default_free(keyword)
diff --git a/python/ovs/flow/ofp.py b/python/ovs/flow/ofp.py
index 0bc110c576e..3d79ed6ad77 100644
--- a/python/ovs/flow/ofp.py
+++ b/python/ovs/flow/ofp.py
@@ -243,6 +243,7 @@ def _gen_action_decoders():
**OFPFlow._fw_action_decoders_args(),
**OFPFlow._control_action_decoders_args(),
**OFPFlow._other_action_decoders_args(),
+ **OFPFlow._instruction_action_decoders_args(),
}
clone_actions = OFPFlow._clone_actions_decoders_args(actions)
actions.update(clone_actions)
@@ -272,6 +273,8 @@ def _encap_actions_decoders_args():
"pop_vlan": decode_flag,
"strip_vlan": decode_flag,
"push_vlan": decode_default,
+ "pop_mpls": decode_int,
+ "push_mpls": decode_int,
"decap": decode_flag,
"encap": decode_encap,
}
@@ -286,8 +289,8 @@ def _field_action_decoders_args():
"set_mpls_ttl",
"mod_nw_tos",
"mod_nw_ecn",
- "mod_tcp_src",
- "mod_tcp_dst",
+ "mod_tp_src",
+ "mod_tp_dst",
]
return {
"load": decode_load_field,
@@ -299,9 +302,15 @@ def _field_action_decoders_args():
"mod_dl_src": EthMask,
"mod_nw_dst": IPMask,
"mod_nw_src": IPMask,
+ "mod_nw_ttl": decode_int,
+ "mod_vlan_vid": decode_int,
+ "set_vlan_vid": decode_int,
+ "mod_vlan_pcp": decode_int,
+ "set_vlan_pcp": decode_int,
"dec_ttl": decode_dec_ttl,
"dec_mpls_ttl": decode_flag,
"dec_nsh_ttl": decode_flag,
+ "delete_field": decode_field,
"check_pkt_larger": decode_chk_pkt_larger,
**{field: decode_default for field in field_default_decoders},
}
@@ -342,6 +351,14 @@ def _fw_action_decoders_args():
)
),
"ct_clear": decode_flag,
+ "fin_timeout": nested_kv_decoder(
+ KVDecoders(
+ {
+ "idle_timeout": decode_time,
+ "hard_timeout": decode_time,
+ }
+ )
+ ),
}
@staticmethod
@@ -382,22 +399,13 @@ def _clone_actions_decoders_args(action_decoders):
actions.
"""
return {
- "learn": decode_learn(
- {
- **action_decoders,
- "fin_timeout": nested_kv_decoder(
- KVDecoders(
- {
- "idle_timeout": decode_time,
- "hard_timeout": decode_time,
- }
- )
- ),
- }
- ),
+ "learn": decode_learn(action_decoders),
"clone": functools.partial(
decode_exec, KVDecoders(action_decoders)
),
+ "write_actions": functools.partial(
+ decode_exec, KVDecoders(action_decoders)
+ ),
}
@staticmethod
@@ -426,3 +434,15 @@ def _other_action_decoders_args():
)
),
}
+
+ @staticmethod
+ def _instruction_action_decoders_args():
+ """Generate the decoder arguments for instruction actions
+ (see man(7) ovs-actions)."""
+ return {
+ "meter": decode_int,
+ "clear_actions": decode_flag,
+ # write_actions moved to _clone actions
+ "write_metadata": decode_mask(64),
+ "goto_table": decode_int,
+ }
diff --git a/python/ovs/flow/ofp_act.py b/python/ovs/flow/ofp_act.py
index acb16cd9a62..c481d6fc721 100644
--- a/python/ovs/flow/ofp_act.py
+++ b/python/ovs/flow/ofp_act.py
@@ -9,9 +9,15 @@
decode_flag,
decode_int,
)
-from ovs.flow.kv import nested_kv_decoder, KVDecoders, KeyValue, KVParser
+from ovs.flow.kv import (
+ nested_kv_decoder,
+ KVDecoders,
+ KeyValue,
+ KVParser,
+ ParseError,
+)
from ovs.flow.list import nested_list_decoder, ListDecoders
-from ovs.flow.ofp_fields import field_decoders
+from ovs.flow.ofp_fields import field_decoders, field_aliases
def decode_output(value):
@@ -20,7 +26,9 @@ def decode_output(value):
Does not support field specification.
"""
if len(value.split(",")) > 1:
- return nested_kv_decoder()(value)
+ return nested_kv_decoder(
+ KVDecoders({"port": decode_default, "max_len": decode_int})
+ )(value)
try:
return {"port": int(value)}
except ValueError:
@@ -41,7 +49,17 @@ def decode_controller(value):
except ValueError:
pass
# controller(key[=val], ...)
- return nested_kv_decoder()(value)
+ return nested_kv_decoder(
+ KVDecoders(
+ {
+ "max_len": decode_int,
+ "reason": decode_default,
+ "id": decode_int,
+ "userdata": decode_default,
+ "pause": decode_flag,
+ }
+ )
+ )(value)
def decode_bundle_load(value):
@@ -141,6 +159,12 @@ def decode_field(value):
man page:
http://www.openvswitch.org/support/dist-docs/ovs-actions.7.txt."""
parts = value.strip("]\n\r").split("[")
+ if (
+ parts[0] not in field_decoders.keys()
+ and parts[0] not in field_aliases.keys()
+ ):
+ raise ParseError("Field not supported: {}".format(parts[0]))
+
result = {
"field": parts[0],
}
@@ -269,31 +293,36 @@ def decode_learn(action_decoders):
action decoding.
"""
- def decode_learn_field(decoder, value):
- """Generates a decoder to be used for the 'field' argument of the
- 'learn' action.
-
- The field can hold a value that should be decoded, either as a field,
- or as a the value (see man(7) ovs-actions).
-
- Args:
- decoder (callable): The decoder.
+ def learn_field_decoding_kv(key, value):
+ """Decodes a key, value pair from the learn action.
+ The key must be a decodable field. The value can be either a value
+ in the format defined for the field or another field.
"""
- if value in field_decoders.keys():
- # It's a field
- return value
- else:
- return decoder(value)
-
- learn_field_decoders = {
- field: functools.partial(decode_learn_field, decoder)
- for field, decoder in field_decoders.items()
- }
+ key_field = decode_field(key)
+ try:
+ return key, decode_field(value)
+ except ParseError:
+ return key, field_decoders.get(key_field.get("field"))(value)
+
+ def learn_field_decoding_free(key):
+ """Decodes the free fields found in the learn action.
+ Free fields indicate that the field is to be copied from the original.
+ In order to express that in a dictionary, return the fieldspec as
+ value. So, the free field NXM_OF_IP_SRC[] is encoded as:
+ "NXM_OF_IP_SRC[]": {
+ "field": "NXM_OF_IP_SRC"
+ }
+ That way we also ensure the actual free key is correct.
+ """
+ key_field = decode_field(key)
+ return key, key_field
+
learn_decoders = {
**action_decoders,
- **learn_field_decoders,
"idle_timeout": decode_time,
"hard_timeout": decode_time,
+ "fin_idle_timeout": decode_time,
+ "fin_hard_timeout": decode_time,
"priority": decode_int,
"cookie": decode_int,
"send_flow_rem": decode_flag,
@@ -303,4 +332,10 @@ def decode_learn_field(decoder, value):
"result_dst": decode_field,
}
- return functools.partial(decode_exec, KVDecoders(learn_decoders))
+ learn_decoder = KVDecoders(
+ learn_decoders,
+ default=learn_field_decoding_kv,
+ default_free=learn_field_decoding_free,
+ )
+
+ return functools.partial(decode_exec, learn_decoder)
diff --git a/python/ovs/tests/test_ofp.py b/python/ovs/tests/test_ofp.py
index 7a93b2fd453..389c4544a2e 100644
--- a/python/ovs/tests/test_ofp.py
+++ b/python/ovs/tests/test_ofp.py
@@ -331,12 +331,12 @@
{"table": 69},
{"delete_learned": True},
{"cookie": 3664728752},
- {"OXM_OF_METADATA[]": True},
+ {"OXM_OF_METADATA[]": {"field": "OXM_OF_METADATA"}},
{"eth_type": 2048},
- {"NXM_OF_IP_SRC[]": True},
+ {"NXM_OF_IP_SRC[]": {"field": "NXM_OF_IP_SRC"}},
{"ip_dst": IPMask("172.30.204.105/32")},
{"nw_proto": 6},
- {"NXM_OF_TCP_SRC[]": "NXM_OF_TCP_DST[]"},
+ {"NXM_OF_TCP_SRC[]": {"field": "NXM_OF_TCP_DST"}},
{
"load": {
"value": 1,
From d33e548fc7d7ae03cfeba8b70ba84b5b998beca8 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:45 +0100
Subject: [PATCH 082/833] python: Make key-value matching strict by default.
Currently, if a key is not found in the decoder information, we use the
default decoder, which typically returns a string.
This not only means we can go out of sync with the C code without
noticing, but it is also error prone, as malformed flows could be parsed
without warning.
Make KeyValue parsing strict, raising an error if a decoder is not found
for a key.
This behavior can be turned off globally by setting 'KVDecoders.strict
= False', but that is generally not recommended. Also, if a KVDecoders
instance does need the old default behavior, it can be explicitly
configured with its own default decoder.
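A hedged usage sketch, mirroring the new test cases added below:
    from ovs.flow.kv import ParseError
    from ovs.flow.ofp import OFPFlow
    try:
        OFPFlow("actions=doesnotexist(1234)")
    except ParseError:
        pass  # Unknown actions are now rejected instead of parsed as strings.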
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
python/ovs/flow/kv.py | 25 ++++++++++++++++++-------
python/ovs/flow/list.py | 7 ++++++-
python/ovs/tests/test_kv.py | 20 ++++++++++----------
python/ovs/tests/test_ofp.py | 28 +++++++++++++++++++++++++++-
4 files changed, 61 insertions(+), 19 deletions(-)
diff --git a/python/ovs/flow/kv.py b/python/ovs/flow/kv.py
index 383d7ee7878..32463254b07 100644
--- a/python/ovs/flow/kv.py
+++ b/python/ovs/flow/kv.py
@@ -85,13 +85,17 @@ class KVDecoders(object):
reason, the default_free decoder, must return both the key and value to be
stored.
+ Globally defined "strict" variable controls what to do when decoders do not
+ contain a valid decoder for a key and a default function is not provided.
+ If set to True (default), a ParseError is raised.
+ If set to False, the value will be decoded as a string.
+
Args:
decoders (dict): Optional; A dictionary of decoders indexed by keyword.
default (callable): Optional; A function to use if a match is not
found in configured decoders. If not provided, the default behavior
- is to try to decode the value into an integer and, if that fails,
- just return the string as-is. The function must accept the key
- and the value and return the decoded (key, value) tuple back.
+ depends on "strict". The function must accept the key and a value
+ and return the decoded (key, value) tuple back.
default_free (callable): Optional; The decoder used if a match is not
found in configured decoders and it's a free value (e.g:
a value without a key) Defaults to returning the free value as
@@ -99,9 +103,11 @@ class KVDecoders(object):
The callable must accept a string and return a key-value pair.
"""
+ strict = True
+
def __init__(self, decoders=None, default=None, default_free=None):
self._decoders = decoders or dict()
- self._default = default or (lambda k, v: (k, decode_default(v)))
+ self._default = default
self._default_free = default_free or self._default_free_decoder
def decode(self, keyword, value_str):
@@ -127,9 +133,14 @@ def decode(self, keyword, value_str):
return keyword, value
else:
if value_str:
- return self._default(keyword, value_str)
- else:
- return self._default_free(keyword)
+ if self._default:
+ return self._default(keyword, value_str)
+ if self.strict:
+ raise ParseError(
+ "Cannot parse key {}: No decoder found".format(keyword)
+ )
+ return keyword, decode_default(value_str)
+ return self._default_free(keyword)
@staticmethod
def _default_free_decoder(key):
diff --git a/python/ovs/flow/list.py b/python/ovs/flow/list.py
index b1e9e3fcaa6..bc466ef89f0 100644
--- a/python/ovs/flow/list.py
+++ b/python/ovs/flow/list.py
@@ -31,7 +31,12 @@ def decode(self, index, value_str):
value_str (str): The value string to decode.
"""
if index < 0 or index >= len(self._decoders):
- return self._default_decoder(index, value_str)
+ if self._default_decoder:
+ return self._default_decoder(index, value_str)
+ else:
+ raise ParseError(
+ f"Cannot decode element {index} in list: {value_str}"
+ )
try:
key = self._decoders[index][0]
diff --git a/python/ovs/tests/test_kv.py b/python/ovs/tests/test_kv.py
index c5b66de887b..76887498a57 100644
--- a/python/ovs/tests/test_kv.py
+++ b/python/ovs/tests/test_kv.py
@@ -1,6 +1,9 @@
import pytest
-from ovs.flow.kv import KVParser, KeyValue
+from ovs.flow.kv import KVParser, KVDecoders, KeyValue
+from ovs.flow.decoders import decode_default
+
+decoders = KVDecoders(default=lambda k, v: (k, decode_default(v)))
@pytest.mark.parametrize(
@@ -9,7 +12,7 @@
(
(
"cookie=0x0, duration=147566.365s, table=0, n_packets=39, n_bytes=2574, idle_age=65534, hard_age=65534", # noqa: E501
- None,
+ decoders,
),
[
KeyValue("cookie", 0),
@@ -24,7 +27,7 @@
(
(
"load:0x4->NXM_NX_REG13[],load:0x9->NXM_NX_REG11[],load:0x8->NXM_NX_REG12[],load:0x1->OXM_OF_METADATA[],load:0x1->NXM_NX_REG14[],mod_dl_src:0a:58:a9:fe:00:02,resubmit(,8)", # noqa: E501
- None,
+ decoders,
),
[
KeyValue("load", "0x4->NXM_NX_REG13[]"),
@@ -36,20 +39,17 @@
KeyValue("resubmit", ",8"),
],
),
+ (("l1(l2(l3(l4())))", decoders), [KeyValue("l1", "l2(l3(l4()))")]),
(
- ("l1(l2(l3(l4())))", None),
- [KeyValue("l1", "l2(l3(l4()))")]
- ),
- (
- ("l1(l2(l3(l4()))),foo:bar", None),
+ ("l1(l2(l3(l4()))),foo:bar", decoders),
[KeyValue("l1", "l2(l3(l4()))"), KeyValue("foo", "bar")],
),
(
- ("enqueue:1:2,output=2", None),
+ ("enqueue:1:2,output=2", decoders),
[KeyValue("enqueue", "1:2"), KeyValue("output", 2)],
),
(
- ("value_to_reg(100)->someReg[10],foo:bar", None),
+ ("value_to_reg(100)->someReg[10],foo:bar", decoders),
[
KeyValue("value_to_reg", "(100)->someReg[10]"),
KeyValue("foo", "bar"),
diff --git a/python/ovs/tests/test_ofp.py b/python/ovs/tests/test_ofp.py
index 389c4544a2e..328ab7285ea 100644
--- a/python/ovs/tests/test_ofp.py
+++ b/python/ovs/tests/test_ofp.py
@@ -2,7 +2,7 @@
import pytest
from ovs.flow.ofp import OFPFlow
-from ovs.flow.kv import KeyValue
+from ovs.flow.kv import KeyValue, ParseError
from ovs.flow.decoders import EthMask, IPMask, decode_mask
@@ -509,11 +509,37 @@
),
],
),
+ (
+ "actions=doesnotexist(1234)",
+ ParseError,
+ ),
+ (
+ "actions=learn(eth_type=nofield)",
+ ParseError,
+ ),
+ (
+ "actions=learn(nofield=eth_type)",
+ ParseError,
+ ),
+ (
+ "nofield=0x123 actions=drop",
+ ParseError,
+ ),
+ (
+ "actions=load:0x12334->NOFILED",
+ ParseError,
+ ),
],
)
def test_act(input_string, expected):
+ if isinstance(expected, type):
+ with pytest.raises(expected):
+ ofp = OFPFlow(input_string)
+ return
+
ofp = OFPFlow(input_string)
actions = ofp.actions_kv
+
for i in range(len(expected)):
assert expected[i].key == actions[i].key
assert expected[i].value == actions[i].value
From 75a6e8db9c5f9dc2887cae1555d977f0fdf08471 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:46 +0100
Subject: [PATCH 083/833] python: Return list of actions for odp action clone.
Sometimes we don't want to return the result of a nested key-value
decoding as a dictionary but as a list of dictionaries. This happens
when we parse actions where keys can be repeated.
Refactor the code in ofp_act.py that already handles this into kv.py
and use it for the datapath "clone" action.
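A hedged sketch of the new is_list mode, mirroring the
clone(recirc(0x1),recirc(0x2)) test case added below:
    from ovs.flow.decoders import decode_int
    from ovs.flow.kv import KVDecoders, nested_kv_decoder
    clone_body = nested_kv_decoder(KVDecoders({"recirc": decode_int}),
                                   is_list=True)
    print(clone_body("recirc(0x1),recirc(0x2)"))
    # -> [{'recirc': 1}, {'recirc': 2}]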
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
python/ovs/flow/kv.py | 21 +++++++++++++++++++-
python/ovs/flow/odp.py | 6 ++++--
python/ovs/flow/ofp.py | 14 ++++++-------
python/ovs/flow/ofp_act.py | 18 +----------------
python/ovs/tests/test_odp.py | 38 +++++++++++++++++++++++++-----------
5 files changed, 59 insertions(+), 38 deletions(-)
diff --git a/python/ovs/flow/kv.py b/python/ovs/flow/kv.py
index 32463254b07..3138db00880 100644
--- a/python/ovs/flow/kv.py
+++ b/python/ovs/flow/kv.py
@@ -320,7 +320,26 @@ def decode_nested_kv(decoders, value):
return {kv.key: kv.value for kv in parser.kv()}
-def nested_kv_decoder(decoders=None):
+def decode_nested_kv_list(decoders, value):
+ """A key-value decoder that extracts nested key-value pairs and returns
+ them in a list of dictionary.
+
+ Args:
+ decoders (KVDecoders): The KVDecoders to use.
+ value (str): The value string to decode.
+ """
+ if not value:
+ # Mark as flag
+ return True
+
+ parser = KVParser(value, decoders)
+ parser.parse()
+ return [{kv.key: kv.value} for kv in parser.kv()]
+
+
+def nested_kv_decoder(decoders=None, is_list=False):
"""Helper function that creates a nested kv decoder with given
KVDecoders."""
+ if is_list:
+ return functools.partial(decode_nested_kv_list, decoders)
return functools.partial(decode_nested_kv, decoders)
diff --git a/python/ovs/flow/odp.py b/python/ovs/flow/odp.py
index 3bc3aec8e00..db63afc8d64 100644
--- a/python/ovs/flow/odp.py
+++ b/python/ovs/flow/odp.py
@@ -337,7 +337,8 @@ def _action_decoders_args():
}
_decoders["clone"] = nested_kv_decoder(
- KVDecoders(decoders=_decoders, default_free=decode_free_output)
+ KVDecoders(decoders=_decoders, default_free=decode_free_output),
+ is_list=True,
)
return {
@@ -350,7 +351,8 @@ def _action_decoders_args():
KVDecoders(
decoders=_decoders,
default_free=decode_free_output,
- )
+ ),
+ is_list=True,
),
}
)
diff --git a/python/ovs/flow/ofp.py b/python/ovs/flow/ofp.py
index 3d79ed6ad77..8f272736173 100644
--- a/python/ovs/flow/ofp.py
+++ b/python/ovs/flow/ofp.py
@@ -31,7 +31,6 @@
decode_dec_ttl,
decode_chk_pkt_larger,
decode_zone,
- decode_exec,
decode_learn,
)
@@ -336,8 +335,7 @@ def _fw_action_decoders_args():
"table": decode_int,
"nat": decode_nat,
"force": decode_flag,
- "exec": functools.partial(
- decode_exec,
+ "exec": nested_kv_decoder(
KVDecoders(
{
**OFPFlow._encap_actions_decoders_args(),
@@ -345,6 +343,7 @@ def _fw_action_decoders_args():
**OFPFlow._meta_action_decoders_args(),
}
),
+ is_list=True,
),
"alg": decode_default,
}
@@ -359,6 +358,7 @@ def _fw_action_decoders_args():
}
)
),
+ # learn moved to _clone actions.
}
@staticmethod
@@ -400,11 +400,11 @@ def _clone_actions_decoders_args(action_decoders):
"""
return {
"learn": decode_learn(action_decoders),
- "clone": functools.partial(
- decode_exec, KVDecoders(action_decoders)
+ "clone": nested_kv_decoder(
+ KVDecoders(action_decoders), is_list=True
),
- "write_actions": functools.partial(
- decode_exec, KVDecoders(action_decoders)
+ "write_actions": nested_kv_decoder(
+ KVDecoders(action_decoders), is_list=True
),
}
diff --git a/python/ovs/flow/ofp_act.py b/python/ovs/flow/ofp_act.py
index c481d6fc721..5eaf0b2185a 100644
--- a/python/ovs/flow/ofp_act.py
+++ b/python/ovs/flow/ofp_act.py
@@ -1,8 +1,5 @@
"""Defines decoders for OpenFlow actions.
"""
-
-import functools
-
from ovs.flow.decoders import (
decode_default,
decode_time,
@@ -258,19 +255,6 @@ def decode_zone(value):
return decode_field(value)
-def decode_exec(action_decoders, value):
- """Decodes the value of the 'exec' keyword (part of the ct action).
-
- Args:
- decode_actions (KVDecoders): The decoders to be used to decode the
- nested exec.
- value (string): The string to be decoded.
- """
- exec_parser = KVParser(value, action_decoders)
- exec_parser.parse()
- return [{kv.key: kv.value} for kv in exec_parser.kv()]
-
-
def decode_learn(action_decoders):
"""Create the decoder to be used to decode the 'learn' action.
@@ -338,4 +322,4 @@ def learn_field_decoding_free(key):
default_free=learn_field_decoding_free,
)
- return functools.partial(decode_exec, learn_decoder)
+ return nested_kv_decoder(learn_decoder, is_list=True)
diff --git a/python/ovs/tests/test_odp.py b/python/ovs/tests/test_odp.py
index 715be386940..f8017ca8a16 100644
--- a/python/ovs/tests/test_odp.py
+++ b/python/ovs/tests/test_odp.py
@@ -453,21 +453,37 @@ def test_odp_fields(input_string, expected):
],
),
(
- "actions:clone(1)" ",clone(clone(push_vlan(vid=12,pcp=0),2),1)",
+ "actions:clone(1),clone(clone(push_vlan(vid=12,pcp=0),2),1)",
[
- KeyValue("clone", {"output": {"port": 1}}),
+ KeyValue("clone", [{"output": {"port": 1}}]),
KeyValue(
"clone",
- {
- "output": {"port": 1},
- "clone": {
- "push_vlan": {
- "vid": 12,
- "pcp": 0,
- },
- "output": {"port": 2},
+ [
+ {
+ "clone": [
+ {
+ "push_vlan": {
+ "vid": 12,
+ "pcp": 0,
+ },
+ },
+ {"output": {"port": 2}},
+ ]
},
- },
+ {"output": {"port": 1}},
+ ],
+ ),
+ ],
+ ),
+ (
+ "actions:clone(recirc(0x1),recirc(0x2))",
+ [
+ KeyValue(
+ "clone",
+ [
+ {"recirc": 1},
+ {"recirc": 2},
+ ],
),
],
),
From 1850e5e6891282d84bdeb7b7100166cfd8deed28 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:47 +0100
Subject: [PATCH 084/833] python: Support case-insensitive OpenFlow actions.
OpenFlow action names can be capitalized, so add support for
case-insensitive KVDecoders and use them for OpenFlow actions.
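A hedged usage sketch, mirroring the new test case added below:
    from ovs.flow.ofp import OFPFlow
    flow = OFPFlow("actions=POP_VLAN,push_vlan:0x8100,NORMAL")
    # flow.actions_kv[0] -> KeyValue("POP_VLAN", True), regardless of case.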
Signed-off-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
python/ovs/flow/kv.py | 17 ++++++++++++++---
python/ovs/flow/ofp.py | 7 ++++---
python/ovs/tests/test_ofp.py | 15 +++++++++++++++
3 files changed, 33 insertions(+), 6 deletions(-)
diff --git a/python/ovs/flow/kv.py b/python/ovs/flow/kv.py
index 3138db00880..f7d7be0cf1e 100644
--- a/python/ovs/flow/kv.py
+++ b/python/ovs/flow/kv.py
@@ -105,10 +105,17 @@ class KVDecoders(object):
strict = True
- def __init__(self, decoders=None, default=None, default_free=None):
- self._decoders = decoders or dict()
+ def __init__(self, decoders=None, default=None, default_free=None,
+ ignore_case=False):
+ if not decoders:
+ self._decoders = dict()
+ elif ignore_case:
+ self._decoders = {k.lower(): v for k, v in decoders.items()}
+ else:
+ self._decoders = decoders
self._default = default
self._default_free = default_free or self._default_free_decoder
+ self._ignore_case = ignore_case
def decode(self, keyword, value_str):
"""Decode a keyword and value.
@@ -121,7 +128,11 @@ def decode(self, keyword, value_str):
The key (str) and value(any) to be stored.
"""
- decoder = self._decoders.get(keyword)
+ decoder = None
+ if self._ignore_case:
+ decoder = self._decoders.get(keyword.lower())
+ else:
+ decoder = self._decoders.get(keyword)
if decoder:
result = decoder(value_str)
if isinstance(result, KeyValue):
diff --git a/python/ovs/flow/ofp.py b/python/ovs/flow/ofp.py
index 8f272736173..bf832f71b98 100644
--- a/python/ovs/flow/ofp.py
+++ b/python/ovs/flow/ofp.py
@@ -246,7 +246,8 @@ def _gen_action_decoders():
}
clone_actions = OFPFlow._clone_actions_decoders_args(actions)
actions.update(clone_actions)
- return KVDecoders(actions, default_free=decode_free_output)
+ return KVDecoders(actions, default_free=decode_free_output,
+ ignore_case=True)
@staticmethod
def _output_actions_decoders_args():
@@ -401,10 +402,10 @@ def _clone_actions_decoders_args(action_decoders):
return {
"learn": decode_learn(action_decoders),
"clone": nested_kv_decoder(
- KVDecoders(action_decoders), is_list=True
+ KVDecoders(action_decoders, ignore_case=True), is_list=True
),
"write_actions": nested_kv_decoder(
- KVDecoders(action_decoders), is_list=True
+ KVDecoders(action_decoders, ignore_case=True), is_list=True
),
}
diff --git a/python/ovs/tests/test_ofp.py b/python/ovs/tests/test_ofp.py
index 328ab7285ea..5aa8d591bf6 100644
--- a/python/ovs/tests/test_ofp.py
+++ b/python/ovs/tests/test_ofp.py
@@ -509,6 +509,21 @@
),
],
),
+ (
+ "actions=POP_VLAN,push_vlan:0x8100,NORMAL,clone(MOD_NW_SRC:192.168.1.1,resubmit(,10))", # noqa: E501
+ [
+ KeyValue("POP_VLAN", True),
+ KeyValue("push_vlan", 0x8100),
+ KeyValue("output", {"port": "NORMAL"}),
+ KeyValue(
+ "clone",
+ [
+ {"MOD_NW_SRC": netaddr.IPAddress("192.168.1.1")},
+ {"resubmit": {"port": "", "table": 10}},
+ ]
+ ),
+ ],
+ ),
(
"actions=doesnotexist(1234)",
ParseError,
From 542fdad701403c11cfe8356957f934fa657c1742 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:48 +0100
Subject: [PATCH 085/833] python: Fix output=CONTROLLER action.
When CONTROLLER is used as a free key, it means output=CONTROLLER, which
is handled by decode_controller. However, it must output the KV in the
right format: "output": {"port": "CONTROLLER"}.
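A hedged sketch, mirroring the updated test case:
    from ovs.flow.ofp import OFPFlow
    kv = OFPFlow("actions=controller,controller:200").actions_kv
    # kv[0] -> KeyValue("output", {"port": "CONTROLLER"})
    # kv[1] -> KeyValue("controller", {"max_len": 200})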
Signed-off-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
python/ovs/flow/ofp_act.py | 2 +-
python/ovs/tests/test_ofp.py | 10 +++++++++-
2 files changed, 10 insertions(+), 2 deletions(-)
diff --git a/python/ovs/flow/ofp_act.py b/python/ovs/flow/ofp_act.py
index 5eaf0b2185a..c540443eaea 100644
--- a/python/ovs/flow/ofp_act.py
+++ b/python/ovs/flow/ofp_act.py
@@ -35,7 +35,7 @@ def decode_output(value):
def decode_controller(value):
"""Decodes the controller action."""
if not value:
- return KeyValue("output", "controller")
+ return KeyValue("output", {"port": "CONTROLLER"})
else:
# Try controller:max_len
try:
diff --git a/python/ovs/tests/test_ofp.py b/python/ovs/tests/test_ofp.py
index 5aa8d591bf6..e17188e2b44 100644
--- a/python/ovs/tests/test_ofp.py
+++ b/python/ovs/tests/test_ofp.py
@@ -22,7 +22,7 @@
(
"actions=controller,controller:200",
[
- KeyValue("output", "controller"),
+ KeyValue("output", {"port": "CONTROLLER"}),
KeyValue("controller", {"max_len": 200}),
],
),
@@ -524,6 +524,14 @@
),
],
),
+ (
+ "actions=MOD_NW_SRC:192.168.1.1,CONTROLLER,CONTROLLER:123",
+ [
+ KeyValue("MOD_NW_SRC", netaddr.IPAddress("192.168.1.1")),
+ KeyValue("output", {"port": "CONTROLLER"}),
+ KeyValue("CONTROLLER", {"max_len": 123}),
+ ],
+ ),
(
"actions=doesnotexist(1234)",
ParseError,
From c395e9810e07ab957676b4f75e9cacd39dca6839 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:49 +0100
Subject: [PATCH 086/833] python: Interpret free keys as output in clone.
clone-like actions can also output to ports by specifying the port name.
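A hedged sketch, mirroring the new test case added below:
    from ovs.flow.ofp import OFPFlow
    kv = OFPFlow("actions=LOCAL,clone(myport,CONTROLLER)").actions_kv
    # kv[1].value -> [{"output": {"port": "myport"}},
    #                 {"output": {"port": "CONTROLLER"}}]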
Signed-off-by: Adrian Moreno
Signed-off-by: Ilya Maximets
---
python/ovs/flow/ofp.py | 6 ++++--
python/ovs/tests/test_ofp.py | 13 +++++++++++++
2 files changed, 17 insertions(+), 2 deletions(-)
diff --git a/python/ovs/flow/ofp.py b/python/ovs/flow/ofp.py
index bf832f71b98..eac8d08513f 100644
--- a/python/ovs/flow/ofp.py
+++ b/python/ovs/flow/ofp.py
@@ -402,10 +402,12 @@ def _clone_actions_decoders_args(action_decoders):
return {
"learn": decode_learn(action_decoders),
"clone": nested_kv_decoder(
- KVDecoders(action_decoders, ignore_case=True), is_list=True
+ KVDecoders(action_decoders, default_free=decode_free_output,
+ ignore_case=True), is_list=True
),
"write_actions": nested_kv_decoder(
- KVDecoders(action_decoders, ignore_case=True), is_list=True
+ KVDecoders(action_decoders, default_free=decode_free_output,
+ ignore_case=True), is_list=True
),
}
diff --git a/python/ovs/tests/test_ofp.py b/python/ovs/tests/test_ofp.py
index e17188e2b44..27bcf0c47cb 100644
--- a/python/ovs/tests/test_ofp.py
+++ b/python/ovs/tests/test_ofp.py
@@ -532,6 +532,19 @@
KeyValue("CONTROLLER", {"max_len": 123}),
],
),
+ (
+ "actions=LOCAL,clone(myport,CONTROLLER)",
+ [
+ KeyValue("output", {"port": "LOCAL"}),
+ KeyValue(
+ "clone",
+ [
+ {"output": {"port": "myport"}},
+ {"output": {"port": "CONTROLLER"}},
+ ]
+ ),
+ ],
+ ),
(
"actions=doesnotexist(1234)",
ParseError,
From fc3f918cb56110884092106af8723ff24e63a9c2 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:50 +0100
Subject: [PATCH 087/833] tests: Verify flows in ofp-actions are parseable.
Create a small helper script and check that flows used in ofp-actions.at
are parseable.
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
tests/automake.mk | 2 ++
tests/ofp-actions.at | 18 +++++++++++++++++
tests/test-ofparse.py | 45 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 65 insertions(+)
create mode 100755 tests/test-ofparse.py
diff --git a/tests/automake.mk b/tests/automake.mk
index d509cf93504..63a0490adfb 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -19,6 +19,7 @@ EXTRA_DIST += \
$(OVSDB_CLUSTER_TESTSUITE) \
tests/atlocal.in \
$(srcdir)/package.m4 \
+ $(srcdir)/tests/test-ofparse.py \
$(srcdir)/tests/testsuite \
$(srcdir)/tests/testsuite.patch
@@ -522,6 +523,7 @@ CHECK_PYFILES = \
tests/test-json.py \
tests/test-jsonrpc.py \
tests/test-l7.py \
+ tests/test-ofparse.py \
tests/test-ovsdb.py \
tests/test-reconnect.py \
tests/test-stream.py \
diff --git a/tests/ofp-actions.at b/tests/ofp-actions.at
index 9d820eba6d4..40a23bb15dc 100644
--- a/tests/ofp-actions.at
+++ b/tests/ofp-actions.at
@@ -329,6 +329,7 @@ AT_CAPTURE_FILE([experr])
AT_CHECK(
[ovs-ofctl '-vPATTERN:console:%c|%p|%m' parse-actions OpenFlow10 < input.txt],
[0], [expout], [experr])
+AT_CHECK([cat expout | grep 'actions=' | test-ofparse.py])
AT_CLEANUP
AT_SETUP([OpenFlow 1.0 "instruction" translations])
@@ -359,6 +360,7 @@ AT_CAPTURE_FILE([experr])
AT_CHECK(
[ovs-ofctl '-vPATTERN:console:%c|%p|%m' parse-instructions OpenFlow10 < input.txt],
[0], [expout], [experr])
+AT_CHECK([cat expout | grep 'actions=' | test-ofparse.py])
AT_CLEANUP
AT_SETUP([OpenFlow 1.1 action translation])
@@ -502,6 +504,7 @@ AT_CAPTURE_FILE([experr])
AT_CHECK(
[ovs-ofctl '-vPATTERN:console:%c|%p|%m' parse-actions OpenFlow11 < input.txt],
[0], [expout], [experr])
+AT_CHECK([cat expout | grep 'actions=' | test-ofparse.py])
AT_CLEANUP
AT_SETUP([OpenFlow 1.1 instruction translation])
@@ -737,6 +740,7 @@ AT_CAPTURE_FILE([experr])
AT_CHECK(
[ovs-ofctl '-vPATTERN:console:%c|%p|%m' parse-actions OpenFlow12 < input.txt],
[0], [expout], [experr])
+AT_CHECK([cat expout | grep 'actions=' | test-ofparse.py])
AT_CLEANUP
dnl Our primary goal here is to verify OpenFlow 1.3-specific changes,
@@ -798,6 +802,7 @@ AT_CAPTURE_FILE([experr])
AT_CHECK(
[ovs-ofctl '-vPATTERN:console:%c|%p|%m' parse-actions OpenFlow13 < input.txt],
[0], [expout], [experr])
+AT_CHECK([cat expout | grep 'actions=' | test-ofparse.py])
AT_CLEANUP
dnl Our primary goal here is to verify that OpenFlow 1.5-specific changes,
@@ -827,17 +832,20 @@ AT_CAPTURE_FILE([experr])
AT_CHECK(
[ovs-ofctl '-vPATTERN:console:%c|%p|%m' parse-actions OpenFlow15 < input.txt],
[0], [expout], [experr])
+AT_CHECK([cat expout | grep 'actions=' | test-ofparse.py])
AT_CLEANUP
AT_SETUP([ofp-actions - inconsistent MPLS actions])
OVS_VSWITCHD_START
dnl OK: Use fin_timeout action on TCP flow
AT_CHECK([ovs-ofctl -O OpenFlow11 -vwarn add-flow br0 'tcp actions=fin_timeout(idle_timeout=1)'])
+AT_CHECK([echo 'tcp actions=fin_timeout(idle_timeout=1)' | test-ofparse.py])
dnl Bad: Use fin_timeout action on TCP flow that has been converted to MPLS
AT_CHECK([ovs-ofctl -O OpenFlow11 -vwarn add-flow br0 'tcp actions=push_mpls:0x8847,fin_timeout(idle_timeout=1)'],
[1], [], [dnl
ovs-ofctl: none of the usable flow formats (OpenFlow10,NXM) is among the allowed flow formats (OpenFlow11)
])
+AT_CHECK([echo 'tcp actions=push_mpls:0x8847,fin_timeout(idle_timeout=1)' | test-ofparse.py])
OVS_VSWITCHD_STOP
AT_CLEANUP
@@ -853,6 +861,8 @@ AT_CHECK([ovs-ofctl -O OpenFlow10 dump-flows br0 | ofctl_strip], [0], [dnl
NXST_FLOW reply:
mpls actions=load:0xa->OXM_OF_MPLS_LABEL[[]]
])
+AT_CHECK([echo 'mpls actions=set_field:10->mpls_label' | test-ofparse.py])
+AT_CHECK([echo 'mpls actions=load:0xa->OXM_OF_MPLS_LABEL[[]]'| test-ofparse.py])
OVS_VSWITCHD_STOP
AT_CLEANUP
@@ -862,14 +872,17 @@ OVS_VSWITCHD_START
dnl OpenFlow 1.0 has an "enqueue" action. For OpenFlow 1.1+, we translate
dnl it to a series of actions that accomplish the same thing.
AT_CHECK([ovs-ofctl -O OpenFlow10 add-flow br0 'actions=enqueue(123,456)'])
+AT_CHECK([echo 'actions=enqueue(123,456)' | test-ofparse.py])
AT_CHECK([ovs-ofctl -O OpenFlow10 dump-flows br0 | ofctl_strip], [0], [dnl
NXST_FLOW reply:
actions=enqueue:123:456
])
+AT_CHECK([echo 'actions=enqueue:123:456' | test-ofparse.py])
AT_CHECK([ovs-ofctl -O OpenFlow13 dump-flows br0 | ofctl_strip], [0], [dnl
OFPST_FLOW reply (OF1.3):
reset_counts actions=set_queue:456,output:123,pop_queue
])
+AT_CHECK([echo 'actions=set_queue:456,output:123,pop_queue' | test-ofparse.py])
OVS_VSWITCHD_STOP
AT_CLEANUP
@@ -887,6 +900,8 @@ AT_CHECK([ovs-ofctl -O OpenFlow11 dump-flows br0 | ofctl_strip], [0], [dnl
OFPST_FLOW reply (OF1.1):
ip actions=mod_nw_ttl:123
])
+AT_CHECK([echo 'ip,actions=mod_nw_ttl:123' | test-ofparse.py])
+AT_CHECK([echo 'ip actions=load:0x7b->NXM_NX_IP_TTL[[]]' | test-ofparse.py])
OVS_VSWITCHD_STOP
AT_CLEANUP
@@ -898,10 +913,12 @@ dnl OpenFlow 1.1, but no other version, has a "mod_nw_ecn" action.
dnl Check that we translate it properly for OF1.0 and OF1.2.
dnl (OF1.3+ should be the same as OF1.2.)
AT_CHECK([ovs-ofctl -O OpenFlow11 add-flow br0 'ip,actions=mod_nw_ecn:2'])
+AT_CHECK([echo 'ip,actions=mod_nw_ecn:2' | test-ofparse.py])
AT_CHECK([ovs-ofctl -O OpenFlow10 dump-flows br0 | ofctl_strip], [0], [dnl
NXST_FLOW reply:
ip actions=load:0x2->NXM_NX_IP_ECN[[]]
])
+AT_CHECK([echo 'ip actions=load:0x2->NXM_NX_IP_ECN[[]]' | test-ofparse.py])
AT_CHECK([ovs-ofctl -O OpenFlow11 dump-flows br0 | ofctl_strip], [0], [dnl
OFPST_FLOW reply (OF1.1):
ip actions=mod_nw_ecn:2
@@ -910,6 +927,7 @@ AT_CHECK([ovs-ofctl -O OpenFlow12 dump-flows br0 | ofctl_strip], [0], [dnl
OFPST_FLOW reply (OF1.2):
ip actions=set_field:2->nw_ecn
])
+AT_CHECK([echo 'ip actions=set_field:2->nw_ecn' | test-ofparse.py])
dnl Check that OF1.2+ set_field to set ECN is translated into the OF1.1
dnl mod_nw_ecn action.
diff --git a/tests/test-ofparse.py b/tests/test-ofparse.py
new file mode 100755
index 00000000000..ba96e8344c2
--- /dev/null
+++ b/tests/test-ofparse.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python3
+# Copyright (c) 2022 Red Hat, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""test-ofparse reads flows from stdin and tries to parse them using
+the python flow parsing library.
+"""
+
+import fileinput
+import sys
+
+try:
+ from ovs.flow.ofp import OFPFlow
+except ImportError:
+ sys.exit(0)
+
+
+def main():
+ for flow in fileinput.input():
+ try:
+ result_flow = OFPFlow(flow)
+ if flow != str(result_flow):
+ print("in: {}".format(flow))
+ print("out: {}".format(str(result_flow)))
+ raise ValueError("Flow conversion back to string failed")
+ except Exception as e:
+ print("Error parsing flow {}: {}".format(flow, e))
+ return 1
+
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(main())
From 22eb2243864d42580dd1447cf09906d4d34fbb68 Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:51 +0100
Subject: [PATCH 088/833] tests: Verify flows in odp.at are parseable.
Create a small helper script and check that flows tested in odp.at are
parseable.
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
tests/automake.mk | 2 ++
tests/odp.at | 12 +++++++++++-
tests/test-dpparse.py | 45 +++++++++++++++++++++++++++++++++++++++++++
3 files changed, 58 insertions(+), 1 deletion(-)
create mode 100755 tests/test-dpparse.py
diff --git a/tests/automake.mk b/tests/automake.mk
index 63a0490adfb..4091a2796d8 100644
--- a/tests/automake.mk
+++ b/tests/automake.mk
@@ -19,6 +19,7 @@ EXTRA_DIST += \
$(OVSDB_CLUSTER_TESTSUITE) \
tests/atlocal.in \
$(srcdir)/package.m4 \
+ $(srcdir)/tests/test-dpparse.py \
$(srcdir)/tests/test-ofparse.py \
$(srcdir)/tests/testsuite \
$(srcdir)/tests/testsuite.patch
@@ -520,6 +521,7 @@ CHECK_PYFILES = \
tests/mfex_fuzzy.py \
tests/ovsdb-monitor-sort.py \
tests/test-daemon.py \
+ tests/test-dpparse.py \
tests/test-json.py \
tests/test-jsonrpc.py \
tests/test-l7.py \
diff --git a/tests/odp.at b/tests/odp.at
index 88b7cfd917f..41eb726e922 100644
--- a/tests/odp.at
+++ b/tests/odp.at
@@ -104,9 +104,9 @@ dnl specified. We can skip these.
sed -i'back' 's/\(skb_mark(0)\),\(ct\)/\1,ct_state(0),ct_zone(0),\2/' odp-out.txt
sed -i'back' 's/\(skb_mark([[^)]]*)\),\(recirc\)/\1,ct_state(0),ct_zone(0),ct_mark(0),ct_label(0),\2/' odp-out.txt
sed -i'back' 's/\(in_port(1)\),\(eth\)/\1,packet_type(ns=0,id=0),\2/' odp-out.txt
-
AT_CHECK_UNQUOTED([ovstest test-odp parse-keys < odp-in.txt], [0], [`cat odp-out.txt`
])
+AT_CHECK_UNQUOTED([cat odp-in.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CLEANUP
AT_SETUP([OVS datapath wildcarded key parsing and formatting - valid forms])
@@ -194,6 +194,7 @@ sed -n 's/,frag=no),.*/,frag=later)/p' odp-base.txt
AT_CAPTURE_FILE([odp.txt])
AT_CHECK_UNQUOTED([ovstest test-odp parse-wc-keys < odp.txt], [0], [`cat odp.txt`
])
+AT_CHECK_UNQUOTED([cat odp.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CLEANUP
AT_SETUP([OVS datapath wildcarded key filtering.])
@@ -241,24 +242,31 @@ in_port(1),eth(src=00:01:02:03:04:05,dst=10:11:12:13:14:15),eth_type(0x86dd),ipv
])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='dl_type=0x1235' < odp-base.txt], [0], [`cat odp-eth-type.txt`
])
+AT_CHECK_UNQUOTED([cat odp-eth-type.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='dl_vlan=99' < odp-vlan-base.txt], [0], [`cat odp-vlan.txt`
])
+AT_CHECK_UNQUOTED([cat odp-vlan.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='dl_vlan=99,ip' < odp-vlan-base.txt], [0], [`cat odp-vlan.txt`
])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='ip,nw_src=35.8.2.199' < odp-base.txt], [0], [`cat odp-ipv4.txt`
])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='ip,nw_dst=172.16.0.199' < odp-base.txt], [0], [`cat odp-ipv4.txt`
])
+AT_CHECK_UNQUOTED([cat odp-ipv4.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='dl_type=0x0800,nw_src=35.8.2.199,nw_dst=172.16.0.199' < odp-base.txt], [0], [`cat odp-ipv4.txt`
])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='icmp,nw_src=35.8.2.199' < odp-base.txt], [0], [`cat odp-icmp.txt`
])
+AT_CHECK_UNQUOTED([cat odp-icmp.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='arp,arp_spa=1.2.3.5' < odp-base.txt], [0], [`cat odp-arp.txt`
])
+AT_CHECK_UNQUOTED([cat odp-arp.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='tcp,tp_src=90' < odp-base.txt], [0], [`cat odp-tcp.txt`
])
+AT_CHECK_UNQUOTED([cat odp-tcp.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CHECK_UNQUOTED([ovstest test-odp parse-filter filter='tcp6,tp_src=90' < odp-base.txt], [0], [`cat odp-tcp6.txt`
])
+AT_CHECK_UNQUOTED([cat odp-tcp6.txt | sed 's/^#.*//' | sed 's/$/ actions:drop/' | test-dpparse.py])
AT_CLEANUP
AT_SETUP([OVS datapath actions parsing and formatting - valid forms])
@@ -391,6 +399,7 @@ add_mpls(label=200,tc=7,ttl=64,bos=1,eth_type=0x8847)
AT_CHECK_UNQUOTED([ovstest test-odp parse-actions < actions.txt], [0],
[`cat actions.txt`
])
+AT_CHECK_UNQUOTED([cat actions.txt | sed 's/^/actions:/' | test-dpparse.py])
AT_CLEANUP
AT_SETUP([OVS datapath actions parsing and formatting - invalid forms])
@@ -436,6 +445,7 @@ odp_actions_from_string: error
`cat actions.txt | head -3 | tail -1`
odp_actions_from_string: error
])
+AT_CHECK_UNQUOTED([cat actions.txt | sed 's/^/actions:/' | test-dpparse.py])
AT_CLEANUP
AT_SETUP([OVS datapath actions parsing and formatting - actions too long])
diff --git a/tests/test-dpparse.py b/tests/test-dpparse.py
new file mode 100755
index 00000000000..7762e5e8a90
--- /dev/null
+++ b/tests/test-dpparse.py
@@ -0,0 +1,45 @@
+#!/usr/bin/env python3
+# Copyright (c) 2022 Red Hat, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+"""test-dpparse reads flows from stdin and tries to parse them using
+the python flow parsing library.
+"""
+
+import fileinput
+import sys
+
+try:
+ from ovs.flow.odp import ODPFlow
+except ImportError:
+ sys.exit(0)
+
+
+def main():
+ for flow in fileinput.input():
+ try:
+ result_flow = ODPFlow(flow)
+ if flow != str(result_flow):
+ print("in: {}".format(flow))
+ print("out: {}".format(str(result_flow)))
+ raise ValueError("Flow conversion back to string failed")
+ except Exception as e:
+ print("Error parsing flow {}: {}".format(flow, e))
+ return 1
+
+ return 0
+
+
+if __name__ == "__main__":
+ sys.exit(main())
From 863d2e1a8c2a6ced49a49024c094ef6a9aa7e55a Mon Sep 17 00:00:00 2001
From: Adrian Moreno
Date: Mon, 19 Dec 2022 17:13:52 +0100
Subject: [PATCH 089/833] python: Don't exit OFPFlow constructor.
Returning None from a constructor does not make sense and is
error-prone. Remove what was left over from an attempt to handle a
common error case: trying to parse the header lines that ovs-ofctl
commonly prints. Such filtering should be done by the caller anyway.
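A minimal sketch of the caller-side filtering this implies
(illustrative only; assumes the ovs python package is importable):
```python
from ovs.flow.ofp import OFPFlow

ofctl_output = """NXST_FLOW reply (xid=0x4):
 ip actions=mod_nw_ttl:123
"""

flows = []
for line in ofctl_output.splitlines():
    # Header lines such as "NXST_FLOW reply ..." are not flows; the
    # caller is expected to skip them instead of relying on the
    # constructor silently returning None.
    if " reply " in line or not line.strip():
        continue
    flows.append(OFPFlow(line))
```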
Signed-off-by: Adrian Moreno
Acked-by: Mike Pattrick
Signed-off-by: Ilya Maximets
---
python/ovs/flow/ofp.py | 3 ---
1 file changed, 3 deletions(-)
diff --git a/python/ovs/flow/ofp.py b/python/ovs/flow/ofp.py
index eac8d08513f..20231fd9f38 100644
--- a/python/ovs/flow/ofp.py
+++ b/python/ovs/flow/ofp.py
@@ -104,9 +104,6 @@ def __init__(self, ofp_string, id=None):
ValueError if the string is malformed.
ParseError if an error in parsing occurs.
"""
- if " reply " in ofp_string:
- return None
-
sections = list()
parts = ofp_string.split("actions=")
if len(parts) != 2:
From 685973a9f1cb2c9a49ea517a8feab7012a35a1fd Mon Sep 17 00:00:00 2001
From: Dan Williams
Date: Wed, 14 Dec 2022 10:29:16 -0600
Subject: [PATCH 090/833] ovsdb-server: Don't log when
memory-trim-on-compaction doesn't change.
But log at least once even if the value hasn't changed, for
informational purposes.
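A rough sketch of the intended behaviour (python pseudocode, not the
actual C implementation):
```python
have_logged = False

def set_trim_on_compaction(new_value, old_value):
    """Log on the first call and afterwards only when the value changes."""
    global have_logged
    if not have_logged or new_value != old_value:
        have_logged = True
        state = "enabled" if new_value else "disabled"
        print("memory trimming after compaction %s." % state)
    return new_value
```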
Signed-off-by: Dan Williams
Signed-off-by: Ilya Maximets
---
ovsdb/ovsdb-server.c | 9 +++++++--
1 file changed, 7 insertions(+), 2 deletions(-)
diff --git a/ovsdb/ovsdb-server.c b/ovsdb/ovsdb-server.c
index 7a6bfe0a03c..33ca4910d70 100644
--- a/ovsdb/ovsdb-server.c
+++ b/ovsdb/ovsdb-server.c
@@ -1600,6 +1600,8 @@ ovsdb_server_memory_trim_on_compaction(struct unixctl_conn *conn,
const char *argv[],
void *arg OVS_UNUSED)
{
+ bool old_trim_memory = trim_memory;
+ static bool have_logged = false;
const char *command = argv[1];
#if !HAVE_DECL_MALLOC_TRIM
@@ -1615,8 +1617,11 @@ ovsdb_server_memory_trim_on_compaction(struct unixctl_conn *conn,
unixctl_command_reply_error(conn, "invalid argument");
return;
}
- VLOG_INFO("memory trimming after compaction %s.",
- trim_memory ? "enabled" : "disabled");
+ if (!have_logged || (trim_memory != old_trim_memory)) {
+ have_logged = true;
+ VLOG_INFO("memory trimming after compaction %s.",
+ trim_memory ? "enabled" : "disabled");
+ }
unixctl_command_reply(conn, NULL);
}
From d5469cb743c284461739cb99c686dfbe92ded70c Mon Sep 17 00:00:00 2001
From: Eelco Chaudron
Date: Thu, 8 Dec 2022 10:48:06 +0100
Subject: [PATCH 091/833] Makefile: Add USDT scripts to make install and
fedora/debian test rpm.
This change installs all the USDT scripts into the
{_datadir}/openvswitch/scripts/usdt directory as part of the
make install command.
In addition, it adds them to the Fedora openvswitch-test RPM and the
Debian openvswitch-test package.
Signed-off-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
Makefile.am | 2 ++
debian/openvswitch-test.install | 1 +
rhel/openvswitch-fedora.spec.in | 1 +
utilities/automake.mk | 4 ++++
4 files changed, 8 insertions(+)
diff --git a/Makefile.am b/Makefile.am
index d4385386743..606bcc22e12 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -120,6 +120,7 @@ OVSIDL_BUILT =
pkgdata_DATA =
sbin_SCRIPTS =
scripts_SCRIPTS =
+usdt_SCRIPTS =
completion_SCRIPTS =
scripts_DATA =
SUFFIXES =
@@ -133,6 +134,7 @@ C ?= 1
endif
scriptsdir = $(pkgdatadir)/scripts
+usdtdir = $(pkgdatadir)/scripts/usdt
completiondir = $(sysconfdir)/bash_completion.d
pkgconfigdir = $(libdir)/pkgconfig
diff --git a/debian/openvswitch-test.install b/debian/openvswitch-test.install
index b3a80d86ae2..88c82528054 100644
--- a/debian/openvswitch-test.install
+++ b/debian/openvswitch-test.install
@@ -2,3 +2,4 @@ usr/bin/ovs-l3ping
usr/bin/ovs-test
usr/share/man/man8/ovs-l3ping.8
usr/share/man/man8/ovs-test.8
+usr/share/openvswitch/scripts/usdt/*
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 17aab796fca..8fc6e8ab233 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -396,6 +396,7 @@ fi
%{_bindir}/ovs-pcap
%{_bindir}/ovs-tcpdump
%{_bindir}/ovs-tcpundump
+%{_datadir}/openvswitch/scripts/usdt/*
%{_mandir}/man8/ovs-test.8*
%{_mandir}/man8/ovs-vlan-test.8*
%{_mandir}/man8/ovs-l3ping.8*
diff --git a/utilities/automake.mk b/utilities/automake.mk
index eb57653a1cd..132a16942e8 100644
--- a/utilities/automake.mk
+++ b/utilities/automake.mk
@@ -20,6 +20,10 @@ scripts_SCRIPTS += \
utilities/ovs-kmod-ctl \
utilities/ovs-save
scripts_DATA += utilities/ovs-lib
+usdt_SCRIPTS += \
+ utilities/usdt-scripts/bridge_loop.bt \
+ utilities/usdt-scripts/upcall_cost.py \
+ utilities/usdt-scripts/upcall_monitor.py
completion_SCRIPTS += \
utilities/ovs-appctl-bashcomp.bash \
From 9a86a3dd68f054d47e1a93b8dec03d51479554f4 Mon Sep 17 00:00:00 2001
From: David Marchand
Date: Wed, 21 Dec 2022 18:51:20 +0100
Subject: [PATCH 092/833] travis: Drop support.
Following a change in the terms of use, the free Travis credits are too
low for realistic usage by OVS contributors.
As a consequence, testing OVS with Travis has been abandoned by most
(if not all) contributors to the project.
Drop the Travis configuration from our repository, clean up references
in the documentation and move the GHA-specific bits to the associated
yml file.
Acked-by: Aaron Conole
Signed-off-by: David Marchand
Signed-off-by: Ilya Maximets
---
.ci/linux-build.sh | 31 +---------
.ci/linux-prepare.sh | 22 +------
.ci/osx-build.sh | 15 -----
.github/workflows/build-and-test.yml | 4 ++
.travis.yml | 57 -------------------
.../contributing/submitting-patches.rst | 7 +--
Documentation/topics/testing.rst | 40 -------------
Makefile.am | 1 -
NEWS | 2 +
README.rst | 2 -
10 files changed, 14 insertions(+), 167 deletions(-)
delete mode 100644 .travis.yml
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index 48510967238..c06186ce1cf 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -7,21 +7,6 @@ CFLAGS_FOR_OVS="-g -O2"
SPARSE_FLAGS=""
EXTRA_OPTS="--enable-Werror"
-on_exit() {
- if [ $? = 0 ]; then
- exit
- fi
- FILES_TO_PRINT="config.log"
- FILES_TO_PRINT="$FILES_TO_PRINT */_build/sub/tests/testsuite.log"
-
- for pr_file in $FILES_TO_PRINT; do
- cat "$pr_file" 2>/dev/null
- done
-}
-# We capture the error logs as artifacts in Github Actions, no need to dump
-# them via a EXIT handler.
-[ -n "$GITHUB_WORKFLOW" ] || trap on_exit EXIT
-
function install_kernel()
{
if [[ "$1" =~ ^5.* ]]; then
@@ -98,19 +83,9 @@ function install_kernel()
function install_dpdk()
{
local DPDK_VER=$1
- local VERSION_FILE="dpdk-dir/travis-dpdk-cache-version"
+ local VERSION_FILE="dpdk-dir/cached-version"
local DPDK_OPTS=""
- local DPDK_LIB=""
-
- if [ -z "$TRAVIS_ARCH" ] ||
- [ "$TRAVIS_ARCH" == "amd64" ]; then
- DPDK_LIB=$(pwd)/dpdk-dir/build/lib/x86_64-linux-gnu
- elif [ "$TRAVIS_ARCH" == "aarch64" ]; then
- DPDK_LIB=$(pwd)/dpdk-dir/build/lib/aarch64-linux-gnu
- else
- echo "Target is unknown"
- exit 1
- fi
+ local DPDK_LIB=$(pwd)/dpdk-dir/build/lib/x86_64-linux-gnu
if [ "$DPDK_SHARED" ]; then
EXTRA_OPTS="$EXTRA_OPTS --with-dpdk=shared"
@@ -245,7 +220,7 @@ elif [ "$M32" ]; then
# Adding m32 flag directly to CC to avoid any posiible issues with API/ABI
# difference on 'configure' and 'make' stages.
export CC="$CC -m32"
-elif [ "$TRAVIS_ARCH" != "aarch64" ]; then
+else
OPTS="--enable-sparse"
if [ "$AFXDP" ]; then
# netdev-afxdp uses memset for 64M for umem initialization.
diff --git a/.ci/linux-prepare.sh b/.ci/linux-prepare.sh
index 11d75a6d598..f414a879c70 100755
--- a/.ci/linux-prepare.sh
+++ b/.ci/linux-prepare.sh
@@ -10,14 +10,11 @@ fi
# Build and install sparse.
#
-# Explicitly disable sparse support for llvm because some travis
-# environments claim to have LLVM (llvm-config exists and works) but
-# linking against it fails.
# Disabling sqlite support because sindex build fails and we don't
# really need this utility being installed.
git clone git://git.kernel.org/pub/scm/devel/sparse/sparse.git
cd sparse
-make -j4 HAVE_LLVM= HAVE_SQLITE= install
+make -j4 HAVE_SQLITE= install
cd ..
# Installing wheel separately because it may be needed to build some
@@ -29,23 +26,8 @@ pip3 install --disable-pip-version-check --user \
flake8 'hacking>=3.0' netaddr pyparsing sphinx setuptools pyelftools
pip3 install --user 'meson==0.53.2'
-if [ "$M32" ]; then
- # Installing 32-bit libraries.
- pkgs="gcc-multilib"
- if [ -z "$GITHUB_WORKFLOW" ]; then
- # 32-bit and 64-bit libunwind can not be installed at the same time.
- # This will remove the 64-bit libunwind and install 32-bit version.
- # GitHub Actions doesn't have 32-bit versions of these libs.
- pkgs=$pkgs" libunwind-dev:i386 libunbound-dev:i386"
- fi
-
- sudo apt-get install -y $pkgs
-fi
-
# Install python test dependencies
pip3 install -r python/test_requirements.txt
-# IPv6 is supported by kernel but disabled in TravisCI images:
-# https://github.com/travis-ci/travis-ci/issues/8891
-# Enable it to avoid skipping of IPv6 related tests.
+# Make sure IPv6 is enabled to avoid skipping of IPv6 related tests.
sudo sysctl -w net.ipv6.conf.all.disable_ipv6=0
diff --git a/.ci/osx-build.sh b/.ci/osx-build.sh
index f8facebeb02..09df61826f1 100755
--- a/.ci/osx-build.sh
+++ b/.ci/osx-build.sh
@@ -5,21 +5,6 @@ set -o errexit
CFLAGS="-Werror $CFLAGS"
EXTRA_OPTS=""
-on_exit() {
- if [ $? = 0 ]; then
- exit
- fi
- FILES_TO_PRINT="config.log"
- FILES_TO_PRINT="$FILES_TO_PRINT */_build/sub/tests/testsuite.log"
-
- for pr_file in $FILES_TO_PRINT; do
- cat "$pr_file" 2>/dev/null
- done
-}
-# We capture the error logs as artifacts in Github Actions, no need to dump
-# them via a EXIT handler.
-[ -n "$GITHUB_WORKFLOW" ] || trap on_exit EXIT
-
function configure_ovs()
{
./boot.sh && ./configure $*
diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml
index e08d7b1bac1..1949d12001b 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -133,8 +133,12 @@ jobs:
- name: install common dependencies
run: sudo apt install -y ${{ env.dependencies }}
- name: install libunbound libunwind
+ # GitHub Actions doesn't have 32-bit versions of these libraries.
if: matrix.m32 == ''
run: sudo apt install -y libunbound-dev libunwind-dev
+ - name: install 32-bit libraries
+ if: matrix.m32 != ''
+ run: sudo apt install -y gcc-multilib
- name: prepare
run: ./.ci/linux-prepare.sh
diff --git a/.travis.yml b/.travis.yml
deleted file mode 100644
index c7aeede06e6..00000000000
--- a/.travis.yml
+++ /dev/null
@@ -1,57 +0,0 @@
-language: c
-
-os:
- - linux
-
-cache:
- directories:
- - dpdk-dir
-
-addons:
- apt:
- packages:
- - bc
- - libssl-dev
- - llvm-dev
- - libjemalloc1
- - libjemalloc-dev
- - libnuma-dev
- - libpcap-dev
- - python3-pip
- - python3-sphinx
- - libelf-dev
- - selinux-policy-dev
- - libunbound-dev
- - libunwind-dev
- - python3-setuptools
- - python3-wheel
- - ninja-build
-
-before_install: ./.ci/${TRAVIS_OS_NAME}-prepare.sh
-
-before_script: export PATH=$PATH:$HOME/bin
-
-matrix:
- include:
- - arch: arm64
- compiler: gcc
- env: TESTSUITE=1 DPDK=1
- - arch: arm64
- compiler: gcc
- env: KERNEL_LIST="5.5 4.19"
- - arch: arm64
- compiler: gcc
- env: KERNEL_LIST="4.9 3.16"
- - arch: arm64
- compiler: gcc
- env: DPDK_SHARED=1
- - arch: arm64
- compiler: clang
- env: OPTS="--disable-ssl"
-
-script: ./.ci/${TRAVIS_OS_NAME}-build.sh $OPTS
-
-notifications:
- email:
- recipients:
- - ovs-build@openvswitch.org
diff --git a/Documentation/internals/contributing/submitting-patches.rst b/Documentation/internals/contributing/submitting-patches.rst
index 9d718982712..8a8bc11b0a9 100644
--- a/Documentation/internals/contributing/submitting-patches.rst
+++ b/Documentation/internals/contributing/submitting-patches.rst
@@ -68,10 +68,9 @@ Testing is also important:
feature. A bug fix patch should preferably add a test that would
fail if the bug recurs.
-If you are using GitHub, then you may utilize the travis-ci.org and the GitHub
-Actions CI build systems. They will run some of the above tests automatically
-when you push changes to your repository. See the "Continuous Integration with
-Travis-CI" in :doc:`/topics/testing` for details on how to set it up.
+If you are using GitHub, then you may utilize the GitHub Actions CI build
+systems. They will run some of the above tests automatically
+when you push changes to your repository.
Email Subject
-------------
diff --git a/Documentation/topics/testing.rst b/Documentation/topics/testing.rst
index abccce1ee60..bc41b217a5c 100644
--- a/Documentation/topics/testing.rst
+++ b/Documentation/topics/testing.rst
@@ -474,46 +474,6 @@ You should invoke scan-view to view analysis results. The last line of output
from ``clang-analyze`` will list the command (containing results directory)
that you should invoke to view the results on a browser.
-Continuous Integration with Travis CI
--------------------------------------
-
-A .travis.yml file is provided to automatically build Open vSwitch with various
-build configurations and run the testsuite using Travis CI. Builds will be
-performed with gcc, sparse and clang with the -Werror compiler flag included,
-therefore the build will fail if a new warning has been introduced.
-
-The CI build is triggered via git push (regardless of the specific branch) or
-pull request against any Open vSwitch GitHub repository that is linked to
-travis-ci.
-
-Instructions to setup travis-ci for your GitHub repository:
-
-1. Go to https://travis-ci.org/ and sign in using your GitHub ID.
-2. Go to the "Repositories" tab and enable the ovs repository. You may disable
- builds for pushes or pull requests.
-3. In order to avoid forks sending build failures to the upstream mailing list,
- the notification email recipient is encrypted. If you want to receive email
- notification for build failures, replace the encrypted string:
-
- 1. Install the travis-ci CLI (Requires ruby >=2.0): gem install travis
- 2. In your Open vSwitch repository: travis encrypt mylist@mydomain.org
- 3. Add/replace the notifications section in .travis.yml and fill in the
- secure string as returned by travis encrypt::
-
- notifications:
- email:
- recipients:
- - secure: "....."
-
- .. note::
- You may remove/omit the notifications section to fall back to default
- notification behaviour which is to send an email directly to the author and
- committer of the failing commit. Note that the email is only sent if the
- author/committer have commit rights for the particular GitHub repository.
-
-4. Pushing a commit to the repository which breaks the build or the
- testsuite will now trigger a email sent to mylist@mydomain.org
-
vsperf
------
diff --git a/Makefile.am b/Makefile.am
index 606bcc22e12..e605187b813 100644
--- a/Makefile.am
+++ b/Makefile.am
@@ -81,7 +81,6 @@ EXTRA_DIST = \
.ci/osx-prepare.sh \
.cirrus.yml \
.github/workflows/build-and-test.yml \
- .travis.yml \
appveyor.yml \
boot.sh \
poc/builders/Vagrantfile \
diff --git a/NEWS b/NEWS
index c79d9f97dc4..c0095c345d1 100644
--- a/NEWS
+++ b/NEWS
@@ -20,6 +20,8 @@ Post-v3.0.0
* New option '--dump-hugepages' to include hugepages in core dumps. This
can assist with postmortem analysis involving DPDK, but may also produce
significantly larger core dump files.
+ - Support for travis-ci.org based continuous integration builds has been
+ dropped.
v3.0.0 - 15 Aug 2022
diff --git a/README.rst b/README.rst
index 8fe01f4cf23..a60a314feb3 100644
--- a/README.rst
+++ b/README.rst
@@ -8,8 +8,6 @@ Open vSwitch
.. image:: https://github.com/openvswitch/ovs/workflows/Build%20and%20Test/badge.svg
:target: https://github.com/openvswitch/ovs/actions
-.. image:: https://travis-ci.org/openvswitch/ovs.png
- :target: https://travis-ci.org/openvswitch/ovs
.. image:: https://ci.appveyor.com/api/projects/status/github/openvswitch/ovs?branch=master&svg=true&retina=true
:target: https://ci.appveyor.com/project/blp/ovs/history
.. image:: https://api.cirrus-ci.com/github/openvswitch/ovs.svg
From 526230bfab09095cf0214c7033382463b9d506cf Mon Sep 17 00:00:00 2001
From: Kevin Traynor
Date: Wed, 30 Nov 2022 17:39:52 +0000
Subject: [PATCH 093/833] dpif-netdev: Make pmd-rxq-show time configurable.
pmd-rxq-show shows the Rx queue to pmd assignments as well as the
pmd usage of each Rx queue.
Up until now, a tail length of 60 seconds of pmd usage was shown for
each Rx queue, as this is the value used during rebalance to avoid any
spike effects.
When debugging or tuning, it is also convenient to display the pmd
usage of an Rx queue over a shorter time frame, so that any config or
traffic changes that impact pmd usage can be evaluated more quickly.
A parameter is added that allows the pmd usage in pmd-rxq-show stats to
be shown for a shorter time frame. Values are rounded up to the nearest
5 seconds, as that is the measurement granularity, and the value
actually used is displayed, e.g.
$ ovs-appctl dpif-netdev/pmd-rxq-show -secs 5
Displaying last 5 seconds pmd usage %
pmd thread numa_id 0 core_id 4:
isolated : false
port: dpdk0 queue-id: 0 (enabled) pmd usage: 95 %
overhead: 4 %
The default time frame has not changed and the maximum value
is limited to the maximum stored tail length (60 seconds).
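A rough sketch of the rounding and of how the stored per-interval
history is summed (python pseudocode mirroring the C changes; the
5-second interval length and 12-entry history are the values used by
this patch):
```python
INTERVAL_SEC = 5      # PMD_INTERVAL_LEN converted to seconds
INTERVAL_MAX = 12     # number of stored intervals (60 seconds total)

def intervals_for(secs):
    """Round the requested window up to whole intervals, capped at the
    stored history length (0 or out-of-range means 'use everything')."""
    if secs <= 0 or secs > INTERVAL_SEC * INTERVAL_MAX:
        secs = INTERVAL_SEC * INTERVAL_MAX
    return -(-secs // INTERVAL_SEC)          # ceiling division

def sum_newest(history, cur_idx, num_to_read):
    """Sum the newest entries, walking backwards from the slot that
    will be written next and wrapping around the circular buffer."""
    total = 0
    i = cur_idx % INTERVAL_MAX
    for _ in range(num_to_read):
        i = i - 1 if i else INTERVAL_MAX - 1
        total += history[i]
    return total
```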
Reviewed-by: David Marchand
Signed-off-by: Kevin Traynor
Signed-off-by: Ilya Maximets
---
lib/dpif-netdev-private-thread.h | 2 +-
lib/dpif-netdev.c | 98 ++++++++++++++++++++++++--------
tests/pmd.at | 62 ++++++++++++++++++++
3 files changed, 138 insertions(+), 24 deletions(-)
diff --git a/lib/dpif-netdev-private-thread.h b/lib/dpif-netdev-private-thread.h
index 4472b199d5c..1ec3cd79470 100644
--- a/lib/dpif-netdev-private-thread.h
+++ b/lib/dpif-netdev-private-thread.h
@@ -114,7 +114,7 @@ struct dp_netdev_pmd_thread {
atomic_ullong intrvl_cycles;
/* Write index for 'busy_cycles_intrvl'. */
- unsigned int intrvl_idx;
+ atomic_count intrvl_idx;
/* Busy cycles in last PMD_INTERVAL_MAX intervals. */
atomic_ullong *busy_cycles_intrvl;
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index 9331f2cbac6..af99a91d1cc 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -160,11 +160,13 @@ static struct odp_support dp_netdev_support = {
/* Time in microseconds of the interval in which rxq processing cycles used
* in rxq to pmd assignments is measured and stored. */
-#define PMD_INTERVAL_LEN 10000000LL
+#define PMD_INTERVAL_LEN 5000000LL
+/* For converting PMD_INTERVAL_LEN to secs. */
+#define INTERVAL_USEC_TO_SEC 1000000LL
/* Number of intervals for which cycles are stored
* and used during rxq to pmd assignment. */
-#define PMD_INTERVAL_MAX 6
+#define PMD_INTERVAL_MAX 12
/* Time in microseconds to try RCU quiescing. */
#define PMD_RCU_QUIESCE_INTERVAL 10000LL
@@ -428,7 +430,7 @@ struct dp_netdev_rxq {
pinned. OVS_CORE_UNSPEC if the
queue doesn't need to be pinned to a
particular core. */
- unsigned intrvl_idx; /* Write index for 'cycles_intrvl'. */
+ atomic_count intrvl_idx; /* Write index for 'cycles_intrvl'. */
struct dp_netdev_pmd_thread *pmd; /* pmd thread that polls this queue. */
bool is_vhost; /* Is rxq of a vhost port. */
@@ -615,6 +617,9 @@ dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
unsigned long long cycles);
static uint64_t
dp_netdev_rxq_get_intrvl_cycles(struct dp_netdev_rxq *rx, unsigned idx);
+static uint64_t
+get_interval_values(atomic_ullong *source, atomic_count *cur_idx,
+ int num_to_read);
static void
dpif_netdev_xps_revalidate_pmd(const struct dp_netdev_pmd_thread *pmd,
bool purge);
@@ -869,7 +874,8 @@ sorted_poll_list(struct dp_netdev_pmd_thread *pmd, struct rxq_poll **list,
}
static void
-pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
+pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd,
+ int secs)
{
if (pmd->core_id != NON_PMD_CORE_ID) {
struct rxq_poll *list;
@@ -877,6 +883,7 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
uint64_t total_cycles = 0;
uint64_t busy_cycles = 0;
uint64_t total_rxq_proc_cycles = 0;
+ unsigned int intervals;
ds_put_format(reply,
"pmd thread numa_id %d core_id %u:\n isolated : %s\n",
@@ -888,15 +895,14 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
/* Get the total pmd cycles for an interval. */
atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles);
+ /* Calculate how many intervals are to be used. */
+ intervals = DIV_ROUND_UP(secs,
+ PMD_INTERVAL_LEN / INTERVAL_USEC_TO_SEC);
/* Estimate the cycles to cover all intervals. */
- total_cycles *= PMD_INTERVAL_MAX;
-
- for (int j = 0; j < PMD_INTERVAL_MAX; j++) {
- uint64_t cycles;
-
- atomic_read_relaxed(&pmd->busy_cycles_intrvl[j], &cycles);
- busy_cycles += cycles;
- }
+ total_cycles *= intervals;
+ busy_cycles = get_interval_values(pmd->busy_cycles_intrvl,
+ &pmd->intrvl_idx,
+ intervals);
if (busy_cycles > total_cycles) {
busy_cycles = total_cycles;
}
@@ -906,9 +912,9 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd)
const char *name = netdev_rxq_get_name(rxq->rx);
uint64_t rxq_proc_cycles = 0;
- for (int j = 0; j < PMD_INTERVAL_MAX; j++) {
- rxq_proc_cycles += dp_netdev_rxq_get_intrvl_cycles(rxq, j);
- }
+ rxq_proc_cycles = get_interval_values(rxq->cycles_intrvl,
+ &rxq->intrvl_idx,
+ intervals);
total_rxq_proc_cycles += rxq_proc_cycles;
ds_put_format(reply, " port: %-16s queue-id: %2d", name,
netdev_rxq_get_queue_id(list[i].rxq->rx));
@@ -1422,6 +1428,10 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
unsigned int core_id;
bool filter_on_pmd = false;
size_t n;
+ unsigned int secs = 0;
+ unsigned long long max_secs = (PMD_INTERVAL_LEN * PMD_INTERVAL_MAX)
+ / INTERVAL_USEC_TO_SEC;
+ bool first_show_rxq = true;
ovs_mutex_lock(&dp_netdev_mutex);
@@ -1432,6 +1442,14 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
}
argc -= 2;
argv += 2;
+ } else if (type == PMD_INFO_SHOW_RXQ &&
+ !strcmp(argv[1], "-secs") &&
+ argc > 2) {
+ if (!str_to_uint(argv[2], 10, &secs)) {
+ secs = max_secs;
+ }
+ argc -= 2;
+ argv += 2;
} else {
dp = shash_find_data(&dp_netdevs, argv[1]);
argc -= 1;
@@ -1461,7 +1479,18 @@ dpif_netdev_pmd_info(struct unixctl_conn *conn, int argc, const char *argv[],
continue;
}
if (type == PMD_INFO_SHOW_RXQ) {
- pmd_info_show_rxq(&reply, pmd);
+ if (first_show_rxq) {
+ if (!secs || secs > max_secs) {
+ secs = max_secs;
+ } else {
+ secs = ROUND_UP(secs,
+ PMD_INTERVAL_LEN / INTERVAL_USEC_TO_SEC);
+ }
+ ds_put_format(&reply, "Displaying last %u seconds "
+ "pmd usage %%\n", secs);
+ first_show_rxq = false;
+ }
+ pmd_info_show_rxq(&reply, pmd, secs);
} else if (type == PMD_INFO_CLEAR_STATS) {
pmd_perf_stats_clear(&pmd->perf_stats);
} else if (type == PMD_INFO_SHOW_STATS) {
@@ -1576,8 +1605,9 @@ dpif_netdev_init(void)
unixctl_command_register("dpif-netdev/pmd-stats-clear", "[-pmd core] [dp]",
0, 3, dpif_netdev_pmd_info,
(void *)&clear_aux);
- unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] [dp]",
- 0, 3, dpif_netdev_pmd_info,
+ unixctl_command_register("dpif-netdev/pmd-rxq-show", "[-pmd core] "
+ "[-secs secs] [dp]",
+ 0, 5, dpif_netdev_pmd_info,
(void *)&poll_aux);
unixctl_command_register("dpif-netdev/pmd-perf-show",
"[-nh] [-it iter-history-len]"
@@ -5174,7 +5204,7 @@ static void
dp_netdev_rxq_set_intrvl_cycles(struct dp_netdev_rxq *rx,
unsigned long long cycles)
{
- unsigned int idx = rx->intrvl_idx++ % PMD_INTERVAL_MAX;
+ unsigned int idx = atomic_count_inc(&rx->intrvl_idx) % PMD_INTERVAL_MAX;
atomic_store_relaxed(&rx->cycles_intrvl[idx], cycles);
}
@@ -6914,6 +6944,9 @@ pmd_thread_main(void *f_)
reload:
atomic_count_init(&pmd->pmd_overloaded, 0);
+ pmd->intrvl_tsc_prev = 0;
+ atomic_store_relaxed(&pmd->intrvl_cycles, 0);
+
if (!dpdk_attached) {
dpdk_attached = dpdk_attach_thread(pmd->core_id);
}
@@ -6945,12 +6978,10 @@ pmd_thread_main(void *f_)
}
}
- pmd->intrvl_tsc_prev = 0;
- atomic_store_relaxed(&pmd->intrvl_cycles, 0);
for (i = 0; i < PMD_INTERVAL_MAX; i++) {
atomic_store_relaxed(&pmd->busy_cycles_intrvl[i], 0);
}
- pmd->intrvl_idx = 0;
+ atomic_count_set(&pmd->intrvl_idx, 0);
cycles_counter_update(s);
pmd->next_rcu_quiesce = pmd->ctx.now + PMD_RCU_QUIESCE_INTERVAL;
@@ -9931,7 +9962,7 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
atomic_store_relaxed(&pmd->intrvl_cycles,
curr_tsc - pmd->intrvl_tsc_prev);
}
- idx = pmd->intrvl_idx++ % PMD_INTERVAL_MAX;
+ idx = atomic_count_inc(&pmd->intrvl_idx) % PMD_INTERVAL_MAX;
atomic_store_relaxed(&pmd->busy_cycles_intrvl[idx], tot_proc);
pmd->intrvl_tsc_prev = curr_tsc;
/* Start new measuring interval */
@@ -9954,6 +9985,27 @@ dp_netdev_pmd_try_optimize(struct dp_netdev_pmd_thread *pmd,
}
}
+/* Returns the sum of a specified number of newest to
+ * oldest interval values. 'cur_idx' is where the next
+ * write will be and wrap around needs to be handled.
+ */
+static uint64_t
+get_interval_values(atomic_ullong *source, atomic_count *cur_idx,
+ int num_to_read) {
+ unsigned int i;
+ uint64_t total = 0;
+
+ i = atomic_count_get(cur_idx) % PMD_INTERVAL_MAX;
+ for (int read = 0; read < num_to_read; read++) {
+ uint64_t interval_value;
+
+ i = i ? i - 1 : PMD_INTERVAL_MAX - 1;
+ atomic_read_relaxed(&source[i], &interval_value);
+ total += interval_value;
+ }
+ return total;
+}
+
/* Insert 'rule' into 'cls'. */
static void
dpcls_insert(struct dpcls *cls, struct dpcls_rule *rule,
diff --git a/tests/pmd.at b/tests/pmd.at
index 10879a349b9..ed90f88c4cb 100644
--- a/tests/pmd.at
+++ b/tests/pmd.at
@@ -70,6 +70,7 @@ CHECK_CPU_DISCOVERED()
CHECK_PMD_THREADS_CREATED()
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id core_id :
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -102,6 +103,7 @@ dummy@ovs-dummy: hit:0 missed:0
])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id core_id :
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -134,6 +136,7 @@ dummy@ovs-dummy: hit:0 missed:0
])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id core_id :
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -183,6 +186,7 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1])
CHECK_PMD_THREADS_CREATED([1], [], [+$TMP])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id core_id :
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -215,6 +219,7 @@ dummy@ovs-dummy: hit:0 missed:0
])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | sed SED_NUMA_CORE_PATTERN], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id core_id :
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -280,6 +285,7 @@ CHECK_PMD_THREADS_CREATED([1], [1], [+$TMP])
OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx queue assignment using group algorithm"])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id 1 core_id 1:
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -302,6 +308,7 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=roundrobin])
OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx queue assignment using roundrobin algorithm"])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id 1 core_id 1:
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -322,6 +329,7 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-rxq-assign=cycles])
OVS_WAIT_UNTIL([tail -n +$TMP ovs-vswitchd.log | grep "Performing pmd to rx queue assignment using cycles algorithm"])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id 1 core_id 1:
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -343,6 +351,7 @@ AT_CHECK([ovs-vsctl set Open_vSwitch . other_config:pmd-cpu-mask=0x1])
CHECK_PMD_THREADS_CREATED([1], [1], [+$TMP])
AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show], [0], [dnl
+Displaying last 60 seconds pmd usage %
pmd thread numa_id 1 core_id 0:
isolated : false
port: p0 queue-id: 0 (enabled) pmd usage: NOT AVAIL
@@ -471,6 +480,59 @@ pmd thread numa_id core_id :
OVS_VSWITCHD_STOP
AT_CLEANUP
+AT_SETUP([PMD - pmd-rxq-show pmd usage time])
+OVS_VSWITCHD_START([add-port br0 p0 -- set Interface p0 type=dummy-pmd], [], [], [DUMMY_NUMA])
+
+#CHECK_CPU_DISCOVERED()
+#CHECK_PMD_THREADS_CREATED()
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show | grep Displaying], [0], [dnl
+Displaying last 60 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs -1 | grep Displaying], [0], [dnl
+Displaying last 60 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 0 | grep Displaying], [0], [dnl
+Displaying last 60 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 1 | grep Displaying], [0], [dnl
+Displaying last 5 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 5 | grep Displaying], [0], [dnl
+Displaying last 5 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 6 | grep Displaying], [0], [dnl
+Displaying last 10 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 51 | grep Displaying], [0], [dnl
+Displaying last 55 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 55 | grep Displaying], [0], [dnl
+Displaying last 55 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 56 | grep Displaying], [0], [dnl
+Displaying last 60 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 60 | grep Displaying], [0], [dnl
+Displaying last 60 seconds pmd usage %
+])
+
+AT_CHECK([ovs-appctl dpif-netdev/pmd-rxq-show -secs 61 | grep Displaying], [0], [dnl
+Displaying last 60 seconds pmd usage %
+])
+
+OVS_VSWITCHD_STOP
+AT_CLEANUP
+
dnl Reconfigure the number of rx queues of a port, make sure that all the
dnl queues are polled by the datapath and try to send a couple of packets.
AT_SETUP([PMD - reconfigure n_rxq])
From e9ab15f4f82330e0d7bc33e57d3357fa52f76749 Mon Sep 17 00:00:00 2001
From: Kevin Traynor
Date: Wed, 30 Nov 2022 17:39:53 +0000
Subject: [PATCH 094/833] docs: Add documentation for pmd-rxq-show secs
parameter.
Add a description of the new '-secs' parameter in the docs. Also, add
it to NEWS as it is a user-facing change.
Reviewed-by: David Marchand
Signed-off-by: Kevin Traynor
Signed-off-by: Ilya Maximets
---
Documentation/topics/dpdk/pmd.rst | 23 ++++++++++++++++++-----
NEWS | 3 +++
2 files changed, 21 insertions(+), 5 deletions(-)
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index b259cc8b32d..88457f36694 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -101,12 +101,20 @@ core cycles for each Rx queue::
.. note::
- A history of one minute is recorded and shown for each Rx queue to allow for
- traffic pattern spikes. Any changes in the Rx queue's PMD core cycles usage,
- due to traffic pattern or reconfig changes, will take one minute to be fully
- reflected in the stats.
+ By default a history of one minute is recorded and shown for each Rx queue
+ to allow for traffic pattern spikes. Any changes in the Rx queue's PMD core
+ cycles usage, due to traffic pattern or reconfig changes, will take one
+ minute to be fully reflected in the stats by default.
- .. versionchanged:: 2.6.0
+PMD thread usage of an Rx queue can be displayed for a shorter period of time,
+from the last 5 seconds up to the default 60 seconds in 5 second steps.
+
+To see the port/Rx queue assignment and the last 5 secs of measured usage
+history of PMD core cycles for each Rx queue::
+
+ $ ovs-appctl dpif-netdev/pmd-rxq-show -secs 5
+
+.. versionchanged:: 2.6.0
The ``pmd-rxq-show`` command was added in OVS 2.6.0.
@@ -115,6 +123,11 @@ core cycles for each Rx queue::
A ``overhead`` statistics is shown per PMD: it represents the number of
cycles inherently consumed by the OVS PMD processing loop.
+.. versionchanged:: 3.1.0
+
+ The ``-secs`` parameter was added to the dpif-netdev/pmd-rxq-show
+ command.
+
Rx queue to PMD assignment takes place whenever there are configuration changes
or can be triggered by using::
diff --git a/NEWS b/NEWS
index c0095c345d1..92d33c2912a 100644
--- a/NEWS
+++ b/NEWS
@@ -22,6 +22,9 @@ Post-v3.0.0
significantly larger core dump files.
- Support for travis-ci.org based continuous integration builds has been
dropped.
+ - Userspace datapath:
+ * Add '-secs' argument to appctl 'dpif-netdev/pmd-rxq-show' to show
+ the pmd usage of an Rx queue over a configurable time period.
v3.0.0 - 15 Aug 2022
From ad6e506fcb63e34f3398c5284cb2bd1858ac3a49 Mon Sep 17 00:00:00 2001
From: Kevin Traynor
Date: Wed, 30 Nov 2022 17:39:54 +0000
Subject: [PATCH 095/833] dpif-netdev: Rename pmd_info_show_rxq variables.
There are some similar readings taken for pmds and Rx queues
in this function and a few of the variable names are ambiguous.
Improve the readability of the code by updating some variable names to
indicate that they are readings related to the pmd.
Reviewed-by: David Marchand
Signed-off-by: Kevin Traynor
Signed-off-by: Ilya Maximets
---
lib/dpif-netdev.c | 30 +++++++++++++++---------------
1 file changed, 15 insertions(+), 15 deletions(-)
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index af99a91d1cc..c015fb6ddc9 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -880,8 +880,8 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd,
if (pmd->core_id != NON_PMD_CORE_ID) {
struct rxq_poll *list;
size_t n_rxq;
- uint64_t total_cycles = 0;
- uint64_t busy_cycles = 0;
+ uint64_t total_pmd_cycles = 0;
+ uint64_t busy_pmd_cycles = 0;
uint64_t total_rxq_proc_cycles = 0;
unsigned int intervals;
@@ -894,17 +894,17 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd,
sorted_poll_list(pmd, &list, &n_rxq);
/* Get the total pmd cycles for an interval. */
- atomic_read_relaxed(&pmd->intrvl_cycles, &total_cycles);
+ atomic_read_relaxed(&pmd->intrvl_cycles, &total_pmd_cycles);
/* Calculate how many intervals are to be used. */
intervals = DIV_ROUND_UP(secs,
PMD_INTERVAL_LEN / INTERVAL_USEC_TO_SEC);
/* Estimate the cycles to cover all intervals. */
- total_cycles *= intervals;
- busy_cycles = get_interval_values(pmd->busy_cycles_intrvl,
- &pmd->intrvl_idx,
- intervals);
- if (busy_cycles > total_cycles) {
- busy_cycles = total_cycles;
+ total_pmd_cycles *= intervals;
+ busy_pmd_cycles = get_interval_values(pmd->busy_cycles_intrvl,
+ &pmd->intrvl_idx,
+ intervals);
+ if (busy_pmd_cycles > total_pmd_cycles) {
+ busy_pmd_cycles = total_pmd_cycles;
}
for (int i = 0; i < n_rxq; i++) {
@@ -921,9 +921,9 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd,
ds_put_format(reply, " %s", netdev_rxq_enabled(list[i].rxq->rx)
? "(enabled) " : "(disabled)");
ds_put_format(reply, " pmd usage: ");
- if (total_cycles) {
+ if (total_pmd_cycles) {
ds_put_format(reply, "%2"PRIu64"",
- rxq_proc_cycles * 100 / total_cycles);
+ rxq_proc_cycles * 100 / total_pmd_cycles);
ds_put_cstr(reply, " %");
} else {
ds_put_format(reply, "%s", "NOT AVAIL");
@@ -933,14 +933,14 @@ pmd_info_show_rxq(struct ds *reply, struct dp_netdev_pmd_thread *pmd,
if (n_rxq > 0) {
ds_put_cstr(reply, " overhead: ");
- if (total_cycles) {
+ if (total_pmd_cycles) {
uint64_t overhead_cycles = 0;
- if (total_rxq_proc_cycles < busy_cycles) {
- overhead_cycles = busy_cycles - total_rxq_proc_cycles;
+ if (total_rxq_proc_cycles < busy_pmd_cycles) {
+ overhead_cycles = busy_pmd_cycles - total_rxq_proc_cycles;
}
ds_put_format(reply, "%2"PRIu64" %%",
- overhead_cycles * 100 / total_cycles);
+ overhead_cycles * 100 / total_pmd_cycles);
} else {
ds_put_cstr(reply, "NOT AVAIL");
}
From 46e04ec31bb2b889bd5715d436be2bdc0268f08b Mon Sep 17 00:00:00 2001
From: Cheng Li
Date: Sat, 17 Dec 2022 13:15:36 +0000
Subject: [PATCH 096/833] dpif-netdev: Calculate per numa variance.
Currently, pmd_rebalance_dry_run() calculates the overall variance of
all pmds regardless of their numa location. The overall result may hide
imbalance on an individual numa node.
Consider the following case. Numa0 is free because the VMs on numa0 are
not sending pkts, while numa1 is busy. Within numa1, the pmd workloads
are not balanced. Obviously, moving 500 kpps of workload from pmd 126
to pmd 62 will make numa1 much more balanced. For numa1 the variance
improvement will be almost 100%, because after rebalance each pmd in
numa1 holds the same workload (variance ~= 0). But the overall variance
improvement is only about 20%, which may not trigger auto_lb.
```
numa_id core_id kpps
0 30 0
0 31 0
0 94 0
0 95 0
1 126 1500
1 127 1000
1 63 1000
1 62 500
```
As auto_lb doesn't balance workloads across numa nodes, it makes more
sense to calculate the variance improvement per numa node.
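A rough sketch of the per-numa comparison using the kpps figures above
(python pseudocode; the real code compares pmd cycle counters from a
dry run rather than kpps):
```python
def variance(vals):
    mean = sum(vals) / len(vals)
    return sum((v - mean) ** 2 for v in vals) / len(vals)

# numa1 before and after moving 500 kpps from pmd 126 to pmd 62.
numa1_current = [1500, 1000, 1000, 500]
numa1_estimated = [1000, 1000, 1000, 1000]

cur = variance(numa1_current)
est = variance(numa1_estimated)
improvement = (cur - est) * 100 / cur if est < cur else 0
print(improvement)  # ~100% for numa1 alone, vs ~20% averaged over all pmds
```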
Signed-off-by: Cheng Li
Signed-off-by: Kevin Traynor
Co-authored-by: Kevin Traynor
Acked-by: Kevin Traynor
Signed-off-by: Ilya Maximets
---
Documentation/topics/dpdk/pmd.rst | 8 +--
lib/dpif-netdev.c | 87 +++++++++++++++----------------
2 files changed, 47 insertions(+), 48 deletions(-)
diff --git a/Documentation/topics/dpdk/pmd.rst b/Documentation/topics/dpdk/pmd.rst
index 88457f36694..9006fd40f07 100644
--- a/Documentation/topics/dpdk/pmd.rst
+++ b/Documentation/topics/dpdk/pmd.rst
@@ -291,10 +291,10 @@ If a PMD core is detected to be above the load threshold and the minimum
pre-requisites are met, a dry-run using the current PMD assignment algorithm is
performed.
-The current variance of load between the PMD cores and estimated variance from
-the dry-run are both calculated. If the estimated dry-run variance is improved
-from the current one by the variance threshold, a new Rx queue to PMD
-assignment will be performed.
+For each numa node, the current variance of load between the PMD cores and
+estimated variance from the dry-run are both calculated. If any numa's
+estimated dry-run variance is improved from the current one by the variance
+threshold, a new Rx queue to PMD assignment will be performed.
For example, to set the variance improvement threshold to 40%::
diff --git a/lib/dpif-netdev.c b/lib/dpif-netdev.c
index c015fb6ddc9..7127068fe0e 100644
--- a/lib/dpif-netdev.c
+++ b/lib/dpif-netdev.c
@@ -6131,39 +6131,33 @@ rxq_scheduling(struct dp_netdev *dp)
static uint64_t variance(uint64_t a[], int n);
static uint64_t
-sched_numa_list_variance(struct sched_numa_list *numa_list)
+sched_numa_variance(struct sched_numa *numa)
{
- struct sched_numa *numa;
uint64_t *percent_busy = NULL;
- unsigned total_pmds = 0;
int n_proc = 0;
uint64_t var;
- HMAP_FOR_EACH (numa, node, &numa_list->numas) {
- total_pmds += numa->n_pmds;
- percent_busy = xrealloc(percent_busy,
- total_pmds * sizeof *percent_busy);
+ percent_busy = xmalloc(numa->n_pmds * sizeof *percent_busy);
- for (unsigned i = 0; i < numa->n_pmds; i++) {
- struct sched_pmd *sched_pmd;
- uint64_t total_cycles = 0;
+ for (unsigned i = 0; i < numa->n_pmds; i++) {
+ struct sched_pmd *sched_pmd;
+ uint64_t total_cycles = 0;
- sched_pmd = &numa->pmds[i];
- /* Exclude isolated PMDs from variance calculations. */
- if (sched_pmd->isolated == true) {
- continue;
- }
- /* Get the total pmd cycles for an interval. */
- atomic_read_relaxed(&sched_pmd->pmd->intrvl_cycles, &total_cycles);
-
- if (total_cycles) {
- /* Estimate the cycles to cover all intervals. */
- total_cycles *= PMD_INTERVAL_MAX;
- percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
- / total_cycles;
- } else {
- percent_busy[n_proc++] = 0;
- }
+ sched_pmd = &numa->pmds[i];
+ /* Exclude isolated PMDs from variance calculations. */
+ if (sched_pmd->isolated == true) {
+ continue;
+ }
+ /* Get the total pmd cycles for an interval. */
+ atomic_read_relaxed(&sched_pmd->pmd->intrvl_cycles, &total_cycles);
+
+ if (total_cycles) {
+ /* Estimate the cycles to cover all intervals. */
+ total_cycles *= PMD_INTERVAL_MAX;
+ percent_busy[n_proc++] = (sched_pmd->pmd_proc_cycles * 100)
+ / total_cycles;
+ } else {
+ percent_busy[n_proc++] = 0;
}
}
var = variance(percent_busy, n_proc);
@@ -6237,6 +6231,7 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
struct sched_numa_list numa_list_est;
bool thresh_met = false;
uint64_t current_var, estimate_var;
+ struct sched_numa *numa_cur, *numa_est;
uint64_t improvement = 0;
VLOG_DBG("PMD auto load balance performing dry run.");
@@ -6255,25 +6250,29 @@ pmd_rebalance_dry_run(struct dp_netdev *dp)
sched_numa_list_count(&numa_list_est) == 1) {
/* Calculate variances. */
- current_var = sched_numa_list_variance(&numa_list_cur);
- estimate_var = sched_numa_list_variance(&numa_list_est);
-
- if (estimate_var < current_var) {
- improvement = ((current_var - estimate_var) * 100) / current_var;
- }
- VLOG_DBG("Current variance %"PRIu64" Estimated variance %"PRIu64".",
- current_var, estimate_var);
- VLOG_DBG("Variance improvement %"PRIu64"%%.", improvement);
-
- if (improvement >= dp->pmd_alb.rebalance_improve_thresh) {
- thresh_met = true;
- VLOG_DBG("PMD load variance improvement threshold %u%% "
- "is met.", dp->pmd_alb.rebalance_improve_thresh);
- } else {
- VLOG_DBG("PMD load variance improvement threshold "
- "%u%% is not met.",
- dp->pmd_alb.rebalance_improve_thresh);
+ HMAP_FOR_EACH (numa_cur, node, &numa_list_cur.numas) {
+ numa_est = sched_numa_list_lookup(&numa_list_est,
+ numa_cur->numa_id);
+ if (!numa_est) {
+ continue;
+ }
+ current_var = sched_numa_variance(numa_cur);
+ estimate_var = sched_numa_variance(numa_est);
+ if (estimate_var < current_var) {
+ improvement = ((current_var - estimate_var) * 100)
+ / current_var;
+ }
+ VLOG_DBG("Numa node %d. Current variance %"PRIu64" Estimated "
+ "variance %"PRIu64". Variance improvement %"PRIu64"%%.",
+ numa_cur->numa_id, current_var,
+ estimate_var, improvement);
+ if (improvement >= dp->pmd_alb.rebalance_improve_thresh) {
+ thresh_met = true;
+ }
}
+ VLOG_DBG("PMD load variance improvement threshold %u%% is %s.",
+ dp->pmd_alb.rebalance_improve_thresh,
+ thresh_met ? "met" : "not met");
} else {
VLOG_DBG("PMD auto load balance detected cross-numa polling with "
"multiple numa nodes. Unable to accurately estimate.");
From d83d7c4915f1fc538f52fd05076532b744e389dd Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:18 +0100
Subject: [PATCH 097/833] ci: Fix overriding OPTS provided from the yml.
For GCC builds we're overriding --disable-ssl or --enable-shared
options set up in the GHA yml file.
Fix that by adding to EXTRA_OPTS instead.
Fixes: 2581b0ad1159 ("travis: Combine kernel builds.")
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
.ci/linux-build.sh | 2 +-
1 file changed, 1 insertion(+), 1 deletion(-)
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index c06186ce1cf..a944cf14962 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -221,7 +221,7 @@ elif [ "$M32" ]; then
# difference on 'configure' and 'make' stages.
export CC="$CC -m32"
else
- OPTS="--enable-sparse"
+ EXTRA_OPTS="$EXTRA_OPTS --enable-sparse"
if [ "$AFXDP" ]; then
# netdev-afxdp uses memset for 64M for umem initialization.
SPARSE_FLAGS="${SPARSE_FLAGS} -Wno-memcpy-max-count"
From 0d8318db633fb24936a0f55e869331f0c27f243f Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:19 +0100
Subject: [PATCH 098/833] netdev-afxdp: Disable -Wfree-nonheap-object on
receive.
GCC 11+ generates a warning:
In file included from lib/netdev-linux-private.h:30,
from lib/netdev-afxdp.c:19:
In function 'dp_packet_delete',
inlined from 'dp_packet_delete' at lib/dp-packet.h:246:1,
inlined from 'dp_packet_batch_add__' at lib/dp-packet.h:775:9,
inlined from 'dp_packet_batch_add' at lib/dp-packet.h:783:5,
inlined from 'netdev_afxdp_rxq_recv' at lib/netdev-afxdp.c:898:9:
lib/dp-packet.h:260:9: warning: 'free' called on pointer
'*umem.xpool.array' with nonzero offset [8, 2558044588346441168]
[-Wfree-nonheap-object]
260 | free(b);
| ^~~~~~~
But it is a false positive since that code path is not possible.
In this call chain the packet will always have source DPBUF_AFXDP
and the free() will never be called. GCC doesn't see that, because
the initialization function dp_packet_use_afxdp() is part of a
different translation unit.
Disabling the warning in this particular place to avoid build failures.
Older versions of clang do not have -Wfree-nonheap-object, so we
need to additionally guard the pragmas. Clang accepts GCC pragmas
and complains about unknown warning names.
Reported-at: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108187
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
lib/netdev-afxdp.c | 13 +++++++++++++
1 file changed, 13 insertions(+)
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index ca3f2431eac..4d57efa5ce9 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -868,9 +868,22 @@ netdev_afxdp_rxq_recv(struct netdev_rxq *rxq_, struct dp_packet_batch *batch,
OVS_XDP_HEADROOM);
dp_packet_set_size(packet, len);
+#if __GNUC__ >= 11 && !__clang__
+ /* GCC 11+ generates a false-positive warning about free() being
+     * called on DPBUF_AFXDP packet, but it is an impossible code path.
+ * Disabling a warning to avoid build failures.
+ * https://gcc.gnu.org/bugzilla/show_bug.cgi?id=108187 */
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wfree-nonheap-object"
+#endif
+
/* Add packet into batch, increase batch->count. */
dp_packet_batch_add(batch, packet);
+#if __GNUC__ >= 11 && !__clang__
+#pragma GCC diagnostic pop
+#endif
+
idx_rx++;
}
/* Release the RX queue. */
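The guarded pragma pattern added above can be seen in isolation in the following
sketch. It is a generic illustration, not OVS code: the pragmas are emitted only for
GCC, because clang accepts GCC diagnostic pragmas but complains about warning names
it does not know.

    #include <stdlib.h>

    int
    main(void)
    {
        char *p = malloc(16);

    #if __GNUC__ && !__clang__
    #pragma GCC diagnostic push
    #pragma GCC diagnostic ignored "-Wfree-nonheap-object"
    #endif
        /* A call that the compiler might flag with a false positive. */
        free(p);
    #if __GNUC__ && !__clang__
    #pragma GCC diagnostic pop
    #endif
        return 0;
    }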
From 1dcc490d44879f33392337dfd9175645fcc4118e Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:20 +0100
Subject: [PATCH 099/833] netdev-afxdp: Allow building with libxdp and newer
libbpf.
AF_XDP functions were deprecated in libbpf 0.7 and moved to libxdp.
Functions bpf_get/set_link_xdp_id() were deprecated in libbpf 0.8
and replaced with bpf_xdp_query_id() and bpf_xdp_attach/detach().
Updating configuration and source code to accommodate the above changes
and allow building OVS with AF_XDP support on newer systems:
- Checking the version of libbpf by detecting availability
of bpf_xdp_detach.
- Checking availability of libxdp in the system by looking
for a library providing libxdp_strerror(), if libbpf is
newer than 0.6, and checking for the xsk.h header provided
by libxdp-dev[el].
- Using xsk.h from libbpf if it is older than 0.7 and not linking
with libxdp in that case, as there are known incompatible
versions of libxdp in distributions.
- The check for the NEED_WAKEUP feature is replaced with directly
checking in the source code whether XDP_USE_NEED_WAKEUP is defined.
- Checking availability of bpf_xdp_query_id and bpf_xdp_detach
and using them instead of the deprecated APIs, falling back to
the old functions if they are not found.
- Dropped LIBBPF_LDADD variable as it makes library and function
detection much harder without providing any actual benefits.
AC_SEARCH_LIBS is used instead and it allows use of AC_CHECK_FUNCS.
- Header includes moved around to files where they are actually used.
- Removed libelf dependency as it is not really used.
With these changes it should be possible to build OVS with either:
- libbpf built from the kernel sources (5.19 or older).
- libbpf < 0.7 provided in distributions.
- libxdp and libbpf >= 0.7 provided in newer distributions.
While it is technically possible to build with libbpf 0.7+ without
libxdp, at the moment we're not allowing that for a few reasons.
First, the required functions in libbpf are deprecated and may be
removed in future releases. Second, support for all these combinations
makes the detection code fairly complex.
AFAIK, most of the distributions packaging libbpf 0.7+ do package
libxdp as well.
libxdp is added as a build dependency for the Fedora build since all
supported versions of Fedora package this library.
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
NEWS | 2 ++
acinclude.m4 | 28 ++++++++++++++----------
lib/automake.mk | 1 -
lib/libopenvswitch.pc.in | 2 +-
lib/netdev-afxdp-pool.c | 2 ++
lib/netdev-afxdp-pool.h | 5 -----
lib/netdev-afxdp.c | 38 ++++++++++++++++++++++++++-------
rhel/openvswitch-fedora.spec.in | 2 +-
8 files changed, 53 insertions(+), 27 deletions(-)
diff --git a/NEWS b/NEWS
index 92d33c2912a..ce5d11d73a9 100644
--- a/NEWS
+++ b/NEWS
@@ -2,6 +2,8 @@ Post-v3.0.0
--------------------
- ovs-vswitchd now detects changes in CPU affinity and adjusts the number
of handler and revalidator threads if necessary.
+ - AF_XDP:
+ * Added support for building with libxdp and libbpf >= 0.7.
- ovs-appctl:
* "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
diff --git a/acinclude.m4 b/acinclude.m4
index aa9af55062f..e47e925b376 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -251,7 +251,7 @@ AC_DEFUN([OVS_FIND_DEPENDENCY], [
dnl OVS_CHECK_LINUX_AF_XDP
dnl
-dnl Check both Linux kernel AF_XDP and libbpf support
+dnl Check both Linux kernel AF_XDP and libbpf/libxdp support
AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
AC_ARG_ENABLE([afxdp],
[AS_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
@@ -270,8 +270,21 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
AC_CHECK_HEADER([linux/if_xdp.h], [],
[AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
- AC_CHECK_HEADER([bpf/xsk.h], [],
- [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+ OVS_FIND_DEPENDENCY([libbpf_strerror], [bpf], [libbpf])
+ AC_CHECK_FUNCS([bpf_xdp_query_id bpf_xdp_detach])
+
+ if test "x$ac_cv_func_bpf_xdp_detach" = xyes; then
+ dnl We have libbpf >= 0.7. Look for libxdp as xsk functions
+ dnl were moved into this library.
+ OVS_FIND_DEPENDENCY([libxdp_strerror], [xdp], [libxdp])
+ AC_CHECK_HEADER([xdp/xsk.h],
+ AC_DEFINE([HAVE_LIBXDP], [1], [xsk.h is supplied with libxdp]),
+ AC_MSG_ERROR([unable to find xdp/xsk.h for AF_XDP support]))
+ else
+ dnl libbpf < 0.7 contains all the necessary functionality.
+ AC_CHECK_HEADER([bpf/xsk.h], [],
+ [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+ fi
AC_CHECK_FUNCS([pthread_spin_lock], [],
[AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
@@ -280,13 +293,6 @@ AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
AC_DEFINE([HAVE_AF_XDP], [1],
[Define to 1 if AF_XDP support is available and enabled.])
- LIBBPF_LDADD=" -lbpf -lelf"
- AC_SUBST([LIBBPF_LDADD])
-
- AC_CHECK_DECL([xsk_ring_prod__needs_wakeup], [
- AC_DEFINE([HAVE_XDP_NEED_WAKEUP], [1],
- [XDP need wakeup support detected in xsk.h.])
- ], [], [[#include ]])
fi
AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
])
@@ -357,7 +363,7 @@ AC_DEFUN([OVS_CHECK_DPDK], [
], [], [[#include ]])
AC_CHECK_DECL([RTE_NET_AF_XDP], [
- LIBBPF_LDADD="-lbpf"
+ OVS_FIND_DEPENDENCY([libbpf_strerror], [bpf], [libbpf])
], [], [[#include ]])
AC_CHECK_DECL([RTE_LIBRTE_VHOST_NUMA], [
diff --git a/lib/automake.mk b/lib/automake.mk
index a0fabe38f36..61bdc308f07 100644
--- a/lib/automake.mk
+++ b/lib/automake.mk
@@ -9,7 +9,6 @@ lib_LTLIBRARIES += lib/libopenvswitch.la
lib_libopenvswitch_la_LIBADD = $(SSL_LIBS)
lib_libopenvswitch_la_LIBADD += $(CAPNG_LDADD)
-lib_libopenvswitch_la_LIBADD += $(LIBBPF_LDADD)
if WIN32
diff --git a/lib/libopenvswitch.pc.in b/lib/libopenvswitch.pc.in
index 44fbb1f9fd2..a5f4d39479a 100644
--- a/lib/libopenvswitch.pc.in
+++ b/lib/libopenvswitch.pc.in
@@ -7,5 +7,5 @@ Name: libopenvswitch
Description: Open vSwitch library
Version: @VERSION@
Libs: -L${libdir} -lopenvswitch
-Libs.private: @LIBS@ @SSL_LIBS@ @CAPNG_LDADD@ @LIBBPF_LDADD@
+Libs.private: @LIBS@ @SSL_LIBS@ @CAPNG_LDADD@
Cflags: -I${includedir}
diff --git a/lib/netdev-afxdp-pool.c b/lib/netdev-afxdp-pool.c
index 3386d2dcf78..f56a7b29ece 100644
--- a/lib/netdev-afxdp-pool.c
+++ b/lib/netdev-afxdp-pool.c
@@ -15,6 +15,8 @@
*/
#include
+#include
+
#include "dp-packet.h"
#include "netdev-afxdp-pool.h"
#include "openvswitch/util.h"
diff --git a/lib/netdev-afxdp-pool.h b/lib/netdev-afxdp-pool.h
index f929b9489c7..6681cf539e9 100644
--- a/lib/netdev-afxdp-pool.h
+++ b/lib/netdev-afxdp-pool.h
@@ -19,12 +19,7 @@
#ifdef HAVE_AF_XDP
-#include
-#include
-#include
-
#include "openvswitch/thread.h"
-#include "ovs-atomic.h"
/* LIFO ptr_array. */
struct umem_pool {
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index 4d57efa5ce9..f8995da1fda 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -21,6 +21,11 @@
#include "netdev-afxdp.h"
#include "netdev-afxdp-pool.h"
+#ifdef HAVE_LIBXDP
+#include <xdp/xsk.h>
+#else
+#include <bpf/xsk.h>
+#endif
#include
#include
#include
@@ -29,6 +34,7 @@
#include
#include
#include
+#include
#include
#include
#include
@@ -44,6 +50,7 @@
#include "openvswitch/list.h"
#include "openvswitch/thread.h"
#include "openvswitch/vlog.h"
+#include "ovs-atomic.h"
#include "ovs-numa.h"
#include "packets.h"
#include "socket-util.h"
@@ -72,7 +79,7 @@ static struct vlog_rate_limit rl = VLOG_RATE_LIMIT_INIT(5, 20);
#define PROD_NUM_DESCS XSK_RING_PROD__DEFAULT_NUM_DESCS
#define CONS_NUM_DESCS XSK_RING_CONS__DEFAULT_NUM_DESCS
-#ifdef HAVE_XDP_NEED_WAKEUP
+#ifdef XDP_USE_NEED_WAKEUP
#define NEED_WAKEUP_DEFAULT true
#else
#define NEED_WAKEUP_DEFAULT false
@@ -169,7 +176,7 @@ struct netdev_afxdp_tx_lock {
);
};
-#ifdef HAVE_XDP_NEED_WAKEUP
+#ifdef XDP_USE_NEED_WAKEUP
static inline void
xsk_rx_wakeup_if_needed(struct xsk_umem_info *umem,
struct netdev *netdev, int fd)
@@ -201,7 +208,7 @@ xsk_tx_need_wakeup(struct xsk_socket_info *xsk_info)
return xsk_ring_prod__needs_wakeup(&xsk_info->tx);
}
-#else /* !HAVE_XDP_NEED_WAKEUP */
+#else /* !XDP_USE_NEED_WAKEUP */
static inline void
xsk_rx_wakeup_if_needed(struct xsk_umem_info *umem OVS_UNUSED,
struct netdev *netdev OVS_UNUSED,
@@ -215,7 +222,7 @@ xsk_tx_need_wakeup(struct xsk_socket_info *xsk_info OVS_UNUSED)
{
return true;
}
-#endif /* HAVE_XDP_NEED_WAKEUP */
+#endif /* XDP_USE_NEED_WAKEUP */
static void
netdev_afxdp_cleanup_unused_pool(struct unused_pool *pool)
@@ -351,7 +358,7 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
cfg.bind_flags = xdp_modes[mode].bind_flags;
cfg.xdp_flags = xdp_modes[mode].xdp_flags | XDP_FLAGS_UPDATE_IF_NOEXIST;
-#ifdef HAVE_XDP_NEED_WAKEUP
+#ifdef XDP_USE_NEED_WAKEUP
if (use_need_wakeup) {
cfg.bind_flags |= XDP_USE_NEED_WAKEUP;
}
@@ -377,7 +384,11 @@ xsk_configure_socket(struct xsk_umem_info *umem, uint32_t ifindex,
}
/* Make sure the built-in AF_XDP program is loaded. */
+#ifdef HAVE_BPF_XDP_QUERY_ID
+ ret = bpf_xdp_query_id(ifindex, cfg.xdp_flags, &prog_id);
+#else
ret = bpf_get_link_xdp_id(ifindex, &prog_id, cfg.xdp_flags);
+#endif
if (ret || !prog_id) {
if (ret) {
VLOG_ERR("Get XDP prog ID failed (%s)", ovs_strerror(errno));
@@ -630,9 +641,9 @@ netdev_afxdp_set_config(struct netdev *netdev, const struct smap *args,
}
need_wakeup = smap_get_bool(args, "use-need-wakeup", NEED_WAKEUP_DEFAULT);
-#ifndef HAVE_XDP_NEED_WAKEUP
+#ifndef XDP_USE_NEED_WAKEUP
if (need_wakeup) {
- VLOG_WARN("XDP need_wakeup is not supported in libbpf.");
+ VLOG_WARN("XDP need_wakeup is not supported in libbpf/libxdp.");
need_wakeup = false;
}
#endif
@@ -742,7 +753,11 @@ xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
uint32_t ret, prog_id = 0;
/* Check whether XDP program is loaded. */
+#ifdef HAVE_BPF_XDP_QUERY_ID
+ ret = bpf_xdp_query_id(ifindex, flags, &prog_id);
+#else
ret = bpf_get_link_xdp_id(ifindex, &prog_id, flags);
+#endif
if (ret) {
VLOG_ERR("Failed to get XDP prog id (%s)", ovs_strerror(errno));
return;
@@ -753,7 +768,14 @@ xsk_remove_xdp_program(uint32_t ifindex, enum afxdp_mode mode)
return;
}
- bpf_set_link_xdp_fd(ifindex, -1, flags);
+#ifdef HAVE_BPF_XDP_DETACH
+ if (bpf_xdp_detach(ifindex, flags, NULL) != 0) {
+#else
+ if (bpf_set_link_xdp_fd(ifindex, -1, flags) != 0) {
+#endif
+ VLOG_ERR("Failed to detach XDP program (%s) at ifindex %d",
+ ovs_strerror(errno), ifindex);
+ }
}
void
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 8fc6e8ab233..eb5077a215f 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -75,7 +75,7 @@ BuildRequires: dpdk-devel >= 22.11
Provides: %{name}-dpdk = %{version}-%{release}
%endif
%if %{with afxdp}
-BuildRequires: libbpf-devel numactl-devel
+BuildRequires: libxdp-devel libbpf-devel numactl-devel
%endif
BuildRequires: unbound unbound-devel
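A hedged sketch of the compatibility pattern this patch applies in lib/netdev-afxdp.c:
prefer the newer libbpf entry points when configure detected them (AC_CHECK_FUNCS
defines HAVE_BPF_XDP_QUERY_ID and HAVE_BPF_XDP_DETACH) and fall back to the deprecated
calls otherwise. The wrapper names below are illustrative and not part of the patch.

    #include <bpf/libbpf.h>

    /* Query the ID of the XDP program attached to 'ifindex', if any. */
    int
    xdp_query_prog_id(int ifindex, __u32 flags, __u32 *prog_id)
    {
    #ifdef HAVE_BPF_XDP_QUERY_ID
        return bpf_xdp_query_id(ifindex, flags, prog_id);
    #else
        return bpf_get_link_xdp_id(ifindex, prog_id, flags);
    #endif
    }

    /* Detach whatever XDP program is attached to 'ifindex'. */
    int
    xdp_detach_prog(int ifindex, __u32 flags)
    {
    #ifdef HAVE_BPF_XDP_DETACH
        return bpf_xdp_detach(ifindex, flags, NULL);
    #else
        return bpf_set_link_xdp_fd(ifindex, -1, flags);
    #endif
    }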
From b17cadff1d3d060eb8b19aac8787b894d2e1c89a Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:21 +0100
Subject: [PATCH 100/833] netdev-afxdp: Hide too large memset from sparse.
Sparse complains about 64M umem initialization. Hide it from
the checker instead of disabling a warning globally.
SPARSE_FLAGS are kept in the CI script even though they are
empty at the moment.
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
.ci/linux-build.sh | 4 ----
lib/netdev-afxdp.c | 4 ++++
2 files changed, 4 insertions(+), 4 deletions(-)
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index a944cf14962..e6e4f6a60e6 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -222,10 +222,6 @@ elif [ "$M32" ]; then
export CC="$CC -m32"
else
EXTRA_OPTS="$EXTRA_OPTS --enable-sparse"
- if [ "$AFXDP" ]; then
- # netdev-afxdp uses memset for 64M for umem initialization.
- SPARSE_FLAGS="${SPARSE_FLAGS} -Wno-memcpy-max-count"
- fi
CFLAGS_FOR_OVS="${CFLAGS_FOR_OVS} ${SPARSE_FLAGS}"
fi
diff --git a/lib/netdev-afxdp.c b/lib/netdev-afxdp.c
index f8995da1fda..16f26bc3065 100644
--- a/lib/netdev-afxdp.c
+++ b/lib/netdev-afxdp.c
@@ -434,7 +434,11 @@ xsk_configure(int ifindex, int xdp_queue_id, enum afxdp_mode mode,
/* Umem memory region. */
bufs = xmalloc_pagealign(NUM_FRAMES * FRAME_SIZE);
+#ifndef __CHECKER__
+ /* Sparse complains about a very large memset, but it is OK in this case.
+ * So, hiding it from the checker. */
memset(bufs, 0, NUM_FRAMES * FRAME_SIZE);
+#endif
/* Create AF_XDP socket. */
umem = xsk_configure_umem(bufs, NUM_FRAMES * FRAME_SIZE);
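The __CHECKER__ guard works because sparse predefines that macro while real compilers
do not, so the guarded block is skipped only during static analysis. A minimal
standalone illustration (not OVS code), with an assumed pool size:

    #include <stdlib.h>
    #include <string.h>

    #define POOL_SIZE (64 * 1024 * 1024)  /* Large region sparse would warn about. */

    void *
    alloc_zeroed_pool(void)
    {
        void *buf = malloc(POOL_SIZE);

        if (buf) {
    #ifndef __CHECKER__
            /* Hidden from sparse to avoid its memset size limit warning. */
            memset(buf, 0, POOL_SIZE);
    #endif
        }
        return buf;
    }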
From 649dbc19ffc0acd050ad729b9052aba8c7fce090 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:22 +0100
Subject: [PATCH 101/833] github: Test AF_XDP build using libbpf instead of
kernel sources.
AF_XDP bits were removed from the kernel's libbpf in 6.0. libbpf
and libxdp are now the primary way to build AF_XDP applications.
Most modern distributions are already packaging some version
of libbpf, so it's better to test building with it instead
of building an old unsupported kernel tree.
Ubuntu started packaging libxdp only in 22.10, so it is not used
for now.
The kernel build infrastructure in the CI scripts is not needed
anymore and has been removed.
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
.ci/linux-build.sh | 77 ----------------------------
.github/workflows/build-and-test.yml | 10 ++--
2 files changed, 3 insertions(+), 84 deletions(-)
diff --git a/.ci/linux-build.sh b/.ci/linux-build.sh
index e6e4f6a60e6..10021fddb25 100755
--- a/.ci/linux-build.sh
+++ b/.ci/linux-build.sh
@@ -7,79 +7,6 @@ CFLAGS_FOR_OVS="-g -O2"
SPARSE_FLAGS=""
EXTRA_OPTS="--enable-Werror"
-function install_kernel()
-{
- if [[ "$1" =~ ^5.* ]]; then
- PREFIX="v5.x"
- elif [[ "$1" =~ ^4.* ]]; then
- PREFIX="v4.x"
- elif [[ "$1" =~ ^3.* ]]; then
- PREFIX="v3.x"
- else
- PREFIX="v2.6/longterm/v2.6.32"
- fi
-
- base_url="https://cdn.kernel.org/pub/linux/kernel/${PREFIX}"
- # Download page with list of all available kernel versions.
- wget ${base_url}/
- # Uncompress in case server returned gzipped page.
- (file index* | grep ASCII) || (mv index* index.new.gz && gunzip index*)
- # Get version of the latest stable release.
- hi_ver=$(echo ${1} | sed 's/\./\\\./')
- lo_ver=$(cat ./index* | grep -P -o "${hi_ver}\.[0-9]+" | \
- sed 's/.*\..*\.\(.*\)/\1/' | sort -h | tail -1)
- version="${1}.${lo_ver}"
-
- rm -rf index* linux-*
-
- url="${base_url}/linux-${version}.tar.xz"
- # Download kernel sources. Try direct link on CDN failure.
- wget ${url} ||
- (rm -f linux-${version}.tar.xz && wget ${url}) ||
- (rm -f linux-${version}.tar.xz && wget ${url/cdn/www})
-
- tar xvf linux-${version}.tar.xz > /dev/null
- pushd linux-${version}
- make allmodconfig
-
- # Cannot use CONFIG_KCOV: -fsanitize-coverage=trace-pc is not supported by compiler
- sed -i 's/CONFIG_KCOV=y/CONFIG_KCOV=n/' .config
-
- # stack validation depends on tools/objtool, but objtool does not compile on travis.
- # It is giving following error.
- # >>> GEN arch/x86/insn/inat-tables.c
- # >>> Semantic error at 40: Unknown imm opnd: AL
- # So for now disable stack-validation for the build.
-
- sed -i 's/CONFIG_STACK_VALIDATION=y/CONFIG_STACK_VALIDATION=n/' .config
- make oldconfig
-
- # Older kernels do not include openvswitch
- if [ -d "net/openvswitch" ]; then
- make net/openvswitch/
- else
- make net/bridge/
- fi
-
- if [ "$AFXDP" ]; then
- sudo make headers_install INSTALL_HDR_PATH=/usr
- pushd tools/lib/bpf/
- # Bulding with gcc because there are some issues in make files
- # that breaks building libbpf with clang on Travis.
- CC=gcc sudo make install
- CC=gcc sudo make install_headers
- sudo ldconfig
- popd
- # The Linux kernel defines __always_inline in stddef.h (283d7573), and
- # sys/cdefs.h tries to re-define it. Older libc-dev package in xenial
- # doesn't have a fix for this issue. Applying it manually.
- sudo sed -i '/^# define __always_inline .*/i # undef __always_inline' \
- /usr/include/x86_64-linux-gnu/sys/cdefs.h || true
- EXTRA_OPTS="${EXTRA_OPTS} --enable-afxdp"
- fi
- popd
-}
-
function install_dpdk()
{
local DPDK_VER=$1
@@ -202,10 +129,6 @@ assert ovs.json.from_string('{\"a\": 42}') == {'a': 42}"
exit 0
fi
-if [ "$KERNEL" ]; then
- install_kernel $KERNEL
-fi
-
if [ "$DPDK" ] || [ "$DPDK_SHARED" ]; then
if [ -z "$DPDK_VER" ]; then
DPDK_VER="22.11.1"
diff --git a/.github/workflows/build-and-test.yml b/.github/workflows/build-and-test.yml
index 1949d12001b..82675b9734d 100644
--- a/.github/workflows/build-and-test.yml
+++ b/.github/workflows/build-and-test.yml
@@ -8,14 +8,12 @@ jobs:
dependencies: |
automake libtool gcc bc libjemalloc2 libjemalloc-dev \
libssl-dev llvm-dev libelf-dev libnuma-dev libpcap-dev \
- ninja-build selinux-policy-dev
- AFXDP: ${{ matrix.afxdp }}
+ ninja-build selinux-policy-dev libbpf-dev
ASAN: ${{ matrix.asan }}
UBSAN: ${{ matrix.ubsan }}
CC: ${{ matrix.compiler }}
DPDK: ${{ matrix.dpdk }}
DPDK_SHARED: ${{ matrix.dpdk_shared }}
- KERNEL: ${{ matrix.kernel }}
LIBS: ${{ matrix.libs }}
M32: ${{ matrix.m32 }}
OPTS: ${{ matrix.opts }}
@@ -65,11 +63,9 @@ jobs:
libs: -ljemalloc
- compiler: gcc
- afxdp: afxdp
- kernel: 5.3
+ opts: --enable-afxdp
- compiler: clang
- afxdp: afxdp
- kernel: 5.3
+ opts: --enable-afxdp
- compiler: gcc
dpdk: dpdk
From 771a55825f4a1d84c18439ae5a7485807169b0f9 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:23 +0100
Subject: [PATCH 102/833] Documentation/afxdp: Use packaged libbpf/libxdp for
the build.
The necessary bits were removed from the kernel's libbpf in the 6.0
release, so the instructions on how to build libbpf from kernel
sources are now incorrect. Suggest using libbpf and libxdp packaged
by distributions instead.
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
Documentation/intro/install/afxdp.rst | 39 ++++++---------------------
1 file changed, 8 insertions(+), 31 deletions(-)
diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index bfef4986015..a4f0b87fe2c 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -88,7 +88,7 @@ Build requirements
In addition to the requirements described in :doc:`general`, building Open
vSwitch with AF_XDP will require the following:
-- libbpf from kernel source tree (kernel 5.0.0 or later)
+- ``libbpf`` and ``libxdp`` (if the version of ``libbpf`` is higher than ``0.6``).
- Linux kernel XDP support, with the following options (required)
@@ -125,41 +125,18 @@ vSwitch with AF_XDP will require the following:
Installing
----------
For OVS to use AF_XDP netdev, it has to be configured with LIBBPF support.
-First, clone a recent version of Linux bpf-next tree::
- git clone git://git.kernel.org/pub/scm/linux/kernel/git/bpf/bpf-next.git
+First, install ``libbpf`` and ``libxdp``. For example, on Fedora these
+libraries along with development headers can be obtained by installing
+``libbpf-devel`` and ``libxdp-devel`` packages. For Ubuntu that will be
+the ``libbpf-dev`` package, with ``libxdp-dev`` in addition on Ubuntu 22.10
+or later.
-Second, go into the Linux source directory and build libbpf in the tools
-directory::
-
- cd bpf-next/
- cd tools/lib/bpf/
- make && make install
- make install_headers
-
-.. note::
- Make sure xsk.h and bpf.h are installed in system's library path,
- e.g. /usr/local/include/bpf/ or /usr/include/bpf/
-
-Make sure the libbpf.so is installed correctly::
-
- ldconfig
- ldconfig -p | grep libbpf
-
-.. note::
- Check /etc/ld.so.conf if libbpf is installed but can not be found by
- ldconfig.
-
-Third, ensure the standard OVS requirements are installed and
+Next, ensure the standard OVS requirements are installed and
bootstrap/configure the package::
./boot.sh && ./configure --enable-afxdp
-.. note::
- If you encounter "WARNING: bpf/libbpf.h: present but cannot be compiled",
- check the Linux headers are in line with libbpf. For example, in Ubuntu,
- check the installed linux-headers* and linux-libc-dev* dpkg.
-
Finally, build and install OVS::
make && make install
@@ -182,7 +159,7 @@ If a test case fails, check the log at::
Setup AF_XDP netdev
-------------------
-Before running OVS with AF_XDP, make sure the libbpf, libelf, and libnuma are
+Before running OVS with AF_XDP, make sure the libbpf and libnuma are
set-up right::
ldd vswitchd/ovs-vswitchd
From e44e80343189fcb7ec10d776f1b62747d7095c18 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:24 +0100
Subject: [PATCH 103/833] acinclude.m4: Build with AF_XDP support by default if
possible.
With this change we will try to detect all the netdev-afxdp
dependencies and enable AF_XDP support by default if they are
present at build time.
The configuration script behaves in the following way:
- ./configure --enable-afxdp
Will check for AF_XDP dependencies and fail if they are
not available.
- ./configure --disable-afxdp
Disables checking for AF_XDP. The build will not support
AF_XDP even if all dependencies are installed.
- Just ./configure or ./configure --enable-afxdp=auto
Will check for AF_XDP dependencies and print a warning
if they are not available, but will continue without AF_XDP
support. If dependencies are available in the system, this
option is equivalent to --enable-afxdp.
'--disable-afxdp' is added to the debian and fedora package builds
to keep the behavior predictable.
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
Documentation/intro/install/afxdp.rst | 6 ++-
NEWS | 3 ++
acinclude.m4 | 72 ++++++++++++++++++---------
debian/rules | 25 ++++++----
rhel/openvswitch-fedora.spec.in | 2 +
5 files changed, 72 insertions(+), 36 deletions(-)
diff --git a/Documentation/intro/install/afxdp.rst b/Documentation/intro/install/afxdp.rst
index a4f0b87fe2c..51c24bf5b1e 100644
--- a/Documentation/intro/install/afxdp.rst
+++ b/Documentation/intro/install/afxdp.rst
@@ -30,8 +30,7 @@ This document describes how to build and install Open vSwitch using
AF_XDP netdev.
.. warning::
- The AF_XDP support of Open vSwitch is considered 'experimental',
- and it is not compiled in by default.
+ The AF_XDP support of Open vSwitch is considered 'experimental'.
Introduction
@@ -137,6 +136,9 @@ bootstrap/configure the package::
./boot.sh && ./configure --enable-afxdp
+``--enable-afxdp`` here is optional, but it will ensure that all dependencies
+are available at the build time.
+
Finally, build and install OVS::
make && make install
diff --git a/NEWS b/NEWS
index ce5d11d73a9..2f6ededfe47 100644
--- a/NEWS
+++ b/NEWS
@@ -4,6 +4,9 @@ Post-v3.0.0
of handler and revalidator threads if necessary.
- AF_XDP:
* Added support for building with libxdp and libbpf >= 0.7.
+ * Support for AF_XDP is now enabled by default if all dependencies are
+ available at the build time. Use --disable-afxdp to disable.
+ Use --enable-afxdp to fail the build if dependencies are not present.
- ovs-appctl:
* "ovs-appctl ofproto/trace" command can now display port names with the
"--names" option.
diff --git a/acinclude.m4 b/acinclude.m4
index e47e925b376..8aecfb63d2a 100644
--- a/acinclude.m4
+++ b/acinclude.m4
@@ -253,46 +253,70 @@ dnl OVS_CHECK_LINUX_AF_XDP
dnl
dnl Check both Linux kernel AF_XDP and libbpf/libxdp support
AC_DEFUN([OVS_CHECK_LINUX_AF_XDP], [
- AC_ARG_ENABLE([afxdp],
- [AS_HELP_STRING([--enable-afxdp], [Enable AF-XDP support])],
- [], [enable_afxdp=no])
+ AC_ARG_ENABLE(
+ [afxdp],
+ [AS_HELP_STRING([--disable-afxdp], [Disable AF-XDP support])],
+ [case "${enableval}" in
+ (yes | no | auto) ;;
+ (*) AC_MSG_ERROR([bad value ${enableval} for --enable-afxdp]) ;;
+ esac],
+ [enable_afxdp=auto])
+
AC_MSG_CHECKING([whether AF_XDP is enabled])
- if test "$enable_afxdp" != yes; then
+ if test "$enable_afxdp" == no; then
AC_MSG_RESULT([no])
AF_XDP_ENABLE=false
else
- AC_MSG_RESULT([yes])
+ AC_MSG_RESULT([$enable_afxdp])
AF_XDP_ENABLE=true
+ failed_dep=none
+ dnl Saving libs to restore in case we will end up not building with AF_XDP.
+ save_LIBS=$LIBS
- AC_CHECK_HEADER([bpf/libbpf.h], [],
- [AC_MSG_ERROR([unable to find bpf/libbpf.h for AF_XDP support])])
+ AC_CHECK_HEADER([bpf/libbpf.h], [], [failed_dep="bpf/libbpf.h"])
- AC_CHECK_HEADER([linux/if_xdp.h], [],
- [AC_MSG_ERROR([unable to find linux/if_xdp.h for AF_XDP support])])
+ if test "$failed_dep" = none; then
+ AC_CHECK_HEADER([linux/if_xdp.h], [], [failed_dep="linux/if_xdp.h"])
+ fi
- OVS_FIND_DEPENDENCY([libbpf_strerror], [bpf], [libbpf])
- AC_CHECK_FUNCS([bpf_xdp_query_id bpf_xdp_detach])
+ if test "$failed_dep" = none; then
+ AC_SEARCH_LIBS([libbpf_strerror], [bpf], [], [failed_dep="libbpf"])
+ AC_CHECK_FUNCS([bpf_xdp_query_id bpf_xdp_detach])
+ fi
- if test "x$ac_cv_func_bpf_xdp_detach" = xyes; then
+ if test "$failed_dep" = none -a "x$ac_cv_func_bpf_xdp_detach" = xyes; then
dnl We have libbpf >= 0.7. Look for libxdp as xsk functions
dnl were moved into this library.
- OVS_FIND_DEPENDENCY([libxdp_strerror], [xdp], [libxdp])
- AC_CHECK_HEADER([xdp/xsk.h],
- AC_DEFINE([HAVE_LIBXDP], [1], [xsk.h is supplied with libxdp]),
- AC_MSG_ERROR([unable to find xdp/xsk.h for AF_XDP support]))
- else
+ AC_SEARCH_LIBS([libxdp_strerror], [xdp],
+ AC_CHECK_HEADER([xdp/xsk.h],
+ AC_DEFINE([HAVE_LIBXDP], [1], [xsk.h is supplied with libxdp]),
+ [failed_dep="xdp/xsk.h"]),
+ [failed_dep="libxdp"])
+ elif test "$failed_dep" = none; then
dnl libbpf < 0.7 contains all the necessary functionality.
- AC_CHECK_HEADER([bpf/xsk.h], [],
- [AC_MSG_ERROR([unable to find bpf/xsk.h for AF_XDP support])])
+ AC_CHECK_HEADER([bpf/xsk.h], [], [failed_dep="bpf/xsk.h"])
fi
- AC_CHECK_FUNCS([pthread_spin_lock], [],
- [AC_MSG_ERROR([unable to find pthread_spin_lock for AF_XDP support])])
+ if test "$failed_dep" = none; then
+ AC_CHECK_FUNCS([pthread_spin_lock], [], [failed_dep="pthread_spin_lock"])
+ fi
- OVS_FIND_DEPENDENCY([numa_alloc_onnode], [numa], [libnuma])
+ if test "$failed_dep" = none; then
+ AC_SEARCH_LIBS([numa_alloc_onnode], [numa], [], [failed_dep="libnuma"])
+ fi
- AC_DEFINE([HAVE_AF_XDP], [1],
- [Define to 1 if AF_XDP support is available and enabled.])
+ if test "$failed_dep" = none; then
+ AC_DEFINE([HAVE_AF_XDP], [1],
+ [Define to 1 if AF_XDP support is available and enabled.])
+ elif test "$enable_afxdp" = yes; then
+ AC_MSG_ERROR([Missing $failed_dep dependency for AF_XDP support])
+ else
+ AC_MSG_WARN(m4_normalize(
+ [Cannot find $failed_dep, netdev-afxdp will not be supported
+ (use --disable-afxdp to suppress this warning).]))
+ AF_XDP_ENABLE=false
+ LIBS=$save_LIBS
+ fi
fi
AM_CONDITIONAL([HAVE_AF_XDP], test "$AF_XDP_ENABLE" = true)
])
diff --git a/debian/rules b/debian/rules
index 971bc1775ee..ddbd4dc5c15 100755
--- a/debian/rules
+++ b/debian/rules
@@ -23,21 +23,26 @@ override_dh_auto_configure:
test -d _debian || mkdir _debian
cd _debian && ( \
test -e Makefile || \
- ../configure --prefix=/usr --localstatedir=/var --enable-ssl \
- --sysconfdir=/etc \
- $(DATAPATH_CONFIGURE_OPTS) \
- $(EXTRA_CONFIGURE_OPTS) \
- )
+ ../configure --prefix=/usr --localstatedir=/var \
+ --enable-ssl \
+ --disable-afxdp \
+ --sysconfdir=/etc \
+ $(DATAPATH_CONFIGURE_OPTS) \
+ $(EXTRA_CONFIGURE_OPTS) \
+ )
ifneq (,$(filter i386 amd64 ppc64el arm64, $(DEB_HOST_ARCH)))
ifeq (,$(filter nodpdk, $(DEB_BUILD_OPTIONS)))
test -d _dpdk || mkdir _dpdk
cd _dpdk && ( \
test -e Makefile || \
- ../configure --prefix=/usr --localstatedir=/var --enable-ssl \
- --with-dpdk=shared --sysconfdir=/etc \
- $(DATAPATH_CONFIGURE_OPTS) \
- $(EXTRA_CONFIGURE_OPTS) \
- )
+ ../configure --prefix=/usr --localstatedir=/var \
+ --enable-ssl \
+ --disable-afxdp \
+ --with-dpdk=shared \
+ --sysconfdir=/etc \
+ $(DATAPATH_CONFIGURE_OPTS) \
+ $(EXTRA_CONFIGURE_OPTS) \
+ )
endif
endif
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index eb5077a215f..3091e204e15 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -171,6 +171,8 @@ This package provides IPsec tunneling support for OVS tunnels.
%endif
%if %{with afxdp}
--enable-afxdp \
+%else
+ --disable-afxdp \
%endif
--enable-ssl \
--disable-static \
From 9736b971b519b725507116578d780d822755b2a6 Mon Sep 17 00:00:00 2001
From: Ilya Maximets
Date: Thu, 22 Dec 2022 01:06:25 +0100
Subject: [PATCH 104/833] rhel: Enable AF_XDP by default in Fedora builds.
All supported versions of Fedora do package libxdp and libbpf, so it
makes sense to enable AF_XDP support.
Control files for debian packaging are much less flexible, so it's hard
to enable AF_XDP builds without breaking builds for versions of Ubuntu
and Debian that do not package libbpf or libxdp.
Acked-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
rhel/openvswitch-fedora.spec.in | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/rhel/openvswitch-fedora.spec.in b/rhel/openvswitch-fedora.spec.in
index 3091e204e15..44899c1ca74 100644
--- a/rhel/openvswitch-fedora.spec.in
+++ b/rhel/openvswitch-fedora.spec.in
@@ -26,8 +26,8 @@
%bcond_without libcapng
# To enable DPDK support, specify '--with dpdk' when building
%bcond_with dpdk
-# To enable AF_XDP support, specify '--with afxdp' when building
-%bcond_with afxdp
+# To disable AF_XDP support, specify '--without afxdp' when building
+%bcond_without afxdp
# If there is a need to automatically enable the package after installation,
# specify the "--with autoenable"
From 62e85106b4439faf28261ee0776d3d9f9736994e Mon Sep 17 00:00:00 2001
From: Eelco Chaudron
Date: Thu, 22 Dec 2022 10:12:12 +0100
Subject: [PATCH 105/833] utilities: Add USDT script to monitor dpif netlink
execute message queuing.
This patch adds the dpif_nl_exec_monitor.py script, which uses the
existing dpif_netlink_operate__:op_flow_execute USDT probe to show
all DPIF_OP_EXECUTE operations being queued for transmission over
the netlink interface.
Here is an example (output truncated):
Display DPIF_OP_EXECUTE operations being queued for transmission...
TIME CPU COMM PID NL_SIZE
3124.516679897 1 ovs-vswitchd 8219 180
nlmsghdr : len = 0, type = 36, flags = 1, seq = 0, pid = 0
genlmsghdr: cmd = 3, version = 1, reserver = 0
ovs_header: dp_ifindex = 21
> Decode OVS_PACKET_ATTR_* TLVs:
nla_len 46, nla_type OVS_PACKET_ATTR_PACKET[1], data: 00 00 00...
nla_len 20, nla_type OVS_PACKET_ATTR_KEY[2], data: 08 00 02 00...
> Decode OVS_KEY_ATTR_* TLVs:
nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00...
nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00...
nla_len 88, nla_type OVS_PACKET_ATTR_ACTIONS[3], data: 4c 00 03...
> Decode OVS_ACTION_ATTR_* TLVs:
nla_len 76, nla_type OVS_ACTION_ATTR_SET[3], data: 48 00...
> Decode OVS_TUNNEL_KEY_ATTR_* TLVs:
nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_ID[0], data:...
nla_len 20, nla_type OVS_TUNNEL_KEY_ATTR_IPV6_DST[13], ...
nla_len 5, nla_type OVS_TUNNEL_KEY_ATTR_TTL[4], data: 40
nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT[5]...
nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_CSUM[6], data:
nla_len 6, nla_type OVS_TUNNEL_KEY_ATTR_TP_DST[10],...
nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS[8],...
nla_len 8, nla_type OVS_ACTION_ATTR_OUTPUT[1], data: 02 00 00 00
- Dumping OVS_PACKET_ATR_PACKET data:
###[ Ethernet ]###
dst = 00:00:00:00:ec:01
src = 04:f4:bc:28:57:00
type = IPv4
###[ IP ]###
version = 4
ihl = 5
tos = 0x0
len = 50
id = 0
flags =
frag = 0
ttl = 127
proto = icmp
chksum = 0x2767
src = 10.0.0.1
dst = 10.0.0.100
\options \
###[ ICMP ]###
type = echo-request
code = 0
chksum = 0xf7f3
id = 0x0
seq = 0xc
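For reference, the TLV walk that the script implements in Python follows the standard
netlink attribute layout: a 4-byte header holding nla_len and nla_type, a payload of
nla_len - 4 bytes, and padding up to a 4-byte boundary before the next attribute.
Below is a minimal C sketch of the same walk using only definitions from
linux/netlink.h; it is illustrative and not part of this patch.

    #include <linux/netlink.h>
    #include <stdio.h>

    void
    walk_nlattrs(const unsigned char *buf, size_t len)
    {
        while (len >= NLA_HDRLEN) {
            const struct nlattr *nla = (const struct nlattr *) buf;
            size_t step = NLA_ALIGN(nla->nla_len);

            if (nla->nla_len < NLA_HDRLEN || nla->nla_len > len) {
                printf("WARN: truncated attribute\n");
                break;
            }
            printf("nla_len %u, nla_type %u, payload %u bytes\n",
                   (unsigned) nla->nla_len, (unsigned) nla->nla_type,
                   (unsigned) (nla->nla_len - NLA_HDRLEN));

            buf += step;
            len -= step < len ? step : len;
        }
    }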
Acked-by: Adrian Moreno
Signed-off-by: Eelco Chaudron
Signed-off-by: Ilya Maximets
---
Documentation/topics/usdt-probes.rst | 1 +
utilities/automake.mk | 3 +
.../usdt-scripts/dpif_nl_exec_monitor.py | 662 ++++++++++++++++++
3 files changed, 666 insertions(+)
create mode 100755 utilities/usdt-scripts/dpif_nl_exec_monitor.py
diff --git a/Documentation/topics/usdt-probes.rst b/Documentation/topics/usdt-probes.rst
index 7ce19aaedea..004817b1c54 100644
--- a/Documentation/topics/usdt-probes.rst
+++ b/Documentation/topics/usdt-probes.rst
@@ -254,6 +254,7 @@ DPIF_OP_FLOW_EXECUTE operation as part of the dpif ``operate()`` callback.
**Script references**:
+- ``utilities/usdt-scripts/dpif_nl_exec_monitor.py``
- ``utilities/usdt-scripts/upcall_cost.py``
diff --git a/utilities/automake.mk b/utilities/automake.mk
index 132a16942e8..b020511c61c 100644
--- a/utilities/automake.mk
+++ b/utilities/automake.mk
@@ -22,6 +22,7 @@ scripts_SCRIPTS += \
scripts_DATA += utilities/ovs-lib
usdt_SCRIPTS += \
utilities/usdt-scripts/bridge_loop.bt \
+ utilities/usdt-scripts/dpif_nl_exec_monitor.py \
utilities/usdt-scripts/upcall_cost.py \
utilities/usdt-scripts/upcall_monitor.py
@@ -67,6 +68,7 @@ EXTRA_DIST += \
utilities/docker/debian/Dockerfile \
utilities/docker/debian/build-kernel-modules.sh \
utilities/usdt-scripts/bridge_loop.bt \
+ utilities/usdt-scripts/dpif_nl_exec_monitor.py \
utilities/usdt-scripts/upcall_cost.py \
utilities/usdt-scripts/upcall_monitor.py
MAN_ROOTS += \
@@ -137,6 +139,7 @@ FLAKE8_PYFILES += utilities/ovs-pcap.in \
utilities/ovs-check-dead-ifs.in \
utilities/ovs-tcpdump.in \
utilities/ovs-pipegen.py \
+ utilities/usdt-scripts/dpif_nl_exec_monitor.py \
utilities/usdt-scripts/upcall_monitor.py \
utilities/usdt-scripts/upcall_cost.py
diff --git a/utilities/usdt-scripts/dpif_nl_exec_monitor.py b/utilities/usdt-scripts/dpif_nl_exec_monitor.py
new file mode 100755
index 00000000000..0a9ff812334
--- /dev/null
+++ b/utilities/usdt-scripts/dpif_nl_exec_monitor.py
@@ -0,0 +1,662 @@
+#!/usr/bin/env python3
+#
+# Copyright (c) 2022 Red Hat, Inc.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at:
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+# Script information:
+# -------------------
+# dpif_nl_exec_monitor.py uses the dpif_netlink_operate__:op_flow_execute USDT
+# probe to receive all DPIF_OP_EXECUTE operations that are queued for
+# transmission over the netlink socket. It will do some basic decoding, and if
+# requested a packet dump.
+#
+# Here is an example:
+#
+# # ./dpif_nl_exec_monitor.py --packet-decode decode
+# Display DPIF_OP_EXECUTE operations being queued for transmission...
+# TIME CPU COMM PID NL_SIZE
+# 3124.516679897 1 ovs-vswitchd 8219 180
+# nlmsghdr : len = 0, type = 36, flags = 1, seq = 0, pid = 0
+# genlmsghdr: cmd = 3, version = 1, reserver = 0
+# ovs_header: dp_ifindex = 21
+# > Decode OVS_PACKET_ATTR_* TLVs:
+# nla_len 46, nla_type OVS_PACKET_ATTR_PACKET[1], data: 00 00 00...
+# nla_len 20, nla_type OVS_PACKET_ATTR_KEY[2], data: 08 00 02 00...
+# > Decode OVS_KEY_ATTR_* TLVs:
+# nla_len 8, nla_type OVS_KEY_ATTR_PRIORITY[2], data: 00 00...
+# nla_len 8, nla_type OVS_KEY_ATTR_SKB_MARK[15], data: 00 00...
+# nla_len 88, nla_type OVS_PACKET_ATTR_ACTIONS[3], data: 4c 00 03...
+# > Decode OVS_ACTION_ATTR_* TLVs:
+# nla_len 76, nla_type OVS_ACTION_ATTR_SET[3], data: 48 00...
+# > Decode OVS_TUNNEL_KEY_ATTR_* TLVs:
+# nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_ID[0], data:...
+# nla_len 20, nla_type OVS_TUNNEL_KEY_ATTR_IPV6_DST[13], ...
+# nla_len 5, nla_type OVS_TUNNEL_KEY_ATTR_TTL[4], data: 40
+# nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT...
+# nla_len 4, nla_type OVS_TUNNEL_KEY_ATTR_CSUM[6], data:
+# nla_len 6, nla_type OVS_TUNNEL_KEY_ATTR_TP_DST[10],...
+# nla_len 12, nla_type OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS...
+# nla_len 8, nla_type OVS_ACTION_ATTR_OUTPUT[1], data: 02 00 00 00
+# - Dumping OVS_PACKET_ATR_PACKET data:
+# ###[ Ethernet ]###
+# dst = 00:00:00:00:ec:01
+# src = 04:f4:bc:28:57:00
+# type = IPv4
+# ###[ IP ]###
+# version = 4
+# ihl = 5
+# tos = 0x0
+# len = 50
+# id = 0
+# flags =
+# frag = 0
+# ttl = 127
+# proto = icmp
+# chksum = 0x2767
+# src = 10.0.0.1
+# dst = 10.0.0.100
+# \options \
+# ###[ ICMP ]###
+# type = echo-request
+# code = 0
+# chksum = 0xf7f3
+# id = 0x0
+# seq = 0xc
+#
+# The example above dumps the full netlink and packet decode. However options
+# exist to disable this. Here is the full list of supported options:
+#
+# usage: dpif_nl_exec_monitor.py [-h] [--buffer-page-count NUMBER] [-D [DEBUG]]
+# [-d {none,hex,decode}] [-n {none,hex,nlraw}]
+# [-p VSWITCHD_PID] [-s [64-2048]]
+# [-w PCAP_FILE]
+#
+# optional arguments:
+# -h, --help show this help message and exit
+# --buffer-page-count NUMBER
+# Number of BPF ring buffer pages, default 1024
+# -D [DEBUG], --debug [DEBUG]
+# Enable eBPF debugging
+# -d {none,hex,decode}, --packet-decode {none,hex,decode}
+# Display packet content in selected mode, default none
+# -n {none,hex,nlraw}, --nlmsg-decode {none,hex,nlraw}
+# Display netlink message content in selected mode,
+# default nlraw
+# -p VSWITCHD_PID, --pid VSWITCHD_PID
+# ovs-vswitch's PID
+# -s [64-2048], --nlmsg-size [64-2048]
+# Set maximum netlink message size to capture, default
+# 512
+# -w PCAP_FILE, --pcap PCAP_FILE
+# Write upcall packets to specified pcap file
+
+from bcc import BPF, USDT, USDTException
+from os.path import exists
+from scapy.all import hexdump, wrpcap
+from scapy.layers.l2 import Ether
+
+import argparse
+import psutil
+import re
+import struct
+import sys
+import time
+
+#
+# Actual eBPF source code
+#
+ebpf_source = """
+#include <linux/sched.h>
+
+#define MAX_NLMSG
+
+struct event_t {
+ u32 cpu;
+ u32 pid;
+ u64 ts;
+ u32 nl_size;
+ char comm[TASK_COMM_LEN];
+ u8 nl_msg[MAX_NLMSG];
+};
+
+struct ofpbuf {
+ void *base;
+ void *data;
+ uint32_t size;
+
+ /* The actual structure is longer, but we are only interested in the
+ * first couple of entries. */
+};
+
+BPF_RINGBUF_OUTPUT(events, );
+BPF_TABLE("percpu_array", uint32_t, uint64_t, dropcnt, 1);
+
+int trace__op_flow_execute(struct pt_regs *ctx) {
+ struct ofpbuf nlbuf;
+ uint32_t size;
+
+ bpf_usdt_readarg_p(5, ctx, &nlbuf, sizeof(nlbuf));
+
+ struct event_t *event = events.ringbuf_reserve(sizeof(struct event_t));
+ if (!event) {
+ uint32_t type = 0;
+ uint64_t *value = dropcnt.lookup(&type);
+ if (value)
+ __sync_fetch_and_add(value, 1);
+
+ return 1;
+ }
+
+ event->ts = bpf_ktime_get_ns();
+ event->cpu = bpf_get_smp_processor_id();
+ event->pid = bpf_get_current_pid_tgid();
+ bpf_get_current_comm(&event->comm, sizeof(event->comm));
+
+ event->nl_size = nlbuf.size;
+ if (event->nl_size > MAX_NLMSG)
+ size = MAX_NLMSG;
+ else
+ size = event->nl_size;
+
+ bpf_probe_read(&event->nl_msg, size, nlbuf.data);
+
+ events.ringbuf_submit(event, 0);
+ return 0;
+};
+"""
+
+
+#
+# print_event()
+#
+def print_event(ctx, data, size):
+ event = b["events"].event(data)
+ print("{:<18.9f} {:<4} {:<16} {:<10} {:<10}".
+ format(event.ts / 1000000000,
+ event.cpu,
+ event.comm.decode("utf-8"),
+ event.pid,
+ event.nl_size))
+
+ #
+ # Dumping the netlink message data if requested.
+ #
+ if event.nl_size < options.nlmsg_size:
+ nl_size = event.nl_size
+ else:
+ nl_size = options.nlmsg_size
+
+ if options.nlmsg_decode == "hex":
+ #
+ # Abuse scapy's hex dump to dump flow key
+ #
+ print(re.sub("^", " " * 4,
+ hexdump(Ether(bytes(event.nl_msg)[:nl_size]), dump=True),
+ flags=re.MULTILINE))
+
+ if options.nlmsg_decode == "nlraw":
+ decode_result = decode_nlm(bytes(event.nl_msg)[:nl_size], dump=True)
+ else:
+ decode_result = decode_nlm(bytes(event.nl_msg)[:nl_size], dump=False)
+
+ #
+ # Decode packet only if there is data
+ #
+ if "OVS_PACKET_ATTR_PACKET" not in decode_result:
+ return
+
+ pkt_data = decode_result["OVS_PACKET_ATTR_PACKET"]
+ indent = 4 if options.nlmsg_decode != "nlraw" else 6
+
+ if options.packet_decode != "none":
+ print("{}- Dumping OVS_PACKET_ATR_PACKET data:".format(" " * indent))
+
+ if options.packet_decode == "hex":
+ print(re.sub("^", " " * indent, hexdump(pkt_data, dump=True),
+ flags=re.MULTILINE))
+
+ packet = Ether(pkt_data)
+ if options.packet_decode == "decode":
+ print(re.sub("^", " " * indent, packet.show(dump=True),
+ flags=re.MULTILINE))
+
+ if options.pcap is not None:
+ wrpcap(options.pcap, packet, append=True)
+
+
+#
+# decode_nlm_tlvs()
+#
+def decode_nlm_tlvs(tlvs, header=None, indent=4, dump=True,
+ attr_to_str_func=None, decode_tree=None):
+ bytes_left = len(tlvs)
+ result = {}
+
+ if dump:
+ print("{}{}".format(" " * indent, header))
+
+ while bytes_left:
+ if bytes_left < 4:
+ if dump:
+ print("{}WARN: decode truncated; can't read header".format(
+ " " * indent))
+ break
+
+ nla_len, nla_type = struct.unpack("=HH", tlvs[:4])
+
+ if nla_len < 4:
+ if dump:
+ print("{}WARN: decode truncated; nla_len < 4".format(
+ " " * indent))
+ break
+
+ nla_data = tlvs[4:nla_len]
+ trunc = ""
+
+ if attr_to_str_func is None:
+ nla_type_name = "type_{}".format(nla_type)
+ else:
+ nla_type_name = attr_to_str_func(nla_type)
+
+ if nla_len > bytes_left:
+ trunc = "..."
+ nla_data = nla_data[:(bytes_left - 4)]
+ else:
+ result[nla_type_name] = nla_data
+
+ if dump:
+ print("{}nla_len {}, nla_type {}[{}], data: {}{}".format(
+ " " * indent, nla_len, nla_type_name, nla_type,
+ "".join("{:02x} ".format(b) for b in nla_data), trunc))
+
+ #
+ # If we have the full data, try to decode further
+ #
+ if trunc == "" and decode_tree is not None \
+ and nla_type_name in decode_tree:
+ node = decode_tree[nla_type_name]
+ decode_nlm_tlvs(nla_data,
+ header=node["header"],
+ indent=indent + node["indent"], dump=True,
+ attr_to_str_func=node["attr_str_func"],
+ decode_tree=node["decode_tree"])
+
+ if trunc != "":
+ if dump:
+ print("{}WARN: decode truncated; nla_len > msg_len[{}] ".
+ format(" " * indent, bytes_left))
+ break
+
+ # update next offset, but make sure it's aligned correctly
+ next_offset = (nla_len + 3) & ~(3)
+ tlvs = tlvs[next_offset:]
+ bytes_left -= next_offset
+
+ return result
+
+
+#
+# decode_nlm()
+#
+def decode_nlm(msg, indent=4, dump=True):
+ result = {}
+
+ #
+ # Decode 'struct nlmsghdr'
+ #
+ if dump:
+ print("{}nlmsghdr : len = {}, type = {}, flags = {}, seq = {}, "
+ "pid = {}".format(" " * indent,
+ *struct.unpack("=IHHII", msg[:16])))
+
+ msg = msg[16:]
+
+ #
+ # Decode 'struct genlmsghdr'
+ #
+ if dump:
+ print("{}genlmsghdr: cmd = {}, version = {}, reserver = {}".format(
+ " " * indent, *struct.unpack("=BBH", msg[:4])))
+
+ msg = msg[4:]
+
+ #
+ # Decode 'struct ovs_header'
+ #
+ if dump:
+ print("{}ovs_header: dp_ifindex = {}".format(
+ " " * indent, *struct.unpack("=I", msg[:4])))
+
+ msg = msg[4:]
+
+ #
+ # Decode TLVs
+ #
+ nl_attr_tree = {
+ "OVS_PACKET_ATTR_KEY": {
+ "header": "> Decode OVS_KEY_ATTR_* TLVs:",
+ "indent": 4,
+ "attr_str_func": get_ovs_key_attr_str,
+ "decode_tree": None,
+ },
+ "OVS_PACKET_ATTR_ACTIONS": {
+ "header": "> Decode OVS_ACTION_ATTR_* TLVs:",
+ "indent": 4,
+ "attr_str_func": get_ovs_action_attr_str,
+ "decode_tree": {
+ "OVS_ACTION_ATTR_SET": {
+ "header": "> Decode OVS_KEY_ATTR_* TLVs:",
+ "indent": 4,
+ "attr_str_func": get_ovs_key_attr_str,
+ "decode_tree": {
+ "OVS_KEY_ATTR_TUNNEL": {
+ "header": "> Decode OVS_TUNNEL_KEY_ATTR_* TLVs:",
+ "indent": 4,
+ "attr_str_func": get_ovs_tunnel_key_attr_str,
+ "decode_tree": None,
+ },
+ },
+ },
+ },
+ },
+ }
+
+ result = decode_nlm_tlvs(msg, indent=indent + 2, dump=dump,
+ header="> Decode OVS_PACKET_ATTR_* TLVs:",
+ attr_to_str_func=get_ovs_pkt_attr_str,
+ decode_tree=nl_attr_tree)
+ return result
+
+
+#
+# get_ovs_pkt_attr_str()
+#
+def get_ovs_pkt_attr_str(attr):
+ ovs_pkt_attr = ["OVS_PACKET_ATTR_UNSPEC",
+ "OVS_PACKET_ATTR_PACKET",
+ "OVS_PACKET_ATTR_KEY",
+ "OVS_PACKET_ATTR_ACTIONS",
+ "OVS_PACKET_ATTR_USERDATA",
+ "OVS_PACKET_ATTR_EGRESS_TUN_KEY",
+ "OVS_PACKET_ATTR_UNUSED1",
+ "OVS_PACKET_ATTR_UNUSED2",
+ "OVS_PACKET_ATTR_PROBE",
+ "OVS_PACKET_ATTR_MRU",
+ "OVS_PACKET_ATTR_LEN",
+ "OVS_PACKET_ATTR_HASH"]
+ if attr < 0 or attr >= len(ovs_pkt_attr):
+ return "".format(attr)
+
+ return ovs_pkt_attr[attr]
+
+
+#
+# get_ovs_key_attr_str()
+#
+def get_ovs_key_attr_str(attr):
+ ovs_key_attr = ["OVS_KEY_ATTR_UNSPEC",
+ "OVS_KEY_ATTR_ENCAP",
+ "OVS_KEY_ATTR_PRIORITY",
+ "OVS_KEY_ATTR_IN_PORT",
+ "OVS_KEY_ATTR_ETHERNET",
+ "OVS_KEY_ATTR_VLAN",
+ "OVS_KEY_ATTR_ETHERTYPE",
+ "OVS_KEY_ATTR_IPV4",
+ "OVS_KEY_ATTR_IPV6",
+ "OVS_KEY_ATTR_TCP",
+ "OVS_KEY_ATTR_UDP",
+ "OVS_KEY_ATTR_ICMP",
+ "OVS_KEY_ATTR_ICMPV6",
+ "OVS_KEY_ATTR_ARP",
+ "OVS_KEY_ATTR_ND",
+ "OVS_KEY_ATTR_SKB_MARK",
+ "OVS_KEY_ATTR_TUNNEL",
+ "OVS_KEY_ATTR_SCTP",
+ "OVS_KEY_ATTR_TCP_FLAGS",
+ "OVS_KEY_ATTR_DP_HASH",
+ "OVS_KEY_ATTR_RECIRC_ID",
+ "OVS_KEY_ATTR_MPLS",
+ "OVS_KEY_ATTR_CT_STATE",
+ "OVS_KEY_ATTR_CT_ZONE",
+ "OVS_KEY_ATTR_CT_MARK",
+ "OVS_KEY_ATTR_CT_LABELS",
+ "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV4",
+ "OVS_KEY_ATTR_CT_ORIG_TUPLE_IPV6",
+ "OVS_KEY_ATTR_NSH"]
+
+ if attr < 0 or attr >= len(ovs_key_attr):
+ return "".format(attr)
+
+ return ovs_key_attr[attr]
+
+
+#
+# get_ovs_action_attr_str()
+#
+def get_ovs_action_attr_str(attr):
+ ovs_action_attr = ["OVS_ACTION_ATTR_UNSPEC",
+ "OVS_ACTION_ATTR_OUTPUT",
+ "OVS_ACTION_ATTR_USERSPACE",
+ "OVS_ACTION_ATTR_SET",
+ "OVS_ACTION_ATTR_PUSH_VLAN",
+ "OVS_ACTION_ATTR_POP_VLAN",
+ "OVS_ACTION_ATTR_SAMPLE",
+ "OVS_ACTION_ATTR_RECIRC",
+ "OVS_ACTION_ATTR_HASH",
+ "OVS_ACTION_ATTR_PUSH_MPLS",
+ "OVS_ACTION_ATTR_POP_MPLS",
+ "OVS_ACTION_ATTR_SET_MASKED",
+ "OVS_ACTION_ATTR_CT",
+ "OVS_ACTION_ATTR_TRUNC",
+ "OVS_ACTION_ATTR_PUSH_ETH",
+ "OVS_ACTION_ATTR_POP_ETH",
+ "OVS_ACTION_ATTR_CT_CLEAR",
+ "OVS_ACTION_ATTR_PUSH_NSH",
+ "OVS_ACTION_ATTR_POP_NSH",
+ "OVS_ACTION_ATTR_METER",
+ "OVS_ACTION_ATTR_CLONE",
+ "OVS_ACTION_ATTR_CHECK_PKT_LEN",
+ "OVS_ACTION_ATTR_ADD_MPLS",
+ "OVS_ACTION_ATTR_TUNNEL_PUSH",
+ "OVS_ACTION_ATTR_TUNNEL_POP",
+ "OVS_ACTION_ATTR_DROP",
+ "OVS_ACTION_ATTR_LB_OUTPUT"]
+ if attr < 0 or attr >= len(ovs_action_attr):
+ return "".format(attr)
+
+ return ovs_action_attr[attr]
+
+
+#
+# get_ovs_tunnel_key_attr_str()
+#
+def get_ovs_tunnel_key_attr_str(attr):
+ ovs_tunnel_key_attr = ["OVS_TUNNEL_KEY_ATTR_ID",
+ "OVS_TUNNEL_KEY_ATTR_IPV4_SRC",
+ "OVS_TUNNEL_KEY_ATTR_IPV4_DST",
+ "OVS_TUNNEL_KEY_ATTR_TOS",
+ "OVS_TUNNEL_KEY_ATTR_TTL",
+ "OVS_TUNNEL_KEY_ATTR_DONT_FRAGMENT",
+ "OVS_TUNNEL_KEY_ATTR_CSUM",
+ "OVS_TUNNEL_KEY_ATTR_OAM",
+ "OVS_TUNNEL_KEY_ATTR_GENEVE_OPTS",
+ "OVS_TUNNEL_KEY_ATTR_TP_SRC",
+ "OVS_TUNNEL_KEY_ATTR_TP_DST",
+ "OVS_TUNNEL_KEY_ATTR_VXLAN_OPTS",
+ "OVS_TUNNEL_KEY_ATTR_IPV6_SRC",
+ "OVS_TUNNEL_KEY_ATTR_IPV6_DST",
+ "OVS_TUNNEL_KEY_ATTR_PAD",
+ "OVS_TUNNEL_KEY_ATTR_ERSPAN_OPTS",
+ "OVS_TUNNEL_KEY_ATTR_GTPU_OPTS"]
+ if attr < 0 or attr >= len(ovs_tunnel_key_attr):
+ return "".format(attr)
+
+ return ovs_tunnel_key_attr[attr]
+
+
+#
+# buffer_size_type()
+#
+def buffer_size_type(astr, min=64, max=2048):
+ value = int(astr)
+ if min <= value <= max:
+ return value
+ else:
+ raise argparse.ArgumentTypeError(
+ "value not in range {}-{}".format(min, max))
+
+
+#
+# next_power_of_two()
+#
+def next_power_of_two(val):
+ np = 1
+ while np < val:
+ np *= 2
+ return np
+
+
+#
+# main()
+#
+def main():
+ #
+ # Don't like these globals, but ctx passing does not seem to work with the
+ # existing open_ring_buffer() API :(
+ #
+ global b
+ global options
+
+ #
+ # Argument parsing
+ #
+ parser = argparse.ArgumentParser()
+
+ parser.add_argument("--buffer-page-count",
+ help="Number of BPF ring buffer pages, default 1024",
+ type=int, default=1024, metavar="NUMBER")
+ parser.add_argument("-D", "--debug",
+ help="Enable eBPF debugging",
+ type=int, const=0x3f, default=0, nargs="?")
+ parser.add_argument("-d", "--packet-decode",
+ help="Display packet content in selected mode, "
+ "default none",
+ choices=["none", "hex", "decode"], default="none")
+ parser.add_argument("-n", "--nlmsg-decode",
+ help="Display netlink message content in selected mode"
+ ", default nlraw",
+ choices=["none", "hex", "nlraw"], default="nlraw")
+ parser.add_argument("-p", "--pid", metavar="VSWITCHD_PID",
+ help="ovs-vswitch's PID",
+ type=int, default=None)
+ parser.add_argument("-s", "--nlmsg-size",
+ help="Set maximum netlink message size to capture, "
+ "default 512", type=buffer_size_type, default=512,
+ metavar="[64-2048]")
+ parser.add_argument("-w", "--pcap", metavar="PCAP_FILE",
+ help="Write upcall packets to specified pcap file",
+ type=str, default=None)
+
+ options = parser.parse_args()
+
+ #
+ # Find the PID of the ovs-vswitchd daemon if not specified.
+ #
+ if options.pid is None:
+ for proc in psutil.process_iter():
+ if "ovs-vswitchd" in proc.name():
+ if options.pid is not None:
+ print("ERROR: Multiple ovs-vswitchd daemons running, "
+ "use the -p option!")
+ sys.exit(-1)
+
+ options.pid = proc.pid
+
+ #
+ # Error checking on input parameters
+ #
+ if options.pid is None:
+ print("ERROR: Failed to find ovs-vswitchd's PID!")
+ sys.exit(-1)
+
+ if options.pcap is not None:
+ if exists(options.pcap):
+ print("ERROR: Destination capture file \"{}\" already exists!".
+ format(options.pcap))
+ sys.exit(-1)
+
+ options.buffer_page_count = next_power_of_two(options.buffer_page_count)
+
+ #
+ # Attach the usdt probe
+ #
+ u = USDT(pid=int(options.pid))
+ try:
+ u.enable_probe(probe="dpif_netlink_operate__:op_flow_execute",
+ fn_name="trace__op_flow_execute")
+ except USDTException as e:
+ print("ERROR: {}"
+ "ovs-vswitchd!".format(
+ (re.sub("^", " " * 7, str(e), flags=re.MULTILINE)).strip().
+ replace("--with-dtrace or --enable-dtrace",
+ "--enable-usdt-probes")))
+ sys.exit(-1)
+
+ #
+ # Uncomment to see how arguments are decoded.
+ # print(u.get_text())
+ #
+
+ #
+ # Attach probe to running process
+ #
+ source = ebpf_source.replace("