Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix incorrect product statistics count #1728

Merged

Conversation

ianlin
Copy link
Member

@ianlin ianlin commented Oct 14, 2022

Changes proposed in this Pull Request:

Project post: pcTzPl-1a9-p2

This is a follow up for #1713, which is part of the project #1610.

The number returned by product statistics API are not aligned with the syncable products count because it includes the variation products. There are two parts when getting product statistics: counting synced products and not_synced products, we need to group the variations as one for the same parent for both parts.

1. Group variation products when counting synced products.

The solution is inspired by @mikkamp's comments in the PR 1713 by using two existing data $this->product_data_lookup and $this->product_statuses from MerchantStatuses.php, the format of both data are:

product_statuses

{
  "products": {
    "602": {
      "disapproved": 1
    },
    "603": {
      "approved": 1
    }
    "604": {
      "disapproved": 1
    }
  },
  "parents": {
    "589": {
      "approved": 1,
      "disapproved": 1
    }
  }
}

product_data_lookup

{
  "602": {
    "name": "Name A",
    "visibility": [],
    "parent_id": 589,
  },
  "603": {
    "name": "Name B",
    "visibility": [],
    "parent_id": 589,
  },
  "604": {
    "name": "Name C",
    "visibility": [],
    "parent_id": 0,
  },
}

The modified code loop through product_statuses and check product_data_lookup to see if a product has a parent. From the above data we can get two synced products: 589 and 604, where 589 is a variable product that has two variations 602 and 603.

@mikkamp also suggested in pcTzPl-I1-p2#comment-1839 that we should consider a case where one variation is approved and the other is disapproved. So we add a new data product_statistics_priority to assign each status a priority (in this case disapproved has higher priority) then we can get the correct status for each product. The new data's format looks like this:

$product_statistics_priority = [
  MCStatus::APPROVED           => 6,
  MCStatus::PARTIALLY_APPROVED => 5,
  MCStatus::EXPIRING           => 4,
  MCStatus::PENDING            => 3,
  MCStatus::DISAPPROVED        => 2,
  MCStatus::NOT_SYNCED         => 1,
];

2. Get syncable but not synced products

  • GMC status: product statuses fetched from GMC and stored in wp_postmeta with meta key _wc_gla_mc_status and has statues:
    • approved
    • partially_approved
    • expiring
    • pending
    • disapproved
    • not_synced
  • Sync status: product sync statuses stored in wp_postmeta with meta key _wc_gla_sync_status and has statuses:
    • synced
    • not-synced
    • has-errors
    • pending

The original implementation to get the not_synced products is: finding all the simple and variable products where their GMC status is not_synced. While it already excludes the variation products but the result would contains unsyncable products. Our goal is to get the syncable but not synced products, to achieve this we could simply get all the simple and variable products where their SyncStatus is not synced. The reason why we can use SyncStatus is, only the syncable products would have a status recorded in SyncStatus.

3. Fix a problem that "trashing" a product would never really trigger delete batch requests

In DeleteProducts.php, the format of $product_id_map is:

[
  "online:en:TW:gla_1234" => 1234,
  "online:en:TW:gla_5678" => 5678,
]

However in ProductRepository.php the method find_delete_product_ids() accepts an array of WC product IDs. But even though we use array_values on $product_id_map to pass the array of IDs, the product is already in trash status so WC_Product_Query still couldn't get the product unless we set the args 'status' => 'trash' in the query.

Detailed test instructions:

Use an existing Merchant Centre account

  1. Disconnect all accounts from GLA settings.
  2. Clear SyncStatus and MCStatus from the DB.
    DELETE FROM wp_postmeta WHERE meta_key = '_wc_gla_sync_status';
    DELETE FROM wp_postmeta WHERE meta_key = '_wc_gla_mc_status';
  3. Go through the onboarding process. In the step 1 Set up your accounts, choose an existing Merchant Centre account.
  4. In the step 4 Complete your campaign you will see Your product listings are ready to be uploaded N products.
  5. Complete the onboarding by clicking Skip this step for now, as we don't need the Ads account for the test.
  6. Check if the products sync jobs is completed from /wp-admin/admin.php?page=wc-status&tab=action-scheduler&s=update_all_products&action=-1&action2=-1&orderby=schedule&order=desc&paged=1
  7. When the product sync job is done, call GET /mc/product-statistics/refresh or go to GLA product feed page, the total number should be identical to the N products in step 4.

Create a new Merchant Centre account

Follow the above steps but in step 3 create a new Merchant Centre account.

Add, update, or delete a product

  • Adding, updating, or deleting a product, the number returned by GET /mc/product-statistics/refresh should be changed accordingly.

Changelog entry

Update - Fix incorrect product statistics count

@ianlin ianlin self-assigned this Oct 14, 2022
@github-actions github-actions bot added the changelog: update Big changes to something that wasn't broken. label Oct 14, 2022
@ianlin ianlin marked this pull request as draft October 14, 2022 06:23
Base automatically changed from feature/1610-streamlined-onboarding to develop October 18, 2022 10:00
There are two problems:
1. `$product_id_map` is an array map with the format like
	 `[ 'online:en:TW:gla_5981' => 5981 ]` while the method
	 `find_delete_product_ids` accepts an array of product IDs.
2. Even though we fix the above issue by using `array_values(
	 $product_id_map)`, WCProductQuery couldn't get the product as well
	 because the product is already trashed.
@ianlin ianlin changed the title Group variations when counting product statistics Fix incorrect product statistics count Nov 15, 2022
@ianlin ianlin marked this pull request as ready for review November 16, 2022 12:10
@ianlin ianlin requested a review from a team November 16, 2022 12:11
Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some comments as I went through the code. When testing I was unable to get the two amounts to match up (I've got quite the mix of products).

For the syncable product count I get back:

{
	"count": 301
}

And for the product statistics I get:

{
	"timestamp": 1668615754,
	"statistics": {
		"active": 0,
		"expiring": 0,
		"pending": 0,
		"disapproved": 205,
		"not_synced": 111
	},
	"scheduled_sync": 0
}

I'll take another look tomorrow to see if I can pull up some exact product ID's and see why they aren't matching up.

*
* @return WP_Post
*/
public function get_wc_product_by_wp_post( int $product_id ): WP_Post {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

After we call this function we double check to make sure the product has a value (is not null), but the return type of this function does not allow null. Do we need a nullable return type?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated in 6c84398.


// Exclude any ID's which are not ready to delete.
$product_id_map = array_intersect( $product_id_map, $ready_ids );

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

By removing these lines we are reverting the PR #1238
I can see the flaws in the filter, but I don't think we should revert it completely otherwise we'll reintroduce the endless retry upon failure. I can see the logic of adding the status equals trash, but if we look at the sync hooks. It's hooked into both when a product is trashed and when it's deleted completely. In the latter case it won't be able to query the product data at all. Which means:

  • we should still try the initial delete_by_batch_request
  • we should not retry a product that is not available in the database

In that case maybe we should move the filtering from process_items to the code block where the retries are scheduled. If we filter the ID map at that point then we can remove any no longer existing products and ones that retried too many times.

Although I would move this to a separate issue/PR.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, my brain wasn't working properly yesterday, I actually forgot to add back the deleted code so even though I added "status equals trash" logic in find_delete_product_ids method from ProductRepository.php, there is no one calling this method. So if I added it back like below It would try to find the product with status trash then exclude the ones which are not ready to delete. Note that array_intersect( $product_id_map, $ready_ids ); can work as array_intersect is working through the array values.

--- a/src/Jobs/DeleteProducts.php
+++ b/src/Jobs/DeleteProducts.php
@@ -37,6 +37,12 @@ class DeleteProducts extends AbstractProductSyncerJob implements StartOnHookInte
   * @throws ProductSyncerException If an error occurs. The exception will be logged by ActionScheduler.
   */
  public function process_items( array $product_id_map ) {
+   $product_ids = array_values( $product_id_map );
+   $ready_ids   = $this->product_repository->find_delete_product_ids( $product_ids );
+
+   // Exclude any ID's which are not ready to delete.
+   $product_id_map = array_intersect( $product_id_map, $ready_ids );
+
    $product_entries = BatchProductIDRequestEntry::create_from_id_map( new ProductIDMap( $product_id_map ) );
    $this->product_syncer->delete_by_batch_requests( $product_entries );
  }

but if we look at the sync hooks. It's hooked into both when a product is trashed and when it's deleted completely. In the latter case it won't be able to query the product data at all.

Correct me if I'm wrong, from the WooCommerce admin products page I can only delete a trash product completely. Which means based on the above code, we would call delete_by_batch_request when a product status is becoming trash, and we would not call delete_by_batch_request again when the product is deleted completely. Is it correct?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A product can still be force deleted (without going to trash) either through WP-CLI, API or making use of a plugin.
In that case we have the ID mapping scheduled for removal, but we can't load any product details when the scheduled task actually starts running. So I imagine in that case the filter wouldn't work and the product would get excluded from the batch.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, I think in this case it would be better to move this into different issue/PR so we don't block this PR. Do you think I should revert the commits 3f7a2cc and 7dfd462 then create an issue, or adding a new commit including the change in my previous comment and create an issue?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think fixing it temporarily by changing the status to trash (as long as we aren't reverting previous behaviour) is a good option. And then create a separate followup issue for filtering later, since that's covering behaviour that was already broken.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The follow up issue is created in #1785.

@mikkamp
Copy link
Contributor

mikkamp commented Nov 17, 2022

I'll take another look tomorrow to see if I can pull up some exact product ID's and see why they aren't matching up.

So I collected the arrays of product ID's and did some comparisons. For counting syncable products I should have had 300 instead of 301. The extra one was because I have a variable subscription product, but the Subscription plugin was deactivated, in which case it detects the variations as a regular variable product. (This might be acceptable to leave as an edge case)

For the counts I still had 16 products too many in the unsynced count, it seems this was from products that were synced once upon a time but the channel visibility was later toggled to "do not sync and show". Modifying the meta query for unsynced products seems to resolve that:

			'meta_query' => [
				[
					'key'     => ProductMetaHandler::KEY_SYNC_STATUS,
					'compare' => '!=',
					'value'   => SyncStatus::SYNCED,
				],
				[
					'key'     => ProductMetaHandler::KEY_VISIBILITY,
					'compare' => '=',
					'value'   => ChannelVisibility::SYNC_AND_SHOW,
				],
			],

That gets me closer but I'm still off by one product. Which is a product that previously synced but I set the catalog visibility to hidden, which means it becomes unsyncable. Not sure what the best way to resolve that is. Should either the sync status or the channel visibility be reset in such a case?

@ianlin
Copy link
Member Author

ianlin commented Nov 17, 2022

Thanks @mikkamp, those are very valuable results. I will update the meta query for unsynced products.

Which is a product that previously synced but I set the catalog visibility to hidden, which means it becomes unsyncable. Not sure what the best way to resolve that is. Should either the sync status or the channel visibility be reset in such a case?

When setting a product's catalog visibility to hidden, the channel visibility on product UI display Don't sync and show, while the product's meta value _wc_gla_visibility in the database is still sync-and-show. I think it would be more consistent to update the channel visibility meta to dont-sync-and-show as well in such case, what do you think?

@mikkamp
Copy link
Contributor

mikkamp commented Nov 17, 2022

When setting a product's catalog visibility to hidden, the channel visibility on product UI display Don't sync and show, while the product's meta value _wc_gla_visibility in the database is still sync-and-show. I think it would be more consistent to update the channel visibility meta to dont-sync-and-show as well in such case, what do you think?

If I recall correctly there was a reason why we didn't update the visibility. In PR #880 is where it was implemented not to sync products hidden from the catalog. This PR also mentions products out of stock, I don't think I had any test products for this, but the same thing would probably apply. So we might want to test how that scenario works, what we don't want is the following scenario:

  1. Product is purchased and goes out of stock
  2. Stock update triggers removal of synced product from Merchant Center
  3. Since it is no longer syncable we set the channel visibility to do-not-sync-and-show
  4. Stock is added (often done through external system) but channel visibility is not restored (so it requires a manual action to restore)

Step number 3 is not being done right now, so we'll have to be careful we don't introduce unwanted side effects. Maybe it's those cases that we want to have an extra sync status available like TEMPORARILY_UNSYNCED, or just delete the sync status (which throws it in the group of unsyncable products).

@ianlin
Copy link
Member Author

ianlin commented Nov 18, 2022

Thanks for the detailed explanation on the history of why we made it. I think removing the sync status for such case (catalog visibility becomes hidden) so it falls into the group of unsyncable products is a good idea, as the product becomes unsyncable for sure.

I'm looking for where is the best place to remove the sync status, at first I found

} elseif ( $this->product_helper->is_product_synced( $product ) ) {
// Delete the product from Google Merchant Center if it's already synced BUT it is not sync ready after the edit.
$products_to_delete[] = $product;

Setting catalog visibility to hidden the synced product becomes unsyncable (not sync ready) so the code above will trigger delete products job to delete a product from GMC. However adding the code to remove sync status here would be overridden by the code in the ProductSyncer.php:

array_walk( $deleted_products, [ $this->batch_helper, 'mark_as_unsynced' ] );

There are some options we could do:

  1. In ProductSyncer.php's delete_by_batch_requests() method, remove sync status if the product's catalog_visibility is hidden (or maybe checking product->is_visible() is false).
    • However this could leads to some performance problems as we only have product ID in method delete_by_batch_requests(), we need to get all the product objects from the array.
  2. Not sure if it makes sense, another thought I have is to just remove the sync status in delete_by_batch_requests() method as when the product is trashed or deleted it becomes unsyncable.

@mikkamp
Copy link
Contributor

mikkamp commented Nov 18, 2022

I don't think just checking $product->is_visible() is enough. What I meant to imply with the going out of stock scenario is that there are many reasons why a product might become unsyncable after it was previously syncable. So I think the best source of truth is to call $product_helper->is_sync_ready( $product ).

batch_helper->mark_as_unsynced already loads the product object so we won't be using extra resources there. Maybe we should pass a parameter to mark_as_unsynced which if true will check is_sync_ready and if it's not will remove sync status. Something like:

if ( $remove_sync_status && $this->is_sync_ready( $product ) {
	$this->meta_handler->delete_sync_status( $product );
} else {
	$this->meta_handler->update_sync_status( $product, SyncStatus::NOT_SYNCED );
}

We might not even need the $remove_sync_status variable (would need to check what effect that would have), but I think it could also be safer to only call that after delete_by_batch to limit the scope of this change.

@ianlin
Copy link
Member Author

ianlin commented Nov 21, 2022

Hello @mikkamp, I've added three commits 40f189a, 41c52e0, and 39e3190 to address the problems you've mentioned above so it's ready for another round of review.

Copy link
Contributor

@mikkamp mikkamp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the additional changes. I rechecked the calculations and I'm now getting matching numbers.

I just found one additional problem with it querying non-trash products. I added a suggestion to fix that and we can continue adding some tests for it in the followup PR.

This PR looks good to go then, so I'll approve it.

@@ -163,7 +164,8 @@ public function find_sync_ready_products( array $args = [], int $limit = - 1, in
* @return array
*/
public function find_delete_product_ids( array $ids, int $limit = - 1, int $offset = 0 ): array {
$results = $this->find_by_ids( $ids, $limit, $offset );
$args = [ 'status' => 'trash' ];
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is now working for items that need to be deleted because they are in the trash. But for items that get deleted from Merchant Center for other reasons are not found/updated. For example I set a product to catalog visibility = hidden, which will trigger it to be deleted from the Merchant Center, but it's status remains "published".

Without any status it was previously querying the default WP statuses so it seems we should include trash with the default array.

Suggested change
$args = [ 'status' => 'trash' ];
$args = [ 'status' => [ 'draft', 'pending', 'private', 'publish', 'trash' ] ];

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for catching this, I've updated the issue #1785 so it will be handled in the next PR. Merging this PR first.

@ianlin ianlin merged commit 1312ce1 into develop Nov 22, 2022
@ianlin ianlin deleted the update/1610-group-variations-when-counting-product-statistics branch November 22, 2022 10:07
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
changelog: update Big changes to something that wasn't broken.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants