-
Notifications
You must be signed in to change notification settings - Fork 21
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Free Listings + Paid Ads: Group variations type for syncable products #1713
Free Listings + Paid Ads: Group variations type for syncable products #1713
Conversation
- Find and return an array of WooCommerce product IDs that are syncable but marked as MC not_synced. - Excludes: - variable products without variations, and - variable products with all of their variations are not syncable.
src/Product/ProductRepository.php
Outdated
// Get syncable products, and swap variation products for its parent. | ||
$args = [ | ||
'status' => [ 'publish' ], | ||
'type' => array_diff( ProductSyncer::get_supported_product_types(), [ 'variable' ] ), | ||
'meta_query' => $this->get_sync_ready_products_meta_query(), | ||
]; | ||
|
||
$syncable_product_ids = $this->product_helper->maybe_swap_for_parent_ids( | ||
array_filter( | ||
$this->find_ids( $args, $limit, $offset ), | ||
function ( $product_id ) { | ||
$product = $this->product_helper->get_wc_product( $product_id ); | ||
return $this->product_helper->is_sync_ready( $product ); | ||
} | ||
) | ||
); | ||
|
||
// Get the product ids which their MC status is 'not synced' from the syncable products. | ||
$find_by_ids_args = [ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This part is being refactored by the next commit eb9f841 so it can be ignored. Keeping it in the commit log in case anyone is interested in seeing the difference between two approaches.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the changes. For the first part of the PR (combining variations) I think we are good. However as you can see in the comments I'm a bit worried about the way the rest of the calculations are done.
I wonder if we can preserve the syncable product counts (possibly array of ID's) a little longer. Take that count and subtract the products that have been successfully synced, to come up with the number of not_synced
products. Although it might be hard to keep that number up to date if products are added/removed later on.
My other thoughts would be to have a DB lookup table that stores product ID's and it's statuses + relation to parent. That would allow direct results, but would need quite a substantial refactor.
$this->options->update( OptionsInterface::SYNCABLE_PRODUCTS_COUNT_INTERMEDIATE_DATA, [ ...$product_ids, ...$items ] ); | ||
$grouped_items = $this->product_helper->maybe_swap_for_parent_ids( $items ); | ||
|
||
$this->options->update( OptionsInterface::SYNCABLE_PRODUCTS_COUNT_INTERMEDIATE_DATA, [ ...$product_ids, ...$grouped_items ] ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not already store unique ID's here, will save us having to do it later when counting.
$args['type'] = array_diff( ProductSyncer::get_supported_product_types(), [ 'variable' ] ); | ||
$filtered_product_ids = array_flip( $product_repository->find_synced_product_ids( $args ) ); | ||
$filtered_product_ids = array_flip( | ||
$product_helper->maybe_swap_for_parent_ids( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We've been working with an array of product ID's, but this call will load the product object for each ID to check if it should get the parent ID. Since caching is in place that can cause a lot of resource usage if we have a large amount of synced products. Have we tested whether this will hit any resource limits?
For aggregating by variable product we previously used product_data_lookup
which would store the relation between a product ID and parent ID. This array would also get populated using get_post
calls which doesn't load all the product data, and consumed less memory. Is that a pattern we should stick to? There might be some more history in previous PR's why we chose that implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For aggregating by variable product we previously used product_data_lookup which would store the relation between a product ID and parent ID.
Thanks for pointing this out, I just did a quick check and looks like I can get the correct synced count from product_data_lookup
array. For the synced count I'd lean towards to this solution.
src/Product/ProductRepository.php
Outdated
$args = [ | ||
'status' => 'publish', | ||
'type' => $types, | ||
// Get simple and variable products that are marked as MC 'not synced'. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have similar thoughts about this section. While switching to variable products can speed it up / limit resources. If we have a site with 10k simple products then this will still be an issue.
src/Product/ProductRepository.php
Outdated
} | ||
|
||
// Check if a simple product is syncable. | ||
return $this->product_helper->is_sync_ready( $product ); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_sync_ready
check here and for variable products will exclude some products but we have built in compatibility for bundles which is attached to woocommerce_gla_get_sync_ready_products_pre_filter
. That means they'll be excluded from the syncable products, but not excluded here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing this out, this is something I've missed.
@mikkamp Thanks for the review. Agree that it'd be better to pull the part of counting statistics out of the PR to avoid blocking the release. I've reverted three commits and also updated the PR description. A P2 post about solutions of counting statistics will be created soon.
I had thought of the same way but it's indeed not easy to keep the array of IDs up to date.
I'd prefer using this solution as the query should be pretty fast. I will list it down in the P2 post. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for simplifying the PR. The changes look good, I just noticed we might want to be more careful in case a product gets deleted during the counting process, so we don't get unexpected gaps in the count.
Are we also going to add a changelog entry, or are we content with it using the PR title?
src/Product/ProductHelper.php
Outdated
* @throws InvalidValue If a given ID doesn't reference a valid product. Or if a variation product does not have a | ||
* valid parent ID (i.e. it's an orphan). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was confusing at first. It seems though that an orphan will get excluded when creating the batches of syncable products. So it shouldn't encounter any orphans here. However there is still a small chance that a parent or even just a regular product gets deleted after the batches get created (should that be covered by a unit test).
If that happens then wouldn't we just want to exclude those from the count, instead of making the whole batch fail? Which won't show as an error but instead excludes a large amount of products from the count.
@mikkamp Thanks for pointing out this case, I have added new commits for solving this. |
Yeah, we will be using the PR title for the changelog entry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the extra changes, I'm getting the results as expected. I just added a minor comment, but it looks ready to go.
* | ||
* @since x.x.x | ||
*/ | ||
public function maybe_swap_for_parent_ids( array $product_ids, bool $check_product_status = true, bool $ignore_product_on_error = true ) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess this is needed for a followup PR, but just curious why it needs options to change the behaviour.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My initial though was this function might be used in other cases in the future, so options could be flexible. We can remove them if there are not much uses tho.
@@ -189,4 +267,83 @@ public function test_update_syncable_products_count_no_syncable_products_multipl | |||
do_action( self::CREATE_BATCH_HOOK, 2 ); | |||
do_action( self::CREATE_BATCH_HOOK, 3 ); | |||
} | |||
|
|||
public function test_update_syncable_products_count_product_gets_deleted_after_batches_get_created() { | |||
// This is a product in the first batch that will be deteled after the first batch gets created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a small typo.
// This is a product in the first batch that will be deteled after the first batch gets created. | |
// This is a product in the first batch that will be deleted after the first batch gets created. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated in 3d64f9d.
Changes proposed in this Pull Request:
This is part of the backend task 1 📌 Syncable Products Count from the epic issue #1610.
Updates on 6th Oct 2022
I reverted the commits for calculating synced and non synced products of product statistics API (the second part of change listed below), as there still be some performance problems that need to be fixed. For now this PR only solves the syncable products count problem so it's not blocking the release of the project. A P2 post will be created for the discussion about the solution of performance problems.
Original post
This PR has two changes:
GET mc/syncable-product-count
includes variation products, so the total number will be a lot higher than the total products count in WC. This PR groups the variations as one if they have the same parent so the count becomes more reasonable.GET mc/product-statistics
is higher than the syncable product counts, confusion may be raised since they are inconsistent. The reason is the same as above, this PR groups the variations as one if they have the same parents when counting product statistics.The logic of product statistics API contains two parts: counting synced products and counting not synced products. In this PR, counting
synced
products uses the similar approach like counting syncable products by getting a list ofsimple
andvariation
products that marked as MCsynced
first, and group the variation products as one if they have the same parent.However, counting
not synced
products uses a different approach by getting a list ofsimple
andvariable
products that marked as MCnot synced
first, and check the variable products' children (variation products) if they are syncable, finally group the variation with the same parent and ignore the variable products with no variations. The reason it uses a different approach is it will process a lot faster than using the approach above. If using the above approach we will get a list ofsimple
andvariation
that are marked as MCnot synced
, in my test environment it's more than 3000 items and looping through these items to check if the product is syncable takes a lot of time than looping through a list ofsimple
andvariable
products, which in my local environment it's about 400 products.Detailed test instructions:
Note: use
GET mc/product-statistics/refresh
to refresh product statistics, instead of getting the statuses from the transient.Fully onboarded and products are already synced
POST mc/syncable-products-count
.GET mc/syncable-products-count
.GET mc/product-statistics/refresh
.New merchant center account that products are not synced yet
Start listing products →
.POST mc/syncable-products-count
.You may notice that the total number of statistics is sometimes less than syncable products count. This is because when just onboarded all the products are being marked as MC statuses like
not synced
orpending
by the background product sync job. When getting product statistics the product MC statuses may still being written to the DB, so we couldn't get the exact number for it. Try fetching product statistics after a couple of seconds or minutes and the number will become accurate.Changelog entry