Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG]: Discrepancy Between Total Items in Paginator and Direct Query Using Phalcon's QueryBuilder #16581

Open
indigo7333 opened this issue May 9, 2024 · 8 comments
Labels
bug A bug report status: unverified Unverified

Comments

@indigo7333
Copy link

When using Phalcon's modelsManager to construct complex queries involving multiple table joins and conditional filters, there appears to be a mismatch between the total number of items reported by directly querying the database versus using Phalcon's paginator. This discrepancy could potentially affect data presentation and user navigation in applications relying on accurate pagination.

use Phalcon\Mvc\Model\Query\Builder as QueryBuilder;

// Setup the Query Builder
$builder = $this->modelsManager->createBuilder()
->columns('DISTINCT MainTable.id')
->from(['MainTable' => 'MainEntity'])
->where("MainTable.owner_id = :owner_id: AND is_active = 1", ['owner_id' => $owner_id])
->innerJoin('RelatedEntity1', 'MainTable.owner_id = RelatedEntity1.id')
->innerJoin('RelatedEntity2', 'RelatedEntity2.id = MainTable.related_id')
->leftJoin('OptionalEntity1', 'OptionalEntity1.related_id = RelatedEntity2.id')
->leftJoin('OptionalEntity2', 'OptionalEntity2.related_id = RelatedEntity2.id')
->leftJoin('OptionalEntity3', 'OptionalEntity3.main_id = MainTable.id')
->orderBy('MainTable.id DESC');

// Apply filter if present
$filter = $this->request->get('search', 'string');
if (!empty($filter)) {
$conditions = "MainTable.id = :filterId: OR OptionalEntity3.custom_field LIKE :filterDomain: OR OptionalEntity1.name LIKE :filterName:";
$builder->andWhere($conditions, [
'filterId' => $filter,
'filterDomain' => '%' . $filter . '%',
'filterName' => '%' . $filter . '%',
]);
}

// Initialize Paginator
$paginator = new QueryBuilder([
'builder' => $builder,
'limit' => 13,
'page' => $currentPage
]);

// Direct total count
$totalItems = $builder->getQuery()->execute()->count();

// Paginator total count
$page = $paginator->paginate();
$paginatorTotalItems = $page->getTotalItems();

// Output for debugging
echo "Direct count: $totalItems
";
echo "Paginator count: $paginatorTotalItems
";

Details
Version: 5.6.1-2+020240213.7+debian101.gbp9d5d41
php 8.1

@indigo7333 indigo7333 added bug A bug report status: unverified Unverified labels May 9, 2024
@indigo7333
Copy link
Author

The problem appears to be with the distinct(true) . with distinct(false) the are no issues with pagination numbers.

@noone-silent
Copy link
Contributor

Could you post both SQL Statements here? The one from the builder and the one you wrote?

@s-ohnishi
Copy link

@indigo7333 The reason you're not getting the results you expect is probably because you're using DISTINCT.
Looking at the source (QueryBuilder.zep), it seems that the difference in judgment is whether you have GROUP BY and HAVING, and if you do not have either, it does not matter whether the query contains "DISTINCT" Instead, "COUNT(*)" will be applied.

I don't really understand whether this judgment is appropriate.
I don't understand the logic exactly, but it may be possible to avoid this by using GROUP BY which generally has the same effect as DISTINCT.
__SELECT DISTINCT MainTable.id ~~~
__SELECT MainTable.id ~~~ GROUP BY MainTable.id

@indigo7333
Copy link
Author

@s-ohnishi @noone-silent yes, the problem appears when using DISINCT, but that is not correct. It creates a lot of confusion if the pagination doesn't work in that case. I had to come up with a workaround. DISTINCT must be used and not COUNT(*) if distinct parameter is provided.

@noone-silent
Copy link
Contributor

@s-ohnishi DISTINCT and GROUP BY are not the same. DISTINCT just removes duplicates, while GROUP BY allows you to use aggregate functions like SUM, that have a completely different outcome.

I agree with @indigo7333 that the query builder should not just blindly replace all fields with COUNT(*). For example, there are a lot of different possibilities on a SELECT query: https://dev.mysql.com/doc/refman/8.0/en/select.html

Otherwise it should clearly state this in the documentation OR even better, throw an exception if one of this fields is used and the Builder does not support it.

SELECT
    [ALL | DISTINCT | DISTINCTROW ]
    [HIGH_PRIORITY]
    [STRAIGHT_JOIN]
    [SQL_SMALL_RESULT] [SQL_BIG_RESULT] [SQL_BUFFER_RESULT]
    [SQL_NO_CACHE] [SQL_CALC_FOUND_ROWS]
    select_expr [, select_expr] ...

And this is MySQL only.

@s-ohnishi
Copy link

@noone-silent I am aware that DISTINCT and GROUP BY are different things from the original purpose. However, in this case, we want to avoid duplicate rows by using JOIN, so we have shown a workaround for that. This is not a suggestion for "general" use (though I have not tested it to confirm).

If you think about it in "general" terms, I have been taught that instead of using DISTINCT to suppress the duplicate rows that have grown up with JOIN, it is better to use EXISTS to extract only the necessary rows.
(I have not tried whether subqueries can be used with Phalcon's Builder)

I think it would be desirable to be able to support all grammars, but since the expressions differ depending on the database, I think it's unreasonable to hope for perfection. I think it is acceptable to be limited to some extent, especially when using Builder.

@indigo7333
Copy link
Author

@s-ohnishi I agree that EXISTS is faster, but I was perfectly fine with DISTINCT in my case, and it's not fun getting inconsistent results in PhalconPHP, and many developers might have similar issue.

@s-ohnishi
Copy link

@indigo7333 It's not about which is faster, I just thought it might be an alternative if you're having trouble "right now".
(Modifications around the database are slow and updates are not very frequent)
If the example SQL is just for explanation, it probably won't be easily replaced.
If it's a production SQL, I don't understand why it "must" be DISTINCT, so feel free to do whatever you like.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug A bug report status: unverified Unverified
Projects
None yet
Development

No branches or pull requests

3 participants