Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Data extraction issue: Query does not return the expected result. #1968

Open
hotsun1508 opened this issue Sep 4, 2024 · 1 comment
Open

Comments

@hotsun1508
Copy link

hotsun1508 commented Sep 4, 2024

Hi, I'm expecting different results when executing the query below.
There seems to be an issue with data extraction.
(I've checked that there's no problem with CASE WHEN usage: https://github.com/AlaSQL/alasql/wiki/CASE)

  • AlaSQL query
SELECT 
    CASE 
        WHEN [age] BETWEEN 0 AND 9 THEN '0-9'
        WHEN [age] BETWEEN 10 AND 19 THEN '10-19'
        WHEN [age] BETWEEN 20 AND 29 THEN '20-29'
        WHEN [age] BETWEEN 30 AND 39 THEN '30-39'
        WHEN [age] BETWEEN 40 AND 49 THEN '40-49'
        WHEN [age] BETWEEN 50 AND 59 THEN '50-59'
        WHEN [age] BETWEEN 60 AND 69 THEN '60-69'
        WHEN [age] BETWEEN 70 AND 79 THEN '70-79'
        WHEN [age] BETWEEN 80 AND 89 THEN '80-89'
        WHEN [age] BETWEEN 90 AND 99 THEN '90-99'
        ELSE '100+'
    END AS [age_group],
    COUNT(*) AS [customer_count]
FROM 
    [temptable]
WHERE 
    [age] IS NOT NULL
GROUP BY 
    [age_group]
ORDER BY 
    [age_group];
  • Result
0: {age_group: '100+', customer_count: 56046}

The data type of the age column is numeric, and the expected result when extracting this data is as follows.

agegroup	customer_count
20-29	6928
30-39	19188
40-49	16326
50-59	9272
60-69	3755
70-79	495
80-89	61
90-99	21

FYI, I've got this result by executing the same query in PostgreSQL.

SELECT 
    CASE 
        WHEN age BETWEEN 0 AND 9 THEN '0-9'
        WHEN age BETWEEN 10 AND 19 THEN '10-19'
        WHEN age BETWEEN 20 AND 29 THEN '20-29'
        WHEN age BETWEEN 30 AND 39 THEN '30-39'
        WHEN age BETWEEN 40 AND 49 THEN '40-49'
        WHEN age BETWEEN 50 AND 59 THEN '50-59'
        WHEN age BETWEEN 60 AND 69 THEN '60-69'
        WHEN age BETWEEN 70 AND 79 THEN '70-79'
        WHEN age BETWEEN 80 AND 89 THEN '80-89'
        WHEN age BETWEEN 90 AND 99 THEN '90-99'
        ELSE '100+'
    END AS agegroup,
    COUNT(*) AS customer_count
FROM 
    adventureworks."adventureworks-total"
WHERE 
   age IS NOT NULL
GROUP BY 
   agegroup
ORDER BY 
   agegroup;

The query isn't returning the expected results when I run it.
Could you help me out?
Thank you so much! 😊

@mathiasrw
Copy link
Member

mathiasrw commented Sep 13, 2024

Very interesting.

 SELECT 
        CASE 
            WHEN age BETWEEN 0 AND 9 THEN '0-9'
            WHEN age BETWEEN 10 AND 19 THEN '10-19'
            WHEN age BETWEEN 20 AND 29 THEN '20-29'
            ELSE '30+'
        END AS age_group 
        
    FROM ? 
    --GROUP BY age_group 

Gives the expected result (see https://jsfiddle.net/bpkesrh2/ ) and if I want to group across the case column it returns "30+" (see https://jsfiddle.net/1jLnxdmz/1/ ) like you have uncovered.

BUT!!!

If I remove any values over 30 (see https://jsfiddle.net/1jLnxdmz/2/ ) it STILL returns 30+ when adding the group by back in (see https://jsfiddle.net/1jLnxdmz/3/ )

This indicate that the group by logic somehow fetches the value in theELSE clause - no matter the content actually selected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

2 participants