Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug dataset image not showing #969

Merged
merged 28 commits into from
Mar 9, 2022

Conversation

kencho51
Copy link
Contributor

@kencho51 kencho51 commented Feb 21, 2022

Pull request for issue: #960 and #961

This is a pull request for the following functionalities:

The dataset images have been stored in a s3 bucket, which would be accessed through a CDN server hosted by Cloudflare.
By updating the dataset image src which points to the CDN server and the Content Security Policy, the gigadb website will be able to display it correctly from a Cloudflare server.

Changes to the database schema

  1. Updated the url in data/dev/image.csv

  2. Created a migration script to update the url in production db image table according to:

  • Scenario 1: If url starts with http://gigadb.org/, replace it with https://assets.gigadb-cdn.net/live/images/datasets/
  • Scenario 2: If url is empty and location contains no_image, update it to https://assets.gigadb-cdn.net/live/images/datasets/no_image.png
  • Scenario 3: other than scenario 2 and 3, will be dealt case by case basis in future.

Changes to the tests

To assert that the dataset image is rendered from the CDN server.

@kencho51 kencho51 requested review from rija and pli888 February 21, 2022 04:17
@kencho51 kencho51 force-pushed the FixBug-dataset-image-not-showing branch from 392eb96 to 2dee889 Compare February 23, 2022 09:08
@kencho51 kencho51 marked this pull request as ready for review February 25, 2022 02:09
@kencho51 kencho51 force-pushed the FixBug-dataset-image-not-showing branch from 9c0b037 to 2e03c12 Compare February 28, 2022 03:10
Copy link
Member

@pli888 pli888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The code looks good to me and I can see that the images on my local environment are fetched from the CDN after running:

docker-compose run --rm application ./protected/yiic migrate --migrationPath=application.migrations.fix_import --interactive=0 

Rather than having to manually execute the above command to replicate what will soon be the situation on gigadb.org with the images fetched from CDN, would it be better to add the command to the ops/scripts/setup_*db.sh scripts?

public function safeUp()
{
Yii::app()->db->createCommand("update image set url = replace(url, 'http://gigadb.org/','https://assets.gigadb-cdn.net/images/datasets/') where url like 'http://gigadb.org/%';")->execute();
Yii::app()->db->createCommand("update image set url = 'https://assets.gigadb-cdn.net/images/datasets/no_image.png' where url like '' and location like 'no_image%';")->execute();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we need an acceptance test for this? Might be useful to create some data for a very basic private dataset that uses the generic no image png file?

Copy link
Contributor

@rija rija Feb 28, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

hi @kencho51

Agree with @pli888 that a scenario is missing.

It's more about testing the case where a generic image ought to be displayed.

you want a scenario that uses a dataset whose image record fulfills the condition for generic image to be used. there may be such test data in the dev test data already, but if it's not the case, we need to add one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @pli888 and @rija,

A scenario Dataset with no image associated will show generic image has been created.
In order to pass it, a dataset /dataset/300070 is created with empty url and location with value no_image.png.

A new command update image set url = '' where url is null; is created to handle null value which should not be existed during test database set up from csv files.

@kencho51 kencho51 force-pushed the FixBug-dataset-image-not-showing branch from 55d361f to 17ea4a0 Compare March 1, 2022 09:47
public function safeUp()
{
Yii::app()->db->createCommand("update image set url = replace(url, 'http://gigadb.org/','https://assets.gigadb-cdn.net/images/datasets/') where url like 'http://gigadb.org/%';")->execute();
// The empty value would be converted to null after set up the test database from csv files, so the following step is needed to convert it back to empty again.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kencho51

I dug a little deeper and figured out the reason of the undesirable behaviour:
it's explained here: https://handlebarsjs.com/guide/builtin-helpers.html#if

You can use the if helper to conditionally render a block. If its argument returns false, undefined, null, "", 0, or [], Handlebars will not render the block.

the code where that if test is used is ops/configuration/yii-conf/migration.php.dist, line 11.

You can see that in the existing dev test data data/dev/image.csv with id 147, is a record with "" in the url.

The resulting Yii migration file protected/migrations/data/dev/m200529_050010_insert_data_image_tab.php has this section:

        $this->insert('image', array(
            'id' => '147',
            'tag' => 'CS icon1',
            'license' => 'public domain',
            'photographer' => 'Shashaank Vattikuti ',
            'source' => 'Gigascience',
        ));

where there is no url key because the if test has resulted in the key/value not to be added since the if test considers "" as false by default.

So the take away of this comment is:

It's better to fix the csv to migration javascript code to allow empty string rather than adding that dirty workaround in the Yii migration script.

The documentation I linked above is very well aware that the if behaviour might be indesirable in some context, as they add this hint on how to fix it:

Sub-Expressions

Helpers are the proposed way to add custom logic to templates. You can write any helper and use it in a sub-expression.

For example, in checking for initialization of a variable the built-in #if check might not be appropriate as it returns false for empty collections (see Utils.isEmpty).

@only1chunts
Copy link
Member

FYI, there are a number of datasets in the production database that already use the "no image" icon, one example would be 101717

@kencho51
Copy link
Contributor Author

kencho51 commented Mar 1, 2022

Hi @pli888,

Rather than having to manually execute the above command to replicate what will soon be the situation on gigadb.org with the images fetched from CDN, would it be better to add the command to the ops/scripts/setup_*db.sh scripts?

My idea is that this fix will only be used once or several times during development, but after that, this script will be no longer needed because all data stored would be in a correct format.
But the setup script is a more permanent way, because it will be called whenever ./up.sh is called.

@rija
Copy link
Contributor

rija commented Mar 1, 2022

Hi @kencho51

Here is the patch to fix the CSV to Yii migrations script's bad behaviour with the url field:

diff --git a/ops/configuration/yii-conf/migration.php.dist b/ops/configuration/yii-conf/migration.php.dist
index 279495c24..91d0c086f 100644
--- a/ops/configuration/yii-conf/migration.php.dist
+++ b/ops/configuration/yii-conf/migration.php.dist
@@ -8,7 +8,7 @@ class {{ class_name }} extends CDbMigration
     {{#each safeup_data}}
         $this->insert('{{ ../table_name }}', array(
         {{#each this}}
-        {{#if this}}
+        {{#if (isdefined @key this) }}
             '{{@key}}' => '{{this}}',
         {{/if}}
         {{/each}}
diff --git a/ops/scripts/csv_yii_migration.js b/ops/scripts/csv_yii_migration.js
index 675861acd..de56b67d1 100644
--- a/ops/scripts/csv_yii_migration.js
+++ b/ops/scripts/csv_yii_migration.js
@@ -158,6 +158,13 @@ for(let a = 0; a < files.length; a ++) {
         safeup_data: parsed.data,
         safedown_data: filtered_ids
     };
+    // Register sub expression to decide whether a value is defined based on its key
+    handlebars.registerHelper('isdefined', function (key, value) {
+        if (key == 'url') {
+            return value !== undefined;
+        }
+        return !handlebars.Utils.isEmpty(value);
+    });
     // Read handlebars template as string
     let template = fs.readFileSync(HANDLEBARS_TEMPLATE_FILE, "utf8");
     const templateScript = handlebars.compile(template);

In theory, using the example code block from the Handlebars documentation sub-expressions section
would be the correct and proper fix, but doing so revealed a lot of data errors (*) in the CSV files that are a big pain to fix right now, so I've tweaked it a little bit to update the behaviour only for url fields for now.

(*) They were undetected so far due to a combination of Handlebars' default falsification of empty strings - causing the key/value set not being added to the insert statement) and corresponding database columns being set with correct default value.

@rija
Copy link
Contributor

rija commented Mar 2, 2022

HI @kencho51, @pli888

I had to change the directory structure on the S3 bucket assets.gigadb-cdn.net, so that the top-level directories are the values of $GIGADB_ENV.

this is because once we have code that can upload to the S3 bucket, we don't want uploads from non-live environments to get mixed up
with the live image data.

That means that you need to update the Yii migrations to replace every occurrences of https://assets.gigadb-cdn.net with https://assets.gigadb-cdn.net/live

It's not just for dataset images. The Yii migrations for the project images need that change as well.

Thanks.

Copy link
Contributor

@rija rija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kencho51,

May I suggest to replace in m220218_062834_update_url_image_tab.php

update image set url = 'https://assets.gigadb-cdn.net/live/images/datasets/no_image.png' where url like '' and location like 'no_image%'

with:

update image set url = 'https://assets.gigadb-cdn.net/live/images/datasets/no_image.png' where url='' and location like 'no_image%''

This is because like is for pattern matching (like in the second part of the where clause), and in the first part of the where clause we are looking for empty url value.

Most importantly, the use of like has performance penalty on the query execution plan, so if we don't have to use it, we avoid using it.

Additionally, make sure you rebase this branch with the latest changes merged to develop and please do add the change to .gitlab-ci.yml I've mentioned in team chat.

The rest of the code is fine, and the tests are all passing locally and on CI

@kencho51 kencho51 force-pushed the FixBug-dataset-image-not-showing branch from b36eb8b to 43c9ec3 Compare March 6, 2022 02:49
Copy link
Contributor

@rija rija left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @kencho51
code looks fine, the images (both dataset and projects) appear properly on my local dev environment and my staging dedployment. Tests are passing both locally and on CI. I'm happy to approve this PR

Copy link
Member

@pli888 pli888 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kencho51 The code looks good now and I can see dataset images are fetched from the CDN now. All test are passing on my CI pipeline so I approve this PR.

@rija rija merged commit 0d0079d into gigascience:develop Mar 9, 2022
@kencho51 kencho51 deleted the FixBug-dataset-image-not-showing branch July 27, 2022 10:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants