This repository has been archived by the owner on Jul 18, 2024. It is now read-only.

Victor Chang Cardiac Research Institute #425

Closed
DSS-AL opened this issue Jun 20, 2022 · 75 comments

Comments

@DSS-AL

DSS-AL commented Jun 20, 2022

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

  • Organization Name: Distributed Storage Solutions (DSS)
  • Website / Social Media: distributedstorage.com
  • Total amount of DataCap being requested (between 500 TiB and 5 PiB): 5 PiB
  • Weekly allocation of DataCap requested (usually between 1-100TiB): 100 TiB
  • On-chain address for first allocation: f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa
  • Type: Custom Notary

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

The Victor Chang Cardiac Research Institute (VCCRI) is renowned for the quality of its [scientific discoveries](https://www.victorchang.edu.au/heart-research/major-discoveries) and is dedicated to finding cures for cardiovascular disease through world-class and cutting-edge [medical research](https://www.victorchang.edu.au/heart-research).
DSS have worked with VCCRI to develop a PoC to demonstrate the operational and economic benefit of the Filecoin Network and subsequently make this application on their behalf to solve a long-term data storage requirement resulting from their research.
VCCRI are seeking to store five copies of a 1 PiB dataset as an archive on the Filecoin Network.
DSS is a leading decentralised cloud storage provider dedicated to the Filecoin network based in Sydney. DSS operate enterprise scale compute and storage infrastructure in Tier 3 data centres throughout Australia with clients spanning the globe.

What is the primary source of funding for this project?

DSS is funding the project.

What other projects/ecosystem stakeholders is this project associated with?

Client Allocation Request for: Victor Chang Cardiac Research Institute #1937

Use-case details

Describe the data being stored onto Filecoin

The data sets are the original outputs of scientific cardiac research.

Where was the data in this dataset sourced from?

The data sets have been created by large-scale scientific cardiac research.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

DSS do not currently have permission from the client to share the data publicly, although they have the client's full cooperation to verify the data with notaries directly.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

The existing dataset is limited by patient consent and, whilst the data is de-identified, internal policies and permissions do not currently allow for public use. Once DSS and the broader ecosystem have established a high degree of trust with VCCRI and its governance committees, we will seek to work with them to enable publicly available datasets that may be of value to the medical research community.

What is the expected retrieval frequency for this data?

The principal use case for the client is archival, so retrieval is likely to be limited to twice a year.

For how long do you plan to keep this dataset stored on Filecoin?

Indefinitely

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

The storage deals will be distributed among at least four unique geographies. Certain elements of the data have sovereignty requirements, so these will be limited to distribution within Australian territories. It is DSS's objective to distribute the datasets across the USA and Europe to the extent permitted by the client.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Online deals using Singularity.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

DSS intend to distribute data among enterprise-scale SPs that have similar sealing capacity and operate Tier 3 data centres.

How will you be distributing deals across storage providers?

Data that has a sovereignty requirement is intended to be distributed among DSS, Digital Income Fund, Holon and Vigilant IT. Datasets without a sovereignty requirement may be distributed among peers in other geographies; as these discrete datasets are identified by the client, we will engage other SPs in the US and EU.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources/funding to begin making deals once we receive DataCap. 

We currently have the support we need thanks to the help of the Foundation, PL and other members of the community over the last few months.
@large-datacap-requests

Thanks for your request!
Everything looks good. 👌

A Governance Team member will review the information provided and contact you back pretty soon.

@Kakkouii

@DSS-AL Hey, do you have an email address on your company domain? I can't find one on your website. Would you please email [email protected] from a company-domain address to verify your identity? Also, I understand your data is not currently available to the public, but how long will it take to make it public? A rough figure is enough for us.

@DSS-AL
Author

DSS-AL commented Jun 21, 2022

Hi @EGGRICE02, no problem, email sent. We have ambition to begin making permissible datasets public within 6 months.

@MegTei

MegTei commented Jun 21, 2022

Hi Notaries, I have performed DD under NDA with Andrew @dss, who is acting as proxy for the client VCCRI, who have an encrypted (private) data set. I have sighted email comms about the PoC and been CC'd to the IT sponsor, and am satisfied this is authentic.

Hi Andrew (@DSS-AL), please can you confirm further details:

  1. SP distribution partners and locations
  2. What type of data is it and why it's necessary to be encrypted

@DSS-AL
Author

DSS-AL commented Jun 21, 2022

Thank you Meg,

  1. The SP distribution is intended as follows (NB: The data has sovereignty requirements that do not allow it to be distributed internationally):

    • 1PiB Digital Income Fund (Sydney)
    • 1PiB Vigilant IT (Sydney)
    • 1PiB Holon (Sydney)
    • 2PiB DSS (Sydney) - Following Seal Storage precedent.
  2. The dataset is initially private as it contains patient data that has not been fully anonymised. We are working with the client to enable parts if not all data to be made public over time.

Best
Andrew

@DSS-AL
Author

DSS-AL commented Jun 22, 2022

Hi Meg, @galen-mcandrew and notaries,

Great news following a call with the client this morning: they have agreed to share all data from published papers as public data, leaving only the non-anonymised patient data encrypted.

I hope this helps the application process. Please let me know if there are any questions.

Andrew

@Destore2023

> HI Notaries, I have performed DD under NDA with Andrew @dss who is acting as proxy for the client VCCRI who have an encrypted (private) data set. I have cited email comms about the PoC and been CC'd to the IT sponsor and am satisfied this is authentic.
>
> HI Andrew (@DSS-AL) please you confirm further details:
>
>   1. SP distribution partners and locations
>   2. What type of data is it and why it's necessary to be encrypted

OK, it's time for Filecoin to welcome private datasets. Please count ByteBase in if needed. @MegTei

@cryptowhizzard

Yes, I guess it is time, and this would be a perfect candidate for FIL-E because of the sovereignty requirements.

As long as Holon keeps oversight of this project (i.e. building the dataset for distribution, etc.), you can count us in.

@Kevin-PiKNiK

This is super cool. Congrats to the DSS team on this fantastic enterprise opportunity with sovereignty requirements. We're hopeful to do something similar in the United States (we have HIPAA issues to overcome) with life sciences, academics, and health systems in the pipeline.

@kernelogic

I'd like to support this LDN as well. I'm seeing more and more FIL-E style applications nowadays, and I want to participate early to gain experience overseeing this type of LDN lifecycle. @MegTei

@DSS-AL
Author

DSS-AL commented Jun 29, 2022

Thank you @kernelogic this is great news, thank you for your support. More to come just like this one.

@DSS-AL
Author

DSS-AL commented Jun 29, 2022

Hi @dkkapur please see below a list of notaries that have expressed their intent to support the LDN application.

Fei Yan / Kernelogic / @kernelogic
Wijnand Schouten / Speedium / @cryptowhizzard
Eric / ByteBase / @swatchliu
Meg Dennis / Holon / @MegTei
Cabrina Huang / @xingjitansuo

@dkkapur
Collaborator

dkkapur commented Jun 30, 2022

@jamerduhgamer are you looking to support this one as well?

@DSS-AL happy to proceed here, though would highly suggest having at least 1-2 more notaries in case folks have issues with signing or are taking time off. This at least gets you some buffer.

@NiwanDao

I had an offline meeting with @DSS-AL to discuss this application. I am excited to welcome this type of scientific dataset onboard to Filecoin.

  1. @dkkapur Since this dataset can only be distributed in Australia, can the notaries from other regions approve?
  2. what would be the best way to check the encrypted deal is made against the scientific cardiac research dataset?

@DSS-AL
Author

DSS-AL commented Jun 30, 2022

Thank you for your support @xingjitansuo, I can help answer part 2.

We have an NDA with @MegTei, who has had direct communication with the customer and will have the ability to verify the data. I could also seek to arrange an NDA with yourself or other notaries if required. I hope this helps.

@large-datacap-requests

Stats & Info for DataCap Allocation

Multisig Notary address

f01885534

Client address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Last two approvers

megtei & cryptowhizzard

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

800TiB

Total DataCap granted for client so far

350TiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4.65PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 10240 | 6 | 400TiB | 24.73 | 43.47TiB |
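
The allocation figures in this bot comment can be checked with a few lines of arithmetic. This is a minimal sketch, assuming 1 PiB = 1024 TiB and that the outstanding amount is simply the total requested minus the amount granted so far; the variable names are illustrative and are not taken from the bot's own code.

```python
# Figures copied from the application and the bot comment; all sizes in TiB.
TIB_PER_PIB = 1024  # assumption: binary units

weekly_request_tib = 100               # weekly allocation requested
total_request_tib = 5 * TIB_PER_PIB    # 5 PiB requested in total
granted_so_far_tib = 350               # "Total DataCap granted for client so far"

# Rule quoted by the bot: "800% of weekly dc amount requested"
allocation_tib = weekly_request_tib * 800 // 100

# Amount still to be granted to reach the total requested
remaining_tib = total_request_tib - granted_so_far_tib
remaining_pib = remaining_tib / TIB_PER_PIB

print(allocation_tib)          # 800
print(remaining_tib)           # 4770
print(f"{remaining_pib:.3f}")  # 4.658
```

The remainder works out to about 4.658 PiB. The 4.65PiB shown by the bot would be consistent with truncating to two decimals, though that is a guess about how the value is formatted.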

@filplus-checker

DataCap and CID Checker Report [1]

  • Organization: Distributed Storage Solutions (DSS)
  • Client: f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

  • Storage provider should not exceed 25% of total datacap.
  • Storage provider should not be storing duplicate data for more than 20%.
  • Storage provider should have published its public IP address.
  • All storage providers should be located in different regions.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01919423 | Sydney, New South Wales, AU | 88.95 TiB | 23.90% | 87.64 TiB | 1.48% |
| f01319368 (new) | Sydney, New South Wales, AU | 88.79 TiB | 23.85% | 86.48 TiB | 2.60% |
| f01896422 | Fremont, California, US | 82.39 TiB | 22.13% | 80.89 TiB | 1.82% |
| f01938357 (new) | Sydney, New South Wales, AU | 78.01 TiB | 20.96% | 77.38 TiB | 0.80% |
| f01156538 | Sydney, New South Wales, AU | 27.54 TiB | 7.40% | 26.91 TiB | 2.27% |
| f01864434 | Sydney, New South Wales, AU | 3.29 TiB | 0.88% | 3.29 TiB | 0.00% |
| f01206408 | Sydney, New South Wales, AU | 3.29 TiB | 0.88% | 3.29 TiB | 0.00% |
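
The two numeric restrictions listed above (the 25% share cap and the 20% duplicate cap) can be checked against this table with a short script. This is only a sketch: it assumes the "Percentage" column is the figure compared against the 25% cap, it uses the rounded values as printed, and it does not model the public-IP or region rules. The function name `find_violations` is hypothetical and not part of the checker.

```python
# Rows transcribed from the table above:
# (provider, share of total deals in %, duplicate deals in %)
provider_rows = [
    ("f01919423", 23.90, 1.48),
    ("f01319368", 23.85, 2.60),
    ("f01896422", 22.13, 1.82),
    ("f01938357", 20.96, 0.80),
    ("f01156538", 7.40, 2.27),
    ("f01864434", 0.88, 0.00),
    ("f01206408", 0.88, 0.00),
]

MAX_SHARE_PCT = 25.0      # "should not exceed 25% of total datacap"
MAX_DUPLICATE_PCT = 20.0  # "duplicate data for more than 20%"


def find_violations(rows):
    """Return (provider, reason) pairs for rows that break either threshold."""
    violations = []
    for provider, share, duplicate in rows:
        if share > MAX_SHARE_PCT:
            violations.append((provider, "share"))
        if duplicate > MAX_DUPLICATE_PCT:
            violations.append((provider, "duplicate"))
    return violations


print(find_violations(provider_rows))               # []
print(max(share for _, share, _ in provider_rows))  # 23.9
```

No row exceeds either threshold. The largest share, 23.90%, sits just under the 25% cap.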

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

  • No more than 25% of unique data are stored with less than 4 providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 1.69 TiB | 1.69 TiB | 1 | 0.45% |
| 6.19 TiB | 12.44 TiB | 2 | 3.34% |
| 2.25 TiB | 6.88 TiB | 3 | 1.85% |
| 48.88 TiB | 198.44 TiB | 4 | 53.31% |
| 29.75 TiB | 152.03 TiB | 5 | 40.84% |
| 132.00 GiB | 792.00 GiB | 6 | 0.21% |
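
The replication rule above can be worked through from the "Unique Data Size" column. This sketch assumes the 25% limit is measured as a fraction of total unique data (the checker may use a different basis, such as deal size) and that 1 TiB = 1024 GiB. The names used are illustrative.

```python
GIB_PER_TIB = 1024  # assumption: binary units

# (unique data in TiB, number of providers holding it), from the table above
replication_rows = [
    (1.69, 1),
    (6.19, 2),
    (2.25, 3),
    (48.88, 4),
    (29.75, 5),
    (132.00 / GIB_PER_TIB, 6),  # 132.00 GiB converted to TiB
]

MIN_PROVIDERS = 4            # "less than 4 providers" counts as under-replicated
MAX_UNDER_REPLICATED = 0.25  # "No more than 25% of unique data"

total_unique = sum(size for size, _ in replication_rows)
under_replicated = sum(size for size, n in replication_rows if n < MIN_PROVIDERS)
fraction = under_replicated / total_unique

print(f"{fraction:.1%}")                 # 11.4%
print(fraction <= MAX_UNDER_REPLICATED)  # True
```

On this basis roughly 11.4% of the unique data sits with fewer than four providers, which is within the 25% limit.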

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients.
Usually, different applications own different data, which should not resolve to the same CID.

✔️ No CID sharing has been observed.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

Copy link

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb2hhzdlz4wsk4uztcggs3dyjlm4s7rd3t2zvziuevbzkipzxxnr2

Address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Datacap Allocated

800.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

82f4dfa9-127a-4e9d-a72f-01ac4805146e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb2hhzdlz4wsk4uztcggs3dyjlm4s7rd3t2zvziuevbzkipzxxnr2

Copy link

NiwanDao commented Jan 3, 2023

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebatiumj6cefhugs567nke6tu7zeda5x65ft5lx7q42ry4esz5ltw

Address

f3qwluincblkdog6jovdcrv3yqqrlgxipnwv43un2iwbrofv63g6fmqogapwi3cf3fh4l3mdcrgtmfpbfphypa

Datacap Allocated

800.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

82f4dfa9-127a-4e9d-a72f-01ac4805146e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebatiumj6cefhugs567nke6tu7zeda5x65ft5lx7q42ry4esz5ltw

@BDEio

BDEio commented Jan 11, 2023

@DSS-AL Hi! Great to see that you have gotten approval for DataCap!
BDE is a verified deals auction house helping you to get paid storing your data with reliable storage providers. If you need any help, please get in touch.

@marshyonline

checker:manualTrigger

@cryptowhizzard

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the full report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@marshyonline

checker:manualTrigger

@filplus-checker-app

DataCap and CID Checker Report Summary [1]

Retrieval Statistics

  • Overall Graphsync retrieval success rate: 58.32%
  • Overall HTTP retrieval success rate: 0.00%
  • Overall Bitswap retrieval success rate: 0.00%

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients [2]

✔️ No CID sharing has been observed.

Full report

Click here to view the CID Checker report.
Click here to view the Retrieval report.

Footnotes

  1. To manually trigger this report, add a comment with text checker:manualTrigger

  2. To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

@github-actions

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

@github-actions github-actions bot added the Stale label Jul 21, 2023
@github-actions

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

@github-actions github-actions bot closed this as not planned Jul 27, 2023
@marshyonline

This Project is still underway and should be re-opened

@Sunnyiscoming
Collaborator

Hello, @DSS-AL per the filecoin-project/notary-governance#922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant and also please add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be allowed to move forward for additional notary review.

@Sunnyiscoming Sunnyiscoming reopened this Oct 30, 2023
Copy link

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

--
Commented by Stale Bot.
