[DataCap Application] <sdcloud> <Geospatial Data Cloud> #123

Open
martapiekarska opened this issue Jan 21, 2025 · 28 comments

martapiekarska (Contributor) commented Jan 21, 2025

Version

2025-01-21T05:57:35.018Z

DataCap Applicant

@Alin11155

Data Owner Name

NASA USGS

Data Owner Country/Region

Data Owner Industry

Life Science / Healthcare

Website

https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products

Social Media Handle

https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products

Social Media Type

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

5PiB

Expected size of single dataset (one copy)

600TiB

Number of replicas to store

9

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f12wlylkwoyws5lgkb2ayutvrp3r5q4zczim2q7iq

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Identifier

Yes

Share a brief history of your project and organization

Share a brief history of your project and organization: sdcloud is a distributed storage system built on blockchain and distributed-storage technology. It addresses core issues such as "protecting data privacy" and "reducing storage costs", and provides a reliable data storage solution for the privacy and ownership problems of the big data era. The personal end-to-end login method is convenient, providing private storage space and a public data-sharing platform for personal accounts. Work files, life records, videos, audio, photos, and other documents can be stored in the cloud anytime, anywhere, and download speeds are improved by more than 60%. With data-slicing storage, private-key support, and content-addressing technology, there is no need to worry about private files being viewed by others.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

The Landsat-8 satellite carries two sensors, the Operational Land Imager (OLI) and the Thermal Infrared Sensor (TIRS).
Landsat-8 is essentially the same as Landsat 1-7 in terms of spatial resolution and spectral characteristics. The satellite has a total of 11 bands: bands 1-7 and 9-11 have a spatial resolution of 30 meters, and band 8 is a panchromatic band with a resolution of 15 meters. The satellite achieves global coverage once every 16 days. Landsat 9 is the newest satellite in the Landsat series. Originally scheduled to launch in December 2020 on an Atlas V 401 rocket from Vandenberg Air Force Base in California, it was actually launched on September 27, 2021, and has begun collecting data; its first remote sensing data were obtained on October 31, 2021. Landsat 9 continues the Landsat series' irreplaceable record of observations of the Earth's surface. To reduce construction time and the risk of an observation gap, Landsat 9 largely replicates its predecessor, Landsat 8.

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

If you are a data preparer. What is your location (Country/Region)

China

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

First download the data, then use official tools such as Boost or Lotus to prepare it.
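
For illustration, a minimal sketch of what that preparation flow could look like with the Lotus and Boost tools named above; the paths are placeholders, and exact subcommands can differ between tool versions:

    # Pack a downloaded directory into a CAR file (placeholder paths).
    lotus client generate-car /data/landsat/batch-001 /cars/batch-001.car

    # Compute the piece commitment (CommP) of the CAR before proposing a deal.
    boostx commp /cars/batch-001.car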

If you are not preparing the data, who will prepare the data? (Provide name and business)

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

No

Please share a sample of the data

https://registry.opendata.aws/usgs-landsat/index.html
AWS CLI Access aws s3 ls --request-payer requester s3://usgs-landsat/collection02/
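
As a rough way to check the claimed dataset size against this requester-pays bucket, a recursive summarize listing can be used (standard AWS CLI flags; the requester is billed for the requests, and enumerating millions of keys takes a while):

    # Recursively list collection02 and print the object count and total size.
    aws s3 ls --request-payer requester --recursive --summarize --human-readable \
        s3://usgs-landsat/collection02/ | tail -n 2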

Confirm that this is a public dataset that can be retrieved by anyone on the Network

Confirm

If you chose not to confirm, what was the reason

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Slack, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you used

Please list the provider IDs and location of the storage providers you will be working with.

f0101020 CN
f0101021 CN
f0861594 US
f01164500 US
f0112781 CN
f0154795 CN
f01973164 US
f0104380 CN
f0154137 US

How do you plan to make deals to your storage providers

Boost client, Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

Can you confirm that you will follow the Fil+ guideline

Yes

datacap-bot bot commented Jan 21, 2025

Application is waiting for allocator review

martplo (Contributor) commented Jan 21, 2025

@Alin11155
Thank you for your submission. Upon review, I noticed that the dataset you intend to store originates from NASA. The description of this dataset should be significantly different. This will help other Filecoin users easily identify the dataset.

Please note that we are aware that this is your 3rd application to our pathway (1st, 2nd). Why do you keep changing the dataset you want to store as well as the accounts you apply from? This does not build trust in you.

Alin11155 commented Jan 22, 2025

@Alin11155 Thank you for your submission. Upon review, I noticed that the dataset you intend to store originates from NASA. The description of this dataset should be significantly different. This will help other Filecoin users easily identify the dataset.

Please note that we are aware that this is your 3rd application to our pathway (1st, 2nd). Why do you keep changing the dataset you want to store as well as the accounts you apply from? This does not build trust in you.

@dampud
The 1st time, we miscalculated the size of the data sample, so the application was closed.
The 2nd time, we tried to submit our own dataset as an open dataset, but you said it had no value, and it was closed.
The 3rd time, we tried to use a large open dataset, and you said it was from NASA again. But I didn't find it on NASA's site. Can you give me the data's link on NASA? Is all the data on the website from NASA?
Why do we apply again and again? Because we have prepared enough tokens, found our partners, and made a detailed plan, but the applications always get rejected over the source link of the dataset. To distinguish each application, we changed accounts every time. If you have any distrust, you can KYC us. Everything we have said is the truth, and we would be happy if you could tell us how to succeed with the application so we can correct it.
We hope you can give us the correct sample link for the Landsat-8 satellite data on NASA, and then I will make the application again.

dampud commented Jan 22, 2025

@Alin11155
Kindly note that the preparation of data and its accurate description should be the client's responsibility, not the allocator's.

For context, the Landsat satellite program is operated by USGS and NASA. GScloud does not own this data but acts as a platform that redistributes it. Similarly, when data is shared on the Filecoin network, neither GScloud nor the client becomes the owner of the data. Access to this open data is facilitated through various public portals.

While the data source is relevant, it is equally critical to specify the exact scope of what will be stored to justify the requested DataCap. Applicants should clearly understand and define what they intend to store at the time of application. For example, providing a .csv file with a detailed list of the data prepared for storage can be helpful in this regard.
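
As a hedged illustration, such a manifest could be generated straight from the requester-pays listing; the two-column layout here is only an assumed format, not a requirement of the pathway:

    # Turn the recursive listing into a simple CSV manifest (size_bytes,key).
    # Note: this breaks on keys containing spaces, which are rare in this bucket.
    aws s3 ls --request-payer requester --recursive \
        s3://usgs-landsat/collection02/ \
        | awk '{ print $3 "," $4 }' > landsat_manifest.csv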

Without such detailed information, it becomes challenging to justify the requested DataCap.

We kindly request you to:

Gather all necessary information and provide a detailed description of the files or dataset that would explain its scope and size.
Ensure the ownership information for the dataset is accurate.
Clarify the details regarding your project (as it appears you may not be representing GScloud).
Once this information is provided and complies with the outlined policies, we will proceed further.

Furthermore, all applicants should already be familiar with the policies outlined in the Open Data Pathway Policies before initiating the application process. Adherence to these policies is mandatory and non-negotiable.

Alin11155 commented

@Alin11155 Kindly note that the preparation of data and its accurate description should be the client's responsibility, not the allocator's.

For context, the Landsat satellite program is operated by USGS and NASA. GScloud does not own this data but acts as a platform that redistributes it. Similarly, when data is shared on the Filecoin network, neither GScloud nor the client becomes the owner of the data. Access to this open data is facilitated through various public portals.

While the data source is relevant, it is equally critical to specify the exact scope of what will be stored to justify the requested DataCap. Applicants should clearly understand and define what they intend to store at the time of application. For example, providing a .csv file with a detailed list of the data prepared for storage can be helpful in this regard.

Without such detailed information, it becomes challenging to justify the requested DataCap.

We kindly request you to:

Gather all necessary information and provide a detailed description of the files or dataset that would explain its scope and size.
Ensure the ownership information for the dataset is accurate.
Clarify the details regarding your project (as it appears you may not be representing GScloud).
Once this information is provided and complies with the outlined policies, we will proceed further.

Furthermore, all applicants should already be familiar with the policies outlined in the Open Data Pathway Policies before initiating the application process. Adherence to these policies is mandatory and non-negotiable.

@dampud Thank you for your warm explanation. Here are my further answers.
Yes, we made some mistakes with the data source. We have since found the data's real source: dataset: https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products
Brief introduction about the dataset: Landsat Collection 2 science products are internationally recognized as Committee on Earth Observation Satellites (CEOS) Analysis Ready Data (ARD). This certification recognizes satellite data that have been processed to a minimum set of requirements and organized into a structure that allows for immediate analysis with minimum user effort and interoperability both through time and with other datasets. Landsat data have become an invaluable resource for understanding and monitoring the Earth and its natural resources. The long temporal record, global coverage, free and open access, and versatile applications make Landsat data an ideal tool for learning and advancing remote sensing skills and studying global and regional trends and patterns. There are numerous educational resources for learners of all ages and experience levels.
Dataset sample link: https://registry.opendata.aws/usgs-landsat/index.html
AWS CLI access:
aws s3 ls --request-payer requester s3://usgs-landsat/collection02/
We have checked the data size; it is more than 600TiB. We will cooperate with the 9 SPs listed above and will add more SPs in the future. The tokens are sufficient, and the SPs are ready. Please let me know if you have any questions. Thank you so much.

dampud commented Jan 23, 2025

@Alin11155
Thank you for the clarification.

  1. Has the data already been prepared and indexed?
  2. Will an index of the files be created during the preparation process to enable potential access for other community members?
  3. Do you know the exact scope of the data that will be stored at this point? If so, can you justify the size (600TiB)? Since access to the specified S3 bucket is not open, it is not easily verifiable.

Please correct the initial application form (you can provide a document based on which I will edit the GitHub issue) to ensure it contains proper information. Keep in mind that all fields should be filled out; do not leave fields blank without a response where a response is applicable.

Alin11155 commented

@dampud Sorry, I didn't know you cannot process multiple parallel applications. If we have to close one, I prefer to close #123 and reopen #93, and we can discuss our sdcloud data there. Thank you so much; it is my fault. Let's talk about #93, OK? Sincerely.

dampud commented Jan 27, 2025

@Alin11155
The Enterprise path is designed for corporate clients whose data will not be publicly available. It also requires the legal right to use the data, along with agreements with storage providers (SPs); these are verified in that pathway, together with a KYB process for both the SP and the client. Given the nature of the data provided, we recommend staying on the open path.

Fray991 commented Jan 28, 2025

@Alin11155 Thank you for the clarification.

  1. Has the data already been prepared and indexed?
  2. Will an index of the files be created during the preparation process to enable potential access for other community members?
  3. Do you know the exact scope of the data that will be stored at this point? If so, can you justify the size (600TiB)? Since access to the specified S3 bucket is not open, it is not easily verifiable.

Please correct the initial application form (you can provide a document based on which I will edit the GitHub issue) to ensure it contains proper information. Keep in mind that all fields should be filled out; do not leave fields blank without a response where a response is applicable.

@dampud The data has been prepared, and we are sure anyone can retrieve it. The scope of the data covers Landsats 1, 2, 3, 4, 5, 7, 8, and 9 (Landsat 6 is excluded). We have checked the collection02 catalog via the S3 bucket, and on the gscloud platform all LANDSAT series are collected in collection02. It is about 600TiB; we verified this using requester-pays access. We also found that there is no way to view the data other than the AWS and gscloud platforms.

dampud commented Jan 28, 2025

@Fray991
Thank you for your response. Based on your answer, I assume that we will continue the discussion in this thread and pathway.

Please prepare a proper version of the application with accurate descriptions referring to the dataset, data owner, and data preparation steps, and share it so that I can update the original application.
Please remember that during sealing, it will be very important to create a mapping file containing references to specific pieces (e.g., this page).
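
For illustration only, such a mapping file might be a simple CSV tying each sealed piece back to the source files it contains; the columns below are an assumed layout, not a format mandated by the allocator, and the CIDs are placeholders:

    # Assumed mapping-file layout: one row per CAR/piece (placeholder CIDs).
    printf '%s\n' \
        'piece_cid,payload_cid,car_file,source_prefix,raw_size_bytes' \
        'baga...example,bafy...example,batch-001.car,collection02/level-2/standard/oli-tirs/2021/,34359738368' \
        > piece_mapping.csv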

Fray991 commented Jan 29, 2025

[DataCap Application] -
Core Information
Data Owner Name*
NASA
Data Owner Country/Region*
USA
Data Owner Industry*
Life Science / Healthcare
Website*
 https://www.usgs.gov/landsat-missions/landsat-collection-2-level-2-science-products
Describe the data being stored onto Filecoin*
Brief introduction about the dataset: Landsat Collection 2 science products are internationally recognized as Committee on Earth Observation Satellites (CEOS) Analysis Ready Data (ARD). This certification recognizes satellite data that have been processed to a minimum set of requirements and organized into a structure that allows for immediate analysis with minimum user effort and interoperability both through time and with other datasets. Landsat data have become an invaluable resource for understanding and monitoring the Earth and its natural resources. The long temporal record, global coverage, free and open access, and versatile applications make Landsat data an ideal tool for learning and advancing remote sensing skills and studying global and regional trends and patterns. There are numerous educational resources for learners of all ages and experience levels.
Dataset sample link: https://registry.opendata.aws/usgs-landsat/index.html
AWS CLI access: aws s3 ls --request-payer requester s3://usgs-landsat/collection02/
If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?
First download the data, then deliver it to the storage providers via cloud storage (i.e., S3) or by shipping hard drives. Meanwhile, we will do our best to ensure the SPs obey the community rules.
@dampud The document can't be uploaded, so here is the text. Do I need to reapply with a new application?

datacap-bot bot added a commit that referenced this issue Jan 30, 2025
dampud commented Jan 30, 2025

@Fray991

The application has been updated based on your information. Please provide the revised values for the data set size, number of replicas, and total storage requirements. Additionally, specify the proper project/organization name by which you will be identified later.

Also, please review and confirm that you understand the following requirement:

"Please remember that during sealing, it is crucial to create a mapping file containing references to specific pieces (e.g., this page)."

Fray991 commented Jan 31, 2025

Total amount of DataCap being requested
5PiB
Expected size of single dataset (one copy)
600TiB
Number of replicas to store
9 replicas
Weekly allocation of DataCap requested
500TiB
Proper project/organization name: LANDSAT From NASA
Storage requirements: a high retrieval rate, consistent data storage, and SPs in at least 5 different regions. We will add more SPs during the following rounds and will disclose them here. We will keep the CID report healthy.
Mapping file: we are going to create it once the application is approved for the first round.
@dampud If you have any other questions, please let me know, and happy new year.
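
A minimal sketch of a retrieval spot-check that could back the retrieval-rate commitment above, assuming the lassie retrieval client; the CID is a placeholder:

    # Fetch a sampled payload CID from the network into a CAR file.
    lassie fetch -o /tmp/sample.car bafy...sampleCID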

dampud commented Feb 3, 2025

@Fray991 Thank you. Please note that the fields "organization name" and "Share a brief history of your project and organization" refer to the organization that will receive DC (your organization). Please also provide those two.

Fray991 commented Feb 4, 2025

Organization name: sdcloud
Share a brief history of your project and organization: sdcloud is a distributed storage system built on blockchain and distributed-storage technology. It addresses core issues such as "protecting data privacy" and "reducing storage costs", and provides a reliable data storage solution for the privacy and ownership problems of the big data era. The personal end-to-end login method is convenient, providing private storage space and a public data-sharing platform for personal accounts. Work files, life records, videos, audio, photos, and other documents can be stored in the cloud anytime, anywhere, and download speeds are improved by more than 60%. With data-slicing storage, private-key support, and content-addressing technology, there is no need to worry about private files being viewed by others.
@dampud

@dampud dampud changed the title [DataCap Application] <Geospatial Data Cloud > <2025-01-21T05:57:35.018Z> [DataCap Application] <sdcloud> <Geospatial Data Cloud> Feb 4, 2025
datacap-bot bot added a commit that referenced this issue Feb 4, 2025
datacap-bot bot added a commit that referenced this issue Feb 4, 2025

datacap-bot bot removed the validated label Feb 4, 2025

dampud commented Feb 5, 2025

@Fray991 Please complete the above-mentioned KYC so that we can continue.

datacap-bot bot commented Feb 6, 2025

KYC completed for client address f12wlylkwoyws5lgkb2ayutvrp3r5q4zczim2q7iq with Optimism address 0x9B494014969b34cd98156d802d929A142C18be61 and passport score 45.

Fray991 commented Feb 6, 2025

@dampud KYC is done. Please check.

datacap-bot bot commented Feb 7, 2025

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

DataCap Amount - First Tranche

256TiB

Client address

f12wlylkwoyws5lgkb2ayutvrp3r5q4zczim2q7iq

datacap-bot bot commented Feb 7, 2025

DataCap Allocation requested

Multisig Notary address

Client address

f12wlylkwoyws5lgkb2ayutvrp3r5q4zczim2q7iq

DataCap allocation requested

256TiB

Id

434eb87c-ca29-4a95-9e54-75a93e59673b

datacap-bot bot commented Feb 7, 2025

Application is ready to sign

datacap-bot bot commented Feb 7, 2025

Storage Providers have been changed successfully

datacap-bot bot commented Feb 7, 2025

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceasqpxyl53q7raaz37guccld5zhfzcapki56pybtheomu42l272rq bafy2bzaceduhtataxxflbshgzqygdp4fdqp67d7j7horxchm2i6ha5oeair7s

Address

f12wlylkwoyws5lgkb2ayutvrp3r5q4zczim2q7iq

Datacap Allocated

256TiB

Signer Address

f1msap4wvgzzv4xlzeq6kycmgx55ferfloxnt2rcy

Id

434eb87c-ca29-4a95-9e54-75a93e59673b

You can check the status here https://filfox.info/en/message/bafy2bzaceasqpxyl53q7raaz37guccld5zhfzcapki56pybtheomu42l272rq, and here https://filfox.info/en/message/bafy2bzaceduhtataxxflbshgzqygdp4fdqp67d7j7horxchm2i6ha5oeair7s

datacap-bot bot commented Feb 7, 2025

Application is Granted

dampud commented Feb 7, 2025

An initial batch of 256 TiB has been granted. Please be aware that, in this case, DC is granted under the client contract and can only be spent on approved or whitelisted SPs. As agreed previously, we are also waiting for the data preparation step, which will explain how the original files were packed and how they connect to the sealed data.

Fray991 commented Feb 11, 2025

data preparation step

Thank you so much. We are going to make the mapping like https://anyidata.com/image/ next, and then I will disclose it here.
