[DataCap Application] NorthStar-TrendPredictor #109

Closed
1 of 2 tasks
hchen396 opened this issue Nov 23, 2024 · 5 comments

@hchen396

Data Owner Name

Dirk Smith

Data Owner Country/Region

United States

Data Owner Industry

Construction, Property & Real Estate

Website

None

Social Media Handle

fire7drago

Social Media Type

Slack

What is your role related to the dataset

Dataset Owner

Total amount of DataCap being requested

1.5 PiB

Expected size of single dataset (one copy)

1.5 PiB

Number of replicas to store

4

Weekly allocation of DataCap requested

1.5 PiB

On-chain address for first allocation

f3wppwvsmcuhvpermrzjp5pifo75grduouetg5e57cafzara64etd42myeb6e73kpxrnv5j3c7lxmtr5yucaka

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

We are performing modeling using public real estate and rental data as a way to detect trends in the housing market. This will generate massive amounts of data as we will be leveraging the power of LLMs. We need a place to store this data.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Publicly available real estate and rental data, together with the LLM outputs derived from transforming that data.

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

If you are a data preparer. What is your location (Country/Region)

United States

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

No response

If you are not preparing the data, who will prepare the data? (Provide name and business)

No response

Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

No response

Please share a sample of the data

The data will be a transformed version of what Redfin provides:
https://www.redfin.com/news/data-center/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Weekly

For how long do you plan to keep this dataset stored on Filecoin

More than 3 years

In which geographies do you plan on making storage deals

North America

How will you be distributing your data to storage providers

I don't know yet

How did you find your storage providers

Slack, Partners

If you answered "Others" in the previous question, what is the tool or platform you used

No response

Please list the provider IDs and location of the storage providers you will be working with.

f0135934, US
f0456374, US
f01152332, US
f0861593, US
f01431043, US

How do you plan to make deals to your storage providers

I don't know yet

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

@martplo
Contributor

martplo commented Nov 25, 2024

@hchen396
Thank you for submitting your application. We’ve reviewed the details and would like to request some clarifications and additional information to proceed:

  1. SP Locations:
    The Storage Provider (SP) locations you listed do not match the providers' actual locations. Please note that the use of VPNs is prohibited. Could you confirm and update the SP locations accordingly?

  2. Retrieval Rate:
    The SPs you listed do not appear to have a retrieval rate, and it's essential that data stored in the open pathway is accessible and downloadable by anyone. Please ensure compliance with this requirement.

  3. Data Sample:
    The sample data provided is insufficient. Data intended for the open pathway must be described, downloadable, and fully accessible. If you wish to prepare similar data, please do so, ensure its accessibility, and share the complete dataset with us. Also, please refer to the rules page of our allocator: wiki

  4. Data Size Clarification:
    In your application, you mentioned that a single dataset is 1.5 PiB. However, this size corresponds to the DataCap (DC) requested and would only allow for storing one replica, not the four replicas you indicated. Could you clarify the total amount of data and the intended number of replicas?

  5. Documentation of Data Preparation and Transformation:
    Clear and comprehensive documentation detailing how the data is prepared and transformed from the original source dataset to the format stored on Filecoin must be provided. This should include steps, tools, and processes used in the transformation to ensure transparency and reproducibility.

  6. Data Integrity and Traceability:
    There must be a mechanism to identify which part of the original dataset a downloaded piece represents and to confirm its validity as part of the dataset. For example, this could be achieved by providing a log file that maps offsets or file segments to the corresponding stored deals. This ensures traceability and data integrity throughout the retrieval process.
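
    One possible shape for such a log file is sketched below. This is only an illustration under stated assumptions: the field names, the fixed chunk size, and the placeholder piece and deal identifiers are hypothetical and are not a format required by the allocator or by any Filecoin tool.

```python
import hashlib
import json
from dataclasses import dataclass, asdict


@dataclass
class ManifestEntry:
    source_file: str  # name of the original file (hypothetical example name)
    offset: int       # byte offset of this segment within the source file
    length: int       # segment length in bytes
    sha256: str       # hex digest of the segment, used to validate a download
    piece_cid: str    # placeholder for the piece the segment was packed into
    deal_id: int      # placeholder for the deal that stores that piece


def build_entries(source_file, data, chunk_size, piece_cid, deal_id):
    """Split `data` into fixed-size segments and record one entry per segment."""
    entries = []
    for offset in range(0, len(data), chunk_size):
        chunk = data[offset:offset + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        entries.append(
            ManifestEntry(source_file, offset, len(chunk), digest, piece_cid, deal_id)
        )
    return entries


def verify_segment(entry, segment):
    """Check that a downloaded segment matches the length and hash in the manifest."""
    return (
        len(segment) == entry.length
        and hashlib.sha256(segment).hexdigest() == entry.sha256
    )


def to_json(entries):
    """Serialize the manifest so it can be published alongside the dataset."""
    return json.dumps([asdict(e) for e in entries], indent=2)
```

    With a manifest like this, anyone who retrieves a segment can look up its entry, see which source file and byte range it came from, and confirm its hash before trusting it.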

Please address these points and let us know if you have any questions.
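
For point 4, the mismatch can be checked with a short calculation. This is a minimal sketch that assumes DataCap is consumed once per stored replica, as the point describes; the function names are illustrative only.

```python
from fractions import Fraction


def total_datacap_pib(dataset_pib, replicas):
    """DataCap needed to store every replica of the dataset, in PiB."""
    return dataset_pib * replicas


def max_dataset_pib(datacap_pib, replicas):
    """Largest single-copy dataset that a given DataCap covers, in PiB."""
    return datacap_pib / replicas


requested = Fraction(3, 2)  # 1.5 PiB, the figure given in the application
replicas = 4                # number of replicas given in the application

# DataCap required if one copy really is 1.5 PiB: 6 PiB
print(total_datacap_pib(requested, replicas))

# Single-copy size that 1.5 PiB of DataCap covers at 4 replicas: 3/8 PiB
print(max_dataset_pib(requested, replicas))
```

Under that assumption, either the request would need to rise to 6 PiB, or the single-copy size would need to be about 0.375 PiB for four replicas to fit within 1.5 PiB.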

@martplo martplo self-assigned this Nov 25, 2024
Contributor

datacap-bot bot commented Nov 26, 2024

Application is waiting for allocator review

@dampud

dampud commented Feb 5, 2025

@hchen396 Please let us know if you intend to continue the application.

@dampud

dampud commented Feb 19, 2025

Closing thread due to inactivity.

@dampud dampud closed this as completed Feb 19, 2025
Contributor

datacap-bot bot commented Feb 19, 2025

The application has been declined.

3 participants