
2nd Community Review of Bitengine-Reeta Allocator #221

Closed
Bitengine-reeta opened this issue Nov 7, 2024 · 14 comments
Assignees
Labels
Diligence Audit in Process Governance team is reviewing the DataCap distributions and verifying the deals were within standards Refresh Applications received from existing Allocators for a refresh of DataCap allowance

Comments

@Bitengine-reeta

Latest report: https://compliance.allocator.tech/report/f03014781/1730765561/report.md

Currently, a total of 5 clients have applied to us: 1 with a public dataset and 4 with enterprise datasets (2 have been approved and the other 2 are still in discussion).
(screenshot attached)

The average Spark retrieval success rate is 47%, which is a medium level. The main reason is that some clients use the DDO model. Going forward, we plan to reduce or stop cooperation with clients who use the DDO order model. For clients with low Spark retrieval rates, our technical team is helping them improve their retrieval success rate, and we look forward to continued improvement.

@Kevin-FF-USA Kevin-FF-USA self-assigned this Nov 7, 2024
@Kevin-FF-USA Kevin-FF-USA added the Awaiting Community/Watchdog Comment DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App. label Nov 7, 2024
@filecoin-watchdog
Collaborator

@Bitengine-reeta
Allocator Application
Compliance Report
1st Review
1st Review score: 5PiB

5 PiB granted to existing clients:

Client Name                      DC       Status
NOAA                             2 PiB    Existing
BeijingMipai Culture Media       2 PiB    Existing
JuQing Media                     0.5 PiB  New
sxqiaonao                        0.2 PiB  New

NOAA

  • SPs list updated in this round:
    f03055005 HK Chris
    f01315096 HK Chris
    f03091739 USA Lanxin
    f03156617 Australia Tongkun
    f03148356 Japan Linyun
    f01106668 HK Chris
    f03144037 South Korea Cary
    f0870558 HK Chris
    f03151456 HTY Shenzhen
    f03055018 Chris HK
    f01889668 Mike US
    f03151449 HTY Shenzhen
    f01518369 Mike US
    f03179570 YunSD Singapore
    f03055029 Chris HK
    f03188440 Fei Vancouver, Canada
    f03190614 Laibin Dubai
    f03190616 YM Japan
    f03066836
    f03081958

  • SPs list used for deals:
    f03055005
    f0870558
    f01315096
    f01106668
    f03055029
    f03055018
    f03091739
    f03091738
    f03074589
    f03074586
    f03074587
    f03074583
    f02199999
    f03156617
    f03148356
    f03144037
    f01518369
    f01889668

  • 6 SPs don’t match the provided list.

  • 8 out of 18 SPs have a retrieval rate of 0%.
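The "6 SPs don’t match" figure can be reproduced mechanically by diffing the declared SP list against the SPs seen in deals. A minimal Python sketch using the two lists above (IDs copied verbatim; this is illustrative tooling, not part of the compliance report):

```python
# SPs declared by NOAA in this round (locations omitted)
declared = {
    "f03055005", "f01315096", "f03091739", "f03156617", "f03148356",
    "f01106668", "f03144037", "f0870558", "f03151456", "f03055018",
    "f01889668", "f03151449", "f01518369", "f03179570", "f03055029",
    "f03188440", "f03190614", "f03190616", "f03066836", "f03081958",
}

# SPs actually used for deals per the compliance report
used = {
    "f03055005", "f0870558", "f01315096", "f01106668", "f03055029",
    "f03055018", "f03091739", "f03091738", "f03074589", "f03074586",
    "f03074587", "f03074583", "f02199999", "f03156617", "f03148356",
    "f03144037", "f01518369", "f01889668",
}

# Deals made with SPs that were never on the declared list
undeclared = sorted(used - declared)
print(len(undeclared), undeclared)  # 6 SPs fall outside the declared list
```

The same set-difference check works for every client in the report.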

BeijingMipai Culture Media

  • In this round, the client updated 22 SPs (the top 10 match the previous allocation). Most SPs on the following list were used for deals (except f03151456 and f0317957):
    f01106668 Hong Kong, Hong Kong, HK
    f0870558 Hong Kong, Hong Kong, HK
    f03091739 Dallas, Texas, US
    f03091738 Dallas, Texas, US
    f03074589 Hong Kong, Hong Kong, HK
    f03074587 Seoul, Seoul, KR
    f02199999 Singapore, Singapore, SG
    f03144037 Paripark, Seoul, KR
    f03156617 Sydney, New South Wales, AU
    f03148356 Ōi, Saitama, JP
    f01518369 US
    f01889668 US
    f03055018 Hong Kong
    f03055029 Hong Kong
    f02013352 Hong Kong
    f03151449 CN,Shenzhen HTY
    f03151456 CN,Shenzhen HTY
    f0317957 Singapore, Singapore YunSD
    f03178077 JP,Tokyo Gong
    f03178144 JP,Tokyo Gong
    f03179572 US Flycloud
    f03214937 US Flycloud

  • 20 new SPs were used to make deals (30 in total). List of added SPs:
    f03179570
    f03178077
    f03179572
    f03178144
    f03214937
    f03229933
    f03229932
    f03055029
    f03055018
    f03055005
    f01315096
    f03151449
    f03188440
    f03173127
    f02013352
    f01928097
    f03190614
    f03190616
    f01518369
    f01889668

  • 10 newly added SPs don’t match the updated list.

  • 10 SPs have a retrieval rate of 0%, 8 SPs have retrieval above 75%, and the rest have retrieval below the acceptable level.

  • The client declared 6 replicas, while there are already 9.

JuQing Media

  • The client said they applied for DC on behalf of their clients. Could that be explained in more detail?
  • The client said:

“Initial storage will be on Filecoin. Future data storage plans: We will be launching in the next two weeks.”

What does this mean? Initial storage on Filecoin, and what comes after that?

  • Why did the allocator grant 500TiB in the first allocation? What's the allocation schedule here?
  • The client has sealed around 20% of the DC so far, and the report shows all of the data was sealed with a single SP whose retrieval rate is 34%.

sxqiaonao

  • SPs list provided in the form:
    SP ID:f03091738 Location:US
    SP ID:f02199999 Location:Singapore
    SP ID:f03074583 Location:Japan
    SP ID:f03000070 Location:China
  • This data was already stored on Filecoin: [DataCap Application] Qiaonao Enterprise Management - Course Data filecoin-plus-large-datasets#975. What is the reason for storing it again?
  • KYC was not completed, from my perspective. The client said someone else had passed it previously; that’s not the best explanation.
  • The client has sealed around 25% of the DC so far, and the report shows all of the data was sealed with a single SP whose retrieval rate is 18%.
  • The only SP used for the deals (f03179572) doesn’t match the provided SPs list.
  • Why did the allocator grant 200TiB in the first allocation? What's the allocation schedule here?

@filecoin-watchdog filecoin-watchdog added Awaiting Response from Allocator If there is a question that was raised in the issue that requires comment before moving forward. Refresh Applications received from existing Allocators for a refresh of DataCap allowance and removed Awaiting Community/Watchdog Comment DataCap Refresh requests awaiting a public verification of the metrics outlined in Allocator App. labels Nov 21, 2024
@Bitengine-reeta
Author

Bitengine-reeta commented Nov 27, 2024

Once again, thank you to @filecoin-watchdog for your hard work. When it comes to the allocator model, we have generally operated based on our own understanding, often lacking a unified set of standards. However, the guidance provided by @filecoin-watchdog has given us a new direction for reviewing our operations.

Here’s my attempt to summarize the areas for improvement mentioned by @filecoin-watchdog:

1. A small number of SPs do not match the actual sealing processes, but it's important to note that most SPs are well-matched.
We acknowledge that in practical operations, SPs may need to be adjusted due to various issues such as staking coins, technical preparation, and data readiness. However, it is essential to communicate and confirm these adjustments with the client before initiating new sealing processes and to update the progress on GitHub simultaneously.
Moving forward, we will strengthen supervision in this area. We have also been proactive in keeping clients informed about the latest SP situations and updating the information on GitHub.
(screenshots attached)

2. A few SPs have a Spark retrieval success rate of 0%, but it's important to note that the average Spark retrieval success rate is 47%, which is considered moderate. Some well-performing SPs have a Spark retrieval success rate of over 90%.

At this stage, the entire community is aware that Spark and DDO modes are incompatible. However, this is not an excuse for a 0% Spark retrieval rate. Therefore, we tend to approve applications that are not in DDO mode, which has been reflected in recent sealing activities. Additionally, we recommend that clients cease cooperation with SPs using the DDO order mode.

Due to recent upgrades to the mainnet version, some SPs have experienced a decline in their Spark retrieval success rates. We believe this situation will improve soon.

We have also communicated offline with SPs that do not use the DDO order mode. They informed us that technical teams are working on the issue, and improvements are expected by early December. We have attached the current status for comparison to track ongoing improvements.

Current Status of Some SPs:
(screenshot attached)

3. filplus-bookkeeping/Bitengine-Reeta-Filplus#5 "declared 6 replicas on the client side, but there are actually 9." The client has provided an explanation, and we believe that in the early stages of Filecoin data storage, this reason is credible.
(screenshot attached)

@Bitengine-reeta
Author

Replies to questions about https://github.com/filplus-bookkeeping/Bitengine-Reeta-Filplus/issues/15:

  1. Regarding “The client said they applied for DC on behalf of their clients. Could that be explained more?”: for detailed information, please refer to the link: [Allocator Application] <CineVault>< CineVault Allocator> PR #95 #96. For new allocator operators, the official suggestion is to "Work with any existing Allocator to create an application on behalf of your client."
    (screenshot attached)

  2. Regarding the following content.
    (screenshots attached)
    The client has responded to my KYC questions, indicating that the data is being stored on the Filecoin network for the first time. The sealing process is planned to start within the next two weeks, and indeed, it is currently in the sealing phase.

  3. Regarding "Why did the allocator grant 500TiB in the first allocation? What's the allocation schedule here?", the allocator application form at the time clearly stated the DC allocation rules: First allocation: the lesser of 5% of the total DataCap requested or 50% of the weekly allocation rate. The client requested a total of 8PiB, with a Weekly allocation of DataCap requested at 1000TiB.

5% of total DataCap requested: 8 * 1024 * 5% = 409.6TiB
50% of weekly allocation rate: 1000 * 50% = 500TiB
Since 410TiB and 500TiB are not significantly different, we issued 500TiB in the first round.

However, in the case of the first round issuance of 200TiB in filplus-bookkeeping/Bitengine-Reeta-Filplus#7, there was no issue. The client requested a total of 4.5PiB, with a Weekly allocation of DataCap requested at 400TiB.

5% of total DataCap requested: 4.5 * 1024 * 5% = 230.4TiB
50% of weekly allocation rate: 400 * 50% = 200TiB
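The tranche rule quoted above ("the lesser of 5% of the total DataCap requested or 50% of the weekly allocation rate") is easy to check mechanically. A small sketch; the function name is my own, not part of any official tooling:

```python
def first_tranche_tib(total_request_pib: float, weekly_rate_tib: float) -> float:
    """First allocation: the lesser of 5% of the total DataCap requested
    or 50% of the weekly allocation rate, both expressed in TiB."""
    five_pct_of_total = total_request_pib * 1024 * 0.05  # PiB -> TiB
    half_weekly_rate = weekly_rate_tib * 0.5
    return min(five_pct_of_total, half_weekly_rate)

# CineVault:  8 PiB total at 1000 TiB/week -> min(409.6, 500) = 409.6 TiB
# sxqiaonao: 4.5 PiB total at  400 TiB/week -> min(230.4, 200) = 200 TiB
```

Note that a strict "lesser of" reading gives 409.6 TiB in the first case; the comment above acknowledges rounding that up to 500 TiB.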

Please see the link for details: filecoin-project/notary-governance#1015
(screenshot attached)

  4. Regarding “The client has sealed around 20% of the DC so far, and the report shows that all of the data was sealed with one SP, with a retrieval rate of 34%.”: the client has provided a response, and we look forward to improvements in future reports.
    (screenshot attached)

@Bitengine-reeta
Author

Replies to questions about filplus-bookkeeping/Bitengine-Reeta-Filplus#7:

Regarding the issues related to sxqiaonao, we have also confirmed them. Given that this is only the first round of 200 TiB, we look forward to the progress in subsequent rounds.
(screenshot attached)

@Bitengine-reeta
Author

Finally, we would like to emphasize that, as of now, our clients have collaborated with 31 different SPs, including f03055029, f03055005, f0870558, f03055018, f01315096, f01106668, f01518369, f01889668, f03074589, f03173127, f03074587, f03144037, f02013352, f03091739, f03074583, f03074586, f03179570, f03179572, f03156617, f03190616, f02199999, f03148356, f03190614, f03188440, f03229933, f03178077, f03178144, f03214937, f03151449, f03229932, and f01928097. This breadth of collaboration is encouraging, as it reflects Filecoin's vision of distributed storage. Differences in Spark retrieval rates among these SPs are understandable given variations in resources and technical capabilities. We look forward to continued improvement from both current and new SPs, and we hope for further support from the governance team so we can continue our efforts.

@filecoin-watchdog
Collaborator

filecoin-watchdog commented Nov 27, 2024

@Bitengine-reeta

  1. I disagree with the statement that the SP list in issue 5 contains only a few mismatched IDs. The client failed to inform the allocator about the addition of as many as 10 new SPs. It is the client’s responsibility to ensure that the SP list remains up-to-date.

  2. In analyzing various allocators and clients, I’ve observed a recurring trend. While this is a general observation, it does appear to apply here to some extent. It’s worth reflecting on why clients who are aware of the reasons for reapplying often omit this information in the form, leaving it to the allocator or reviewer to identify data duplication during the review process.

@Bitengine-reeta
Author

Bitengine-reeta commented Nov 28, 2024

Thank you for your ongoing feedback. We greatly value your opinions and agree with your points.

As I initially responded, in the operation of the allocator, we often operate and review based on our own understanding, lacking a unified standard. However, your review comments have provided us with a clear direction for our reviews.

  1. As you pointed out, ensuring that the SP list is up-to-date is the responsibility of the client. This applies not only to the client but also to us as allocator operators. If we, as allocators, do not enforce strict requirements, clients may become negligent. As I mentioned in my first response, "Moving forward, we will strengthen supervision in this area. We have also been proactive in keeping clients informed about the latest SP situations and updating the information on GitHub." Currently, at our request, the client has updated the latest SP list.

We still want to give filplus-bookkeeping/Bitengine-Reeta-Filplus#5 another chance. If there is no subsequent improvement, we will stop DataCap approvals and close the issue.

  2. We recognize the recurring trend where clients often omit important information when reapplying. To address this, we will:
    Add prompts in the KYC questions to explicitly require clients to provide complete and up-to-date information, ensuring they understand the importance of reapplying and the required information.

For example, add a confirmation question: "Has this dataset been stored on Filecoin before? If so, why are you choosing to store it again?"

Use the review tool https://allocator.tech/ for verification.
(screenshot attached)

@Bitengine-reeta
Author

Additionally, I would like to understand how you identify duplicate data applications beyond "client commitment" and queries via the https://allocator.tech/ tool.

This would be very helpful for our future review processes.

@filecoin-watchdog
Collaborator

@Bitengine-reeta

For example, add a confirmation question: "Has this dataset been stored on Filecoin before? If so, why are you choosing to store it again?"

This question is already included in the form, but clients tend to omit it. The best solution is to consistently verify clients' statements against factual data.

Additionally, I would like to understand how you identify duplicate data applications beyond "client commitment" and queries via the https://allocator.tech/ tool.

To continue this topic and at the same time answer the above question, let me show you how I'm doing it:

  1. Through the already mentioned allocator.tech, by searching for the dataset owner:
(screenshot attached)
  2. By searching the archived repo filecoin-plus-large-datasets for the dataset owner, storage link, dataset website, etc. This requires careful analysis but leaves room to ask clients additional questions.
(screenshot attached)
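For completeness, the archive-repo search can also be scripted through GitHub's issue-search API. A sketch under assumptions: the `filecoin-project/` org prefix for the archive repo and the helper name are mine, and unauthenticated API queries are rate-limited:

```python
import urllib.parse

# Assumed location of the archived LDN application repo
ARCHIVE_REPO = "filecoin-project/filecoin-plus-large-datasets"

def dup_search_url(term: str) -> str:
    """Build a GitHub issue-search URL that checks whether a dataset owner,
    storage link, or dataset website already appears in past applications."""
    query = f'repo:{ARCHIVE_REPO} "{term}" in:title,body'
    return "https://api.github.com/search/issues?q=" + urllib.parse.quote(query, safe="")

# Example: look for prior applications mentioning the same dataset owner
print(dup_search_url("Qiaonao Enterprise Management"))
```

Fetching that URL (with a GET request) returns JSON whose `items` carry issue numbers and titles to compare against the new application.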

@Bitengine-reeta
Author

Bitengine-reeta commented Nov 28, 2024

Thank you for sharing the review methods; we can also follow these guidelines in the future.

In summary, during the review process going forward, we will pay special attention to the following areas:

  1. Gradually improving Spark retrieval rates;
  2. Monitoring the number of dataset replicas;
  3. Requiring clients to maintain an up-to-date list of SPs, and ensuring proper oversight;
  4. Focusing on reviewing the importance of datasets to Filecoin network storage, to avoid storing redundant and meaningless data.

We look forward to continued progress and will be monitoring it closely. We hope to obtain 10 PiB of DataCap support to promote the development of the Filecoin network. Thank you again!

@filecoin-watchdog filecoin-watchdog added Diligence Audit in Process Governance team is reviewing the DataCap distributions and verifying the deals were within standards and removed Awaiting Response from Allocator If there is a question that was raised in the issue that requires comment before moving forward. labels Nov 29, 2024
@Bitengine-reeta
Author

Bitengine-reeta commented Dec 5, 2024


As a continued update: our team is also actively assisting SPs in resolving the Spark retrieval success rate issues. Compared to last week, there is a noticeable improvement in the Spark retrieval success rate. We will continue to monitor and support this progress.

(screenshot attached)

@Kevin-FF-USA
Collaborator

Thanks for all the additional context @Bitengine-reeta,

As more of these refresh requests are processed, greater adherence to the diligence plans in each pathway's application will need to be maintained in order to justify receiving additional DC to distribute. Thanks for providing the additional context on these distributions; it helps in understanding the clients and how you are working to support them.

The next step is a final review of this application by Galen next week. This review will take into account the retrieval rates, bookkeeping documentation, SPs, as well as past issues identified in previous refreshes. If you have any questions, please leave a comment here in the issue; otherwise you will see guidance next week.

Warmly,
-Kevin

@Bitengine-reeta
Author

Thanks for the feedback, @Kevin-FF-USA. Currently, we have no issues and look forward to further guidance from the governance team.

@galen-mcandrew
Collaborator

Lots of great engagement and diligence above, including seeking ways to improve processes on both sides. Trying to summarize some of the areas for focus:

  • investigating and accurately updating SP lists
  • increasing retrieval rates
  • consistent allocation tranches, starting with lower scale/trust and increasing as clients onboard accurately
  • investigating extraneous and redundant datasets across the Filecoin ecosystem, and encouraging data preparers to justify

We are requesting 10 PiB of DataCap for this pathway.
