Proposal: Project Beacon - 11.8 PiB Data Set / 60 PiB DataCap #564
Comments
We have talked with the client of the Beacon Project and are willing to support this one. |
Hello and good morning. This application should not go under regular LDN. LDN applications need to be stored and distributed on different continents. This one is China only and this is not the spirit of LDN. I recommend you take this to the Fil Enterprise stage and put this under the special Fil Enterprise program. |
We are very interested in the storage of private data, which will be the trend and future in the filecoin network. We'd like to be added to the list of notaries willing to support. thank you! |
Thanks for joining us @MetaWaveInfo! As the Lead Notary of the Beacon project, we welcome more notaries to join us in supporting it. We know very well how difficult it is for traditional clients to store on Filecoin, but as an important ecological partner of the community, we think it is necessary for ByteBase to make corresponding contributions to Mr. Juan Benet's vision of "5PiB DC sealed everyday" and the last community meeting's goal of completing "200PiB DC sealed & 175 active LDNs" by July 26. Therefore, according to PL and ByteBase's suggestion, the client first submitted a public proposal. Actually, ByteBase has done several rounds of communication with Keren ([email protected]), Frank, Stefaan, Deep, Galen and others from PL & Filecoin Foundation before the submission, and has also helped PL & FF arrange the specific video conferences with the client. Regarding @cryptowhizzard's understanding of the LDN rules, I would like to make some clarifications: Regarding your suggestion of the Fil Enterprise program, we are already following and strongly supporting the Fil+ E project with Meg. Based on the stage and target timeframe in her previous proposal, this new process may be expected to be officially released in September to October, which may not meet the needs of our client. Therefore, we will assist our client with the datacap application process for the Beacon project as recommended by PL. At present, we are also helping the client find more notaries and SPs to join; the implementation of such a project requires not only datacap and technical support, but also many SPs with sufficient FIL. Thanks all! |
Hello @swatchliu Don’t get me wrong here, I am all in favour of evolving the network as long as it is beneficial for everyone. I am also in favour of a separate ruleset for special locations, as there are cultural differences at play. What I do think is wrong here is that there is no community oversight. If you don’t make this data retrievable outside China, there is no way for us to check whether what you store is valid, and that is the foundation of the LDN. This oversight was to be built into Fil-E. I can submit a 25000 EiB datacap request with private and encrypted data this way .. no one can check. Can you make suggestions on how to provide oversight on this project, how to verify that the data is legit, and how we as a community can keep trust that you guys are doing the right thing here? |
@cryptowhizzard for what its worth, this path + what Antarctic went through are what is E-Fil+ until we come up with something better. This is currently the way to request DataCap for these scenarios.
Reasonable and correct. My take on this (cc @galen-mcandrew since he originally mentioned this in a prior conversation) is that we typically get (1) info on the client, (2) a sample of the data itself, ideally also shown post retrieval / with some link to prove it's the right data being stored, and (3) info on the SPs storing this data. (3) combined with the CIDs stored on the network tends to be the biggest variable in ensuring the right things are happening on the network. It is rare to get (1) + (2) as well, especially in cases of enterprise clients. Getting (3) + either (1) or (2) seems to meet the burden of proof, at least given the tools we have access to today. @swatchliu given that deals cannot be distributed outside, I do think it would be good to still get some notary approval from outside the region. Perhaps by having conversations with them and/or sharing a sample of the data? Antarctic had at least 3 regions covered across the selected notaries. It could also be interesting to share some details on timelines and whether all the deals will be progressing in parallel. We can also set up CID tracking to see how files are replicated across deals with the different SP IDs to prove replication? |
Thanks @dkkapur for your suggestions, we will try to get support from notaries in at least three regions and from more SPs. In fact, it is not easy to find SPs with so much FIL pledge in reserve, and most of them stopped negotiating because of pledging issues. Therefore, we plan to proceed in phases, starting the first phase 1/12-4/12, and launching the second phase 5/12-8/12 when enough different SPs have joined, followed by the third phase 9/12-12/12 within this year. We plan to select 5-10 different SPs per phase, with a total of 15-30 SPs selected for all three phases, and the number of SPs will only increase. Since our KYC can be carried out simultaneously, we will complete all the preparations with PL and the related notaries supporting the Beacon Project. Many thanks to ByteBase for coordinating and providing services for us. Hope it will be a pleasant experience! |
The client has contacted us before and we have learned the details. We are willing to support the project. During the two years we've been involved with the Filecoin project, we've seen very little storage in education, not to mention such a large volume. We were very excited when @Beacon-Edu spoke about the project. We, of course, are well aware of the importance of verifying clients and avoiding self-dealing. So we have confirmed a lot of detailed information. For example, we gained some knowledge about the client by searching online and asking our friends, and we have confirmed the credibility of the client. We also raised doubts about their storage plan. However, we're well aware of the particularity of educational data, and the client has previously expressed that they will show data samples through meetings with PL (the proposal shows that it will indeed do so). Therefore, we currently believe that this problem has also been solved. In a word, as a new notary, we have tried our best to verify the first few LDN projects. We do think it's a good project and we are willing to support it. |
It seems to be fine, I would like to support it, but need to know more about it. @Beacon-Edu ben |
I'd like to support this initiative as a North American notary. We need to have more diversity of such LDNs other than just Antarctica Project #489 and should be considered equally. |
Can i ask who is building this dataset for you @Beacon-Edu and what tools are used for that? |
The client has contacted us and we would like to support the project. We just need some notaries from NA and Europe to take a look at the application and support it. |
We would like more justification to see how storing this large amount of data will benefit the Filecoin ecosystem. Also, if possible, we would like to see proof that 11.8PiB of data is just a fraction of the archive (such as other datasets the company has and is not yet planning to store on Filecoin). With the current description of the data, it does feel like this 11.8PiB is all the data the company has, as opposed to a portion of the archive. |
Thank you all for the great support, it appears that we have gotten support from 4 regions. First of all, allow me to express my welcome to @cryptowhizzard @Tom-OriginStorage for giving us thoughtful tips and sharing experiences.
@cryptowhizzard the encryption tools and the process will be handled exclusively by our technical department, pardon me if I can't be very specific. The distribution to SP is planned as above. @Tom-OriginStorage yes, you can see 11.8PiB as our total data set. Since we will have 5 replicas stored, this is exactly the reason why we are applying for 60PiB. Again, we appreciate all your comments and suggestions! We will prepare carefully with the implementation team before launching the project. |
Hmm, if a company is just dumping their entire video archive on Filecoin (encrypted + only in China), I don't really see the reason to provide them with datacap. The only beneficiaries here are the SPs, notaries (since they referred the SPs) and the company itself. The Filecoin ecosystem doesn't benefit at all from supporting Project Beacon under the current LDN program. While I understand that Filecoin wants to reach its KPI for data sealed and the client has a deadline to adhere to, I really still think that the company should either re-apply under Fil-Enterprise or just go ahead and store the data without Datacap. My reasoning is this: I can easily ask a company like Youtube or Bytedance to archive their data on Filecoin, and a company like this can easily max out the network capacity with just a fraction of their data. And if the Filecoin network were to actively subsidize companies to unload all of their archived data, I think there will definitely be economic repercussions. |
@Tom-OriginStorage there is a clear benefit when regions with different cultural "things" can participate and lock up collateral instead of feeling unwelcome in the ecosystem and selling everything they have. If there is a clear path for KYC and a clear path for oversight then I am in favor of having participation. What I would like to know is how the oversight is done and how the packing is done. @Beacon-Edu, you are using public LDN value from the community for your benefit. Your technical department should realise that they need to open up and describe how your process for packing this amount of data works, what software tooling you use, and how you distribute it, so we can learn from it and do this together as a community. I.e. if you develop on Linux (open source like Filecoin) then you can't commercialise that and make it closed source. |
We are very willing to see and actively support more notaries can invite giant companies like Youtube and Bytedance to join FIL+ to expand the influence of filecoin. |
We should allow different voices to discuss, just like the Antarctic Project , but I still tend to support this proposal, because storing non-public dataset must be the Real Goal of Filecoin. We have to take this step forward. Believing that all community members who really love Filecoin will understand this. If nothing changes, means nothing will remain. |
I need to clarify my stance as I see most people here are misunderstanding what I mentioned. I have nothing against the company/client storing on Filecoin at all; I am against using the current LDN for the company/client. We already have Fil Enterprise coming up, and if the client/company is able to wait for it to go live, it should. Unless the company/client has compelling reasons that they absolutely must store the data now, which they don't seem to have. As for inviting large entities, while I do know the right people in those companies to make it happen, I don't see a point in doing it at this stage. As opposed to riding the hype using big names (and not having the infrastructure to back it), I believe the more natural course is to build on the ecosystem first and gain traction gradually, which is in fact what I am doing (building on the ecosystem). |
This is a good experiment. Binghe is willing to participate in this program if needed. |
We've discussed this in a governance call, and generally it is getting support. The biggest concern so far has to do with more regional notaries adding support. From tracking the comments, it seems like the list of notaries interested in performing the diligence for this client is below:
Can these 8 notaries please give a reaction emoji to signal the above is correct and show support? Additionally It is unclear if there are more notaries interested in joining this proof of concept proposal. @Tom-OriginStorage & @cryptowhizzard I see some great questions above. Would you like to be added to this group of notaries to perform diligence and approve DataCap requests? Please reply here to let us know! |
Additionally, I have a somewhat modified proposal for the implementation of this proof of concept. The Fil+ community is working to reduce the operational overhead, increase efficiency, and remove barriers where possible. Additionally, our goal is to learn from the incremental successes of these different proposals. Rather than exactly replicate Project Antarctic, I think there are already some lessons that we could incorporate here. Specifically, I propose:
Advantages:
Would like to hear from @swatchliu & @Beacon-Edu about this idea, since it will change the LDN's that get created. If we move forward with this plan, then I think we should also update the parent comment in this proposal with the following details:
|
@Beacon-Edu hello, I am the owner of node f01830428. I am very interested in your project and I have sufficient fil to pledge, in addition I have nodes in both mainland and Hong Kong. |
Yes, we are willing to be part of the notaries to perform diligence and approve DataCap requests. But we definitely still need to hear back from @Beacon-Edu on their reasons for pushing this as Project Beacon as opposed to waiting for Fil-Enterprise. |
@Beacon-Edu : Our team is a North American notary always willing to support opportunities like these, where possible. However, can you help me understand the composition of this private data? You claim this is "3,600 videos of lessons recordings". At 11.8PiB, that means each video is greater than 3TiB. That size is bordering unbelievable for an educational video. Here in the United States, it is very common for our universities to record educational courses. At my own alma mater, Harvard University recorded nearly every lecture. The average video was 100-250MiB, which was roughly 1 hour long at 480P and 15 frames per second. Why are your videos more than 12000x bigger? Even 4K videos at one hour do not reach the multi-TiB scale. Again, we are happy to support, but we need to first understand the need to store educational videos that are each greater than 3x10^6MiB. |
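The per-video estimate in the comment above can be checked with a few lines of arithmetic. This is a minimal sketch that assumes binary units (1 PiB = 1024 TiB) and uses only figures quoted in the thread: 11.8 PiB, 3,600 videos, and a 100-250 MiB reference lecture. The variable names are illustrative.

```python
# Figures quoted in the thread; binary units assumed (1 PiB = 1024 TiB = 1024**3 MiB).
DATASET_PIB = 11.8
VIDEO_COUNT = 3600
REFERENCE_MIB = (100, 250)  # typical lecture size cited by the commenter

per_video_tib = DATASET_PIB * 1024 / VIDEO_COUNT
per_video_mib = DATASET_PIB * 1024**3 / VIDEO_COUNT

# Ratio against the small and large ends of the reference range.
ratio_vs_large = per_video_mib / REFERENCE_MIB[1]
ratio_vs_small = per_video_mib / REFERENCE_MIB[0]

print(f"per video: {per_video_tib:.2f} TiB ({per_video_mib:.3g} MiB)")
print(f"ratio vs reference: {ratio_vs_large:.0f}x to {ratio_vs_small:.0f}x")
```

Under these assumptions the result is a little over 3 TiB per video, which is consistent with the "greater than 3TiB" and "more than 12000x" figures in the comment.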
Ack. Same here. |
Hi @Kevin-PiKNiK, thanks for the point. You might have misunderstood the dataset breakdown. Please take a look at sheet 2, column 4, "Course Live Recording". Let's do some easy math: 10742/7000/50/32*1024=0.98GiB. Besides, as the client is at the top level of education groups in China, each classroom is equipped with a 4MP network camera recording H.265+ at 720P and 30 f/s. It's a very reasonable size for the client. |
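The "easy math" in the reply above can be reproduced directly. This is only a sketch: the thread does not say what the divisors 7000, 50, and 32 count, so they are kept as unnamed factors here, and reading 10742 as a size in TiB is an assumption suggested by the multiplication by 1024 to reach GiB.

```python
# Expression copied from the comment: 10742/7000/50/32*1024.
# ASSUMPTION: 10742 is a size in TiB (the *1024 converts TiB to GiB).
# The meaning of each divisor is not given in the thread; see the client's sheet.
TOTAL_TIB = 10742
DIVISORS = (7000, 50, 32)

per_recording_gib = TOTAL_TIB * 1024
for divisor in DIVISORS:
    per_recording_gib /= divisor

print(f"{per_recording_gib:.2f} GiB per recording")
```

The value comes to roughly 0.98 GiB, matching the figure quoted in the reply; whether the divisors are the right ones depends on the breakdown in the client's sheet.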
Thanks @galen-mcandrew for the feedback, apparently we have collected support from more than 5 notaries across 4 regions so far. Whether other notaries would like to join the support or not, we appreciate all the sharings and suggestions. Besides, there are indeed many SPs like @Aaronn85 who contacted me on slack hoping to join Beacon project, we will consider it carefully and arrange further communication. Thanks to the great advice and guidance from ByteBase during all this time, as per Galen's suggestion, we would like to use the single multisig notary entity with all signers(same as f01858410) and we strongly intend to launch 12 applications, each with a unique client address and each client address is allowed to be sent to one or several nodes under one SP. Especially, such a huge amount of pledge requirements can be very challenging for many SPs, and may involve the participation and rotation of lots of SPs. Of course we will keep the progress updated via Google sheet (https://docs.google.com/spreadsheets/d/18XDQkjlmWJ_BnQ-ygGszg8tFIC_36LHtOBQytHpRGCA/edit#gid=0) to ensure the transparency to the community. In short, we will operate these applications smoothly, and we are willing to involve more SPs, Fil borrowers, and technical supporters joining this ecosystem. |
Funny to see that
As a community member, I can only say, yes, I believe you guys |
Thanks @dkkapur for the timely tip, but this doesn't seem to be exactly the same as my previous understanding. I would like to confirm whether I need to start a detailed review and establish a trust relationship with each SP? And if the SPs that are trusted now become untrusted in the future, will there be any problem of data loss or unretrievability as you mentioned? |
@Beacon-Edu Don't worry about this. Data loss and retrieval issues are determined by the mechanism and model of the filecoin network, not by the brand and reputation of individual SPs. Otherwise, it would be no different from the centralized storage network. In other words, even SP nodes that are currently considered qualified may become unqualified in the future by temporarily going offline for various reasons, so a reliable storage solution needs to have a sufficient number of copies and a pledge mechanism. In short, currently filecoin is not perfect, but please do not lose heart in it. |
@swatchliu -
I think they might, unless we want to rethink DataCap distribution more substantively. Specifically, ensuring DataCap is distributed across multiple SPs gives us several advantages:
Based on this, if SP entities are just fronts for other SP entities, we just end up creating disproportionate returns for a subset of the community and missing out on a massive opportunity to build a stronger ecosystem and potentially increase odds of data loss in the future. Additionally, in this scenario where SPs are already restricted by region and have no public presence on chain, it is really hard to prove that they are actually separate operations running in different datacenters owned by different people.
Thank you, really appreciate it 🙏. We should also work towards ensuring that in the future, we create the space for this to be flagged during the notary election process itself, |
Hi @dkkapur Regarding the first point, I think we are discussing it from different dimensions although it seems like the same topic. I suppose you are looking to achieve greater decentralization in the physical nodes to ensure scalability and security across the network. Our opinion is that the owners behind each physical node often have a very complex composition. And that doesn't mean we can't provide sufficient decentralized physical nodes. I don't think our opinions are contradictory at all. We have explained this in more detail via email please check it out. Looking forward to hearing from you, thanks! |
@swatchliu looking forward to it. I don't see the email yet but will keep an eye out today. Thank you for engaging positively on this front and helping move the conversation forward. |
hey, if I say I want to check a piece of data. |
Thanks to @dkkapur for raising the queries about overlap in ownership and operations of individual SP entities and also for the continuous communication with us. Fundamentally, the different understanding was caused by the difference in the composition of SP in the eastern and western ecosystems. We would also like to take this opportunity to introduce to the community how SP participation in Filecoin differs in the East.
About Project Beacon: During the project preparation, we saw the same kind of encrypted enterprise data go through the application process very quickly and smoothly. To be honest, we did not expect such a long waiting period. We have been actively coordinating with our client to respond to queries from the very beginning. Although we have the support of over ten notaries, we were and still are doing our best to answer questions from the notaries who have reached out. Thanks @jhookersyd for sharing your concerns. We have sent an NDA to him, but I believe he has had some family matters to take care of recently, so the introduction of data samples has not been carried out yet. If you are still interested, please contact me as soon as possible. There is also a notary who clearly disagrees and said he also wants to validate the data. Our door is always open and I don't know why he hasn't contacted us. But that is not something under our control, and it's certainly not necessary for us to satisfy everyone. As everyone knows, that is not in the LDN rules. If every project needs to be agreed upon by everyone and is subject to a voting system, Filecoin's growth is bound to be extremely slow. As we mentioned earlier, revealing their identity will put the investors at extreme risk. However, we gained their understanding after constant communication and already sent the list to Deep weeks ago. This is also because of our trust that Filecoin will protect the privacy of our investors as well as of every community member. In addition, we will update the peer ID under each LDN application once the project starts. Lastly, I hope our introduction will help you understand the different market compositions of Filecoin. If there's anything that you don't understand or want to know more about, whether about our project or the overall market in China, please feel free to contact me through slack or email [email protected]. Thanks! |
Starbucks makes localization changes as it enters each country. Hope filecoin will understand the geographical differences as well. Looking forward to seeing the first encrypted data application in China start soon! |
The policy of cryptocurrency varies from country to country. All of the above is the current situation of filecoin in China, different markets have different ways of participation. I hope governance team can take this chance to know the different ecosystem. |
@dkkapur I signed an NDA with @swatchliu regarding verifying the data several weeks ago. After seeing so many questions raised, all of which Eric has patiently answered, and seeing other private LDNs such as "USC Shoah" and "Victor Chang Cardiac Research" already approved, I suggest we fast-track this LDN and get the exposure. It will be a very good case study for future reference as well. It would be a shame to see this high-profile application lose its momentum and all the effort go to waste. |
Hi @galen-mcandrew @deep, it's been 4 months since our last meeting in May. How time flies. My baby was born last week, a very healthy baby girl. Many thanks to ByteBase for continuously and patiently coordinating with us to respond to the governance team's suggestions and community members' input. Special thanks to @swatchliu for keeping up active interaction with the community members and communicating thoroughly with storage providers during my maternity leave. We are also aware of the efforts that the governance team has been making for our project. We appreciate all your active involvement. We understand the debate over the definition of educational resources and recording methods because of different cultural backgrounds. But there's a point that I feel needs to be cleared up. As practitioners in the education industry, children are the most important to us. Meanwhile, it's normal for certified schools and teaching institutions in China to back up their courses to improve their teaching performance and for many other reasons. In addition, community members have shared many ideas that are outside my field of expertise and that I may not have understood in a short period of time. I am sorry if this may be seen as non-active participation. The reason I am here today is to find out whether there is a clear result yet. Whether or not our application complies with Filecoin rules, our group would like to have a clear and timely response, so that we don't take up too many resources or too much time unnecessarily. Last but not least, thanks again for everyone's interest in and support of our project! |
Hi folks - sharing some updates and proposed next steps. But first, @Beacon-Edu congratulations on your baby girl! Hope all is going smoothly at home. Thanks to @swatchliu and the ByteBase team for continuing to answer a lot of my questions to gain clarity into the situation as well. Here is a summary of my takeaways:
Here are my recommendations
Lots of learnings from this one that we can apply to future LDN exception applications! Looking at the Trust & Transparency WG and Notaries to take some of these learnings forward. |
I confirm my participation in this LDN. |
We are willing to do the due diligence too. @Beacon-Edu Please do reach out to us @llifezou or @derricktan23 so that we can do a final confirmation |
@dkkapur thanks for your good wishes. I hope you're doing well, too. Your summary is mostly correct, but Bytebase recently informed me that three investors have decided to pull out of our project. I will be closing applications #444, #474, and #481. This will still meet our storage needs, so we would like to move on with the other applications. I don't know much about peer ID but I will consult with @swatchliu on Monday to confirm and respond. Hi @Tom-OriginStorage, thanks for the confirmation. Our company has received support from 10 notaries, but we're still willing to go through due diligence with any other notaries interested in our project. I'm still on maternity leave, so please reach out to Eric-Bytebase on slack. @Kevin-PiKNiK @cryptowhizzard @jhookersyd I would also appreciate your confirmation. If any of you are still interested in our project, please contact Eric. He'll introduce data samples for you and we hope all your concerns will be resolved. Many thanks! |
I still support and track this LDN. |
I will support and track this LDN too |
We will follow up on this project. |
I still support this project and will continue to pay close attention on it. |
@dkkapur, thanks for the follow up. There are three points that we would like to amend in the summary above.
We would appreciate all your confirmations. Thanks everyone! |
Reconfirming our support on this project. |
We'd like to join and support this application. |
This is an innovative endeavor, and Matrixstorage wants to join and support this project. |
I think such a long application process is cruel to each client with real storage needs. I've done my best to support. |
Sorry for the late reply. Yes, @dkkapur, I confirm my support for this application. I also hope @Beacon-Edu and @swatchliu will run this project as a perfect sample on the Filecoin network. |
Great - we have this multisig already ready to go that has the above notaries for the most part on it! #576. @Beacon-Edu @swatchliu shall we proceed with next steps to get DataCap allocated from this multisig for the currently open applications? |
@UnionLabs2020 yes - I think this type of project fits very well into the improvements being proposed through the E-Fil+ WG. Please see latest here - #611 |
Thanks for the follow-up @dkkapur. I'm happy to see that our application is finally started. Many thanks to all the notaries for your engagement for a long time. Thanks all! |
We are a Chinese education group in the Fortune Global 500. We would like to propose a collaborative plan to bring a total of 60 PiB into the Filecoin network. This will represent at least 5 full replicas of an 11.8PiB data set. Further details of this project are below.
Project Description
We have an 11.8PiB project to prove the value proposition of decentralized data storage for our group. These data sets are the output of our online and offline educational materials from the past 10 years, including courses, documents, and classroom recording videos. Due to the privacy of students and teachers, our group expects the data for this project to be kept confidential, namely as encrypted data.
Our education group has been working with large data sets (PiB) for over 6 years with an interest in pursuing the Filecoin network as a solution to cost savings as well as some of our backup data issues. Starting with the 11.8PiB project makes sense for us, as it represents a small portion of our complete archive.
Data Set
The dataset contains courses from more than 100 schools, documentation managed by teams of teachers, and over 3,600 videos of lessons recordings. As previously mentioned, the data itself is not of use to anyone except our group due to the privacy of students and teachers, and the behavior of teachers in the classroom. Therefore, this data will be encrypted for our own consideration.
Website/Social Media
Due to the privacy concerns of many parties involved in our educational programs, our website will only be available to PL and the relevant notaries.
Transparency in KYC
Our group is one of the top education groups in China and we will be going through KYC process to validate our credentials, which includes face-to-face meetings with notaries, PL and our technical team to present and verify sample data.
We understand that encrypting data complicates Filecoin Plus to verify this project. However, we are committed to transparency as far as we can, such as previously mentioned KYC with notaries, PL and storage providers, as well as disclosing official email and data transfer details along with sample data. However, all parties involved in this project will be required to sign confidentiality commitment letters to protect the security of our data.
We would like to express our gratitude to the community for allowing the project to move forward.
In working closely with Protocol Labs and the Filecoin Foundation, we will follow these recommendations as a path forward to make this project a success for us and the network.
1. Submit a Proposal Issue in the Notary Governance repo for the entire project
2. Submit the 60PiB DataCap applications
3. At least four notaries have agreed to support these LDNs, while we are in constant contact with additional notaries
4. We have found five SPs so far, and we are constantly looking for and contacting more technically superior SPs in China
Data Storage Plan
5 full replicas, a total of 60 PiB of Datacap
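The requested total follows from the replica count. A minimal sketch using only the figures in this proposal (11.8 PiB data set, 5 replicas, 60 PiB requested); the variable names are illustrative.

```python
# Figures from the proposal text.
DATASET_PIB = 11.8
REPLICAS = 5
REQUESTED_PIB = 60

# DataCap needed to store every replica once, and what is left of the request.
needed_pib = DATASET_PIB * REPLICAS
headroom_pib = REQUESTED_PIB - needed_pib

print(f"needed: {needed_pib:.1f} PiB, headroom: {headroom_pib:.1f} PiB")
```

Five replicas come to about 59 PiB, so the 60 PiB request leaves roughly 1 PiB of headroom.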
Due to government policy, our encrypted data sets can only be sealed by SP in China. Our current primary SP partners are listed in the table below:
Notaries that Support the Project [person/org/Notary ID]
Updated with additional notaries from comments below.
Edit for clarification: Notaries that have expressed an interest in performing diligence on this project and client, and are willing to be placed on the notary msig address for approving allocations after performing ongoing diligence.