solid/research-topics

What is this repo?

A set of research topics that affect Solid in some way. It is intended as a source of inspiration and a starting point for discussion on these topics.

In an ideal case, each topic will take on a life of its own and have a repository dedicated to work on it.

Status

To get us started, this is a dump of topics in a single file; the goal is to make it more structured over time.

Topics

Data governance and Solid
There is a need to make Solid easier for industry and governments globally to adopt within their Enterprise Data Governance (EDG) and broader Enterprise Information Governance (EIG) strategies. The research challenge, for academia and industry alike, is to add automated EDG support to Solid. In academia, Harshvardhan J. Pandit at Dublin City University and Beatriz Esteves at SolidLab have expertise with ODRL and DPV, which may be key components of a semi-automated data governance layer for Solid. In the dataspace community, the International Data Spaces Association (IDSA) uses ODRL as a key component of its standard for establishing agreements to share datasets, and would thus be a valuable partner. EIG, a superset of EDG, must also be carefully considered. With infrastructure providers for Solid acting as Data Trusts, and an expectation that governments and enterprises will migrate to reading and writing personal data from Pods, there is a need to proactively support these organisations in managing the implications for their EIG strategies. In particular, we need to create a generic, Solid-compatible EIG strategy. Josh Cornejo recently presented some work on this topic to the Solid Practitioners channel at https://spectra.video/w/21t7aswEMXkUCjzHsB8CLa.

There is a lot of technical and non-technical research to be done around this. To get a flavour of such research, see this proposal.

Platforms on top of Solid
Many platforms can be built on top of Solid; they may be research projects in their own right, or be built as part of other projects.

  • Research and data analytics platforms on top of Solid, such as a platform that lets researchers define the parameters of a dataset they need (e.g. for population analysis), or that virtualises datasets for training ML algorithms. I would expect that some of the research team at the Open Data Institute would be interested in a platform that generates ML training sets annotated with Croissant metadata.
  • Consent collection/management platform similar to ConsentKit.

Privacy Enhancing Technologies & Solid

  • Using zero-knowledge proofs for automated data minimisation when sharing. Zero-knowledge proofs can enable selective disclosure and derived disclosure of signed data. Selective disclosure is sharing a subset of signed attributes; derived disclosure is proving that a property can be derived from a set of signed facts (e.g. I can prove to a bar that I am over 21 using a government-signed statement about my DOB, without disclosing my DOB to the bar). Both are used heavily in modern digital wallet solutions, usually as part of Verifiable Credentials; however, a lot of custom code currently needs to be written to perform and verify the derivation. It would be useful to Solid and a range of other efforts to be able to generate derived data + proof on demand when data is queried (e.g. deriving age + proof from DOB + signature when querying for foaf:age in a database).
  • End-to-End Encryption (E2EE): Concerns are often raised about the need to have a Solid Resource Server take responsibility for managing all of the work around, e.g., disk encryption for resource storage. Here, trust is required on two fronts: trusting that the resource server does not have poor (or malicious) security practices that result in data being leaked to other malicious actors, and trusting that the resource server does not tamper with the data served back to applications. E2EE is a method for resolving this whereby the sender encrypts data before transmitting it to servers, ensuring that only those actors with permission to read the data can do so. Important questions here are:
  • Current PDS approaches have limited support for ensuring privacy when computations combine data spread across users. Secure Multi-Party Computation (MPC) is a well-known subfield of cryptography that enables multiple autonomous parties to collaboratively compute a function over their inputs without revealing those inputs to one another. Whilst Oxford has already done some work investigating MPC with Solid, there is work to be done to identify the affordances / adjacent specs (for instance, for server <-> server exchange) that would allow users to query for the results of an MPC computation, and allow servers to execute that computation. There is a follow-up set of research on governance and policies, e.g. understanding how to express that you are willing for your salary to be included as part of an average for an economic study, but not for that salary to be directly disclosed to anyone.
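To make the salary example concrete, here is a minimal sketch of additive secret sharing, one of the simplest MPC building blocks. It assumes three hypothetical, non-colluding, honest-but-curious servers and simulates everything in a single process; `share`, `secure_sum` and `PRIME` are illustrative names, not part of any Solid spec or of the Oxford work mentioned above.

```python
import secrets

# All arithmetic is modulo a large prime, so any single share looks uniformly random.
PRIME = 2**61 - 1  # a Mersenne prime, far larger than any realistic salary sum


def share(value: int, n_parties: int) -> list[int]:
    """Split `value` into n additive shares that sum to `value` mod PRIME."""
    shares = [secrets.randbelow(PRIME) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % PRIME)
    return shares


def secure_sum(private_values: list[int], n_servers: int = 3) -> int:
    """Simulate n servers jointly summing values that none of them sees in the clear."""
    per_server = [0] * n_servers
    for v in private_values:
        # Each user sends share i to (hypothetical) server i.
        for i, s in enumerate(share(v, n_servers)):
            per_server[i] = (per_server[i] + s) % PRIME
    # Each server publishes only its aggregated share; combining them reveals the total only.
    return sum(per_server) % PRIME


salaries = [42_000, 55_000, 61_000]
average = secure_sum(salaries) / len(salaries)
print(average)
```

Only the total (and hence the average) is ever reconstructed; a single server's aggregated share reveals nothing about any individual salary, though colluding servers could pool their shares, which is why the non-collusion assumption matters.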

Web Agents

  • Building specifications for (Semantic) Web agents that operate over Pods
  • Resolution mechanisms for agents operating on different logical assumptions
  • Collaborate on emergent research on LLMs looking up external data as part of their evaluation process (this is not RAG, it is architecturally building data lookup into the LLM)

Data management

  • CRDTs (conflict-free replicated data types) support local-first and collaborative applications by providing mechanisms that resolve concurrent edits deterministically. A W3C community group has been established to work on the development of RDF-specific CRDT algorithms.
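As a rough illustration of what an RDF-specific CRDT might build on, here is a minimal sketch of a generic observed-remove set (OR-Set) over triples, where a concurrent add wins over a remove. This is a textbook CRDT applied to triples, not an algorithm from the W3C community group; `ORSetTripleStore` and the triple encoding as plain strings are assumptions for illustration.

```python
import uuid
from dataclasses import dataclass, field

Triple = tuple[str, str, str]


@dataclass
class ORSetTripleStore:
    """Observed-remove set of RDF triples: a concurrent add beats a remove."""
    adds: dict[Triple, set[str]] = field(default_factory=dict)     # triple -> unique add tags
    removes: dict[Triple, set[str]] = field(default_factory=dict)  # triple -> tombstoned tags

    def add(self, t: Triple) -> None:
        # Every add gets a fresh tag, so a later re-add survives an earlier remove.
        self.adds.setdefault(t, set()).add(uuid.uuid4().hex)

    def remove(self, t: Triple) -> None:
        # Only tombstone the add tags this replica has observed so far.
        self.removes.setdefault(t, set()).update(self.adds.get(t, set()))

    def triples(self) -> set[Triple]:
        return {t for t, tags in self.adds.items() if tags - self.removes.get(t, set())}

    def merge(self, other: "ORSetTripleStore") -> None:
        # Union of both tag maps: commutative, associative and idempotent.
        for t, tags in other.adds.items():
            self.adds.setdefault(t, set()).update(tags)
        for t, tags in other.removes.items():
            self.removes.setdefault(t, set()).update(tags)


name = ("<#me>", "foaf:name", '"Alice"')
a, b = ORSetTripleStore(), ORSetTripleStore()
a.add(name)
b.merge(a)      # replica b observes the triple
b.remove(name)  # b deletes it...
a.add(name)     # ...while a concurrently re-adds it
a.merge(b)
b.merge(a)
print(a.triples() == b.triples() == {name})
```

Both replicas converge on the same set regardless of merge order. A real RDF CRDT would also have to deal with blank nodes and richer operations than whole-triple add/remove.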

Private Data and LLMs

  • There is a range of technical and research work to be done around knowledge graphs, private data and LLMs. On the more technical front, there is work to do such as implementing the Model Context Protocol for Solid. Also note that Perplexity acquired Carbon to provide similar functionality for LLMs.

    On the more research side, there is work to do to determine, e.g., whether we should be adding vector-database-style access to Solid, and whether we can assign entities points in a vector space and perform similarity searches over them. What applications can be built on top of this?
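A minimal sketch of the kind of entity similarity search this might enable is below. It assumes each entity in a Pod has already been given an embedding vector (how those vectors are produced is exactly the open research question); the IRIs and the 3-dimensional vectors are made up, and `most_similar` is a hypothetical helper rather than a proposed Solid API.

```python
import math


def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def most_similar(query: str, embeddings: dict[str, list[float]], k: int = 2) -> list[str]:
    """Rank the other entities by cosine similarity to `query` and return the top k."""
    q = embeddings[query]
    others = [e for e in embeddings if e != query]
    return sorted(others, key=lambda e: cosine(q, embeddings[e]), reverse=True)[:k]


# Hypothetical entity IRIs with made-up embeddings.
embeddings = {
    "https://alice.example/pod/recipes#lasagne": [0.9, 0.1, 0.0],
    "https://alice.example/pod/recipes#carbonara": [0.8, 0.2, 0.1],
    "https://alice.example/pod/running#marathon": [0.0, 0.1, 0.9],
}
print(most_similar("https://alice.example/pod/recipes#lasagne", embeddings, k=1))
```

A brute-force scan like this is fine for a handful of entities; a vector-database-style interface would presumably add an approximate nearest-neighbour index and access control on top.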

Tooling to assist data management

  • A tool to warn me of potential outcomes (especially harmful outcomes) of changing data, or permissions on data. For instance, warn me that I might lose my visa to live in the UK if I change my address to an Australian address for more than 3 months, because the immigration office has access to that address.
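One way to frame such a tool is to combine who can read a resource with rules describing what a reader might do when the data changes. The sketch below assumes both are already available as simple Python structures; `Consequence`, `warnings_for_change`, the IRIs, and the rule itself are hypothetical, and where such rules would actually come from is the hard research part. It also ignores the "more than 3 months" condition for brevity.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Consequence:
    """Hypothetical rule: if `agent` can read `resource` and `triggers(old, new)` holds, warn."""
    agent: str
    resource: str
    triggers: Callable[[dict, dict], bool]
    message: str


def warnings_for_change(resource: str, old: dict, new: dict,
                        read_access: dict[str, set[str]],
                        rules: list[Consequence]) -> list[str]:
    """Return warnings for a proposed change, considering only agents with read access."""
    readers = read_access.get(resource, set())
    return [
        r.message
        for r in rules
        if r.resource == resource and r.agent in readers and r.triggers(old, new)
    ]


ADDRESS = "https://alice.example/profile/address"
OFFICE = "https://immigration.example/#office"
read_access = {ADDRESS: {OFFICE}}
rules = [Consequence(
    agent=OFFICE,
    resource=ADDRESS,
    triggers=lambda old, new: old.get("country") == "GB" and new.get("country") != "GB",
    message="The immigration office can see this address; moving abroad may affect your UK visa.",
)]
print(warnings_for_change(ADDRESS, {"country": "GB"}, {"country": "AU"}, read_access, rules))
```

The same check can be run for permission changes: revoking the office's read access removes it from `readers`, so the rule no longer fires.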

Developer Experience

  • There is extensive work on tooling for RDF, with varying degrees of abstraction and interaction patterns. Just a few such abstractions can be found at https://rdfjs.dev/; yet we still aren’t at the point where we can say “Hey, so you’re a front-end developer and you want to build a Solid application in Domain X; here is how you can get up and running in only 2 hours, without having to worry about data modelling or management, as that’s all handled”. We need to get to that point for Solid in order to get a critical mass of developers building Solid applications.

Design Patterns

  • Part of the reason that Solid lacks good educational materials for many topics is the lack of generally accepted design patterns for solving many challenges. These include:
    • Standards for provenance, and provenance validation. There are many scenarios in which services and applications need to establish a basis on which they can trust data, such as having the data signed by a trusted organisation. Currently there are no well-accepted standards for how to sign and verify such data in Solid.
    • Best practices for data access requests. Access requests have been implemented in multiple ways across different projects and servers. There is a need to critically evaluate the different design patterns and propose a single flow for this use case.

Social Science Research

  • What do people want the future of their technologies and online experiences to look like?
  • Downstream effects of amendments to data: if someone were to update their address in Solid, and the DVLA had access, what would the impact of that be? (Note that this complements the development of the tooling to assist data management.)

Understanding the context in which we are building Solid

  • How does Solid relate to the concept of a data mesh? Is there feature parity? What are the gaps, if any?

What data can I trust?

  • As with the Semantic Web, “anyone can say anything about anything”; so, just as on the Web, you cannot take every piece of information in everyone’s Pod to be true. How can applications use provenance (computational trust) plus assumptions / guarantees about which entities are trustworthy to establish what data to “believe” for a given application? To put it another way: can I execute a SPARQL query that evaluates only over data deemed “trustworthy”, based on the provenance presented and the trust assumptions given to the query engine?
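A very small sketch of the idea: tag each triple with the agent that asserted it, filter by the trust assumptions passed to the "query engine", and only then evaluate the query. Here a toy triple-pattern matcher stands in for a SPARQL engine, and provenance is taken at face value rather than verified from signatures; `trusted_view`, `match` and the IRIs are hypothetical.

```python
from typing import Optional

Triple = tuple[str, str, str]
Quad = tuple[str, str, str, str]  # subject, predicate, object, agent who asserted it


def trusted_view(quads: list[Quad], trusted_agents: set[str]) -> set[Triple]:
    """Keep only triples asserted by agents the application has chosen to trust."""
    return {(s, p, o) for s, p, o, who in quads if who in trusted_agents}


def match(triples: set[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> set[Triple]:
    """A tiny triple-pattern matcher; None plays the role of a SPARQL variable."""
    return {
        t for t in triples
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o)
    }


quads = [
    ("ex:bob", "ex:birthDate", '"1990-01-01"', "https://gov.example/#registry"),
    ("ex:bob", "ex:birthDate", '"2005-06-30"', "https://random.example/#me"),
]
view = trusted_view(quads, {"https://gov.example/#registry"})
print(match(view, s="ex:bob", p="ex:birthDate"))
```

Changing the `trusted_agents` argument changes which answer the query "believes", which is the behaviour the question above asks for; the open research is in making provenance verifiable and trust assumptions expressive.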

Ontology Creation

  • Automated ontology creation with LLMs: there is ongoing research to “map the mind” of large language models. With this, there is an opportunity to automatically create ontologies for new or niche domains by generating an approximate logical representation of the conceptual worldview encoded in LLMs.

Data Generation

  • Automated data creation with LLMs: many companies are now having commercial success in generating knowledge graphs from unstructured or semi-structured documents such as PDFs. There is a range of research projects building systems that convert existing data sources into structured RDF for use with applications.

Related to / extending existing work in academia

  • Working on the development of Malleable Software on top of Solid. Malleable Software was the topic of Geoffrey Litt’s PhD at CSAIL. I’m not sure if anyone is currently working on this topic in CSAIL; it looks like Prof. Daniel Jackson might be, with his Wildcard project, and Prof. David Karger is interested in such topics with Mavo. Integrating Solid with Mavo has long been listed as a research item on the Solid Ecosystem roadmap.
  • Collaboration with TrustNet to capture the accuracy of data in Pods (e.g. in my Pod I make statements about the accuracy of other people's data in their Pods). This is a core challenge that needs to be solved, and it can be applied to content accuracy for decentralised social media on top of Solid, filtering out poor data when doing federated queries to collect research data, etc.
  • Adding Solid storage as a feature to the MIT App Inventor Project.
  • Collaboration with the Data Garbling project in order to work on the E2EE work items described above, in particular on topics such as using fully homomorphic encryption (FHE) to enable SPARQL queries over encrypted data. There may also be related interest from those working on Splintr, who focus on ensuring that those querying data don’t accidentally disclose information through the content of their queries.
  • Adding features of Squadbox and, more generally, implementing completely decentralised moderation for media and social media applications in Solid, by allowing users to make statements about how they rate / moderate / endorse / dispute content that other users have put in their Pods, and using this to feed my algorithmic view of the world based on whom I do and don’t want to listen to as moderators.
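The decentralised moderation idea can be sketched as a feed filter: every rating is a statement some user published in their own Pod, and my view only counts ratings from the moderators I have chosen, weighted by how much I trust each. The tuple encoding of ratings, the weights, the threshold and `my_feed` are all hypothetical; a real design would express ratings in RDF with an agreed vocabulary.

```python
Rating = tuple[str, str, int]  # (moderator WebID, content IRI, +1 endorse / -1 dispute)


def my_feed(items: list[str], ratings: list[Rating],
            my_moderators: dict[str, float], threshold: float = 0.0) -> list[str]:
    """Keep items whose weighted score from *my* chosen moderators is >= threshold."""
    score = {item: 0.0 for item in items}
    for moderator, item, value in ratings:
        # Ratings from moderators I have not chosen are ignored entirely.
        if moderator in my_moderators and item in score:
            score[item] += my_moderators[moderator] * value
    return [item for item in items if score[item] >= threshold]


POST1 = "https://bob.example/pod/posts/1"
POST2 = "https://bob.example/pod/posts/2"
MOD_A = "https://moda.example/profile#me"
MOD_B = "https://modb.example/profile#me"
ratings = [(MOD_A, POST1, +1), (MOD_A, POST2, -1), (MOD_B, POST2, +1)]

print(my_feed([POST1, POST2], ratings, {MOD_A: 1.0}))
```

Two users reading the same Pods can see different feeds simply by choosing different moderators, which is the point: no central party decides what is hidden.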
