-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Jupyter Notebooks to illustrate the inference-contract family. #31
Jupyter Notebooks to illustrate the inference-contract family. #31
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
After a few hurdles, i managed to build, run & execute all notebooks. A few comments though ..
@@ -150,6 +150,8 @@ try ${PDO_HOME}/contracts/inference/scripts/ss_start.sh -c -o ${PDO_HOME}/logs - | |||
--config-dir ${PDO_HOME}/etc/contracts \ | |||
--identity guardian_sservice | |||
|
|||
sleep 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i guess this sleep is to make sure that storage service is running before guardian is started? How robust is that 3 seconds and in case for whatever reason the sleep happened to be too short, how easy would it be to diagnose the issue? Couldn't we do some similar synchronization we do as when we wait for CCF or alike?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
in ccf we wait for a file to be created. in pdo we curl a url found in the configuration file. i don't think either approach would work here. there is no guarantee that the db file (the obvious example) was removed between runs and the configuration file has several urls in it that make the grep approach "brittle" at best. maybe a better way for this service AND for the ones in pdo is to explicitly write a URL into a file (like what we do with the pid) and then use that to test for a service listening at the interface.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@g2flyer curious about your thoughts on hyperledger-labs/private-data-objects#476. This would be a clean way to address your concerns as well.
--config-dir ${PDO_HOME}/etc/contracts \ | ||
--identity guardian_sservice | ||
|
||
sleep 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as for sleep in script_test.sh
# token_metadata : additional information about the token | ||
# count : the number of tokens to mint for the asset | ||
|
||
# Note that the notebook assumes that there is a key file for the identity of the form: `${keys}/${identity}_private.pem`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe mention that User1 to User10 exist but more could be created via /documents/key_manager.ipynb
notebook if
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There is an improvement to this coming (see key-manager.py in #32 ). I don't think we should rely on the user keys... they should be removed from the PDO client install by default (fine for tests, but not for a real client).
# Create a token notebook for each of the minted tokens. | ||
|
||
# %% | ||
# %%skip True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I know these (undocumented!) skips are inherited but i still found them quite annoying ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
actually it is documented. exchange/pdo/contracts/jupyter.py (this is CODE so the question is whether code documents it). and also in docs/notebooks/README.md
# ``` | ||
# | ||
|
||
# %% tags=["parameters"] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment as with token-issuer, duplication of (slightly re-ordered) fields in two code blocks, one of them auto-generated is highly confusing. Seems to me relying just on the auto-generated one should be completely sufficient ...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can't do it. the parameters field is necessary for papermill to inject the parameters into the file.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've verified that the parameters block must have a superset of the variables that are inherited.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Several things I would like to see addressed.
# %% tags=["parameters"] | ||
identity = input('Identity of the token issuer: ') | ||
token_class = input('Name of the class of tokens to issue: ') | ||
service_host = input('Service host [localhost]: ') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
suggest you pick up the actual defaults... service_host = input('...') or 'localhost'
token_metadata = 'created by {}'.format(token_owner) | ||
count = 1 | ||
service_host = 'localhost' | ||
instance_identifier = '' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there is no description for
instance_identifyer
; More interestingly, though, the "magic" second "parameter" code block (see below) actually redefines this variable anyway? So drop it here?
this is a property of the papermill templating modules. these are "placeholders".
# service host by running the following: | ||
|
||
# %% | ||
# %%skip True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why is skip default True here? Seems to contract text from above? In any case above we should say when to skip or not and that one correspondingly has to change the code block? Given that we have creation conditional on existence of toml file and so re-eval block is safe, why even bother with %%skip at alll?
this only need to be run once per "session". note it will be changing completely soon.
|
||
context = pc_jupyter.ml_jupyter.initialize_token_context(state, bindings, context_file, token_path, **context_bindings) | ||
pc_jupyter.pbuilder.Context.SaveContextFile(state, context_file, prefix=token_path) | ||
print('context initialized') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this particular message is not an error and the context comes back successfully. that being said... we are trusting to exceptions (the python way) to be thrown when something break. more or less.
# ### Initialize the PDO Environment | ||
# | ||
# Initialize the PDO environment. This assumes that a functional PDO configuration | ||
# is in place and that the PDO virtual environment has been activated. In particular, | ||
# ensure that the groups file and eservice database have been configured correctly. | ||
This can be done most easily by running the following in a shell: | ||
# | ||
# `$PDO_HOME/bin/pdo-create-service-groups.psh --service_host <service_host>` | ||
# |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
state, token_context, image=image_name, search_path=[path]) | ||
|
||
# %% [markdown] | ||
# ### New User Performs Operations on the Asset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is pretty much a TEST not part of the factory or template. This file is supposed to be a (primitive) interface to the token, not a test file.
--config-dir ${PDO_HOME}/etc/contracts \ | ||
--identity guardian_sservice | ||
|
||
sleep 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
what is the purpose of this file?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this file is necessary only if we do not use docker for the guardian (to be started before starting the jupyter server). Question is do we even need to support such a mode ? My suggestion is that we always expect the guardian to be deployed via the docker container. Please comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, have removed this file completely. As part of documentation, have given commands using ss_start and gs_start as to how to deploy guardian directly, if needed for bare-metal.
…t family use-case. The Notebooks' structure, installation, execution patterns are same as those used by the Noteboks provided for the Exchange contract family. Please note that the inference guardian service must be started prior to executing the notebooks. Please see inference-contract/README.md for instructions. Some of the additional features offered by this PR taking advantage of recent commits to the contracts repo: 1. It is possible to use the inference contract jupyter notebook via docker (using features from PR 23) 2. When using docker, the inference guardian will run in a separate container (using features from PR 27) 3. The notebooks are constructed using jupytext python modules (using features from PR 29) Signed-off-by: Prakash Narayana Moorthy <[email protected]>
3e1fcc3
to
676abb5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes seem to address the easy addressable concerns and others seem currently general limitations (and matter of taste, e.g., on the skip ;-)
Couldn't test it though as my ICX server has some boot problems (while trying to get DCAP working on it) and the other test machines i have cannot run the code: After a bit of annoying debugging, turns out that the tensorflow libraries we use do not work if you don't have AVX (as for these gemini lake) and the error message is not visible in docker lock. Two suggestions:
- make before running anything involving tensorflow we might to do something like
$(lscpu | grep -q avx) || die "AVX not avilable"
- more generally for docker, might be useful to redirect error messages not only in a file but also to /dev/tty? That said, i found now also a found a reasonably convenient way to investigate failures after docker-compose stopped (you will though have to run
docker-compose
without--abort-on-container-exit
as currently used in Makefile!): https://stackoverflow.com/questions/65953634/how-do-i-inspect-the-stopped-docker-container-files
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A couple cosmetic changes in the additional commits. I need to test this with the changes to the papermill parameters.
service_host = 'localhost' | ||
instance_identifier = '' | ||
# rest of the parameters such as token_owner, token_class, service_host, and instance_identifier will get auto filled from factory via papermill | ||
# note that papermill places the parameters cell after first paramters cell in this notebook | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
have you verified that this works? papermill generally requires that the template contain complete list of parameters. it isn't pretty, but if you're setting through the templating mechanism it has to be there.
Verified that papermill generates error/warning messages for missing variables. Just add the parameters you expect with reasonable defaults. You aren't testing for missing variables anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, tested, paperill does not need repeating all parameters. per papermill documentation, papermill populates the instance notebook with parameters after the first cell in the target notebook that has parameters. (that's why we currently see new parameters appearing after the default set). there is no need for the default set to have anything that papaermill will add. Yes, agreed, warnings will be generated while running the factory notebook.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it actually doesn't work. an error is generated. lets just add the variables that are necessary. to be clear... you can think about the papermill parameters as being the parameters specified to a script. its the "argparse" for notebooks. while @g2flyer doesn't like the duplication, it is a standard way of extending notebooks. your parameter block in the template should contain EVERY variable that you currently (or may reasonably in the future) specify when creating the contracts. for example, you don't currently have the ability to specify the metadata. in the future, that should absolutely be a parameter that could be specified.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, added back all parameters that are being parametrized by papermill.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See message about missing papermill parameters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly have nitpicky formatting comments, though there are a couple of places that adding links/references to other documentation would be really helpful.
@@ -51,9 +51,10 @@ The Digital Assets contract family implements a basic digital asset for bitmap i | |||
### Inference | |||
|
|||
|
|||
The Inference contract family provides contracts for creating a confidentiality preserving policy-wrapper around the usage of a machine learning (ML) model. At its core, the implementation uses a token contract to specify and enforce policies to be followed while using the ML model for inferecing operations. | |||
The Inference contract family provides contracts for creating a confidentiality | |||
preserving policy-wrapper around the usage of a machine learning (ML) model. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is a policy-wrapper?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you, have clarified about both policy-wrapper and token object : "The policy-wrapper specifies and checks the
policies to be followed while using the ML model for inferencing
operations, and is implemented as a PDO contract that we refer to as the Token Object."
docs/notebooks/index.md
Outdated
The Inference contract family provides contracts for creating a confidentiality preserving policy-wrapper around the usage of a machine learning (ML) model. At its core, the implementation uses a token contract to specify and enforce policies to be followed while using the ML model for inferecing operations. | ||
The Inference contract family provides contracts for creating a confidentiality | ||
preserving policy-wrapper around the usage of a machine learning (ML) model. | ||
At its core, the implementation uses a token contract to specify and enforce |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We shouldn't assume the reader knows what a "token contract" is. It would be helpful to add a link to "token contract" to documentation where the reader can find more information on the underlying implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, please see comment above for fix.
docs/notebooks/index.md
Outdated
The Inference contract family provides contracts for creating a confidentiality | ||
preserving policy-wrapper around the usage of a machine learning (ML) model. | ||
At its core, the implementation uses a token contract to specify and enforce | ||
policies to be followed while using the ML model for inferecing operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
policies to be followed while using the ML model for inferecing operations. | |
policies to be followed while using the ML model for inferencing operations. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done.
inference-contract/README.md
Outdated
even though the Jupyter server is deployed on bare-metal. Please consult the | ||
`https://github.com/hyperledger-labs/private-data-objects/tree/main` repo for deploypment | ||
options for pdo ledger and services. Regarding deployment of the model guardian, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a bit cleaner.
even though the Jupyter server is deployed on bare-metal. Please consult the | |
`https://github.com/hyperledger-labs/private-data-objects/tree/main` repo for deploypment | |
options for pdo ledger and services. Regarding deployment of the model guardian, | |
even though the Jupyter server is deployed on bare-metal. Please consult the [PDO repo](https://github.com/hyperledger-labs/private-data-objects/tree/main) for deployment | |
options for pdo ledger and services. Regarding deployment of the model guardian, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
inference-contract/README.md
Outdated
We provide [Jupyter Notebooks](./docs/notebooks/README.md) that can be executed | ||
in an interactive manner to illustrate the functionality of the inference | ||
contract family. We support Jupyter notebook deployment via both docker containers | ||
as well as bare-metal. For docker, please invoke the `make test_jupyter` command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Para break.
as well as bare-metal. For docker, please invoke the `make test_jupyter` command | |
as well as bare-metal. | |
For docker, please invoke the `make test_jupyter` command |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
Object and the Asset Guardian Service. | ||
|
||
The asset guardian service here is implemented as a web service | ||
containing a frontend and backend portions.The backend of the guardian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
containing a frontend and backend portions.The backend of the guardian | |
containing a frontend and backend portions. The backend of the guardian |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
inference-contract/README.md
Outdated
contract family. We support Jupyter notebook deployment via both docker containers | ||
as well as bare-metal. For docker, please invoke the `make test_jupyter` command | ||
from within the docker folder contained within this repo. This command deploys the | ||
pdo ledger, pdo services, guardian frontend as well as the openvino containers in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please ensure consistent capitalization here an in other docs.
pdo ledger, pdo services, guardian frontend as well as the openvino containers in | |
pdo ledger, pdo services, guardian frontend as well as the openVINO containers in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
functions: 1. Processes the capability package generated by the token | ||
object and generates/invokes the API for the backend. The capability | ||
package may be thought of the asset-usage approval generated by the | ||
token object smart contract. 2. Implements any input preprocessing and | ||
output postprocessing (collectively referred to as model scoring) as | ||
determined necessary by the original asset owner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will ensure these are rendered as numbered bullets.
functions: 1. Processes the capability package generated by the token | |
object and generates/invokes the API for the backend. The capability | |
package may be thought of the asset-usage approval generated by the | |
token object smart contract. 2. Implements any input preprocessing and | |
output postprocessing (collectively referred to as model scoring) as | |
determined necessary by the original asset owner. | |
functions: | |
1. Processes the capability package generated by the token | |
object and generates/invokes the API for the backend. The capability | |
package may be thought of the asset-usage approval generated by the | |
token object smart contract. | |
2. Implements any input preprocessing and | |
output postprocessing (collectively referred to as model scoring) as | |
determined necessary by the original asset owner. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done
script, and thus ensuring tight binding between the token object | ||
contract and the model usage. | ||
|
||
We note that while the Token Object smart contract is executed within |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As above, it may be good to add a link to "token object smart contract" here for reference.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, hope the clarification of token object addresses this.;
|
||
We note that while the Token Object smart contract is executed within | ||
Intel SGX enclaves (via PDO), the asset guardian service in this | ||
impelementation is not protected by TEEs. A prospective asset-user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
impelementation is not protected by TEEs. A prospective asset-user | |
implementation is not protected by TEEs. A prospective asset-user |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks, done.
Signed-off-by: Prakash Narayana Moorthy <[email protected]>
d72723f
to
5e367ba
Compare
The Notebooks' structure, installation, execution patterns are same as those used by the Noteboks provided for the Exchange contract family. Please note that the inference guardian service must be started prior to executing the notebooks. Please see inference-contract/README.md for instructions.
Some of the additional features offered by this PR taking advantage of recent commits to the contracts repo: