getting error while training, .solverstate #1347

Open
6 of 11 tasks
mostafa8026 opened this issue Sep 26, 2021 · 23 comments

Comments

@mostafa8026


Configuration

  • Version of DeepDetect:
    • Locally compiled on:
      • Ubuntu 18.04 LTS
      • Other:
    • Docker CPU
    • Docker GPU
    • Amazon AMI
  • Commit (shown by the server when starting):

Your question / the problem you're facing:

When I try to train a simple classification model (the dog_cat example), I get the following error:

resuming a model requires a .solverstate file in model repository

Error message (if any) / steps to reproduce the problem:

  • list of API calls:
    as shown in the sample.

  • Server log output:

4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.201] [dogs_cats] [info] selected solver: SGD
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.201] [dogs_cats] [info] solver flavor : rectified
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.201] [dogs_cats] [info] detected network type is classification
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.203] [dogs_cats] [error] resuming a model requires a .solverstate file in model repository
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.219] [dogs_cats] [error] training status call failed: Dynamic exception type: dd::MLLibBadParamException
4c01f52c6171_cpu_deepdetect_1 | std::exception::what: resuming a model requires a .solverstate file in model repository
4c01f52c6171_cpu_deepdetect_1 |
4c01f52c6171_cpu_deepdetect_1 | [2021-09-26 13:15:41.220] [api] [error] {"code":400,"msg":"BadRequest","dd_code":1006,"dd_msg":"Service Bad Request Error: resuming a model requires a .solverstate file in model repository"}
@beniz
Collaborator

beniz commented Sep 26, 2021

Hi, make sure resume is set to False
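A minimal sketch of a training request body with resume disabled, assuming the Caffe backend, where `resume` is an `mllib` parameter of the /train call. The solver values, output measure and data path are placeholders rather than values from this thread, and `make_train_body` is a hypothetical helper; the resulting dict would be sent as the JSON body of POST /train, as with the curl calls in this thread.

```python
import json


def make_train_body(service, data_paths, resume=False):
    """Build a /train request body with mllib.resume set explicitly.

    Assumption: Caffe backend, where `resume` lives under parameters.mllib.
    The solver settings are placeholders, not recommended values.
    """
    return {
        "service": service,
        "async": True,
        "parameters": {
            "input": {"width": 224, "height": 224},
            "mllib": {
                # False: start a fresh training run, no .solverstate file needed
                "resume": resume,
                "solver": {"iterations": 1000, "base_lr": 0.001},  # placeholders
            },
            "output": {"measure": ["acc"]},
        },
        "data": data_paths,
    }


# Hypothetical data path, for illustration only
body = make_train_body("dogs_cats", ["/opt/platform/data/dogs_cats/"])
print(json.dumps(body["parameters"]["mllib"]))
```

Setting `resume` to true only makes sense once a previous run has left a .solverstate file in the model repository.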

@mostafa8026
Author

Thanks, I didn't know that. Maybe it would be a good idea to mention it in the docs.

@mostafa8026
Author

mostafa8026 commented Sep 27, 2021

@beniz The previous error is fixed, but when I try to use the trained service I get the following error:

request:

curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "dogs_cats",
 "parameters": {
  "input": {},
  "output": {
   "confidence_threshold": 0.3,
   "bbox": true
  },
  "mllib": {
   "gpu": true
  }
 },
 "data": [
  "/opt/platform/data/10021/01.jpg"
 ]
}'

response:

{
 "status": {
  "code": 400,
  "msg": "BadRequest",
  "dd_code": 1006,
  "dd_msg": "Service Bad Request Error: no deploy file in /opt/platform/models/private/dogs_cats for initializing the net"
 }
}

How can I create a deploy file?

@mostafa8026 mostafa8026 reopened this Sep 27, 2021
@fantes
Contributor

fantes commented Sep 27, 2021

Hi
In order to do inference (predict), you need both the network definition (for Caffe inference, the deploy file) and the network weights in the repository.
The deploy file is created by DeepDetect when you use a template definition and train your own net, or it comes with a pre-trained network definition.
What are you trying to do exactly?
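A small sketch for checking a model repository before calling /predict. It assumes the Caffe network definition is a file named deploy.prototxt and that the weights are any *.caffemodel file in the repository root; `missing_for_caffe_predict` is a hypothetical helper, not part of DeepDetect.

```python
from pathlib import Path


def missing_for_caffe_predict(repository):
    """Return what a Caffe model repository seems to lack for /predict.

    Assumptions: the network definition is a file named deploy.prototxt,
    and the weights are any *.caffemodel file in the repository root.
    """
    repo = Path(repository)
    missing = []
    if not (repo / "deploy.prototxt").is_file():
        missing.append("deploy.prototxt (network definition)")
    if not any(repo.glob("*.caffemodel")):
        missing.append("*.caffemodel (network weights)")
    return missing
```

For example, `missing_for_caffe_predict("/opt/platform/models/private/dogs_cats")`, run where that path is visible, would list what the repository from the error above is missing. A nonexistent path simply reports both items as missing.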

@mostafa8026
Author

@fantes The last problem has been resolved, but now I have another one: the main API engine gets killed whenever a large number of pictures is sent for indexing. I'm implementing similarity search following this link

request:

curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
  "service": "simsearch",
  "parameters": {
    "input": { "height": 224, "width": 224 },
    "output": {
      "index": true,
      "index_gpu": true,
      "index_gpuid": 0,
      "index_type": "IVF20,SQ8",
      "train_samples": 500,
      "ondisk": true,
      "nprobe": 10
    },
    "mllib": { "extract_layer": "pool5/7x7_s1" }
  },
  "data": ["/opt/platform/data/large-number-of-files/"]
}'

response

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.21.1</center>
</body>
</html>

logs

4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.098] [torchlib] [info] Ignoring source layer conv4_3_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.099] [torchlib] [info] Ignoring source layer conv4_4_1x1_increase/bn_conv4_4_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.099] [torchlib] [info] Ignoring source layer conv4_4_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.100] [torchlib] [info] Ignoring source layer conv4_5_1x1_increase/bn_conv4_5_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.100] [torchlib] [info] Ignoring source layer conv4_5_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.101] [torchlib] [info] Ignoring source layer conv4_6_1x1_increase/bn_conv4_6_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.101] [torchlib] [info] Ignoring source layer conv4_6_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.103] [torchlib] [info] Ignoring source layer conv5_1_1x1_increase/bn_conv5_1_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.104] [torchlib] [info] Ignoring source layer conv5_1_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.109] [torchlib] [info] Ignoring source layer conv5_2_1x1_increase/bn_conv5_2_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.109] [torchlib] [info] Ignoring source layer conv5_2_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.112] [torchlib] [info] Ignoring source layer conv5_3_1x1_increase/bn_conv5_3_1x1_increase/bn_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.112] [torchlib] [info] Ignoring source layer conv5_3_prob_reshape
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.114] [torchlib] [info] Ignoring source layer classifier_classifier_0_split
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.197] [simsearch] [info] Net total flops=3860541312 / total params=28070976
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.197] [simsearch] [info] detected network type is classification
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:09.208] [simsearch] [info] imginputfileconn: list subdirs size=0
4c01f52c6171_cpu_deepdetect_1 | tcmalloc: large alloc 1355153408 bytes == 0x5609fb00e000 @
4c01f52c6171_cpu_deepdetect_1 | [2021-09-27 18:21:12.194] [api] [info] HTTP/1.1 "GET /info" <n/a> 200 0ms
platform_ui_1    | 172.29.229.1 - - [27/Sep/2021:18:21:12 +0000] "GET /api/deepdetect/info? HTTP/1.1" 200 433 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
4c01f52c6171_cpu_deepdetect_1 | tcmalloc: large alloc 1355153408 bytes == 0x560a4bc6e000 @
4c01f52c6171_cpu_deepdetect_1 | Killed

@beniz
Collaborator

beniz commented Sep 27, 2021

Hi, you need to iterate your image list and send batches. Your machine does not have enough RAM to hold them all at once.
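A minimal batching helper, as one way to follow this advice; `batched` is a hypothetical name, and the batch size is something to tune to the available RAM. Each yielded list would be passed as the `data` list of one /predict call.

```python
from itertools import islice


def batched(items, size):
    """Yield successive lists of at most `size` items from any iterable."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch
```

On Python 3.12 and later, `itertools.batched` does the same thing but yields tuples instead of lists.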

@mostafa8026
Author

Thanks, I can now index and build them, but the search results don't show the pictures, because they are addressed through /opt/platform/data. How can I fix this? What are the correct picture paths?
take a look

this image

@mostafa8026
Author


They should be addressed from /data/, but they are addressed from /opt/platform/data.


@beniz
Collaborator

beniz commented Sep 28, 2021

/opt/platform/data should automatically link to your $DD_PLATFORM/data directory on the host.

In your case, maybe make sure to link your /data directory to $DD_PLATFORM/data.
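A sketch of the path translation involved, assuming the host directory $DD_PLATFORM/data is mounted at /opt/platform/data inside the DeepDetect container, as described above. `to_server_path` and the example host path are hypothetical; paths outside the mounted root raise an error instead of being silently rewritten.

```python
from pathlib import PurePosixPath


def to_server_path(host_path, host_root, server_root="/opt/platform/data"):
    """Rewrite a host image path to the path the DeepDetect container sees.

    Assumption: host_root ($DD_PLATFORM/data on the host) is mounted at
    server_root inside the container. Raises ValueError if host_path is
    not under host_root.
    """
    relative = PurePosixPath(host_path).relative_to(host_root)
    return str(PurePosixPath(server_root) / relative)
```

For example, with a hypothetical host root, `to_server_path("/home/me/dd/data/0dir/a.jpg", "/home/me/dd/data")` gives `/opt/platform/data/0dir/a.jpg`.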

@mostafa8026
Author

Another question:
how can I persist services? Whenever the server gets killed, all the services I added (the ones I use for predict) vanish.

@mostafa8026
Author

After restarting the DD engine, I get this error when adding a new service:

[error] service creation mllib bad param: using template while model prototxt and network weights exist, remove 'template' from 'mllib' or remove prototxt files instead

I create the service with the following command:


curl -X PUT 'http://172.29.229.69:1913/api/deepdetect/services/simsearch' -d '{
  "mllib": "caffe",
  "description": "similarity search service",
  "type": "unsupervised",
  "parameters": {
    "input": {
      "connector": "image",
      "height": 224,
      "width": 224
    },
    "output": {
      "store_config": true
    },
    "mllib": {
      "nclasses": 1000,
      "template": "se_resnet_50"
    }
  },
  "model": {
    "repository": "/opt/platform/models/private/simsearch/",
    "templates": "/opt/deepdetect/build/templates/caffe/"
  }
}'

@mostafa8026
Author

I have to delete these two files every time. Is there any way to avoid that?

$ rm models/private/simsearch/se_resnet_50.prototxt
$ rm models/private/simsearch/se_resnet_50_solver.prototxt

@beniz
Collaborator

beniz commented Sep 28, 2021

after restarting the dd engine, I get this error after adding new service:

[error] service creation mllib bad param: using template while model prototxt and network weights exist, remove 'template' from 'mllib' or remove prototxt files instead

The error says it all, remove the template argument from your API call, since the model already exists. This error prevents sending calls with neural net templates that do not match the existing model.
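One way to avoid deleting the prototxt files by hand is to decide on the client side whether to send `template`. This is a sketch under the assumption that a repository holding both *.prototxt and *.caffemodel files is an existing model (which is what the error message checks for); the other values are copied from the service creation call in this thread, and `make_service_body` is a hypothetical helper.

```python
from pathlib import Path


def make_service_body(repository, template="se_resnet_50"):
    """Build a PUT /services body, adding mllib.template only for a fresh repository.

    Assumption: a repository that already holds *.prototxt files and
    *.caffemodel weights is an existing model, so 'template' is omitted.
    """
    repo = Path(repository)
    has_model = any(repo.glob("*.prototxt")) and any(repo.glob("*.caffemodel"))
    mllib = {"nclasses": 1000}
    if not has_model:
        mllib["template"] = template  # first creation: generate the net from the template
    return {
        "mllib": "caffe",
        "description": "similarity search service",
        "type": "unsupervised",
        "parameters": {
            "input": {"connector": "image", "width": 224, "height": 224},
            "mllib": mllib,
            "output": {"store_config": True},
        },
        "model": {
            "templates": "../templates/caffe/",
            "repository": repository,
            "create_repository": True,
        },
    }
```

With the workflow in this thread, the first call (only SE-ResNet-50.caffemodel copied in) includes the template, and later calls after a restart omit it.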

@mostafa8026
Author

I get a new error:

request:

curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {},
  "output": {
   "confidence_threshold": 0.3,
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "gpu": true,
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'
deepdetect_1     | [2021-09-28 12:50:11.236] [simsearch] [error] unsupervised output needs mllib.extract_layer param
deepdetect_1     | [2021-09-28 12:50:11.237] [api] [info] HTTP/1.1 "POST /predict" simsearch 200 1042ms
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:11 +0000] "POST /api/deepdetect/predict HTTP/1.1" 200 125 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | [2021-09-28 12:50:15.377] [api] [info] HTTP/1.1 "GET /info" <n/a> 200 0ms
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:15 +0000] "GET /api/deepdetect/info? HTTP/1.1" 200 433 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | [2021-09-28 12:50:17.751] [api] [info] HTTP/1.1 "POST /predict" simsearch 200 664ms
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:17 +0000] "POST /api/deepdetect/predict HTTP/1.1" 200 11545 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | open existing index db
deepdetect_1     | [2021-09-28 12:50:19.829] [torchlib] [info] Opened lmdb /opt/platform/models/private/simsearch//names.bin
platform_ui_1    | 2021/09/28 12:50:19 [error] 25#25: *1449 upstream prematurely closed connection while reading response header from upstream, client: 172.29.229.1, server: , request: "POST /api/deepdetect/predict HTTP/1.1", upstream: "http://172.23.0.3:8080/predict", host: "172.29.229.69:1913", referrer: "http://172.29.229.69:1913/"
platform_ui_1    | 172.29.229.1 - - [28/Sep/2021:12:50:19 +0000] "POST /api/deepdetect/predict HTTP/1.1" 502 559 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1     | Segmentation fault

The segmentation fault killed the process.

@beniz
Collaborator

beniz commented Sep 28, 2021

Hard to tell... What sizes are the db and names.bin files? Also, you seem to have gotten the system to work earlier, as shown in the UI?
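A sketch for collecting the sizes asked about here, summing directories such as the names.bin LMDB recursively. `repository_sizes` is a hypothetical helper, and the path in the usage comment is the repository used in this thread.

```python
import os


def repository_sizes(repository):
    """Map each top-level entry of a model repository to its size in bytes.

    Directories (e.g. the names.bin LMDB directory) are summed recursively.
    """
    sizes = {}
    with os.scandir(repository) as entries:
        for entry in entries:
            if entry.is_dir(follow_symlinks=False):
                total = 0
                for root, _dirs, files in os.walk(entry.path):
                    total += sum(os.path.getsize(os.path.join(root, name)) for name in files)
                sizes[entry.name] = total
            else:
                sizes[entry.name] = entry.stat().st_size
    return sizes


# Hypothetical usage:
# print(repository_sizes("/opt/platform/models/private/simsearch"))
```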

@mostafa8026
Author

It works fine the first time, but if I restart Docker, that error appears after calling the search API.

names.bin:

$ ls models/private/simsearch/names.bin/ -lh
total 80K
72K Sep 28 16:13 data.mdb
8.0K Sep 28 16:34 lock.mdb

indexes:

$ ls models/private/simsearch/index* -lh
179K Sep 28 16:13 models/private/simsearch/index.faiss
4.0M Sep 28 16:12 models/private/simsearch/index_mmap.faiss

@beniz
Collaborator

beniz commented Sep 28, 2021

By "first time", you mean it works right after indexing has completed, but not afterwards?

@mostafa8026
Author

mostafa8026 commented Sep 28, 2021

Yes. When I restart Docker and create the service again, this error occurs.
After indexing and building have finished, everything is OK and I can search as much as I want, but after restarting Docker and creating the service again, it can't read the db correctly and a segmentation fault occurs.

@beniz
Collaborator

beniz commented Sep 28, 2021

Could you post the exact list of API calls, from index creation to prediction after restart, typically for a single image, please? At a minimum, this would help reproduce the problem without Docker.

@mostafa8026
Author

create the service:

$ mkdir models/private/simsearch
$ cp SE-ResNet-50.caffemodel models/private/simsearch/
$ curl -X PUT 'http://172.29.229.69:1913/api/deepdetect/services/simsearch' -d '{
  "mllib": "caffe",
  "description": "similarity search service",
  "type": "unsupervised",
  "parameters": {
    "input": {
      "connector": "image",
      "width": 224,
      "height": 224
    },
    "mllib": {
      "nclasses":1000,
      "template": "se_resnet_50"
    },
    "output": {
      "store_config": true
    }
  },
  "model": {
    "templates": "../templates/caffe/",
    "repository": "/opt/platform/models/private/simsearch/",
    "create_repository": true
  }
}'
{"status":{"code":201,"msg":"Created"}}

indexing:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
        "confidence_threshold": 0.3,
        "index": true,
        "index_type": "IVF20,SQ8",
        "train_samples": 500,
        "ondisk": true,
        "nprobe": 10
    },
  "mllib": {"extract_layer": "pool5/7x7_s1"}
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":523.0},"body":{"predictions":[{"indexed":true,"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.0706....

build:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
        "index": false,
        "build_index": true
    },
  "mllib": {"extract_layer": "pool5/7x7_s1"}
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":534.0},"body":{"predictions":[{"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.0706....

search (it doesn't return a result either):

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":584.0},"body":{"predictions":[{"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.0706...

restart the Docker containers:

$ docker-compose restart
Restarting cpu_platform_ui_1   ... done
Restarting cpu_deepdetect_1    ... done
Restarting cpu_filebrowser_1   ... done
Restarting cpu_jupyter_1       ... done
Restarting cpu_platform_data_1 ... done
Restarting cpu_dozzle_1        ... done

create the service again (without template):

$ curl -X PUT 'http://172.29.229.69:1913/api/deepdetect/services/simsearch' -d '{
  "mllib": "caffe",
  "description": "similarity search service",
  "type": "unsupervised",
  "parameters": {
    "input": {
      "connector": "image",
      "width": 224,
      "height": 224
    },
    "mllib": {
      "nclasses":1000
    },
    "output": {
      "store_config": true
    }
  },
  "model": {
    "templates": "../templates/caffe/",
    "repository": "/opt/platform/models/private/simsearch/",
    "create_repository": true
  }
}'

{"status":{"code":201,"msg":"Created"}}

In this case, where I index only one picture, the search runs without error but returns nothing, because:

could not train, maybe not enough data to train with selected index type. index likely to be emptyError in void faiss::Clustering::train_encoded(faiss::Clustering::idx_t, const uint8_t*, const faiss::Index*, faiss::Index&, const float*) at /opt/deepdetect/build/faiss/src/faiss/faiss/Clustering.cpp:276: Error: 'nx >= k' failed: Number of training points (0) should be at least as large as number of clusters (20)
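The faiss error above says that training an IVF index with 20 clusters (IVF20) needs at least 20 training vectors, and it received 0. A rough client-side sanity check is sketched below, with hypothetical helper names. Besides the hard `nx >= k` requirement, faiss clustering by default prints a warning (but still trains) when there are fewer than 39 points per centroid (`ClusteringParameters.min_points_per_centroid`). How many of the indexed vectors DeepDetect actually passes to faiss for training (e.g. in relation to `train_samples`) is not confirmed here.

```python
import re

# faiss default for ClusteringParameters.min_points_per_centroid
MIN_POINTS_PER_CENTROID = 39


def ivf_nlist(index_type):
    """Extract the number of IVF clusters from a factory string like 'IVF20,SQ8'."""
    match = re.match(r"IVF(\d+)", index_type)
    if match is None:
        raise ValueError(f"not an IVF index type: {index_type!r}")
    return int(match.group(1))


def check_training_points(n_points, index_type):
    """Return 'error', 'warning' or 'ok' for training an IVF index on n_points vectors."""
    k = ivf_nlist(index_type)
    if n_points < k:
        return "error"  # faiss fails with "'nx >= k' failed"
    if n_points < MIN_POINTS_PER_CENTROID * k:
        return "warning"  # faiss trains, but warns about too few points per centroid
    return "ok"
```

For "IVF20,SQ8", this means at least 20 vectors to train at all and 780 to avoid the warning.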

here is the request:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

{"status":{"code":200,"msg":"OK"},"head":{"method":"/predict","service":"simsearch","time":876.0},"body":{"predictions":[{"last":true,"vals":[0.0,1.1430059671401978,0.0,0.0,0.0,0.0,0.13838012516498567,0.02941223978996277,16.24222183227539,0.0,0.0,3.5232527256011965,0.0,0.0,0.25434765219688418,0.0660916417837143,0.0,0.0,0.0,0.044068124145269397,0.0,0.38675710558891299,0.0,0.07067309319972992,0.0,0.007104216609150171,0.0,0.5068966746330261,0.0,0.5041558146476746,0.0,0.0,0.0,0.0,0.199878320097923.....

But if I do exactly the same steps as above (in batches of 5) for about 400 pictures, I get this error:

$ curl -X POST 'http://172.29.229.69:1913/api/deepdetect/predict' -d '{
 "service": "simsearch",
 "parameters": {
  "input": {"height": 224, "width": 224},
  "output": {
   "search_nn": 10,
   "search": true
  },
  "mllib": {
   "extract_layer": "pool5/7x7_s1"
  }
 },
 "data": [
  "/opt/platform/data/0dir/09591_thumb - Copy.jpg"
 ]
}'

<html>
<head><title>502 Bad Gateway</title></head>
<body>
<center><h1>502 Bad Gateway</h1></center>
<hr><center>nginx/1.21.1</center>
</body>
</html>

deepdetect_1 | [2021-09-28 13:50:03.005] [torchlib] [info] Ignoring source layer classifier_classifier_0_split
deepdetect_1 | [2021-09-28 13:50:03.087] [simsearch] [info] Net total flops=3860541312 / total params=28070976
deepdetect_1 | [2021-09-28 13:50:03.087] [simsearch] [info] detected network type is classification
platform_ui_1 | 172.29.229.69 - - [28/Sep/2021:13:50:03 +0000] "PUT /api/deepdetect/services/simsearch HTTP/1.1" 201 39 "-" "curl/7.71.1"
deepdetect_1 | [2021-09-28 13:50:03.088] [api] [info] HTTP/1.1 "PUT /services/simsearch" <n/a> 201 524ms
deepdetect_1 | [2021-09-28 13:50:07.362] [api] [info] HTTP/1.1 "GET /info" <n/a> 200 0ms
platform_ui_1 | 172.29.229.1 - - [28/Sep/2021:13:50:07 +0000] "GET /api/deepdetect/info? HTTP/1.1" 200 433 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1 | [2021-09-28 13:50:07.437] [api] [info] HTTP/1.1 "GET /services/simsearch" <n/a> 200 0ms
platform_ui_1 | 172.29.229.1 - - [28/Sep/2021:13:50:07 +0000] "GET /api/deepdetect/services/simsearch HTTP/1.1" 200 420 "http://172.29.229.69:1913/" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/93.0.4577.82 Safari/537.36"
deepdetect_1 | open existing index db
deepdetect_1 | [2021-09-28 13:50:14.307] [torchlib] [info] Opened lmdb /opt/platform/models/private/simsearch//names.bin
platform_ui_1 | 2021/09/28 13:50:14 [error] 24#24: *9 upstream prematurely closed connection while reading response header from upstream, client: 172.29.229.69, server: , request: "POST /api/deepdetect/predict HTTP/1.1", upstream: "http://172.23.0.3:8080/predict", host: "172.29.229.69:1913"
platform_ui_1 | 172.29.229.69 - - [28/Sep/2021:13:50:14 +0000] "POST /api/deepdetect/predict HTTP/1.1" 502 157 "-" "curl/7.71.1"
deepdetect_1 | Segmentation fault

Here is my Python code for indexing:

# Download dd_client.py from:
# https://github.com/jolibrain/deepdetect/blob/master/clients/python/dd_client.py
import glob
import sys

from dd_client import DD

host = '172.29.229.69'
port = 1913
path = '/api/deepdetect'
dd = DD(host, port, 0, path=path)
dd.set_return_format(dd.RETURN_PYTHON)

parameters_input = {"height": 224, "width": 224}
parameters_mllib = {"extract_layer": "pool5/7x7_s1"}
images = glob.glob(sys.argv[1])
service_name = 'simsearch'
data = []
all_data = []


# Index one batch of images, then ask the server to build the index.
def classify_build(data_in):
    print(data_in)
    parameters_output = {
        "confidence_threshold": 0.3,
        "index": True,
        "index_type": "IVF20,SQ8",
        "train_samples": 500,
        "ondisk": True,
        "nprobe": 10
    }
    dd.post_predict(service_name, data_in, parameters_input, parameters_mllib, parameters_output)
    parameters_output = {
        "index": False,
        "build_index": True
    }
    dd.post_predict(service_name, data_in, parameters_input, parameters_mllib, parameters_output)


try:
    for image_index, image in enumerate(images):
        print("parsing", image_index, "/", len(images), image, "\n")
        image = image.replace('../img-dataset/images', '/data')
        data.append(image)
        all_data.append(image)
        if len(data) == 5:  # send a full batch of 5 images
            classify_build(data)
            data = []
    if len(data) > 0:
        classify_build(data)
        data = []
except Exception as e:
    print('could not process image index', image_index, 'image', image)
    print(str(e))

@beniz
Collaborator

beniz commented Sep 28, 2021

OK, thanks, I'll investigate. However, have you tried the default settings, i.e. not modifying index_type for instance?

@beniz
Collaborator

beniz commented Sep 28, 2021

Also, I believe you may want to build the index only once, after all 400 images have been indexed. Aside from the current issue, this should give better results.
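A sketch of this suggestion: send every batch with indexing enabled and build the index a single time at the end. `post(data, output_params)` is a hypothetical stand-in for one /predict call; the indexing parameters are the defaults reported to work in this thread ({"index": true, "ondisk": true}), and the build parameters mirror the build call in the indexing script. Sending the last batch along with the build call is an assumption borrowed from that script.

```python
def index_then_build(batches, post):
    """Index every batch, then build the index once, after the last batch.

    `post(data, output_params)` is a hypothetical stand-in for one /predict
    call, e.g. a wrapper around dd.post_predict.
    """
    last = None
    for batch in batches:
        post(batch, {"index": True, "ondisk": True})
        last = batch
    if last is not None:
        # Same output parameters as the script's build call, sent only once.
        post(last, {"index": False, "build_index": True})
```

With the dd_client script above, `post` could be `lambda data, out: dd.post_predict(service_name, data, parameters_input, parameters_mllib, out)`.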

@mostafa8026
Author

OK, thanks, I'll investigate. However, have you tried the default settings, i.e. not modifying index_type for instance?

It works fine with the default parameters:

{
    "index": true,
    "ondisk": true
}


3 participants