I have a problem/question regarding the init procedure

asked 2019-06-19 21:57:50 +0200

dan.punga

updated 2019-06-27 16:21:59 +0200

Hello all,

I'm trying to deploy the Orion ContextBroker on an OpenShift/OKD (Kubernetes) cluster, and I'm having a problem with its initialization time.

I'm using the 2.2.0 release tag for the ContextBroker with MongoDB 3.2.0. The startup args for Orion are:

-ipv4 -reqPoolSize 100 -notificationMode threadpool:10000:50 -statNotifQueue -statCounters -statSemWait -statTiming -relogAlarms -httpTimeout 100000

The initialization appears to be somewhat inconsistent regarding the time required for the app to become available.
Sometimes the deployment runs "smoothly" and sometimes the app fails to start (in a reasonable time interval).

The initialization seems to freeze at a certain point which appears in the logs as [1]. The actual service isn't started (lsof -i -n -P doesn't return any processes using port 1026).
I use standard health checks which, basically, run curl localhost:1026/version. I've tried changing the probe timeouts and also the initial delay before the probe first fires. Even with a 360-second (6-minute) delay I don't get consistent deployments!
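For illustration, a health check of the kind described above could be sketched roughly as follows. The function name orion_ready, the default URL and the 5-second default timeout are assumptions made for this example, not values taken from the actual deployment.

```shell
#!/bin/bash

# Hypothetical readiness check: succeeds only if Orion answers /version
# within the given number of seconds.
orion_ready() {
    url="${1:-http://localhost:1026/version}"   # assumed default endpoint
    timeout="${2:-5}"                           # assumed default, in seconds
    # -f: treat HTTP errors as failure, -s: no progress output,
    # --max-time: upper bound for the whole request
    curl -fs --max-time "$timeout" "$url" > /dev/null
}
```

Called without arguments, orion_ready returns a non-zero status for as long as nothing is listening on port 1026, which is the situation reported here.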

I've tested with different resource allocation and this doesn't seem to be the problem.

Also, by checking the logs I see some "odd" intervals in the initialization procedure. I have some excerpts at the end of the message, [2], where I can see the last steps of the init procedure being executed (or, at least, logged) at precisely 1-minute intervals.

The problem is that once the readiness health check fails, the deployment fails as well. From what I saw, Orion seems to use a lot of RAM which does not get released even after the notification load disappears. The recommendation would be to restart the process, which, in my case, can be handled automatically if I set an upper memory limit for the container. So the initialization time comes into question again here, and also for the auto-scaling mechanism, etc.

Any hints towards how to solve this problem would be much appreciated!

Thanks, Dan

[1] - last 10-12 lines from ContextBroker's log; DEBUG -t 0-255

time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[802]:getWriteConcern | msg=getWriteConcern()

time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[807]:getWriteConcern | msg=Database Operation Successful (getWriteConcern)

time=Wednesday 19 Jun 16:50:35 2019.407Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=mongoConnectionPool.cpp[240]:mongoConnect | msg=Active DB Write Concern mode: 1

time=Wednesday 19 Jun 16:50:35 2019.431Z | lvl=DEBUG | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=connectionOperations.cpp[691]:runCollectionCommand | msg=runCommand() in 'admin' collection: '{ buildinfo: 1 }'

time=Wednesday 19 Jun 16:50:35 ...


Comments

Hi, it would be helpful if you could share the .yaml files you used to start the Pods. Thank you.

jicg (2019-06-27 10:27:30 +0200)


2 answers


answered 2019-06-27 11:34:44 +0200

jicg

I tried to reproduce the problem by creating a MongoDB service and an Orion service in OpenShift:

oc new-app mongo:3.2 --name mongo1085

To create the Orion service, I first generated the YAML file so that I could modify it:

oc new-app fiware/orion:2.2.0 --name orion1085 -o yaml > orion1085.yaml

Once I had the orion1085.yaml file, I modified it this way:

.....
  spec:
    containers:
    - image: fiware/orion:2.2.0
      name: orion1085
      args:
        - -dbhost
        - mongo1085
        - -ipv4
        - -reqPoolSize
        - "100"
        - -notificationMode
        - threadpool:10000:50
        - -statNotifQueue
        - -statCounters
        - -statSemWait
        - -statTiming
        - -relogAlarms
        - -httpTimeout
        - "100000"
      ports:
      - containerPort: 1026
        protocol: TCP
      resources: {}
.....

The service started without problems. Once the service was up, I looked up its cluster IP:

$ oc get service orion1085
NAME        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
orion1085   172.30.214.211   <none>        1026/TCP   12m

And I queried the version:

$ curl 172.30.214.211:1026/version
{
"orion" : {
  "version" : "2.2.0",
  "uptime" : "0 d, 0 h, 14 m, 2 s",
  "git_hash" : "5a46a70de9e0b809cce1a1b7295027eea0aa757f",
  "compile_time" : "Mon Feb 25 15:15:27 UTC 2019",
  "compiled_by" : "root",
  "compiled_in" : "37fdc92c3e97",
  "release_date" : "Mon Feb 25 15:15:27 UTC 2019",
  "doc" : "https://fiware-orion.rtfd.io/en/2.2.0/"
}
}

It would certainly be useful to know how you deploy it.

answered 2019-06-27 16:17:53 +0200

dan.punga

updated 2019-06-27 16:24:04 +0200

Hi and thanks for the answer and tests!

I'm using the "answer" section simply because the reply doesn't fit a comment.

Since this question wasn't getting moderated and made public, I went ahead and posted it on the project's GitHub repo as well: https://github.com/telefonicaid/fiwar...

My configuration is quite similar. I use a custom build where I simply add a wrapper script to start the main process. This serves as a more convenient way to configure different startup arguments for the main process. The script:

#!/bin/bash

/usr/bin/contextBroker -fg -logLevel $ORION_LOG_LEVEL -dbhost $ORION_MONGO_HOST -db $ORION_MONGO_DB \
    -dbuser $ORION_MONGO_USER -dbpwd $ORION_MONGO_PASS $ORION_EXTRA_ARGS

So my YAML definition for the Pod/DeploymentConfig is:

              command:
                - /opt/bin/runContextBroker.sh
              image: ""
              imagePullPolicy: IfNotPresent
              env:
                - name: TZ
                  value: ${ENV_TZ}
                - name: ORION_LOG_LEVEL
                  valueFrom:
                    configMapKeyRef:
                      name: ${ORION_CONF}
                      key: ORION_LOG_LEVEL
                - name: ORION_MONGO_HOST
                  valueFrom:
                    configMapKeyRef:
                      name: ${ORION_CONF}
                      key: ORION_MONGO_HOST
                - name: ORION_MONGO_DB
                  valueFrom:
                    secretKeyRef:
                      key: database-name
                      name: ${MONGODB_ENV}
                - name: ORION_MONGO_USER
                  valueFrom:
                    configMapKeyRef:
                      name: ${ORION_CONF}
                      key: ORION_MONGO_USER
                - name: ORION_MONGO_PASS
                  valueFrom:
                    secretKeyRef:
                      key: database-admin-password
                      name: ${MONGODB_ENV}
                - name: ORION_EXTRA_ARGS
                  valueFrom:
                    configMapKeyRef:
                      name: ${ORION_CONF}
                      key: ORION_EXTRA_ARGS
              ports:
                - containerPort: 1026
                  protocol: TCP

The excerpt above uses parameters that reference a ConfigMap (ORION_CONF) and a Secret (MONGODB_ENV); these are OKD objects and, given your reply, I imagine you are familiar with them. The deployed Pod ends up starting with pretty much the same arguments as in your test. The ORION_EXTRA_ARGS env var holds all the startup arguments that I mentioned in the initial post and that, I see, you use in your test. I don't use exec in my startup script, which would make the contextBroker process PID 1 of the container. I will have to try this (as noted in the last GitHub comment by Fermin Marquez).
Again, as mentioned in the GitHub discussion, I have switched to version 3.6 of Mongo, but this hasn't led to an improvement in my case.
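A minimal sketch of the exec variant mentioned above, assuming the same environment variables as the wrapper script. The function name run_orion and the ORION_DRY_RUN switch are hypothetical; the switch is there only so that the resulting command line can be inspected without starting the broker.

```shell
#!/bin/bash

# Build the contextBroker command line and replace the shell with it.
run_orion() {
    # ORION_EXTRA_ARGS is intentionally left unquoted so that it splits
    # into separate arguments (e.g. "-ipv4 -reqPoolSize 100").
    set -- /usr/bin/contextBroker -fg -logLevel "$ORION_LOG_LEVEL" \
        -dbhost "$ORION_MONGO_HOST" -db "$ORION_MONGO_DB" \
        -dbuser "$ORION_MONGO_USER" -dbpwd "$ORION_MONGO_PASS" $ORION_EXTRA_ARGS

    # Hypothetical dry-run switch: print the command instead of running it.
    if [ "${ORION_DRY_RUN:-0}" = "1" ]; then
        printf '%s\n' "$*"
        return 0
    fi

    # exec replaces the wrapper shell with contextBroker, so the broker
    # keeps the shell's PID instead of running as a child process.
    exec "$@"
}
```

In the wrapper itself, the last line would simply be a call to run_orion.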

The problem I have is that sometimes, just like in your test example, the service starts right away without any problems, and sometimes the initialization process gets stuck: the process never begins listening on port 1026, so it practically never begins to run. What I found is that failed inits always stop at precisely the same step:

time=Monday 24 Jun 09:32:17 2019.632Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=MongoGlobal.cpp[227]:mongoInit | msg=Connected to mongo at mongodb.civ-fiware.svc.cluster.local:admin as user 'admin'

At the INFO log level, this is always the 42nd line of the Pod/container log.
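To spot this condition automatically, one could check whether the last non-empty line of a saved log is still the "Connected to mongo" message shown above. This is only a rough sketch: the function name and the idea of saving the log to a file first are assumptions for illustration.

```shell
#!/bin/bash

# Returns 0 if the last non-empty line of the log file given as $1 is the
# mongoInit "Connected to mongo" message, i.e. init has not moved past it.
stuck_after_mongo_connect() {
    last_line=$(grep -v '^[[:space:]]*$' "$1" | tail -n 1)
    case "$last_line" in
        *"mongoInit | msg=Connected to mongo at"*) return 0 ;;
        *) return 1 ;;
    esac
}
```

The log could be saved with something like oc logs on the Pod, redirected to a file, and that file passed to the function.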

When the Pod starts without issues, the log gets past this point and shows:

time=Monday 24 Jun 09:32:17 2019.632Z | lvl=INFO | corr=N/A | trans=N/A | from=N/A | srv=N/A | subsrv=N/A | comp=Orion | op=MongoGlobal.cpp[227]:mongoInit | msg=Connected to ...
