Add feature: Prevention of data loss at IOT agent if n/w failed #397

Open
chandradeep11 opened this issue Apr 15, 2019 · 14 comments

Comments

@chandradeep11
Contributor

If the network between the IoT Agent and Orion (Context Broker) fails, the device measurements sent from the device to the IoT Agent are currently lost.
There should be a mechanism that prevents this data loss at the IoT Agent.
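
One possible shape for such a mechanism is a store-and-forward buffer: measures are queued at the agent and forwarded in arrival order once the broker is reachable again. The sketch below is only an illustration in Python, not code from the IoT Agent. `StoreAndForwardBuffer`, `send` and `max_size` are hypothetical names, and the queue is assumed to live in memory only.

```python
from collections import deque
from typing import Callable, Deque, Dict


class StoreAndForwardBuffer:
    """Hold measures in memory while the Context Broker is unreachable."""

    def __init__(self, send: Callable[[Dict], None], max_size: int = 1000) -> None:
        self._send = send  # hypothetical callable that forwards one measure
        # Bounded queue: once full, the oldest measure is dropped first.
        self._pending: Deque[Dict] = deque(maxlen=max_size)

    def submit(self, measure: Dict) -> None:
        """Queue a measure, then try to drain the queue in arrival order."""
        self._pending.append(measure)
        self.flush()

    def flush(self) -> int:
        """Send queued measures until one fails; return how many were sent."""
        sent = 0
        while self._pending:
            try:
                self._send(self._pending[0])
            except OSError:
                break  # broker still unreachable: keep the rest for later
            self._pending.popleft()
            sent += 1
        return sent

    def __len__(self) -> int:
        return len(self._pending)
```

An in-memory queue like this still loses data if the agent itself restarts, and it drops the oldest measures once `max_size` is reached. A durable variant would have to persist the queue, which is a separate design decision.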

@onrao

onrao commented May 18, 2021

Hi

Is there any update on this feature?
We are seeing the socket connection between IoT Agent JSON and Orion being lost when the IoT Agent data rate is >5000 records per minute.
Are there any pointers for fixing this issue, or is there a known limit on the data rate supported by a single IoT Agent pod in a K8s deployment?
We are looking for a quick solution or fix.

Thanks & Regards
ONR

@fgalan
Member

fgalan commented May 19, 2021

@Chandradeep-NEC @onrao could you elaborate on which feature you are proposing for the IoT Agent? The original issue description is too broad, and we need to know which mechanism in particular you are proposing to prevent data loss.

@onrao

onrao commented May 20, 2021

@fgalan Our observations are:

  1. Apart from the MEASURES-002 or ORION-ALARM error log entries, we cannot tell what caused this socket error or how to recover from it.
  2. Since the error comes from the updatecontext() functionality of the IoT Agent Node Lib, the socket error should state its cause explicitly, e.g. Orion endpoint not reachable, timeout, etc.
  3. We would like to know when this ALARM or ERROR is logged: is it due to a high number of update requests, or to the size of the messages being updated? And what is the limit on the number of updates per second/minute from the IoT Agent to Orion CB?

@mapedraza
Collaborator

In my humble opinion, what @Chandradeep-NEC is proposing sounds different from what you (@onrao) are describing here.

I understand you are facing issues when running the IoTA on K8s under a certain load (but you have not described your infrastructure resources or the configuration deployed on the IoTA, both of which limit and really impact the throughput). As for the list of observations in your last comment, I can't understand it well.

  1. You are saying you cannot tell from the IoTA log when a socket error occurs, but you can see it in the Orion log, right?
  2. Could you elaborate more on this topic?
  3. I cannot understand this point; would you mind rephrasing it?

@onrao

onrao commented May 26, 2021

@fgalan Sorry for the late reply.

  1. We didn't find any error in the Orion log.
  2. As we understand it, the socket error is handled and captured inside the IoT Agent Node Lib, and the log only says "Socket open error". No further details are provided on why it couldn't open the socket.
  3. We would like to know under which conditions this error is thrown: is it due to the number of requests per second/minute, a timeout not being respected before each session, etc.?

According to the documentation, it is described as follows:
MEASURES-002: COULDN'T SEND THE UPDATED VALUES TO THE CONTEXT BROKER DUE TO AN ERROR: %S
There was some communication error connecting with the Context Broker that made it impossible to send the measures. If this log appears frequently, it may be a signal of network problems between the IoTAgent and the Context Broker. Check the IoTAgent network connection, and the configured Context Broker port and host.

ORION-ALARM | Critical | Indicates a persistent error accessing the Context Broker

There is no further info on this alarm/error.
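
To illustrate the kind of detail being asked for, here is a generic Python sketch that maps a low-level connection error to a more specific message. It is not code from the IoT Agent Node Lib (which is Node.js); `describe_socket_error` and the message strings are hypothetical, and none of these causes has been confirmed as the source of the error in this thread.

```python
import errno
import socket


def describe_socket_error(exc: OSError) -> str:
    """Map a low-level connection error to a more specific log message."""
    if isinstance(exc, socket.gaierror):
        return "DNS resolution failed for the Context Broker host"
    if isinstance(exc, (socket.timeout, TimeoutError)):
        return "timed out waiting for the Context Broker"
    if isinstance(exc, ConnectionRefusedError):
        return "connection refused: nothing is listening on the configured port"
    if isinstance(exc, ConnectionResetError):
        return "connection reset by peer"
    if exc.errno in (errno.EMFILE, errno.ENFILE):
        return "too many open files: the process ran out of socket descriptors"
    if exc.errno in (errno.EHOSTUNREACH, errno.ENETUNREACH):
        return "host or network unreachable"
    return f"unclassified socket error: {exc}"
```

A caller would wrap the outgoing request in `try/except OSError as exc` and log `describe_socket_error(exc)` next to the existing error code, so that the reason is visible without changing the code itself.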

@mapedraza
Collaborator

Could you provide a procedure to reproduce the problem in order to analyze what is happening?

@onrao

onrao commented Jun 1, 2021

FIWARE stack used:

  1. IoT Agent JSON (1.12.0) with MQTT binding enabled + MQTT broker
  2. Orion CB (2.6.0)
  3. Draco (~1.3.0)

All of these are deployed as Docker containers running as services in a Kubernetes cluster with 2 nodes of mid-range VMs.
All these services are connected internally through service endpoints in the cluster, with auto-scaling enabled.

step1: Provisioned the 12 devices in the IoT Agent and generated the ACL for each device.
step2: Configured VerneMQ to enable the device ACLs and validation accordingly.
step4: Simulated 12 devices at a data rate of 5500 data topics/minute.
step5: Observed that device data is lost between the IoT Agent and Orion CB. Both are running fine on the VM node, but the IoT Agent log reports a "MEASURE-002 error/ ORION-ALARM due to the Socket open error".
When step 4 is run at 5000 records/topics per minute for a 1-hour simulation, no ERROR or ALARM is triggered in the EKS pod logs.

We need a quick alternative solution, and we need to know of any limitation in the IoT Agent code that is not covered in the documentation.
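
For scale, the two rates quoted above can be converted into per-second and per-device figures. This is only arithmetic, assuming the figures are aggregate rates spread evenly across all 12 devices (the comment does not say so explicitly); `per_device_interval` is a hypothetical helper name.

```python
def per_device_interval(total_per_minute: float, devices: int) -> float:
    """Seconds between messages for each device at an aggregate rate."""
    if total_per_minute <= 0 or devices <= 0:
        raise ValueError("rate and device count must be positive")
    per_device_per_second = total_per_minute / devices / 60.0
    return 1.0 / per_device_per_second


for rate in (5000, 5500):
    interval = per_device_interval(rate, devices=12)
    print(f"{rate}/min total -> {rate / 60:.1f} msg/s, "
          f"one message every {interval * 1000:.0f} ms per device")
```

Under that assumption, the working case is roughly 83 messages per second in total and the failing case roughly 92, a difference of about 10%.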

@fgalan
Member

fgalan commented Jun 2, 2021

Thank you for your feedback, but note that full detail is needed to precisely reproduce your case. Please see my comments inline.

> FIWARE stack used:
>
>   1. IoT Agent JSON (1.12.0) with MQTT binding enabled + MQTT broker
>   2. Orion CB (2.6.0)
>   3. Draco (~1.3.0)
>
> All of these are deployed as Docker containers running as services in a Kubernetes cluster with 2 nodes of mid-range VMs.
> All these services are connected internally through service endpoints in the cluster, with auto-scaling enabled.

Could you provide the exact Kubernetes deployment configuration you are using? (helm charts or whatever)

> step1: Provisioned the 12 devices in the IoT Agent and generated the ACL for each device.

How are you provisioning the devices? Could you provide the curl command you are using (or the equivalent in curl to the command you are using)?
How are you generating the ACLs? Could you provide the curl command you are using (or the equivalent in curl to the command you are using)?

> step2: Configured VerneMQ to enable the device ACLs and validation accordingly.

What is VerneMQ?

> step4: Simulated 12 devices at a data rate of 5500 data topics/minute.

How do you simulate this? Could you provide the script, program or similar (e.g. JMeter configuration, etc.) you are using?

> step5: Observed that device data is lost between the IoT Agent and Orion CB. Both are running fine on the VM node, but the IoT Agent log reports a "MEASURE-002 error/ ORION-ALARM due to the Socket open error".
> When step 4 is run at 5000 records/topics per minute for a 1-hour simulation, no ERROR or ALARM is triggered in the EKS pod logs.
>
> We need a quick alternative solution, and we need to know of any limitation in the IoT Agent code that is not covered in the documentation.

@onrao

onrao commented Jun 24, 2021

@fgalan please find the required details below. We need a quick confirmation and a way forward for this issue.

Environment Setup:
image

Error:
image

Device Simulator:

Device_ID: Crane1_1_S
No_of_Sensors: 9
Activity_Duration_mins: 240

| Attribute | Type | Active(Y/N) | Range/Value | minvalue | maxvalue | Unit | Set_No | Frequency_secs | Topic |
|---|---|---|---|---|---|---|---|---|---|
| liftingHeight | number | N | Range | 0 | 0 | m | 1 | 1 | /iot/Crane1_1_S/attrs |
| windingSpeed | number | N | Range | 0 | 0 | m/s | 1 | 1 | /iot/Crane1_1_S/attrs |
| load | number | N | Range | 10.4 | 10.4 | N | 1 | 1 | /iot/Crane1_1_S/attrs |
| turningAngle | number | N | Range | 10 | 10 | ° | 1 | 1 | /iot/Crane1_1_S/attrs |
| turningspeed | number | N | Range | 12.5 | 12.5 | °/s | 1 | 1 | /iot/Crane1_1_S/attrs |
| motorCurrent | number | Y | Range | 0 | 10 | A | 1 | 1 | /iot/Crane1_1_S/attrs |
| brake | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane1_1_S/attrs |
| hoist | text | N | Range | 9.7 | 9.7 | | 1 | 1 | /iot/Crane1_1_S/attrs |
| hoistCoolingFan | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane1_1_S/attrs |

Device_ID: Crane1_2_S
No_of_Sensors: 9
Activity_Duration_mins: 240

| Attribute | Type | Active(Y/N) | Range/Value | minvalue | maxvalue | Unit | Set_No | Frequency_secs | Topic |
|---|---|---|---|---|---|---|---|---|---|
| liftingHeight | number | N | Range | 0 | 0 | m | 1 | 1 | /iot/Crane1_2_S/attrs |
| windingSpeed | number | N | Range | 0 | 0 | m/s | 1 | 1 | /iot/Crane1_2_S/attrs |
| load | number | N | Range | 10.4 | 10.4 | N | 1 | 1 | /iot/Crane1_2_S/attrs |
| turningAngle | number | N | Range | 10 | 10 | ° | 1 | 1 | /iot/Crane1_2_S/attrs |
| turningspeed | number | N | Range | 12.5 | 12.5 | °/s | 1 | 1 | /iot/Crane1_2_S/attrs |
| motorCurrent | number | Y | Range | 0 | 10 | A | 1 | 1 | /iot/Crane1_2_S/attrs |
| brake | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane1_2_S/attrs |
| hoist | text | N | Range | 9.7 | 9.7 | | 1 | 1 | /iot/Crane1_2_S/attrs |
| hoistCoolingFan | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane1_2_S/attrs |

Device_ID: Crane1_3_S
No_of_Sensors: 9
Activity_Duration_mins: 240

| Attribute | Type | Active(Y/N) | Range/Value | minvalue | maxvalue | Unit | Set_No | Frequency_secs | Topic |
|---|---|---|---|---|---|---|---|---|---|
| liftingHeight | number | N | Range | 0 | 0 | m | 1 | 1 | /iot/Crane1_3_S/attrs |
| windingSpeed | number | N | Range | 0 | 0 | m/s | 1 | 1 | /iot/Crane1_3_S/attrs |
| load | number | N | Range | 10.4 | 10.4 | N | 1 | 1 | /iot/Crane1_3_S/attrs |
| turningAngle | number | N | Range | 10 | 10 | ° | 1 | 1 | /iot/Crane1_3_S/attrs |
| turningspeed | number | N | Range | 12.5 | 12.5 | °/s | 1 | 1 | /iot/Crane1_3_S/attrs |
| motorCurrent | number | Y | Range | 0 | 10 | A | 1 | 1 | /iot/Crane1_3_S/attrs |
| brake | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane1_3_S/attrs |
| hoist | text | N | Range | 9.7 | 9.7 | | 1 | 1 | /iot/Crane1_3_S/attrs |
| hoistCoolingFan | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane1_3_S/attrs |

Device_ID: Crane2_1_S
No_of_Sensors: 9
Activity_Duration_mins: 240

| Attribute | Type | Active(Y/N) | Range/Value | minvalue | maxvalue | Unit | Set_No | Frequency_secs | Topic |
|---|---|---|---|---|---|---|---|---|---|
| liftingHeight | number | N | Range | 0 | 0 | m | 1 | 1 | /iot/Crane2_1_S/attrs |
| windingSpeed | number | N | Range | 0 | 0 | m/s | 1 | 1 | /iot/Crane2_1_S/attrs |
| load | number | N | Range | 10.4 | 10.4 | N | 1 | 1 | /iot/Crane2_1_S/attrs |
| turningAngle | number | N | Range | 10 | 10 | ° | 1 | 1 | /iot/Crane2_1_S/attrs |
| turningspeed | number | N | Range | 12.5 | 12.5 | °/s | 1 | 1 | /iot/Crane2_1_S/attrs |
| motorCurrent | number | Y | Range | 0 | 10 | A | 1 | 1 | /iot/Crane2_1_S/attrs |
| brake | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane2_1_S/attrs |
| hoist | text | N | Range | 9.7 | 9.7 | | 1 | 1 | /iot/Crane2_1_S/attrs |
| hoistCoolingFan | text | N | Range | 0 | 0 | | 1 | 1 | /iot/Crane2_1_S/attrs |

@fgalan
Member

fgalan commented Jun 25, 2021

I'm afraid your last comment doesn't correspond to what I asked...

With regard to the Kubernetes configuration, a screenshot is not acceptable. Could you provide the files in textual form, please?

With regard to the simulation information, a similar problem occurs. I don't know what that table means... you don't even mention which simulation tool you are using. Please provide more detail on this.

Finally, we would need an answer (as direct and detailed as you can) to the following questions:

  • How are you provisioning the devices? Could you provide the curl command you are using (or the equivalent in curl to the command you are using)?
  • How are you generating the ACLs? Could you provide the curl command you are using (or the equivalent in curl to the command you are using)?
  • What is VerneMQ?

@onrao

onrao commented Aug 25, 2022

@fgalan We saw that the issue already reported in telefonicaid/iotagent-ul#383
is the same as the one we faced, but it is still open.
Does the FIWARE community have any suggestions?

The VerneMQ mentioned above is the MQTT broker we used.

@fgalan
Member

fgalan commented Aug 25, 2022

@onrao thanks for the feedback!

We will be more than happy to review and eventually merge any pull request that identifies and solves the problem, if at the end it is confirmed. Thanks! :)

@vijapandey

I have reviewed all the comments but cannot work out what the valid solution is... @fgalan, could you please suggest a solution to the above issue? I am having the same problem.

@fgalan
Member

fgalan commented Nov 15, 2022

I'm afraid I cannot provide any solution because, honestly, I'm a bit lost in this issue and I don't know what the exact problem is :)

If somebody could summarize and explain the exact problem and the proposed solution it would be great. Based on that I could provide better feedback.
