Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improved BirdWeather Submissions #875

Closed
wants to merge 2 commits into from
Closed

Improved BirdWeather Submissions #875

wants to merge 2 commits into from

Conversation

jaredb7
Copy link

@jaredb7 jaredb7 commented Apr 26, 2023

BirdWeather Submissions run in a thread

Run BirdWeather submissions in a separate thread to run in the background it doesn't hold up or slow down processing audio and detections.

Previously if BirdWeather was slow or there were other network factors slowing down the submission, the analysis server would pause for the duration of the upload.... sometimes to a point of a backlog (I have experienced backlogs up to 5 minutes due to slow network conditions).

This enhancement allows the BirdWeather detections to added to a queue and processed in the background via a separate thread, this allow the analysis server to continue processing audio while sound scapes and detections upload at their own pace.

The single thread processing the queue also aids in accidentally sending too many BirdWeather submission at once, so we're only ever sending 1 submission at any one time.

Also greatly improved exception handling around requests, plus useful output in case things go wrong.
image

Request Timeouts, Retries & Back-off

  • gerritzen Added timeout values to the requests as they will run indefinitely if no timeout is specified.

  • A slightly longer timeout is used for the sound scape upload as this can take a while depending on network conditions. 60 seconds should be enough for a slow 3G cellular connection. And we want to give the sound scape the best chance to upload.

  • Further, the ability to retry requests was implemented but only for connection timeouts (request didn't complete within our specified timeout period), connection errors (network problem (e.g. DNS failure, refused connection, etc)) and specific response codes that are known and can be retried on.
    If a request raises a error which isn't a Connection Timeout or Connection Error, it will fall into the HTTP Error exception... we then check the response code to see what the error is and if it is any of the following it will be retried, anything else we consider a failure

404 Not Found
429 Too Many Requests
500 Internal Server Error
502 Bad Gateway
503 Service Unavailable
504 Gateway Timeout

The Sound scape upload and detection upload are retried separately, the sound scape may succeed but the detection might not, so the detection will get retried until succeeding or exhausting retries.
It's more likely the sound scape upload will be the source a issue as it's the most intensive of the two
Submissions will only try 6 times to upload, time-outs together with the back-off will see this happen over multiple minutes.

  • A exponential back-off delay is calculated before retrying again to provide adequate delay between between requests and ease API server pressure.
BirdWeather Submission:: Soundscape POST - HTTP Request Exception for <bird_weather>, Retrying after 1.7s, Retry Attempt:1 - TOO MANY REQUESTS
BirdWeather Submission:: Soundscape POST - HTTP Request Exception for <bird_weather>, Retrying after 3.4s, Retry Attempt:2 - TOO MANY REQUESTS
BirdWeather Submission:: Soundscape POST - HTTP Request Exception for <bird_weather>, Retrying after 6.8s, Retry Attempt:3 - TOO MANY REQUESTS
BirdWeather Submission:: Soundscape POST - HTTP Request Exception for <bird_weather>, Retrying after 13.6s, Retry Attempt:4 - TOO MANY REQUESTS
BirdWeather Submission:: Soundscape POST - HTTP Request Exception for <bird_weather>, Retrying after 27.2s, Retry Attempt:5 - TOO MANY REQUESTS
BirdWeather Submission:: Soundscape POST - HTTP Request Exception for <bird_weather>, Retrying after 54.4s, Retry Attempt:6 - TOO MANY REQUESTS

A working example where the retry system helped submit the detection

May  1 07:00:44 BirdNET python3[2232720]: BirdWeather Submission:: Soundscape Successfully Uploaded - status:201 soundscape_id:4XX155XX
May  1 07:00:44 BirdNET python3[2232720]: BirdWeather Submission:: Detection Data - json: {"timestamp": "2023-05-01T07:00:06+10:00", "lat": "-23.XXXX", "lon": "150.XXXX", "soundscapeId": "4XX155XX", "soundscapeStartTime": "3.0", "soundscapeEndTime": "6.0", "commonName": "Magpie-lark", "scientificName": "Grallina cyanoleuca", "algorithm": "alpha", "confidence": "0.7420777"}
May  1 07:00:45 BirdNET python3[2232720]: BirdWeather Submission Error:: Detection POST - HTTP Request Exception for https://app.birdweather.com/api/v1/stations/AxBxC/soundscapes?timestamp=2023-05-01T07:00:03+10:00, Retrying after 1.7s, Retry Attempt:1
May  1 07:00:47 BirdNET python3[2232720]: BirdWeather Submission:: Retry (1) - Detection Data - json: {"timestamp": "2023-05-01T07:00:06+10:00", "lat": "-23.XXXX", "lon": "150.XXXX", "soundscapeId": "4XX155XX", "soundscapeStartTime": "3.0", "soundscapeEndTime": "6.0", "commonName": "Magpie-lark", "scientificName": "Grallina cyanoleuca", "algorithm": "alpha", "confidence": "0.7420777"}
May  1 07:00:48 BirdNET python3[2232720]: BirdWeather Submission Error:: Detection POST - HTTP Request Exception for https://app.birdweather.com/api/v1/stations/AxBxC/soundscapes?timestamp=2023-05-01T07:00:03+10:00, Retrying after 3.4s, Retry Attempt:2
May  1 07:00:51 BirdNET python3[2232720]: BirdWeather Submission:: Retry (2) - Detection Data - json: {"timestamp": "2023-05-01T07:00:06+10:00", "lat": "-23.XXXX", "lon": "150.XXXX", "soundscapeId": "4XX155XX", "soundscapeStartTime": "3.0", "soundscapeEndTime": "6.0", "commonName": "Magpie-lark", "scientificName": "Grallina cyanoleuca", "algorithm": "alpha", "confidence": "0.7420777"}
May  1 07:00:52 BirdNET python3[2232720]: BirdWeather Submission Error:: Detection POST - HTTP Request Exception for https://app.birdweather.com/api/v1/stations/AxBxC/soundscapes?timestamp=2023-05-01T07:00:03+10:00, Retrying after 6.8s, Retry Attempt:3
May  1 07:00:59 BirdNET python3[2232720]: BirdWeather Submission:: Retry (3) - Detection Data - json: {"timestamp": "2023-05-01T07:00:06+10:00", "lat": "-23.XXXX", "lon": "150.XXXX", "soundscapeId": "4XX155XX", "soundscapeStartTime": "3.0", "soundscapeEndTime": "6.0", "commonName": "Magpie-lark", "scientificName": "Grallina cyanoleuca", "algorithm": "alpha", "confidence": "0.7420777"}
May  1 07:00:59 BirdNET python3[2232720]: BirdWeather Submission Error:: Detection POST - HTTP Request Exception for https://app.birdweather.com/api/v1/stations/AxBxC/soundscapes?timestamp=2023-05-01T07:00:03+10:00, Retrying after 13.6s, Retry Attempt:4
May  1 07:01:13 BirdNET python3[2232720]: BirdWeather Submission:: Retry (4) - Detection Data - json: {"timestamp": "2023-05-01T07:00:06+10:00", "lat": "-23.XXXX", "lon": "150.XXXX", "soundscapeId": "4XX155XX", "soundscapeStartTime": "3.0", "soundscapeEndTime": "6.0", "commonName": "Magpie-lark", "scientificName": "Grallina cyanoleuca", "algorithm": "alpha", "confidence": "0.7420777"}
May  1 07:01:14 BirdNET python3[2232720]: BirdWeather Submission Error:: Detection POST - HTTP Request Exception for https://app.birdweather.com/api/v1/stations/AxBxC/soundscapes?timestamp=2023-05-01T07:00:03+10:00, Retrying after 27.2s, Retry Attempt:5
May  1 07:01:41 BirdNET python3[2232720]: BirdWeather Submission:: Retry (5) - Detection Data - json: {"timestamp": "2023-05-01T07:00:06+10:00", "lat": "-23.XXXX", "lon": "150.XXXX", "soundscapeId": "4XX155XX", "soundscapeStartTime": "3.0", "soundscapeEndTime": "6.0", "commonName": "Magpie-lark", "scientificName": "Grallina cyanoleuca", "algorithm": "alpha", "confidence": "0.7420777"}
May  1 07:01:43 BirdNET python3[2232720]: BirdWeather Submission:: Retry (5) - Detection Successfully Uploaded - status:201 - 07:00:06/Magpie-lark/0.7420777

Any errors can be monitor for with grep as the Birdnet log can get a little busy
sudo grep "BirdWeather" /var/log/syslog

Closes #863

@jaredb7 jaredb7 marked this pull request as ready for review April 30, 2023 23:06
@jaredb7 jaredb7 changed the title WIP - Improved BirdWeather Submissions Improved BirdWeather Submissions Apr 30, 2023
Copy link

@gerritzen gerritzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think your code could do the same in like half the number of lines.

Maybe it's too late, maybe it's the hundreds of lines of nested indentation, but I'm not quite sure how two of the main problems are treated:

  1. How do you avoid race conditions? (Try to upload detection number 2 of one soundscape before number 1 is uploaded)
  2. How many threads do you spawn? You mentioned that you were able to launch up to 4, but I probably missed it. Or is it just one thread that works on the queue of submissions/soundscape uploads? If so, Doesn't this potentially create an enormous backlog if you wait something like 2 minutes per upload?

@@ -42,27 +43,51 @@
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)

BirdNET_GLOBAL_3K_MODEL_NAME = "BirdNET_GLOBAL_3K_V2.3_Model_FP16"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMHO, the model update should be addressed in a separate branch/PR. (It should have been part of commit c016427 anyway). Can you disentangle this and create a separate PR?

bw_soundscape_submission_cache_limit = 20
#
# Retry on these response codes
bw_request_status_retry_for = [429, 500, 502, 503, 504]

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you encountered other error codes than these? If not, what's the benefit of only retrying for those?

#
# Retry on these response codes
bw_request_status_retry_for = [429, 500, 502, 503, 504]
# Default timeout for requests

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd add "in seconds" or leave out the comment. Otherwise it doesn't give more information than the variable name itself.

# Open most recent Configuration and grab DB_PWD as a python variable
with open(userDir + '/BirdNET-Pi/scripts/thisrun.txt', 'r') as f:
this_run = f.readlines()
audiofmt = "." + str(str(str([i for i in this_run if i.startswith('AUDIOFMT')]).split('=')[1]).split('\\')[0])
priv_thresh = float("." + str(str(str([i for i in this_run if i.startswith('PRIVACY_THRESHOLD')]).split('=')[1]).split('\\')[0])) / 10
priv_thresh = float("." + str(
str(str([i for i in this_run if i.startswith('PRIVACY_THRESHOLD')]).split('=')[1]).split('\\')[0])) / 10

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the following, there are a lot of cosmetic changes that inflate your PR. As with the model, can you disentangle this so that a reviewer (for example me) doesn't have to check those changes? Maybe create another PR "beautifying the code".


# Determine the algorithm used
if model == BirdNET_GLOBAL_3K_MODEL_NAME:
post_data['algorithm'] = "2p2"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does 2p2 not stand for 2.2? At the moment the code publishes 2.3 detections as "2p2". This was introduced after a request by @timsterc, maybe he can comment on this.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs updating before it can be merged.

f"BirdWeather Submission Error:: {retry_text}Detection POST did not succeed, detection_url, detection_json or soundscape_id missing or not supplied",
flush=True)

else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same as 5 lines above.

print("BirdWeather Submission Error:: Soundscape POST did not succeed, detection was not submitted",
flush=True)

else:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same.

print(f"birdweather_soundscapes - mode:{mode} - for {soundscape_filename_to_find}")

# Default mode to search
if mode is None:

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment doesn't explain this, but an explanation of "mode" would be nice. Also,
def birdweather_soundscapes(mode='search', soundscape_filename_to_find, ... would do the same as this if clause

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It was going to be a function you could call without the mode to do a search but worked out differently.
Removed 1061

there's a small snippet in each in the if and elif that mentions briefly what the following code

time.sleep(1)


def birdweather_soundscapes(mode, soundscape_filename_to_find, bw_soundscape_id_for_filename=None):

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you give this function a more meaningful name?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

changed to birdweather_soundscape_id_cache

# Get the BirdWeather submission data, this is a dictionary containing the necessary data
bw_submission_data = bw_submission_queue.get()

# Verify we have data

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't line 1034 ensure this already?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

1304 just checks the actual if the queue has items in it, else calling .get() on a empty queue will just block. 1039 isn't really required though. removed.

@jaredb7
Copy link
Author

jaredb7 commented May 4, 2023

  • How do you avoid race conditions? (Try to upload detection number 2 of one soundscape before number 1 is uploaded)
    The soundscape upload and detection are a single item in the queue and processed as single block, once that completes processes the next queue item.
  • How many threads do you spawn? You mentioned that you were able to launch up to 4, but I probably missed it. Or is it just one thread that works on the queue of submissions/soundscape uploads? If so, Doesn't this potentially create an enormous backlog if you wait something like 2 minutes per upload?

Just the 1 thread spawned now, a backlog could eventuate if the soundscape upload is constantly timing out or if the user is on a connection with weak upload bandwidth (the time-out is set to try accommodate this). Outside of that the max time the retry loops sleep is 54 seconds at the last (6th) retry, this can changed around but it's more or less same method the URLib Retry class uses

I have considered a backlog occurring but not sure how you'd prevent it if you want also give the uploads the very best chance of getting out.

@jaredb7
Copy link
Author

jaredb7 commented May 8, 2023

@gerritzen I've rearranged and trimmed of any unneeded/left in by mistake code and run a few days just to be sure, looking better. do you have any further suggestions?

Copy link

@gerritzen gerritzen left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks a lot better! I haven't tested it, but apart from the "2p2" I saw no obvious error. If you say that it runs stably, then I'd give my thumbs up.


# Determine the algorithm used
if model == BirdNET_GLOBAL_3K_MODEL_NAME:
post_data['algorithm'] = "2p2"

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This needs updating before it can be merged.

continue
else:
print(f"BirdWeather Submission Error:: Soundscape POST - HTTP Request Exception! Cannot retry - {request_exc}", flush=True)
# break the loop on non-retryable status

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure if that's a word, but if it is, it's spelled "retriable" ;)

Copy link
Author

@jaredb7 jaredb7 Jun 19, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah it's definitely not a common place compound word but gets used here and there, I've mostly seen it around Microsoft Azure AD SSO integration and SQL Server errors.

Either "retryable" or "retriable" are correct spelling for basically the same word, though either way might be suitable depending on the context.

I've merged current main branch to get the new changes for the new models and algorithm

@ehpersonal38 ehpersonal38 marked this pull request as ready for review June 13, 2023 00:04
Run BirdWeather submissions in a separate thread to run in the background it doesn't hold up or slow down processing audio and detection's.

Instead of creating a new thread to process each Birdweather submission as they occur, add the data used for the Birdweather submission to a queue.

Then use a single thread running birdweather_submission_processor in a endless loop to process Birdweather submissions as they are added to that queue.

This avoids accidentally sending too many Birdweather submission at once, so we're only ever sending 1 submission at any one time.

Add timeout values to the requests as they will run indefinitely if no timeout is specified

Further, the ability to retry requests was implemented but only for connection timeouts and specific responses codes that can be retried on.

A exponential backoff delay is calculated before retrying again to provide adequate delay between requests and ease server pressure
@jaredb7 jaredb7 closed this by deleting the head repository Apr 12, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Analysis stops and needs many reboots
2 participants