Resumable uploads failing #122

Open
MarkEdmondson1234 opened this issue May 22, 2020 · 18 comments
Labels
bug need-reprex Issues needing a reproducible example to fix

Comments

@MarkEdmondson1234
Collaborator

As reported in #120

@MarkEdmondson1234
Collaborator Author

MarkEdmondson1234 commented May 22, 2020

@BillPetti wrote:

I'm facing what I think is a similar issue, but in my case the upload is actually failing. I'm not asking it to find a resumable upload, but when I try to upload an updated file it appears to find one, hangs after reading about half of the file, and then I get this message:

<- HTTP/2 408 
<- content-type: text/html; charset=UTF-8
<- referrer-policy: no-referrer
<- content-length: 1557
<- date: Sat, 16 May 2020 16:10:06 GMT
<- alt-svc: h3-27=":443"; ma=2592000,h3-25=":443"; ma=2592000,h3-T050=":443"; ma=2592000,h3-Q050=":443"; ma=2592000,h3-Q049=":443"; ma=2592000,h3-Q048=":443"; ma=2592000,h3-Q046=":443"; ma=2592000,h3-Q043=":443"; ma=2592000,quic=":443"; ma=2592000; v="46,43"
<- 
2020-05-16 12:10:06 -- File upload failed, trying to resume...
2020-05-16 12:10:06 -- Retry 3 of 3
Error in gcs_retry_upload(upload_url = upload_url, file = temp, type = type) : 
  Must supply either retry_object or all of upload_url, file and type
Calls: gcs_upload ... do_upload -> do_resumable_upload -> gcs_retry_upload
In addition: Warning messages:
1: No JSON content detected 
2: In doHttrRequest(req_url, shiny_access_token = shiny_access_token,  :
  API checks failed, returning request without JSON parsing
Execution halted

And here's my original call:

gcs_upload(file = r_object,
           object_function = f,
           upload_type = 'simple',
           name = 'directory/file_name')

@MarkEdmondson1234
Collaborator Author

@BillPetti could you rerun the upload that fails with options(googleAuthR.verbose = 1) so we can get more logging info?

Also, what type of file is being uploaded - is it a big file, and/or an R list or data.frame?
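
For reference, a sketch of the rerun that would help - the same call as in the report above, with verbose logging switched on first:

# print the raw HTTP requests and responses made by googleAuthR
options(googleAuthR.verbose = 1)

gcs_upload(file = r_object,
           object_function = f,
           upload_type = 'simple',
           name = 'directory/file_name')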

@MarkEdmondson1234 MarkEdmondson1234 added the need-reprex Issues needing a reproducible example to fix label Jun 6, 2020
@LukasWallrich

I have a similar issue with a large RDS file (9 GB) - whenever I try to upload it, I get

gcs_upload("full_IAT_data_file.RDS", name = "full_IAT_data_file.RDS", bucket = "iat_data")
2020-12-16 21:26:30 -- File size detected as 9.8 Gb
2020-12-16 21:26:30 -- Found resumeable upload URL: https://www.googleapis.com/upload/storage/v1/b/iat_data/o/?uploadType=resumable&name=full_IAT_data_file.RDS&predefinedAcl=private&upload_id=ABg5-UyYJCKTjF10-whqQa3ohDt8ELcAFPXjzxLgutIt4xjqKMPnmq99595PIRLLCf_3ZnFubw2I2NqzaJwK0oQb8oZrL5og3w
2020-12-16 21:27:55 -- File upload failed, trying to resume...
2020-12-16 21:27:55 -- Retry 3 of 3
Error: Must supply either retry_object or all of upload_url, file and type

Rerunning it with options(googleAuthR.verbose = 1) ends with

<- HTTP/2 400 
<- x-guploader-uploadid: ABg5-Uw53IDqndX7BtNbfvHuWpplSzb37rmJkv-Isl7pVy5by8rUJRuFP60ATBwSSWTVowvkwZ73Usp4GumfQ11h0XA
<- content-type: application/json; charset=UTF-8
<- date: Wed, 16 Dec 2020 20:53:19 GMT
<- vary: Origin
<- vary: X-Origin
<- cache-control: no-cache, no-store, max-age=0, must-revalidate
<- expires: Mon, 01 Jan 1990 00:00:00 GMT
<- pragma: no-cache
<- content-length: 498
<- server: UploadServer
<- 
2020-12-16 20:53:19 -- File upload failed, trying to resume...
2020-12-16 20:53:19 -- Retry 3 of 3
Error: Must supply either retry_object or all of upload_url, file and type

Given that I am trying to save from Google Compute Engine within the same region, I thought I would give a simple upload a go - however, that fails because the upload limit option needs to be specified as an integer. Any other suggestions?
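
(For anyone else trying this route: gcs_upload_set_limit() takes the simple/resumable boundary in bytes as an R integer, hence the L suffix - and since R integers top out at .Machine$integer.max, about 2.1e9, a 9.8 GB file can't be forced through a simple upload this way anyway. A sketch:)

# raise the boundary to ~2 GB, close to the R integer maximum
gcs_upload_set_limit(upload_limit = 2000000000L)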

@LukasWallrich

Apparently, for me the issue was that I did not choose "Fine-grained: Object-level ACLs enabled" when creating the bucket. With that, the upload seems to work now. I'm not sure if that is a general limitation or down to how I created the JSON, but all seems well for now (though it might be worth documenting this, in case it is a common mistake). Many thanks for this helpful package (and I will be back if the issue reappears :)).

@MarkEdmondson1234
Collaborator Author

Thanks @LukasWallrich - this is a tricky one to pin down, as I need to find a failing example to replicate. I think in your case you were missing the new predefinedAcl parameter defined in #111:

gcs_upload(mtcars, bucket = "mark-bucketlevel-acl",
                   predefinedAcl = "bucketLevel")

Perhaps I can use this to test the above retry issue :)
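
Applied to the failing 9 GB upload above, that would be (a sketch reusing the file and bucket names from the report):

gcs_upload("full_IAT_data_file.RDS",
           name = "full_IAT_data_file.RDS",
           bucket = "iat_data",
           predefinedAcl = "bucketLevel")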

@jeremy-allen

@MarkEdmondson1234 I'm having the same or similar issue when I upload a batch of PDF files. I have a list of 500 pdf files that I upload via a for loop. Each time I do this a different subset of files fails, so I don't think it is an issue with the files themselves. You'll see in my script that I log which ones fail, then run the loop again on just those - many of them upload fine on round 2 - and then I do a round 3. I'll also include the logs so you can see the errors.

Upload script with three rounds of uploads

library(tidyverse)
library(fs)
library(googleCloudStorageR)


my_dir <- "<your dir here>"

write(x = as.character(Sys.time()), file = paste0(my_dir, "/log.txt"), append = TRUE)

# list files for upload
my_files <- dir_ls(
  path    = here::here("downloads"),
  glob    = "*.pdf",
  recurse = TRUE
) %>% unique()

total <- length(my_files)

# gcs_create_bucket(
#  "capitol-docs",
#  project_id,
#  location      = "US",
#  storageClass  = "STANDARD",
#  predefinedAcl = "publicRead",
#  predefinedDefaultObjectAcl = "bucketOwnerFullControl"
# )

# modify boundary between simple and resumable uploads
# By default the upload_type will be 'simple' if under 5MB and 'resumable' if over 5MB.
# Use gcs_upload_set_limit() to modify this boundary - you may want it smaller on slow
# connections, higher on faster ones. 'Multipart' upload is used if you provide
# object_metadata.
gcs_upload_set_limit(upload_limit = 2500000L)

#options(googleAuthR.verbose = 0)




 #---- ROUND 1: TRY TO UPLOAD ALL FILES ----

# upload
for (i in seq_along(my_files)) {
  
  skip_to_next <- FALSE
  closeAllConnections()
  Sys.sleep(.5)
  message("... ", i, " of ", total, " ... trying to uplod ",  path_file(my_files[i]))
 
  tryCatch(
    
   expr = 
   {
    gcs_upload(
     file = my_files[i],
     bucket = "capitol-docs",
     name = path_file(my_files[i]),
     predefinedAcl = "bucketLevel"
    )
   },
   error = function(e) {
    message("... Upload seems to have failed for ", i, ":\n")
    write(x = paste0(my_files[i], "\n", e), file = paste0(my_dir, "/log.txt"), append = TRUE)
    skip_to_next <<- TRUE
   }
   
  )

  if(skip_to_next) { next }
  
}

# check bucket contents
#bucket_contents <- gcs_list_objects("capitol-docs")
# delete contents
#map(bucket_contents$name, gcs_delete_object, bucket = "capitol-docs")

closeAllConnections()
gc()




#---- ROUND 2: TRY FAILED FILES AGAIN ----

# read back the failures logged in round 1 (the log was written to my_dir)
my_failed_files <- readr::read_lines(paste0(my_dir, "/log.txt")) %>% 
  as_tibble() %>% 
  filter(str_detect(value, "pdf$")) %>% 
  drop_na() %>% 
  pull(value)

new_total <- length(my_failed_files)

# upload
for (i in seq_along(my_failed_files)) {
  
  skip_to_next <- FALSE
  closeAllConnections()
  Sys.sleep(.5)
  message("... ", i, " of ", new_total, " ... trying to uplod ",  path_file(my_failed_files[i]))
  
  tryCatch(
    
    expr = 
      {
        gcs_upload(
          file = my_failed_files[i],
          bucket = "capitol-docs",
          name = path_file(my_failed_files[i]),
          predefinedAcl = "bucketLevel"
        )
      },
    error = function(e) {
      message("... Upload seems to have failed for ", i, ":\n")
      write(x = paste0(my_failed_files[i], "\n", e), file = paste0(my_dir, "/log2.txt"), append = TRUE)
      skip_to_next <<- TRUE
    }
    
  )
  
  if(skip_to_next) { next }
  
}




#---- ROUND 3: TRY FAILED FILES FROM ROUND 2 AGAIN ----

# read back the failures logged in round 2
my_failed_files2 <- readr::read_lines(paste0(my_dir, "/log2.txt")) %>% 
  as_tibble() %>% 
  filter(str_detect(value, "pdf$")) %>% 
  drop_na() %>% 
  pull(value)

new_total2 <- length(my_failed_files2)

# upload
for (i in seq_along(my_failed_files2)) {
  
  skip_to_next <- FALSE
  closeAllConnections()
  Sys.sleep(.5)
  message("... ", i, " of ", new_total2, " ... trying to uplod ",  path_file(my_failed_files2[i]))
  
  tryCatch(
    
    expr = 
      {
        gcs_upload(
          file = my_failed_files2[i],
          bucket = "capitol-docs",
          name = path_file(my_failed_files2[i]),
          predefinedAcl = "bucketLevel"
        )
      },
    error = function(e) {
      message("... Upload seems to have failed for ", i, ":\n")
      write(x = paste0(my_failed_files2[i], "\n", e), file = paste0(my_dir, "/log3.txt"), append = TRUE)
      skip_to_next <<- TRUE
    }
    
  )
  
  if(skip_to_next) { next }
  
}

Logs

Log 1

2021-03-25 16:44:00
/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/anderson_john_steven/anderson_indictment.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/caldwell_thomas_edward/caldwell_crowl_watkins_parker_parker_young_st.pdf
Error: Must supply either retry_object or all of upload_url, file and type

/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/caldwell_thomas_edward/caldwell_et_al_indictment.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/ciarpelli_albert_a/ciarpelli_statement_of_facts.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/crowl_donovan_ray/watkins_crowl_and_caldwell_indictment.pdf
Error: Must supply either retry_object or all of upload_url, file and type

/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/cudd_jenny_louise/cudd_rosa_statement_of_facts.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/evans_iii_treniss_jewell/evans_iii_statement_of_facts.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/fairlamb_scott_kevin/fairlamb_indictment.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/fairlamb_scott_kevin/fairlamb_scott_complaint_and_affidavit.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/griffin_couy/griffin_affidavit.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/griffin_couy/griffin_complaint.pdf
Error: Request failed before finding status code: HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/johnson_adam/johnson_statement_of_facts.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/montgomery_patrick/montgomery_complaint.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/montoni_corinne/montoni_affidavit.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/nalley_verden_andrew/calhoun_and_nalley_indictment.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/nichols_ryan/nichols_complaint.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/nordean_ethan_aka_ruffio_panman/nordean_complaint_and_affidavit.pdf
Error: Must supply either retry_object or all of upload_url, file and type

/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/norwood_iii_william_robert/norwood_complaint.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/norwood_iii_william_robert/norwood_indictment.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/packer_robert_keith/packer_statement_of_facts.pdf
Error: Must supply either retry_object or all of upload_url, file and type

Log 2

/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/caldwell_thomas_edward/caldwell_et_al_indictment.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^


/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/ciarpelli_albert_a/ciarpelli_statement_of_facts.pdf
Error: lexical error: invalid char in json text.
                                       <!DOCTYPE html> <html lang=en> 
                     (right here) ------^

Log 3

/Users/jeremyallen/Dropbox/Data/capitol-attack/downloads/caldwell_thomas_edward/caldwell_et_al_indictment.pdf
Error: Request failed before finding status code: HTTP/2 stream 0 was not closed cleanly: PROTOCOL_ERROR (err 1)

@MarkEdmondson1234
Collaborator Author

This upload should work; I should at least add better logging, such as the status code (you can see it via options(googleAuthR.verbose = 2)).

Do you need the PDFs uploaded as separate files? Just to work around your particular issue, you could look at gcs_save_all(), which zips a folder and uploads that instead.
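
A minimal sketch of that workaround, reusing the directory and bucket names from the script above:

# zip the whole downloads folder and upload it as a single object
gcs_save_all(directory = here::here("downloads"), bucket = "capitol-docs")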

@jeremy-allen

This upload should work; I should at least add better logging, such as the status code (you can see it via options(googleAuthR.verbose = 2)).

Do you need the PDFs uploaded as separate files? Just to work around your particular issue, you could look at gcs_save_all(), which zips a folder and uploads that instead.

I'll try the more verbose logging. I'll try a zip file, too.

@ben519

ben519 commented Jun 16, 2021

I've also been plagued by my uploads hanging. Finally found the solution today.

It seems that when you upload a file like gcs_upload(file = "foo.rds", bucket = "mybucket"), the file is automatically classified with "Restricted Access". You can see this under the Public Access column of the bucket list view in Google Cloud Storage.

Once this happens, you cannot overwrite the file (or at least I couldn't). For me, every attempt to overwrite the file using the same call to gcs_upload(file = "foo.rds", bucket = "mybucket") resulted in R hanging, waiting on a response.

The trick was to delete the file, then re-upload it using gcs_upload(file = "foo.rds", bucket = "mybucket", predefinedAcl = "bucketLevel"), in which case the Public Access is classified as "Not Public". At this point, I am able to overwrite foo.rds using that same call.
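
A minimal sketch of that workaround, with the same placeholder file and bucket names:

# remove the object that was created with the restrictive object-level ACL
gcs_delete_object("foo.rds", bucket = "mybucket")

# re-upload with the bucket-level ACL; the same call then also works for overwrites
gcs_upload(file = "foo.rds", bucket = "mybucket", predefinedAcl = "bucketLevel")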

[Screenshot: Cloud Storage bucket list view showing the Public Access column]

@MarkEdmondson1234
Collaborator Author

Ooooh thanks, that makes sense - so the resumable upload needs the same ACL permissions as the original upload, which would explain an uptick in these reports when GCS brought in bucket-level ACLs vs object-level.

Is there a change in the code that can be made to make this easier to avoid?

@ben519

ben519 commented Jun 16, 2021

Perhaps predefinedAcl = "bucketLevel" should be the default? Not sure what the implications of this would be.

@MarkEdmondson1234
Collaborator Author

I finally got a situation where I could make it fail and found a bug in the retry check, so it should at least attempt a retry now.

MarkEdmondson1234 added a commit that referenced this issue Jan 3, 2022
@stuvet
Contributor

stuvet commented Jul 22, 2022

Looks like I've run into this issue while using targets. Most targets were succeeding with repository = 'gcp' and the default predefined_acl = 'private', but larger files were failing unless I set predefined_acl = 'bucketLevel'.

A googleCloudStorageR-specific reprex is included below.

Setup

Standard Bucket, europe-west2-b region, Uniform access control, No public access, no versioning.

Centos 7 in GCP
R: 4.2.0
Targets: f37af16
Stantargets: 4ee5367
Cmdstan: 2.30.0
CmdstanR: 0.5.2.1
googleCloudStorageR: 0.7.0.9000 (updated after posting targets issue reprex).

Reprex

readRenviron('my_gcs.env')
library(googleCloudStorageR)
#> ✔ Setting scopes to https://www.googleapis.com/auth/devstorage.full_control and https://www.googleapis.com/auth/cloud-platform
#> ✔ Successfully auto-authenticated via my-server-key.json
#> ✔ Set default bucket name to 'my-default-bucket'
my_bucket <- "my-default-bucket"
# Create 5.7MB csv file
payload<-as.data.frame(matrix(rep(1, 3e6), nrow = 1e3))
write.csv(payload, tmpfile<-tempfile())

googleCloudStorageR::gcs_upload(tmpfile, bucket = my_bucket)
#> ℹ 2022-07-22 16:09:20 > File size detected as 5.7 Mb
#> ℹ 2022-07-22 16:09:20 > Found resumeable upload URL:  https://storage.googleapis.com/upload/storage/v1/b/my-default-bucket/o/?uploadType=resumable&name=tmpfile&predefinedAcl=private&upload_id=ADPycdu_o6vVcIQm5iH3g4JtJV5g6LCGPD3b6R9F5y2aZdUl7azw6ovQb1Af9xh4qMIyCapT-GhoRuN-S5-Iep4-h95tS68RC1C7
#> ℹ 2022-07-22 16:09:21 > File upload failed, trying to resume...
#> ℹ 2022-07-22 16:09:21 > Retry 3 of 3
#> Error in value[[3L]](cond): Couldn't get upload status

Created on 2022-07-22 by the reprex package (v2.0.1)

Comments

This looks like it may be a long-standing problem, so perhaps it is tough to resolve for all use cases? What would be a reasonable resolution?

Should the targets default permissions be 'private' or 'bucketLevel'? Perhaps the success of the other files uploaded with the default acl = 'private' is actually the bug? If it's not readily resolved at this end, it might be wise to give some guidance in the targets manual @wlandau - this is unexpected for a user, since most targets complete successfully with acl = 'private', so people will get frustrated when targets fail only occasionally.
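
For targets users hitting this in the meantime, a sketch of setting the ACL pipeline-wide - the parameter names follow the targets docs, but treat the exact tar_resources_gcp() signature as an assumption:

library(targets)

tar_option_set(
  repository = "gcp",
  resources = tar_resources(
    gcp = tar_resources_gcp(
      bucket = "my-default-bucket",   # placeholder bucket name
      predefined_acl = "bucketLevel"  # avoids the failures seen with the default 'private'
    )
  )
)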

@MarkEdmondson1234
Collaborator Author

MarkEdmondson1234 commented Jul 22, 2022

The default should be bucket level, I think, since it's by far the most convenient - the GCP interface nudges you in that direction when creating the bucket. That level of access is newer, though, which is why it wasn't the default before. There is some logic to retry the fetch with bucket-level permissions upon failure, since this is so common - I wonder why it hasn't triggered in your case.

I'm finishing writing a book at the moment so am behind on issues.

@MarkEdmondson1234
Collaborator Author

MarkEdmondson1234 commented Jul 22, 2022

Ok, the logic to retry with bucket-level permissions is only in place for getting objects, not for putting them in.
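
Until that's in the package, a minimal user-level sketch of the same fallback for uploads - the wrapper name is hypothetical, and this is not the package's internal retry logic:

# hypothetical helper: try the default ACL first, fall back to bucket-level on error
upload_with_acl_fallback <- function(file, bucket, ...) {
  tryCatch(
    gcs_upload(file = file, bucket = bucket, ...),
    error = function(e) {
      message("Upload failed with the default ACL, retrying with predefinedAcl = 'bucketLevel'")
      gcs_upload(file = file, bucket = bucket, predefinedAcl = "bucketLevel", ...)
    }
  )
}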

@stuvet
Contributor

stuvet commented Jul 22, 2022

I'll take a look at the retry logic and see if I can figure it out. It's the least I can do for all the hard work you've put into the targets integration - I was previously using GCP for everything but targets, so I had to add AWS -> GCP steps within the pipelines. Annoying!

@MarkEdmondson1234
Collaborator Author

Much appreciated! And glad to see the targets integration with GCP being used in the wild.

@Kvit

Kvit commented Sep 2, 2022

Using the predefinedAcl option, gcs_upload(predefinedAcl = "bucketLevel"), solves it for me.
