Compare values of two columns based on complex conditions #176

Open
josmos opened this issue Sep 18, 2023 · 3 comments
@josmos

josmos commented Sep 18, 2023

I have a rather complex function comparing (possibly partial) date strings:

library(stringr)  # provides str_starts()

compare_partial_dates <- function(date1, date2, missing_value_pattern = "nk", sep = ".") {
  no_y_pat <- paste(missing_value_pattern, missing_value_pattern, missing_value_pattern, sep = sep)  # nk.nk.nk (nothing known)
  no_m_pat <- paste(missing_value_pattern, missing_value_pattern, "", sep = sep)  # "nk.nk." prefix, matches nk.nk.%Y
  no_d_pat <- paste(missing_value_pattern, "", sep = sep)  # "nk." prefix, matches nk.%m.%Y
  if (is.na(date1) || is.na(date2)) {
    # missing date: no comparison possible
    return(TRUE)
  } else if (str_starts(date1, fixed(no_y_pat)) || str_starts(date2, fixed(no_y_pat))) {
    # nk.nk.nk.: no comparison possible
    return(TRUE)
  } else if (str_starts(date1, fixed(no_m_pat)) || str_starts(date2, fixed(no_m_pat))) {
    # missing month: set both dates to 01.01.%Y
    date1 <- paste("01", "01", substr(date1, nchar(date1) - 3, nchar(date1)), sep = ".")
    date2 <- paste("01", "01", substr(date2, nchar(date2) - 3, nchar(date2)), sep = ".")
  } else if (str_starts(date1, fixed(no_d_pat)) || str_starts(date2, fixed(no_d_pat))) {
    # missing day: set both dates to 01.%m.%Y
    date1 <- paste("01", substr(date1, nchar(date1) - 6, nchar(date1)), sep = ".")
    date2 <- paste("01", substr(date2, nchar(date2) - 6, nchar(date2)), sep = ".")
  }
  # convert to numeric date
  date1 <- as.Date(strptime(date1, format = "%d.%m.%Y", tz = "UTC"))
  date2 <- as.Date(strptime(date2, format = "%d.%m.%Y", tz = "UTC"))

  # print(paste(date1, operator, date2, sep = " "))
  # compare the numeric date values:
  return(date1 <= date2)
}

I have a lot of date columns to compare. Writing a rule with simple expressions for each column combination would be a mess.
Is it possible to make this comparison in validate using a function like this (or a similar one)? How could this be implemented?

@markvanderloo
Member

Hi there. For any function f(...) that returns a logical vector, you can create a rule like this:

rules <- validator( f(x,z) == TRUE)

If you need to compare, say, variables x and y to z, then you could use a variable group like so:

rules <- validator(
  G := var_group(x,y)
, f(G,z)
)
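
Since f has to return a logical vector, a function written for a single pair of values (one that branches with if on is.na(), for example) would need to be vectorised first. A minimal sketch in base R, assuming a scalar comparison along the lines of the one in the question; cmp_scalar is a hypothetical stand-in and not part of validate:

```r
# Hypothetical scalar comparison standing in for compare_partial_dates():
# returns TRUE when either date is missing, otherwise date1 <= date2.
cmp_scalar <- function(date1, date2) {
  if (is.na(date1) || is.na(date2)) {
    return(TRUE)
  }
  d1 <- as.Date(date1, format = "%d.%m.%Y")
  d2 <- as.Date(date2, format = "%d.%m.%Y")
  d1 <= d2
}

# Vectorised wrapper: applies cmp_scalar() to each pair of elements and
# returns one logical value per element.
f <- function(x, z) {
  unname(mapply(cmp_scalar, x, z))
}

# Illustrative data only
start <- c("01.02.2020", NA, "15.06.2021")
end   <- c("01.03.2020", "01.01.2020", "01.01.2021")
res <- f(start, end)
print(res)
```

A wrapper like this could then be used as f in the rules above.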

@markvanderloo
Member

The other option is to generate the rules in a file and read them later.

template <- "f(%s,z)"
txt <- paste(sprintf(template, some_vector_of_names), collapse="\n")
write(txt, file="rules.R")
rules <- validator(.file="rules.R")
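
To check what ends up in the generated file before handing it to validator(), something like the following base R sketch may help. The column names and the temporary file are assumptions for illustration only:

```r
# Hypothetical column names to be compared against z
date_cols <- c("start_date", "admission_date")

# One rule per column, built from the same kind of template
template <- "f(%s, z)"
rule_lines <- sprintf(template, date_cols)

# Write the rules to a temporary file and read them back for inspection
rule_file <- tempfile(fileext = ".R")
writeLines(rule_lines, rule_file)
generated <- readLines(rule_file)
print(generated)
```

The path in rule_file can then be passed on as validator(.file = rule_file).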

@akuhnle

akuhnle commented Oct 31, 2023

I have a similar issue. In a previous version I was able to use the inline function A %==% B within rules, but this no longer seems to be the case. Do I have to rewrite all rules that used this function to something like eq(A,B) == TRUE?


`%==%` <- function(e1, e2) {
  if (length(e1) == length(e2)) {
    # equal values, or both missing
    isEqual <- e1 == e2 | (is.na(e1) & is.na(e2))
    # a comparison with a one-sided NA counts as unequal
    isEqual[is.na(isEqual)] <- FALSE
    return(isEqual)
  } else {
    return(FALSE)
  }
}

Thanks
