Script for updating eprint records with ORCIDs #1

Open: wants to merge 2 commits into base: master

Conversation


@drn05r drn05r commented Aug 31, 2022

This is a script I have been using to update the ORCIDs for creators, editors or any other 'orcid_field' with the ORCID set for the user in their user record. It makes some assumptions by default:

  1. The _id sub-field (e.g. creators_id) will be the field to lookup against the user record.
  2. The email field of the user record will be the field that is looked up against.

Both these assumptions can be overridden.

I have also added a couple of other features: a --check option, which does not actually update any eprint records, and a --contributor-id option, which only updates the creator's, editor's, etc. ORCID field where the provided argument matches the value of the user record lookup field (i.e. email by default).
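
As a rough illustration of the behaviour described above (not the script itself), here is a minimal, self-contained sketch in plain Perl. It uses ordinary hashes in place of EPrints data objects. The function name `backfill_orcids`, the option names `check` and `contributor_id`, and the sample data are all hypothetical. It also assumes that contributors which already carry an ORCID are left untouched, which the description above does not state.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for user records, keyed by lowercased email.
# The ORCID values are sample identifiers, used purely for illustration.
my %users_by_email = (
    'alice@example.org' => { orcid => '0000-0002-1825-0097' },
    'bob@example.org'   => { orcid => undef },
);

# Fill in missing ORCIDs for one contributor list (e.g. creators).
# Options (hypothetical names, loosely mirroring the script's flags):
#   check          => count what would change without modifying anything
#   contributor_id => only consider contributors whose id matches this value
# Returns the number of contributors that were (or would be) updated.
sub backfill_orcids
{
    my( $contributors, $users, %opts ) = @_;

    my $changes = 0;
    foreach my $contrib ( @{$contributors} )
    {
        my $id = $contrib->{id};
        next unless defined( $id ) && length( $id );
        $id = lc( $id );

        # Restrict to a single contributor if requested.
        next if defined( $opts{contributor_id} ) && $id ne lc( $opts{contributor_id} );

        # Look the contributor up against the user records.
        my $user = $users->{$id};
        next unless $user && defined( $user->{orcid} ) && length( $user->{orcid} );

        # Assumption: an ORCID already on the contributor is left alone.
        next if defined( $contrib->{orcid} ) && length( $contrib->{orcid} );

        $changes++;
        $contrib->{orcid} = $user->{orcid} unless $opts{check};
    }
    return $changes;
}

my @creators = (
    { id => 'Alice@Example.org' },
    { id => 'bob@example.org' },
    { id => 'carol@example.org', orcid => '0000-0001-5109-3700' },
);

# Dry run first, then the real update.
my $would_change = backfill_orcids( \@creators, \%users_by_email, check => 1 );
my $changed      = backfill_orcids( \@creators, \%users_by_email );
```

In this sketch the dry run and the real run count the same contributors; only the real run writes the ORCID back into the list.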

@drn05r drn05r added the enhancement New feature or request label Aug 31, 2022
@drn05r drn05r requested a review from wfyson August 31, 2022 15:14
@wfyson
Collaborator

wfyson commented Aug 31, 2022

I've not looked in detail, but does this not do something very similar to

    $c->add_dataset_trigger( 'eprint', EPrints::Const::EP_TRIGGER_BEFORE_COMMIT, sub
    {
        my( %args ) = @_;
        my( $repo, $eprint, $changed ) = @args{qw( repository dataobj changed )};
        foreach my $role ( @{$c->{orcid}->{eprint_fields}} )
        {
            return unless $eprint->dataset->has_field( $role."_orcid" );
            my $contributors = $eprint->get_value( "$role" );
            my @new_contributors;
            my $update = 0;
            foreach my $c ( @{$contributors} )
            {
                my $new_c = $c;
                #get id and user profile
                my $email = $c->{id};
                $email = lc( $email ) if defined $email;
                my $user = EPrints::DataObj::User::user_with_email( $eprint->repository, $email );
                if( $user )
                {
                    #set the orcid if the user has one and the contributor does not
                    if( ( EPrints::Utils::is_set( $user->value( 'orcid' ) ) ) && !(EPrints::Utils::is_set( $c->{orcid} ) ) )
                    {
                        $update = 1;
                        $new_c->{orcid} = $user->value( 'orcid' );
                    }
                }
                push( @new_contributors, $new_c );
            }
            if( $update )
            {
                $eprint->set_value( "$role", \@new_contributors );
            }
        }
    }, priority => 50 );

which is a commit trigger that tries to add ORCIDs to various fields based on a lookup to see if the user has an ORCID?

@drn05r
Author

drn05r commented Aug 31, 2022

Now you mention it, I sort of remember this, but it only updates an eprint if it is already being updated. I guess this will fix Tomasz's problem on the eprints-tech list, although you would not know it was going to happen until you saved the eprint and checked back on the metadata. Also, it does not backfill old eprint records, which was the main reason I wrote the script. The script could run as a cron job, as it may have to update lots of eprints if a user with lots of publications adds an ORCID to their user profile. Also, this commit trigger assumes the email address is used for creators_id/editors_id.

@photomedia

So that commit trigger should fill in the ORCID for anything new that's added, IF the email matches the user email and permissions for ORCID are set? I need to clarify what the requirements are for this commit trigger to work. About this part: "Also this commit trigger assumes the email address is used for creators_id/editors_id" - what does that mean exactly, and how can I check if the email address is indeed used for creators_id/editors_id?

@wfyson
Collaborator

wfyson commented Sep 5, 2022

Hi @photomedia @drn05r, sorry for the delay getting back to you about this, I was on leave the last couple of days of last week!

The commit trigger I mentioned above will essentially check the various contributor fields for an eprint and see if it can match any users with an ORCID set, based on a lookup of their email address. As this commit trigger can be found in the repository's local archive (as opposed to core), it should be relatively simple to change which contributor field and user field are used to match contributors to users, but I agree this bit of configuration could be lifted out of the function itself.
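
To make the idea of lifting that configuration out of the function more concrete, here is a minimal sketch in plain Perl, using ordinary hashes rather than EPrints objects. The config keys `contributor_lookup_subfield` and `user_lookup_field`, the helper names, and the `staff_id` field are all hypothetical; neither plugin defines them as far as this thread shows.

```perl
use strict;
use warnings;

# Hypothetical configuration: which contributor sub-field is matched
# against which user field. The defaults mirror the trigger (id vs email).
my $conf = { orcid => {
    contributor_lookup_subfield => 'id',
    user_lookup_field           => 'email',
} };

# Build an index of users keyed by the configured user field (lowercased).
sub index_users
{
    my( $conf, @users ) = @_;

    my $field = $conf->{orcid}->{user_lookup_field};
    my %index;
    foreach my $user ( @users )
    {
        my $key = $user->{$field};
        next unless defined( $key ) && length( $key );
        $index{ lc( $key ) } = $user;
    }
    return \%index;
}

# Find the user matching a contributor via the configured sub-field.
# Returns undef when the contributor has no value or no user matches.
sub find_user_for_contributor
{
    my( $conf, $index, $contrib ) = @_;

    my $value = $contrib->{ $conf->{orcid}->{contributor_lookup_subfield} };
    return unless defined( $value ) && length( $value );
    return $index->{ lc( $value ) };
}

my @users = (
    { email => 'alice@example.org', staff_id => 'S100', orcid => '0000-0002-1825-0097' },
);

# Default behaviour: match the contributor's id against the user's email.
my $by_email = find_user_for_contributor(
    $conf, index_users( $conf, @users ), { id => 'ALICE@example.org' } );

# A repository that stores staff numbers in creators_id would only need
# to change the configuration, not the matching code.
$conf->{orcid}->{user_lookup_field} = 'staff_id';
my $by_staff_id = find_user_for_contributor(
    $conf, index_users( $conf, @users ), { id => 's100' } );
```

The point of the design is that the matching logic never names `email` or `id` directly, so a repository with a different identifier scheme would change two config values rather than edit the trigger.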

This is just an eprint trigger, however, and it looks like @drn05r's script will make sure all the eprints associated with a particular group of users are updated. I'd suggest this commit could be updated a bit to just commit the eprints it finds, rather than update the various contributor fields directly, as the aforementioned trigger should do this, e.g. replace lines 188-209 with the following:

$eprints->map( sub {
    my( $session, $eprint_dataset, $eprint, $params ) = @_;
    $eprint->commit;
});

However it also looks like all of this has been designed to operate within the context of the ORCID Support plugin, i.e. not the Advance plugin. The Advance plugin also has some steps in there to help ensure when a user connects to an ORCID for the first time, their eprints are updated: https://github.com/eprints/orcid_support_advance/blob/dd18186aac77f5187971f65be9e9aee69a183e98/cfg/cfg.d/z_orcid_support_advance.pl#L447-L474

Once again, this wouldn't work for users who already have an ORCID, although in theory, with the Advance plugin in use, such users should go and officially connect their repository user accounts with ORCID to ensure that only verified ORCIDs are pulled through to the eprints! If you wanted to update some older user records which had ORCIDs but hadn't been officially connected, however, a version of @drn05r's script above could be used again, but this time you'd just need to do a $user->commit for every user you wanted to process and the various triggers would take care of the rest.

I hope this all makes sense, and let me know if there are any further questions. I think the script is a worthwhile inclusion in the plugin, so I'm happy to merge it in, as it's a useful way to backfill ORCIDs if we're just adding them to users. We may want to simplify it a bit, though, to just recommit eprints rather than have another copy of the code that updates contributor fields here as well as in the trigger.

@photomedia

@wfyson Thank you for all of those explanations. I did not understand how it was working in the orcid_support_advance plugin, but now I think I do.
Can you please confirm if this is correct?
ORCIDs are associated with contributors/creators:

  • when a new eprint is committed and the email in the creator/contributor fields matches the email of a user who has an authenticated ORCID in the repository
  • when a new ORCID is authenticated and the newly authenticated ORCID account's email matches the email in existing creator/contributor fields of other eprints. In this second case, many existing eprints from the past are potentially updated with an ORCID.

If that is how it works, then theoretically, there may not be a need for a bin script to run, as any "new" authenticated ORCIDs back-fill existing eprints anyway using the trigger.

@wfyson
Collaborator

wfyson commented Oct 10, 2022

Hi @photomedia - yes, both of those points are correct:

  1. When an eprint is committed, if the creator/contributor can be associated with a user and that user has an ORCID, the ORCID will be added to the record (see code at:

    #automatic update of eprint contributor fields ($c->{orcid}->{eprint_fields})
    $c->add_dataset_trigger( 'eprint', EPrints::Const::EP_TRIGGER_BEFORE_COMMIT, sub
    {
        my( %args ) = @_;
        my( $repo, $eprint, $changed ) = @args{qw( repository dataobj changed )};
        foreach my $role ( @{$c->{orcid}->{eprint_fields}} )
        {
            return unless $eprint->dataset->has_field( $role."_orcid" );
            my $contributors = $eprint->get_value( "$role" );
            my @new_contributors;
            my $update = 0;
            foreach my $c ( @{$contributors} )
            {
                my $new_c = $c;
                #get id and user profile
                my $email = $c->{id};
                $email = lc( $email ) if defined $email;
                my $user = EPrints::DataObj::User::user_with_email( $eprint->repository, $email );
                if( $user )
                {
                    #set the orcid if the user has one and the contributor does not
                    if( ( EPrints::Utils::is_set( $user->value( 'orcid' ) ) ) && !(EPrints::Utils::is_set( $c->{orcid} ) ) )
                    {
                        $update = 1;
                        $new_c->{orcid} = $user->value( 'orcid' );
                    }
                }
                push( @new_contributors, $new_c );
            }
            if( $update )
            {
                $eprint->set_value( "$role", \@new_contributors );
            }
        }
    }, priority => 50 );
    ). By default this will look for users based on their email, but this could be configured.

  2. Past eprints are only updated when a user adds an authenticated ORCID with code from the ORCID Support Advance plugin (see code at: https://github.com/eprints/orcid_support_advance/blob/dd18186aac77f5187971f65be9e9aee69a183e98/cfg/cfg.d/z_orcid_support_advance.pl#L447-L474). In this instance a search is done for all eprints which have that ORCID set.

I think the reason why these two triggers exist in two different plugins is because of the slightly different nature of how the two plugins are designed to be used:

ORCID Support simply adds ORCID fields to the repository for eprint creators/contributors and users, along with a few handy functions to help get values populated and draw connections between eprints and users where possible.

ORCID Support Advance is designed for repositories to connect to the member API and is much more concerned with authenticated ORCIDs, hence it has functionality for backfilling eprints when a user comes along and connects their repository account with their ORCID account.

Technically there's no reason why the basic ORCID Support plugin shouldn't carry out this functionality too, but because it doesn't mandate authenticated ORCIDs coming from the member API, there's a risk users could accidentally pollute the eprint metadata with unauthenticated ORCIDs if the basic plugin allowed all of a user's eprints to be updated as soon as a user edited a field in their user account!

I hope this clears things up a bit! The way the plugins have evolved over time perhaps reveals some odd design decisions that need revisiting. Perhaps we should start considering merging the two plugins so there's just the one to worry about which has optional member API functionality!

3 participants