Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unable to purge post after changing status from Draft to Published #223

Open
predaytor opened this issue Jun 14, 2023 · 19 comments
Open

Unable to purge post after changing status from Draft to Published #223

predaytor opened this issue Jun 14, 2023 · 19 comments
Assignees
Labels
status: actionable Ready for work to begin type: bug Issue that causes incorrect or unexpected behavior

Comments

@predaytor
Copy link

When the status of a post changes from Draft to Published, the query for specific post (project in the example) does not have an associated key as it did before. That is, we can't invalidate the keys, it will always be HIT until the Cache-Control expires.

Even updating the data in such a post does not purge anything because it no longer has an associated key.
All other events are working fine, as far as the post was not in Draft state (or restored from trash) and then turned to Published again.

Querying a completely new created post works fine.


Testing custom Project Post type (single post is present):

Query example (testing via Insomnia):

(projects) http://localhost/wp/graphql?query={projects{edges{node{id,title}}}}
(project) http://localhost/wp/graphql?query={project(id:"our-slug",idType:SLUG){id,title}}


First run:

projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDozNDU=

project query -> MISS
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query cG9zdDozNDU=


Change status to Draft:

[14-Jun-2023 09:21:47 UTC] (graphql_purge) key: cG9zdDozNDU=, event: post_DELETE, user: 1, page: /wp/wp-admin/admin-ajax.php
[14-Jun-2023 09:21:47 UTC] (graphql_purge) key: skipped:post, event: post_DELETE, user: 1, page: /wp/wp-admin/admin-ajax.php

projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project

project query -> MISS
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query


Change status back to Published:

[14-Jun-2023 09:22:52 UTC] (graphql_purge) key: list:project, event: post_CREATE, user: 1, page: /wp/wp-admin/admin-ajax.php

projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDozNDU=

project query -> HIT
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query


Post is updated:

[14-Jun-2023 09:26:35 UTC] (graphql_purge) key: cG9zdDozNDU=, event: post_UPDATE, user: 1, page: /wp/wp-admin/admin-ajax.php
[14-Jun-2023 09:26:35 UTC] (graphql_purge) key: skipped:post, event: post_UPDATE, user: 1, page: /wp/wp-admin/admin-ajax.php

projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDozNDU=

project query -> HIT
51404600df1e2ceed74fbd14251755e4b7958a369dc28ae551d6e5ff2a64831f graphql:Query


New post is published:

[14-Jun-2023 09:32:00 UTC] (graphql_purge) key: list:project, event: post_CREATE, user: 1, page: /wp-json/wp/v2/project/1038

projects query -> MISS
c42fab2f4dc81c75f2e4b6180f1b5d0cdfd0fb22f863231bebab2f2a02bd3e10 graphql:Query list:project cG9zdDoxMDM4 cG9zdDozNDU=

project query (new post)-> MISS
d9154add95f1fabfe6844d1716c23cd1b7ef0c4ca43d1ba867219e16859a88ed graphql:Query cG9zdDoxMDM4


purge-varnish-cache.php:

<?php

function purge_varnish_cache( $purge_keys ) {
    // Determine the URL to be purged based on server host
    $server_host = isset($_SERVER['HTTP_X_FORWARDED_HOST']) ? $_SERVER['HTTP_X_FORWARDED_HOST'] : $_SERVER['HTTP_HOST'];
    // The server must be running Varnish Proxy, we must be able to target the host with a PURGE request (http only)
    $purge_url = 'http://' . $server_host . '/';

    // Create headers for the request with the purge keys
    $headers = array(
        'Xkey: ' . $purge_keys
    );

    // Initialize and configure a cURL request
    $curl = curl_init();

    curl_setopt_array(
        $curl,
        array(
            CURLOPT_URL            => $purge_url,                  // URL to purge
            CURLOPT_CUSTOMREQUEST  => 'PURGE',                     // HTTP method as PURGE
            CURLOPT_RETURNTRANSFER => true,                        // Return response as string
            CURLOPT_HTTPHEADER     => $headers,                    // Set headers
        )
    );

    // Execute the cURL request
    curl_exec( $curl );

    // Check for errors and log error message if any
    if ( curl_errno( $curl ) ) {
        $error_message = curl_error( $curl );
        error_log( print_r( $error_message, true ) );
    }

    // Close the cURL session
    curl_close( $curl );
}

// Add an action hook for purging cache using WP Graphql Smart Cache plugin
add_action( 'graphql_purge', function ( $purge_keys ) {

    purge_varnish_cache( $purge_keys );

}, 10, 1 );

default.vcl Varnish config:

vcl 4.1;

import xkey;
import std;

backend default {
    .host = "app.internal";
    .port = "8080";
}

acl purge {
    "0.0.0.0"/0;  // Allow all IP addresses
    "::0"/0;      // Allow all IPv6 addresses
}

sub vcl_recv {
    # Remove empty query string parameters
    # e.g.: www.example.com/index.html?
    if (req.url ~ "\?$") {
        set req.url = regsub(req.url, "\?$", "");
    }

    # Remove port number from host header
    set req.http.Host = regsub(req.http.Host, ":[0-9]+", "");

    # Sorts query string parameters alphabetically for cache normalization purposes
    set req.url = std.querysort(req.url);

    # Remove the proxy header to mitigate the httpoxy vulnerability
    # See https://httpoxy.org/
    unset req.http.proxy;

    # Add X-Forwarded-Proto header when using https
    if (!req.http.X-Forwarded-Proto) {
        if(std.port(server.ip) == 443 || std.port(server.ip) == 8443) {
            set req.http.X-Forwarded-Proto = "https";
        } else {
            set req.http.X-Forwarded-Proto = "http";
        }
    }

    # Set the X-Forwarded-Host header to the proxy host
    set req.http.X-Forwarded-Host = "proxy-app.internal";

    # Purge logic to remove objects from the cache.
    if (req.method == "PURGE") {
        if (client.ip !~ purge) {
            return (synth(403, "Forbidden"));
        }

        if (req.http.xkey) {
            set req.http.n-gone = xkey.softpurge(req.http.xkey);
            return (synth(200, "Invalidated " + req.http.n-gone + " objects"));
        }
        return(synth(200, "Purged"));
    }

    # Only handle relevant HTTP request methods
    if (
        req.method != "GET" &&
        req.method != "HEAD" &&
        req.method != "PUT" &&
        req.method != "POST" &&
        req.method != "PATCH" &&
        req.method != "TRACE" &&
        req.method != "OPTIONS" &&
        req.method != "DELETE"
    ) {
        return (pipe);
    }

    # Remove tracking query string parameters used by analytics tools
    if (req.url ~ "(\?|&)(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=") {
        set req.url = regsuball(req.url, "&(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "");
        set req.url = regsuball(req.url, "\?(utm_source|utm_medium|utm_campaign|utm_content|gclid|cx|ie|cof|siteurl)=([A-z0-9_\-\.%25]+)", "?");
        set req.url = regsub(req.url, "\?&", "?");
        set req.url = regsub(req.url, "\?$", "");
    }

    # Only cache GET and HEAD requests
    if (req.method != "GET" && req.method != "HEAD") {
        set req.http.X-Cacheable = "NO:REQUEST-METHOD";
        return(pass);
    }

    # Mark static files with the X-Static-File header, and remove any cookies
    # X-Static-File is also used in vcl_backend_response to identify static files
    if (req.url ~ "^[^?]*\.(7z|avi|bmp|bz2|css|csv|doc|docx|eot|flac|flv|gif|gz|ico|jpeg|jpg|js|less|mka|mkv|mov|mp3|mp4|mpeg|mpg|odt|ogg|ogm|opus|otf|pdf|png|ppt|pptx|rar|rtf|svg|svgz|swf|tar|tbz|tgz|ttf|txt|txz|wav|webm|webp|woff|woff2|xls|xlsx|xml|xz|zip)(\?.*)?$") {
        set req.http.X-Static-File = "true";
        unset req.http.Cookie;
        return(hash);
    }

    # No caching of special URLs, logged in users and some plugins
    if (
        req.http.Cookie ~ "wordpress_(?!test_)[a-zA-Z0-9_]+|wp-postpass|comment_author_[a-zA-Z0-9_]+|woocommerce_cart_hash|woocommerce_items_in_cart|wp_woocommerce_session_[a-zA-Z0-9]+|wordpress_logged_in_|comment_author|PHPSESSID" ||
        req.http.Authorization ||
        req.url ~ "add_to_cart" ||
        req.url ~ "edd_action" ||
        req.url ~ "nocache" ||
        req.url ~ "^/addons" ||
        req.url ~ "^/bb-admin" ||
        req.url ~ "^/bb-login.php" ||
        req.url ~ "^/bb-reset-password.php" ||
        req.url ~ "^/cart" ||
        req.url ~ "^/checkout" ||
        req.url ~ "^/control.php" ||
        req.url ~ "^/login" ||
        req.url ~ "^/logout" ||
        req.url ~ "^/lost-password" ||
        req.url ~ "^/my-account" ||
        req.url ~ "^/product" ||
        req.url ~ "^/register" ||
        req.url ~ "^/register.php" ||
        req.url ~ "^/server-status" ||
        req.url ~ "^/signin" ||
        req.url ~ "^/signup" ||
        req.url ~ "^/stats" ||
        req.url ~ "^/wc-api" ||
        req.url ~ "^/wp-admin" ||
        req.url ~ "^/wp-comments-post.php" ||
        req.url ~ "^/wp-cron.php" ||
        req.url ~ "^/wp-login.php" ||
        req.url ~ "^/wp-activate.php" ||
        req.url ~ "^/wp-mail.php" ||
        req.url ~ "^/wp-login.php" ||
        req.url ~ "^\?add-to-cart=" ||
        req.url ~ "^\?wc-api=" ||
        req.url ~ "^/preview=" ||
        req.url ~ "^/\.well-known/acme-challenge/"
    ) {
        set req.http.X-Cacheable = "NO:Logged in/Got Sessions";

        if (req.http.X-Requested-With == "XMLHttpRequest") {
            set req.http.X-Cacheable = "NO:Ajax";
        }

        return(pass);
    }

    # Remove any cookies left
    unset req.http.Cookie;

    return(hash);
}

sub vcl_hash {
    if (req.http.X-Forwarded-Proto) {
        # Create cache variations depending on the request protocol
        hash_data(req.http.X-Forwarded-Proto);
    }
}

sub vcl_backend_response {
    # Inject URL & Host header into the object for asynchronous banning purposes
    set beresp.http.x-url = bereq.url;
    set beresp.http.x-host = bereq.http.host;

    # If we dont get a Cache-Control header from the backend
    # we default to 1h cache for all objects
    if (!beresp.http.Cache-Control) {
        set beresp.ttl = 1h;
        set beresp.http.X-Cacheable = "YES:Forced";
    }

    # If the file is marked as static we cache it for 1 day
    if (bereq.http.X-Static-File == "true") {
        unset beresp.http.Set-Cookie;
        set beresp.http.X-Cacheable = "YES:Forced";
        set beresp.ttl = 1d;
    }

    if (beresp.http.Set-Cookie) {
        set beresp.http.X-Cacheable = "NO:Got Cookies";
    } elseif (beresp.http.Cache-Control ~ "private") {
        set beresp.http.X-Cacheable = "NO:Cache-Control=private";
    }

    # The cached document should be "tagged" using the values returned by the x-graphql-keys header
    if (beresp.http.x-graphql-keys) {
        set beresp.http.xkey = beresp.http.x-graphql-keys;
    }
}

sub vcl_deliver {
    # Debug header
    if (req.http.X-Cacheable) {
        set resp.http.X-Cacheable = req.http.X-Cacheable;
    } elseif (obj.uncacheable) {
        if(!resp.http.X-Cacheable) {
            set resp.http.X-Cacheable = "NO:UNCACHEABLE";
        }
    } elseif (!resp.http.X-Cacheable) {
        set resp.http.X-Cacheable = "YES";
    }

    set resp.http.X-Cache-Hits = obj.hits;

    if (obj.hits > 0) {
        set resp.http.X-Cache = "HIT";
    } else {
        set resp.http.X-Cache = "MISS";
    }

    # Cleanup of headers
    unset resp.http.x-url;
    unset resp.http.x-host;
    unset resp.http.via;
    unset resp.http.x-varnish;
}
@jasonbahl
Copy link
Collaborator

@predaytor do you have GraphQL Object Caching enabled or disabled?

Since you're using varnish, make sure to disable the GraphQL Object Caching.

Can you confirm that it is disabled when you experience this behavior?

@predaytor
Copy link
Author

predaytor commented Jun 20, 2023

@jasonbahl, confirm, I have Object Caching disabled.

Graphql Smart Cache panel

@jasonbahl
Copy link
Collaborator

@predaytor so if I understand correctly, you're querying for a post by slug, but that post is a draft? So you get a null response and no node ID in the cache key response?

How would the client know about that slug if the draft post wasn't already publicly visible?

Can you help me understand more about the flow?

Are you using authenticated requests to query draft posts?

@predaytor
Copy link
Author

predaytor commented Jun 21, 2023

@jasonbahl not exactly. The post has a status of Draft - everything is fine, the query projects gets MISS, without this drafted post), the query project by slug is also (data is null).

But after changing that particular post's status back to Published - the projects query gets a MISS (displaying this post among others, all is fine), but the project query for that slug remains with null data.

Creating a new post (which I assume has no status at all) works fine, all queries get MISS (purge works).

So the problem is only with a query for a specific post (querying by slug in the example) that has a status of Published after being in a status of Draft.

Authorization headers are not used during request execution (using Insomnia GET requests).

@jasonbahl
Copy link
Collaborator

@predaytor ya, so my question is, how is the client making the query for the draft post by slug if the slug isn't known yet?

@predaytor
Copy link
Author

predaytor commented Jun 21, 2023

@jasonbahl, but I'm assuming that after changing the status to Published from Draft, the post should be publicly visible, including slug requests? Sorry if I'm not following correctly.

@jasonbahl
Copy link
Collaborator

Ya, so it seems like you're querying for a draft post by slug:

query { 
  project( id: "draft-project" ) {
    id
    title
  }
}

Since it's a draft, the query gets cached with null as the results, and no Node IDs are added to the cache keys because no nodes were resolved.

So, when the project is published and purge( 'list:post' ) is called, this query is not purged because it didn't have list:post or an individual node ID in the keys, so it remains cached.

So, my question remains, how does the client know about the "draft-project" slug if the draft project isn't publicly queryable?

Like what path is leading the client codebase to executing queries for draft projects by slug if the slug isn't actually known publicly because it's still a draft?

@jasonbahl
Copy link
Collaborator

This is an interesting scenario, though.

Perhaps we need to track nodes that are resolved under the hood but not returned.

Like in this case, the draft node (project:123, for example) was resolved under the hood, but was not returned in the response because a public user cannot see drafts.

But, perhaps we need to output those un-returned Nodes in the X-GraphQL-Keys for cases like this where a node transitions from draft to published and back 🤔

@predaytor
Copy link
Author

Oh, I get it. Will try to investigate more!

@predaytor
Copy link
Author

I have a page with a list of all projects with links to each of them (projects query). For example, one publication received the status of Draft. The following query shows the list without this draft (MISS).

I use varnish soft-purge which revalidate the cache and shows the new version on the next request.

I've also enabled prefetching (using the Astro framework) which pre-renders the HTML, styles, etc. of a particular page on hover in the background (ie our project query is running).

Or the user already knows a certain link to the project (in bookmarks, etc.).

So, if the post is Draft, the user received zero data and request got cached (MISS). Changing the status back to Published does not revalidate the request, the next request will still be a HIT, should be MISS.

@jasonbahl
Copy link
Collaborator

@predaytor ok, thanks for the info.

I think we might need to consider tracking ids of nodes that are resolved but not returned, as proposed here: #223 (comment)

🤔

@jasonbahl
Copy link
Collaborator

fyi, I'm actively working on this. I think I might have a solution. Will try to get an update out soon! 🤞🏻

@predaytor
Copy link
Author

Thanks 🙏

@jasonbahl jasonbahl added the type: bug Issue that causes incorrect or unexpected behavior label Sep 15, 2023
@Sinash
Copy link

Sinash commented Oct 3, 2023

Hi @jasonbahl, is there any update on this bug? I am having the same issue with draft and scheduled posts.

@jasonbahl
Copy link
Collaborator

I thought I had a solution in July, but didn't quite work the way I hoped so it needs to be re-visited.

Definitely interested in contributions if anyone has ideas on how to support this scenario.

Since draft posts are not public, their IDs are never exposed, so their ID is not tracked in the X-GraphQL-Keys, and therefore a cached query for a draft post will not be tagged with the ID of the post, and will not be purged from the cache when that post is published and purge( $postId ) is called.

I'm not sure the best way to handle this.

We could maybe output the IDs of nodes that aren't public in the X-GraphQL-Keys header potentially, allowing for a query for a draft that returns null to still be tagged with the ID of the draft post. 🤔

@jasonbahl
Copy link
Collaborator

My current thought is that if a query asks for node(s) and no nodes are returned, we could return a no-cache header.

🤔

@jasonbahl jasonbahl added the status: actionable Ready for work to begin label Oct 20, 2023
@jasonbahl jasonbahl self-assigned this Oct 20, 2023
@jasonbahl
Copy link
Collaborator

jasonbahl commented Oct 20, 2023

Another idea is that we could return a node:null cache key, and then call purge( 'node:null' ) whenever publish events occur.

This would allow queries with null responses to remain cached until some event purges them, which would likely benefit the server compared to not caching at all. 🤔

i.e.

{
  nodeByUri( uri: "non-existing-page" ) { ... }  
}

would be cached.

Subsequent hits to that query would be cache: HIT, and the server would not have to attempt to resolve it.

Whenever a publish event (publishing a post being the most common one) we'd call purge( "node:null" ) (similar to how we call purge( "list:$type" ) and that would clear these caches.

So it's a balance again of allowing these types of queries to benefit from caching, but also benefit from being cache miss when potentially related actions have occurred.

This also would allow us to NOT expose IDs of non-public nodes in headers, as ideated here: #223 (comment)

@jasonbahl
Copy link
Collaborator

👋🏻 This filter can be used:

add_filter( 'graphql_response_headers_to_send', function( $headers ) {
	$request = \WPGraphQL\Router::get_request();
	if ( ! $request instanceof \WPGraphQL\Request ) {
		return $headers;
	}
	$query_analyzer = $request->get_query_analyzer();
	if ( empty( $query_analyzer->get_runtime_nodes() ) ) {
		$headers['Cache-Control'] = 'no-cache, must-revalidate';
	}
	return $headers;
}, 20, 1 );

This will return a 'no-cache, must-revalidate' header if no nodes are resolved in the request.

Trade Offs

With this filter in place queries for things that are not nodes, such as WordPress settings, would not be cached with this strategy.

However, currently settings are not being properly invalidated once they are cached. . .so whether this is addressed or not there's a "bug" with querying settings and caching at the moment.

Related issues:

@predaytor
Copy link
Author

predaytor commented Feb 19, 2024

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: actionable Ready for work to begin type: bug Issue that causes incorrect or unexpected behavior
Projects
None yet
Development

No branches or pull requests

3 participants