Python Tales and Plone Stories

4teamwork

Debugging CSRF Protection False Positives in Plone

In light of the recent plone4.csrffixes security fix, I’d like to share some of our experiences in debugging and fixing CSRF protection false positives. Because plone.protect’s approach to automatic CSRF protection is pretty comprehensive (which is good), it can produce false positives: a dialog is shown to the user asking them to confirm their intent (to prevent the request forgery), even though no actual CSRF attack has occurred.

Confirm action dialog

We’ve been using the automatic CSRF protection from plone.protect 3.x with Plone 4 for a little more than half a year now, starting before it was officially supported. We therefore hit quite a few situations where we had to debug false positives caused by a write-on-read, both the ones in stock Plone 4 (which have now been addressed) and ones caused by our own add-ons.

Particularly because of the recently introduced HTTP_REFERER check, you should rarely hit those false positives any more, but if you do, here are some techniques for debugging and fixing them.

How plone.protect’s auto CSRF protection works

CSRF protection (automatic or manual) in Plone is done via plone.protect. Plone 4 used to pin plone.protect == 2.x, which put a basic framework for manual CSRF protection in place. plone.protect >= 3.0, which was targeted at Plone 5, then introduced automatic CSRF protection.

The automatic CSRF protection works by requiring requests to include a CSRF token that has been issued by the application.

The basic logic for checking a request is as follows:

If the user is authenticated AND the request caused a DB write, require a valid CSRF token.

If no valid CSRF token can be found, the transaction is aborted and the user is redirected to a confirmation page where they have to confirm their intent by clicking a button, which submits the original request again, this time including a valid token. (That confirmation page is only displayed for GET requests with responses of type text/html; for other requests the transaction is simply aborted.)
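The decision described above can be condensed into a few lines. This is a minimal sketch of the rule as summarized in this post, not plone.protect’s actual code: the function name check_request and its parameters are made up for illustration, and the special cases handled by ProtectTransform._check() are left out.

```python
def check_request(authenticated, wrote_to_db, token_valid,
                  method='GET', content_type='text/html'):
    """Return the outcome for a request: 'commit', 'confirm' or 'abort'.

    Hypothetical helper that mirrors the rule described in the text.
    """
    # Anonymous requests, read-only requests and requests that carry a
    # valid token all pass the check.
    if not authenticated or not wrote_to_db or token_valid:
        return 'commit'
    # Unprotected write: the transaction is aborted in any case, but only
    # GET requests with an HTML response get the confirmation page.
    if method == 'GET' and content_type.startswith('text/html'):
        return 'confirm'
    return 'abort'
```

Here 'confirm' stands for "abort and redirect to the confirmation page", while 'abort' means the transaction is dropped without a dialog.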

There are a few exceptions to this logic for handling special cases, but that’s the gist of it. For all the gory details, see the ProtectTransform._check() method in the plone.protect.auto module.

Now, in order to ensure that legitimate requests that modify the DB contain the required CSRF token, plone.protect applies a transform to (most) responses of content type text/html. This transform inserts a hidden field with a CSRF token into any <form> contained in that response. This means that for most cases where you modify the DB by submitting forms via a POST request, the necessary CSRF token will automatically have been included for you.
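To make the idea of the transform concrete, here is a deliberately simplified sketch. It uses a regular expression on a string, whereas the real transform works on the parsed response and has its own rules for which forms receive a token, so treat add_token_to_forms as a hypothetical stand-in. The only detail taken from plone.protect is the field name _authenticator.

```python
import re

# Matches an opening <form ...> tag, whatever its attributes are.
FORM_TAG = re.compile(r'(<form\b[^>]*>)', re.IGNORECASE)


def add_token_to_forms(html, token):
    """Insert a hidden token field right after every opening <form> tag.

    Simplified illustration only: the token is assumed to be safe to embed
    in HTML without escaping (e.g. a hex digest).
    """
    hidden = '<input type="hidden" name="_authenticator" value="%s"/>' % token
    # A function is used as replacement so the token is inserted literally.
    return FORM_TAG.sub(lambda match: match.group(1) + hidden, html)
```

A response without any <form> element passes through unchanged, which matches the observation that plain links do not get a token automatically.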

What plone4.csrffixes does

The plone4.csrffixes package addresses the ZMI vulnerabilities mentioned in the security advisory by pulling in plone.protect >= 3.x instead of a default Plone 4’s plone.protect == 2.x. This will enable the automatic CSRF protection mentioned above for your entire Plone site and the Zope Application Root.

The actual code in the plone4.csrffixes package just contains the changes necessary to avoid false positives with automatic CSRF protection in Plone 4.

So to summarize, the actual automatic CSRF protection happens in plone.protect >= 3.0, while the code in plone4.csrffixes is just Plone 4 compatibility glue to avoid false positives.

How to debug CSRF false positives

Triggering the CSRF protection intentionally

During development and/or debugging, it can be helpful to be able to trigger the CSRF protection dialog intentionally, for example in order to:

  • See what your users see if they do get a CSRF protection dialog
  • Style that dialog and test potential overrides to the template
  • Test your own CSRF protection debugging tools
  • Test logging of CSRF incidents

For simple cases, you can call the @@confirm-action view directly and supply it with a value for original_url, for example by visiting http://localhost:8080/Plone/@@confirm-action?original_url=foo

But there may be situations where you really want to put the automatic CSRF protection through its paces during development or debugging. For that purpose, a simple view does the trick:

Code: trigger.py (gist)

Simply call that view by visiting http://localhost:8080/Plone/@@trigger-csrf directly (i.e. not via a link from your actual Plone site, otherwise the HTTP referrer check will whitelist the write).
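The gist itself is not reproduced in this text. Judging from the stack trace shown further down (a __call__() that calls _do_write(), which in turn sets self.context.myattr = 'foo'), such a view could look roughly like the following sketch. The class name TriggerCSRF and the return value are assumptions, and the view would still have to be registered under the name trigger-csrf (for example via ZCML), which is not shown. The import fallback only exists so the snippet can be run outside a Plone environment.

```python
try:
    from Products.Five.browser import BrowserView
except ImportError:
    # Fallback stand-in so the sketch also runs outside Zope/Plone.
    class BrowserView(object):
        def __init__(self, context, request):
            self.context = context
            self.request = request


class TriggerCSRF(BrowserView):
    """Debugging view that deliberately writes to the DB on a GET request."""

    def __call__(self):
        self._do_write()
        return 'Done'

    def _do_write(self):
        # Setting an attribute on a persistent object marks it as modified,
        # which is exactly what the automatic CSRF protection looks for.
        self.context.myattr = 'foo'
```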

Inspecting _registered_objects

If you’re developing an add-on, and you suddenly get hit by the CSRF protection dialog, indicating that there is an unexpected write-on-read, it can be quite tricky to figure out where that write is happening.

In order to determine the root cause, you first need to understand one detail about how plone.protect’s automatic CSRF protection works: The ProtectTransform in plone.protect.auto looks at conn._registered_objects to determine whether a database write occurred in the current transaction or not. You can basically think of _registered_objects as the list of dirty objects in a ZODB connection: The first time an object is modified, it gets added to _registered_objects.
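The "list of dirty objects" idea can be illustrated with a toy model in plain Python. ToyConnection and ToyPersistent are invented for this sketch and only imitate the behaviour described above; the real ZODB classes do considerably more.

```python
class ToyConnection(object):
    """Keeps a list of objects modified in the current transaction."""

    def __init__(self):
        self._registered_objects = []

    def register(self, obj):
        self._registered_objects.append(obj)


class ToyPersistent(object):
    """Registers itself with its connection on the first modification."""

    _p_jar = None        # the connection this object belongs to
    _p_changed = False   # has the object been modified already?

    def __setattr__(self, name, value):
        object.__setattr__(self, name, value)
        if name.startswith('_p_'):
            return  # bookkeeping attributes do not count as writes
        if not self._p_changed and self._p_jar is not None:
            object.__setattr__(self, '_p_changed', True)
            self._p_jar.register(self)


conn = ToyConnection()
doc = ToyPersistent()
doc._p_jar = conn

doc.title = 'first write'    # registers doc with the connection
doc.title = 'second write'   # already dirty, not registered again
```

After both assignments the toy connection’s _registered_objects contains doc exactly once, which is why a single stray attribute assignment during a read is enough to make a request count as a write.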

This allows you to easily get hold of the offending objects (those that have been written to) by setting a PDB breakpoint in plone.protect.auto and then inspecting self._registered_objects() (ignoring the ones listed in safe_oids).

Depending on the modified object(s), that may or may not give you a clue to which part of the code base actually modified that object. In my experience, more often than not, you’re just sitting there and staring at a __dict__ dump or some annotations, not even knowing what on the object got changed, let alone where from.

For situations like these, there are two approaches that have helped me in the past.

Tracking down references to the object

Using Python’s garbage collector, you can get a list of references to any value. Simply call gc.get_referrers(obj) and you’ll get a list of referrers:

(Pdb) import gc
(Pdb) obj = self._registered_objects()[0]
(Pdb) pp gc.get_referrers(obj)

Note that this list will also include your local scope and any references you created while debugging, for example while in PDB.

This approach is usually a long shot, but it might for example help you figure out that some persistent data structure is referenced from some annotations, and you then can follow that up by tracking down references to those annotations, hopefully finding their context.
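Outside of a debugging session, the same technique can be tried with plain Python objects. In this small sketch the Journal class and the annotations dict are stand-ins invented for illustration; the point is only that the container holding the object shows up among its referrers.

```python
import gc


class Journal(object):
    """Stand-in for a persistent object that shows up as modified."""


journal = Journal()
# A container that holds on to the object, similar to an annotations mapping.
annotations = {'my.addon.journal': journal}

referrers = gc.get_referrers(journal)
# The list also contains the namespace that holds the name `journal`
# itself, so look specifically for the container we are interested in.
found = any(ref is annotations for ref in referrers)
```

From the container you can then repeat the call (gc.get_referrers(annotations)) to walk one step further up the reference chain.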

But still, this doesn’t really lead you to the code causing the write in a reliable, systematic way, which is why I often go straight for the second approach.

Tracing object registrations

In order to capture writes to persistent objects, I wrote a helper that intercepts calls to a ZODB Connection’s register() method using a call trace function, and dumps a stack trace that should include the stack frame where the DB write was caused.

Code: trace_register.py (gist)

Since this will dump a stack trace every time an object gets first written to, I implemented this helper as a context manager, so it can be used with as narrow a scope as possible (the context manager removes the call trace function in its __exit__() method).
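The gist is not included in this text, so here is a minimal sketch of how such a helper could be built with sys.settrace(). Only the name TraceObjectRegistrations and the tb_limit parameter come from this post; the owner and out parameters are hypothetical additions, and matching on a method called register with an argument called obj is an assumption about the Connection’s signature.

```python
import sys
import traceback


class TraceObjectRegistrations(object):
    """Context manager that dumps a stack trace for calls to register()."""

    def __init__(self, tb_limit=5, owner=None, out=None):
        self.tb_limit = tb_limit
        self.owner = owner  # optional class whose register() we care about
        self.out = out if out is not None else sys.stderr
        self._previous = None

    def _trace(self, frame, event, arg):
        # The global trace function receives a 'call' event for every new
        # Python frame; only frames of functions named 'register' matter.
        if event != 'call' or frame.f_code.co_name != 'register':
            return None
        receiver = frame.f_locals.get('self')
        if self.owner is not None and not isinstance(receiver, self.owner):
            return None
        caller = frame.f_back
        self.out.write('DB write to %r from "%s", line %d\n' % (
            frame.f_locals.get('obj'),
            caller.f_code.co_filename, caller.f_lineno))
        # Print the innermost tb_limit frames, ending at the caller.
        traceback.print_stack(f=caller, limit=self.tb_limit, file=self.out)
        return None  # no line-by-line tracing inside register()

    def __enter__(self):
        self._previous = sys.gettrace()
        sys.settrace(self._trace)
        return self

    def __exit__(self, exc_type, exc_value, tb):
        # Always restore whatever trace function was active before.
        sys.settrace(self._previous)
        return False
```

Passing the connection class as owner narrows the output to that class, since other code may also define functions named register.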

So with the @@trigger-csrf view from above, you’d use it like this:

    with TraceObjectRegistrations(tb_limit=5):
        self._do_write()

Visiting that view will then dump a stack trace like the following:

==================================================================
DB write to <PloneSite at Plone> (0x11) from "trigger.py", line 16
==================================================================
  File ".../ZPublisher/Publish.py", line 138, in publish
    request, bind=1)
  File ".../ZPublisher/mapply.py", line 77, in mapply
    if debug is not None: return debug(object,args,context)
  File ".../ZPublisher/Publish.py", line 48, in call_object
    result=apply(object,args) # Type s<cr> to step into published object.
  File "trigger.py", line 11, in __call__
    self._do_write()
  File "trigger.py", line 16, in _do_write
    self.context.myattr = 'foo'

2015-10-11 17:02:17 INFO plone.protect aborting transaction due to no CSRF protection on url http://localhost:8080/Plone/@@trigger-csrf

This should show you the exact location in your code where the DB write was caused.

Notes:

  • This helper is intended for DEBUGGING, not for use in production!
  • This helper will dump stack traces for all calls to register(), not just the ones triggering the confirmation dialog

How to fix your code

Don’t cause writes in GET requests

If you can at all avoid it, don’t cause DB writes in your GET requests.

While plone.protect tries to insert CSRF tokens automatically for you in forms, it can’t really do so for plain <a href="..."> links without making all your URLs look hideous.

With GET requests there’s also the possibility of leaking the CSRF token, for example through log files or the HTTP_REFERER (unless your site strictly enforces HTTPS).

OWASP has the following to say about Disclosure of Token in URL:

The ideal solution is to only include the CSRF token in POST requests and modify server-side actions that have state changing affect to only respond to POST requests. This is in fact what the RFC 2616 requires for GET requests. If sensitive server-side actions are guaranteed to only ever respond to POST requests, then there is no need to include the token in GET requests.
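One way to follow that advice is to have state-changing actions refuse anything but POST. The following is a minimal sketch with made-up names (MethodNotAllowed, require_post, delete_item), in which the request is modelled as a plain dict with a REQUEST_METHOD key; it is not a plone.protect API.

```python
class MethodNotAllowed(Exception):
    """Hypothetical stand-in for the error your framework would raise."""


def require_post(request):
    """Refuse to continue unless the request was sent via POST."""
    method = request.get('REQUEST_METHOD', 'GET').upper()
    if method != 'POST':
        raise MethodNotAllowed('Request must be POST, got %s' % method)


def delete_item(container, item_id, request):
    """A state-changing action that only ever responds to POST."""
    require_post(request)
    del container[item_id]
```

With a guard like this in place, a GET request never reaches the code that writes, so no token has to be carried in a URL.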

Provide CSRF tokens for the GET requests that need them

If you still have a GET request that needs to write, you’ll need to make sure it includes an authenticator token. For that you can use the addTokenToUrl() helper function from plone.protect:

from plone.protect.utils import addTokenToUrl

url = addTokenToUrl(url)
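To show what the resulting URL looks like, here is a simplified stand-in that appends the token as an _authenticator query parameter. The helper name add_token and its explicit token argument are assumptions for illustration; the real function obtains the token itself, and this sketch ignores URL fragments and any checks the real helper may perform.

```python
def add_token(url, token):
    """Append the CSRF token as a query parameter (simplified sketch)."""
    # Use '&' if the URL already has a query string, '?' otherwise.
    separator = '&' if '?' in url else '?'
    return '%s%s_authenticator=%s' % (url, separator, token)


plain = add_token('http://localhost:8080/Plone/@@do-something', 'abc123')
with_query = add_token('http://localhost:8080/Plone/@@do-something?x=1', 'abc123')
```

The token ends up in the URL itself, which is exactly why the leakage concerns mentioned above apply to this approach.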

Avoiding writes on read

Lazy initialization of persistent data structures

One common source of a write-on-read is lazy initialization of persistent data structures. Consider the following code:

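from persistent.dict import PersistentDict
from zope.annotation.interfaces import IAnnotations

CONFIG_KEY = 'foo-config'

def get_config(self):
    annotations = IAnnotations(self.context)
    # Lazy initialization: the first read creates the structure (a DB write).
    if CONFIG_KEY not in annotations:
        annotations[CONFIG_KEY] = PersistentDict()
    return annotations[CONFIG_KEY]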

Because annotations[CONFIG_KEY] is lazily initialized the first time it’s accessed, the first request that merely reads the config will cause a DB write.

While code like this is generally easier to write, it should be avoided because not only could it trigger the automatic CSRF protection, it can also negatively impact performance and lead to write conflicts.

Some alternatives are:

  • Account for the possibility of the value not existing yet in places where you attempt to access it, and handle that case accordingly.
  • Ensure that persistent data structures like these are being initialized upon object creation, and write the necessary upgrade steps to initialize them on existing objects.
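The first alternative could look like the following sketch, in which reading never creates anything and only the write path initializes the structure. A plain dict stands in for the annotations here, and the function names are chosen for illustration; in the ZODB the stored value would be a PersistentDict.

```python
CONFIG_KEY = 'foo-config'


def get_config(annotations):
    """Read the config without ever writing to ``annotations``."""
    # Fall back to an empty, throwaway dict if nothing was stored yet.
    return annotations.get(CONFIG_KEY, {})


def set_config_value(annotations, name, value):
    """Write path: the only place where the structure gets created."""
    if CONFIG_KEY not in annotations:
        annotations[CONFIG_KEY] = {}  # would be a PersistentDict in the ZODB
    annotations[CONFIG_KEY][name] = value
```

One thing to keep in mind with this design: the fallback returned by get_config() is not stored anywhere, so callers must not modify it and expect the change to persist.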

Intended writes-on-read

There are also some cases where you actually may want to write some data during an operation that looks like it should be read-only. One case that comes to mind is logging access to resources:

Say you need to keep a journal for your file-like resources that tracks username and timestamp for every download of that file. Unless you store that information outside the ZODB (e.g. on a file system log, in which case you’ll lose transactionality), you will inevitably cause a DB write on every download.

If for some reason you can’t make these actions use POST requests, and can’t include authenticator tokens in the GET request, you’ll have to whitelist those writes.

Whitelisting known writes

If you do find yourself in a situation where you still have a write-on-read you can’t eliminate, you can whitelist that write in one of two ways:

Whitelisting a persistent object that’s being written to

plone.protect includes a function safeWrite(obj, request=None) that whitelists a specific persistent object as a safe write by adding it to a list of whitelisted objects. Those objects are then skipped when _registered_objects() is checked for modified objects.

from plone.protect.auto import safeWrite

safeWrite(myobj, request)

This is pretty much the approach we came up with while using plone.protect == 3.x with Plone 4 when it wasn’t officially supported yet. It works well, and allows you to precisely whitelist specific objects, as opposed to disabling CSRF protection completely for the entire request.
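The whitelisting idea itself is simple enough to model in a few lines. This toy sketch is not how plone.protect stores its whitelist; ToyObject, safe_write and unexpected_writes are invented names. The _p_oid attribute mirrors the object id that persistent objects carry.

```python
class ToyObject(object):
    """Stand-in for a persistent object with an object id."""

    def __init__(self, oid):
        self._p_oid = oid


def safe_write(obj, safe_oids):
    """Toy counterpart of safeWrite(): remember the object's oid."""
    safe_oids.add(obj._p_oid)


def unexpected_writes(registered_objects, safe_oids):
    """Return the modified objects that have not been whitelisted."""
    return [obj for obj in registered_objects
            if obj._p_oid not in safe_oids]


journal = ToyObject(oid=1)
document = ToyObject(oid=2)
safe_oids = set()
safe_write(journal, safe_oids)

# Both objects were modified, but only `document` counts as a write.
remaining = unexpected_writes([journal, document], safe_oids)
```

If the remaining list is empty, the request is treated as if no write had happened, while any other modified object still triggers the protection.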

Completely disabling CSRF protection for certain requests

If nothing else works, you have the ability to completely disable automatic CSRF protection for an entire request by having it provide the IDisableCSRFProtection interface:

from plone.protect.interfaces import IDisableCSRFProtection
from zope.interface import alsoProvides
alsoProvides(request, IDisableCSRFProtection)

However, this should only be used as a last resort, since it makes the request susceptible to CSRF attacks, and requires you to perform any CSRF protection yourself.

You may be tempted to use this in your functional or integration tests, but even in tests, I’d recommend using IDisableCSRFProtection sparingly: your tests are an excellent way to discover writes-on-read you didn’t know about, and they allow you to fix the issue (a false positive) before it reaches production and potentially scares or annoys customers.

Closing thoughts

Overall I really appreciate the backport of plone.protect’s automatic CSRF protection to Plone 4. This is something that might not be obvious from the security advisory: Not only have some ZMI issues been fixed, but we also get full, automatic CSRF protection for all our forms, custom or not. In my opinion that’s a huge win, even though there may be a couple of integration issues until people get used to dealing with the automatic CSRF protection.
