
shibboleth-dev - [Shib-Dev] [IdPv3] Clustering & Data Storage


  • From: Chad La Joie <>
  • To: Shib Dev <>
  • Subject: [Shib-Dev] [IdPv3] Clustering & Data Storage
  • Date: Wed, 2 Mar 2011 12:59:49 +0100

Another discussion that we had during the developers' face-to-face
meeting regarded clustering and data storage. I just wanted to
provide the highlights to the list.

First, initial testing of Infinispan is going well. It's clear that
network splits and merges will still be an issue. There are three
cases to consider:
1) An item was added to one partition but not the other. In this case
we'll simply add the new item to the merged collection.
2) An existing item was removed from one partition but not the other.
This case actually ends up being indistinguishable from the first case,
and that's okay. After the merge the removed item will show back up
again but the eviction policies will eventually kick in and dump it.
Until then we're just wasting a bit of memory. In addition, the
largest data items have the shortest lifetimes, so cruft shouldn't
accumulate too badly.
3) An existing item was modified in both partitions. In this case
we'll keep around the most recently modified item.
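The three cases above reduce to a last-writer-wins union of the two partitions. A minimal sketch in Java of that merge rule (the `Item` and `merge` names are illustrative, not Infinispan's actual API):

```java
import java.util.HashMap;
import java.util.Map;

public class PartitionMerge {
    /** A stored item carries a last-modified timestamp so merges can pick a winner. */
    static final class Item {
        final String value;
        final long lastModified;
        Item(String value, long lastModified) {
            this.value = value;
            this.lastModified = lastModified;
        }
    }

    /** Union of both partitions; on conflict, keep the most recently modified item. */
    static Map<String, Item> merge(Map<String, Item> a, Map<String, Item> b) {
        Map<String, Item> merged = new HashMap<>(a);
        for (Map.Entry<String, Item> e : b.entrySet()) {
            Item mine = merged.get(e.getKey());
            if (mine == null || e.getValue().lastModified > mine.lastModified) {
                // Case 1 (new item) or case 3 (newer write wins); case 2 falls
                // out as case 1 and is cleaned up later by eviction.
                merged.put(e.getKey(), e.getValue());
            }
        }
        return merged;
    }

    public static void main(String[] args) {
        Map<String, Item> p1 = new HashMap<>();
        p1.put("a", new Item("stale", 100));      // modified in both partitions (case 3)
        p1.put("b", new Item("only-in-p1", 50));  // added here, or removed there (cases 1/2)
        Map<String, Item> p2 = new HashMap<>();
        p2.put("a", new Item("fresh", 200));

        Map<String, Item> merged = merge(p1, p2);
        System.out.println(merged.get("a").value); // prints "fresh"
        System.out.println(merged.size());         // prints 2
    }
}
```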

We then discussed the issue of "what data are we storing in the
cluster and what happens if things go horribly wrong and all that data
is lost?" I said I'd look into it since I couldn't recall everything
that currently hits the cluster. Here's the information on that.

Replay Cache
This stores short-lived data about which messages have already been
accepted by the IdP so that they can't be replayed. None of this data
is persistent so if it all goes away nothing really bad happens. This
data, in fact, need not even be replicated but doing so arguably
increases security a bit.
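A replay cache like this is essentially a check-and-remember set with expiry. A hypothetical sketch (the class and method names are mine, not the IdP's actual code):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

public class ReplayCache {
    // messageId -> absolute expiry time in millis; expired entries are purged lazily.
    private final Map<String, Long> seen = new ConcurrentHashMap<>();

    /** Returns true on first sighting of a message, false if it is a replay. */
    public boolean checkAndRemember(String messageId, long lifetimeMillis) {
        long now = System.currentTimeMillis();
        seen.values().removeIf(expiry -> expiry < now); // evict expired entries
        return seen.putIfAbsent(messageId, now + lifetimeMillis) == null;
    }

    public static void main(String[] args) {
        ReplayCache cache = new ReplayCache();
        System.out.println(cache.checkAndRemember("msg-1", 60_000)); // true: first time
        System.out.println(cache.checkAndRemember("msg-1", 60_000)); // false: replay
    }
}
```

Because nothing here is persistent, losing the whole map just means the cache starts empty again, which matches the "nothing really bad happens" property above.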

Artifact Mapping
This information maps an artifact to the data to be sent back during a
resolution. It must be replicated across the cluster.

Transient ID Mapping
This data maps a transient ID to an internal user identifier. Scott
has added the ability to create "reversible" transient IDs, similar to
IdP 1.3 crypto handles. So this data need not be clustered, but doing
so will speed things up a bit. Additionally, this data could be stored
in the user's session, so whatever mechanism is used to cluster
sessions would cover this as well. If this data is lost nothing bad
happens.
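One way to make a transient ID "reversible" in the crypto-handle sense is to encrypt the internal identifier under a key shared by all nodes, so any node can recover the user without a lookup table. This is only an illustrative sketch under that assumption, not Scott's actual implementation; it uses AES-GCM with a random IV so the IDs are non-deterministic and tamper-evident:

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.security.SecureRandom;
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

public class ReversibleTransientId {
    private final SecretKey key;
    private final SecureRandom random = new SecureRandom();

    ReversibleTransientId(SecretKey key) { this.key = key; }

    /** Encrypt the internal user ID under a fresh IV; the ciphertext is the transient ID. */
    String encode(String internalId) throws Exception {
        byte[] iv = new byte[12];
        random.nextBytes(iv);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(128, iv));
        byte[] ct = c.doFinal(internalId.getBytes(StandardCharsets.UTF_8));
        return Base64.getUrlEncoder().withoutPadding()
                .encodeToString(ByteBuffer.allocate(iv.length + ct.length).put(iv).put(ct).array());
    }

    /** Any node holding the key can invert the ID; no shared store is required. */
    String decode(String transientId) throws Exception {
        byte[] blob = Base64.getUrlDecoder().decode(transientId);
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(128, blob, 0, 12));
        return new String(c.doFinal(blob, 12, blob.length - 12), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws Exception {
        SecretKey k = KeyGenerator.getInstance("AES").generateKey();
        ReversibleTransientId ids = new ReversibleTransientId(k);
        String tid = ids.encode("jsmith");
        System.out.println(ids.decode(tid)); // recovers "jsmith" with no lookup table
    }
}
```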

Persistent ID Mapping
This data maps a persistent ID to an internal user identifier (e.g.,
an LDAP UID). The way that this data is currently constructed
requires it to be replicated. Alternative means of generating this
value (e.g., something akin to the old crypto handle) could remove
this requirement and I think I'll investigate supporting such a
mechanism for v3. If this data is lost all user settings associated
with those IDs are lost and that would be bad.
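A crypto-handle-style alternative could derive the persistent ID deterministically from the internal user ID, the relying party, and a secret salt, so any node can issue the same value without consulting a replicated mapping (reverse lookup would still need separate handling). A sketch of that idea, with all names being illustrative assumptions:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

public class ComputedPersistentId {
    /**
     * Deterministically derive a persistent ID from (SP entity ID, internal UID, salt).
     * The same inputs always yield the same ID, so no replicated state is needed to issue it.
     */
    static String compute(String spEntityId, String internalUid, byte[] salt) throws Exception {
        MessageDigest md = MessageDigest.getInstance("SHA-256");
        md.update(spEntityId.getBytes(StandardCharsets.UTF_8));
        md.update((byte) '!'); // separator so input fields can't collide
        md.update(internalUid.getBytes(StandardCharsets.UTF_8));
        md.update((byte) '!');
        md.update(salt);
        return Base64.getEncoder().encodeToString(md.digest());
    }

    public static void main(String[] args) throws Exception {
        byte[] salt = "deployment-wide-secret".getBytes(StandardCharsets.UTF_8);
        String id1 = compute("https://sp.example.org/shibboleth", "jsmith", salt);
        String id2 = compute("https://sp.example.org/shibboleth", "jsmith", salt);
        System.out.println(id1.equals(id2)); // prints "true": stable across nodes and restarts
    }
}
```

The salt keeps outsiders from brute-forcing the mapping, and different SP entity IDs yield unlinkable values for the same user.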

IdP Sessions
These are the active sessions for the IdP. If back-channel operations
(artifact resolution, attribute queries, and single logout) are to be
supported this data must be put into the cluster. If only
front-channel operations are needed this data could, if it were small
enough, be written out to cookies. If this data is lost the user has
to log back in.

Conversation State
IdPv3, as mentioned before, will have the notion of a "conversation"
which is just a unit of state data that may span multiple requests but
goes away when the operation (e.g., authentication request) completes.
If this data is not distributed amongst the cluster nodes then a load
balancer will need to ensure that clients are always directed to the
same node for the duration of the conversation (which probably only
lasts a couple minutes at most). If the data is lost the user will
need to restart the conversation. Note, there tends to be quite a bit
of data that accumulates within a conversation so writing this state
out to a cookie is likely not possible.

Attribute Resolver Cache
The LDAP and RDBMS data connectors can cache data retrieved from the
server. This data need not be replicated amongst cluster nodes but,
depending on the LDAP/DB server, it may be beneficial to do so
(if the LDAP/DB is "far" away and the cluster nodes are "closer"). If
the clustered data is lost nothing bad happens.

Attribute Consent
This data records a user's consent to the release of attribute data.
That consent will also be logged for auditing purposes. This data
should be replicated to cluster nodes and persisted across restarts.
If the data is lost users will be asked for consent again but nothing
worse happens.

Terms of Use Acceptance
Has the same characteristics as the attribute consent data.

So, given the above, a few things become clear.

First, as is the case today, if back-channel requests are not
supported it will be possible to run a cluster of v3 IdPs without
clustered storage, provided minimal (sticky-session) load balancing is
in place.

Second, if clustering is enabled, a catastrophic failure of the
cluster that results in the loss of all data need not cause any truly
horrible side effects. At worst users will be prompted again for
things they had already been asked.

Given all of that I think the clustering, and the distributed data
store built on top of it, look pretty good in v3.

--
Chad La Joie
www.itumi.biz
trusted identities, delivered


