
Read and write the same spring-data-couchbase document from multiple Couchbase clusters #1873


Closed
zandrei opened this issue Dec 5, 2023 · 7 comments
Labels
status: feedback-provided

Comments


zandrei commented Dec 5, 2023

Hello,

I'm trying to implement a way of moving data from one Couchbase cluster to another with a strategy that only moves data when it is accessed (through external user interaction). To do this, I would need to connect to two different Couchbase clusters and map the same documents to their target cluster + bucket combination. I've seen how multiple buckets inside the same cluster can be configured here: https://github.com/spring-projects/spring-data-couchbase/blob/main/src/test/java/org/springframework/data/couchbase/domain/Config.java#L136 .
Switching to a bucket in a different cluster shouldn't be that difficult: myCouchbaseClientFactory could be given its own connectionString instead of sharing the same one across all buckets. The problem is that the mapping is from entity type to a single CouchbaseTemplate, meaning the same entity cannot be mapped to two different templates.
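For illustration, here is a minimal sketch of such a configuration, assuming a second CouchbaseClientFactory with its own connection string; the hosts, bucket names and credentials are placeholders, not values from the linked Config.java:

import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;
import org.springframework.data.couchbase.CouchbaseClientFactory;
import org.springframework.data.couchbase.SimpleCouchbaseClientFactory;
import org.springframework.data.couchbase.config.AbstractCouchbaseConfiguration;
import org.springframework.data.couchbase.core.CouchbaseTemplate;
import org.springframework.data.couchbase.core.convert.MappingCouchbaseConverter;

import com.couchbase.client.core.env.PasswordAuthenticator;

@Configuration
public class MultiClusterConfig extends AbstractCouchbaseConfiguration {

	// default (source) cluster, handled by AbstractCouchbaseConfiguration
	@Override public String getConnectionString() { return "couchbase://source-cluster"; }
	@Override public String getUserName() { return "Administrator"; }
	@Override public String getPassword() { return "password"; }
	@Override public String getBucketName() { return "source-bucket"; }

	// a second client factory pointing at the destination cluster
	@Bean
	public CouchbaseClientFactory destinationClientFactory() {
		return new SimpleCouchbaseClientFactory("couchbase://destination-cluster",
				PasswordAuthenticator.create("Administrator", "password"), "destination-bucket");
	}

	// a template bound to the destination cluster, reusing the shared converter
	@Bean
	public CouchbaseTemplate destinationTemplate(MappingCouchbaseConverter converter) {
		return new CouchbaseTemplate(destinationClientFactory(), converter);
	}
}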

I managed to do this in this sample repository: https://github.com/zandrei/couchbase-multiple-clusters .

I've also added a README highlighting how it can be started and run, and what the issues and impediments were. The question is whether there is a better way of achieving this goal without re-implementing a lot of the framework glue.

Libraries and versions:
spring-data-couchbase 5.1.4
com.couchbase.client:java-client: 3.4.10

Couchbase server: community 6.6.0

spring-projects-issues added the status: waiting-for-triage label on Dec 5, 2023

mikereiche commented Dec 5, 2023

the problem is that the mapping is done by entity type to a CouchbaseTemplate, meaning that the same entity cannot be mapped to two different templates

The mapping is resolved using the entity type of the repository during creation of the repository. So you just need two repositories - one for each cluster - with entity types such that the entity type of the reading repository can be cast to the entity type of the writing repository.

@Document
@TypeAlias("") // no _class=className predicate as the classes are different
public class UserEntity {
	String id;
	String firstname;
	String lastname;
	...
}

// the read entity extends the write entity so that a read result can be cast to the write type
public class UserEntityWrite extends UserEntity {}

public class UserEntityRead extends UserEntityWrite {}

public interface UserEntityReadRepository extends CouchbaseRepository<UserEntityRead, String> {}

public interface UserEntityWriteRepository extends CouchbaseRepository<UserEntityWrite, String> {}

Then...

// read from the read bucket
Optional<UserEntityRead> foundUser = readRepo.findById(user.getId());
// write to the write bucket (UserEntityRead can be cast to UserEntityWrite)
writeRepo.save((UserEntityWrite) foundUser.get());

mikereiche added the status: feedback-provided label and removed the status: waiting-for-triage label on Dec 5, 2023

zandrei commented Dec 6, 2023

Hello Mike,

Thank you for providing this alternate direction. While this works for the most basic case, there are several other things to consider:

  • The algorithm itself is: search the destination cluster; if not found, search the source cluster; if found there, save to the destination cluster and move on (a sketch follows this list). From this perspective, the destination cluster will have both read and write operations, while the source cluster will be read-only.
  • Because of this business logic, everywhere these documents are used we would need to implement this process (every service would need to "know" to inject the two separate repositories and apply this logic).
  • The business case has further complexity in the mapping logic: we need to figure out the source tenant for each destination tenant based on configuration properties. With this solution, that logic would live inside each client that uses the individual documents.
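A minimal sketch of that read-through flow, assuming the two-repository approach suggested above (the class, field and method names are illustrative, not taken from the linked POC):

import java.util.Optional;

import org.springframework.stereotype.Service;

@Service
public class UserMigrationService {

	private final UserEntityWriteRepository destinationRepo; // destination cluster: read + write
	private final UserEntityReadRepository sourceRepo;       // source cluster: read only

	public UserMigrationService(UserEntityWriteRepository destinationRepo, UserEntityReadRepository sourceRepo) {
		this.destinationRepo = destinationRepo;
		this.sourceRepo = sourceRepo;
	}

	public Optional<? extends UserEntity> findAndMigrate(String id) {
		Optional<UserEntityWrite> inDestination = destinationRepo.findById(id);
		if (inDestination.isPresent()) {
			return inDestination; // already migrated
		}
		Optional<UserEntityRead> inSource = sourceRepo.findById(id);
		inSource.ifPresent(destinationRepo::save); // copy to the destination on first access
		return inSource;
	}
}

Every caller would need to go through a service like this, which is exactly the maintenance overhead described below.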

With the above, we will still have a maintenance overhead, and on top of that we introduce some unnecessary wrappers that would seem "unnatural" in the business part of the application. Apart from that, since the proposed solution requires erasing the TypeAlias, it will make it difficult to use this with other (non-Java) applications that rely on that type information for their own purposes. The solution still feels like a workaround, and it requires users of the solution to be aware of this workaround and its intricate flows.

I still think that a custom repository that hides all of this from its clients would be the best approach. The linked repository shows that, in the TestService, the user of the solution does not need to do anything special to get this behavior and does not need to know that it is in place at all. The business logic is inside the SampleReactiveRepository, in the findById method, and uses defaultOp and fallbackOp as the "destination" and "source" clusters, but in order to get this information into the custom repository we needed to make many more changes. Is there a simpler way of getting this information into the custom repository implementation to work with it?
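For illustration, a sketch of what that findById fallback could look like; defaultOps and fallbackOps stand for the ReactiveCouchbaseOperations of the destination and source clusters, and the class and field names are assumptions rather than the exact code in the linked repository:

import org.springframework.data.couchbase.core.ReactiveCouchbaseOperations;

import reactor.core.publisher.Mono;

public class FallbackAwareUserLookup {

	private final ReactiveCouchbaseOperations defaultOps;  // destination cluster
	private final ReactiveCouchbaseOperations fallbackOps; // source cluster

	public FallbackAwareUserLookup(ReactiveCouchbaseOperations defaultOps, ReactiveCouchbaseOperations fallbackOps) {
		this.defaultOps = defaultOps;
		this.fallbackOps = fallbackOps;
	}

	public Mono<UserEntity> findById(String id) {
		return defaultOps.findById(UserEntity.class).one(id)
				// not found in the destination: read from the source and upsert into the destination
				.switchIfEmpty(fallbackOps.findById(UserEntity.class).one(id)
						.flatMap(found -> defaultOps.upsertById(UserEntity.class).one(found)));
	}
}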

Thank you,
Andrei


mikereiche commented Dec 6, 2023

Sounds like you have a solution then. I don't have a simpler way.


mikereiche commented Dec 7, 2023

TypeAlias doesn't need to be erased. It just needs to not be set to a classname where it would be used for determining the object type - it can still be set to something else and used for other purposes. Regardless of the implementation you will still need logic to do whatever you need done. If you implement your own annotation, factory bean, factory, registrar, extension and repository - and every method that supports this behavior - that involves development and maintenance as well.

The algorithm itself is like this: Search in destination cluster, if not found, search in source cluster, if found save in destination cluster and move on. From this perspective, the destination cluster will have both read and write operations and the source cluster will be read only.

Wouldn't XDCR satisfy this? The destination cluster would always have the document replicated from the read cluster. And client operations would not require a possible second read operation plus a write operation. And the destination cluster would be kept up-to-date even without a client operation. Or without a client at all. Would the non-java clients also implement this read/fallback/write logic?

If the initial read to the destination cluster failed because the cluster was unavailable, wouldn't the subsequent attempt to write the document to the destination cluster also fail?

What about deletes? While XDCR would properly delete the document from both clusters, the read/fallback/write logic would keep restoring it from the read cluster. So you'll need a special "delete" that deletes it from both clusters (preventing the read cluster from being read-only). What about an update followed by a delete? Both the update and the delete would be lost - being replaced by the stale read-only copy. This would be unpopular with customers making bank deposits.

If the goal is simply to get a nearly-consistent copy when the active is not available, then the template findFromReplicasById() should be used. For no replica of a document to be available, all the replica nodes for that document would have to have failed.
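For reference, a minimal sketch of that template call (the entity class, template bean and id variable are placeholders):

// returns the first available copy of the document, whether from the active or a replica
UserEntity user = couchbaseTemplate.findFromReplicasById(UserEntity.class).any(id);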

The question would be if there is a better way of achieving this goal, without re-implementing a lot of the framework glue.

Is there a simpler way of getting this information in the custom repository implementation to work with it?

When this was first proposed, I remarked that overriding ReactiveCouchbaseRepositoryConfigurationExtension (and therefore ReactiveCouchbaseRepositoryConfigurationExtension and EnableReactiveCouchbaseRepositories) is not necessary (although they might be used in Auto Configuration). Yet it is still in your POC. What were you looking to simplify?

a custom repository that would hide all this from it's clients

A custom repository won't give this behavior to the template API.

For the custom repository - I don't know why the factory getTargetRepository() method is final. If it weren't, you could just extend the existing Couchbase factory. Similarly with the CrudMethodMetadataPostProcessor - if it were public, you could just use it instead of redefining it. Those two things can be changed to make your implementation a little easier.

Finally - many Couchbase customers use XDCR to replicate data across multiple clusters. XDCR automatically and effortlessly synchronizes clusters through DCP. It does not require any development and does not affect the performance of client applications. And if you need to customize the synchronization, you can create Kafka Connector source/sink pairs. The MCA project would also have been useful, but it seems to have been discontinued.

mikereiche reopened this on Dec 7, 2023

zandrei commented Dec 13, 2023

Hello Mike,

Thank you for getting back to me with this information. I'll try to answer some of the questions and explain some of the specific requirements for this project.

Wouldn't XDCR satisfy this? The destination cluster would always have the document replicated from the read cluster. And client operations would not require a possible second read operation plus a write operation. And the destination cluster would be kept up-to-date even without a client operation.

As specified in my initial description of the business requirement, moving the data should only be done for users who still interact with the application. The application has been around for more than 10 years and users come and go; there is a large number of user-related documents that have not been accessed in a long time. An update to the application will be released and notifications will be sent to existing users in the hope of re-engaging them. If they return, their data needs to be moved. The migration will be active for a fixed period of x months; once we consider that all the users who were going to re-engage have done so, it will be deactivated. (It is estimated that about 20% of user data is actively used, so the potential reduction in data is significant.)
There is also a need for some migration-specific actions (in the example repository we change the tenant in the destination cluster based on configuration properties). Other application-specific actions might be needed that would be difficult to apply in a pure cluster sync.

Application-specific data will be moved using XDCR, since in this case all the operations are clear and don't require user interaction.

Would the non-java clients also implement this read/fallback/write logic?

Yes, this logic is already implemented in non-java clients.

If the initial read to the destination cluster failed because the cluster was unavailable, wouldn't the subsequent attempt to write the document to the destination cluster also fail? What about deletes? While XDCR would properly delete the document from both clusters, the read/fallback/write logic would keep restoring it from the read cluster. So you'll need a special "delete" that deletes it from both clusters (preventing the read cluster from being read-only). What about an update followed by a delete? Both the update and the delete would be lost - being replaced by the stale read-only copy. This would be unpopular with customers making bank deposits.

There are probably several generic use cases described in these questions, but to answer a few: we can retry the algorithm if the destination cluster is unavailable for read/write operations. We will mark the data that was migrated on both the destination and the source cluster, which would help solve the "deleted on destination cluster, still present in source cluster" problem. Updates will always target the destination cluster.

When this was first proposed, I remarked that overriding ReactiveCouchbaseRepositoryConfigurationExtension (and therefore ReactiveCouchbaseRepositoryConfigurationExtension and EnableReactiveCouchbaseRepositories) is not necessary (although they might be used in Auto Configuration). Yet it is still in your POC. What were you looking to simplify?

I applied your suggestion in the test repository; you can see it on this branch: https://github.com/zandrei/couchbase-multiple-clusters/tree/test-existing-annotation-without-override . Indeed, it worked without overriding the extension and the annotation, but I needed to override the bean definition for BeanNames.REACTIVE_COUCHBASE_OPERATIONS_MAPPING to return the custom mapping that has both a default and a fallback operations property. That was the reason I initially added the custom extension and annotation: the extension's postProcess method defines this bean as a dependency and initializes it. I guess it's a bit better this way since we are reusing the framework annotation and extension.
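Roughly, the bean override looks like this (a sketch; CustomReactiveRepositoryOperationsMapping and sourceClusterReactiveTemplate() are hypothetical - a mapping subclass that also carries the fallback operations, and a template bean for the source cluster):

// inside the @Configuration class that extends AbstractCouchbaseConfiguration
@Bean(name = BeanNames.REACTIVE_COUCHBASE_OPERATIONS_MAPPING)
public ReactiveRepositoryOperationsMapping reactiveRepositoryOperationsMapping(
		ReactiveCouchbaseTemplate reactiveCouchbaseTemplate) {
	// replaces the framework-defined bean of the same name so repositories receive a
	// mapping that knows about both the default (destination) and fallback (source) operations
	return new CustomReactiveRepositoryOperationsMapping(reactiveCouchbaseTemplate,
			sourceClusterReactiveTemplate());
}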

I'm open to any further suggestions that can simplify or improve the design for this purpose. Indeed, the changes you suggested to the current state of getTargetRepository() and CrudMethodMetadataPostProcessor would simplify it even further.

Also, the Kafka Connect suggestion would help with adding custom logic to cluster synchronization, but we would still be doing it without knowing which data we can move and which needs to be archived.

Thank you,
Andrei


mikereiche commented Dec 13, 2023

we can retry the algorithm if the destination cluster is unavailable

If retries on the write would succeed, then retries on the read would have succeeded in the first place. The algorithm needs to work for the actual behavior, not only the behavior that suits the algorithm. And the Couchbase SDK will do retries. We have experience with applications managing retries - mostly debugging applications' retry algorithms and circuit breakers - and it's not a path we like to go down. We've seen application retry strategies take out a healthy cluster.

Injecting the replication management into the client application places a large amount of function and risk into what would otherwise be a simple read operation. There are other solutions for processing recently-read documents, such as using get-and-touch and Eventing. But it's your application.

I'll open an issue for the changes I suggested.

@mikereiche

Closing this as there is #1877 for the changes.
