Wednesday, November 2, 2016

Issues with Replication Queue

Sometimes you may run into issues with the content replication queue. The problem usually lies at one of three levels: the replication agent, the replication service, or a corrupt replication queue. A missing temp directory, covered at the end, can also break replication. Below are checkpoints for each level to help you figure out what went wrong and fix it:

1. Issues at Replication Agent Level

The first thing to check is the replication agent's configuration. Go to /etc/replication/agents.author.html, open the agent and verify the following:



A. Is the agent enabled? Ensure that it is.
B. Verify the transport details, e.g. the publish server URL, user name and password.
C. Verify the Triggers tab: the "Ignore default" option should be unchecked unless this agent is used for replication via a back-end process.
D. Verify connectivity with the publish instance by clicking "Test Connection".
E. Open the replication log via the "View Log" link and check when the last replication attempt succeeded. Take a screenshot of the items in the replication queue. Try clearing the first item in the queue and see whether that unblocks it.
F. Check in CRX Content Explorer that there is no /bin/receive node on the publish instance. If there is, delete it.
G. Check in CRX Content Explorer that there is no /bin/replicate node on the author instance. If there is, delete it.
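Checks A to C can also be scripted once you have the agent's properties in hand. The sketch below is a minimal example, assuming the agent's jcr:content properties have already been loaded into a dict (for instance from a JSON export of /etc/replication/agents.author/publish/jcr:content, where "publish" is the default agent name). The property names enabled, transportUri, transportUser, transportPassword and triggerSpecific (assumed to back the "Ignore default" checkbox) are assumptions based on typical agent nodes; verify them on your own instance.

```python
from typing import Any, Dict, List


def check_agent(props: Dict[str, Any], backend_agent: bool = False) -> List[str]:
    """Return human-readable problems found in replication agent properties.

    Property names are ASSUMPTIONS based on typical AEM agent nodes.
    """
    problems: List[str] = []

    # A. Exports may store booleans as strings, so normalise before comparing.
    if str(props.get("enabled", "false")).lower() != "true":
        problems.append("agent is disabled")

    # B. Transport details must all be present.
    for key in ("transportUri", "transportUser", "transportPassword"):
        if not props.get(key):
            problems.append(f"missing {key}")

    # C. "Ignore default" (assumed to be stored as triggerSpecific) should be
    # unchecked unless the agent is only driven by a back-end process.
    ignore_default = str(props.get("triggerSpecific", "false")).lower() == "true"
    if ignore_default and not backend_agent:
        problems.append("'Ignore default' trigger is checked")

    return problems
```

An empty list means the agent passed these checks; D to G (connectivity, the log and the /bin nodes) still have to be verified as described above.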

2. Issues at Service Level

To identify issues with the replication service, do the following:

a. Disable and re-enable the replication agent.
b. Restart the replication bundle in the Felix console (http://host:port/system/console/bundles/com.day.cq.cq-replication).
c. Restart the Apache Sling Event Support bundle (http://host:port/system/console/bundles/org.apache.sling.event).
d. Restart the Apache Felix EventAdmin bundle (http://host:port/system/console/bundles/org.apache.felix.eventadmin).
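Steps b to d can be done without clicking through the console. The sketch below builds the HTTP requests, assuming the Felix web console accepts a POST with action=stop or action=start on /system/console/bundles/<symbolic-name> and uses basic authentication. The base URL and credentials are placeholders, and run_all is a hypothetical helper you would call yourself.

```python
import base64
import urllib.parse
import urllib.request
from typing import List

# Bundles from steps b to d, in the order the post lists them.
BUNDLES = [
    "com.day.cq.cq-replication",
    "org.apache.sling.event",
    "org.apache.felix.eventadmin",
]


def restart_requests(base_url: str, user: str, password: str,
                     bundles: List[str] = BUNDLES) -> List[urllib.request.Request]:
    """Build a stop request followed by a start request for each bundle."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode()
    reqs = []
    for name in bundles:
        url = f"{base_url.rstrip('/')}/system/console/bundles/{name}"
        for action in ("stop", "start"):
            data = urllib.parse.urlencode({"action": action}).encode()
            req = urllib.request.Request(url, data=data, method="POST")
            req.add_header("Authorization", f"Basic {token}")
            reqs.append(req)
    return reqs


def run_all(reqs: List[urllib.request.Request]) -> None:
    """Hypothetical helper: send the requests in order.

    urlopen raises on HTTP errors, so a failed call stops the sequence.
    """
    for req in reqs:
        with urllib.request.urlopen(req) as resp:
            print(req.full_url, req.data.decode(), resp.status)
```

For example, run_all(restart_requests("http://localhost:4502", "admin", "admin")) would restart all three bundles on a local author instance. Building the requests separately from sending them makes it easy to review what will be called first.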

3. Issues with corrupt replication queue

Is the replication queue blocked?

Sometimes the replication queue is blocked because of a problem with a single item (or a few items) in it. In this case, the replication queue page shows "Queue is blocked". The root cause can often be found in the AEM error log, so check your error log file first and try to fix the underlying cause of the replication error. If you cannot figure out the root cause, then as a last resort you can delete the offending entry from the replication queue. Start with the very first item in the queue and clear it. Wait 30 seconds and refresh to see whether the queue is unblocked. Repeat with the next few items until the queue is back in an active state.
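The last-resort procedure is essentially a bounded loop. The sketch below captures it, assuming you wire is_blocked and remove_first_item (both hypothetical, not AEM APIs) to whatever tooling you use, or simply follow the same logic by hand in the queue UI. The limit of five items is an arbitrary assumption.

```python
import time
from typing import Callable


def unblock_queue(is_blocked: Callable[[], bool],
                  remove_first_item: Callable[[], None],
                  max_items: int = 5,
                  wait_seconds: float = 30.0,
                  sleep: Callable[[float], None] = time.sleep) -> int:
    """Clear items from the head of the queue until it is active again.

    is_blocked and remove_first_item are HYPOTHETICAL callables.
    Returns the number of items removed. Raises RuntimeError once max_items
    have been removed and the queue is still blocked, which is the point
    at which you fall back to deleting the Sling jobs.
    """
    removed = 0
    while is_blocked():
        if removed >= max_items:
            raise RuntimeError(f"queue still blocked after removing {removed} items")
        remove_first_item()
        removed += 1
        sleep(wait_seconds)  # give the agent time to retry before re-checking
    return removed
```

Capping the number of removals keeps you from silently emptying a queue whose real problem lies elsewhere, for example at the agent or service level.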



If this does not work, you need to force-clear the queue by deleting the corresponding Sling jobs. See the Adobe article for details: https://helpx.adobe.com/experience-manager/kb/replication-stuck.html

4. Issues with Creating the Tmp Directory

If you have explicitly set a JVM argument such as -Djava.io.tmpdir=/ephetest0/tmp but have not created that directory, you may see errors like the following in your log file:

10.10.2017 08:40:52.587 *ERROR* [192.150.10.204 [1507624852318] POST /bin/replicate.json HTTP/1.1] com.day.cq.replication.impl.ReplicatorImpl Error while building replication content.
com.day.cq.replication.ReplicationException: RepositoryException during serialization
at com.day.cq.replication.impl.content.durbo.DurboContentBuilder.create(DurboContentBuilder.java:164) .....
........................
.......................
Caused by: java.io.IOException: No such file or directory
at java.io.UnixFileSystem.createFileExclusively(Native Method)
at java.io.File.createTempFile(File.java:2024)

To resolve this error, create the tmp directory at the specified location and give the user that runs AEM read and write access to it.
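A quick way to confirm the fix before restarting AEM is sketched below. It assumes the script runs as the same OS user as AEM (otherwise change the directory's owner afterwards), and it uses Python's tempfile.mkstemp as a rough analogue of java.io.File.createTempFile, the call that fails in the stack trace above. The path is the example from this post.

```python
import os
import tempfile


def ensure_tmpdir(path: str) -> None:
    """Create path if it is missing and prove a temp file can be created in it."""
    # Create the directory (and any missing parents); no error if it exists.
    os.makedirs(path, mode=0o750, exist_ok=True)
    # Raises OSError if the directory is not writable by the current user.
    fd, probe = tempfile.mkstemp(dir=path)
    os.close(fd)
    os.remove(probe)  # leave the directory as we found it


if __name__ == "__main__":
    ensure_tmpdir("/ephetest0/tmp")
```

If ensure_tmpdir returns without raising, the JVM should be able to create its temp files there as well, provided AEM runs as the same user.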
