Policy replication fails with a high number of ACL rules and resources.
After upgrading to 10.6.3, replicating policy from the master reports that replication failed. The recipients log a successful replication, but the changes are left as pending and are not applied. The AMC console also shows Java errors until mgmt-server is restarted.
Snippet of management.log:
Nov 27 05:24:27 127.0.0.1/127.0.0.1 AMC: 2013-11-27 05:24:27 +0000 WARNING com.aventail.mgmt.gbladmin.conversations.SenderConversation - Connection failure trying to start replication conversation: ; nested exception is: <167> java.net.ConnectException: Connection refused
Nov 27 05:24:27 127.0.0.1/127.0.0.1 AMC: 2013-11-27 05:24:27 +0000 ERROR com.aventail.mgmt.gbladmin.conversations.ReplicationSenderConversation - Policy replication to lab-internal-3 failed with error: CONNECTION_FAILED
The logs contain many Java exceptions, including Java out-of-memory errors in receiver-management.log:
Dec 3 05:20:04 127.0.0.1/127.0.0.1 java.lang.OutOfMemoryError: Java heap space
Dec 3 05:39:36 127.0.0.1/127.0.0.1 java.lang.OutOfMemoryError: GC overhead limit exceeded
Nov 27 05:26:49 127.0.0.1/127.0.0.1 AMC: 2013-11-27 05:26:49 +0000 VERBOSE com.aventail.mgmt.sql.Sql - Query to obtain local user list took 0.211 seconds
Nov 27 05:26:50 127.0.0.1/127.0.0.1 AMC: 2013-11-27 05:26:50 +0000 ERROR com.aventail.mgmt.gbladmin.conversations.ReplicationSenderConversation - Policy replication to cpu3-lab-internal-4 failed with error: INSTALL_FAILED
Nov 27 05:26:57 127.0.0.1/127.0.0.1 AMC: 2013-11-27 05:26:57 +0000 WARNING com.aventail.mgmt.gbladmin.conversations.ReplicationSenderConversation -
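To confirm that a node is hitting the same out-of-memory condition, the logs can be searched for the OutOfMemoryError lines quoted above. The following is a minimal sketch only: find_oom_errors is a hypothetical helper name, and the log file paths are passed in as arguments because their location on the appliance is not stated here.

```shell
#!/bin/sh
# Hypothetical helper: print every JVM out-of-memory line in the given log files.
# Exit status follows grep: 0 if at least one match was found, 1 if none.
find_oom_errors() {
    # -F treats the pattern as a fixed string (the dots are literal).
    # The extra /dev/null argument makes grep prefix each match with its
    # file name even when only one log file is given.
    grep -F -- 'java.lang.OutOfMemoryError' "$@" /dev/null
}
```

For example, running find_oom_errors management.log receiver-management.log from the directory that holds the logs lists each matching line prefixed with the file it came from.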
Tracking ID DTS #137424
SonicWall engineering addressed the issue with a workaround (a hand edit, described below) and a test hotfix. Please contact Technical Support for the test hotfix.
The root cause of the problem is the JVM running out of memory during replication. To increase the memory allocation, do the following on BOTH the sender and the receiver node(s):
1. Edit /usr/local/app/mgmt-server/bin/start.sh: on line 107, add -Xmx512m after $JAVA_HOME/bin/java (start.sh attached).
2. Restart the management server: # /etc/init.d/mgmt-server restart
3. Try replicating again; replication should now complete successfully.
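The hand edit to start.sh can also be scripted. The sketch below is an illustration under assumptions, not the vendor's procedure: add_heap_flag is a hypothetical helper, it assumes $JAVA_HOME/bin/java appears unquoted in the script, and it patches every line that invokes $JAVA_HOME/bin/java rather than only line 107, so review the result before restarting.

```shell
#!/bin/sh
# Hypothetical helper: append -Xmx512m after $JAVA_HOME/bin/java in a start script.
# Usage: add_heap_flag /path/to/start.sh
add_heap_flag() {
    script="$1"
    # Do nothing if a maximum-heap flag is already set.
    if grep -q -- '-Xmx' "$script"; then
        echo "heap flag already present in $script"
        return 0
    fi
    # Keep a backup copy next to the original before changing it.
    cp -p "$script" "$script.bak" || return 1
    # Insert the flag after the first literal $JAVA_HOME/bin/java on each
    # matching line; & stands for the matched text. Writing back through a
    # redirect keeps the original file's permissions.
    sed 's|\$JAVA_HOME/bin/java|& -Xmx512m|' "$script.bak" > "$script"
}
```

After running add_heap_flag /usr/local/app/mgmt-server/bin/start.sh, comparing start.sh with start.sh.bak shows exactly which lines changed. The backup also makes it easy to revert if the management server does not start.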
Note: This is officially fixed in 10.6.5 and 10.7.1.