WebSphere MQ Cluster Workload Balancing: messages going to dead letter queue

I have created a WMQ cluster with 3 QMgrs: 2 full repositories and 1 partial repository. Here is the MQSC used:

crtmqm GW
strmqm GW
runmqsc GW
alter qmgr deadq('SYSTEM.DEAD.LETTER.QUEUE')
define listener(gw.listener) trptype(TCP) port(1416) ipaddr(xx.xx.xx.xx)
start listener(gw.listener)
define channel(SYSTEM.ADMIN.SVRCONN) chltype(svrconn)
ALTER QMGR CHLAUTH(DISABLED)
end

runmqsc QM01
alter qmgr repos('DEVELOPMENT.CLUSTER')
end

runmqsc QM02
alter qmgr repos('DEVELOPMENT.CLUSTER')
end

runmqsc QM01
define chl(to.QM01) chltype(clusrcvr) trptype(tcp) +
       conname('xx.xx.xx.xx(1414)') cluster(DEVELOPMENT.CLUSTER) 
end

runmqsc QM02
define chl(to.QM02) chltype(clusrcvr) trptype(tcp) +
       conname('xx.xx.xx.xx(1415)') cluster(DEVELOPMENT.CLUSTER) 
end

runmqsc GW
define chl(to.GW) chltype(clusrcvr) trptype(tcp) +
       conname('xx.xx.xx.xx(1416)') cluster(DEVELOPMENT.CLUSTER) 
end

runmqsc QM01
DEFINE CHANNEL(TO.QM02) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('xx.xx.xx.xx(1415)') CLUSTER(DEVELOPMENT.CLUSTER)
end

runmqsc QM02
DEFINE CHANNEL(TO.QM01) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('xx.xx.xx.xx(1414)') CLUSTER(DEVELOPMENT.CLUSTER)
end

runmqsc GW
DEFINE CHANNEL(TO.QM01) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('xx.xx.xx.xx(1414)') CLUSTER(DEVELOPMENT.CLUSTER)
DEFINE CHANNEL(TO.QM02) CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('xx.xx.xx.xx(1415)') CLUSTER(DEVELOPMENT.CLUSTER)
end

runmqsc QM02
define qlocal('BACKUP') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('PROVIDER') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('RESPONSE') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('STORE') CLUSTER(DEVELOPMENT.CLUSTER)
REFRESH CLUSTER(DEVELOPMENT.CLUSTER) REPOS(YES) 
end

runmqsc QM01
define qlocal('BACKUP') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('PROVIDER') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('RESPONSE') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('STORE') CLUSTER(DEVELOPMENT.CLUSTER)
REFRESH CLUSTER(DEVELOPMENT.CLUSTER) REPOS(YES)
end

Now I am putting a message to QMgr GW on queue PROVIDER. Please note that GW does not host this queue; it is hosted by QM01 and QM02.

amqsput PROVIDER GW

Sadly, all the messages are going to the dead letter queue of QMgr GW.

Please help me fix this. Any suggestions for debugging would help a lot.

Answers


There are several possible issues here.

The channels are not identically defined. Some have mixed-case names, others have all uppercase names. This may work if you are counting on the lack of quotes to ensure they are all folded to upper case by the QMgr. However, the commands have obviously been edited, at least so far as the CONNAME values are concerned, so I'm not assuming that the resulting objects match.

After creating the cluster, did you check that all the manually defined cluster-sender channels show as AUTO-EXPLICIT (DEFTYPE(CLUSSDRB) in the output of DISPLAY CLUSQMGR)? This is how you know the cluster was properly booted up.
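
One way to check this, assuming the QMgr names from the question, is to display the cluster QMgr records on each member. The attribute selection below is only my choice for illustration:

runmqsc GW
* List every cluster QMgr record known to GW with its channel, definition type and state
DISPLAY CLUSQMGR(*) CHANNEL DEFTYPE QMTYPE STATUS
end

A record showing DEFTYPE(CLUSSDRB) is an explicitly defined sender that the cluster has also auto-defined, which is the auto-explicit state. DEFTYPE(CLUSSDRA) is auto-defined only, and a record still at DEFTYPE(CLUSSDR) suggests the manual definition has not yet made contact with the QMgr at the other end.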

It's also possible that the REFRESH CLUSTER command is causing the outage. It is not required when defining new objects and is in fact quite disruptive. At the point you run it, the commands advertising the new objects have been sent to the repositories but the replies have not yet come back. REFRESH CLUSTER then requests the channels to stop, possibly in mid-batch, queues up commands for the cluster to delete the information it just received but has not yet replied to, sends new commands to the cluster advertising the objects it just deleted, and waits on the channels to restart. If this sounds confusing, think how the cluster command server on the repository feels.

Remove the REFRESH CLUSTER command from the queue definition script.
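
With that change, the queue definition step from the question would reduce to something like the following. It is shown for QM02 only, on the assumption that QM01 gets the same treatment:

runmqsc QM02
* Defining a queue with the CLUSTER attribute advertises it; no REFRESH CLUSTER is needed
define qlocal('BACKUP') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('PROVIDER') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('RESPONSE') CLUSTER(DEVELOPMENT.CLUSTER)
define qlocal('STORE') CLUSTER(DEVELOPMENT.CLUSTER)
end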

Once you have ascertained that the channels have all advanced to AUTO-EXPLICIT and have removed the REFRESH CLUSTER, you can start the actual debugging. It really helps in these cases to look at the DLQ header of the dead messages to find out what reason code is listed. This usually provides sufficient information to find the problem. You can also enable a variety of QMgr diagnostic events and view them using one of the event viewing tools, or look at the error logs on the QMgrs at either end of the channel.
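
As a rough sketch of that first step, assuming the standard sample programs are installed and on the PATH, the dead messages can be browsed without removing them from the queue. The value nnnn below is a placeholder for whatever reason code you find, not a prediction of what it will be:

amqsbcg SYSTEM.DEAD.LETTER.QUEUE GW
mqrc nnnn

amqsbcg prints each message descriptor followed by a hex dump of the message data. For a dead-lettered message the data begins with the dead letter header, and the Reason field is the third 4-byte field in it, after StrucId and Version. Depending on the platform, the integer may appear in native byte order in the dump. mqrc translates a numeric reason code into its symbolic name; if your installation does not include it, the reason codes are listed in the product documentation.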


These next suggestions have nothing to do with your diagnostic, other than that a QMgr built using Best Practices will in general be less error prone and easier to debug. Here then are some unsolicited MQ Cluster Best Practice recommendations.

  • Abandon the TO.[QMGR] channel names! Use [cluster].[qmgr] names instead, for example DEVCLUS.QM01. This ensures that you will always have channels dedicated to each cluster, even if you have overlapping clusters. It does, however, mean cluster names can't have a . in them and, because channel names are limited to 20 characters, must be shorter than 10 characters.
  • On things that are not full repositories, define an explicit CLUSSDR to only one of the repositories. Should you ever have more than two repositories (for example during a migration) the cluster members will be able to find it this way.
  • Always use quotes in your definitions. If you are having problems getting something to work then definitions that have only one possible interpretation are a shorter path to resolution.
  • Give the channels time to settle when defining a new cluster channel, and verify that it starts and goes to the correct state.
  • Use amqsput to open newly defined remotely-hosted cluster queues for output, but without actually putting messages to them. Make sure you get no errors (i.e. the queues resolve) during the open and then make sure the cluster channels are up. Then execute amqsput again and send messages.
  • REFRESH CLUSTER is a command to be used on partial repositories whereas RESET CLUSTER is the command to be used on full repositories. In this case the command is being used improperly and on the wrong type of cluster node. Expect problems.
  • Hopefully in real life there are no application queues hosted on full repository QMgrs. The best possible thing you can do for your cluster is to host the full repositories on dedicated QMgrs - even if those are extra QMgrs on the same host as the application QMgrs. Separating these ensures that cluster operation traffic and application traffic never traverse the same channels. It also makes it possible to patch or upgrade the repositories before the application QMgrs.
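
To make the naming, quoting and single-CLUSSDR recommendations concrete, here is a minimal sketch for the partial repository GW. The cluster name 'DEVCLUS' is hypothetical, taken from the DEVCLUS.QM01 example above, and the sketch assumes QM01's cluster-receiver has been renamed to 'DEVCLUS.QM01' to match:

runmqsc GW
* Quoted names and [cluster].[qmgr] channel naming; 'DEVCLUS' is a hypothetical cluster name
DEFINE CHANNEL('DEVCLUS.GW') CHLTYPE(CLUSRCVR) TRPTYPE(TCP) +
       CONNAME('xx.xx.xx.xx(1416)') CLUSTER('DEVCLUS')
* A partial repository gets an explicit CLUSSDR to only one full repository
DEFINE CHANNEL('DEVCLUS.QM01') CHLTYPE(CLUSSDR) TRPTYPE(TCP) +
       CONNAME('xx.xx.xx.xx(1414)') CLUSTER('DEVCLUS')
* After giving the channels time to settle, check their state
DISPLAY CHSTATUS('DEVCLUS.*') STATUS
end

For the open-without-put test, start amqsput PROVIDER GW and press Enter on an empty line. The sample opens the queue when it starts and ends on a blank line, so nothing is put, but any failure to resolve the queue shows up as a reason code on the open.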
