Tuesday, September 6, 2011

T-Replication 101

Well there have been a lot of articles for starting with transactional-replication. But there have been rarely an article for fixing the common issues one runs into. That's the motivation for this article. Actually I had also run into these issues a couple of times but used to drop the whole replication and bring it up again. But then one fine day I decided to find out the basics and try to help others on this.

After configuring the distribution, publication and subscription when you start the replication monitor you may encounter the below mentioned error.

Error authenticating proxy tushar-hub\lifercks, system error: Logon failure: unknown user name or bad password.). The step failed.

Well the reason may be you have put a windows account with some wrong password. Well the immediate fix can serve as you go to the agent job and in the step('Run agent') where the agent starts; use the 'Run As' in the property and change to run under the sql server service account. This should fix the issue. I am presuming here that your sql server agent service account is running under a domain account with all security checks.

http://social.msdn.microsoft.com/Forums/en/sqlreplication/thread/42229a2c-a9ef-4082-806c-3381e9f6d510

The next most common error will be the replication is failing.
Error :
The row was not found at the Subscriber when applying the replicated command. (Source: MSSQLServer, Error number: 20598)

The issue is some application/user updated/deleted/inserted some row directly in the subscriber. Then when there is a change in publisher and the distributor tries to push those changes into subscriber the error shows up as there is inconsistancy between subscriber and publisher.

To fix this there are a couple of steps we have to follow.
1. To find the sequence number and then the articles involved in the mismatch.
2. Find the data miss match

Now to find the lsn number we will use the query mentioned below.

select * from distribution..MSrepl_errors with(nolock) where time> '20110921' order by time desc

here '20110921' is the time stamp from when we suspect the error has occured.

Now when we have the sequence number we fire the next query to identify the objects which are in suspect.

exec distribution..sp_browsereplcmds @xact_seqno_start='0x00023C6800000BA3000C00000000', @xact_seqno_end='0x00023C6800000BA3000C00000000'

here 0x00023C6800000BA3000C00000000 is the lsn number.

Now since we have the objects we can use the table diff utility to find the actual data issues we have



to be continued.

No comments:

Post a Comment