more armies in limbo...? urgent action needed

Merlinus
Posted: 04 Mar 2014 at 23:39
I used to write QA code, but I'm glad I'm not writing THIS code. :)

Just a question: has anybody looked into whether the .round function behaves differently on 32-bit vs. 64-bit, as a possible cause of the time-lag crashes? Just wondering.
In Peace we reign. In War we RULE!

Long live the Royal House of Merlinus!

Rill
Posted: 04 Mar 2014 at 23:13
Stormcrow, have you ever considered repositioning Illyriad Ltd. as a bug-finding enterprise?  Where you create value by testing the limits of software and hardware architecture and identifying bugs for organizations like Google?

You could put us players on salaries and pay us to break stuff!

Angrim
Posted: 04 Mar 2014 at 23:12
so what i'm reading is: sieging ELECTROK broke the database.

getting a NULL into a NOT NULL column sounds medal-worthy.


Edited by Angrim - 04 Mar 2014 at 23:13

GM Stormcrow
Posted: 04 Mar 2014 at 23:03
Originally posted by Le Roux:

. . .  and ?

EDIT:  ty Le Roux for nudging me to give an answer.  It's still not a complete answer, but I don't communicate enough atm.
 
The step-through is complete, and the problem appears to have been caused by a combination of several things happening at the same time: 
  • nightly backups running at the same time as 
  • nightly cleanouts of inactives, and 
  • a very large set of closely timed military and diplo engagements being processed, and
  • at the same time as some fairly chunky map reading going on (still trying to assess whether this is botting or not)
All of these things put pressure on the database.  However, the database is built so that it should work through these things linearly (and it's been under a lot more pressure than this in the past).

However, a deadlock that should have been broadly impossible (before I get jumped on, I do understand there's no such word in code) in a nolock read / rowlock write environment produced a NULL in a (db-constrained) NOT NULL column.  Other, not directly related things then had fits after reading a single NULL data entry that really shouldn't have been there if there were a fair and just code deity.

This behaviour was unexpected enough for us to have raised a ticket with our db vendors, with whom we have a fairly close relationship. We've been rapidly through the "are you patched up?" questions, and it's been escalated into realms beyond our understanding, but will hopefully descend again as a SQL update patch in the near future.  We'll keep on applying pressure for a satisfactory resolution, but there is only so much we can do.

Somehow we manage to find these issues that have everyone shaking their heads.  On a different issue recently we found a (reproducible) bug in .NET that has entailed us actually running our very own, unique version of the core .NET framework to enable us to continue development.  I do sometimes wonder if we're not a bit too far on the "extreme" side, but I gather a "fix for all" will make it into the next update.

On a more practical level, I can't say it can't and won't happen again, but the areas that exhibited obvious symptoms (such as herb respawning and some armies getting stuck) have been instructed how to bypass such an event so they now understand how to handle NULLs, even in columns that are apparently constrained to be un-NULLable.
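
As a rough illustration of what "understanding how to handle NULLs" can look like in calling code, here is a minimal sketch. It is not the game's actual code: Python and SQLite stand in for the real stack, and the table name herb_patch, the column respawn_minutes, and the fallback value are all hypothetical. The demo table deliberately omits the NOT NULL constraint so that a rogue NULL can be simulated.

```python
import sqlite3

# Hypothetical fallback used when a value that "cannot" be NULL is NULL anyway.
DEFAULT_RESPAWN_MINUTES = 60


def respawn_minutes(stored_value):
    """Return a usable respawn interval even if the stored value is NULL."""
    if stored_value is None:
        return DEFAULT_RESPAWN_MINUTES
    return stored_value


conn = sqlite3.connect(":memory:")
# No NOT NULL constraint here, so the sketch can simulate the rogue NULL.
conn.execute(
    "CREATE TABLE herb_patch (id INTEGER PRIMARY KEY, respawn_minutes INTEGER)"
)
conn.executemany("INSERT INTO herb_patch VALUES (?, ?)", [(1, 30), (2, None)])

# Every row gets a usable interval; the NULL row falls back to the default.
intervals = {
    patch_id: respawn_minutes(value)
    for patch_id, value in conn.execute(
        "SELECT id, respawn_minutes FROM herb_patch"
    )
}
```

The point of the design is that the reader no longer trusts the schema constraint blindly: a NULL is treated as a recoverable condition instead of something that stalls the rest of the processing.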

We've also written some monitoring procedures to actively look for brief glimpses of these creatures that live at the bottom of the Mandelbrot set, and to alert us immediately should they happen in the future.  I can't guarantee we've got them all or are looking in all the right places, but it's a stab at it anyway.
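
A monitoring check of this kind could be sketched roughly as follows. This is only an assumption about the general shape, not the procedures described above: it uses Python with SQLite as a stand-in database, and the watched table and column names (army.destination_x, herb_patch.respawn_minutes) are invented for the example.

```python
import sqlite3

# Hypothetical (table, column) pairs that should never hold NULL.
WATCHED_COLUMNS = [("army", "destination_x"), ("herb_patch", "respawn_minutes")]


def find_unexpected_nulls(conn, watched):
    """Return {(table, column): count} for each watched column holding NULLs."""
    hits = {}
    for table, column in watched:
        # Identifiers come from the trusted constant list above, not user input.
        (count,) = conn.execute(
            f'SELECT COUNT(*) FROM "{table}" WHERE "{column}" IS NULL'
        ).fetchone()
        if count:
            hits[(table, column)] = count
    return hits


conn = sqlite3.connect(":memory:")
# Demo tables without NOT NULL constraints, so a stray NULL can be planted.
conn.execute("CREATE TABLE army (id INTEGER PRIMARY KEY, destination_x INTEGER)")
conn.execute(
    "CREATE TABLE herb_patch (id INTEGER PRIMARY KEY, respawn_minutes INTEGER)"
)
conn.executemany("INSERT INTO army VALUES (?, ?)", [(1, 10), (2, None)])
conn.execute("INSERT INTO herb_patch VALUES (1, 30)")

# A non-empty result is what would trigger an alert.
alerts = find_unexpected_nulls(conn, WATCHED_COLUMNS)
```

Run on a schedule, a non-empty result would be the signal to alert someone straight away, rather than waiting for downstream symptoms such as stuck armies.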

In terms of a longer term fix, beyond the hopefully forthcoming vendor patch, there are various options we're exploring in terms of re-architecting various core components.

Many thanks for your patience, and I apologise if it got a bit technical in some areas above - I know some of you are techies, and many aren't, so I tried to balance decent answers with explanatory ones.

SC


Edited by GM Stormcrow - 04 Mar 2014 at 23:05

Rill
Posted: 04 Mar 2014 at 19:24
darpansah, there might be reinforcements in the city from an alliance with which you are NAP'd or confed.  Have your alliance leader check your diplomacy.

darpansah
Posted: 04 Mar 2014 at 19:17
Made some attacks on red cities; the report came in saying mission aborted, can't attack a friendly city... lol... is this a glitch or ?


Edited by darpansah - 04 Mar 2014 at 19:18

Le Roux
Posted: 04 Mar 2014 at 19:10
. . .  and ?

GM Stormcrow
Posted: 26 Feb 2014 at 10:33
Originally posted by Miklabjarnir:

 
That kind of thing happens in complicated systems that need to respond to lots of external input. Sometimes a thread is stuck. It may be a race condition or a deadlock because of a missed event in another thread. It is a true hell to debug these things.
^^ qft, and yes - exactly one (or more) of those things. 

And producing a replication (outside of a live environment under stress) isn't happening as yet.  

But we have, at least, found the precise moment that a thread deadlock/collision occurred whilst one or more threads were under pressure.  However, stepping through without dumps in a hyperthreaded dual hexacore environment (24 logical cores in total) isn't a walk in the park :)

Still working on it - will keep you posted.

SC

Rill
Posted: 25 Feb 2014 at 17:25
I watched the bug develop and one thing I noticed is that 2-3 armies impacted on the square at the same time.  I imagine that there is coding that is supposed to deal with this, but maybe it is not quite robust enough?  Or doesn't have enough "space" in the code for the number of 0s of the troops involved?

Miklabjarnir
Posted: 25 Feb 2014 at 17:04
Originally posted by Tatharion:

Thank you for your prompt answer GM Stormcrow.

Military impacting/impacted by herbs non re-spawning ????

It feels like your algorithm is alive and seriously kicking.

Best of luck in deciphering  that odd bug.

Best,

Tath

That kind of thing happens in complicated systems that need to respond to lots of external input. Sometimes a thread is stuck. It may be a race condition or a deadlock because of a missed event in another thread. It is a true hell to debug these things.
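
To make the "missed event" case concrete, here is a small sketch in Python (chosen only for illustration; nothing here reflects the game's own code, and both function names are made up). A notification sent before anyone is waiting is simply lost, so a waiter that relies on the notification alone gets stuck, while a waiter that checks shared state does not. Timeouts are used so the example ends instead of hanging.

```python
import threading


def wait_without_predicate(timeout):
    """Wait for a notification that was already sent, so it is missed."""
    cond = threading.Condition()
    with cond:
        cond.notify()  # nobody is waiting yet, so this wakeup is lost
    with cond:
        # Returns False when the timeout expires without a notification.
        # Without the timeout this call would block forever.
        return cond.wait(timeout=timeout)


def wait_with_predicate(timeout):
    """Record the event in shared state so a late waiter still sees it."""
    cond = threading.Condition()
    state = {"ready": False}
    with cond:
        state["ready"] = True
        cond.notify()
    with cond:
        # wait_for checks the predicate first, so it returns immediately.
        return cond.wait_for(lambda: state["ready"], timeout=timeout)


missed = wait_without_predicate(0.05)
seen = wait_with_predicate(0.05)
```

In a real system the two halves run on different threads and the ordering depends on timing, which is why such bugs tend to show up only under load and are so hard to reproduce.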