more armies in limbo...? urgent action needed |
Merlinus
Greenhorn
Joined: 15 Feb 2014 Location: Tennessee USA Status: Offline Points: 81 |
Posted: 04 Mar 2014 at 23:39 |
|
I used to write QA code, but I'm glad I'm not writing THIS code. :)
Just a question: has anybody looked into the .round function (32- vs. 64-bit) for the time-lag crashes? Just wondering.
|
|
In Peace we reign. In War we RULE!
Long live the Royal House of Merlinus! |
|
Rill
Postmaster General
Player Council - Geographer Joined: 17 Jun 2011 Location: California Status: Offline Points: 6903 |
Posted: 04 Mar 2014 at 23:13 |
|
Stormcrow, have you ever considered repositioning Illyriad Ltd. as a bug-finding enterprise? Where you create value by testing the limits of software and hardware architecture and identifying bugs for organizations like Google?
You could put us players on salaries and pay us to break stuff!
|
|
Angrim
Postmaster General
Joined: 02 Nov 2011 Location: Laoshin Status: Offline Points: 1173 |
Posted: 04 Mar 2014 at 23:12 |
|
so what i'm reading is: sieging ELECTROK broke the database.
getting a NULL into a NOT NULL column sounds medal-worthy.
Edited by Angrim - 04 Mar 2014 at 23:13
|
GM Stormcrow
Moderator Group
GM Joined: 23 Feb 2010 Location: Illyria Status: Offline Points: 3820 |
Posted: 04 Mar 2014 at 23:03 |
EDIT: ty Le Roux for nudging me to give an answer. It's still not a complete answer, but I don't communicate enough atm.

Step-through is complete, and it appears to have been caused by a broad combination of multiple things happening at the same time:

All of these things put pressure on the database. The database is built so that it should work through them linearly (and it's been under a lot more pressure than this in the past). Basically, though, a deadlock that should have been broadly impossible (before I get jumped on, I do understand there's no such word in code) in a nolock read / rowlock write environment produced a NULL in a (db-constrained) NOT NULL column. Other, not directly related things had fits thereafter, based on reading a single NULL data entry that really shouldn't have been there if there were a fair and just code deity.

This behaviour was unexpected enough for us to have raised a ticket with our db vendors, with whom we have a fairly close relationship. We've been rapidly through the "are you patched up?" questions, and it's been escalated into realms beyond our understanding, but will hopefully descend again in a SQL update patch in the near future. We'll keep on applying pressure for a satisfactory resolution, but there is only so much we can do.

Somehow we manage to find these issues that have everyone shaking their heads. On a different issue recently we found a (reproducible) bug in .NET that has entailed us actually running our very own, unique version of the core .NET framework to enable us to continue development. I do sometimes wonder if we're not a bit too far on the "extreme" side, but I gather a "fix for all" will make it into the next update.

On a more practical level, I can't say it can't and won't happen again, but the areas that exhibited obvious symptoms (such as herb respawning and some armies getting stuck) have been instructed how to bypass such an event, so they now understand how to handle NULLs, even in columns that are apparently constrained to be un-NULLable.

We've also written some monitoring procedures to actively look for brief glimpses of these creatures that live at the bottom of the Mandelbrot set, and to alert us immediately should they happen in the future. I can't guarantee we've got them all or are looking in all the right places, but it's a stab at it anyway.

In terms of a longer-term fix, beyond the hopefully forthcoming vendor patch, there are various options we're exploring in terms of re-architecting various core components.

Many thanks for your patience, and I apologise if it got a bit technical in some areas above - I know some of you are techies, and many aren't, so I tried to balance decent answers with explanatory ones.

SC
Edited by GM Stormcrow - 04 Mar 2014 at 23:05 |
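For the techies in the thread, the two ideas in the post above (a monitoring pass that looks for NULLs where none should exist, and readers that tolerate a NULL instead of having fits) might look roughly like the sketch below. This is not the game's actual code: it uses Python's built-in sqlite3 module purely as a stand-in database, and the `armies` table, the `troops` column, and both function names are hypothetical. The column is deliberately left nullable so the demo can plant a bad row.

```python
import sqlite3


def find_null_rows(conn, table, column):
    """Monitoring pass: return rowids where `column` is NULL, constraint or not."""
    # Identifiers cannot be bound as SQL parameters, so they are quoted inline.
    # Only pass trusted, hard-coded names here.
    query = f'SELECT rowid FROM "{table}" WHERE "{column}" IS NULL ORDER BY rowid'
    return [row[0] for row in conn.execute(query)]


def troop_count(conn, army_id, default=0):
    """Defensive read: treat a missing row or a NULL value as `default`."""
    row = conn.execute(
        "SELECT troops FROM armies WHERE id = ?", (army_id,)
    ).fetchone()
    if row is None or row[0] is None:
        return default
    return row[0]


def demo():
    conn = sqlite3.connect(":memory:")
    # Hypothetical schema; nullable on purpose so a bad row can be simulated.
    conn.execute("CREATE TABLE armies (id INTEGER PRIMARY KEY, troops INTEGER)")
    conn.executemany(
        "INSERT INTO armies (id, troops) VALUES (?, ?)",
        [(1, 500), (2, None), (3, 120)],
    )
    bad_rows = find_null_rows(conn, "armies", "troops")
    counts = [troop_count(conn, army_id) for army_id in (1, 2, 3)]
    conn.close()
    return bad_rows, counts
```

In a real deployment the monitoring query would run on a schedule and raise an alert when it returns anything, while the defensive read keeps dependent features going until the bad row is repaired.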
|
Rill
Postmaster General
Player Council - Geographer Joined: 17 Jun 2011 Location: California Status: Offline Points: 6903 |
Posted: 04 Mar 2014 at 19:24 |
|
darpansah, there might be reinforcements in the city from an alliance with which you are NAP'd or confed. Have your alliance leader check your diplomacy.
|
|
darpansah
New Poster
Joined: 19 Jan 2014 Status: Offline Points: 11 |
Posted: 04 Mar 2014 at 19:17 |
|
Made some attacks on red cities; the report came in saying mission aborted, can't attack friendly city... lol... is this a glitch or ??????
Edited by darpansah - 04 Mar 2014 at 19:18 |
|
Le Roux
Wordsmith
Joined: 30 May 2012 Status: Offline Points: 151 |
Posted: 04 Mar 2014 at 19:10 |
|
. . . and ?
|
|
|
|
GM Stormcrow
Moderator Group
GM Joined: 23 Feb 2010 Location: Illyria Status: Offline Points: 3820 |
Posted: 26 Feb 2014 at 10:33 |
^^ qft, and yes - exactly one (or more) of those things. Producing a replication (outside of a live environment under stress) isn't happening as yet, but we have, at least, found the precise moment that a thread deadlock/collision occurred whilst one or more threads were under pressure. However, stepping through without dumps in a hyperthreaded dual hexacore (24 logical cores in total) environment isn't a walk in the park :)
Still working on it - will keep you posted.
SC
|
|
Rill
Postmaster General
Player Council - Geographer Joined: 17 Jun 2011 Location: California Status: Offline Points: 6903 |
Posted: 25 Feb 2014 at 17:25 |
|
I watched the bug develop and one thing I noticed is that 2-3 armies impacted on the square at the same time. I imagine that there is coding that is supposed to deal with this, but maybe it is not quite robust enough? Or doesn't have enough "space" in the code for the number of 0s of the troops involved?
|
|
Miklabjarnir
Greenhorn
Joined: 07 Mar 2012 Status: Offline Points: 73 |
Post Options
Thanks(0)
Quote Reply
Posted: 25 Feb 2014 at 17:04 |
That kind of thing happens in complicated systems that need to respond to lots of external input. Sometimes a thread is stuck. It may be a race condition or a deadlock because of a missed event in another thread. It is a true hell to debug these things.
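As a rough illustration of the deadlock case mentioned above, here is a minimal Python sketch of one common way to avoid it: always taking locks in a fixed order. Everything in it is hypothetical (the `Square` class, `move_troops`, the troop numbers) and it says nothing about how the game server is actually written. It only shows why two threads working on the same pair of resources in opposite directions need an agreed ordering.

```python
import threading


class Square:
    """Hypothetical map square holding a troop count, guarded by its own lock."""

    def __init__(self, square_id, troops):
        self.square_id = square_id
        self.troops = troops
        self.lock = threading.Lock()


def move_troops(src, dst, n):
    """Move n troops between two different squares without risking deadlock."""
    # Always lock the square with the lower id first. Two opposite moves can
    # then never each hold one lock while waiting for the other.
    # Assumes src and dst are different squares (the locks are not reentrant).
    first, second = sorted((src, dst), key=lambda s: s.square_id)
    with first.lock:
        with second.lock:
            src.troops -= n
            dst.troops += n


def shuttle(src, dst, rounds):
    """Worker: repeatedly move one troop from src to dst."""
    for _ in range(rounds):
        move_troops(src, dst, 1)


def demo(rounds=2000):
    a, b = Square(1, 10_000), Square(2, 10_000)
    # The two threads move troops in opposite directions at the same time.
    t1 = threading.Thread(target=shuttle, args=(a, b, rounds))
    t2 = threading.Thread(target=shuttle, args=(b, a, rounds))
    t1.start()
    t2.start()
    t1.join()
    t2.join()
    return a.troops, b.troops
```

If each thread instead locked its own source square first, the two could each grab one lock and wait forever for the other, which is the kind of stuck thread described above.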
|
|