Friday, June 13, 2014

It's not just the data that's big

It's not just the data that's big

Big Data ! How cool is that ! Imagine all the things that your big data can tell you !

All you have to do it implement it, right ? How hard can that be ??

Answer: It’s VERY hard indeed for such a simple concept.

But why is it so hard ?

Largely, it’s because of the choices you’ll have to make, the platforms you deliver on, and the suppliers you’ll have to deal with.

Sure, there are lots of big data solutions, just like there are lots of big data consultancies, but ALL of them come at a cost, no matter how ‘free’ they are.

Just by way of further reading, I recommend this:

It’s a short article, detailing the general areas you’ll need to be considering, such as Infrastructure, Analytics and Applications, and is well worth a look.

The main thing you might realise while reading it is the sheer scale of the big data business. A few short years ago, you had your RDBMS of choice (SQL Server, DB2, Oracle,…), some flat files out of your legacy system, and some spreadsheets that Dave in Accounts was using to track how many toner cartridges he needed to order.

Now you got everything. Literally everything. And some other stuff besides that you didn’t even know existed.

Picking the right platform from that lot seems like a Herculean task. Indeed, maybe there is no ‘right’ platform ? And if not, how many platforms are too many? Then, it’s not just about the platform your data is currently stored in, it’s also about integrating those data sources to really deliver valuable insight to your business in simple and flexible way.

Perhaps the question we need to be asking is ‘What does our data look like, and what does the business need it to look like?.

From that question, you can start to see where you want to be. Getting there is still going to be tough, but at least you’ll have an idea of what success might be.

And perhaps, if you can have some fun along the way, that wouldn’t be a bad thing either J

Back soon…

Wednesday, April 23, 2014

SQL Trace Flag 3004 and BACKUP database

SQL Trace Flag 3004 and BACKUP database

Quick one today � Someone asked me if there's a starting equivalent of this message:

04/23/2014 11:46:53,Backup,Unknown,Database backed up. Database: foo<c/> creation date(time): 2013/11/07(12:19:59)<c/> pages dumped: 363<c/> first LSN: 44:888:37<c/> last LSN: 44:928:1<c/> number of dump devices: 1<c/> device information: (FILE=1<c/> TYPE=DISK: {'C:\Temp\foo.bak'}). This is an informational message only. No user action is required.

You know, so we can track when a database backup starts, and what it's doing while backing up… and stuff…

There's plenty on trace flag 3004 for RESTORE operations from a backup file so you can identify which areas of your restore operations are taking a while, such as zeroing out and creating devices, etc, but not much on the BACKUP side of things. All I could see was that the trace flag would output detail to the error log if invoked thusly:

DBCC TRACEON (3004,3605,-1)

Take a look at for further reading on that. And there's plenty others as well, like

I was a bit stumped for the BACKUP bit, so I asked the #sqlhelp tag on twitter if there was another flag for backup. Thanks to @SQLKiwi, turns out it's the same one.

So, with that trace flag turned on as above, the output you get in the SQL error log is far more detailed (I've turned the list so it reads in the correct event order):

Date    Source  Severity        Message
04/23/2014 11:47:51     spid52  Unknown BackupDatabase: Database foo   
04/23/2014 11:47:51     spid52  Unknown Backup: Media open     
04/23/2014 11:47:51     spid52  Unknown Backup: Media ready to backup  
04/23/2014 11:47:51     spid52  Unknown Backup: Clearing differential bitmaps  
04/23/2014 11:47:51     spid52  Unknown Backup: Bitmaps cleared
04/23/2014 11:47:51     spid52  Unknown BackupDatabase: Checkpoint done
04/23/2014 11:47:51     spid52  Unknown Backup: Scanning allocation bitmaps    
04/23/2014 11:47:51     spid52  Unknown Backup: Done with allocation bitmaps   
04/23/2014 11:47:51     spid52  Unknown BackupDatabase: Work estimates done    
04/23/2014 11:47:51     spid52  Unknown Backup: Leading metadata section done  
04/23/2014 11:47:51     spid52  Unknown Backup:Copying data    
04/23/2014 11:47:51     spid52  Unknown Backup: DBReaderCount = 1      
04/23/2014 11:47:51     spid52  Unknown Started file C:\Program Files\Microsoft SQL Server\MSSQL11.TEST1\MSSQL\DATA\foo.mdf    
04/23/2014 11:47:51     spid52  Unknown Completed file C:\Program Files\Microsoft SQL Server\MSSQL11.TEST1\MSSQL\DATA\foo.mdf  
04/23/2014 11:47:51     spid52  Unknown BackupDatabase: Database files done    
04/23/2014 11:47:51     spid52  Unknown �正灵��慢�退%s : Log files done<nl/> : Log files done      
04/23/2014 11:47:51     spid52  Unknown Backup: Trailing config done   
04/23/2014 11:47:51     spid52  Unknown Backup: MBC done       
04/23/2014 11:47:51     spid52  Unknown BackupDatabase: Writing history records
04/23/2014 11:47:51     Backup  Unknown Database backed up. Database: foo<c/> creation date(time): 2013/11/07(12:19:59)<c/> pages dumped: 362<c/> first LSN: 44:1040:37<c/> last LSN: 44:1072:1<c/> number of dump devices: 1<c/> device information: (FILE=1<c/> TYPE=DISK: {'C:\ Temp\foo.bak'}). This is an informational message only. No user action is required. 
04/23/2014 11:47:51     spid52  Unknown Writing backup history records 
04/23/2014 11:47:51     spid52  Unknown BackupDatabase: Finished       

With this detail, it's relatively simple to alert on when a BACKUP event starts, or any part of that backup event, and you could also use it for identifying which parts of your database backup are taking a long time.

Just though I should write something up for BACKUP as well as RESTORE…

Back soon….

Tuesday, January 14, 2014

The Database Tuning Advisor - Schroedinger's Sledgehammer

The Database Tuning Advisor - Schroedinger's Sledgehammer

Database performance tuning is bit of a dark art to most people. Hell, even the best of use gets it wrong sometimes, but that’s usually mitigated by experience. Experience of several late nights and busy days trying to bring the production database back to life, but that’s a whole other story.

We were talking the other day about index fragmentation, and the impact of clustered index inserts on non-clustered indexes and database snapshots. (I know, it doesn’t get much more rock’n’roll than that, right ?)

Trawling through a vendor’s database, we happened to take a look at the indexes on certain tables and we could see that most of them are prefixed ‘_dta_’. Oh, the humanity ! The dread hand of the Database Tuning Advisor (DTA) was witnessed, and it sent a shudder though all those present. Either that, or the air-con went dappy again.

The DTA is a VERY useful tool when developing and tuning your databases. You get a database load, run it through the DTA, and it’ll recommend some indexes and perhaps some statistics that may help when you go live. Hoopy, right ? I mean, you’d be pretty excited if there was a tool that can solve all your performance problems (real or imagined) BEFORE you unleash it on your estate, right ?

The answer, as always, is… (altogether now) ‘It Depends’. Not all recommendations are to be taken as gospel, and the DTA output should be taken as a starting point for your eventual implementation. The DTA doesn’t know how your data will change, how many users will be hitting it, and, most of all, it doesn’t know about those last minute ‘enhancements’ requested by the business at the last minute.

In this way, the DTA could possibly be described as Schroedinger’s Sledgehammer.

We’re all familiar with the famous cat that was used in the theoretical experiment. Wait, you’re not ? Well, lookee here :'s_cat

TL:DR – You can’t tell what’s happened to the cat without opening the box. Sure, you can guess, but you can’t actually know. Theorise all you like, the cat’s state remains undefined until you observe the system (comprised of the box, the radioactive material, the vial, the hammer, and that poor kitty), and you can’t observe the system until you alter it (ironically, by taking a look to see what’s happened).

It is my contention that the DTA could be seen as a quantum thought experiment, in that you have some knowns (the data / the workload / the design), some suppositions (IO performance / CPU speed / constant workload), but you never really know what will happen until you observe those details in the live system.

(The Copenhagen Interpretation (q.v.) would seem to suggest that your database will be both performant and non-performant at the same time, but we know that not to be the case.)

What we DO know is that there is a performance impact on updating multiple non-clustered indexes for every clustered index, and that there are ways of identifying indexes that haven’t been used at all and are just sitting there taking up your storage and IO (sys.dm_db_index_usage_stats  and a plethora of others)

So, there is a risk, as often happens, that the DTA will recommend indexes and stats that turn out to be sub-optimal when implemented in production. Kind of like using a sledgehammer to crack a nut.

By all means, use the DTA to guide you in the right direction. But use it more like a divining rod that points the way than a big stick with which to beat your database into performance.

And if you ever see indexes in a production database that start with ‘_dta_’ spare a thought for the poor developer they were probably only trying their best, after all !

Back soon…

Thursday, January 9, 2014

Fun things to do in 2014

Fun things to do in 2014

It's that time of year, now that spring is in the air….

Scratch that – must have been a hangover from an earlier time. But, it *is* a New Year, and I thought I'd jot down some thinks I'll be thinking about over the next 12 months. Because I can.

1.      SQL 2014 (Hekaton)

      There's been a whole lot of buzz over 2013 about the next release of SQL Server, now commonly called SQL 2014, but sometimes referred to as Hekaton.

      The word Hekaton is derived from the greek word for ten-fold. Actually, I think it's a hundred-fold, but that doesn't seem to matter that much to those happy guys and gals at Microsoft. The inference from the name is that you get a *massive* performance boost using Hekaton as your core engine.

      Not surprisingly, this is utter cobblers, and there's been quite a bit of back pedalling from MS on the subject. What you get in Hekaton is 'memory-optimised' tables, column-store indexes, Vertipaq, and X-Velocity structures right in the database. You also get natively compiled stored procedures, but more on that later, I think.

      But it's certainly something we all need to be aware of – Hekaton is coming, whether we like it or not. I'll write a session up for sometime later in January. No, I WILL. I promise….


2.      Big Data

      Another buzzword of 2013 was 'Big Data'. Yeah, right. Big Data isn't about having huge amounts of data that you need to do something with. It's more about finding out what data you currently capture but have no use for, and finding out what that data may be able to tell you.

      For example, you know that new account openings are at an all time high, and that the average balance in Cash ISA is £1500. What we also capture behind these numbers are demographic and geographic data that can be plotted in any number of software packages and analysed to death, such as if you live in Surrey, your Cash ISA is likely to have a balance greater than one held in Sunderland.

      Big Data can also be used to let you know there are things you don't know, and lets you ask business questions. Like *why* is there a low take-up of FlexPlus in Hampstead, yet the highest average Credit Card available spend is on the Wirral.

      Again, there's a whole lot more to this, but I ask you to bear the following in mind:

          What can our data actually tell us, and how can we design our systems to support Big Data type analysis.

      It's all in the design – get that right, and your Big Data headache is less of a problem


3.      Small Data

       You know when you get a spreadsheet from someone that details some 2,500 servers with something wrong, but there's little or no supporting information to let you act quickly ? Don't you just hate that, to have to filter / sort / arrange the data before you can start working on what you need to do ?

       Well, when you provide data to anyone, be that a project / support contact or a huge distribution list, have a think about what you're actually giving them, and try to imagine the way in which they'll use the data. Sometimes, formatting an Excel sheet as a table is the simplest yet most time saving thing you can do for others.


4.      Information Flow

      I know I've not posted much on the blog recently. There's hods of useful stuff in our heads (presumably), so let's get it out there. I had an interesting discussion with a colleague the other day about deleting records through a view having a different, seemingly more efficient, execution plan.

      Just writing stuff down helps you remember it as well !


5.      Be nice to people. Sometimes.

       OK, so I'm not getting fluffy over this. It's hard to remain smiley all the time (for some people), and sometimes someone needs a slap. But I'm trying to make the former my default response, not the latter.


So, Happy New Year, and watch this space for stuff. There's a lot of stuff about, and it's getting stuffier all the time. And if that's not your thing, try holding a chicken in the air and sticking a deckchair up your nose.


Back soon…

Tuesday, May 7, 2013

Process, process, process...

Process, process, process...

I'm a fan of both documentation and agreed processes. Really. A big fan. There's nothing quite like it. OK, so I'm a DBA and we don't normally document stuff or follow due process, but I really wish we could. It would make things so much easier.

There is a major challenge with this - it's not easy. We all WANT to have everything nice, tidy and available, but we generally rail against processes that don't do what we want, when we want them to. It does seem that a lot of processes and standards exist because then someone can place a tick in a box at audit time (and those timese are becoming more and more frequent, I can tell you).

A business process is usually aligned with a methodology. ITIL, COBIT, PRINCE, TOGAF, etc are all great tools to help you get a handle on what you're doing, how to evidence that you're adhering to to process, and provide auditors with warm feelings inside.

However, when the process just doesn’t make sense (even if it ticks all the boxes), you're less likely to collect good data about your project's performance against key deliverables and KPIs. If the process only exists for it's own sake, then you should really take a long hard look at that process and see if it could be done another way or removed altogether. This, I fear, is a stumbling block for many. Wouldn't it be great if we could just do the job, and the process takes care if itself ? I know I would. I'd also buy-in to a process that was designed with my needs in mind, and if the business needs could also be met from the same data, then all the better.

Got a couple of quotes for you here. One from Albert Einstein, and one from a good friend:

Einstein : "Things should be simple, not simpler."

Friend : "The trick with PRINCE2 is knowing which bits NOT to implement."

Einstein was right on the money that. If the process isn't simple, change the process. All well and good, but try not to get hung up on simplifying EVERY process. There's an iterative approach to evaluating what you're doing, and that's really important. It's not enough to say that you have always done it that way, so that's how it'll be done every time in the future. If you always do what you've always done, you'll always get what you've always got.

My friend was also spot on in his assesment of methodologies - why would you need to run every project to strict principles, when not all of the those principles can be applied to every project ? To quote Monty Python, "Universal affirmatives can only be partially converted : All of Alma Cogan is dead, but only some of the class of dead people are Alma Cogan".

I'll give you a quick example, if I may ?

Today I was passed a document for review. It had a RACI matrix, scope, executive summary, all the things that you need keep your auditors happy. Except content.

That document only contained a link to a spreadsheet that actually had all the data in. The data that was collected and confirmed by those in the RACI matrix.

The problem here is that there's no process for expressing the useful data in the methodology being applied, so the first document only exists to complete the process, and nothing else. It has no other useful meaning. Further, the REAL data has no project metadata associated with it, and as such is considered by the project as 'junk data'. If it's junk, why does it exist ? If it's not junk, how could it be properly classified ?

Perhaps the solution may be to revisit the project processes and standards to make them simple, or perhaps not apply all principles to all projects ?

I'm not sure, but reviewing one appropriate document has to be more efficient that reviewing two that have no actual value.

And it's easier.

Back soon…