To Proc or Not to Proc

I’ve had some interesting conversations and fun arguments about how to author queries for SQL Server Reporting Services (SSRS) reports. There are a lot of professionals out there who really want hard-and-fast answers on best practices. The challenge with SSRS is the multitude of configurations available for the system. Is everything (Database Engine, SSAS, SSRS, and SSIS) on one box? Is every service on a dedicated box? Is SSRS integrated with a SharePoint cluster? Where are the hardware investments made in the implementation?

Those are a lot of variables to build universal best practices around. Luckily for us, Microsoft provides a tool to help troubleshoot report performance. Within the Report Server database there is a view called ExecutionLog3, which links together various logging tables in that database. Here are some of the more helpful columns it exposes.

  •          ItemPath – The path and name of the report that was executed.
  •          UserName – The user the report was run as.
  •          Format – The format the report was rendered in (PDF, CSV, HTML4.0, etc.).
  •          Parameters – The prompt selections made.
  •          TimeStart – Server-local date and time the report execution started.
  •          TimeEnd – Server-local date and time the report finished rendering.
  •          TimeDataRetrieval – Time in milliseconds to retrieve the report data from the data source.
  •          TimeProcessing – Time in milliseconds SSRS took to process the results.
  •          TimeRendering – Time in milliseconds required to produce the final output (PDF, CSV, HTML4.0, etc.).
  •          Status – Succeeded, Failed, Aborted, etc.

I always provide two reports based on the information found in this view. The first report uses the time columns to give me insight into how the reports are performing and when the system’s utilization peaks. The second report focuses on which users are running which reports, to gauge the effectiveness of the reports for their audience.
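For example, here is a minimal sketch of the first report’s query, assuming the default ReportServer database (for the second report, group by UserName and ItemPath instead):

-- Average report timings by report and hour of day for the last 30 days.
-- Run against the ReportServer database.
SELECT  ItemPath,
        DATEPART(HOUR, TimeStart) AS HourOfDay,
        COUNT(*) AS Executions,
        AVG(TimeDataRetrieval) AS AvgDataRetrievalMs,
        AVG(TimeProcessing) AS AvgProcessingMs,
        AVG(TimeRendering) AS AvgRenderingMs
FROM dbo.ExecutionLog3
WHERE TimeStart >= DATEADD(DAY, -30, GETDATE())
GROUP BY ItemPath, DATEPART(HOUR, TimeStart)
ORDER BY AvgDataRetrievalMs DESC;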

Generally I’m a big fan of stored procedures, mostly because my reports are usually related to a common data source and stored procedures provide me with a lot of code reuse. Standardizing the report prompt behavior with stored procedures is also handy. A simple query change can cascade to all the reports that use a stored procedure, alleviating the need to open each report and perform the same change. Additionally, I like to order the result sets in SQL, not after the data is returned to the report. But that doesn’t mean you’re not going to find better performance moving some functionality between tiers based on the results you find in ExecutionLog3.
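As a sketch of that pattern, here is a hypothetical report procedure; the dbo.Sales table and the procedure and parameter names are illustrations, not a real schema:

-- One procedure can serve several reports, so a change to the filter or
-- sort logic cascades to all of them.
CREATE PROCEDURE dbo.rpt_SalesByRegion      -- hypothetical name
    @StartDate DATE,
    @EndDate   DATE,
    @RegionID  INT = NULL                   -- NULL = all regions (standardized prompt behavior)
AS
BEGIN
    SET NOCOUNT ON;

    SELECT RegionName, SaleDate, Amount
    FROM dbo.Sales                          -- assumed table
    WHERE SaleDate >= @StartDate
      AND SaleDate < DATEADD(DAY, 1, @EndDate)
      AND (@RegionID IS NULL OR RegionID = @RegionID)
    ORDER BY RegionName, SaleDate;          -- sort in SQL, not in the report
END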

I’m sorry, there just isn’t a one-size-fits-all recommendation for how SSRS reports are structured. Which means: (1) you’ll have to do some research on your configuration, and (2) don’t accept a consultant’s dogma on the topic.

Agile: The Consultant’s Savior

Thanks to Jeff Nall for contributing to this post.

How many times have you delivered to spec and hit your milestone only to get the “That’s not what I asked for” feedback? Guess what: your customers don’t always know what they want, much less what they really need. They might think they do, but if that were the case they would have been able to staff the project internally or with some new direct hires.

I once worked on a project where the company actually hired a consultant to translate corporate jargon into generic tech and software development terms. Apparently, they were struggling to find new hires or consultants with the skills they were looking for. Additionally, the resources they would gamble on were so lost trying to understand requirements that the development departments turned into revolving doors. The “Demystification Consultant” had a full-time job translating specs and RFPs so vendors could understand what to bid on. It might have been cheaper to adopt language learning methodologies and switch to a common communication device: illustrations.

In Agile that translates to frequent demonstrations of development progress. Rather than staking the entire project’s success on a nearly finalized demonstration of a product two weeks from delivery, Agile’s iterative development provides frequent illustrations of what the product will be, so stakeholders can approve or make changes with enough lead time to actually see the modifications implemented.

How does a consultant benefit from this process? Well, like it or not, no matter what the contract says, an unhappy customer can remove a consultant and withhold payment if the customer believes they can argue in court that the contract was violated. Consultants are burdened with not just delivering what was promised on the agreed schedule, but also executing the contract in such a way that the customer wants to work with them again or act as a reference for other potential clients. Conducting frequent demonstrations illustrates your responsiveness to your customers’ needs, reaffirms progress made on the deliverables, and keeps the lines of communication open for timely reactions to change. Agile is the best defense against “That’s not what I asked for” in the final days of your project.

The customer isn’t always right–they’re hiring you to answer the question for them. It’s your job to read between the lines of what the customer says they want and give them what they need. The point is the answers you get from someone who isn’t an expert in YOUR field can’t logically be the solution to the problem. You’re dealing with breadcrumbs, not road maps.

SQL SERVER – All Table & its Index sizes in MBs

SQL Server Portal

Today, I received a query asking how to calculate each table’s size and its index size, along with the total number of rows, because the client’s database file size suddenly grew from MBs to GBs and they wanted to know what was inserted into which tables to cause this sudden growth.

So I started searching for a solution on the web, but most of the solutions use a cursor to calculate each table’s size and its index size along with the total number of rows. I could not find what I was looking for, so I thought of writing this and calculating it with an optimized, set-based query, converting the table and index sizes to MBs instead of KBs because Windows also reports file sizes in MBs.

The full script is in the original post:

View original post
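For readers who don’t follow the link, here is a minimal sketch of one common set-based approach using sys.dm_db_partition_stats; this is my reconstruction, not the original author’s script:

-- Per-table row counts plus data and index sizes, converted from
-- 8 KB pages to MB. Requires VIEW DATABASE STATE permission.
SELECT  t.name AS TableName,
        SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.row_count ELSE 0 END) AS TotalRows,
        CAST(SUM(CASE WHEN ps.index_id IN (0, 1) THEN ps.reserved_page_count ELSE 0 END) * 8 / 1024.0 AS DECIMAL(12, 2)) AS TableMB,
        CAST(SUM(CASE WHEN ps.index_id > 1 THEN ps.reserved_page_count ELSE 0 END) * 8 / 1024.0 AS DECIMAL(12, 2)) AS IndexMB
FROM sys.dm_db_partition_stats ps
INNER JOIN sys.tables t ON ps.object_id = t.object_id
GROUP BY t.name
ORDER BY TableMB DESC;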

SQL Server Indexes: Using the Clustered Index

If you really want to understand SQL Server indexing, I suggest following the Stairway to SQL Server Indexes series. My first blog post on SQL Server indexes is going to focus on clustered indexes. I’m going to paraphrase a lot of the information found in the articles linked above, to spare those of us who don’t want an intimate scientific knowledge of this topic, and address what is pertinent to re-factoring indexes to reduce deadlocks.

Simply put, indexes are smaller, concise reference tables associated with the data tables that tell SQL Server where some of the information requested in a query is located. Without indexes, the query engine performs a full table scan (sequentially looks at every Page in a table) to retrieve the requested data rather than jumping around segmented portions of the data where the requested rows are stored. In some cases where all the information requested resides in an index, SQL Server will simply return the row from the index and not access the main data table at all.

Non-clustered indexes are separate objects with separate storage. A clustered index instructs SQL Server how to sort the data in the main data table itself and creates a logical, hidden, permanent key. This is why identity columns are popular for creating clustered indexes. These integer or big integer column values are created sequentially by SQL Server when a new row is inserted, so there is basically never any resorting when rows are inserted or deleted.
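For illustration, a hypothetical table built this way (the table and column names are mine, not from any real schema):

-- New rows always get the next sequential key, so inserts land at the
-- logical end of the clustered index and existing pages are not resorted.
CREATE TABLE dbo.OrderEvent
(
    OrderEventID INT IDENTITY(1, 1) NOT NULL,
    EventDate    DATETIME NOT NULL,
    Detail       VARCHAR(200) NULL,
    CONSTRAINT PK_OrderEvent PRIMARY KEY CLUSTERED (OrderEventID)
);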

SQL Server saves data tables and non-clustered indexes in 8 KB blocks called Pages within their data files. As CRUD actions are performed on your tables, the data within these pages must be re-sorted. SQL Server determines which rows go into which Pages. In cases where sequentially created pages do not contain the data sequentially, external fragmentation is created. The percentage of empty page space is called internal fragmentation. The fixed-size nature of pages means some data types are better for indexes than others (creating less fragmentation); additionally, the number of fields included in an index can adversely affect index performance. I’m going to address fragmentation in a later post and explain why creating clustered indexes on GUID data types is death to a database.
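In the meantime, one common way to inspect both kinds of fragmentation in the current database is sys.dm_db_index_physical_stats; a minimal sketch:

-- SAMPLED mode populates the page-space column; LIMITED does not.
SELECT  OBJECT_NAME(ips.object_id) AS TableName,
        i.name AS IndexName,
        ips.avg_fragmentation_in_percent,                                 -- external fragmentation
        100 - ips.avg_page_space_used_in_percent AS AvgEmptyPageSpacePct  -- internal fragmentation
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, 'SAMPLED') ips
INNER JOIN sys.indexes i ON ips.object_id = i.object_id AND ips.index_id = i.index_id
ORDER BY ips.avg_fragmentation_in_percent DESC;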

From Stairway to SQL Server Indexes: Level 3, Clustered Indexes:

The clustered index key can be comprised of any columns you chose; it does not have to be based on the primary key.

Keep in mind these additional points about SQL Server clustered indexes:

  • Because the entries of the clustered index are the rows of the table, there is no bookmark value in a clustered index entry. When SQL Server is already at a row, it does not need a piece of information that tells it where to find that row.
  • A clustered index always covers the query. Since the index and the table are one and the same, every column of the table is in the index.
  • Having a clustered index on a table does not impact your options for creating non-clustered indexes on that table.

There can be only one clustered index on a table; the data in a table can’t be sorted two different ways at the same time. Clustering does not require uniqueness. In cases where the clustered index is made up of non-unique fields, the sorting results in grouping on those fields.

If the preceding is clear, it should be obvious that creating a clustered index on a particular column and then also creating a non-clustered index on the same column is a waste of resources. The creation of the clustered index enforces the sorting of the data and creates a permanent key SQL Server can use to quickly locate the Page where every row is stored.

Additionally, creating indexes in this manner can lead to deadlocks. Consider a large number of rows being inserted into a table in a batch. Even if your clustered index is created on an identity column, SQL Server will perform a sort check on the clustered index. After the clustered sort is finished, the non-clustered indexes that also include this column will be filled, and the batch will not be committed until the indexes are ready. The reason is that the non-clustered indexes can’t create their bookmarks until the page where each record is going to reside is determined. When a select against this table happens at the same time, the query engine may decide to access the non-clustered index, but the insert batch has it locked. The insert can’t complete because the select is accessing the non-clustered index the insert also has to write to.

Now wait: if the clustered column is removed from the non-clustered index, the select query is just going to use the clustered index (if the clustered index column is part of the table joins or WHERE clause of the select statement), and then the deadlock will occur due to the clustered index being locked. So how do we get the query engine to use the non-clustered index without using the clustered column to create the bookmark in the index? We use a covering index, which is created with the INCLUDE clause.
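A minimal sketch of one, reusing the hypothetical dbo.OrderEvent table from above (the column choices are illustrative):

-- EventDate is the key because the WHERE clause filters on it; Detail is
-- only selected, so it rides along at the leaf level via INCLUDE without
-- affecting the sort order of the index.
CREATE NONCLUSTERED INDEX IX_OrderEvent_EventDate
    ON dbo.OrderEvent (EventDate)
    INCLUDE (Detail);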

From Stairway to SQL Server Indexes: Level 5, Included Columns:

Columns that are in a non-clustered index, but are not part of the index key, are called included columns. These columns are not part of the key, and so do not impact the sequence of entries in the index. Also, as we will see, they cause less overhead than key columns.

There are several advantages in having these columns in the index but not in the index key, such as:

  • Columns that are not part of the index key do not affect the location of the entry within the index. This, in turn, reduces the overhead of having them in the index.
  • The effort required to locate an entry(s) in the index is less.
  • The size of the index will be slightly smaller.
  • The data distribution statistics for the index will be easier to maintain.

Deciding whether an index column is part of the index key, or just an included column, is not the most important indexing decision you will ever make. That said, columns that frequently appear in the SELECT list but not in the WHERE clause of a query are best placed in the included columns portion of the index.

-- Step 1: collect the columns that participate in each clustered index.
select objt.object_id, idxc.index_id, clmn.column_id
into #ClustIdxCols
from sys.objects objt
inner join sys.indexes idx on objt.object_id = idx.object_id
inner join sys.index_columns idxc on objt.object_id = idxc.object_id and idx.index_id = idxc.index_id
inner join sys.columns clmn on objt.object_id = clmn.object_id and idxc.column_id = clmn.column_id
where objt.type = 'U'  -- user tables only
and idx.type = 1;      -- clustered indexes only

-- Step 2: report non-clustered indexes whose keys reuse those clustered columns.
select distinct objt.name, idx.name
from sys.objects objt
inner join sys.indexes idx on objt.object_id = idx.object_id
inner join sys.index_columns idxc on objt.object_id = idxc.object_id and idx.index_id = idxc.index_id
inner join sys.columns clmn on objt.object_id = clmn.object_id and idxc.column_id = clmn.column_id
inner join #ClustIdxCols test on objt.object_id = test.object_id and idxc.column_id = test.column_id
where idx.type = 2                 -- non-clustered indexes
and idxc.is_included_column = 0    -- key columns only; INCLUDE columns are filtered out
order by objt.name, idx.name;

drop table #ClustIdxCols;

This script identifies non-clustered indexes whose keys use clustered index columns (INCLUDE columns are filtered out). In some cases the tables contain only two indexes: a clustered index on the table’s identity column and a non-clustered index on the same column.
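Using the hypothetical dbo.OrderEvent table from earlier, that redundant pattern looks like this:

-- The clustered index already sorts and locates rows by OrderEventID, so
-- this non-clustered index on the same column adds write and maintenance
-- overhead for no read benefit.
CREATE NONCLUSTERED INDEX IX_OrderEvent_OrderEventID
    ON dbo.OrderEvent (OrderEventID);  -- redundant with PK_OrderEvent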

How are you coming with those TPS reports?

Does anyone remember the original “Weekend at Bernie’s”? When the two accountants are poring over the green-and-white dot matrix printouts of the accounts on the hot tar roof of their apartment building? That’s the traditional report: pages and pages of numbers. Until the invention of spreadsheets, this was the means by which accountants reviewed the accounts. Larger companies have since outgrown even spreadsheets and demanded larger data storage, like databases. However, a majority of the reporting provided from these robust data stores still looks like a spreadsheet.

Detailed row data has its uses. Financial transactions and system audit logs are very useful when displayed as uniform rows of data for visual scanning. You can easily find the row that doesn’t look like the others when searching for an error, but how easy is it to determine transaction volume, or the frequency of a particular event? Are you going to count the lines and keep a tick-mark tally on another sheet? You can calculate some of these statistics, group them by date, and compare the groups, if all the data is still available at the source. Hopefully the query doesn’t slow down the system while users are trying to do their work on it. Save the data in monthly spreadsheets that are backed up regularly? In most cases, the generation of these reports just becomes a meaningless process and a waste of paper.

Business Intelligence (BI), whoever coined the term, is meant to communicate the difference between a report (any formatted delivery of data) and the display of information in a way that aids the business decision-making process. BI reporting answers questions like: how do this month’s sales compare to last month’s? Or, has there been a statistically significant increase in defects with the new modifications to our product?

Many professionals familiar with BI reporting assume that it’s really only applicable to data collected and aggregated over a long period of time. Contact center management is the best example of why this isn’t the case. A contact center is much like an old amateur radio that requires constant tuning to produce the best receiving and transmitting signals. These machines come with a panel full of dials and switches used to make sure the radio and the antenna are in perfect attunement. Similarly, contact center managers are constantly monitoring call handle and queue times, making sure the correct proportion of agents are staffed for email, voice, or chat processing. These managers require timely reports with only 15 or 30 minutes of latency to determine short-term staffing levels. Most companies see their customer service departments as necessary expenses to keep their customers happy. Decision makers need nearly real-time information to make the constant adjustments that maximize the efficiency of the staff and keep their customers happy.

The challenge for BI professionals is understanding the users’ needs well enough to deliver the correct solution for the need. There isn’t a one-size-fits-all approach to BI delivery. The assembly manager needs metrics on how many completed plastic toys are failing inspection every half hour. Management needs to compare this month’s inspection failures to the samples taken before switching to the new vendor, perhaps a few times a week. The executive might want to know how sales are going this year compared to the last five, but she only needs this information on the first of the month when she first walks into the office. Each one of these examples has different requirements for the size of the data set, how long the report needs to stay on display, and whether near-term or historical data must be accessible.

What’s the point? Go run a search on any technology job board for Business Intelligence or BI. Employers are looking for qualified BI professionals to deliver reporting solutions that aid the business decision-making process. It’s a growing space, on par with security and mobile development. If you can get past the stigma placed on this practice by developers, that “reporting work” is somehow inferior to software development, there is a lot of opportunity to be had.


Politics be Damned! Fess up When You Mess up.

When I was quite young, between 3 and 5 I think, I made a mistake and lied about it. My father has always worked on and collected various instruments for commercial and amateur radio communications. One of these radios was on the floor by the couch one morning and I was playing with it. I remember at some point dropping a dime in it. I was scared and my father came over to ask what I was doing.

I remember asking, “What would happen if a dime fell in those slots?”

He asked, “Did you drop a dime in one of those slots?”

“No.” I answered and walked away. I entered the dining room some time later and saw him screwing the case back on the radio we had been discussing.

He turned to me and said, “You know if I had plugged this in it could have blown up in my face because of where that dime landed. It’s always better to admit a mistake so someone can help you fix it faster before anything really bad happens.”

I started my professional career at Tower Hill Insurance Group, Inc. in Gainesville, FL in 1997. In the early 1990’s the state of Florida had entered the insurance business because so many carriers had decided the hurricane risk was too great after the devastation left by Hurricane Andrew. Five years or so later, big carriers were again willing to take risks in Florida, and the new insurance commissioner wanted insurance carriers to buy the policies issued by the state in order to earn the privilege of again selling property casualty insurance products in the state they had abandoned. Tower Hill negotiated an agreement between a carrier and the state under which Tower Hill would administer the transfer, premium collection, and claim processing for the policies this carrier purchased. My role in this arrangement was to import the policy information from a huge text file, sent on CD once a month, for the policies that were renewing the next month. Those policies would renew with the new carrier.

I was… still pretty new at a lot of this stuff. I had only been out of college and at this job for a year. There was still an ample amount of self-directed on-the-job learning taking place. Well, somehow (the details are a little foggy on whether I misunderstood the file specification or got things switched around when I was creating the values) I managed to get all the days and months for the effective and expiration dates switched. Yes, I managed to create European-formatted dates. Worse yet… it was three months into the imports before I found the problem.

Give me some credit: I had a self-imposed sanity check and found my own mistake before our policy management software vendor ever detected an issue, and just five days before we had a scheduled audit… Do I take vacation, abruptly quit, or just go get absurdly drunk?

Actually, I thought of my father’s advice. I took my problem to my manager, who was gifted with a saintly proportion of patience and understanding. Mr. Chris Allen, as I remember it, laughed at me, shook his head, and said, “Come on, we need to go tell Keaton.” That’s Mr. Benson, VP of IT.

“Wait, I have to go with you and explain what happened?” I said.

“Yup.”

Other than thinking I was either getting fired or possibly shot, I only remember one other detail of that situation. As we were walking to the vendor rep’s office in our building, Keaton stopped me and said, “Look George, if you’re not making any mistakes… you’re not working hard.”

That was the biggest motivator of all time. Yeah, it was hard, and it involved a lot of manual checking from the data entry team, and a torrent of sarcasm from the vendor rep, who was threatened by the fact that I could write robust GUI apps in one-tenth the time his company could put out a patch. But we got it fixed, and we had a great story, of how our processes averted a disaster, to give to the carrier’s auditor, which earned us shiny gold stars on the audit report for honesty, ethics, compliance, and dedication to accuracy.