I had some data in Access DB and wanted to convert it to XML. Data in Access was in one table called ‘Words’, with the following columns: Word, Meaning, Synonyms, Antonyms, Example etc. I wanted to convert this data in XML as follows:
<
Dictionary>
<Word Spelling="Munificent">
<Meaning>very generous</Meaning>
<Synonyms />
<Antonyms />
<Example />
</Word>
<Word Spelling="Anfractuosity">
<Meaning>Twist or turn</Meaning>
<Synonyms />
<Antonyms />
<Example />
</Word>
</Dictionary>
LINQ made job to read the data and convert it to XML very easy, as you can see all this happen in one line.
using System;
using System.Linq;
using System.Text;
using System.Data.OleDb;
using System.Xml.Linq;
using System.Data.Linq;
using System.Data.Linq.Mapping;
namespace
Convert2XML
{
class Program
{
static void Main(string[] args)
{
string ConnectionString = @"Provider=Microsoft.ACE.OLEDB.12.0;Data Source=Vocabulary.mdb;Persist Security Info=false";
DataContext db = new DataContext(new OleDbConnection(ConnectionString)) ;
Table <words> tblwords = db.GetTable<words>();
System.IO.File.WriteAllText("Data.XML",
( new XElement("Dictionary", from w in tblwords
select new XElement("Word" ,
new XAttribute("Spelling", w.Word),
new XElement("Meaning", w.Meaning),
new XElement("Synonyms", w.Synonyms),
new XElement("Antonyms", w.Antonyms),
new XElement("Example", w.Example)
))).ToString(), Encoding.UTF8);
}
[
Table(Name = "words")]
public class words
{
[Column]
public string Word;
[Column]
public string Meaning;
[Column]
public string Synonyms;
[Column]
public string Antonyms;
[Column]
public int weight;
[Column]
public string Reference;
[Column]
public string Example;
[Column]
public string link;
}
}
}
If you don’t have OLEDB 12.0 then You can use the following connection string too
string ConnectionString = @”Provider=Microsoft.Jet.OLEDB.4.0;Data Source=Vocabulary.mdb;Persist Security Info=True” ;
If you don’t have Microsoft.ACE.OLEDB.12.0 then get it from here
http://www.microsoft.com/downloads/details.aspx?familyid=7554F536-8C28-4598-9B72-EF94E038C891&displaylang=en
These oledb providers are 32 bit, so make sure to compile your project for 32 bit on a 64 bit machine.
Error handling and comments are removed for the brevity.
A study conducted by MSCOM ops team found 1 GB of data took 110 Minute to push and 12.5 minutes to Pull across the continents. Furthermore SQL 2008 with Windows Server 2008 made a huge difference over SQL 2005 and Windows Server 2003.
Push subscription replication of character data with SQL Server 2008 running on Windows Server 2008 yielded a 104 percent increase over SQL Server 2005 running on Windows Server 2003, and pull subscription replication of the same data yielded a 1,298 percent gain.
Lot of performance improvement is attributed to windows server TCP/IP Stack and SQL Native Client driver. Different reader and writer thread behavior resulted in push and pull performance difference.
Reader thread fills two buffers of 40 KB and signals the Writer thread to write the buffer on subscriber, so the 40 KB packet moves thru and fro over the wire, but in case of Pull model Writer thread pulls the data in one huge chunk as big as TCP/IP let it get. One of the most significant improvements to Windows Server 2008 is the autotuning of the receive window size, designed for high-latency environments. In previous versions of the operating system, the maximum window size was limited by the 16-bit Window field in the TCP header amounting to a maximum window size of 64 KB. In Windows Server 2008, by combining the Window field together with a Scale Factor field of the TCP header, the window can be scaled or tuned, up to 16 megabytes (MB) in size.
Read the full study here http://msdn.microsoft.com/en-us/library/dd263442.aspx
I thought everyone who wants to know how to put the windows events or IIS Logs to SQL already know about it. But for my amazement it is not the case. Quick search on the web did not turned up a short tutorial. So how to move my window events to SQL? Here it is
Install the Log Parser on the box.
Get the Log Parser from http://www.microsoft.com/downloads/details.aspx?FamilyID=890cd06b-abf8-4c25-91b2-f8d975cf8c07&displaylang=en
Create a database and a table in SQL.
Run the following Query
Use master
Go
Create Database MyEventDataBase
Go
use MyEventDataBase
go
CREATE TABLE [Audit] (
[EventLog] [varchar] (255) NULL ,
[RecordNumber] [int] NULL ,
[TimeGenerated] [datetime] NULL ,
[TimeWritten] [datetime] NULL ,
[EventID] [int] NULL ,
[EventType] [int] NULL ,
[EventTypeName] [varchar] (255) NULL ,
[EventCategory] [int] NULL ,
[EventCategoryName] [varchar] (255) NULL ,
[SourceName] [varchar] (255) NULL ,
[Strings] [varchar] (max) NULL ,
[ComputerName] [varchar] (255) NULL ,
[SID] [varchar] (255) NULL ,
[Message] [varchar] (max) NULL ,
[Data] [varchar] (max) NULL
)
GO
Make a directory called LogImport in C drive.
Make a bat file with the following entries:
copy \\LogMachineName\Logs\EventLog.evtx . /Y
"c:\Program Files (x86)\Log Parser 2.2\logparser.exe" -i:evt "select * into Audit from c:\logImport\eventlog.evtx" -iCheckPoint:CheckPoint.lpc -o:SQL -oConnString: "Driver={SQL Server Native Client 10.0};server=RAFAT20082;Database=DownloadEventLog;Trusted_Connection=yes;"
del *.evtx /Q
NOTE: Pay attention to the highlighted area and fix the path as per your environment.
Check help for LogParser and play with other options, it is a powerful tool; its numerous options will always give you a solution for your Log problems.
There are many different ways you can partition a cube. Here I am talking about – how you can use C# to partition a cube from a DTS package?
Using Microsoft.AnalysisServices Object API, you can travel the whole Hierarchy of objects and completely administer an Analysis Service Instance. From the connection Object you can get to the Server and database instances. And in database you can transverse the Cube and their different measures. Once you get hold of a measure then you can transverse the different partitions of measures and dig deep in to each partition query definitions etc.
To keep things simple, there is one prerequisite to name your measure partitions in the in following format: MeasureName_DateSKey. DateSKey is a long number showing the date from 1st Jan 1990. To get the DateSKey for a date you can use the following formula:
DateTime d;
d.Date.Subtract(new System.DateTime(1900, 1, 1, 0, 0, 0)).Days
So, if your measure name is ‘Distinct Downloaders Daily’ then it may have following partitions for every 7 days as follows: Distinct Downloaders Daily_39812, Distinct Downloaders Daily_398129… etc
Open a new DTS package, drag the script component and double click to start editing a C# script.
Include the following library:
using Microsoft.AnalysisServices;
using System.Collections.ObjectModel;
Define a new structure as measure, to hold the definition of your measures and there partition boundaries.
[sourcecode language='csharp']
// Structure used to keep the Measure name and their partition durations
struct
Measure
{
public string MeasureName;
// Partition duration in days
public int iPartitionDays;
public Measure(string measureName, int ipartitionDays){
MeasureName = measureName;
iPartitionDays = ipartitionDays;
}
}
[/sourcecode]
Now define a collection of measures which you want to partition. Sure, this information you can get from a Config file too … It is hardcoded in code for the illustration purpose.
Collection Measures =
new Collection();
Measures.Add(new Measure("DownLoad Summary", 7));
Measures.Add(new Measure("Distinct Downloaders Daily", 7));
Measures.Add(new Measure("Distinct Downloaders Weekly", 30));
Measures.Add(new Measure("Top Referer", 30));
Now you can use this collection to call a method called ProcessCube. Here is the code, removed all the error handling for the sake of brevity.
private
void ProcessCube( Collection Measures, string cubeName)
{
Server oServer = new Server();
try
{
// Get Connection object and then get the Server and Database name from DTS connection object
ConnectionManager oConnection = null;
for (int i = 0; i < Dts.Connections.Count; i++)
{
oConnection = Dts.Connections[i];
if (oConnection.Name == "CubeConnectionName") //Connection name as defined in ETL package
{
break;
}else
oConnection = null;
}
string sServer = (string)(oConnection.Properties["ServerName"].GetValue(oConnection));
string sDatabase = (string)(oConnection.Properties["InitialCatalog"].GetValue(oConnection));
oServer.Connect(sServer);// connect to the server and start scanning down the object hierarchy
Database oDB = oServer.Databases.FindByName(sDatabase);
Cube oCube = oDB.Cubes.FindByName(cubeName);
foreach (Measure m in Measures)
{
CreatePartition(oCube, m.MeasureName, m.iPartitionDays); //Create old paritions
DeletePartitions(oCube, m.MeasureName, m.iPartitionDays); // Delete old partitions
}
}
finally
{
if (oServer.Connected)
oServer.Disconnect();
}
}
///
Create a new partition based on the skipdays
///
/// Cube Name to be processed
/// MeasureName which need to be partitioned
/// Partition duration e.g. for daily=1, weekly=7, Monthly =30
/// true on success
///
///
This code Creates the new partition names, using the default Measure name as the partition name and the
/// DateSKey. for e.g. if the measure name is Browser and the DateSKey=39241
/// then the partition name will be 'Browser_39241'
bool CreatePartition(Cube oCube, string measureName, int iDaysToSkip)
{
Byte[] dataBytes = new Byte[5000];
QueryBinding oQueryBinding;
bool bflag = false;
int iCurrentDateSKey = (int)Dts.Variables["CubeCurrentDateSKey"].Value; //Get the currentDateSkey
MeasureGroup oMeasureGroup = oCube.MeasureGroups.FindByName(measureName); // Find all measure groups
if (oMeasureGroup == null) return false;
Partition oPartition = oMeasureGroup.Partitions[oMeasureGroup.Partitions.Count - 1]; //Get the Last partition
int LastCubeDateSkey = GetPartitionDateSKey(oPartition);//Get the lastkey DateSkey
string DefaultPartitionName = oMeasureGroup.Name; //Get the Partition name first part
string sNewPartitionName = string.Empty;
if (LastCubeDateSkey == 1) //if it is first default partition
{
sNewPartitionName = DefaultPartitionName + "_" + iCurrentDateSKey; //Making the new partition Name
}
else
{
if (LastCubeDateSkey + iDaysToSkip <= iCurrentDateSKey)
sNewPartitionName = DefaultPartitionName + "_" + iCurrentDateSKey;
else
return true; //No need to create the partition yet. Every thing is fine, go back
}
//Just another check, if the partition already existing then skip
oPartition = oMeasureGroup.Partitions.FindByName(sNewPartitionName);
Partition oPartitionNew;
if (oPartition == null)
{
//Did not get the partition, lets create one,Get the last partition
oPartition = oMeasureGroup.Partitions[oMeasureGroup.Partitions.Count - 1];
//Clone the properties from the last partition to the new partition.
oPartitionNew = oPartition.Clone();
oPartitionNew.ID = sNewPartitionName;
oPartitionNew.Name = sNewPartitionName;
oQueryBinding = oPartitionNew.Source as QueryBinding;
if (oQueryBinding == null)
{
return true; // No Query, No Partition
}
oQueryBinding.QueryDefinition = GetNewQuery(oQueryBinding.QueryDefinition, iCurrentDateSKey, iDaysToSkip);
if (oQueryBinding.QueryDefinition == null || oQueryBinding.QueryDefinition == string.Empty)
{
Dts.Log("Partition : " + oPartitionNew.Name + " Empty Query returned", 0, dataBytes);
Dts.Events.FireInformation(0, "CreatePartition", "Partition : " + oPartitionNew.Name + " Empty Query returned", null, 0, ref bflag);
return false;
}
if (oMeasureGroup.Partitions.Contains(oPartitionNew))
{
Dts.Log("Partition : " + oPartitionNew.Name + " Already exists.", 0, dataBytes);
return true; //Oh! what a surprise, this partition already existing.
}
oMeasureGroup.Partitions.Add(oPartitionNew);
try
{
oPartitionNew.Update();
Dts.Log("Partition updated: " + oPartitionNew.Name, 0, dataBytes);
Dts.Events.FireInformation(0, "CreatePartition", "Partition updated: " + oPartitionNew.Name, null, 0, ref bflag);
oPartitionNew.Process(ProcessType.ProcessFull);
Dts.Log("Partition Processed: " + oPartitionNew.Name, 0, dataBytes);
Dts.Events.FireInformation(0, "CreatePartition", "Partition Processed: " + oPartitionNew.Name, null, 0, ref bflag);
}
catch (Exception e)
{
if (oPartitionNew != null)
Dts.Log("Error in Create partition: " + oPartitionNew.Name + "Exception: " + e.Message, 0, dataBytes);
else
Dts.Log("Error in Create partition: Unknown. Exception: " + e.Message, 0, dataBytes);
return false;
}
}
return true;
} //~Create partitio ends here
///
/// This function Gets the partition DateSKey from the partition name
///
/// On Success the right DateSKey, on error zero
///
private int GetPartitionDateSKey(Partition oPartition)
{
if (oPartition == null) return 0;
try
{
return Convert.ToInt32(oPartition.Name.Substring(oPartition.Name.LastIndexOf("_") + 1));
}
catch (Exception ex)
{
return 0;
}
}
///
/// Get the new Query for the partition.
///
string GetNewQuery(string sourceQuery, int iCurrentDateSKey, int iDaysToSkip)
{
string mainQuery = string.Empty;
string newQuery = string.Empty;
string newCondition;
//From the Query get the Where clause
int LastIndex = sourceQuery.ToUpper().LastIndexOf("WHERE");
if (LastIndex < 0)
{
mainQuery = sourceQuery.Substring(0, LastIndex);
newCondition = " [DateSKey] >= " + Convert.ToString(iCurrentDateSKey) + " AND [DateSKey] = " + Convert.ToString(iCurrentDateSKey) + " AND [DateSKey] < " + Convert.ToString(iCurrentDateSKey + iDaysToSkip) + " ";
newQuery = sourceQuery + " WHERE " + newCondition;
}
return newQuery;
}
//Deletes old partitions based on the data retention period.
bool DeletePartitions(Cube oCube, string measureName, int iDaysToSkip)
{
Collection PartitionTobeDeleted = new Collection();
int iDateSkeyTobeDeleted;
Byte[] dataBytes = new Byte[5000];
bool bflag = false;
// 13 * 30 = 390 days - 13 months
int iDaysTobeDeleted = 390; //TODO: Get from a variable ["DataRetentionPeriodInDays"]
//Getting the number for which partition needs to deleted. old partitions
iDateSkeyTobeDeleted = GetDateSKey(DateTime.Now) - iDaysTobeDeleted;
MeasureGroup oMeasureGroup = oCube.MeasureGroups.FindByName(measureName);
if (oMeasureGroup == null)
{
Dts.Log("Measure : " + oMeasureGroup.Name + " not found.", 0, dataBytes);
return false;
}
Dts.Log("[DeletePartitions] Working on Measure: " + oMeasureGroup.Name, 0, dataBytes);
string sPartitionName = string.Empty; //used for logging the correct PartitionName
int MaxPartitions = oMeasureGroup.Partitions.Count;
foreach (Partition oPartition2 in oMeasureGroup.Partitions)
{
sPartitionName = oPartition2.Name;
if (!oPartition2.Name.Contains("_"))
continue; //_ is not there it means it is not the partition which we need to delete
try
{
// Get the boundary partition date from partition name
int spartitionBoundaryDateSkey = GetPartitionDateSKey(oPartition2);
if (spartitionBoundaryDateSkey == 0) continue;
//if partition DateSKey is smaller then the partition to be deleted then delete
//MaxPartition protects us from deleting all the partitions from the measures
if (spartitionBoundaryDateSkey > 1)
{
MaxPartitions--;
//can not drop object here because of foreach.
PartitionTobeDeleted.Add(oPartition2);
#region Log
Dts.Log("Found partition that needs to be dropped: " + oPartition2.Name, 0, dataBytes);
Dts.Events.FireInformation(0, "DeletePartition", "Found partition that needs to be dropped: " + oPartition2.Name, null, 0, ref bflag);
#endregion
}
}
catch (Exception e)
{
Dts.Log("Error in Delete partition: " + sPartitionName + "Exception: " + e.Message, 0, dataBytes);
Dts.Events.FireInformation(0, "DeletePartition", "Error in Delete partition: " + sPartitionName + "Exception: " + e.Message, null, 0, ref bflag);
}
}
//Deleting partition from the measure group
foreach (Partition oPartition in PartitionTobeDeleted)
{
XmlaWarningCollection warningColln = new XmlaWarningCollection();
oPartition.Drop(DropOptions.Default, warningColln);
}
return true;
}
These queries can save your life.
Query to find the query using the most CPU:
select
highest_cpu_queries.plan_handle,
highest_cpu_queries.total_worker_time,
q.dbid,
q.objectid,
q.number,
q.encrypted,
q.[text]
from
(select top 10
qs.plan_handle,
qs.total_worker_time
from
sys.dm_exec_query_stats qs
order by qs.total_worker_time desc) as highest_cpu_queries
cross apply sys.dm_exec_sql_text(plan_handle) as q
order by highest_cpu_queries.total_worker_time desc
Indexes not used:
select object_name(i.object_id),
i.name,
s.user_updates,
s.user_seeks,
s.user_scans,
s.user_lookups
from sys.indexes i
left join sys.dm_db_index_usage_stats s
on s.object_id = i.object_id and
i.index_id = s.index_id and s.database_id = DB_ID('ILT_Stage')
where objectproperty(i.object_id, 'IsIndexable') = 1 and
-- index_usage_stats has no reference to this index (not being used)
s.index_id is null or
-- index is being updated, but not used by seeks/scans/lookups
(s.user_updates > 0 and s.user_seeks = 0
and s.user_scans = 0 and s.user_lookups = 0)
order by object_name(i.object_id) asc
Missing indexes:
SELECT
mig.index_group_handle, mid.index_handle,
CONVERT (decimal (28,1),
migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)
) AS improvement_measure,
'CREATE INDEX missing_index_' + CONVERT (varchar, mig.index_group_handle) + '_' + CONVERT (varchar, mid.index_handle)
+ ' ON ' + mid.statement
+ ' (' + ISNULL (mid.equality_columns,'')
+ CASE WHEN mid.equality_columns IS NOT NULL AND mid.inequality_columns IS NOT NULL THEN ',' ELSE '' END
+ ISNULL (mid.inequality_columns, '')
+ ')'
+ ISNULL (' INCLUDE (' + mid.included_columns + ')', '') AS create_index_statement,
migs.*, mid.database_id, mid.[object_id]
FROM sys.dm_db_missing_index_groups mig
INNER JOIN sys.dm_db_missing_index_group_stats migs ON migs.group_handle = mig.index_group_handle
INNER JOIN sys.dm_db_missing_index_details mid ON mig.index_handle = mid.index_handle
WHERE CONVERT (decimal (28,1), migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans)) > 10
ORDER BY migs.avg_total_user_cost * migs.avg_user_impact * (migs.user_seeks + migs.user_scans) DESC
TempDB:
Queries to find which SQL statements are using Tempdb the most:
SELECT
t1.session_id,
(t1.internal_objects_alloc_page_count + task_alloc) as allocated,
(t1.internal_objects_dealloc_page_count + task_dealloc) as
deallocated
from sys.dm_db_session_space_usage as t1,
(select session_id,
sum(internal_objects_alloc_page_count)
as task_alloc,
sum (internal_objects_dealloc_page_count) as
task_dealloc
from sys.dm_db_task_space_usage group by session_id) as t2
where t1.session_id = t2.session_id and t1.session_id >50
order by allocated DESC
Highest CPU queries:
select
t1.session_id,
t1.request_id,
t1.task_alloc,
t1.task_dealloc,
t2.sql_handle,
t2.statement_start_offset,
t2.statement_end_offset,
t2.plan_handle
from (Select session_id,
request_id,
sum(internal_objects_alloc_page_count) as task_alloc,
sum (internal_objects_dealloc_page_count) as task_dealloc
from sys.dm_db_task_space_usage
group by session_id, request_id) as t1,
sys.dm_exec_requests as t2
where t1.session_id = t2.session_id and
(t1.request_id = t2.request_id)
order by t1.task_alloc DESC
Thanks to Santosh Tawde for providing these queries. I have some more useful DMV queries in my previous post.
I have a table, which keeps the country, Referer URL and their counts.
I need to find out the top 2 URL’s for every country.
So, you see this is not a straight top two order by problem. If you know the solution then it is very simple, if not then you may spend some time figuring out. Hopefully this will save your few minutes.
Here is the solution:
CREATE TABLE Referer
(
CountryId VARCHAR (100),
RefererId VARCHAR (100),
Counts INT
)
INSERT INTO Referer VALUES ('US', 'http:\\microsoft.com', 10345)
INSERT INTO Referer VALUES ('US', 'http:\\microsoft.com\SQL', 43)
INSERT INTO Referer VALUES ('US', 'http:\\microsoft.com\Office', 234)
INSERT INTO Referer VALUES ('US', 'http:\\microsoft.com\XML', 2313)
INSERT INTO Referer VALUES ('US', 'http:\\microsoft.com\SQL', 105)
INSERT INTO Referer VALUES ('UK', 'http:\\microsoft.com', 23)
INSERT INTO Referer VALUES ('UK', 'http:\\microsoft.com', 10734)
INSERT INTO Referer VALUES ('UK', 'http:\\microsoft.com\SQL', 10438)
INSERT INTO Referer VALUES ('UK', 'http:\\microsoft.com\SQL', 1039)
INSERT INTO Referer VALUES ('UK', 'http:\\microsoft.com\Office', 14310)
INSERT INTO Referer VALUES ('Canada', 'http:\\microsoft.com', 14151)
INSERT INTO Referer VALUES ('Canada', 'http:\\microsoft.com', 1412)
INSERT INTO Referer VALUES ('Canada', 'http:\\microsoft.com\SQL', 15613)
INSERT INTO Referer VALUES ('Canada', 'http:\\microsoft.com', 14134)
INSERT INTO Referer VALUES ('Canada', 'http:\\microsoft.com\SQL', 13145)
GO
WITH RefererTemp AS
(
SELECT ROW_NUMBER() OVER(PARTITION BY CountryId ORDER BY Counts DESC) AS 'RowNumber' , *
FROM Referer
)
SELECT * FROM RefererTemp
WHERE RowNumber < 3
|
RowNumber |
countryId |
RefererId |
Count |
|
1 |
Canada |
http:\\microsoft.com\SQL |
15613 |
|
2 |
Canada |
http:\\microsoft.com |
14151 |
|
1 |
UK |
http:\\microsoft.com\Office |
14310 |
|
2 |
UK |
http:\\microsoft.com |
10734 |
|
1 |
US |
http:\\microsoft.com |
10345 |
|
2 |
US |
http:\\microsoft.com\XML |
2313 |
Sure, there will be other way to solve the problem. Let me know.
Often it happens, that we want to play with a column value of a table. Not knowing the ramification of the column value, we have to resort to the back breaking manual process of finding the dependency on the column. Using the management studio, get the list of stored procedures depended on the table. Go thru each stored procedure and check which stored procedure does what with the column in question.
Today, I came across 29 stored procedure to be waded thru to get a column significance in a table. The UI in management studio gives the dependency on the table, but does not discuss the column level dependency. However, all this information is nicely tucked in the SQL tables. Using the following query, my quest reduced from 29 Stored procedures to 3 stored procedures. Here is the query
/*
Script to get the name of stored procedure which
effect the column
*/
declare @TableName varchar(250)
declare @ColumnName varchar(250)
declare @TableId int
declare @ColumnId int
set @TableName = 'EventDetail '
set @ColumnName = 'NotificationType'
-- Get the TableId
select @ColumnId = Column_id from sys.columns (nolock)
where object_id = Object_id (@TableName) and Name = @ColumnName
if @ColumnId is not null
begin
select distinct o.Name, d.is_updated, d.is_selected
from sys.objects o (nolock) join sys.sql_dependencies d (nolock)
on d.object_id = o.object_id
where d.referenced_major_id = Object_id (@TableName)
and ( d.class = 0 and d.referenced_minor_id = @ColumnId )
end
else
begin
select distinct o.Name
from sys.objects o (nolock) join sys.sql_dependencies d (nolock)
on d.object_id = o.object_id
where d.referenced_major_id = Object_id (@TableName)
end
From BOL:
Dependencies are established during CREATE only if the referenced (independent) entity exists at the time that the referencing (dependent) object is created. Due to deferred name resolution, the referenced entity need not exist at the time of creation. In this case, a dependency row is not created. Moreover, entities referenced through dynamic SQL do not establish dependencies.
One of the most useful tool from the Patterns and Practice.
Patterns & practices Guidance Explorer is a tool that enables discovery, composition and consumption of high quality development guidance. The guidance library contains a variety of guidance types including checklists and guidelines covering design, implementation and deployment topics.
Check the Guidance Explorer web edition here.
We had an application which was not performing fast enough for our satisfaction. The application updates various tables with millions of records. We tweaked few things in our SQL and suddenly the speed of application improved to our target level. All the changes we made were very simple and well known, but as it happens often enough that we overlook simple things, so here are the changes we made:
Where clause:
We had the business rule which makes us write the query like this
SELECT RecipientName FROM recipients
WHERE email = @email or EmployeeSSN = @EmployeeSSN
The query plan showed us that the above 'where' clause is executed by SQL in parallel for email and EmployeeSSN, SQL tries to help us as much as it can. However, we realized we search 99.9% times only on @email and @EmployeeSSN will be null for most of the times, so there is no need to check EmployeeSSN every time. We changed the code as follows:
IF @Email is null
SELECT RecipientName FROM recipients
WHERE EmployeeSSN = @EmployeeSSN
ELSE
SELECT RecipientName FROM recipients
WHERE Email = @Email
Data Type conversion
Data type for one of the tables column(emailAddress) was 'varchar', but data type of a variable in stored procedure was defined as 'nvarchar', This variable is used in the where clause, as follows:
declare
@email nvarchar (100)
select * from recipient where emailAddress = @email
The above statement made SQL do the data conversion for 'where' clause. This conversion was a drag on the SP execution speed. This again we come to know when we looked at the Query plan, otherwise SQL was keeping quiet about this extra work. I wish SQL should start giving us warning for this kind of over sights.
Fully Qualified Names
In our stored procedure we made all the names of object fully qualified. Complete name of an object consists of four identifiers: server, database, owner, and object name. An object name that specifies all four parts is known as a fully qualified name. Using fully qualified names eliminates any confusion about which stored procedure you want to run and can boost performance because SQL Server has a better chance to reuse the stored procedures execution plans if they were executed using fully qualified names.
SELECT Alias FROM Recipient
Changed to
SELECT Alias FROM DBName.dbo.Recipient
Prefer sp_executesql stored procedure instead of the EXECUTE statement.
When you use the sp_executesql stored procedure to executes a Transact-SQL statements that will be reused many times, the SQL Server query optimizer will reuse the execution plan, hence boosting the performance.
Keep transactions as short as possible.
Some time we had 'begin transaction' much before we actually needed, and 'commit transaction' much latter then actually we needed. We moved begin and commit to make the transactions as short as possible, this not only helped the overall speed but helped in preventing the deadlocks too. Classical mistake we seen again and again:
Begin Tran
Check some condition
Delete
Updates
Selects
End Tran
Most of the time ‘End Tran’ can safely move above select, and if you are returning huge data rows back to middle tier then this will surely improve your application performance.
In our case we changed the SP as follows:
Check some condition
Begin Tran
Delete
Updates
End Tran
Selects
Changed temp table to temp variables
We changed temp tables to temp variables.
Using temporary tables inside stored procedure reduces the chance to reuse the execution plan.
select top 1000 from LimitedProgramTBNEventID
into #Events from LimitedProgramTBNEvent (NOLOCK)
where eventStatus = 7
we changed it as follows:
DECLARE @Events table
(
LimitedProgramTBNEventID int NOT NULL
)
insert into @Events select top 1000 LimitedProgramTBNEventID
from LimitedProgramTBNEvent (NOLOCK)
where eventStatus = 7
GetDate ()
In couple of stored procedures, in couple of places we had GetDate() funtion, we replaced this function with a date variable and initialized this variable only once, so we don’t need to call getdate again and again.
NOLOCK
Added NOLOCK in many select statements. 'No Lock' not only ignore Exclusive locks on rows, but it does not issue a Shared Lock on the records it reads. Therefore, it will not delay or block a transaction trying to write. There are caveats to this approach, please read on:
http://www.databasejournal.com/features/mssql/article.php/3553281
http://msdn.microsoft.com/library/default.asp?url=/library/en-us/acdata/ac_8_con_7a_1hf7.asp
Indexes
SQL Server 2005 has introduced DMV (Dynamic management views), Dynamic management views and functions return server state information that can be used to monitor the health of a server instance, diagnose problems, and tune performance. Please read about them in books on-line. It tells you lots of good thing about your system, Most utilixed Index, Cost of index benefit, hot spot index contentions etc. Everyone knows how important indexing is, and we over do it some time. Using the following query, you can find out which indexes are used, and how much work sql does to maintain each indexes
--- sys.dm_db_index_usage_stats
declare @dbid int
select @dbid = db_id()
select objectname=object_name(s.object_id), indexname=i.name, i.index_id
,reads=user_seeks + user_scans + user_lookups
,writes = user_updates
from sys.dm_db_index_usage_stats s,
sys.indexes i
where objectproperty(s.object_id,'IsUserTable') = 1
and s.object_id = i.object_id
and i.index_id = s.index_id
and s.database_id = @dbid
order by reads desc
go
|
Table Name |
Index Name |
Index_id |
Reads |
Writes |
|
Recipient |
IX_Recipient |
4 |
0 |
3048893 |
|
Process |
IX_Process |
22 |
0 |
11709 |
The above Query tells us we are doing too much work maintaining these two indexes and system never utilities them. Helps us getting rid of lot of redundant work for SQL.
If the user_lookup and user_seek is too high on a clustered index and user_seek high on other non-clustered index then you will be better of turning the non-clustered index to clustered.
declare @dbid int
select @dbid = db_id()
select objectname=object_name(s.object_id), s.object_id, indexname=i.name, i.index_id
, user_seeks, user_scans, user_lookups, user_updates
from sys.dm_db_index_usage_stats s,
sys.indexes i
where database_id = @dbid and objectproperty(s.object_id,'IsUserTable') = 1
and i.object_id = s.object_id
and i.index_id = s.index_id
order by user_updates desc, user_seeks asc
|
Table Name |
Index Name |
User_seek |
User_lookup |
User_update |
|
LimitedProgramRecipient |
LimitedProgramRecipient_PK |
164654 |
3851134 |
3850446 |
|
LimitedProgramRecipient |
LimitedProgramRecipient_UC1 |
3851582 |
0 |
3850446 |
In the above example LimitedProgramRecipient_UC1 columns, should be included in the clustered Index.
Here are some more links for DMV.
Database Engine Tuning Advisor
Of course this going without saying that you should always run the Database Engine Tuning Advisor on your database. It does tells us about some of the missing statistics.
I am sure this is not an exhaustive list for performance increase tips, Please feel free to add any other performance tips …
Before I explain this equation, I should explain how I reached to it.
I work for the company which employ scary smart and insanely intelligent people. It is very hard to keep pace with them. However, as a man of average intelligence, I have to device a way to keep up with these smart people.
After close observation and careful studying there thought process, and seen how any topic, issue, problem is shredded by these people, I reached to following conclusion. First these people are really smart and some what god gifted as far as the grey matter in their head is concern. To analyze any idea in hand, they quickly see it from different angles; they check all the attributes, they evaluate its Extensibility, can this idea be extended for future growth, complexity in implementation, How complex the solution is, can any one else understand it other then the original author. Operation headaches, What other problems it will create for oprations. Manageability, can the code for this cool trick will be maintainable. Performance, How it will effect the performance of the overall system. Localization issues, usability and last what the standard says about it. Of course all these attributes are evaluated in less then 10 millisecond. Now people like me who are not so evolved, may take to go thru this process at least a second, and then I am sure I will not even remember all the important attribute against which any idea is to be evaluated.
So, do you see now, I made a list of attributes against which every idea should be checked, and the first letters of attributes are as follows E,C,O,M,P,L,U,S. E is for extensibility, C for complexity, O for Operations, M-Manageability, P-Performance, L-Localization, U-Usability and S-Standards etc … As you can see that these letters form my above equation.
Now when ever I come across some new idea, a topic, a fragment of code I evaluate it against my E= COM+ equation. Simple, now I can also pretend to be half smart as these people in my company. This equation is still missing many things, many important attributes not covered by this equation. One most important one is the historic perspective of any issue, How in past decisions were made and what are the ramifications of those decision. How industry other players, other technologies solved these problem etc. However it covers a pretty broad base and gives a small tool to aid my thinking process so I can keep my head above water.
Too much of the good thing – Replication. Nice article in ‘The Architecture Journal’, Whenever you see a replication happening in your enterprise, cast a doubtful eye on it, this can signal an Enterprise SOA Anti pattern. It is true to the dots, I still have bite marks on me by this Anti pattern.
I seen couple of articles talking about OUTPUT Clause of SQL 2005, but no one is warning the side affect of it on the scope_Identity() function. Let me explain with an example:
You have a Table T1, and would like to save the auditing data from T1 to T2. To accomplish this you define a trigger on T1 which will insert the T1 data in T2.
create table T1
(
T1_ID int identity,
FirstName char(50),
SecondName char(50)
)
CREATE TRIGGER T1_Insert
ON T1
AFTER INSERT
AS
BEGIN
SET NOCOUNT ON;
INSERT INTO T2 (FirstName, SecondName) SELECT INSERTED.FirstName, INSERTED.SecondName FROM inserted
END
create table T2
(
T2_ID int identity,
FirstName char(50),
SecondName char(50)
)
INSERT INTO T1 (FirstName, SecondName) values ( 'FirstName', 'SecondName')
SELECT SCOPE_IDENTITY()
Above statement return you the identity inserted on T1, and this is what exactly you wanted. Trigger has inserted the data in T2 but scope_identity returned you the identity of T1. So far so good, Every one is happy.
Now you find this OUTPUT Clause in SQL 2005, and change your code as follows. You deleted the trigger and write the insert statement as follows:
INSERT INTO T1 (FirstName, SecondName)
OUTPUT Inserted.FirstName, Inserted.SecondName
INTO T2 (FirstName, SecondName) values ( 'FirstName', 'SecondName')
SELECT SCOPE_IDENTITY()
Now in this case you will get the identity of T2.
Quiz: So, what one should do?
So what is the OLE DB connection string for Excel or Exchange?
Finally, all this on one page http://www.connectionstrings.com/ .
What will be the out put of the following code:
try
{
double k = 19; double m = 0;
k = k / m;
Console.WriteLine(Convert.ToString(k));
Console.Read();
}
catch (Exception ex)
{
Console.WriteLine(ex.Message);
}
Different data type behave differently for divide by zero exception.
The above code does not throw devide by zero exception, but it will quitly execute and 'k' will have the value of 'infinity'. So, don't depend on a exception here. However, if 'k' and 'm' are int then you will get the divideByZero exception
Get some C# ready made code snippets from here. Learn about snippets here.