cancel
Showing results for 
Show  only  | Search instead for 
Did you mean: 

Bulk Insert on MS SQL Cluster

Customer has been supplied dedicated MS SQL Cluster so they finally can set up Bulk Insert.
Question is, can the the Bulk Insert configuration be set up, honoring the Cluster principle.
The SQL share should be linked against the active node.

Our situation:
Followed the Bulk Insert Guidelines for setting up Bulk Insert for a MS SQL server.
However the sysadmin has set up BulkInsert shares (with local authorizations) on both cluster nodes.
And the admin states that it will not work if we try to connect through the sql cluster virtual address.
We have to connect to one of the nodes.
But it this case, we have the situation that if it's not the active node, we choose, the bulk insert will not function. Files end up on the passive node, and the active node likely will not find them there.
(This question appears unanswered in 2013's post by jose baeza: https://answers.dynatrace.com/questions/118220/bu... )


This may require us to switch to remote fileserver approach, where the CAS server will host locally the Bulkinsert folder, and the SQL server accesses the files remotely.

This however also has its drawbacks. Ref: https://answers.dynatrace.com/questions/119273/bul...



Tip: The active node ought to be found with

SQL "Select ServerProperty('ComputerNamePhysicalNetBIOS') "

15 REPLIES 15

adam_piotrowicz
Dynatrace Pro
Dynatrace Pro

Frans,

CAS can look only save bulk insert files to single network location. No intelligence behind it that could detect which host in MSSQL cluster is no w up.

CAS supports also mapped drives so maybe there would be an option for CAS OS to have mapped drive, with fixed volume letter that would be reacting to situations in SQL cluster and changing the underlying share name/IP?

gregg_kniss
Guide

There is not really a drawback to hosting the bulk insert folder on the CAS depending on the SQL configuration. If you run SQL as a non-human account other than the other system accounts you can do this quite easily and it works just fine.

Thanks Gregg. Indeed, not really to be called a draw back. As stated in the reference ariticle, the MSSQLSERVER service on their SQL server box will have to be run with
a higher privilege level than default one, as the account used to run
this service needs to have access to a network (shared) folder on the
CAS.
This method is noted as not preferred by Dyantrace.

Erik_Soderquist
Dynatrace Pro
Dynatrace Pro

What reason does the SQL admin give for saying it won't work with the
virtual cluster name? I know other customers have used a SQL cluster
before, and I believe simply used \\SQLClusterName\BulkInsertShare\ for
both read and write location, with the caveat being that the Windows
share had to exist with the same name on all members of the cluster.

(
for multiple CAS/ADS hosts,
\\SQLClusterName\BulkInsertShare\CAS/ADS_Hostname\ to keep them from
overwriting each other's files)

-- Erik

john_leight
Dynatrace Pro
Dynatrace Pro

When setting up the Bulk Insert share on the CAS I simply give the SQL computer access to the share. When assigning rights to the folder modify the Object Type to include Computers. Enter the name of the SQL server and select "Check Names", it should resolve the server name. Now the SQL server, no matter what account is running SQL, will be able to pull data from the share. The default share permissions are all that is necessary.

hi John, thanks. That was my "plan B", going for that fttb. Thanks for the Computer Object type!

I have had a meeting with the SQL Server Admin, of the third party that is managing the (database) server platform. It bottle's down to the (in)famous "this is not the standard here".
It IS possible, when the SQL server is granted the role of File Server and/or we are going to use shared storage. But since this is in this situation not standard, we will have to go through the paper-mill and be convincing why we do want it this way. (@Erik Soderquist This answered your question, right? :-))

I think one of the main reasons Dynatrace urges us to store the files on the database server side, must be out of performance reasons right? That the SQL process can read the files from fast.local storage, and not over the network.

And what about availability. If the DB server can not reach the share, how will that effect the DB server process.

For the time being, we are going to set it up in the reverse order. The DB pulling it from the CAS.

Yes, that did answer my question

Both performance and reliability are involved for recommending the share be on the SQL host rather than the CAS host.

From the performance side: the bulk inserts have been demonstrated to be measurably faster when the shared location is local to the SQL host than when it is local to the CAS host.

From the reliability side (this I tested myself with several variations), if there is a network glitch or hiccup, the CAS will retry the write operation to the create the bulk insert file, and if it is a transient glitch, the retry works fine. However, if SQL gets a similar glitch on reading the file, it aborts the read/insert, which at minimum will trigger the CAS, due to a failed bulk insert, to switch to prepared inserts (much slower), and in some of my tests, caused data from the bulk insert to be lost. The lost data was very inconsistent, and not very frequent, but in the worst case, the lost data also caused database inconsistencies. (As this was my own test environment, recovery was wiping the system.)

As the lost data issues I created in my tests were both very rare and impossible to reliably reproduce for troubleshooting, I didn't pursue it further. The performance difference alone between local to SQL and local to CAS was more than enough to justify the change in recommendation.

-- Erik

Thank you for your input Erik. Your experience on this matter is a valued contribution.

If shared space is being set up on the DB cluster, what size would be sufficient?
My wild guess would be that 10G is more then enough.

What happend here... her was a reply? and above answer was not from yesterday.


@Erik Soderquist
don't you agree? You replied to his. Or did you remove your answer?

I didn't remove my answer, it's higher up in the page. This forum's threading setup seems to sometimes be a bit less than perfectly intuitive... I'll try to move the comments around to fix the threading.

-- Erik

Edit :: I no longer have access to correct the threading on other peoples' answers/comments/replies. A direct link to my earlier answer is:

https://answers.dynatrace.com/answers/152990/view....

I see it now... odd. It moved, I expected, a chronological order 🙂

Erik_Soderquist
Dynatrace Pro
Dynatrace Pro

The shared space will always depend on the data volume/complexity involved. That said, the space needed for the bulk insert files is "relatively" small, as the files are generally deleted once they have been used. Unless you have a truly massive farm/cluster, I would be very surprised if 10 GB was not sufficient.

-- Erik

Indeed. 5G or less might even be more then enough. The files are there temporary, and usually not too large. In case of not being processed for some reason, CAS I think will stop placing them there and switch to slow insert methode. So I actually wonder what size would be sufficient. 1G?