Showing posts with label complex. Show all posts
Showing posts with label complex. Show all posts

Monday, March 26, 2012

Multi Format Text File

Hello. I am in the process of migrating an old app to SQL Server. The old app reads hundreds of different flat file formats. One of the more complex ones is a multi-format delimited file. For example:

01^Bob Johnson^123 Main St^Anytown^St
02^Book1^$20
02^Book2^$30
03^Gift Cert^Happy Birthday^$100

This file is delimited with the ^ character. Note that the first 2 characters identify the row type. All 01 rows have data in the format: Name, Street Address, City, State. All 02 have data in the format: Book name, price. Etc.

Any clever ideas on how to parse this? I tried setting it up as a flat file source with the ^ delimiter. It doesn't work - in this example it wraps the third row to the end of the second row and keeps adding columns to fill out the row.

The only option that I can think of is to pull the entire row into one long column, and then use a script component to manually substring each column out.

Any help would be greatly appreciated.

Thanks,
Chris

tofferr wrote:

The only option that I can think of is to pull the entire row into one long column, and then use a script component to manually substring each column out.

Yes! That's the best thing to do, in my opinion, and what I do in practice. You can also use a derived column to substring the rows instead of a script component.

Also, you could have one data flow with a conditional split in it to direct each record type to a unique flat file, raw file, sql server table, etc... for staging. From there you can have have an appropriately mapped flat file connection (to the staging file/table/whatever) and continue using a data flow as you normally would.

So, bottom line, which would you rather maintain? substrings or flat file definitions?|||It looks like the rows are related. I think the script will need to generate some type of transaction key for every 01 record and carry it forward for the other row types. I don't think the relationship between the rows can be maintained by using the alternate methods.|||

JayH wrote:

It looks like the rows are related. I think the script will need to generate some type of transaction key for every 01 record and carry it forward for the other row types. I don't think the relationship between the rows can be maintained by using the alternate methods.

If each file only contains one name, then you're fine. If if contains more than one name then when loading the initial file you may want to use a script component to generate a unique key which gets incremented by one each time you read a new 01 record. Carry this key forward and you can later reassemble the records accordingly.|||Thank you for the quick responses!

I simplified my example significantly. For example, a 01 row really has well over 150 fields. And, yes, the rows are related and there are many 01 records in each file, so I believe Jay H is correct.and the alternate methods will not work.

Any other helpful thoughts?

BTW, I tried to check out http://www.ssistalk.com/ but it did not load? Temporary problem?

Thanks,
Chris
|||

tofferr wrote:

Any other helpful thoughts?


Yep, see my response above. You can mark each "record" (the 01 and its associated children) with a unique key. Then later when you work with the data you can work on batches as defined by that key.

tofferr wrote:


BTW, I tried to check out http://www.ssistalk.com/ but it did not load? Temporary problem?

Not sure what the deal was there, but the machine was unresponsive. It's back up. Thanks for the note.

Monday, February 20, 2012

MSSQL questions

1) I haven't seen any use of "SAVE TRANSACTION" in small to mid size apps. Is it used mainly in big and complex apps?

2) Given this linkhttp://www.databasejournal.com/features/mssql/article.php/3584751 about SQL2005. How can what the article presents be done in SQL2000?

Thanks.

SAVE TRANSACTION isn't normally used in small apps because you would only want to use it in a situation in which you have a subroutine that might fail, but you still want the transaction to commit. Small apps usually either have very small transactions (or none), and you either want them to commit or rollback as a single unit. I've used it in mid size apps where I was looping, doing some processing of for a batch transaction, and any loop may fail, but I needed it to skip the bad loop and continue processing. This isn't very common.

For example a batch load of (something), as you are looping through, Save trans, insert, save trans, insert (Insert fails) rollback, save trans, insert, commit.

Record #1, and #3 are commited to the database, while record #2 failed and was rolled back. Many times this type of logic can also be subdivided into separate transactions (BEGIN TRANS, insert, commit, begin trans, insert (fail), rollback, begin trans, insert, commit), but in some cases it can't. It's those rare cases in which save transaction is nice.

|||Please take a look at my second question and advise? Thanks.|||Can't really help you out there very much. In the past I've written my own .NET apps that monitored directories for file drops and sucked them into the database. Never used the bulk load stuff for that as I don't think it's a great idea to give access to the SQL Server itself for file drops.