Monday, March 26, 2012

Multi Format Text File

Hello. I am in the process of migrating an old app to SQL Server. The old app reads hundreds of different flat file formats. One of the more complex ones is a multi-format delimited file. For example:

01^Bob Johnson^123 Main St^Anytown^St
02^Book1^$20
02^Book2^$30
03^Gift Cert^Happy Birthday^$100

This file is delimited with the ^ character. Note that the first 2 characters identify the row type. All 01 rows have data in the format: Name, Street Address, City, State. All 02 have data in the format: Book name, price. Etc.

Any clever ideas on how to parse this? I tried setting it up as a flat file source with the ^ delimiter. It doesn't work - in this example it wraps the third row to the end of the second row and keeps adding columns to fill out the row.

The only option that I can think of is to pull the entire row into one long column, and then use a script component to manually substring each column out.

Any help would be greatly appreciated.

Thanks,
Chris

tofferr wrote:

The only option that I can think of is to pull the entire row into one long column, and then use a script component to manually substring each column out.

Yes! That's the best thing to do, in my opinion, and what I do in practice. You can also use a derived column to substring the rows instead of a script component.

Also, you could have one data flow with a conditional split in it to direct each record type to a unique flat file, raw file, sql server table, etc... for staging. From there you can have have an appropriately mapped flat file connection (to the staging file/table/whatever) and continue using a data flow as you normally would.

So, bottom line, which would you rather maintain? substrings or flat file definitions?|||It looks like the rows are related. I think the script will need to generate some type of transaction key for every 01 record and carry it forward for the other row types. I don't think the relationship between the rows can be maintained by using the alternate methods.|||

JayH wrote:

It looks like the rows are related. I think the script will need to generate some type of transaction key for every 01 record and carry it forward for the other row types. I don't think the relationship between the rows can be maintained by using the alternate methods.

If each file only contains one name, then you're fine. If if contains more than one name then when loading the initial file you may want to use a script component to generate a unique key which gets incremented by one each time you read a new 01 record. Carry this key forward and you can later reassemble the records accordingly.|||Thank you for the quick responses!

I simplified my example significantly. For example, a 01 row really has well over 150 fields. And, yes, the rows are related and there are many 01 records in each file, so I believe Jay H is correct.and the alternate methods will not work.

Any other helpful thoughts?

BTW, I tried to check out http://www.ssistalk.com/ but it did not load? Temporary problem?

Thanks,
Chris
|||

tofferr wrote:

Any other helpful thoughts?


Yep, see my response above. You can mark each "record" (the 01 and its associated children) with a unique key. Then later when you work with the data you can work on batches as defined by that key.

tofferr wrote:


BTW, I tried to check out http://www.ssistalk.com/ but it did not load? Temporary problem?

Not sure what the deal was there, but the machine was unresponsive. It's back up. Thanks for the note.

No comments:

Post a Comment