Daniel Esser

Flat files are commonly used for ETL operations. Imagine you have to load some big flat files (several GB each) into a database.

If you watch the performance counters you will see that both processor time and disk read time go up. The disk read time rises because the disk constantly delivers new data to the SSIS in-memory pipeline. The processor time rises as well, in this example up to 20%, because of the ETL and database operations (overhead).
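If you want to capture those numbers outside of Performance Monitor, here is a rough sketch using the third-party psutil package; the load_flat_file call is a hypothetical stand-in for running the SSIS package.

```python
# Rough sketch: sample CPU usage and disk read bytes around a load.
# Assumes the third-party psutil package; load_flat_file() is hypothetical.
import time
import psutil

io_before = psutil.disk_io_counters()
psutil.cpu_percent(interval=None)            # prime the CPU counter
start = time.time()

# load_flat_file(r"D:\staging\bigfile.csv")  # stand-in for the SSIS package run
time.sleep(5)                                # keeps the sketch runnable as-is

elapsed = time.time() - start
cpu = psutil.cpu_percent(interval=None)      # average CPU % since the priming call
io_after = psutil.disk_io_counters()

read_mb = (io_after.read_bytes - io_before.read_bytes) / 1024 ** 2
print(f"elapsed: {elapsed:.1f}s  cpu: {cpu:.0f}%  disk read: {read_mb:.0f} MB")
```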

In my experiment I switched NTFS compression on for a 10GB CSV file. The SSIS package now runs twice as fast on average, but why?
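As an aside, before getting to the why: switching NTFS compression on for a single file can be done with the built-in Windows compact.exe tool. A minimal sketch (the path is just an example):

```python
# Minimal sketch: enable NTFS compression on one file via the built-in
# Windows compact.exe tool. The path below is hypothetical.
import subprocess

csv_path = r"D:\staging\bigfile.csv"

# /C compresses the specified file on an NTFS volume.
subprocess.run(["compact", "/C", csv_path], check=True)

# Calling compact without a switch reports the compression state and ratio.
subprocess.run(["compact", csv_path], check=True)
```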

The answer is simple: uncompressed, the file occupies 10GB on disk; compressed, it occupies 5GB. Constantly reading a large amount of data from disk is expensive, and in this case about 5GB less is transferred from disk to memory.
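You can check those two numbers yourself: os.path.getsize returns the logical size, while the Win32 GetCompressedFileSizeW API returns the size actually allocated on disk. A sketch (Windows only, hypothetical path):

```python
# Sketch: compare the logical file size with the compressed size on disk
# using the Win32 GetCompressedFileSizeW API (Windows only).
import ctypes
import os
from ctypes import wintypes

kernel32 = ctypes.WinDLL("kernel32", use_last_error=True)
kernel32.GetCompressedFileSizeW.restype = wintypes.DWORD
kernel32.GetCompressedFileSizeW.argtypes = [wintypes.LPCWSTR,
                                            ctypes.POINTER(wintypes.DWORD)]

def size_on_disk(path: str) -> int:
    high = wintypes.DWORD(0)
    low = kernel32.GetCompressedFileSizeW(path, ctypes.byref(high))
    if low == 0xFFFFFFFF and ctypes.get_last_error() != 0:
        raise ctypes.WinError(ctypes.get_last_error())
    return (high.value << 32) + low

csv_path = r"D:\staging\bigfile.csv"     # hypothetical path
logical = os.path.getsize(csv_path)
physical = size_on_disk(csv_path)
print(f"logical: {logical / 1024**3:.2f} GB  on disk: {physical / 1024**3:.2f} GB")
```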

But what is the downside of this? The processor needs time to decompress the data. NTFS loads a portion of the compressed data into memory, decompresses it, and passes it on to the SSIS in-memory pipeline. You will notice that this behaviour fits well with the concept of an in-memory pipeline.
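The decompression is completely transparent to the consumer: it streams the file as usual and NTFS decompresses each block on the fly. A minimal sketch of a chunked sequential read, roughly what a flat file source does (path and buffer size are arbitrary choices):

```python
# Sketch: a plain chunked read works unchanged on a compressed file;
# NTFS decompresses each block transparently.
import time

csv_path = r"D:\staging\bigfile.csv"     # hypothetical path
chunk_size = 64 * 1024                   # 64 KB buffers, an arbitrary choice

start = time.time()
total = 0
with open(csv_path, "rb") as f:
    while chunk := f.read(chunk_size):
        total += len(chunk)              # a real pipeline would parse rows here
print(f"read {total / 1024**3:.2f} GB in {time.time() - start:.1f}s")
```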

To sum up, using NTFS compression can significantly improve load performance for large flat files. I suggest enabling compression on files bigger than 5GB, but keep in mind that this comes at the cost of processor time.