Use CSV-files rather than Excel-files when importing data in ABC Analyzer

When you import data into ABC Analyzer, we normally recommend that you use CSV files – ie. files in raw text – and don’t use Excel files. The following is an explanation why. 

csv-xls

CSV files are pure text, and consequently extremely easy for computers to process. Everything in the file, except field separators and delimiters, get importet to ABC Analyzer.  The part of ABC Analyzer that process CSV files, fetch one row at a time – and due to the fact that it is pure text that is being read, this is done very fast.

Excel files are different and can contain almost everything – just look at the dialog when saving – the standard format is called Excel project. They can contain images, videos and sound – but most important in our work: text and numbers. When ABC Analyzer is reading data from an Excel file, it has to read ALL data before beginning to process it. During the import an Excel component is running in the background – it calculates the value of every cell if necessary.

This might seem irrational in our analytically powered world where data processing is done in Excel every day, but CSV files will produce increased performance and import data in shorter time. 

Problem #1: Memory usage

The below graph shows the use of memory in ABC Analyzer when importing data. When importing from Excel files, all data is imported at one time. 

untitled-2-en

Both imports are based on the same dataset around 10k rows, but the difference is the use of memory. The import of the CSV only use about 60 MB memory, whereas the import of the same data in the Excel format use around 150 MB – that is more than twice as much memory.  This sets an upper limit for datasets stored in Excel files for use with ABC Analyzer, the actual upper limit is very dependent on the properties of the dataset (width, types eg.). The import of Excel files gives an increased memory use with increased dataset size – in fact doubling the size of the dataset doubles the memory usage as well. On the contrary, our tests clearly show that increased dataset sizes does not in fact increase the used memory by the program.

Furthermore, as seen on the graph the time use is almost four times higher when reading the data from Excel, than reading data from CSV-files. 

Problem #2: Installation

The component that ABC Analyzer uses when reading data from Excel files is a part of Excel. This means that in order to read Excel files, it is required to have Excel/Office installed on the machine alongside ABC Analyzer. When installing options are given to do a standard installation which wouldn’t work, or a full installation. A full installation is required to include the data component that ABC Analyzer needs.

Problem #3: Language

When ABC Analyzer reads Excel-files – using the installed Excel component – the component will make sure that all numbers and dates read are properly localized using separators for decimals eg. according to the active locale used by the computer it is running on.

If Excel files are shared between people with different locale settings – in multilingual offices and/or departments across multiple countries eg. – this will prevent the users from using the same template for importing data.

Solution

Use CSV files. According to performance and time consumption, this is a clear advantage, because  the computer and especially ABC Analyzer would seem faster

The Excel component used can get lost when upgrading of Office – and generally speaking its a lot easier to simply avoid it. Unless it is a strict requirement to have data for feeding to ABC Analyzer in Excel format, we recommend that these datasets are stored exclusively in CSV files.

For help and questions contact ABC Softwork Support – support@abcsoftwork.com