Monday, August 27, 2012

[Matlab] fast reading of huge ASCII input files

The following code provides a rather fast way to read large ASCII data files that contain groups of sub-data with different numbers of columns. For N=1E6 (which gives an input file of ~2 million lines), reading the file takes ~5 s, while rebuilding the data in matrix form takes ~55 s (on a 2012 laptop PC).

% Clean up the workspace and command window
clear all;
clc;

% Input Data
N=1E6;
x = 0:pi/N:pi;
y = [x; cos(x)];            %2 columns
xx= pi:pi/N:2*pi;
yy = [xx;cos(xx);sin(xx)];  %3 columns

% ASCII file writing
fprintf('* writing\n');
tic;
pfile = fopen('data.txt', 'w');
fprintf(pfile, '%12.8f %12.8f\n', y);
fprintf(pfile, '%12.8f %12.8f %12.8f\n', yy);
fclose(pfile);
toc;

% ASCII file reading
fprintf('\n* reading\n');
tic;
pfile=fopen('data.txt');
InputText=textscan(pfile,'%s','delimiter','\n');
fclose(pfile);
NN=length(InputText{1,1});
toc;

% Data matrix creation
fprintf('\n* matrix creation\n');
tic;
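A=zeros(NN,3);                        % preallocate; column 3 stays 0 for 2-column rows
for i=1:NN
    txt=InputText{1,1}{i,1};          % current line as text (avoids shadowing the built-in line function)
    [~,count] = sscanf(txt, '%s');    % count the fields on this line
    if count == 2
        A(i,1:2)= sscanf(txt, '%f %f');
    elseif count == 3
        A(i,1:3)= sscanf(txt, '%f %f %f');
    end
end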
toc;
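
Since the matrix creation loop dominates the run time, one possible variation is to group the lines by their number of fields and convert each group with a single sscanf call instead of calling sscanf on every line. The sketch below is only an illustration and has not been timed against the loop above. The names txtLines, nFields, rows, buf and B are hypothetical, and the small cell array stands in for InputText{1,1}.

```matlab
% Hypothetical sample standing in for InputText{1,1} from the reading step
txtLines = {'0.0 1.0'; '0.5 0.8'; '3.1 -1.0 0.0'; '6.2 1.0 -0.1'};

% Number of whitespace-separated fields on each line
% (regexp on a cell array returns one vector of match positions per line)
nFields = cellfun(@numel, regexp(txtLines, '\S+'));

% Preallocate; column 3 stays 0 for 2-column rows, as in the loop version
B = zeros(numel(txtLines), 3);

for k = 2:3
    rows = (nFields == k);                 % logical mask of lines with k fields
    if any(rows)
        buf = sprintf('%s\n', txtLines{rows});   % join the group into one string
        vals = sscanf(buf, '%f');                % convert the whole group at once
        B(rows, 1:k) = reshape(vals, k, []).';   % k values per line, one line per row
    end
end
```

With the real data, txtLines would be replaced by InputText{1,1}. Note that joining a whole group into one string temporarily needs roughly as much extra memory as the text of that group.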