
oracle-core-dba.blogspot.com

Saturday, March 15, 2008

Using the split command in Linux

Have a large file that you need to split into smaller chunks? An Oracle dump, maybe? split is your command. Below I split a 109MB file into 30MB chunks.

split works just as well on binary (even compressed) files as on text files. An example is worth a thousand words of man page, especially a man page with no examples. Here I have TBL_LOSANGELES_CA.dmp, a 109MB Oracle export dump file.

[root@oracle11gr1 ~]# mkdir split
[root@oracle11gr1 ~]# mv /share/TBL_LOSANGELES_CA.dmp split/
[root@oracle11gr1 ~]# cd split/
[root@oracle11gr1 split]# ll
total 111284
-rwxrw-r-- 1 vshare vshare 113836032 Oct 11 22:43 TBL_LOSANGELES_CA.dmp

[root@oracle11gr1 split]# split -b 30m TBL_LOSANGELES_CA.dmp TBL_LOSANGELES_CA_part_
[root@oracle11gr1 split]# ll
total 222584
-rwxrw-r-- 1 vshare vshare 113836032 Oct 11 22:43 TBL_LOSANGELES_CA.dmp
-rw-r--r-- 1 root root 31457280 Mar 14 14:07 TBL_LOSANGELES_CA_part_aa
-rw-r--r-- 1 root root 31457280 Mar 14 14:07 TBL_LOSANGELES_CA_part_ab
-rw-r--r-- 1 root root 31457280 Mar 14 14:07 TBL_LOSANGELES_CA_part_ac
-rw-r--r-- 1 root root 19464192 Mar 14 14:07 TBL_LOSANGELES_CA_part_ad
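The piece sizes above follow from simple arithmetic. The sketch below, assuming GNU split (where the m suffix means 1024*1024 bytes, which matches the 31457280-byte pieces in the listing), predicts how many pieces -b 30m produces and how large the last one is. The variable names are illustrative only.

```shell
#!/usr/bin/env bash
# Predict the pieces that `split -b 30m` produces for a file of a given size.
# Assumes GNU split, where the "m" suffix means 1024*1024 bytes.
set -euo pipefail

total=113836032              # size of TBL_LOSANGELES_CA.dmp in bytes (from ll above)
chunk=$((30 * 1024 * 1024))  # 30m = 31457280 bytes

parts=$(( (total + chunk - 1) / chunk ))   # ceiling division: 4 pieces
last=$(( total - (parts - 1) * chunk ))    # whatever is left goes into the final piece

echo "parts=$parts chunk=$chunk last=$last"
```

For this dump it prints parts=4 chunk=31457280 last=19464192, matching the four files aa through ad in the listing.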

The cat command joins the pieces produced by split back into a single file, as long as they are given in order.

[root@oracle11gr1 split]# cat TBL_LOSANGELES_CA_part_aa TBL_LOSANGELES_CA_part_ab TBL_LOSANGELES_CA_part_ac TBL_LOSANGELES_CA_part_ad > TBL_LOSANGELES

[root@oracle11gr1 split]# ls -lh
total 327M
-rw-r--r-- 1 root root 109M Mar 14 14:15 TBL_LOSANGELES
-rwxrw-r-- 1 vshare vshare 109M Oct 11 22:43 TBL_LOSANGELES_CA.dmp
-rw-r--r-- 1 root root 30M Mar 14 14:07 TBL_LOSANGELES_CA_part_aa
-rw-r--r-- 1 root root 30M Mar 14 14:07 TBL_LOSANGELES_CA_part_ab
-rw-r--r-- 1 root root 30M Mar 14 14:07 TBL_LOSANGELES_CA_part_ac
-rw-r--r-- 1 root root 19M Mar 14 14:07 TBL_LOSANGELES_CA_part_ad
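Typing every part name gets tedious. A shorter form is cat TBL_LOSANGELES_CA_part_* > TBL_LOSANGELES, which works because the shell expands a glob in sorted order and split's default suffixes (aa, ab, ac, ...) sort in the order the pieces were written. The self-contained sketch below demonstrates this on a throwaway file of random bytes; the names sample.dmp and sample_part_ are hypothetical stand-ins for the real dump, and GNU coreutils is assumed.

```shell
#!/usr/bin/env bash
# Rejoin split pieces with a glob instead of naming every part.
# sample.dmp is a hypothetical stand-in for the real export dump.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT       # clean up the scratch directory on exit
cd "$workdir"

head -c 5000000 /dev/urandom > sample.dmp   # ~5 MB of random (binary) data
split -b 2M sample.dmp sample_part_         # GNU: 2M = 2*1024*1024 bytes -> aa, ab, ac

cat sample_part_* > sample_joined           # glob expands to aa, ab, ac in sorted order

cmp sample.dmp sample_joined                # exits non-zero (and stops the script) on mismatch
echo "rejoined file is identical"
```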

You can check whether the two files are identical with the diff command. The -s option makes diff say so explicitly when the files match, instead of printing nothing.

[root@oracle11gr1 split]# diff -s TBL_LOSANGELES TBL_LOSANGELES_CA.dmp
Files TBL_LOSANGELES and TBL_LOSANGELES_CA.dmp are identical
[root@oracle11gr1 split]#
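diff -s is convenient when both files sit on the same machine. If the rebuilt file ends up on another host (for example after the pieces travel on separate CDs), comparing checksums avoids copying the original across. The sketch below uses sha256sum from GNU coreutils (md5sum works the same way); the file names are hypothetical and the input is random test data.

```shell
#!/usr/bin/env bash
# Verify a rebuilt file by checksum instead of a byte-by-byte diff.
# original.dmp, part_ and rebuilt.dmp are hypothetical names.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 3000000 /dev/urandom > original.dmp   # stand-in for the real dump
split -b 1M original.dmp part_                # pieces part_aa .. part_ac
cat part_* > rebuilt.dmp

# sha256sum prints "<hash>  <file>"; keep only the hash.
orig_sum=$(sha256sum original.dmp | cut -d' ' -f1)
new_sum=$(sha256sum rebuilt.dmp | cut -d' ' -f1)

if [ "$orig_sum" = "$new_sum" ]; then
    echo "checksums match: $orig_sum"
else
    echo "checksums differ" >&2
    exit 1
fi
```

On two separate machines you would run sha256sum once on each side and compare the printed hashes by eye or with a script.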

The cat step can also be done one part at a time, which is useful when each part lives on separate media (one per CD, for example). Note that the first command uses a single > so that the first part overwrites the destination file if it already exists; every command after that uses >> to append to it.

cat TBL_LOSANGELES_CA_part_aa > TBL_LOSANGELES
cat TBL_LOSANGELES_CA_part_ab >> TBL_LOSANGELES
cat TBL_LOSANGELES_CA_part_ac >> TBL_LOSANGELES
cat TBL_LOSANGELES_CA_part_ad >> TBL_LOSANGELES
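The one-command-per-part pattern above can also be written as a loop. In this sketch (hypothetical file names, random test data, GNU coreutils assumed), : > "$dest" creates or empties the destination once, and every piece, including the first, is then appended with >>. That keeps the overwrite-then-append rule in one place, so re-running the script does not double the data.

```shell
#!/usr/bin/env bash
# Rebuild a file by appending its pieces one at a time.
# source.dmp, piece_ and rebuilt.dmp are hypothetical names.
set -euo pipefail

workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
cd "$workdir"

head -c 4000000 /dev/urandom > source.dmp   # stand-in for the real dump
split -b 1M source.dmp piece_               # pieces piece_aa .. piece_ad

dest=rebuilt.dmp
: > "$dest"                                 # create or truncate the destination once
for part in piece_*; do                     # glob expands in sorted order
    # In the one-piece-per-CD scenario, this is where the next disc would be inserted.
    cat "$part" >> "$dest"
done

cmp source.dmp "$dest"                      # stops the script if the rebuild differs
echo "rebuilt $dest from $(ls piece_* | wc -l) pieces"
```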
