Split
The split command lets you split a file by number of lines, by size, or into a given number of smaller files. A related utility, csplit, can also be used.
Split files based on # of lines
Let’s say we want to split the file into several files based on a predetermined number of lines. This works best if the file consists of lines separated by the end-of-line character, as text files usually do. Let’s split our big file (e.g., bigfile.txt) into files with 275 lines each.
This will split the file into several files named xaa, xab, xac, etc., each of which contains 275 lines (the last one may have fewer). If you don’t specify the number of lines (-l), the default is 1000. The default output file prefix is x. I usually like to specify a prefix that ends with “-”, but that is up to you.
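A minimal sketch of that command, assuming GNU split. The scratch directory and the 1000-line sample file generated with seq are assumptions added only so the example can be run as-is:

```shell
cd "$(mktemp -d)"            # scratch directory (assumption) so the pieces are easy to find
seq 1 1000 > bigfile.txt     # hypothetical sample input: 1000 numbered lines
split -l 275 bigfile.txt     # write pieces of at most 275 lines each
wc -l x*                     # list the pieces with their line counts
```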
If you prefer numerical suffixes instead of the character suffixes in the output, use the -d option with the command.
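As a sketch, assuming GNU split and the same hypothetical 1000-line sample file, with part- as an arbitrary prefix chosen for illustration:

```shell
cd "$(mktemp -d)"                    # scratch directory (assumption)
seq 1 1000 > bigfile.txt             # hypothetical sample input
split -d -l 275 bigfile.txt part-    # numeric suffixes: part-00, part-01, ...
ls part-*
```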
Sometimes you want a specific extension for your files, such as .txt. You can do this with the command-line option --additional-suffix.
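A possible invocation, assuming a reasonably recent GNU split (other implementations may not support this option). The part- prefix and the sample file are again hypothetical:

```shell
cd "$(mktemp -d)"                    # scratch directory (assumption)
seq 1 1000 > bigfile.txt             # hypothetical sample input
# The extension is appended after the generated suffix: part-aa.txt, part-ab.txt, ...
split -l 275 --additional-suffix=.txt bigfile.txt part-
ls part-*
```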
Split files based on size
Let’s say we want to divide the file into several files, each of which is 5k in size. With the -b option you can specify the size in bytes, kilobytes, megabytes, etc.
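A sketch of a size-based split, assuming GNU split, where the K suffix means 1024 bytes (so 5K is 5120 bytes). The sample file is a hypothetical one, made larger here so that it spans several pieces:

```shell
cd "$(mktemp -d)"            # scratch directory (assumption)
seq 1 5000 > bigfile.txt     # hypothetical sample input, about 23 KB
split -b 5K bigfile.txt      # every piece is 5120 bytes, except possibly the last
wc -c x*                     # list the pieces with their byte counts
```

Note that a size-based split cuts at byte boundaries, so a line may be divided between two pieces.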
Split into specific number of files
If you want to split the file into 2 equally sized files, then you can do something like this:
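For instance, assuming GNU split and the hypothetical sample file used for illustration:

```shell
cd "$(mktemp -d)"            # scratch directory (assumption)
seq 1 1000 > bigfile.txt     # hypothetical sample input
split -n 2 bigfile.txt       # two pieces, xaa and xab, of (nearly) equal byte size
wc -c x*
```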
Of course, to split it into more files, specify the number with the -n option. One issue with splitting like this is that a line can end up split across two files. In most cases, you want each line to stay whole within a single output file.
The above example will split the file into 10 roughly equal-sized files while preserving the lines, so no line is split between files. Just to be clear, the letter in the -n argument (l/10) is a lowercase L, not the digit 1.
Sometimes you want just one part of the file and not the entire file. For example, if you want to split the file into 4 equal parts but are only interested in the 3rd part, you could do something like:
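A line-preserving split can be sketched as follows, assuming GNU split and the hypothetical sample file:

```shell
cd "$(mktemp -d)"            # scratch directory (assumption)
seq 1 1000 > bigfile.txt     # hypothetical sample input
split -n l/10 bigfile.txt    # 10 pieces (xaa ... xaj), never cutting inside a line
wc -l x*
```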
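One way to do this, assuming GNU split, is the K/N form of the -n argument. In that form the selected part is written to standard output instead of to a file, so the sketch redirects it. The name part3.txt and the sample file are hypothetical:

```shell
cd "$(mktemp -d)"                        # scratch directory (assumption)
seq 1 1000 > bigfile.txt                 # hypothetical sample input
# l/3/4 means: the 3rd of 4 parts, without cutting inside a line.
split -n l/3/4 bigfile.txt > part3.txt
head -n 3 part3.txt                      # peek at the start of the extracted part
```

Dropping the leading l/ (that is, -n 3/4) selects the part by bytes only, which may cut a line in half.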
Split based on content
Another common use case is when you want to split based on the content of the file. This is a specialized use case, but can be very useful. The utility named csplit can be used to split files into sections determined by the context or content of the lines.
The generic syntax of the csplit command is csplit [OPTION]... FILE PATTERN..., where each PATTERN tells csplit where to start a new piece.
So, as an example, if you want to split a file at every line that starts with Error, you could do:
The above command will split the file whenever it finds a line that starts with the word Error. The argument '/^Error/' is the regular expression we are matching against. The next argument, '{*}', specifies how many times the match should be repeated: here, as many times as possible until the end of the file.
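A minimal sketch, assuming GNU csplit. The file name app.log and its contents are hypothetical and exist only to make the example runnable. By default csplit names the pieces xx00, xx01, and so on, and prints the byte count of each piece as it writes it:

```shell
cd "$(mktemp -d)"        # scratch directory (assumption)
# Hypothetical log file with three lines that start with "Error".
printf '%s\n' 'startup ok' 'Error: disk full' 'retrying' \
  'Error: timeout' 'detail' 'Error: gave up' > app.log
csplit app.log '/^Error/' '{*}'
ls xx*                   # xx00 holds the lines before the first match
```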