Wiki
  • Introduction
  • 👾Penetration Testing
    • Application Security
      • Mobile App Security
        • Android Application Testing
          • Security Checklist
          • SSL Pinning Bypasses
          • Non-Proxy Aware Applications
            • Setting up VPN Server
            • Bypasses
          • Common Proxying Issues
          • Android Local Storage Checks
          • Android Task Hijacking
          • Kiosk Mode / Breakout Testing
          • Magisk on GenyMotion
        • iOS Application Testing
          • iOS Testing Using Objection
          • IPA Analysis Using MobSF
          • iOS Jailbreak Bypass
          • Decrypting iOS Apps
          • iOS Reverse Engineering
          • Jailbreak Detection Bypasses
          • iOS Local Storage Checks
          • Installing IPA
          • ATS Auditing
          • iOS Jailbreaking
          • Frida Pinning Bypasses
          • iOS Jailbreaking
        • Code Security
        • Frida on Windows
      • Web Application Security
        • Web Shells
        • CSV Injection
        • Measure Response Time using CURL
        • OSINT
          • EyeWitness
        • GraphQL Hacking
      • API Security
        • Security Checklist
        • Postman and Burp
        • CURL via BurpSuite
        • SOAP API Pentesting
    • Infrastructure Security
      • Network Infrastructure
        • Red Team Powershell Scripts
        • Mounting NFS Shares
        • Password Cracking/Auditing
        • Remote Access Sheet
        • Password Cracking Using Hashcat
        • Calculate IP Addresses from CIDR
        • Grep IP addresses or IP Ranges from a File
        • Default Credentials Checking
        • Check SSL/TLS Certificates
        • Log a terminal session
        • Unauthenticated Mongo DB
        • Microsoft SQL Server (MSSQL)
        • NTP Mode 6 Vulnerabilities
        • BloodHound
        • AD Offensive Testing
        • CrackMapExec
        • Select all IP addresses in Sublime Text
        • Convert CIDRs to an IP address list
        • Microsoft Exchange Client Access Server Information Disclosure
        • Web Server HTTP Header Internal IP Disclosure
        • smbclient.py
        • GetUserSPNs.py
        • Get-GPPPassword.py
        • SMBMap
        • Mounting Shares
        • mitm6
        • AD Attacks
        • Weak IKE Security Configurations
        • Locked BIOS Password Bypass
      • Wireless Security
        • Cached Wireless Keys
        • Aircrack Suite
    • SSL/TLS Security
    • Secure Code Review
      • Python
      • Semgrep
        • Semgrep to HTML Report
    • Cloud Security
      • Cloud Penetration Testing
    • Social Engineering
      • Simulated Phishing
        • GoPhish
    • Tool Usage
      • Docker
      • Split
      • PhantomJS
      • Aquatone
      • Tmux
      • Ipainstaller
      • Public IP From Command Line
      • Wifite
      • IKE Scan
      • Grep
      • Pulling APKs
      • Bitsadmin
      • Drozer
      • Iptables
      • Python Web Server
      • Crackmapexec
      • Impacket
      • Nessus
      • Adding SUDO User
      • Nmap
      • Metasploit Payloads
      • SMTP Open Relay
      • SQLMap
      • Screen
      • Remove All After Colon
      • Remove Old Linux Kernels
      • CURL
      • Hashcat
      • Secure Copy Protocol (SCP)
      • SSH & PGP Tools
      • IP Calculator
      • BloodHound
      • Netcat File Transfer
      • OpenVAS
      • BurpSuite
      • Exiftool
      • Python Virtual Environments
    • Errors and Solutions
      • Kill Process On Specific Port
      • Kill SSH Port Forwarding
      • SSH Key
      • Expanding Disk on Kali VM
    • Scoping
      • Scoping Questionnaires
        • Mobile App Testing
    • OSINT
      • Dark Web OSINT
      • Certificate Chain Check
      • EyeWitness - Web Service Screenshot
      • Tor to Browse Onion Links
      • DarkDump - Scan Dark Web for Onion Links
      • Domain related File Search
      • Google Dorking
      • IP / Network Blocks owned by a Company
  • ⌨️Programming
    • Automation
      • Running a Service at Boot
      • Network Connectivity Cron
    • Python
      • Adding Columns in Pandas
      • Copy Entire Column Data To New Column Pandas
      • Loading Progress Bar
      • Reorder Columns in Pandas
      • Filename with Date/Time Stamp
      • Command Line Arguments
      • Changing Date Format
      • Removing Index Column Pandas
      • Regex - Remove HTML Tags
      • Column Header Mapping
  • 🌐Miscellaneous
    • Scripts
      • Clickjacking Checker
      • Bulk WHOIS
      • SMB Signing Check
      • FDQN to IP Address
      • Grep IP Addresses
      • Nessus Parser
      • Build Review Audit
      • Nessus Merger
      • Nmap2CSV
      • Remove Audio From Videos
    • Favourite Reads/Links
    • Hacking Posters
    • Windows Developer VMs
    • Windows Workspaces
    • GitHub Pages
    • Interview Prep
      • Senior Penetration Tester
    • CVSS Formula
    • Android Rooting
      • Lineage OS 18.1 on OnePlus X
      • TWRP Recover on OnePlus X
      • Magisk Rooting
    • Presentation Slides
      • BlackHat - USA [2022]
  • 🐞Vulnerability Wiki
    • 🌐APPLICATION LEVEL
      • 🔒AUTHENTICATION
        • Authentication Bypass
        • Lack of Password Confirmation
        • 2FA Code Brute-forceable
        • Lack of Verification
        • Lack of Throttling on Form Submissions
        • Lack of Rate Limiting on Login
        • Weak Password Complexity Rules
        • 🖥️SESSION MANAGEMENT
        • 🔑ACCESS CONTROL
      • 🔢INPUT VALIDATION
      • ➗CRYPTOGRAPHY
      • 📉LOGGING
      • 📕DATA PROTECTION
      • 📲COMMUNICATION
      • 👨‍💻MALICIOUS CODE
      • 💡LOGIC
      • 🗄️FILE UPLOAD
      • ⚙️API ISSUES
      • 🔍CONFIGURATIONS
    • 💾INFRASTRUCTURE LEVEL
      • ICMP Timestamp Request Remote Date Disclosure (CVE-1999-0524)
      • ASP.NET Debug Mode Validation
Powered by GitBook
On this page
  • Split files based on # of lines
  • Split files based on size
  • Split into specific number of files
  • Split based on content

Was this helpful?

  1. Penetration Testing
  2. Tool Usage

Split

The split command or utility allows you to split by lines, size or the number of smaller files you need. Another related utility is csplit than can also be used.

Split files based on # of lines

Let’s say we want to split the file into several files based on a predetermined number of lines. This works best if the file contains lines separated by the end of line character, as it usually does. Let’s split our big file (eg. bigfile.txt) into files with 275 lines each.

$ split -l 275 bigfile.txt

This will split the files into several files, named xaa, xab, xac etc. each of which contain 275 lines each. If you don’t specify the number of lines (-l) then the default is 1000. The default for the output file prefix is x in most cases. I usually like to specify a prefix that ends with “-“, but that is upto you.

$ split -l 275 bigfile.txt smallfile-

If you prefer numerical suffixes instead of the character suffixes in the output, use the -d option with the command.

$ split -l 275 -d bigfile.txt smallfile-

Sometimes, you want a specific extension for your files such as .txt. You can do this using the command line option –additional-suffix

$ split -l 275 -d bigfile.txt smallfile- --additional-suffix=.txt

Split files based on size

Let’s say we want to divide the file into several files each of which is 5k in size. You can specify the size in bytes, kilobytes, mega bytes etc. as well.

$ split -b 5k bigfile.txt smallfile-

Split into specific number of files

If you want to split the file into 2 equally sized files, then you can do something like this:

$ split -n 2 -d bigfile.txt smallfile-

Of course, to split it in to even more number of files you specify the number with the -n option. One issue with splitting it like this is that it could cause the lines to be split between the files. In most cases, you want the lines to be preserved so that the entire line is within the same output file.

$ split -n l/10 -d bigfile.txt smallfile-

The above example will split the file into 10 equally sized files while preserving the lines. That means that lines will not be split between files. The value or argument is a lowercase L, just to be clear.

Sometimes, you want just part of the file and not the entire file. For example, if you want to split the file into 4 equal parts but is only interested in the 3rd section or part, then you could do something like:

$ split -n l/3/4 bigfile.txt > myfile.txt

Split based on content

Another common use case is when you want to split based on the content of the file. This is a specialized use case, but can be very useful. The utility named csplit can be used to split files into sections determined by the context or content of the lines.

The generic syntax of the csplit command is

$ csplit [options] <source file> <regex expression>

So, as an example if you want to split a file when you encounter the text or line Error, then you could do…

$ csplit -k bigfile.txt '/^Error/' {*}

The above command will split the file whenever it finds a line that starts with the word Error. The argument ‘/^Error/’ is the regular expression we are matching against. The next argument {*} specifies how many times the match should be repeated. The argument ‘{*}’ specifies that it will repeated till the end of file.

PreviousDockerNextPhantomJS

Last updated 4 years ago

Was this helpful?

👾