# Manipulating Texts

Next, we move on to text manipulation, an important part of working with Linux because so much of our work involves files.

## Viewing Files

The `/etc/passwd` file is a critical system file in Linux that stores user account information. Each line represents a user and contains seven colon-separated fields: **username**, **password** (usually an `x` indicating the password hash is stored in `/etc/shadow`), **user ID (UID)**, **group ID (GID)**, **user info (GECOS)**, **home directory**, and **login shell**. For example, `root:x:0:0:root:/root:/bin/bash` describes the root user with UID 0, home directory `/root`, and shell `/bin/bash` (on Kali, the root shell is `/usr/bin/zsh` instead). The `cat` command prints the entire file to the terminal:

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ cat /etc/passwd
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
---snip---
```
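
To make the seven-field layout concrete, here is a minimal sketch that splits a single entry into named variables. It works on a sample line copied from the listing above, not on the real file, and the variable names are chosen for illustration only.

```bash
# Sample entry copied from the listing above; no system file is read.
entry='root:x:0:0:root:/root:/usr/bin/zsh'

# Setting IFS to ':' for this one command makes read split the line
# into the seven fields, in order.
IFS=: read -r username password uid gid gecos home shell <<EOF
$entry
EOF

echo "user=$username uid=$uid gid=$gid home=$home shell=$shell"
```

The same `IFS=: read` idiom can be placed in a `while` loop to process every line of a file in this format.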

## Filtering Text

While `cat` displays the entire content of a file at once, which can overwhelm the terminal when dealing with large files, `more` and `less` allow you to view the file content screen by screen. This makes it easier to read and navigate through the file without losing visibility of the text as it scrolls off the screen.

### More

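The `more` command is a basic paging tool that lets you scroll forward through a file, either line by line or page by page. Its backward navigation is limited: depending on the implementation it is either unavailable or, as in the util-linux version, works with <kbd>**b**</kbd> only when reading a regular file and not a pipe.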

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ more /etc/ssh/sshd_config

```

* The `more` command will display the contents of the `/etc/ssh/sshd_config` file one screen at a time.
* You can press <kbd>**Enter**</kbd> to scroll down line by line.
* Press <kbd>**Space**</kbd> to move to the next page.
* Press <kbd>**q**</kbd> to quit and return to the terminal.

### Less

The `less` command is a more advanced and versatile tool. It allows both forward and backward navigation, supports searching within the file, and is more memory-efficient as it does not load the entire file at once. This makes `less` the preferred choice for viewing and analyzing large files, such as log files or configuration files.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ less /etc/ssh/sshd_config

```

* The `less` command opens the `/etc/ssh/sshd_config` file in a paginated view.
* You can navigate through the file using various commands.

#### Key Navigation Features in `less`:

1. **Scroll Line by Line**:
   * Use the <kbd>**Up Arrow**</kbd> or <kbd>**Down Arrow**</kbd> keys.
   * Alternatively, press <kbd>**j**</kbd> to move down or <kbd>**k**</kbd> to move up.
2. **Scroll Page by Page**:
   * Press <kbd>**Space**</kbd> to move forward one page.
   * Press <kbd>**b**</kbd> to move backward one page.
3. **Search**:
   * Press <kbd>**/**</kbd> followed by a search term (e.g., `/Port`) to search forward.
   * Press <kbd>**?**</kbd> followed by a search term to search backward.
   * After a search, press <kbd>**n**</kbd> to jump to the next match or <kbd>**N**</kbd> to go to the previous match.
4. **Jump to the Start or End**:
   * Press <kbd>**g**</kbd> to go to the start of the file.
   * Press <kbd>**G**</kbd> to go to the end of the file.
5. **Quit**:
   * Press <kbd>**q**</kbd> to exit `less` and return to the terminal.

### Head

The **`head`** command displays the first few lines of a file, with a default of 10 lines. This is useful when you only need to inspect the beginning of a file. For example, `head -n 5 /etc/passwd` would display the first 5 lines of the `/etc/passwd` file.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ head -n 5 /etc/passwd
root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
sync:x:4:65534:sync:/bin:/bin/sync
```

* **`head`**: A command used to display the beginning of a file.
* **`-n 5`**: An option for `head` that tells it to show only the first 5 lines of the file.
* **`/etc/passwd`**: The file being read, which contains user account information on Unix-like systems.
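
The difference between the default and `-n` can be checked on a throwaway file. This is a small sketch, assuming GNU coreutils (`seq`, `mktemp`) are available; the 20-line sample file is made up for the demonstration.

```bash
# Build a 20-line sample file (one number per line) in a temporary location.
sample=$(mktemp)
seq 1 20 > "$sample"

# With no option, head prints the first 10 lines.
default_count=$(head "$sample" | wc -l)

# -n 3 limits the output to the first 3 lines.
first_three=$(head -n 3 "$sample")

echo "default line count: $default_count"
echo "$first_three"

rm -f "$sample"
```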

### Tail

The `tail` command displays the last few lines of a file, with a default of 10 lines. This is useful when you only need to inspect the end of a file. For example, `tail -n 5 /etc/passwd` would display the last 5 lines of the `/etc/passwd` file.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ tail -n 5 /etc/passwd
postgres:x:130:132:PostgreSQL administrator,,,:/var/lib/postgresql:/bin/bash
mosquitto:x:131:133::/var/lib/mosquitto:/usr/sbin/nologin
inetsim:x:132:134::/var/lib/inetsim:/usr/sbin/nologin
_gvm:x:133:136::/var/lib/openvas:/usr/sbin/nologin
kali:x:1000:1000:,,,:/home/kali:/usr/bin/zsh
```
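
Two related idioms are worth knowing: a leading `+` in the `-n` argument makes `tail` start at a given line instead of counting from the end, and `head` piped into `tail` extracts a range from the middle of a file. The sketch below uses a made-up 20-line sample file and assumes `seq` and `mktemp` are available.

```bash
# Build a 20-line sample file (one number per line).
sample=$(mktemp)
seq 1 20 > "$sample"

# Last 3 lines of the file.
last_three=$(tail -n 3 "$sample")

# A leading '+' changes the meaning: start at line 18 and print to the end.
from_line_18=$(tail -n +18 "$sample")

# Take the first 8 lines, then the last 3 of those: lines 6 through 8.
middle=$(head -n 8 "$sample" | tail -n 3)

echo "$last_three"
echo "$middle"

rm -f "$sample"
```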

### Cut

The `cut` command in Unix/Linux is used to extract specific columns, fields, or characters from text input, typically using delimiters like commas, tabs, or colons. It is ideal for parsing structured data, such as CSV files or system files like `/etc/passwd`. With options to select fields (`-f`), specify delimiters (`-d`), or extract character ranges (`-c`), it is a powerful yet lightweight tool for text processing and scripting tasks. Its simplicity and efficiency make it a go-to utility for quick data extraction and manipulation.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ cut -d':' -f1 /etc/passwd
root
daemon
bin
sys
sync
games
man
lp
mail
---snip---
```

1. **`-d':'`**:
   * This option specifies the delimiter (separator) used in the file. In this case, the delimiter is a colon (`:`), which is the standard separator in the `/etc/passwd` file.
2. **`-f1`**:
   * This option specifies the field to extract. Here, `1` refers to the first field. In the `/etc/passwd` file, the first field represents the **username**.
3. **`/etc/passwd`**:
   * This is the file being processed. The `/etc/passwd` file stores user account information, with each line representing a user and fields separated by colons (`:`).
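
The other selection modes mentioned above (`-f` with several fields or a range, and `-c` for character positions) can be sketched in the same way. The two sample entries are copied from the earlier listings so the example does not depend on the local `/etc/passwd`.

```bash
# Two sample entries taken from the listings above.
entries='root:x:0:0:root:/root:/usr/bin/zsh
sync:x:4:65534:sync:/bin:/bin/sync'

# -f1,7 selects the first and seventh fields; cut keeps the delimiter between them.
names_and_shells=$(printf '%s\n' "$entries" | cut -d':' -f1,7)

# -f3-4 selects a range of fields (UID and GID).
ids=$(printf '%s\n' "$entries" | cut -d':' -f3-4)

# -c1-4 works on character positions instead of fields.
first_chars=$(printf '%s\n' "$entries" | cut -c1-4)

echo "$names_and_shells"
echo "$ids"
echo "$first_chars"
```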

## Word Count

Instead of counting lines, words, or characters by hand, we can use the **`wc`** (word count) command. By default it reports the number of lines, words, and bytes in its input; the `-l` option restricts the output to the **number of lines**. Since `/etc/passwd` holds one account per line, `wc -l /etc/passwd` tells us how many accounts are defined on the system.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ wc -l /etc/passwd
59 /etc/passwd
```
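
When `wc` is given a file name it prints that name next to the count. Feeding the file through standard input instead leaves only the number, which is convenient in scripts. The sketch below uses a small made-up sample file and also shows the `-w` (words) and `-c` (bytes) options.

```bash
# Three lines, six words, written to a temporary sample file.
sample=$(mktemp)
printf 'alpha beta\ngamma\ndelta epsilon zeta\n' > "$sample"

# Reading from stdin makes wc print only the number, without the file name.
lines=$(wc -l < "$sample")
words=$(wc -w < "$sample")
bytes=$(wc -c < "$sample")

echo "lines=$lines words=$words bytes=$bytes"

rm -f "$sample"
```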

Many other commands are available for filtering and transforming text, such as `sort`, `sed`, `awk`, `tr`, and `column`. We encourage you to explore them on your own, read their manual pages to learn the switches and options they offer, and practice using them to solidify your understanding. Covering every command and all of its switches in detail here would be impractical.
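
As a starting point for that exploration, here is a short sketch of one typical use of four of these tools. It is not a tour of their options; the sample entries are copied from the earlier listings, and `column` is left out because it is not installed on every system.

```bash
# Three sample entries, deliberately out of UID order.
entries='sync:x:4:65534:sync:/bin:/bin/sync
root:x:0:0:root:/root:/usr/bin/zsh
bin:x:2:2:bin:/bin:/usr/sbin/nologin'

# sort: order the lines numerically by the third colon-separated field (UID).
by_uid=$(printf '%s\n' "$entries" | sort -t: -k3,3n)

# tr: translate characters, here turning every ':' into a space.
spaced=$(printf '%s\n' "$entries" | tr ':' ' ')

# sed: substitute the first match of a pattern on each line.
renamed=$(printf '%s\n' "$entries" | sed 's/nologin/NOLOGIN/')

# awk: print selected fields, using ':' as the field separator.
user_shell=$(printf '%s\n' "$entries" | awk -F: '{ print $1 " -> " $7 }')

echo "$by_uid"
echo "$user_shell"
```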

## **File Descriptors** and **Redirections**

**File Descriptors** and **Redirections** are fundamental concepts in Unix/Linux systems that allow you to manage input and output (I/O) for commands and processes. They provide powerful ways to control where data is read from or written to.

### **File Descriptors**

A **file descriptor** is a numeric identifier used by the operating system to access files or I/O resources. In Unix/Linux, the following file descriptors are standard:

1. **0 (Standard Input - stdin)**: Used for reading input (e.g., keyboard input).
2. **1 (Standard Output - stdout)**: Used for writing output (e.g., terminal display).
3. **2 (Standard Error - stderr)**: Used for writing error messages.
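
The separation between descriptors 1 and 2 can be observed directly. In this sketch, `report` is a hypothetical helper defined only for the demonstration; it writes one line to each stream, and the two captures show that the streams can be handled independently.

```bash
# Hypothetical helper: one line to stdout (fd 1), one line to stderr (fd 2).
report() {
  echo "normal output"
  echo "error output" >&2
}

# Command substitution captures only fd 1; fd 2 is discarded here.
captured_stdout=$(report 2>/dev/null)

# Redirections are processed left to right: fd 2 is first pointed at the
# capture (where fd 1 currently goes), then fd 1 is sent to /dev/null.
captured_stderr=$(report 2>&1 >/dev/null)

echo "stdout was: $captured_stdout"
echo "stderr was: $captured_stderr"
```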

### **Redirections**

Redirection allows you to change where input comes from or where output goes. It is done using special operators in the shell.

#### **Common Redirection Operators**

1. **Redirect Standard Output (`>`)**
   * Sends the output of a command to a file (overwrites the file).
   * Example: `ls > file.txt` writes the output of `ls` to `file.txt`.
2. **Append Standard Output (`>>`)**
   * Appends the output of a command to a file (does not overwrite).
   * Example: `ls >> file.txt` appends the output of `ls` to `file.txt`.
3. **Redirect Standard Error (`2>`)**
   * Sends error messages to a file.
   * Example: `ls /nonexistent 2> error.log` writes errors to `error.log`.
4. **Redirect Both Standard Output and Error (`&>`)**
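   * Sends both output and errors to the same file. This shorthand is provided by shells such as bash and zsh; the portable equivalent is `> file 2>&1`.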
   * Example: `ls /nonexistent &> output.log` writes both output and errors to `output.log`.
5. **Redirect Standard Input (`<`)**
   * Reads input from a file instead of the keyboard.
   * Example: `sort < file.txt` sorts the contents of `file.txt`.
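
The operators above can be tried safely in a scratch directory. The sketch below is one possible walk-through, with made-up file names; it uses the portable `> file 2>&1` spelling for the combined case so that it also runs in shells without `&>`.

```bash
workdir=$(mktemp -d)

# '>' creates or overwrites the file; '>>' appends to it.
echo "first"  > "$workdir/out.txt"
echo "second" >> "$workdir/out.txt"

# '2>' captures only the error message; stdout is discarded here.
# ls exits non-zero for the missing path, so '|| true' keeps scripts using 'set -e' running.
ls "$workdir/nonexistent" > /dev/null 2> "$workdir/error.log" || true

# Combined case: send stdout to the file, then point stderr at the same place.
ls "$workdir" "$workdir/nonexistent" > "$workdir/both.log" 2>&1 || true

# '<' feeds the file to the command's standard input.
sorted=$(sort -r < "$workdir/out.txt")

# Keep the results in variables before removing the scratch directory.
out_lines=$(wc -l < "$workdir/out.txt")
error_text=$(cat "$workdir/error.log")
both_text=$(cat "$workdir/both.log")
rm -rf "$workdir"

echo "out.txt had $out_lines lines; reverse-sorted: $sorted"
```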

### Pipes

**Pipes** in Linux are used to connect the output of one command directly to the input of another, enabling seamless data flow between commands. Represented by the `|` symbol, pipes allow you to chain commands together to perform complex operations in a single line. They are particularly useful for filtering, processing, or transforming data without intermediate files. For example, you can use `ls` to list files and pipe its output to `grep` to filter specific files. Pipes streamline workflows and make command-line operations more efficient.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ sudo cat /etc/shadow | grep 'kali'                                                                                                 
kali:$y$j9T$ufXTBpN1QpgwlgqRFmb/B0$/.y0ybAF4iNQXniErsDWf9QSl2HZH7LnBeRHB4ZiQa9:20057:0:99999:7:::
```

The `/etc/shadow` file is a protected system file in Unix/Linux that stores **hashed user passwords** and related account details such as password aging information. It is readable only by the root user or processes with root privileges (some distributions also grant read access to a dedicated `shadow` group). This file enhances security by separating password data from the world-readable `/etc/passwd` file.

1. **`sudo`**:
   * The `sudo` command is used to execute the following command with superuser (root) privileges. Access to `/etc/shadow` requires root permissions because it contains sensitive information.
2. **`cat /etc/shadow`**:
   * The `cat` command reads and outputs the contents of the `/etc/shadow` file.
3. **`|` (Pipe)**:
   * The pipe symbol (`|`) takes the output of the `cat` command and passes it as input to the next command (`grep`).
4. **`grep 'kali'`**:
   * The `grep` command searches its input for the string `kali` and prints every line that contains it. In this case, the only match is the `/etc/shadow` entry for the user `kali`.
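
Pipelines can have more than two stages. The following sketch counts how many accounts use each login shell, working on sample entries copied from the earlier listings so that it needs neither `sudo` nor the real system files.

```bash
# Five sample entries in /etc/passwd format.
entries='root:x:0:0:root:/root:/usr/bin/zsh
daemon:x:1:1:daemon:/usr/sbin:/usr/sbin/nologin
bin:x:2:2:bin:/bin:/usr/sbin/nologin
sys:x:3:3:sys:/dev:/usr/sbin/nologin
kali:x:1000:1000:,,,:/home/kali:/usr/bin/zsh'

# Stage 1 extracts the shell field, stage 2 groups identical values together,
# stage 3 counts each group, stage 4 orders the counts from highest to lowest.
shell_counts=$(printf '%s\n' "$entries" | cut -d: -f7 | sort | uniq -c | sort -rn)

# A single filter stage: keep only the line that starts with 'kali:'.
kali_line=$(printf '%s\n' "$entries" | grep '^kali:')

echo "$shell_counts"
echo "$kali_line"
```

Because `grep` accepts a file name as an argument, `sudo grep 'kali' /etc/shadow` gives the same result as the `cat` pipeline above with one process fewer; the piped form is shown mainly to illustrate how `|` connects two commands.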

## Regular Expressions (`RegEx`)

Regular expressions (RegEx) are a pattern language for searching text and files. They can be used to find and replace text, analyze data, validate input, perform searches, and more.

A regular expression is a sequence of letters and symbols that forms a search pattern. Most characters match themselves literally, while **metacharacters** such as `.`, `*`, `[ ]`, and `{ }` have a special meaning that shapes the pattern instead of matching literally. We can use regular expressions in tools such as `grep` and `sed`, and web applications often use them to validate user input.

```bash
┌──(kali㉿kali)-[~/Desktop]
└─$ grep -E ':[0-9]{4,}:' /etc/passwd                                                                                                  
sync:x:4:65534:sync:/bin:/bin/sync
_apt:x:42:65534::/nonexistent:/usr/sbin/nologin
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
dhcpcd:x:100:65534:DHCP Client Daemon,,,:/usr/lib/dhcpcd:/bin/false
strongswan:x:103:65534::/var/lib/strongswan:/usr/sbin/nologin
sshd:x:105:65534::/run/sshd:/usr/sbin/nologin
dnsmasq:x:999:65534:dnsmasq:/var/lib/misc:/usr/sbin/nologin
---snip---
```

1. **`:`**: Matches the colon separator.
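2. **`[0-9]{4,}`**: Matches a sequence of 4 or more digits, that is, a numeric field of 1000 or higher. Because the pattern is not tied to a particular field, it matches either the UID or the GID; most lines in the output above match on the GID `65534`.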
3. **`:`**: Matches the next colon separator.
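
Anchors make it possible to tie the pattern to one specific field. The sketch below compares the unanchored pattern with a variant that only accepts the digits in the third field (the UID), and adds an end-of-line anchor as a third example. The sample entries are copied from the listings in this page.

```bash
# Four sample entries in /etc/passwd format.
entries='sync:x:4:65534:sync:/bin:/bin/sync
nobody:x:65534:65534:nobody:/nonexistent:/usr/sbin/nologin
kali:x:1000:1000:,,,:/home/kali:/usr/bin/zsh
root:x:0:0:root:/root:/usr/bin/zsh'

# Unanchored: any colon-delimited run of 4+ digits matches,
# so 'sync' is included because of its GID.
any_field=$(printf '%s\n' "$entries" | grep -E ':[0-9]{4,}:' | cut -d: -f1)

# Anchored: '^' pins the match to the start of the line and each '[^:]*:'
# skips exactly one field, so the digits must sit in the third field (the UID).
uid_only=$(printf '%s\n' "$entries" | grep -E '^[^:]*:[^:]*:[0-9]{4,}:' | cut -d: -f1)

# '$' anchors to the end of the line: accounts whose login shell ends in /zsh.
zsh_users=$(printf '%s\n' "$entries" | grep -E '/zsh$' | cut -d: -f1)

echo "$any_field"
echo "$uid_only"
echo "$zsh_users"
```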

Regular expressions are an extensive and intricate subject in their own right, so covering every aspect of them within this book is impractical. We encourage you to study them further and to apply them in practical scenarios to solidify your knowledge and proficiency.

