piconv Command in Linux



The piconv command in Linux converts text encodings. It is part of the Perl distribution and is often used to convert files from one character encoding to another. It reads input from STDIN or specified files, converts the encoding, and outputs the result to STDOUT.

Syntax of piconv Command

The syntax of the Linux piconv command is as follows −

piconv [options] [file...]

In the above syntax, [options] are flags that change the command's behavior, and [file...] names one or more input files to convert. If no file is given, piconv reads from STDIN.

piconv Command Options

The options of the piconv command are listed below −

| Flag | Long Option | Description |
| --- | --- | --- |
| -f from_encoding | --from from_encoding | Specifies the encoding to convert from. If omitted, the current locale is used. |
| -t to_encoding | --to to_encoding | Specifies the encoding to convert to. If omitted, the current locale is used. |
| -s string | --string string | Uses the provided string instead of a file as the source of text. |
| -l | --list | Lists all available encodings, one per line, in case-insensitive order. Only canonical names are shown. |
| -r encoding_alias | --resolve encoding_alias | Resolves encoding_alias to the Encode canonical encoding name. |
| -C N | --check N | Checks the validity of the stream when N is 1. The documentation does not specify what N = -1 does beyond treating invalid characters differently. |
| -c | | Alias for -C 1. |
| -p | --perlqq | Transliterates missing characters to \x{HHHH} (hexadecimal Unicode code point). |
| | --htmlcref | Transliterates missing characters to &#NNN; (decimal Unicode code point). |
| | --xmlcref | Transliterates missing characters to &#xHHHH; (hexadecimal Unicode code point). |
| -h | --help | Displays usage information. |
| -D | --debug | Enables debugging mode, mainly for Encode hackers. |
| -S scheme | --scheme scheme | Selects the conversion scheme: from_to (default), decode_encode, or perlio. |

Examples of piconv Command in Linux

This section demonstrates the usage of the piconv command in Linux with examples −

Converting File Encoding

To convert a file encoding from UTF-8 to ASCII, use the piconv command in the following way −

piconv -f UTF-8 -t ASCII file.txt > output.txt

In the above command, the -f option specifies the source encoding (the encoding of the input file or text), and the -t option specifies the target encoding (the encoding to which the input will be converted). piconv always writes the result to STDOUT, so without the > redirection the converted text is printed to the terminal.

For verification of the conversion, use the file command −

file output.txt

Note that if either the -f or the -t option is omitted, the encoding of the current locale is used in its place.

Converting Encoding of a String

To convert text from a string, use the -s or --string option with a string in quotes −

piconv -f UTF-8 -t ASCII -s "Hello, world, Привет, мир" > output.txt

Listing Available Encodings

To list the encodings supported by the piconv command, use the -l or --list option −

piconv -l

Resolving the Encodings

To resolve an encoding alias to its canonical name, use the -r or --resolve option −

piconv -r latin1

An alias is a shorthand or alternative name for a character encoding, while the canonical name is the standard name that Encode uses for that encoding.
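
The lookup can also be used in a script to validate an encoding name before converting. The sketch below is a minimal example. The alias no-such-encoding is deliberately bogus, and the sketch assumes that piconv exits with a non-zero status when an alias is unknown.

```shell
# Resolve a few spellings; the last one is intentionally invalid.
for alias in latin1 UTF-8 no-such-encoding; do
    if canonical=$(piconv -r "$alias" 2>/dev/null); then
        echo "$alias -> $canonical"
    else
        echo "$alias is not known to Encode"
    fi
done

# Capture a single canonical name for later use.
latin1_name=$(piconv -r latin1)
echo "$latin1_name"
```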

Checking Encoding Validity

To check the encoding validity, use the -C or --check option with the piconv command −

piconv -C 1 -f UTF-8 -t UTF-8 file.txt > output.txt

In the above command, the -C 1 option tells piconv to check the validity of the encoding. If any invalid characters are encountered, the command will return an error.
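
One way to observe the effect of the check is to convert text containing a character that the target encoding cannot represent. The sketch below compares the exit status with and without -c. It assumes that a check value of 1 makes the default from_to scheme abort on such a character. Whether malformed input bytes are rejected in the same way may depend on the scheme and the Encode version.

```shell
# "é" (\303\251) is valid UTF-8 but has no ASCII equivalent.
sample=$(printf 'caf\303\251')

# Without a check value the character is silently substituted.
lenient_status=0
printf '%s\n' "$sample" | piconv -f UTF-8 -t ASCII > /dev/null 2>&1 || lenient_status=$?

# With -c (same as -C 1) the conversion is expected to abort instead.
strict_status=0
printf '%s\n' "$sample" | piconv -c -f UTF-8 -t ASCII > /dev/null 2>&1 || strict_status=$?

echo "lenient exit status: $lenient_status"
echo "strict exit status:  $strict_status"
```

A non-zero strict status can then be used in a script to refuse a lossy conversion.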

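The piconv documentation defines only two values for N: 1 checks the validity of the stream, and -1 is described only vaguely as changing what happens when an invalid character is encountered. It does not define numbered check levels beyond these. piconv appears to hand the value to Perl's Encode module as its CHECK argument, which is a bitmask, so other values would combine the flags below rather than select a level −

| Value | Encode Constant | Action |
| --- | --- | --- |
| 1 | DIE_ON_ERR | Stops with an error when a character cannot be converted |
| 2 | WARN_ON_ERR | Prints a warning when a character cannot be converted |
| 4 | RETURN_ON_ERR | Stops processing and returns at the first problem character |
| 8 | LEAVE_SRC | Leaves the source data unmodified |

For example, 6 (WARN_ON_ERR plus RETURN_ON_ERR) corresponds to Encode's FB_WARN. To choose a replacement format for missing characters, use the -p, --htmlcref, or --xmlcref options described in the next section.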

Handling Missing Characters during Conversion

To replace missing characters with a Perl-style Unicode representation, use the -p or --perlqq option −

piconv -p -f UTF-8 -t ASCII file.txt > output.txt

To replace missing characters with HTML or XML character references, use the following commands −

piconv --htmlcref -f UTF-8 -t ASCII file.txt > output.html
piconv --xmlcref -f UTF-8 -t ASCII file.txt > output.xml

Specifying a Conversion Scheme

To specify the conversion scheme, use the -S or --scheme option −

piconv -S perlio -f UTF-8 -t ASCII file.txt > output.txt

The other available schemes are from_to (the default) and decode_encode.

Displaying Usage Help

To display the usage help of the piconv command, use the -h or --help option −

piconv -h

Conclusion

The piconv command in Linux is used to convert text encodings. It offers various options for encoding conversion, validating streams, handling missing characters, and listing supported encodings.

The piconv command can operate on files or strings. By specifying source and target encodings with the -f and -t options, files can be easily converted between different formats. Additionally, options like -p, -S, and -l provide further control over the conversion process, enabling the handling of missing characters, specifying conversion schemes, and viewing available encodings.
