Julia - Strings



A string is a finite sequence of zero or more characters, usually enclosed in double quotes. For example: "This is Julia programming language". Following are important points about strings −

  • Strings are immutable, i.e., we cannot change them once they are created.

  • Two characters need special care inside a string: the double quote (") and the dollar sign ($). To include a double quote, precede it with a backslash; otherwise it ends the string literal and the rest of the text is parsed as Julia code. To include a literal dollar sign, also precede it with a backslash, because the dollar sign is used for string interpolation.

  • In Julia, the built-in concrete type used for strings as well as string literals is String, which supports the full range of Unicode characters via the UTF-8 encoding.

  • All the string types in Julia are subtypes of the abstract type AbstractString. If you want a function to accept any string type, declare the argument type as AbstractString.

  • Julia has a first-class type for representing a single character. It is called Char, and it is a subtype of the abstract type AbstractChar.
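
The points about AbstractString and immutability can be illustrated with a short sketch. The function name shout is hypothetical and is used only for this example.

```julia
# `shout` is a hypothetical helper; annotating the argument with AbstractString
# lets it accept String, SubString and any other string type.
shout(s::AbstractString) = uppercase(s) * "!"

greeting = "hello"
println(shout(greeting))                    # argument is a String
println(shout(SubString(greeting, 1, 4)))   # argument is a SubString{String}

# Strings are immutable: assigning to an index is not supported.
try
    greeting[1] = 'H'
catch err
    println("cannot modify a string: ", typeof(err))
end
```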

Characters

A single character is represented by a Char value. Char is a 32-bit primitive type that can be converted to a numeric value (its Unicode code point).

julia> 'a'
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

julia> typeof(ans)
Char

We can convert a Char to its integer value as follows −

julia> Int('a')
97

julia> typeof(ans)
Int64

We can also convert an integer value back to a Char as follows −

julia> Char(97)
'a': ASCII/Unicode U+0061 (category Ll: Letter, lowercase)

With Char values, we can do some arithmetic as well as comparisons. Comparisons are based on the code points, so every uppercase ASCII letter sorts before every lowercase one, as the following example shows −

julia> 'X' < 'x'
true

julia> 'X' <= 'x' <= 'Y'
false

julia> 'X' <= 'a' <= 'Y'
false

julia> 'a' <= 'x' <= 'Y'
false

julia> 'A' <= 'X' <= 'Y'
true

julia> 'x' - 'b'
22

julia> 'x' + 1
'y': ASCII/Unicode U+0079 (category Ll: Letter, lowercase)
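
As a small sketch of how these operations can be combined, the hypothetical helpers digit_value and is_ascii_upper below rely only on Char subtraction and chained comparisons.

```julia
# Subtracting two Chars gives the distance between their code points (an Int).
digit_value(c::Char) = c - '0'

# A chained comparison tests whether a character lies in a range.
is_ascii_upper(c::Char) = 'A' <= c <= 'Z'

println(digit_value('7'))        # 7
println(is_ascii_upper('X'))     # true
println(is_ascii_upper('x'))     # false
println('a' + 2)                 # c
println(Char(Int('A') + 25))     # Z
```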

Delimited by double quotes or triple double quotes

Strings in Julia are delimited by double quotes or triple double quotes. Triple double quotes are convenient when the string itself contains quotation marks, because those do not need to be escaped −

julia> str = "This is Julia Programming Language.\n"
"This is Julia Programming Language.\n"

julia> """See the "quote" characters"""
"See the \"quote\" characters"

Performing arithmetic and other operations with end

Inside an index expression, end stands for the last index of the string. Just like a normal value, it can be used in arithmetic and other operations. Check the example below, which uses the str defined above −

julia> str[end-1]
'.': ASCII/Unicode U+002E (category Po: Punctuation, other)

julia> str[end÷2]
'g': ASCII/Unicode U+0067 (category Ll: Letter, lowercase)

Extracting substring by using range indexing

We can extract a substring from a string by using range indexing. Check the example below −

julia> str[6:9]
"is J"

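The indexing operations above can be summarized in a short sketch. It assumes the same str as in the earlier examples, which contains only ASCII characters, so every index from 1 to the last index is valid.

```julia
str = "This is Julia Programming Language.\n"

println(firstindex(str))        # 1
println(lastindex(str))         # 36, the value of `end` inside str[...]

# A single index returns a Char; a range returns a String.
println(typeof(str[1]))         # Char
println(typeof(str[1:1]))       # String

println(str[1:4])               # This
println(str[end-9:end-1])       # Language.
```
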
Using SubString

Range indexing, as used above, makes a copy of the selected part of the original string. Alternatively, we can use SubString to create a view into a string without copying, as in the example below −

julia> substr = SubString(str, 1, 4)
"This"

julia> typeof(substr)
SubString{String}
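
The difference between the two approaches can be sketched as follows. The variable names copied, viewed and owned are chosen only for this illustration.

```julia
str = "This is Julia Programming Language.\n"

copied = str[1:4]               # range indexing creates a new String
viewed = SubString(str, 1, 4)   # SubString refers to the original string

println(typeof(copied))         # String
println(typeof(viewed))         # SubString{String}
println(copied == viewed)       # true, both hold the same characters

# A SubString works wherever an AbstractString is accepted.
println(uppercase(viewed))      # THIS

# Convert to an independent String when a copy is needed.
owned = String(viewed)
println(typeof(owned))          # String
```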

Unicode and UTF-8

Julia fully supports Unicode characters and strings. In character and string literals, the Unicode \u and \U escape sequences, as well as all the standard C escape sequences, can be used to represent Unicode code points, as shown in the following example −

julia> s = "\u2200 x \u2203 y"
"∀ x ∃ y"

String literals are encoded using UTF-8, a variable-width encoding. Variable-width means that not all characters are encoded in the same number of bytes (code units). For example, in UTF-8 −

  • ASCII characters (code points less than 0x80, i.e. 128) are encoded using a single byte, as they are in ASCII.

  • Code points 0x80 (128) and above are encoded using multiple bytes (up to four per character).

String indices in Julia refer to these code units (bytes for UTF-8), the fixed-width building blocks used to encode arbitrary characters. Because a character may span several code units, not every index into a String is a valid index for a character. Indexing at an invalid position throws an error, as the example below shows −

julia> s[1]
'∀': Unicode U+2200 (category Sm: Symbol, math)
julia> s[2]
ERROR: StringIndexError("∀ x ∃ y", 2)
Stacktrace:
 [1] string_index_err(::String, ::Int64) at .\strings\string.jl:12
 [2] getindex_continued(::String, ::Int64, ::UInt32) at .\strings\string.jl:220
 [3] getindex(::String, ::Int64) at .\strings\string.jl:213
 [4] top-level scope at REPL[106]:1
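
To avoid such errors, the valid indices can be queried instead of guessed. The following is a minimal sketch that uses the same string s as above together with the Base functions length, ncodeunits, isvalid, nextind and eachindex.

```julia
s = "\u2200 x \u2203 y"

println(length(s))              # 7 characters
println(ncodeunits(s))          # 11 code units (bytes)

println(isvalid(s, 1))          # true: a character starts at index 1
println(isvalid(s, 2))          # false: index 2 is inside the encoding of '∀'
println(nextind(s, 1))          # 4: index of the next character

println(collect(eachindex(s)))  # [1, 4, 5, 6, 7, 10, 11]

# Iterating over a string visits characters, not bytes.
for c in s
    print(c, '|')
end
println()
```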

String Concatenation

Concatenation is one of the most useful string operations. The string function concatenates its arguments, as in the following example −

julia> A = "Hello"
"Hello"
julia> B = "Julia Programming Language"
"Julia Programming Language"
julia> string(A, ", ", B, ".\n")
"Hello, Julia Programming Language.\n"

We can also concatenate strings in Julia with the * operator. Given below is the example for the same −

julia> A = "Hello"
"Hello"
julia> B = "Julia Programming Language"
"Julia Programming Language"
julia> A * ", " * B * ".\n"
"Hello, Julia Programming Language.\n"

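One practical difference between the two approaches, shown in the sketch below, is that string converts non-string arguments to text, while * accepts only strings and characters. The variable n is introduced only for this example.

```julia
n = 3

# string() converts each argument to its text form before joining.
println(string("n = ", n, ", half = ", n / 2))   # n = 3, half = 1.5

# * concatenates strings and characters.
println("ab" * 'c' * "de")                        # abcde

# Using * with a number raises an error.
try
    "n = " * n
catch err
    println(typeof(err))                          # MethodError
end
```
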
Interpolation

Building strings by concatenation can become cumbersome. Therefore, Julia allows interpolation into string literals, which reduces the need for verbose calls to string or repeated use of *. Interpolation is done with the dollar sign ($). For example −

julia> A = "Hello"
"Hello"
julia> B = "Julia Programming Language"
"Julia Programming Language"
julia> "$A, $B.\n"
"Hello, Julia Programming Language.\n"

Julia takes the shortest complete expression after $ as the expression whose value is to be interpolated into the string. Consequently, we can interpolate any expression into a string by wrapping it in parentheses. For example −

julia> "100 + 10 = $(100 + 10)"
"100 + 10 = 110"

Now if you want to use a literal $ in a string then you need to escape it with a backslash as follows −

julia> print("His salary is \$5000 per month.\n")
His salary is $5000 per month.
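
The following sketch combines these forms of interpolation. The variables items and name are assumptions made for the example.

```julia
items = [1, 2, 3]
name = "Julia"

# Parentheses mark exactly where the interpolated expression ends.
println("Hello, $(name)!")                     # Hello, Julia!

# Any expression, including a function call, can be interpolated.
println("sum of $items is $(sum(items))")      # sum of [1, 2, 3] is 6
println("upper: $(uppercase(name))")           # upper: JULIA

# An escaped dollar sign followed by an interpolation.
println("price: \$$(2 * 5)")                   # price: $10
```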

Triple-quoted strings

We know that we can create strings with triple-quotes as given in the below example −

julia> """See the "quote" characters"""
"See the \"quote\" characters"

Besides allowing unescaped quotation marks, this form has another advantage −

Triple-quoted strings are dedented to the level of the least-indented line, which is very useful for defining strings within code that is indented. Following is an example of the same −

julia> str = """
           This is,
           Julia Programming Language.
         """
"  This is,\n  Julia Programming Language.\n"

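The dedentation level is the longest common starting sequence of spaces or tabs in all lines (the line containing the closing """ is always included), excluding the following −

  • The text on the same line as the opening """

  • Lines containing only spaces or tabs

In the example below, the text after the opening """ keeps its leading spaces, and the remaining lines are dedented by the indentation of the least-indented one −

julia> """    This
         is
           Julia Programming Language"""
"    This\nis\n  Julia Programming Language"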

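Dedentation is what allows a triple-quoted string to follow the indentation of the surrounding code. The sketch below assumes a hypothetical function named usage; the closing """ is placed at the same indentation as the text, so the common indentation is removed completely.

```julia
# `usage` is a hypothetical helper used only for this illustration.
function usage()
    return """
        Usage: tool [options]
          -h  show help
        """
end

# The newline after the opening """ is stripped, and the common
# indentation of eight spaces is removed from every line.
print(usage())
```
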
Common String Operations

Using string operators provided by Julia, we can compare two strings, search whether a particular string contains the given sub-string, and join/concatenate two strings.

Standard Comparison operators

By using the following standard comparison operators, we can lexicographically compare the strings −

julia> "abababab" < "Tutorialspoint"
false

julia> "abababab" > "Tutorialspoint"
true

julia> "abababab" == "Tutorialspoint"
false

julia> "abababab" != "Tutorialspoint"
true

julia> "100 + 10 = 110" == "100 + 10 = $(100 + 10)"
true
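
Because the comparison is based on code points, uppercase letters sort before lowercase ones. The sketch below shows this effect together with cmp and sort; the word lists are made up for the example.

```julia
# cmp returns -1, 0 or 1 depending on the lexicographic order.
println(cmp("apple", "apricot"))     # -1
println(cmp("same", "same"))         # 0

# 'Z' has a smaller code point than 'a'.
println("Zebra" < "apple")           # true

# sort uses the same ordering by default.
println(sort(["banana", "Apple", "cherry"]))   # ["Apple", "banana", "cherry"]

# For a case-insensitive order, compare lowercase versions.
println(sort(["banana", "apple", "Cherry"], by = lowercase))
```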

Search operators

Julia provides the findfirst and findlast functions to search for the index of a particular character in a string. Both are shown in the example below −

julia> findfirst(isequal('o'), "Tutorialspoint")
4

julia> findlast(isequal('o'), "Tutorialspoint")
11

Julia also provides the findnext and findprev functions, which start the search for a character at a given index. Check the example below for both of these functions −

julia> findnext(isequal('o'), "Tutorialspoint", 1)
4
julia> findnext(isequal('o'), "Tutorialspoint", 5)
11
julia> findprev(isequal('o'), "Tutorialspoint", 5)
4

It is also possible to check whether a substring is found within a string, using the occursin function. The example is given below −

julia> occursin("Julia", "This is, Julia Programming.")
true

julia> occursin("T", "Tutorialspoint")
true

julia> occursin("Z", "Tutorialspoint")
false
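
The search functions also accept a substring as the pattern, in which case they return a range of indices, or nothing when there is no match. A minimal sketch, with the variable names text and r chosen for illustration:

```julia
text = "Tutorialspoint"

# Searching for a substring returns the range of matching indices.
println(findfirst("point", text))        # 10:14

# When nothing is found, the result is `nothing`.
println(findfirst("xyz", text))          # nothing

# The returned range can be used to index the string.
r = findfirst("point", text)
if r !== nothing
    println(text[r])                     # point
end

# Prefix and suffix checks.
println(startswith(text, "Tut"))         # true
println(endswith(text, "point"))         # true
```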

The repeat() and join() functions

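repeat and join are two more useful string functions: repeat concatenates a string with itself a given number of times, and join concatenates the elements of a collection, inserting a delimiter between them. The example below shows their use −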

julia> repeat("Tutorialspoint.com ", 5)
"Tutorialspoint.com Tutorialspoint.com Tutorialspoint.com Tutorialspoint.com Tutorialspoint.com "

julia> join(["TutorialsPoint","com"], " . ")
"TutorialsPoint . com"

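A few related forms are sketched below: the ^ operator as a shorthand for repeat, join with a separate last delimiter, and split as the inverse of join. The sample strings are made up for the example.

```julia
# ^ repeats a string, like repeat.
println("ab" ^ 3)                                # ababab

# repeat also accepts a single character.
println(repeat('-', 10))                         # ----------

# A third argument to join is used before the last element.
println(join(["a", "b", "c"], ", ", " and "))    # a, b and c

# join converts non-string elements to text.
println(join(1:4, "-"))                          # 1-2-3-4

# split breaks a string at a delimiter.
parts = split("a,b,c", ",")
println(length(parts))                           # 3
```
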
Non-standard String Literals

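A non-standard string literal looks like an ordinary double-quoted string literal prefixed by an identifier, and it does not behave quite like a normal string literal. Raw strings, byte arrays, version numbers and regular expressions are all written this way.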

Raw String Literals

Raw string literals are a useful non-standard string literal, written in the form raw"...". They create ordinary String objects that contain the enclosed contents exactly as entered, with no interpolation or unescaping. The only exception is that quotation marks must still be escaped with a backslash.

Example

julia> println(raw"\\ \\\"")
\\ \"
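
Raw strings are convenient for text that contains many backslashes or dollar signs. The sketch below uses a made-up Windows-style path to show the equivalence with an ordinary escaped string.

```julia
# Backslashes are kept as they are written.
path = raw"C:\new\table.txt"
println(path)                                  # C:\new\table.txt
println(path == "C:\\new\\table.txt")          # true

# No interpolation takes place in a raw string.
println(raw"cost: $5" == "cost: \$5")          # true

# The result is an ordinary String.
println(typeof(path))                          # String
```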

Byte Array Literals

Byte array literals, written in the form b"...", are another useful non-standard string literal. They express read-only arrays of bytes (UInt8 values) and follow these rules −

  • ASCII characters and ASCII escapes produce a single byte.

  • \x and octal escape sequences produce the byte corresponding to the escape value.

  • Unicode escape sequences produce the sequence of bytes that encodes that code point in UTF-8.

These rules overlap to some extent: \x and octal escapes with values below 0x80 (128) are covered by both of the first two rules.

Example

julia> b"DATA\xff\u2200"
8-element Base.CodeUnits{UInt8,String}:
 0x44
 0x41
 0x54
 0x41
 0xff
 0xe2
 0x88
 0x80

The resulting byte array does not correspond to a valid UTF-8 string, because the byte 0xff can never appear in valid UTF-8, as you can see below −

julia> isvalid("DATA\xff\u2200")
false
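
The relationship between byte array literals and strings can be sketched as follows, using only Base functions (codeunits, collect, String and isvalid).

```julia
bytes = b"DATA"

# Elements are UInt8 values.
println(bytes[1] == 0x44)                       # true
println(eltype(bytes))                          # UInt8

# Valid bytes can be turned back into a String.
println(String(bytes))                          # DATA

# codeunits exposes the UTF-8 bytes of an existing string.
println(collect(codeunits("\u2200")) == [0xe2, 0x88, 0x80])   # true

# Only the \xff byte makes the earlier example invalid.
println(isvalid("DATA\u2200"))                  # true
println(isvalid("DATA\xff"))                    # false
```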

Version Number Literals

Version number literals are another useful non-standard string literal. They are written in the form v"..." and create VersionNumber objects, which follow the specification of semantic versioning.

Example

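We can define version-specific behavior by using the following statement. The trailing - in v"1.1-" denotes a version lower than any 1.1 release, including its pre-releases, so the upper bound excludes them −

julia> if v"1.0" <= VERSION < v"1.1-"
            # you need to do something specific to 1.0 release series
         end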
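
The sketch below shows why VersionNumber objects are preferable to plain strings for such checks. The version values are made up for the example.

```julia
# Versions compare numerically, component by component.
println(v"1.10" > v"1.9")            # true

# Plain strings compare character by character.
println("1.10" > "1.9")              # false

# A pre-release sorts before the corresponding release.
println(v"1.0.0-rc1" < v"1.0.0")     # true

# The trailing - excludes pre-releases of the upper bound.
println(v"1.0" <= v"1.0.5" < v"1.1-")       # true
println(v"1.0" <= v"1.1.0-rc1" < v"1.1-")   # false

# The components are available as fields.
ver = v"1.6.3"
println(ver.major, " ", ver.minor, " ", ver.patch)   # 1 6 3
```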

Regular Expressions

Julia has Perl-compatible regular expressions (regexes), which are related to strings in the following ways −

  • Regexes are used to find regular patterns in strings.

  • Regexes are themselves input as strings, which are parsed into a state machine that can then be used to search for patterns in strings efficiently.

Example

julia> r"^\s*(?:#|$)"
r"^\s*(?:#|$)"

julia> typeof(ans)
Regex

We can use occursin as follows to check if a regex matches a string or not −

julia> occursin(r"^\s*(?:#|$)", "not a comment")
false

julia> occursin(r"^\s*(?:#|$)", "# a comment")
true
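
Beyond a yes/no check, a regex can extract the matching text. The following sketch uses the Base functions match, replace and eachmatch; the patterns and input strings are made up for the example.

```julia
# match returns a RegexMatch, or nothing when there is no match.
m = match(r"(\d+):(\d+)", "time 12:45")
if m !== nothing
    println(m.match)                 # 12:45
    println(parse(Int, m[1]))        # 12, the first capture group
    println(parse(Int, m[2]))        # 45, the second capture group
end

println(match(r"\d+", "no digits") === nothing)     # true

# replace substitutes every match of the pattern.
println(replace("a1b22c", r"\d+" => "#"))           # a#b#c

# eachmatch iterates over all matches.
words = join((x.match for x in eachmatch(r"\w+", "one two")), ",")
println(words)                                      # one,two
```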