The mbrtowc() function in C/C++
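The mbrtowc() function converts the next multibyte character in a byte sequence to a wide character. It is part of the C standard library and is declared in the <wchar.h> header (<cwchar> in C++). The "r" stands for restartable: unlike the older mbtowc(), it keeps the shift and partial-character state in a caller-supplied mbstate_t object, so a character that is split across two buffers can be completed by a later call. The multibyte encoding it accepts (for example, UTF-8) is determined by the LC_CTYPE category of the current locale.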
Syntax
```c
size_t mbrtowc(wchar_t* pwc, const char* s, size_t n, mbstate_t* ps);
```
Parameters
The function accepts the following parameters −
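- pwc − Pointer to the location where the resulting wide character is stored; if pwc is a null pointer, the character is examined but not stored
- s − Pointer to the multibyte character sequence to convert; if s is a null pointer, the call is equivalent to mbrtowc(NULL, "", 1, ps) and pwc and n are ignored
- n − Maximum number of bytes of s to examine
- ps − Pointer to the conversion state object; if ps is a null pointer, the function uses its own internal state object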
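Because pwc may be a null pointer, mbrtowc() can also be used just to measure characters. The sketch below counts the characters in a null-terminated string under the current locale. The helper name count_mb_chars() is hypothetical and not part of any library; the standard mbrlen() function performs the same per-character measurement.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: counts the multibyte characters in the
   null-terminated string s, interpreted in the current LC_CTYPE locale.
   Returns (size_t)-1 on an encoding error or a truncated final character.
   pwc is NULL because only the byte length of each character is needed. */
size_t count_mb_chars(const char *s)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* initial conversion state */
    size_t remaining = strlen(s);      /* bytes left, terminator excluded */
    size_t count = 0;

    while (remaining > 0) {
        size_t len = mbrtowc(NULL, s, remaining, &state);
        if (len == (size_t)-1 || len == (size_t)-2)
            return (size_t)-1;         /* invalid or incomplete input */
        if (len == 0)
            break;                     /* defensive: strlen() stops before any null byte */
        s += len;
        remaining -= len;
        count++;
    }
    return count;
}
```

For example, in a UTF-8 locale count_mb_chars("Hello \xE2\x9C\x93") counts 7 characters although the string occupies 9 bytes.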
Return Value
The function returns different values based on the conversion result −
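- 0 − The bytes examined complete the null character; the null wide character is stored if pwc is not a null pointer
- 1 to n − The number of bytes that make up the converted multibyte character; the wide character is stored if pwc is not a null pointer
- (size_t)-2 − The n bytes examined form an incomplete but potentially valid multibyte character; all n bytes have been consumed and recorded in *ps, and no value is stored
- (size_t)-1 − An encoding error occurred; errno is set to EILSEQ, no value is stored, and the conversion state is unspecified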
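The (size_t)-2 result is what makes mbrtowc() restartable. The following sketch, built around a hypothetical decode_bytewise() helper, feeds the input one byte at a time, as a program reading from a byte stream might, and relies on the state object to join the pieces of each character. It uses the standard mbsinit() function to detect input that ends partway through a character.

```c
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: decodes n bytes from s into out (capacity cap),
   passing mbrtowc() a single byte per call. A (size_t)-2 result means
   "need more bytes": that byte is consumed and remembered in state, so
   the next call resumes the same character. Returns the number of wide
   characters stored, or (size_t)-1 on an encoding error, a truncated
   final character, or a full output buffer. */
size_t decode_bytewise(const char *s, size_t n, wchar_t *out, size_t cap)
{
    mbstate_t state;
    memset(&state, 0, sizeof state);   /* start in the initial state */
    size_t count = 0;

    for (size_t i = 0; i < n; i++) {
        wchar_t wc;
        size_t r = mbrtowc(&wc, s + i, 1, &state);
        if (r == (size_t)-2)
            continue;                  /* partial character: feed the next byte */
        if (r == (size_t)-1)
            return (size_t)-1;         /* invalid sequence; errno is EILSEQ */
        if (count == cap)
            return (size_t)-1;         /* output buffer full */
        out[count++] = wc;             /* r is 0 (null character) or 1 here */
    }
    /* A non-initial state means the input stopped inside a character. */
    if (!mbsinit(&state))
        return (size_t)-1;
    return count;
}
```

In a UTF-8 locale, the three bytes E2 9C 93 produce two (size_t)-2 results followed by 1, and only the final call stores U+2713.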
Example 1: Basic Conversion
Here's a simple example demonstrating the basic usage of mbrtowc() −
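```c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

/* Converts str one character at a time and prints each wide character. */
void convertString(const char* str) {
    mbstate_t state = {0};            /* zero-initialized: initial conversion state */
    wchar_t wc;
    size_t result;
    const char* ptr = str;
    size_t remaining = strlen(str);   /* bytes left to examine */

    printf("Converting: %s\n", str);
    while (remaining > 0) {
        result = mbrtowc(&wc, ptr, remaining, &state);
        if (result == 0) {
            break;                    /* null character (cannot occur here: strlen() stops before it) */
        } else if (result == (size_t)-1) {
            printf("Encoding error\n");
            break;
        } else if (result == (size_t)-2) {
            printf("Incomplete character\n");
            break;
        } else {
            /* %lc expects a wint_t argument */
            printf("Converted %zu bytes to wide character: %lc\n", result, (wint_t)wc);
            ptr += result;
            remaining -= result;
        }
    }
}

int main(void) {
    setlocale(LC_ALL, "");            /* use the locale from the environment */
    const char* text = "Hello";
    convertString(text);
    return 0;
}
```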
Output −
```
Converting: Hello
Converted 1 bytes to wide character: H
Converted 1 bytes to wide character: e
Converted 1 bytes to wide character: l
Converted 1 bytes to wide character: l
Converted 1 bytes to wide character: o
```
Example 2: UTF-8 Multibyte Conversion
This example converts a string that ends with a three-byte UTF-8 character. It produces the output shown only when setlocale(LC_ALL, "") selects a UTF-8 locale −
```c
#include <stdio.h>
#include <wchar.h>
#include <locale.h>
#include <string.h>

int main(void) {
    setlocale(LC_ALL, "");   /* must select a UTF-8 locale for this example */

    /* "Hello " followed by U+2713 CHECK MARK, encoded in UTF-8 as E2 9C 93.
       A string literal avoids narrowing 0xE2 into a (possibly signed) char. */
    const char utf8_str[] = "Hello \xE2\x9C\x93";
    mbstate_t state = {0};
    wchar_t wc;
    size_t result;
    const char* ptr = utf8_str;
    size_t remaining = strlen(utf8_str);

    printf("Converting UTF-8 string:\n");
    while (remaining > 0) {
        result = mbrtowc(&wc, ptr, remaining, &state);
        if (result == (size_t)-1 || result == (size_t)-2) {
            printf("Error or incomplete sequence\n");
            break;
        } else if (result == 0) {
            break;           /* null character */
        } else {
            printf("Converted %zu bytes, wide char code: %d\n", result, (int)wc);
            ptr += result;
            remaining -= result;
        }
    }
    return 0;
}
```
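Output (in a UTF-8 locale) −
```
Converting UTF-8 string:
Converted 1 bytes, wide char code: 72
Converted 1 bytes, wide char code: 101
Converted 1 bytes, wide char code: 108
Converted 1 bytes, wide char code: 108
Converted 1 bytes, wide char code: 111
Converted 1 bytes, wide char code: 32
Converted 3 bytes, wide char code: 10003
```
The code 10003 is 0x2713, the Unicode check mark. In other locales the final bytes are decoded differently or reported as an error.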
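Since the example above only works in a UTF-8 locale, a program can check the active encoding before converting. The sketch below uses nl_langinfo(CODESET), which is a POSIX function rather than ISO C, so it assumes a POSIX system such as Linux, macOS, or a BSD. The helper name locale_is_utf8() is hypothetical.

```c
#define _POSIX_C_SOURCE 200809L
#include <langinfo.h>
#include <locale.h>
#include <string.h>

/* Hypothetical helper: applies the locale called name (pass "" for the
   environment's locale) and reports whether its character encoding is
   UTF-8. Returns 1 for UTF-8, 0 otherwise or if the locale is unavailable.
   Requires POSIX nl_langinfo(); not available in plain ISO C. */
int locale_is_utf8(const char *name)
{
    if (setlocale(LC_ALL, name) == NULL)
        return 0;                             /* locale not installed */
    const char *codeset = nl_langinfo(CODESET);
    return codeset != NULL && strcmp(codeset, "UTF-8") == 0;
}
```

A program could call locale_is_utf8("") at the start of main() in place of the bare setlocale() call and print a diagnostic when it returns 0.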
Key Points
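- Always initialize the mbstate_t object to the initial conversion state (for example, with = {0} or memset()) before first use, and reinitialize it after an encoding error, because its contents are then unspecified
- Call setlocale() so that LC_CTYPE selects the multibyte encoding you expect; in the default "C" locale, multibyte encodings such as UTF-8 may not be recognized
- Check every return value, including (size_t)-1 and (size_t)-2, before using the wide character or advancing the input pointer
- Give each thread its own mbstate_t object; passing a null ps makes the function use a hidden internal state object that is shared and therefore not safe to use from multiple threads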
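The points above about reinitializing the state and checking every result can be combined in a tolerant converter. This is a minimal sketch, assuming the caller wants invalid input replaced rather than rejected; the helper name to_wide_lossy() and its replacement parameter are hypothetical, and the caller must free() the result.

```c
#include <stdlib.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: converts the null-terminated multibyte string s to
   a newly allocated wide string. Each byte that starts an invalid or
   truncated sequence becomes the caller-chosen `replacement` character,
   and the state object is reset because its contents are unspecified
   after an error. Returns NULL if memory allocation fails. */
wchar_t *to_wide_lossy(const char *s, wchar_t replacement)
{
    size_t remaining = strlen(s);
    /* Every character uses at least one byte, so remaining + 1 slots suffice. */
    wchar_t *out = malloc((remaining + 1) * sizeof *out);
    if (out == NULL)
        return NULL;

    mbstate_t state;
    memset(&state, 0, sizeof state);
    size_t count = 0;

    while (remaining > 0) {
        size_t r = mbrtowc(&out[count], s, remaining, &state);
        if (r == (size_t)-1 || r == (size_t)-2) {
            out[count] = replacement;         /* replace one offending byte */
            r = 1;
            memset(&state, 0, sizeof state);  /* back to the initial state */
        }
        s += r;
        remaining -= r;
        count++;
    }
    out[count] = L'\0';
    return out;
}
```

Skipping exactly one byte per error keeps the loop simple and guarantees progress; a decoder with stricter replacement rules would need encoding-specific logic.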
Conclusion
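The mbrtowc() function provides a reliable way to convert multibyte character sequences to wide characters in C and C++. It works with whatever multibyte encoding the current locale defines, keeps partial characters in a caller-supplied state object, and uses its return values to distinguish complete characters, incomplete sequences, and encoding errors.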
