Reading UTF-8 char by char in C

Published at
12/28/2024
Categories
c
utf8
Author
tallesl

Using wchar_t didn't quite work out in my tests, so I'm handling it on my own:

#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

// https://stackoverflow.com/a/44776334
// Note: 0b binary literals are a GCC/Clang extension before C23
int8_t utf8_length(char c) {
    // 4-byte character (11110XXX)
    if ((c & 0b11111000) == 0b11110000)
        return 4;

    // 3-byte character (1110XXXX)
    if ((c & 0b11110000) == 0b11100000)
        return 3;

    // 2-byte character (110XXXXX)
    if ((c & 0b11100000) == 0b11000000)
        return 2;

    // 1-byte ASCII character (0XXXXXXX)
    if ((c & 0b10000000) == 0b00000000)
        return 1;

    // Probably a 10XXXXXX continuation byte (or an invalid 11111XXX byte)
    return -1;
}

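int main(void)
{

    const char* filepath = "example.txt";

    FILE* file = fopen(filepath, "r");

    if (!file) {
        perror(filepath);
        exit(1);
    }

    // getc returns an int so that EOF can be told apart from every byte value
    int c;

    for(;;) {

        c = getc(file);

        if (c == EOF)
            break;

        putc(c, stdout);

        int8_t length = utf8_length((char)c);

        // Stray continuation byte or invalid lead byte: treat it as a single unit
        if (length < 1)
            length = 1;

        // Copy the remaining bytes of the character, stopping if the file ends early
        while (--length) {
            c = getc(file);

            if (c == EOF)
                break;

            putc(c, stdout);
        }

        // Wait for Enter before showing the next character
        getchar();
    }

    fclose(file);

    return 0;
}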
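
Once the length is known from the lead byte, the same bit patterns can be used to rebuild the code point itself. The following is only a minimal sketch under stated assumptions: `utf8_seq_length` and `utf8_decode` are hypothetical names that are not part of the program above, hex masks are used instead of binary literals for portability, and overlong encodings and surrogates are not rejected.

```c
#include <stddef.h>
#include <stdint.h>

// Hypothetical helper (not from the article): sequence length from the lead
// byte, or -1 for a continuation byte or an invalid lead byte.
static int utf8_seq_length(unsigned char lead) {
    if ((lead & 0x80) == 0x00) return 1;  // 0XXXXXXX
    if ((lead & 0xE0) == 0xC0) return 2;  // 110XXXXX
    if ((lead & 0xF0) == 0xE0) return 3;  // 1110XXXX
    if ((lead & 0xF8) == 0xF0) return 4;  // 11110XXX
    return -1;
}

// Decodes one character from buf (n bytes available) into *code_point.
// Returns the number of bytes consumed, or -1 on malformed or truncated input.
// Simplification: overlong encodings and surrogates are not rejected.
static int utf8_decode(const unsigned char *buf, size_t n, uint32_t *code_point) {
    if (n == 0) return -1;

    int length = utf8_seq_length(buf[0]);
    if (length < 0 || (size_t)length > n) return -1;

    // Payload bits of the lead byte: 7, 5, 4 or 3 depending on the length.
    static const unsigned char lead_mask[5] = {0, 0x7F, 0x1F, 0x0F, 0x07};
    uint32_t cp = buf[0] & lead_mask[length];

    for (int i = 1; i < length; i++) {
        if ((buf[i] & 0xC0) != 0x80) return -1;  // not a 10XXXXXX byte
        cp = (cp << 6) | (buf[i] & 0x3F);        // append 6 payload bits
    }

    *code_point = cp;
    return length;
}
```

For instance, passing the three bytes E2 82 AC (the euro sign) should yield the code point U+20AC and a return value of 3.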

And here's my test file:

Hello, World! 🌍🚀
Hello
¡Hola!
Ça va?
你好
こんにちは
안녕하세요
©®™✓✗
😄😢😎🔥✨
€𐍈𒀭