Implementing Unicode SMS support

Hi, CircuitMess team! I was checking the source code to see what it'd take to implement 16-bit SMS encoding support (I get a bunch of texts in Cyrillic, they come out as asterisks, and by the time they are saved into a JSON file on the SD card, they are already irreversibly mis-parsed). Looks like you guys don't use SIM800 native text mode and instead have your own PDU decoder. However, I wasn't able to figure out exactly which module I need to modify, because these two seem to have some overlap in functionality (plus the arduino_pdu_decoder one seems quite a bit different from the version elsewhere on GitHub):


Could you give me some pointers?
Also, I'm guessing that after I figure out this part, I'll also need to sort out the fonts somewhere?

Greetings,

I'm just going to get straight to the point.
The SIM800 PDU parser works fine for UTF8 characters, but if there is a Unicode character in the SMS, the SIM800 module falls back to reporting the raw SMS PDU. That is why we needed our own custom PDU parsing functions, which we made mostly by modifying this project's code.

We refrained from using UCS2 (Unicode) encoding in our code, mostly because of the font limitations.

A nice place to start would be the MAKERphone::pduDecode function, mostly the parts where we're decoding the PDU in a different way depending on the codingScheme flag (it shows what coding scheme the SMS is written in).
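As a rough illustration of what the UCS2 branch of such a decoder has to do, here is a minimal sketch. It is not the MAKERphone code: `isUcs2` and `ucs2HexToUtf8` are hypothetical helper names, and the sketch assumes the user-data part of the PDU is already available as a hex string with any user data header stripped. It checks the alphabet bits of the data coding scheme byte (general data coding group only) and converts big-endian 16-bit units to UTF-8, combining surrogate pairs when they appear.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// True if the data coding scheme byte selects UCS2. Only the general data
// coding group (top two bits 00) is handled in this sketch.
bool isUcs2(uint8_t dcs) {
    return (dcs & 0xC0) == 0x00 && ((dcs >> 2) & 0x03) == 0x02;
}

// Value of one hex digit, or -1 if the character is not a hex digit.
static int hexNibble(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
}

// Append one Unicode code point to out as UTF-8.
static void appendUtf8(std::string &out, uint32_t cp) {
    if (cp < 0x80) {
        out += static_cast<char>(cp);
    } else if (cp < 0x800) {
        out += static_cast<char>(0xC0 | (cp >> 6));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else if (cp < 0x10000) {
        out += static_cast<char>(0xE0 | (cp >> 12));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    } else {
        out += static_cast<char>(0xF0 | (cp >> 18));
        out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
        out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
        out += static_cast<char>(0x80 | (cp & 0x3F));
    }
}

// Read one 16-bit unit from four hex digits at pos; -1 on short or bad input.
static long readUnit(const std::string &hex, size_t pos) {
    if (pos + 4 > hex.size()) return -1;
    long unit = 0;
    for (size_t j = 0; j < 4; ++j) {
        int n = hexNibble(hex[pos + j]);
        if (n < 0) return -1;
        unit = (unit << 4) | n;
    }
    return unit;
}

// Decode hex-encoded UCS2 user data into a UTF-8 string.
// Invalid units and unpaired surrogates become '?'.
std::string ucs2HexToUtf8(const std::string &hex) {
    std::string out;
    for (size_t i = 0; i + 4 <= hex.size(); i += 4) {
        long unit = readUnit(hex, i);
        if (unit < 0) { out += '?'; continue; }
        uint32_t cp = static_cast<uint32_t>(unit);
        if (cp >= 0xD800 && cp <= 0xDBFF) {
            // High surrogate: needs a low surrogate right after it.
            long low = readUnit(hex, i + 4);
            if (low < 0xDC00 || low > 0xDFFF) { out += '?'; continue; }
            cp = 0x10000 + ((cp - 0xD800) << 10)
                 + static_cast<uint32_t>(low - 0xDC00);
            i += 4;  // the low surrogate has been consumed as well
        } else if (cp >= 0xDC00 && cp <= 0xDFFF) {
            out += '?';  // low surrogate without a preceding high one
            continue;
        }
        appendUtf8(out, cp);
    }
    return out;
}
```

For example, `ucs2HexToUtf8("041F04400438043204350442")` yields the UTF-8 bytes for "Привет". Producing UTF-8 is only one possible choice; it keeps the result easy to store in a JSON file, but the display code would still need a font that covers those characters.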

To save a Unicode SMS you should take a look at the block of code starting from line 6473 in MAKERphone.cpp. There we purposefully refrained from saving any Unicode chars and replaced them with asterisks instead.

The font, on the other hand, seems like a much more limiting factor, but there should be some Cyrillic fonts somewhere on the web.

Thanks for taking an interest in the project. I hope this gives you an idea of where to start.

Cheers

Sorry for disappearing, I only recently got the time to look into this. Thanks for the write-up; with it I managed to at least successfully save incoming Unicode messages to messages.json to read on a PC.
Now, the font part is indeed trickier. To minimize changes, I'd prefer to just generate a replacement for Font7.c with more characters. However, Adafruit's fontconvert tool doesn't accept multiple subsets. Without understanding the structure of the .c/.h font files, I can't make a reasonably small file that would cover both Latin and Cyrillic characters. On the other hand, taking all of the first 1327 characters (all Latin, Greek and Cyrillic blocks) nets a 50 KB+ file. LittleVGL's converter allows multiple subsets, but the output format seems different from Adafruit's, and again, I don't understand these files well enough to tweak them manually. Any help?
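One way around the single-range limitation, sketched below under assumptions, is to keep the glyph table dense and translate code points into table indices in a small lookup step before drawing. Adafruit GFX fonts describe one contiguous range through their `first` and `last` fields, so the glyphs for several subsets can be stored back to back as long as the caller remaps the code point. `GlyphRange`, `kRanges`, `glyphIndex` and `totalGlyphs` are hypothetical names, the chosen subsets are only an example, and whether Font7.c uses this layout at all has not been checked.

```cpp
#include <cstddef>
#include <cstdint>

// One contiguous block of code points included in the font.
struct GlyphRange {
    uint16_t first;
    uint16_t last;  // inclusive
};

// Example subsets (an assumption, adjust as needed): printable Basic Latin
// plus the Russian part of the Cyrillic block. Ranges are in ascending order.
static const GlyphRange kRanges[] = {
    {0x0020, 0x007E},  // space .. '~'
    {0x0401, 0x0401},  // Cyrillic capital IO
    {0x0410, 0x044F},  // Cyrillic A .. ya
    {0x0451, 0x0451},  // Cyrillic small io
};

// Map a code point to its position in a dense glyph table in which the
// ranges above are stored one after another. Returns -1 if not covered.
int glyphIndex(uint16_t cp) {
    int base = 0;
    for (const GlyphRange &r : kRanges) {
        if (cp >= r.first && cp <= r.last) {
            return base + (cp - r.first);
        }
        base += r.last - r.first + 1;
    }
    return -1;
}

// Number of glyphs the dense table must contain.
int totalGlyphs() {
    int total = 0;
    for (const GlyphRange &r : kRanges) {
        total += r.last - r.first + 1;
    }
    return total;
}
```

With these example ranges the table holds 161 glyphs instead of the 1327 mentioned above. The drawing code would call `glyphIndex` on each decoded character and fall back to a placeholder such as an asterisk when it returns -1.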