In the context of ERiC/Elster, some characters need to be replaced with "simple" ones, e.g. A-acute: Á → A.
Unicode provides a huge database of code points and their normalization. In Uniface we have $string() to de-escape "&#nnnn;", but this is only one side of the coin.
How can I "normalize"/"simplify" all code points in a given string?
So, is there an $anti_string() to get the code point number from a given character? Or do I need to loop over all code points to find the character?
; not really Uniface, but pseudo-Uniface
FOR all v_char in v_string
  IF (v_char not allowed)
    FOR all codepoints
      IF (v_char == $string("&#%%v_codepoint%%%;"))
        get the decomposition string
        replace character with the first char in the decomposition string
      ENDIF
    ENDFOR
  ENDIF
ENDFOR
Any other ideas for doing this task? And C++ is not my favorite option. Uniface is a programming language which should handle this kind of thing too.
Ingo
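One idea, sketched outside Uniface: Unicode canonical decomposition (NFD) splits a precomposed character such as Á into its base letter plus combining marks, so dropping the combining marks leaves the "simple" character. The Python sketch below uses the standard unicodedata module. The function name simplify is only illustrative, and whether the result fits the Elster character set is an assumption that would have to be checked against the ERiC documentation.

```python
import unicodedata


def simplify(text: str) -> str:
    """Strip combining marks after canonical decomposition (NFD)."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() returns 0 for base characters and a
    # non-zero combining class for accents and similar marks.
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


if __name__ == "__main__":
    print(simplify("Á"))
    print(simplify("William à Beckett"))
```

Characters without a canonical decomposition (for example ø or ß) pass through unchanged, so they would still need an explicit replacement list.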
Surely Uniface holds all the necessary tables and algorithms. But as Uniface is a 4GL, this information is hidden from us users.
I have already loaded ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt into our database (for testing purposes), but this is not really a solution to my problem.
In the past (over twenty years ago) we did this the simple way by defining a string with the replaceable characters, but only a few of them. Nowadays there is a need to handle many more characters.
The background to all this is ERiC/Elster (https://www.elster.de/), where one has to transfer tax-relevant information. The exchange format is XML with Unicode, but only a limited number of characters is allowed. ERiC checks the XML against an XSD and throws an error if it contains characters that are not allowed. Unfortunately, it does not show you which record holds the character in question, so it is impossible to show the end user the person, city or street name which he/she has to correct.
So we need to check in our application whether a character is valid or not. And when a few thousand records are written into the XML, it would be a better solution to replace such characters before filling the XML.
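The check described here, finding which record holds a disallowed character before the XML is written, could be sketched like this in Python. The ALLOWED set is purely hypothetical (printable ASCII plus German umlauts and ß); the real set would have to come from the ERiC/Elster XSD. The function name find_offenders is also only illustrative.

```python
# Hypothetical allowed set: printable ASCII plus German umlauts and sharp s.
# The real set must be taken from the ERiC/Elster schema.
ALLOWED = set(map(chr, range(0x20, 0x7F))) | set("ÄÖÜäöüß")


def find_offenders(records):
    """Return (index, value, offending characters) for each bad record."""
    report = []
    for index, value in enumerate(records):
        bad = sorted(set(value) - ALLOWED)
        if bad:
            report.append((index, value, bad))
    return report


if __name__ == "__main__":
    for index, value, bad in find_offenders(["Müller", "William à Beckett"]):
        print(index, value, bad)
```

With such a report the application can show the end user exactly which name needs correcting, instead of relying on the schema error from ERiC.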
"William à Beckett" is CP1252 but not in the charset of Elster.
So replace the "à" with "a".
"William a Beckett" can be sent via ERiC and should be understood by the tax office. A very easy task, isn't it ...
Ingo
This C/C++ library has Unicode normalization implemented. I believe the Unicode library already embedded in Uniface probably has similar functionality.
Let's wait for an answer from ULab.
Yes, that is the solution I thought about too, but replace_chars would contain a few hundred characters; have a look at the Unicode definitions.
So my idea was to loop over the characters in the string and then do a lookup in the Unicode table to get the decomposition. The key into this lookup table is a string like "U+00CC" or "00CC". So there is a need to get the Unicode code point of a character in Uniface.
Maybe I have to write the characters themselves into a database table ...
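For comparison, this is roughly what the lookup looks like in a language that exposes the code point directly. It is a Python sketch, not Uniface: ord() gives the code point, and unicodedata.decomposition() returns the same decomposition field that UnicodeData.txt carries. The helper names codepoint_key and base_character are hypothetical.

```python
import unicodedata


def codepoint_key(ch: str) -> str:
    """Format a character's code point as a lookup key such as '00CC'."""
    return f"{ord(ch):04X}"


def base_character(ch: str) -> str:
    """Follow the decomposition mapping to its first character, repeatedly."""
    while True:
        # Drop compatibility tags such as '<compat>' and keep the hex fields.
        fields = [f for f in unicodedata.decomposition(ch).split()
                  if not f.startswith("<")]
        if not fields:
            return ch  # no (further) decomposition
        ch = chr(int(fields[0], 16))


if __name__ == "__main__":
    print(codepoint_key("Ì"), unicodedata.decomposition("Ì"), base_character("Ì"))
```

The loop is there because some characters decompose in more than one step, so a single lookup is not always enough to reach the plain base letter.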
Same idea as Iain...but he was faster!
It should be checked whether the if ($scan(v_string, tester) > 0) ... endif could be stripped out.
Have a list of 'decomposable' characters.
forlist /id tester, replacer in replace_chars
  if ($scan(v_string, tester) > 0)
    v_string = $replace(v_string, 1, tester, replacer, -1)
  endif
endfor
Weigh this against the length of the string. If your data strings are 'shorter' than the list of replaceable characters, the other approach is (likely) faster.
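The trade-off can be made concrete with a small Python sketch of both directions (not Uniface code). REPLACE_CHARS is a hypothetical three-entry excerpt of the replacement list; by_list walks the list and scans the string once per entry, while by_string walks the string and does one table lookup per character.

```python
# Hypothetical excerpt; a real list would hold a few hundred entries.
REPLACE_CHARS = {"à": "a", "Á": "A", "é": "e"}


def by_list(text: str) -> str:
    """One pass over the replacement list, scanning the string each time."""
    for tester, replacer in REPLACE_CHARS.items():
        if tester in text:
            text = text.replace(tester, replacer)
    return text


def by_string(text: str) -> str:
    """One pass over the string, one dictionary lookup per character."""
    return "".join(REPLACE_CHARS.get(ch, ch) for ch in text)


if __name__ == "__main__":
    name = "William à Beckett"
    print(by_list(name), by_string(name))
```

Both give the same result for single-character replacements; which one is faster depends on the string length relative to the list length, as noted above.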