Hi Freaks

In the ERiC/Elster world there is a need to replace some characters with "simple" ones,
e.g.   A-acute:   Á → A
Unicode provides us with a huge database of code points and their normalization.
In UnifAce we have $string() to de-escape "&#nnnn;"
But this is only one side of the coin (smile)
How can one "normalize"/"simplify" all code points in a given string?
So, is there an $anti_string() to get the code point number from a given character?
Or do I need to loop over all code points to find the character?

; not really UnifAce, but pseudo-Uniface
FOR all v_char in v_string
  IF (v_char not allowed)
    FOR all codepoints
      IF (v_char == $string("&#%%v_codepoint%%%;"))
        get the decomposition string
        replace the character with the first char of the decomposition string
      ENDIF
    ENDFOR
  ENDIF
ENDFOR
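
For illustration only, the loop above can be sketched in Python (not UnifAce). There, `ord()` plays the role of the wished-for $anti_string(), and the standard `unicodedata` module supplies the decomposition, so no search over all code points is needed. The `ALLOWED` set and the function name `simplify` are assumptions made for this sketch; the real Elster character set has to be taken from the ERiC documentation.

```python
import unicodedata

# Hypothetical allowed set: printable ASCII only (an assumption for this sketch).
ALLOWED = {chr(cp) for cp in range(0x20, 0x7F)}

def simplify(text, allowed=ALLOWED):
    """Replace each disallowed character with the first character of its
    canonical decomposition (NFD); keep it unchanged if that does not help."""
    result = []
    for ch in text:
        if ch in allowed:
            result.append(ch)
            continue
        base = unicodedata.normalize("NFD", ch)[0]
        result.append(base if base in allowed else ch)
    return "".join(result)

print(simplify("William à Beckett"))
print(f"U+{ord('Á'):04X}")  # code point of a character
```

Characters without a decomposition (for example "ß") pass through unchanged in this sketch and would still need an explicit replacement rule.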

Any other ideas on how to do this task?
C++ is not my favorite option.
UnifAce is a programming language which should handle this kind of thing too (smile)

Ingo








    5 answers

    1.  

      For sure UnifAce holds all the necessary tables and algorithms (smile)
      But as UnifAce is a 4GL, this information is hidden from us users (sad)

      I have already loaded ftp://ftp.unicode.org/Public/UNIDATA/UnicodeData.txt into our database (for testing purposes), but this is not really a solution to my problem.

      In the past (over twenty years ago) we did this the simple way, by defining a string with the replaceable characters, but only a few of them. Nowadays there is a need to handle many more characters.
      The background to all this is ERiC/Elster (https://www.elster.de/), where one has to transfer tax-relevant information. The exchange format is XML in Unicode, but only a limited number of characters is allowed.
      ERiC checks the XML against an XSD and throws an error if it contains characters that are not allowed.
      Unfortunately, it does not show you which record holds the character in question, so it is impossible to show the end user the person, city or street name that he/she has to correct.
      So we need to check in our application whether a character is valid or not.
      And if a few thousand records are written into the XML, it would be a better solution to replace such characters before filling the XML.
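
As a rough sketch of the application-side check described above (in Python, not UnifAce), each field can be scanned before the XML is built, so that the end user can be told which value to correct. The field names, the `ALLOWED` set and the helper `find_invalid` are all hypothetical; the real list of permitted characters has to come from the ERiC/Elster documentation.

```python
# Hypothetical allowed set: printable ASCII only (an assumption for this sketch).
ALLOWED = {chr(cp) for cp in range(0x20, 0x7F)}

def find_invalid(fields, allowed=ALLOWED):
    """Return {field_name: [(position, char, 'U+XXXX'), ...]} for every
    field that contains at least one character outside the allowed set."""
    report = {}
    for name, value in fields.items():
        bad = [(pos, ch, f"U+{ord(ch):04X}")
               for pos, ch in enumerate(value)
               if ch not in allowed]
        if bad:
            report[name] = bad
    return report

record = {"person_name": "William à Beckett", "city": "Berlin"}
for field, problems in find_invalid(record).items():
    print(field, problems)
```

With such a report, the application could either show the offending field to the user or replace the characters before the record goes into the XML.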

      Example

      "William à Beckett" is valid in CP1252, but "à" is not in the Elster character set.

      So replace the "à" with "a".

      "William a Beckett" could then be sent via ERiC and should be understood by the tax office (smile)

      very easy task, isn't it ... (smile)

      Ingo

      1.  

        Hi Ingo,

        This C/C++ library has Unicode normalization implemented.
        I believe the Unicode library already embedded in Uniface probably has similar functionality.

        Let's wait for an answer from ULab.

        Gianni

        1.  

          Yes, that is the solution I thought about too, but ...
          replace_chars would hold a few hundred entries; have a look at the Unicode definitions (smile)

          So my idea was to loop over the characters in the string and then do a lookup in the Unicode table to get the decomposition. The key into this lookup table is a string like "U+00CC" or "00CC".
          So there is a need to get the Unicode code point of a character in UnifAce.
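
A minimal sketch of that lookup, written in Python rather than UnifAce: the sample lines follow the semicolon-separated layout of UnicodeData.txt, where field 0 is the code point and field 5 the decomposition mapping. The names `load_decompositions` and `base_char` are made up for this sketch, and skipping the tagged compatibility mappings (such as `<noBreak>`) is a simplifying assumption.

```python
SAMPLE = """\
00C1;LATIN CAPITAL LETTER A WITH ACUTE;Lu;0;L;0041 0301;;;;N;LATIN CAPITAL LETTER A ACUTE;;;00E1;
00CC;LATIN CAPITAL LETTER I WITH GRAVE;Lu;0;L;0049 0300;;;;N;LATIN CAPITAL LETTER I GRAVE;;;00EC;
00A0;NO-BREAK SPACE;Zs;0;CS;<noBreak> 0020;;;;N;NON-BREAKING SPACE;;;;
0041;LATIN CAPITAL LETTER A;Lu;0;L;;;;;N;;;;0061;
"""

def load_decompositions(lines):
    """Map a code point key such as '00CC' to its decomposition, e.g. ['0049', '0300']."""
    table = {}
    for line in lines:
        fields = line.split(";")
        # Skip entries without a decomposition and tagged compatibility mappings.
        if len(fields) < 6 or not fields[5] or fields[5].startswith("<"):
            continue
        table[fields[0]] = fields[5].split()
    return table

def base_char(ch, table):
    """Follow the first element of each decomposition until none is left."""
    key = f"{ord(ch):04X}"  # character -> lookup key like '00CC'
    while key in table:
        key = table[key][0]
    return chr(int(key, 16))

table = load_decompositions(SAMPLE.splitlines())
print(base_char("Ì", table))
```

The loop in `base_char` is there because the mappings in UnicodeData.txt are single-level: the first element of a decomposition can itself have a decomposition.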

          Maybe I have to write the characters themselves into a database table ...

          Ingo


          1.  

            Same idea as Iain ... but he was faster!

            It should be checked whether the
                  if ($scan(v_string, tester) > 0)
                  endif
            could be stripped out.

            Gianni


            1.  

              Umm,

              have a list of 'decomposable' characters:

              Á=A;á=a etc.


              forlist /id tester, replacer in replace_chars
                if ($scan(v_string, tester) > 0)
                  v_string = $replace(v_string, 1, tester, replacer, -1)
                endif
              endfor



              Weigh this against the length of the string. If your data strings are shorter than the list of replaceable characters, the other approach (looping over the characters of the string) is likely faster.
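
To make the trade-off concrete, here is a small Python sketch (not UnifAce) of both directions: looping over the replacement list, as in the forlist above, versus a single pass over the string with a per-character lookup. `REPLACE_CHARS` is a hypothetical three-entry excerpt of the list, and both function names are made up for this sketch.

```python
# Hypothetical excerpt of the tester=replacer list.
REPLACE_CHARS = {"Á": "A", "á": "a", "à": "a"}

def replace_by_list(text, pairs=REPLACE_CHARS):
    """One pass over the replacement list; each entry scans the whole string."""
    for tester, replacer in pairs.items():
        text = text.replace(tester, replacer)
    return text

def replace_by_char(text, pairs=REPLACE_CHARS):
    """One pass over the string; each character is looked up in the table."""
    return text.translate(str.maketrans(pairs))

print(replace_by_list("William à Beckett"))
print(replace_by_char("William à Beckett"))
```

The first variant does work proportional to the list length times the string length, the second to the string length only, which is why short strings and a list of several hundred entries favour the per-character lookup.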


              Iain
