Introduction

Python 3 supports non-ASCII identifiers as specified in PEP 3131. However, this support is incomplete for certain languages that make extensive use of special characters such as ZWJ (ZERO WIDTH JOINER, U+200D) and ZWNJ (ZERO WIDTH NON-JOINER, U+200C). Examples of such languages are Malayalam, Kannada, Sinhala and Farsi.
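
The PEP 3131 rules can be probed with `str.isidentifier()`. The short sketch below uses a Malayalam conjunct (KA + VIRAMA + SSA) purely as an illustration; the variable names are chosen for this example only.

```python
# Probe what PEP 3131 identifiers accept, using Malayalam as an example.
KA = "\u0d15"      # MALAYALAM LETTER KA
VIRAMA = "\u0d4d"  # MALAYALAM SIGN VIRAMA
SSA = "\u0d37"     # MALAYALAM LETTER SSA
ZWNJ = "\u200c"    # ZERO WIDTH NON-JOINER

conjunct = KA + VIRAMA + SSA    # letter + virama + letter
print(conjunct.isidentifier())  # True: letters and combining marks are allowed

print(ZWNJ.isidentifier())      # False: a join control cannot start an identifier

# Letter + virama + ZWNJ + letter: the answer comes from the XID_Continue
# data in the interpreter's Unicode database, so it is printed, not assumed.
with_zwnj = KA + VIRAMA + ZWNJ + SSA
print(with_zwnj.isidentifier())
```

The last result is worth checking on the specific Python version in use rather than taking for granted.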

Unicode Standard on Using ZWJ/ZWNJ in Identifiers

ZWJ and ZWNJ are format control characters. Unicode defines their use in identifiers in TR31 (Unicode Identifier and Pattern Syntax), section 2.3, "Layout and Format Control Characters".
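
Both characters can be inspected with the standard-library `unicodedata` module, which reports their names and their general category, Cf (format control).

```python
import unicodedata

# Print code point, character name and general category for each join control.
for ch in ("\u200c", "\u200d"):
    print(f"U+{ord(ch):04X}", unicodedata.name(ch), unicodedata.category(ch))
# U+200C ZERO WIDTH NON-JOINER Cf
# U+200D ZERO WIDTH JOINER Cf
```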

Unicode recommends allowing ZWJ and ZWNJ ("the Join_Control characters") in identifiers, limited to three contexts:

  • Allow ZWNJ in breaking a cursive connection. That is, in the context based on the Joining_Type property, consisting of:
    • A Left-Joining or Dual-Joining character, followed by zero or more Transparent characters, followed by a ZWNJ, followed by zero or more Transparent characters, followed by a Right-Joining or Dual-Joining character
    • This corresponds to the following regular expression (in Perl-style syntax): /$LJ $T* ZWNJ $T* $RJ/
      • where:
        • $T = [:Joining_Type=Transparent:]
        • $RJ = [ [:Joining_Type=Dual_Joining:][:Joining_Type=Right_Joining:] ]
        • $LJ = [ [:Joining_Type=Dual_Joining:][:Joining_Type=Left_Joining:] ]
  • Allow ZWNJ in a conjunct context. That is, a sequence of the form:
    • A Letter, followed by a Virama, followed by a ZWNJ
    • This corresponds to the following regular expression (in Perl-style syntax): /$L $V ZWNJ/
      • where:
        • $L = [:General_Category=Letter:]
        • $V = [:Canonical_Combining_Class=Virama:]
  • Allow ZWJ in a conjunct context. That is, a sequence of the form:
    • A Letter, followed by a Virama, followed by a ZWJ
    • This corresponds to the following regular expression (in Perl-style syntax): /$L $V ZWJ/
      • where:
        • $L = [:General_Category=Letter:]
        • $V = [:Canonical_Combining_Class=Virama:]
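
The three contexts can be checked mechanically. The Python sketch below tests whether every ZWJ/ZWNJ in a string sits in one of them. It is a minimal sketch under stated assumptions: all function and constant names are hypothetical; Letter is taken as any general category starting with "L" and Virama as `unicodedata.combining()` returning 9; and because the standard-library `unicodedata` module offers no Joining_Type lookup, a tiny hand-copied excerpt of Arabic joining types stands in for the full table, with Transparent approximated as the Mn and Me categories.

```python
import unicodedata

ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER
VIRAMA_CCC = 9   # Canonical_Combining_Class value named "Virama"

# Hypothetical excerpt of Joining_Type values copied from ArabicShaping.txt.
# unicodedata has no Joining_Type lookup; a real checker would load the full
# table from the Unicode Character Database.
JOINING_TYPE = {
    "\u0627": "R",  # ARABIC LETTER ALEF
    "\u0628": "D",  # ARABIC LETTER BEH
    "\u062F": "R",  # ARABIC LETTER DAL
    "\u0645": "D",  # ARABIC LETTER MEEM
    "\u06CC": "D",  # ARABIC LETTER FARSI YEH
}


def joining_type(ch: str) -> str:
    """Return a Joining_Type code ('R', 'L', 'D', 'T' or 'U') for ch.

    Approximation: characters outside the table count as Transparent ('T')
    if they are nonspacing or enclosing marks, otherwise Non_Joining ('U').
    """
    if ch in JOINING_TYPE:
        return JOINING_TYPE[ch]
    if unicodedata.category(ch) in ("Mn", "Me"):
        return "T"
    return "U"


def in_cursive_context(s: str, i: int) -> bool:
    """Rule 1: /$LJ $T* ZWNJ $T* $RJ/ around the ZWNJ at index i."""
    before = i - 1
    while before >= 0 and joining_type(s[before]) == "T":
        before -= 1  # skip Transparent characters to the left
    after = i + 1
    while after < len(s) and joining_type(s[after]) == "T":
        after += 1   # skip Transparent characters to the right
    return (before >= 0 and joining_type(s[before]) in ("L", "D")
            and after < len(s) and joining_type(s[after]) in ("R", "D"))


def in_conjunct_context(s: str, i: int) -> bool:
    """Rules 2 and 3: /$L $V ZWNJ/ or /$L $V ZWJ/ ending at index i."""
    return (i >= 2
            and unicodedata.category(s[i - 2]).startswith("L")
            and unicodedata.combining(s[i - 1]) == VIRAMA_CCC)


def join_controls_ok(s: str) -> bool:
    """True if every ZWJ/ZWNJ in s occurs in one of the three contexts."""
    for i, ch in enumerate(s):
        if ch == ZWNJ and not (in_cursive_context(s, i)
                               or in_conjunct_context(s, i)):
            return False
        if ch == ZWJ and not in_conjunct_context(s, i):
            return False
    return True


KA, VIRAMA, SSA = "\u0d15", "\u0d4d", "\u0d37"  # Malayalam
ALEF, BEH, MEEM = "\u0627", "\u0628", "\u0645"  # Arabic script

print(join_controls_ok(KA + VIRAMA + ZWNJ + SSA))  # True (rule 2)
print(join_controls_ok(KA + VIRAMA + ZWJ + SSA))   # True (rule 3)
print(join_controls_ok(BEH + ZWNJ + MEEM))         # True (rule 1)
print(join_controls_ok(ALEF + ZWNJ + BEH))         # False: ALEF is Right_Joining
print(join_controls_ok(KA + ZWJ + SSA))            # False: no virama before ZWJ
```

The conjunct rules only look backwards from the join control, while the cursive rule skips Transparent characters in both directions. Pairing this check with `str.isidentifier()` on the string with its join controls removed would be one way to approximate an extended identifier rule for Python; that pairing is an assumption, not something TR31 or PEP 3131 specifies.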

Affected Languages

  • Malayalam
  • Kannada
  • Bengali
  • Farsi
  • Sinhala

References

ZwjAndZwnjAsIdentifiers (last edited 2010-09-21 03:52:17 by BaijuMuthukadan)