Stringex::Unidecoder

Constants

CODEPOINTS

Maps each codepoint group to its transliteration table, loading each table from its YAML file the first time it is needed.
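A minimal sketch of load-on-demand lookup follows. It assumes CODEPOINTS behaves like a Hash whose default block loads a group the first time that group is requested. FAKE_GROUP_DATA, LOAD_LOG, and LAZY_CODEPOINTS are hypothetical names, and the in-memory table stands in for the real YAML files.

```ruby
# Hypothetical stand-in for one per-group YAML file (e.g. x00.yml):
# an Array of 256 transliterations, indexed by the low byte of the codepoint.
FAKE_GROUP_DATA = {
  "x00" => Array.new(256) { |i| i < 128 ? i.chr : "?" }.tap { |a| a[0xe9] = "e" }
}

LOAD_LOG = [] # records each group that is actually loaded

# The default block runs only on the first lookup of a missing group;
# assigning hash[group] caches the table so later lookups skip loading.
LAZY_CODEPOINTS = Hash.new do |hash, group|
  LOAD_LOG << group
  hash[group] = FAKE_GROUP_DATA.fetch(group)
end

first  = LAZY_CODEPOINTS["x00"][0xe9] # loads "x00", returns "e"
second = LAZY_CODEPOINTS["x00"][0x41] # served from the cache, returns "A"
puts first, second, LOAD_LOG.inspect
```

The design trades a slower first lookup per group for not reading every table at startup.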

LOCAL_CODEPOINTS

Holds localized transliterations keyed by locale, as added through localize_from.

Public Class Methods

decode(string)

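Returns a copy of string with each non-ASCII character transliterated to ASCII. A localized transliteration for the current locale takes precedence; characters whose lookup fails become "?".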

You’re probably better off using the String#to_ascii method that Stringex adds.

    # File lib/stringex/unidecoder.rb, line 16
    def decode(string)
      string.gsub(/[^\x00-\x7f]/) do |codepoint|
        if localized = local_codepoint(codepoint)
          localized
        else
          begin
            unpacked = codepoint.unpack("U")[0]
            CODEPOINTS[code_group(unpacked)][grouped_point(unpacked)]
          rescue
            # Hopefully this won't come up much
            # TODO: Make this note something to the user that is reportable to me perhaps
            "?"
          end
        end
      end
    end
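
The lookup order in decode can be sketched in plain Ruby. GROUP_TABLES, LOCAL_TABLE, and sketch_decode are hypothetical miniature stand-ins for CODEPOINTS, LOCAL_CODEPOINTS, and decode. Unlike the listing, this sketch detects a missing entry with nil instead of a rescue and maps every miss to "?".

```ruby
# Hypothetical miniature tables standing in for Stringex's data.
GROUP_TABLES = { "x00" => { 0xe9 => "e", 0xdf => "ss" } }
LOCAL_TABLE  = { "ü" => "ue" } # e.g. a German-style override

def sketch_decode(string)
  string.gsub(/[^\x00-\x7f]/) do |char|
    # 1. A localized transliteration wins if one exists.
    next LOCAL_TABLE[char] if LOCAL_TABLE.key?(char)

    cp = char.unpack1("U")      # Integer codepoint
    group = "x%02x" % (cp >> 8) # which 256-codepoint table
    index = cp & 255            # position inside that table
    # 2. Otherwise use the group table; 3. fall back to "?".
    GROUP_TABLES.dig(group, index) || "?"
  end
end

sketch_decode("café über straße ☃") # => "cafe ueber strasse ?"
```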

default_locale()

Returns the default locale for localized transliterations ("en" unless changed). NOTE: Calling this also sets @locale to the default locale.

    # File lib/stringex/unidecoder.rb, line 72
    def default_locale
      @default_locale ||= "en"
      @locale = @default_locale
    end
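
Because reading default_locale also assigns @locale, calling it discards any locale set earlier. The hypothetical LocaleSketch module below copies the reader's two lines next to a plain locale accessor to show the effect; it is a sketch, not Stringex itself.

```ruby
# Hypothetical module reproducing the default_locale side effect.
module LocaleSketch
  class << self
    attr_accessor :locale # stand-in for the real locale/locale= pair

    def default_locale
      @default_locale ||= "en"
      @locale = @default_locale # side effect: overwrites the current locale
    end
  end
end

LocaleSketch.locale = :de
LocaleSketch.default_locale # => "en"
LocaleSketch.locale         # => "en"; the :de setting has been replaced
```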

default_locale=(new_locale)

Sets the default locale for localized transliterations. NOTE: This also sets @locale to the new locale.

    # File lib/stringex/unidecoder.rb, line 78
    def default_locale=(new_locale)
      @default_locale = new_locale
      # Seems logical that @locale should be the new default
      @locale = new_locale
    end

encode(codepoint)

Returns the character for the given Unicode codepoint, which is passed as a hexadecimal string (for example, "00e9").

    # File lib/stringex/unidecoder.rb, line 34
    def encode(codepoint)
      ["0x#{codepoint}".to_i(16)].pack("U")
    end
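
A short usage sketch: sketch_encode repeats the one-line body shown above, and sketch_encode_chr is a hypothetical equivalent using Integer#chr, which is not what Stringex itself calls.

```ruby
# Prefix the hex string with "0x", parse it in base 16, pack as UTF-8.
def sketch_encode(codepoint)
  ["0x#{codepoint}".to_i(16)].pack("U")
end

# Equivalent spelling with Integer#chr and an explicit encoding.
def sketch_encode_chr(codepoint)
  codepoint.to_i(16).chr(Encoding::UTF_8)
end

sketch_encode("00e9")     # => "é"
sketch_encode("2603")     # => "☃"
sketch_encode_chr("00e9") # => "é"
```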

in_yaml_file(character)

Returns a string naming the YAML file (and the line within it) that holds the transliteration for the given character.

    # File lib/stringex/unidecoder.rb, line 40
    def in_yaml_file(character)
      unpacked = character.unpack("U")[0]
      "#{code_group(unpacked)}.yml (line #{grouped_point(unpacked) + 2})"
    end
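
A worked example of the arithmetic, using a hypothetical sketch_in_yaml_file that repeats the same steps. The +2 offset suggests each group file has one extra line (presumably the YAML "---" marker) before entry 0 on line 2; that reading is an assumption.

```ruby
# Group name from the high bits, index from the low 8 bits, line = index + 2.
def sketch_in_yaml_file(character)
  unpacked = character.unpack1("U")
  group = "x%02x" % (unpacked >> 8)
  index = unpacked & 255
  "#{group}.yml (line #{index + 2})"
end

sketch_in_yaml_file("é") # => "x00.yml (line 235)"  (U+00E9: index 233)
sketch_in_yaml_file("ж") # => "x04.yml (line 56)"   (U+0436: index 54)
```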

local_codepoint(codepoint)

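Returns the localized transliteration of a character for the current locale, or nil if none is defined. The locale is looked up as given and, failing that, converted between Symbol and String.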

    # File lib/stringex/unidecoder.rb, line 85
    def local_codepoint(codepoint)
      locale_hash = LOCAL_CODEPOINTS[locale] || LOCAL_CODEPOINTS[locale.is_a?(Symbol) ? locale.to_s : locale.to_sym]
      locale_hash && locale_hash[codepoint]
    end
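
A small sketch of the Symbol/String fallback. LOCAL_SKETCH and sketch_local_codepoint are hypothetical; the locale is passed explicitly here instead of being read from the module state.

```ruby
# Hypothetical registry stored under a String key, looked up via a Symbol.
LOCAL_SKETCH = { "de" => { "ü" => "ue", "ö" => "oe" } }

def sketch_local_codepoint(character, locale)
  # Try the locale as given, then the same name as the other key type.
  alt = locale.is_a?(Symbol) ? locale.to_s : locale.to_sym
  locale_hash = LOCAL_SKETCH[locale] || LOCAL_SKETCH[alt]
  locale_hash && locale_hash[character]
end

sketch_local_codepoint("ü", :de)  # => "ue" (found via the String key "de")
sketch_local_codepoint("ü", "fr") # => nil  (no "fr" or :fr entry)
sketch_local_codepoint("é", :de)  # => nil  (no override for this character)
```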

locale()

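Returns the locale for localized transliterations: the explicitly set @locale if present, otherwise I18n.locale when I18n is loaded, otherwise the default locale.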

    # File lib/stringex/unidecoder.rb, line 56
    def locale
      if @locale
        @locale
      elsif defined?(I18n)
        I18n.locale
      else
        default_locale
      end
    end
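
The three-way precedence can be sketched with a hypothetical sketch_locale that takes the explicit setting as an argument. The I18n module defined partway through is a fake stand-in for the i18n gem, used only to exercise the middle branch.

```ruby
# Explicit setting first, then I18n.locale if I18n is loaded, then default.
def sketch_locale(explicit, default = "en")
  if explicit
    explicit
  elsif defined?(I18n)
    I18n.locale
  else
    default
  end
end

sketch_locale(:fr)         # => :fr, an explicit setting always wins
before = sketch_locale(nil) # "en" in a plain Ruby process without I18n

# Hypothetical stand-in for the i18n gem's I18n.locale.
module I18n
  def self.locale
    :ja
  end
end

sketch_locale(nil) # => :ja once an I18n constant is defined
```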

locale=(new_locale)

Sets the locale for localized transliterations.

    # File lib/stringex/unidecoder.rb, line 67
    def locale=(new_locale)
      @locale = new_locale
    end

localize_from(hash_or_path_to_file)

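Adds localized transliterations to Unidecoder from either a Hash or the path to a YAML file. The data is checked by verify_local_codepoints before being merged into LOCAL_CODEPOINTS.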

    # File lib/stringex/unidecoder.rb, line 46
    def localize_from(hash_or_path_to_file)
      hash = if hash_or_path_to_file.is_a?(Hash)
        hash_or_path_to_file
      else
        YAML.load_file(hash_or_path_to_file)
      end
      verify_local_codepoints hash
    end
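
A sketch of data in the shape localize_from accepts, assuming each locale key maps to a Hash of character => transliteration (the format verify_local_codepoints checks). The German entries and the file path in the comment are hypothetical.

```ruby
require "yaml"

# Hypothetical locale file contents: locale => { character => transliteration }.
yaml_text = <<~YAML
  de:
    "ü": "ue"
    "ö": "oe"
YAML

data = YAML.safe_load(yaml_text)
data["de"]["ü"] # => "ue"

# With the gem loaded, either form could then be passed in:
#   Stringex::Unidecoder.localize_from(data)
#   Stringex::Unidecoder.localize_from("path/to/de.yml") # hypothetical path
```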

with_default_locale(&block)

Runs a block with the default locale temporarily in effect.

    # File lib/stringex/unidecoder.rb, line 100
    def with_default_locale(&block)
      with_locale default_locale, &block
    end

with_locale(new_locale, &block)

Runs a block with a temporary locale setting, restoring the original locale when the block completes. Passing :default uses the default locale.

    # File lib/stringex/unidecoder.rb, line 91
    def with_locale(new_locale, &block)
      new_locale = default_locale if new_locale == :default
      original_locale = locale
      self.locale = new_locale
      block.call
      self.locale = original_locale
    end
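
The listing restores the locale only when the block returns normally, since it has no ensure clause. A hedged sketch of a variant that restores the locale even if the block raises, and returns the block's value; LocaleSwitch is a hypothetical stand-in, not part of Stringex.

```ruby
# Hypothetical stand-in for Unidecoder's locale state.
module LocaleSwitch
  class << self
    attr_accessor :locale

    def default_locale
      "en"
    end

    # Runs the block under new_locale and always restores the previous
    # value, even if the block raises; returns the block's result.
    def with_locale(new_locale)
      new_locale = default_locale if new_locale == :default
      original_locale = locale
      self.locale = new_locale
      yield
    ensure
      self.locale = original_locale
    end
  end
end

LocaleSwitch.locale = :de
LocaleSwitch.with_locale(:fr) { LocaleSwitch.locale } # => :fr
LocaleSwitch.locale                                   # => :de

begin
  LocaleSwitch.with_locale(:default) { raise "boom" }
rescue RuntimeError
  # expected; the ensure clause has already restored :de
end
LocaleSwitch.locale # => :de
```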

Private Class Methods

code_group(unpacked_character)

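Returns the name of the codepoint group (for example, "x00") for the given unpacked character: its codepoint with the low 8 bits dropped, formatted as hex. Each group corresponds to one YAML file.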

    # File lib/stringex/unidecoder.rb, line 106
    def code_group(unpacked_character)
      "x%02x" % (unpacked_character >> 8)
    end
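
A few worked values, using a hypothetical sketch_code_group with the same body. Note that %02x sets a minimum width, so codepoints above U+FFFF yield three hex digits.

```ruby
# Shift away the low 8 bits, then format as at least two hex digits.
def sketch_code_group(unpacked_character)
  "x%02x" % (unpacked_character >> 8)
end

sketch_code_group("é".unpack1("U")) # => "x00" (U+00E9)
sketch_code_group("ж".unpack1("U")) # => "x04" (U+0436)
sketch_code_group("☃".unpack1("U")) # => "x26" (U+2603)
sketch_code_group(0x1F600)          # => "x1f6"
```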

grouped_point(unpacked_character)

Returns the index (0 to 255) of the given unpacked character within the YAML file for its codepoint group, i.e. the low 8 bits of the codepoint.

    # File lib/stringex/unidecoder.rb, line 111
    def grouped_point(unpacked_character)
      unpacked_character & 255
    end
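
Together, the group number and the index split a codepoint exactly, so codepoint = group * 256 + index. A worked check with a hypothetical sketch_grouped_point:

```ruby
# Keep only the low 8 bits of the codepoint.
def sketch_grouped_point(unpacked_character)
  unpacked_character & 255
end

cp = "ж".unpack1("U")            # 0x0436 = 1078
index = sketch_grouped_point(cp) # => 54 (0x36)
group = cp >> 8                  # => 4
group * 256 + index == cp        # => true
```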

verify_local_codepoints(hash)

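Checks that the Hash is in the format LOCAL_CODEPOINTS expects before merging it in, and raises an instructive ArgumentError if not. The expected format maps each locale (a Symbol or String) to a Hash of String characters to String transliterations.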

    # File lib/stringex/unidecoder.rb, line 117
    def verify_local_codepoints(hash)
      pass_check = hash.is_a?(Hash) && hash.all?{|key, value|
        # Each locale key (Symbol or String) must map to a Hash of String => String
        [Symbol, String].include?(key.class) && value.is_a?(Hash) &&
          value.keys.all?{|k| k.is_a?(String)} && value.values.all?{|v| v.is_a?(String)}
      }
      if pass_check
        hash.each do |k, v|
          LOCAL_CODEPOINTS[k] = v
        end
      else
        raise ArgumentError, "LOCAL_CODEPOINTS is not correctly defined. Please see the README for more information on how to correctly format this data."
      end
    end
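
The same format check as a standalone predicate, for trying inputs before passing them to localize_from. valid_local_codepoints? is a hypothetical helper, not a Stringex method.

```ruby
# Every key is a Symbol or String locale; every value maps String => String.
def valid_local_codepoints?(hash)
  hash.is_a?(Hash) && hash.all? { |key, value|
    [Symbol, String].include?(key.class) && value.is_a?(Hash) &&
      value.keys.all? { |k| k.is_a?(String) } &&
      value.values.all? { |v| v.is_a?(String) }
  }
end

valid_local_codepoints?({ de: { "ü" => "ue" } })  # => true
valid_local_codepoints?({ "de" => { "ü" => 1 } }) # => false (value not a String)
valid_local_codepoints?({ de: ["ü", "ue"] })      # => false (not a Hash)
```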

Generated with the Darkfish Rdoc Generator 1.1.6.