diff options
Diffstat (limited to 'docs/docbook/projdoc/unicode.xml')
-rw-r--r-- | docs/docbook/projdoc/unicode.xml | 177 |
1 files changed, 0 insertions, 177 deletions
diff --git a/docs/docbook/projdoc/unicode.xml b/docs/docbook/projdoc/unicode.xml deleted file mode 100644 index 699f29f1bae..00000000000 --- a/docs/docbook/projdoc/unicode.xml +++ /dev/null @@ -1,177 +0,0 @@ -<chapter id="unicode"> -<chapterinfo> - &author.jelmer; - &author.jht; - <author> - <firstname>TAKAHASHI</firstname><surname>Motonobu</surname> - <affiliation> - <address><email>monyo@home.monyo.com</email></address> - </affiliation> - </author> - <pubdate>25 March 2003</pubdate> -</chapterinfo> - -<title>Unicode/Charsets</title> - -<sect1> -<title>Features and Benefits</title> - -<para> -Every industry eventually matures. One of the great areas of maturation is in -the focus that has been given over the past decade to make it possible for anyone -anywhere to use a computer. It has not always been that way, in fact, not so long -ago it was common for software to be written for exclusive use in the country of -origin. -</para> - -<para> -Of all the effort that has been brought to bear on providing native language support -for all computer users, the efforts of the <ulink url="http://www.openi18n.org/">Openi18n organization</ulink> is deserving of -special mention. -</para> - -<para> -Samba-2.x supported a single locale through a mechanism called -<emphasis>codepages</emphasis>. Samba-3 is destined to become a truly trans-global -file and printer-sharing platform. -</para> - -</sect1> - -<sect1> -<title>What Are Charsets and Unicode?</title> - -<para> -Computers communicate in numbers. In texts, each number will be -translated to a corresponding letter. The meaning that will be assigned -to a certain number depends on the <emphasis>character set (charset) -</emphasis> that is used. -</para> - -<para> -A charset can be seen as a table that is used to translate numbers to -letters. Not all computers use the same charset (there are charsets -with German umlauts, Japanese characters, and so on). Usually a charset contains -256 characters, which means that storing a character with it takes -exactly one byte. </para> - -<para> -There are also charsets that support even more characters, -but those need twice as much storage space (or more). These -charsets can contain <command>256 * 256 = 65536</command> characters, which -is more than all possible characters one could think of. They are called -multibyte charsets because they use more then one byte to -store one character. -</para> - -<para> -A standardized multibyte charset is <ulink url="http://www.unicode.org/">unicode</ulink>. -A big advantage of using a multibyte charset is that you only need one; there -is no need to make sure two computers use the same charset when they are -communicating. -</para> - -<para>Old Windows clients use single-byte charsets, named -<parameter>codepages</parameter>, by Microsoft. However, there is no support for -negotiating the charset to be used in the SMB/CIFS protocol. Thus, you -have to make sure you are using the same charset when talking to an older client. -Newer clients (Windows NT, 200x, XP) talk unicode over the wire. -</para> -</sect1> - -<sect1> -<title>Samba and Charsets</title> - -<para> -As of Samba-3.0, Samba can (and will) talk unicode over the wire. Internally, -Samba knows of three kinds of character sets: -</para> - -<variablelist> - <varlistentry> - <term><smbconfoption><name>unix charset</name></smbconfoption></term> - <listitem><para> - This is the charset used internally by your operating system. - The default is <constant>UTF-8</constant>, which is fine for most - systems, which covers all characters in all languages. The default in previous Samba releases was <constant>ASCII</constant>. - </para></listitem> - </varlistentry> - - <varlistentry> - <term><smbconfoption><name>display charset</name></smbconfoption></term> - <listitem><para>This is the charset Samba will use to print messages - on your screen. It should generally be the same as the <parameter>unix charset</parameter>. - </para></listitem> - </varlistentry> - - <varlistentry> - <term><smbconfoption><name>dos charset</name></smbconfoption></term> - <listitem><para>This is the charset Samba uses when communicating with - DOS and Windows 9x/Me clients. It will talk unicode to all newer clients. - The default depends on the charsets you have installed on your system. - Run <command>testparm -v | grep "dos charset"</command> to see - what the default is on your system. - </para></listitem> - </varlistentry> -</variablelist> - -</sect1> - -<sect1> -<title>Conversion from Old Names</title> - -<para>Because previous Samba versions did not do any charset conversion, -characters in filenames are usually not correct in the UNIX charset but only -for the local charset used by the DOS/Windows clients.</para> - -</sect1> - -<sect1> -<title>Japanese Charsets</title> - -<para>Samba does not work correctly with Japanese charsets yet. Here are -points of attention when setting it up:</para> - -<itemizedlist> - - <listitem><para>You should set <smbconfoption><name>mangling method</name><value>hash</value></smbconfoption></para></listitem> - - <listitem><para>There are various iconv() implementations around and not - all of them work equally well. glibc2's iconv() has a critical problem - in CP932. libiconv-1.8 works with CP932 but still has some problems and - does not work with EUC-JP.</para></listitem> - - <listitem><para>You should set <smbconfoption><name>dos charset</name><value>CP932</value></smbconfoption>, not - Shift_JIS, SJIS.</para></listitem> - - <listitem><para>Currently only <smbconfoption><name>UNIX charset</name><value>CP932</value></smbconfoption> - will work (but still has some problems...) because of iconv() issues. - <smbconfoption><name>UNIX charset</name><value>EUC-JP</value></smbconfoption> does not work well because of - iconv() issues.</para></listitem> - - <listitem><para>Currently Samba-3.0 does not support <smbconfoption><name>UNIX charset</name><value>UTF8-MAC/CAP/HEX/JIS*</value></smbconfoption>.</para></listitem> - -</itemizedlist> - -<para>More information (in Japanese) is available at: <ulink noescape="1" url="http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html">http://www.atmarkit.co.jp/flinux/special/samba3/samba3a.html</ulink>.</para> - -</sect1> - -<sect1> - <title>Common Errors</title> - - <sect2> - <title>CP850.so Can't Be Found</title> - - <para><quote>Samba is complaining about a missing <filename>CP850.so</filename> file.</quote></para> - - <para><emphasis>Answer:</emphasis> CP850 is the default <smbconfoption><name>dos charset</name></smbconfoption>. - The <smbconfoption><name>dos charset</name></smbconfoption> is used to convert data to the codepage used by your dos clients. - If you do not have any dos clients, you can safely ignore this message. </para> - - <para>CP850 should be supported by your local iconv implementation. Make sure you have all the required packages installed. - If you compiled Samba from source, make sure to configure found iconv.</para> - </sect2> -</sect1> - -</chapter> |