Unicode character table

 

Interactive Unicode character table & reference

Unicode block:

Display characters in this font:
   Bold Italic

 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
<-- Click a character The character you clicked:
 
 
 
 

Click on the entity name or number above to copy it to the clipboard. Select the entity name or number above so you can copy it to the clipboard.
(Firefox doesn't allow you to copy text to the clipboard automatically from a button, so you must do it manually.)

Recent characters I've viewed:
Recent characters I've used:
People who used  also used:
Most popular characters used:
Most recent characters used:
The data in these tables were compiled from the NamesList.txt file provided by Unicode, Inc.
Copyright © 1991-2007 Unicode, Inc. All rights reserved. Distributed under the Terms of Use in http://www.unicode.org/copyright.html.

(Apr '08) This Unicode character table provides an interactive view of Unicode's NamesList.txt file, an annotated listing of every printable character in every block that Unicode defines (55,633 characters in all). Every character is displayed as a button, including the characters referenced in other characters' annotations, so you can click on those as well. If a referenced character is in a different block, clicking it loads that block.

This was an outgrowth of my original CharTable project.

Unicode provides several files that are considered normative, machine-readable lists of the block names, sections, & characters. NamesList.txt isn't supposed to be treated as one of them. But it's the only file I've found that includes the extensive annotations for cross-references, similar characters, alternate names, and alternate ways of constructing some characters by combining others. (If there is a normative file containing all the annotations that are in NamesList.txt, I'd like to know where it is.)

This project has three main components: one parses the NamesList.txt file, one serves up the initial webpage, and one serves up new blocks of characters & statistics.

Parsing NamesList.txt is a one-time operation. It populates several tables with the names, codes, & annotations for the Unicode blocks, the sections within the blocks, and the characters themselves.
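A minimal sketch of that one-time parse, in Python, assuming a simplified version of the NamesList.txt layout: block headers start with "@@", section headers with "@", character lines with a hex code, and annotation lines with a tab followed by a marker character. The Block, Section, and Char classes and parse_names_list are hypothetical stand-ins for the DB tables; the real parser and schema aren't shown here, and rarer line types are simply skipped.

```python
from dataclasses import dataclass, field


@dataclass
class Char:
    code: int
    name: str
    # (marker, text) pairs, e.g. ("=", "an alias") or ("x", "a cross-reference")
    annotations: list = field(default_factory=list)


@dataclass
class Section:
    title: str
    chars: list = field(default_factory=list)


@dataclass
class Block:
    start: int
    end: int
    name: str
    sections: list = field(default_factory=list)


def parse_names_list(lines):
    """Group NamesList.txt-style lines into blocks -> sections -> characters.

    Assumed (simplified) line types:
      @@<TAB>start<TAB>block name<TAB>end   block header
      @<TAB><TAB>section title             section header
      HHHH<TAB>CHARACTER NAME               character
      <TAB>m text                           annotation on the preceding character
    "@@@" title lines, "@+" notices and ";" comments are skipped.
    """
    blocks, section, char = [], None, None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line or line.startswith((";", "@@@")):
            continue
        if line.startswith("@@\t"):
            _, start, name, end = line.split("\t")[:4]
            blocks.append(Block(int(start, 16), int(end, 16), name))
            section, char = None, None
        elif line.startswith("@+"):
            char = None  # notice text (and its indented lines) is ignored
        elif line.startswith("@"):
            section = Section(line.lstrip("@").strip())
            blocks[-1].sections.append(section)
            char = None
        elif line.startswith("\t"):
            body = line.strip()
            if char is not None and body:
                # the first character is the annotation marker (= * x : # ~ %)
                char.annotations.append((body[0], body[1:].strip()))
        else:
            code, name = line.split("\t", 1)
            if section is None:  # characters that precede any "@" header
                section = Section("")
                blocks[-1].sections.append(section)
            char = Char(int(code, 16), name)
            section.chars.append(char)
    return blocks


# Example: a few lines in the assumed layout
sample = [
    "@@\t0000\tC0 Controls and Basic Latin\t007F",
    "@\t\tUppercase Latin alphabet",
    "0041\tLATIN CAPITAL LETTER A",
    "\tx (latin small letter a - 0061)",
    "0042\tLATIN CAPITAL LETTER B",
]
blocks = parse_names_list(sample)
```

In this sketch each Block would map to a row in a blocks table, each Section to a row in a sections table, and each Char plus its annotations to rows in the character tables.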

index.php reads in a templated HTML file. Its main job is to insert the user's current font preferences, and it also fills in the listbox with the Unicode block names stored in the DB.
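Roughly what that step amounts to, sketched in Python rather than PHP. The template text, its $font and $options placeholders, and render_index are all hypothetical, since the real template file isn't shown; the point is only that the page is a fixed template with the font preference and the block list substituted in.

```python
import html
from string import Template

# Hypothetical page template: the real template file and its placeholder
# names aren't shown in the write-up.
PAGE = Template(
    '<body style="font-family: $font">\n'
    '<select id="block">\n$options\n</select>\n'
    "</body>"
)


def render_index(font, block_names, selected):
    """Insert the user's font and build the block listbox from the DB names."""
    options = "\n".join(
        "<option{sel}>{name}</option>".format(
            sel=" selected" if name == selected else "", name=html.escape(name)
        )
        for name in block_names
    )
    # Escape the font name too, since it ends up inside an HTML attribute.
    return PAGE.substitute(font=html.escape(font), options=options)
```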

When the page loads, it makes an AHAH call to the server to fill in the initial Unicode block. AHAH is like AJAX, except that the server returns a block of fully formed HTML instead of XML. I do this for two main reasons:

  • The data for each requested block is static: it's the same no matter how many times the block gets chosen.
  • I want to format each block's output according to the logic of the characters, instead of simply showing them in regular rows in code order. For example, in the "C0 Controls and Basic Latin" block (the standard ASCII characters 00-7F), A-Z and a-z should show up as two rows of 26 characters (as they do in the original character table project), even though the two groups aren't adjacent in the code space. Likewise, the punctuation characters are scattered around the letters & digits, even though they belong together. My plan is to gradually hand-format the most popular blocks as I did with the original character table, so I have to serve pre-formatted HTML to the client.
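To illustrate the kind of hand-formatting described in the second point, here is a hypothetical layout for the Basic Latin block, sketched in Python: four logical rows (A-Z, a-z, 0-9, punctuation) instead of rows in code order. BASIC_LATIN_ROWS, render_rows, and the button markup are assumptions, not the site's actual HTML.

```python
import html

# Hypothetical hand-made layout for "C0 Controls and Basic Latin": each row is
# a group of related characters rather than a run of consecutive codes.
BASIC_LATIN_ROWS = [
    [chr(c) for c in range(ord("A"), ord("Z") + 1)],  # 0x41-0x5A
    [chr(c) for c in range(ord("a"), ord("z") + 1)],  # 0x61-0x7A
    [chr(c) for c in range(ord("0"), ord("9") + 1)],  # 0x30-0x39
    # punctuation & symbols gathered from the gaps all over 0x21-0x7E
    [chr(c) for c in range(0x21, 0x7F) if not chr(c).isalnum()],
]


def render_rows(rows):
    """Render each logical row as one line of character buttons."""
    lines = []
    for row in rows:
        buttons = "".join(
            '<button value="{:04X}">{}</button>'.format(ord(ch), html.escape(ch))
            for ch in row
        )
        lines.append('<div class="row">{}</div>'.format(buttons))
    return "\n".join(lines)


block_html = render_rows(BASIC_LATIN_ROWS)
```

Because the layout is data rather than code order, the generated HTML is the same on every request, which is what makes it reasonable to send pre-formatted markup.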

In practice I accomplish AHAH by sending back XML, with the HTML enclosed in a CDATA section. This way I get the best of both worlds: the same callback function can process the statistical information as XML and receive the pre-formatted HTML block. Works like a champ.
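A sketch of that envelope in Python, assuming an element layout of my own (response, stats, html); the real element names aren't given. One detail any CDATA approach has to handle is that the sequence ]]> cannot appear inside a CDATA section, so cdata() splits it across two sections. handle_response is a stand-in for the browser-side callback: a single XML parse yields both the statistics and the HTML.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import quoteattr


def cdata(text):
    """Wrap text in CDATA; a literal "]]>" is split across two sections."""
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


def build_response(block_name, block_html, stats):
    """Server side: stats as ordinary XML, the pre-formatted block as CDATA.

    The element and attribute names are illustrative, not the site's format.
    """
    items = "".join(
        '<char code="{:04X}" hits="{:d}"/>'.format(code, hits) for code, hits in stats
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>'
        "<response block={}><stats>{}</stats><html>{}</html></response>"
    ).format(quoteattr(block_name), items, cdata(block_html))


def handle_response(xml_text):
    """Client-side stand-in for the single callback: one parse yields both."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    stats = [(int(c.get("code"), 16), int(c.get("hits"))) for c in root.find("stats")]
    # The parser hands back the CDATA content as the element's plain text.
    return root.get("block"), root.find("html").text or "", stats
```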

"People who chose this character also chose..."
Like the original character table, this project uses a brute-force algorithm to tally this "click similarity" statistic. When a user copies a character to the clipboard, I search their history for the last 10 characters they've copied that are different from this one. For each of those previous characters, I increment the hit counter in the corresponding record of the "chars_x_chars" table; a record means that the current character is related to that previous character. Think of a 55,336×55,336 matrix: the rows represent each possible "this" char and the columns each possible "previous" char. Each cell holds the hit count for one combination of two characters, and a cell only gets filled in once at least one person has copied that combination.
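The tally can be sketched in Python with nested dictionaries standing in for the chars_x_chars table. This assumes that "the last 10 characters" means the 10 most recent distinct characters other than the one being copied; previous_chars and record_copy are hypothetical names, and a real version would update DB records instead of an in-memory dict.

```python
from collections import defaultdict

WINDOW = 10  # "the last 10 characters they've copied"


def previous_chars(history, this_char, window=WINDOW):
    """Most recent distinct earlier copies, skipping this_char itself.

    Assumption: a character copied several times in the window counts once.
    """
    found = []
    for ch in reversed(history):
        if ch != this_char and ch not in found:
            found.append(ch)
            if len(found) == window:
                break
    return found


def record_copy(hits, history, this_char):
    """Bump hits[this][previous] for each recent copy, then log this one."""
    for prev in previous_chars(history, this_char):
        hits[this_char][prev] += 1
    history.append(this_char)


# hits[this][previous] -> count; nested dicts act as a sparse matrix in which
# only pairs that have actually been copied get a cell.
hits = defaultdict(lambda: defaultdict(int))
history = []
for ch in "ABAC":
    record_copy(hits, history, ch)
```

After copying A, B, A, C in that order, hits["C"] has one hit each for "A" and "B", while the second "A" is tallied only against "B", because a character is never paired with itself.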

This implies that the algorithm uses memory at a rate of O(n²), where n is the total number of characters (or SKUs, if this were an e-commerce site). For this project, that means a 3-billion-element matrix of hit counters! However, the real-world memory requirements should be much lower, for two reasons:

  1. In the original character table I saved two records for each combination of characters, one for "this char is related to the previous char" and one for "the previous char is related to this char". But that's redundant: the matrix of possible combinations is symmetric about the diagonal. So instead I save one record, setting the "this" char to whichever of the two has the smaller character code. In effect I'm only saving one triangular half of the matrix. When I call up the click similarity for a character, I have to make a slightly more complex query than before:

    -- each pair is stored only once, so match this char in either column
    SELECT * FROM chars_x_chars
        WHERE cxcThisChar='$nThisChar' OR cxcOtherChar='$nThisChar'
        ORDER BY cxcNumHits DESC
        LIMIT 10

    This returns the records for the 10 characters most click-similar to $nThisChar, whether their codes are higher or lower than $nThisChar. So the theoretical maximum table size is now about 1.5 billion records.

  2. Since we're storing this information in a database, we only save the character combinations that have actually occurred at least once; in effect, the matrix is stored as a sparse array. As long as most of the possible combinations never get chosen, the table size will depend on the total number of choices people make rather than on the universe of all possible character combinations. But as more people use the character table, more & more combinations will occur at least once, and if the page becomes very popular the chars_x_chars table might conceivably approach a billion records, or at least hundreds of millions. I'm very curious to see what the real-world performance will be.
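As a quick check of the sizes quoted above, take n = 55,336 from the matrix description. The full matrix has n² cells; because a character is never tallied against itself, the symmetric scheme needs at most one record per unordered pair of distinct characters:

```latex
\begin{aligned}
n^2 &= 55{,}336^2 = 3{,}062{,}072{,}896 \approx 3.1 \times 10^9 \\
\binom{n}{2} = \frac{n(n-1)}{2} &= \frac{55{,}336 \times 55{,}335}{2} = 1{,}531{,}008{,}780 \approx 1.5 \times 10^9
\end{aligned}
```

Both agree with the 3 billion and 1.5 billion figures; the sparse storage in the second point only approaches the second bound as more and more pairs actually get used.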

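Both space savings can be sketched with SQLite from Python's standard library (SQLite is used only so the example is self-contained; the site's own database engine may differ). The table and column names come from the query above; the column types, the primary key, add_hit, and most_similar are assumptions. Each pair is stored once as (smaller code, larger code), a row exists only after the pair first occurs, and the lookup matches the character in either column, as the query above does, using CASE to return the other character of each pair.

```python
import sqlite3

# Table and column names come from the query above; the column types, primary
# key and the use of SQLite are assumptions.
SCHEMA = """
CREATE TABLE chars_x_chars (
    cxcThisChar  INTEGER NOT NULL,          -- always the smaller code point
    cxcOtherChar INTEGER NOT NULL,          -- always the larger code point
    cxcNumHits   INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (cxcThisChar, cxcOtherChar)
)
"""


def add_hit(db, a, b):
    """Count one co-copy of code points a and b, stored once as (min, max)."""
    lo, hi = min(a, b), max(a, b)
    # A row is created only the first time a pair occurs (sparse storage).
    db.execute(
        "INSERT OR IGNORE INTO chars_x_chars (cxcThisChar, cxcOtherChar) VALUES (?, ?)",
        (lo, hi),
    )
    db.execute(
        "UPDATE chars_x_chars SET cxcNumHits = cxcNumHits + 1 "
        "WHERE cxcThisChar = ? AND cxcOtherChar = ?",
        (lo, hi),
    )


def most_similar(db, ch, limit=10):
    """(other code point, hits) pairs for ch, matching it in either column."""
    return db.execute(
        "SELECT CASE WHEN cxcThisChar = ? THEN cxcOtherChar ELSE cxcThisChar END, "
        "cxcNumHits FROM chars_x_chars "
        "WHERE cxcThisChar = ? OR cxcOtherChar = ? "
        "ORDER BY cxcNumHits DESC LIMIT ?",
        (ch, ch, ch, limit),
    ).fetchall()


db = sqlite3.connect(":memory:")
db.execute(SCHEMA)
for a, b in [(0x61, 0x41), (0x41, 0x61), (0x41, 0x42), (0x43, 0x41), (0x41, 0x43), (0x43, 0x41)]:
    add_hit(db, a, b)
```

Here most_similar(db, 0x41) returns [(0x43, 3), (0x61, 2), (0x42, 1)], and most_similar(db, 0x61) returns [(0x41, 2)]: three rows serve both directions of every pair.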
Combobox
As in the earlier character table project, I use my combobox class to let the user choose from a predefined list of popular web fonts while also being able to enter any font they wish to see.
