Archive for the ‘Chinese’ Category

Shanghai Library adds 2 million records to WorldCat…

Tuesday, September 16th, 2014

Shanghai Library adds 2 million records to WorldCat to share its collection with the world.

From the post:

Shanghai Library, the largest public library in China and one of the largest libraries in the world, has contributed 2 million holdings to WorldCat, including some 770,000 unique bibliographic records, to share its collection worldwide.

These records, which represent books and journals published between 1911 and 2013, were loaded in WorldCat earlier this year. The contribution from Shanghai Library, an OCLC member since 1996, enhances the richness and depth of Chinese materials in WorldCat as well as the discoverability of these collections around the world.

“We are pleased to add Shanghai Library’s holdings to WorldCat, which is the global union catalog of library collections,” said Dr. Jianzhong Wu, Director, Shanghai Library. “Shanghai is a renowned, global city, and the library should be as well. With WorldCat, we not only raise the visibility of our collection to a global level but we also share our national heritage and identity with other libraries and their users through the OCLC WorldShare Interlibrary Loan service.”

“The leadership of Shanghai Library has a bold global vision,” says Andrew H. Wang, Vice President, OCLC Asia Pacific. “The addition of Shanghai Library’s holdings and unique records enriches coverage of the Chinese collection in WorldCat for researchers everywhere.”

I don’t have a feel for how many unique Chinese bibliographic records are online, but 770,000 sounds like a healthy addition.

You may also be interested in: Online Resources for Chinese Studies in North American Libraries, compiled by Ming POON, Josephine SCHE, and Mi Chu WIENS (November, 2004).

Given the compilation date, 2004, I ran the W3C Link Checker on http://www.loc.gov/rr/asian/china-bib/.

You can review the results at: http://www.durusau.net/publications/W3CLinkChecker:http:_www.loc.gov_rr_asian_china-bib_.html

Summary of results:

Code   Occurrences  What to do
(N/A)  6            The link was not checked due to robots exclusion rules. Check the link manually, and see also the link checker documentation on robots exclusion.
(N/A)  2            The hostname could not be resolved. Check the link for typos.
403    1            The link is forbidden! This needs fixing. Usual suspects: a missing index.html or Overview.html, or a missing ACL.
404    61           The link is broken. Double-check that you have not made any typo, or mistake in copy-pasting. If the link points to a resource that no longer exists, you may want to remove or fix the link.
500    5            This is a server side problem. Check the URI.

(emphasis added)

At a minimum, the broken links need to be corrected, but updating the listing to include new resources would make a nice graduate student project.
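For anyone who wants to re-check a list of links without the W3C service, here is a minimal sketch in Python using only the standard library. It is not how the W3C Link Checker works internally. The function names check_link and summarize are my own, and it assumes a HEAD request is enough to learn the status (some servers refuse HEAD). Unlike the W3C tool, it does not consult robots exclusion rules.

```python
from collections import Counter
import urllib.error
import urllib.request


def check_link(url, timeout=10.0):
    """Return the HTTP status code for url, or "(N/A)" if no response was obtained.

    check_link is a hypothetical helper, not part of any link checker tool.
    """
    try:
        # HEAD avoids downloading the body; assumes the server supports it.
        request = urllib.request.Request(url, method="HEAD")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx and 5xx responses arrive as exceptions carrying the status code.
        return err.code
    except (urllib.error.URLError, OSError, ValueError):
        # Unresolvable host, timeout, refused connection, or malformed URL.
        return "(N/A)"


def summarize(statuses):
    """Tally how often each status occurred, like the summary table above."""
    return Counter(statuses)
```

Calling summarize(check_link(u) for u in urls) on a list of URLs gives a count per status code, which is enough to see whether the 404s are shrinking as the list is repaired.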

I don’t have the background or language skills with Chinese resources to embark on such a project but would be happy to assist anyone who undertakes the task.

Python-ZPar – Python Wrapper for ZPAR

Monday, September 8th, 2014

Python-ZPar – Python Wrapper for ZPAR by Nitin Madnani.

From the webpage:

python-zpar is a python wrapper around the ZPar parser. ZPar was written by Yue Zhang while he was at Oxford University. According to its home page: ZPar is a statistical natural language parser, which performs syntactic analysis tasks including word segmentation, part-of-speech tagging and parsing. ZPar supports multiple languages and multiple grammar formalisms. ZPar has been most heavily developed for Chinese and English, while it provides generic support for other languages. ZPar is fast, processing above 50 sentences per second using the standard Penn Treebank (Wall Street Journal) data.

I wrote python-zpar since I needed a fast and efficient parser for my NLP work which is primarily done in Python and not C++. I wanted to be able to use this parser directly from Python without having to create a bunch of files and run them through subprocesses. python-zpar not only provides a simple Python wrapper but also provides an XML-RPC ZPar server to make batch-processing of large files easier.

python-zpar uses ctypes, a very cool foreign function library bundled with Python that allows calling functions in C DLLs or shared libraries directly.
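If ctypes is new to you, a minimal sketch of the general technique may help. This is not python-zpar's code and says nothing about its API. It simply calls strlen from the C standard library, and it assumes a POSIX-like system where that library can be located. The helper name c_strlen is my own.

```python
import ctypes
import ctypes.util

# Locate and load the C standard library. Assumption: a POSIX-like system.
# If find_library returns None, CDLL(None) exposes the symbols already
# loaded into the current process, which include strlen.
libc = ctypes.CDLL(ctypes.util.find_library("c"))

# Declare the C signature, size_t strlen(const char *), so ctypes converts
# arguments and the return value correctly.
libc.strlen.argtypes = [ctypes.c_char_p]
libc.strlen.restype = ctypes.c_size_t


def c_strlen(text):
    """Return the length in bytes of text once encoded as UTF-8 (hypothetical helper)."""
    return libc.strlen(text.encode("utf-8"))


print(c_strlen("ZPar"))  # 4
print(c_strlen("中文"))   # 6, since each of these characters takes three bytes in UTF-8
```

A wrapper around a parser follows the same pattern on a larger scale: load the shared library, declare the argument and return types of each exported function, then call those functions as if they were Python.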

Just in case you are looking for a language parser for Chinese or English.

It is only a matter of time before commercial opportunities force greater attention to non-English languages. Forewarned is forearmed.

Biscriptal juxtaposition in Chinese

Tuesday, August 26th, 2014

Biscriptal juxtaposition in Chinese by Victor Mair.

From the post:

We have often seen how the Roman alphabet is creeping into Chinese writing, both for expressing English words and morphemes that have been borrowed into Chinese, but also increasingly for writing Mandarin and other varieties of Chinese in Pinyin (spelling). Here are just a few earlier Language Log posts dealing with this phenomenon:

“A New Morpheme in Mandarin” (4/26/11)

“Zhao C: a Man Who Lost His Name” (2/27/09)

“Creeping Romanization in Chinese” (8/30/12)

Now an even more intricate application of alphabetic usage is developing in internet writing, namely, the juxtaposition and intertwining of simultaneous phrases with contrasting meaning.

Highly entertaining post on the complexities of evolving language usage.

The sort of usage that hasn’t made it into a dictionary, yet, but still needs to be captured and shared.

Sam Hunting brought this to my attention.

Challenges of Chinese Natural Language Processing

Sunday, March 11th, 2012

Thinkudo Labs is posting a series on Chinese natural language processing.

I will be gathering those posts here for ease of reference.

Challenges of Chinese Natural Language Processing – Segmentation

Challenges of Chinese Natural Language Processing – Homograph
(If you are betting this was the post that caught my attention, you are right in one.)

You will need the assistance of a native Chinese speaker for serious Chinese language processing, but understanding some of the issues ahead of time won’t hurt.