Just another site

Archive for December 2012

Thought this was cool: What are some good foundational computer science books?


Thanks for the invite. As it happens, this summer I found a rather good book at a graduating student's secondhand book stall: Computer Science Illuminated by Nell Dale and John Lewis.



Laying the Groundwork

  • Chapter 1 The Big Picture: the book's layered organization, and the history of computing hardware and software.

The Information Layer

  • Chapter 2 Binary Values and Number Systems: arithmetic in, and conversion between, binary, octal, decimal, and hexadecimal.
  • Chapter 3 Data Representation: analog versus digital data, and how numbers, text, audio, images, and video are represented.
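As a taste of Chapter 2's material, base conversion by repeated division can be sketched in a few lines of Python (my own illustration, not code from the book):

```python
def to_base(n, base):
    """Render a non-negative integer as a digit string in the given
    base (2..16) by repeated division, collecting remainders from
    least significant to most significant."""
    digits = "0123456789ABCDEF"
    if n == 0:
        return "0"
    out = []
    while n:
        out.append(digits[n % base])
        n //= base
    return "".join(reversed(out))

# 255 in binary, octal, and hex:
# to_base(255, 2) -> "11111111", to_base(255, 8) -> "377", to_base(255, 16) -> "FF"
```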

The Hardware Layer

  • Chapter 4 Gates and Circuits: logic gates, transistors, and the basic principles behind adders and memory circuits.
  • Chapter 5 Computing Components: the von Neumann architecture, the CPU instruction cycle, secondary storage, and non-von Neumann architectures.

The Programming Layer

  • Chapter 6 Problem Solving and Algorithm Design: designing simple algorithms, top-down design, testing, and object orientation.
  • Chapter 7 Low-Level Programming Languages: machine language and assembly.
  • Chapter 8 High-Level Programming Languages: compilers and interpreters, programming paradigms, functional programming, common program constructs (I/O, selection, loops, subroutines, recursion, and so on), and type systems.
  • Chapter 9 Abstract Data Types and Algorithms: arrays and linked lists, sorting, binary search, stacks and queues, and trees.
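And as a taste of Chapter 9, here is a minimal Python binary search over a sorted list (again my own sketch, not the book's code):

```python
def binary_search(sorted_list, target):
    """Return the index of target in sorted_list, or -1 if absent,
    by repeatedly halving the search interval (O(log n))."""
    lo, hi = 0, len(sorted_list) - 1
    while lo <= hi:
        mid = (lo + hi) // 2
        if sorted_list[mid] == target:
            return mid
        elif sorted_list[mid] < target:
            lo = mid + 1   # target can only be in the upper half
        else:
            hi = mid - 1   # target can only be in the lower half
    return -1
```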

The Operating System Layer

  • Chapter 10 Operating Systems: what an operating system does, memory management, process management, and CPU scheduling.
  • Chapter 11 File Systems and Directories: file operations, directory trees, and disk structure.

The Application Layer

  • Chapter 12 Information Systems: spreadsheets and database systems.
  • Chapter 13 Artificial Intelligence: thinking machines, knowledge representation, expert systems, neural networks, natural language processing, and robotics.
  • Chapter 14 Simulation and Other Applications: simulation systems, CAD, and embedded systems.

The Communication Layer

  • Chapter 15 Networks: network structures and topologies, protocols, and addressing.
  • Chapter 16 The World Wide Web: using the web (search engines, instant messaging, and so on), HTML, interactive pages, and XML.

In Conclusion

  • Chapter 17 Limitations of Computing

from Zhihu Daily Picks:

Written by cwyalpha

December 31, 2012 at 2:48 pm

Filed under Uncategorized

Thought this was cool: What is the difference between probability and statistics?


In Statistics (mathematical science): Ben Golub added a question.


from Ben Golub on Quora:

Written by cwyalpha

December 31, 2012 at 2:48 pm

Filed under Uncategorized

Thought this was cool: Simons Institute Big Data Program


Michael Jordan sends the below:

The new Simons Institute for the Theory of Computing
will begin organizing semester-long programs starting in 2013.

One of our first programs, set for Fall 2013, will be on the “Theoretical Foundations
of Big Data Analysis”. The organizers of this program are Michael Jordan (chair),
Stephen Boyd, Peter Buehlmann, Ravi Kannan, Michael Mahoney, and Muthu Muthukrishnan.

See for more information on
the program.

The Simons Institute has created a number of “Research Fellowships” for young
researchers (within at most six years of the award of their PhD) who wish to
participate in Institute programs, including the Big Data program. Individuals
who already hold postdoctoral positions or who are junior faculty are welcome
to apply, as are finishing PhDs.

Please note that the application deadline is January 15, 2013. Further details
are available at .

Mike Jordan

from Machine Learning (Theory):

Written by cwyalpha

December 31, 2012 at 2:17 pm

Filed under Uncategorized

Thought this was cool: What is the funniest research paper you have ever read?


“A Mathematical Model for the Determination of Total Area
Under Glucose Tolerance and Other Metabolic Curves”…

In 1993, a nutrition scientist at NYU claimed to have invented a novel and highly accurate method for determining the area under metabolic curves. She named the method after herself and dedicated it to her parents.

Tai’s model was developed to correct the deficiency of under- or overestimation of the total area under a metabolic curve. The formula also allows calculating the area under a curve with unequal units on the X-axis. The strategy of this mathematical model is to divide the total area under a curve into individual small segments such as squares, rectangles, and triangles, whose areas can be precisely determined according to existing geometric formulas. The areas of the individual segments are then added to obtain the total area under the curve.

The paper was published in the peer-reviewed journal Diabetes Care, which is run by the American Diabetes Association and has a very respectable impact factor of 8.087. The paper currently has 173 citations, mostly from the diabetes literature.

Most people with high-school math will recognize that Tai is describing the trapezoidal rule, a classic method attributed to Newton, i.e. circa the 1600s.
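For reference, the method the paper describes, summing trapezoid segments between sample points, takes only a few lines of code; here is a minimal Python version (my own sketch) that handles the unequal x-axis spacing Tai emphasizes:

```python
def area_under_curve(xs, ys):
    """Total area under a sampled curve: sum the trapezoid segments
    between consecutive points. Works even when the x values are
    unequally spaced, since each segment uses its own width."""
    return sum((xs[i + 1] - xs[i]) * (ys[i] + ys[i + 1]) / 2
               for i in range(len(xs) - 1))
```

For the straight line y = 2x sampled at unequal intervals on [0, 1], the rule is exact: the computed area is 1.0.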

A year later, in response to what must have been significant amounts of backlash, Tai wrote an emotional letter to the journal defending herself:

While a doctoral candidate working on my dissertation at Columbia University in 1981, I needed to calculate total area under a curve. During a session with my statistical advisor, and after examining several alternative methods, I worked out the model in front of him. The concept behind it is obviously common sense, and one does not have to consult the trapezoid rule to figure it out. The trapezoid rule is really not Nobel Prize material, such as the double helix or jumping genes. I also used the formulas to calculate the areas of a square or a triangle without knowing whose rules were being followed. Fortunately, I do not have to answer that for you.

I never thought of publishing the model as a great discovery or accomplishment; it was not published until 14 years later, in 1994. Because of its accuracy and easy application, many colleagues at the Obesity Research Center of St Luke’s-Roosevelt Hospital Center and Columbia University began using it and addressed it as “Tai’s formula” to distinguish it from others. Later, because the investigators were unable to cite an unpublished work, I submitted it for publication at their requests. Therefore, my name was rubber-stamped on the model before its publication.

My intention in publishing the model is therefore to share, rather than to gain honor or glory with its publication, because there is none. Many other investigators probably thought about the same thing, but maybe they did not bother to follow up or produce a model (or the same model). You indicated that I probably did work this out on my own and I am grateful for your “probability,” because I did indeed do so with a witness present. Maybe I can address the model as my creation based on fact rather than your doubtful “probability.” Besides, if I do not address the model as “Tai’s,” other investigators who wish to cite it will.…

from Quora:

Written by cwyalpha

December 31, 2012 at 1:56 pm

Filed under Uncategorized

Thought this was cool: What can we learn and borrow from Jewish approaches to education?



There is probably a lot more, but I'm tired of typing… I'm not very good at summarizing, sorry… >_< Put plainly: interest matters most. All education should place more weight on quality, and the most important thing is to make the listener find the subject genuinely interesting, so that they go on to explore questions in that area on their own.

from Zhihu Daily Picks:

Written by cwyalpha

December 31, 2012 at 12:24 pm

Filed under Uncategorized

Thought this was cool: Understanding Cardinality Estimation Algorithms (Part 2: Linear Counting)



In this article we discuss the Linear Counting algorithm.


Linear Counting (LC below) was introduced in a 1990 paper, “A linear-time probabilistic counting algorithm for database applications”. As an early cardinality estimation algorithm, LC is not particularly good in terms of space complexity: it is in fact the same as the simple bitmap approach from the previous article, O(N_{max}), albeit with a constant-factor reduction, so LC is now rarely used on its own. Still, as the foundation of algorithms such as Adaptive Counting, LC is well worth studying.
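As a concrete sketch of the idea (my own illustration; the bitmap size and hash function here are arbitrary choices): hash each element into one of m bitmap positions, then estimate the cardinality from the fraction V_n of positions left empty via n̂ = -m·ln(V_n):

```python
import hashlib
import math

def linear_count(items, m=4096):
    """Linear Counting sketch: hash each element into one of m
    bitmap positions, then estimate cardinality as
    n_hat = -m * ln(V_n), where V_n is the fraction of
    positions that remain empty."""
    bitmap = [False] * m
    for item in items:
        h = int(hashlib.md5(str(item).encode()).hexdigest(), 16)
        bitmap[h % m] = True
    empty = bitmap.count(False)
    if empty == 0:
        return float(m)  # bitmap saturated: m was chosen too small
    return -m * math.log(empty / m)
```

Because identical elements always hash to the same position, duplicates leave the bitmap, and hence the estimate, unchanged.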

The key result of the analysis is that the number of empty buckets, appropriately normalized, asymptotically follows a normal distribution:

f(x) = {1 \over \sigma\sqrt{2\pi} }\,e^{-{(x-\mu)^2 \over 2\sigma^2}}

The derivation of these results can be found in “A linear-time probabilistic counting algorithm for database applications”.
To control the standard error of the estimate within \epsilon, where t = n/m is the load factor, the bitmap size m must satisfy:

m > \frac{e^t-t-1}{(\epsilon t)^2}
Separately, to keep the probability that the bitmap fills up completely (leaving no empty buckets, so the estimate is undefined) negligibly small, the paper requires:

m > 5(e^t-t-1)

Combining the two constraints:

m > \beta (e^t-t-1)

where \beta = \max(5, 1/(\epsilon t)^2).
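As a quick numeric helper (my own, not from the article), the right-hand side of the combined bound above can be evaluated for a planned load factor t = n/m and target error \epsilon; a chosen m is adequate when it exceeds the returned value:

```python
import math

def lc_bitmap_bound(t, epsilon):
    """Right-hand side of m > beta * (e^t - t - 1), with
    beta = max(5, 1/(epsilon * t)^2). A bitmap of m buckets at
    load factor t = n/m is adequate when m exceeds this value."""
    beta = max(5.0, 1.0 / (epsilon * t) ** 2)
    return beta * (math.exp(t) - t - 1)
```

For example, at t = 1 and \epsilon = 0.01 the error term dominates (beta = 10000), so the required bitmap is on the order of ten thousand buckets.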
This article introduced Linear Counting. Although LC is rarely used on its own because of its unimpressive space complexity, it performs very well when the number of elements is small, so it is often used to compensate for LogLog Counting's large error at low cardinalities; indeed, LC and its underlying idea form part of both HyperLogLog Counting and Adaptive Counting.

In the next article I will introduce LogLog Counting, a cardinality estimation algorithm with a space complexity of only O(log(log(N_{max}))).





from IT牛人博客聚合网站:

Written by cwyalpha

December 31, 2012 at 12:24 pm

Filed under Uncategorized

Thought this was cool: Simon: Open-Source Speech Recognition: Simon 0.4.0




After years of hard work, the Simon team is proud to announce the new major release: Simon 0.4.0.

New in Simon 0.4

This new version of the open source speech recognition system Simon features a whole new recognition layer, context-awareness for improved accuracy and performance, a dialog system able to hold whole conversations with the user and more.

Revisiting Usability

A lot of work has gone into making Simon easier to use – both for existing and new users.

Perhaps most visibly, the main window of Simon has been reorganized to bring the most important options together in one screen.

Moreover, the newly introduced Simon base model format (.sbm) and the integration of a GHNS online repository of base models have removed the last big hurdle of the initial configuration.
One can now easily go from a fresh installation to a working setup in less than 5 minutes without any preparation. Don’t believe me? Check out the quick start below!

Simon 0.4.0: Quick Start

Many other, smaller changes sum up to one simple but important difference: Simon will overall require less user interaction while achieving more.


One of the major internal changes of Simon 0.4 is of course the included support for the BSD licensed CMU SPHINX. While we still also maintain full support for HTK and Julius, new models compiled with Simon will default to the SPHINX backend and the (proprietary) HTK is no longer required to build user-generated models.
Best of all: Simon will select the correct backend for your configuration transparently and automatically.


A major problem of open source speech recognition has always been the lack of freely available high quality speech models.

The Voxforge project has been working for years towards GPL acoustic models for a variety of languages. While their models are certainly not yet perfect, they offer a promising starting point.
The English Voxforge model is of course available as a Simon base model and can be downloaded and imported with Simon.

Additionally, starting with Simon 0.4, users will also have the option to contribute their gathered Simon training samples directly to the Voxforge server.
These recordings will then be used to train and improve the general acoustic models.


There is a simple rule of thumb in speech recognition: The smaller the application domain, the better the recognition accuracy. This was always one of the core principles of Simon.
In Simon 0.4, however, we went one step further: Simon can now re-configure itself on the fly as the current situation changes. Through so-called “context conditions”, Simon 0.4 can automatically activate and deactivate selected scenarios, microphones and even parts of your training corpus.

For example: Why listen for “Close tab” when your browser isn’t even open? Or why listen for anything at all when you’re actually in the next room listening to music? Yes, Simon is watching you.

Dialog System

Simon 0.4.0 also ships with the new dialog system featuring scripted variables (JavaScript), integration with Plasma data engines, a templating system and – of course – text-to-speech output.


For users of KDE’s plasma workspace, we now provide the “Simonoid” plasmoid to start and monitor Simon – including the current recording volume.

The screenshot above shows two instances of the plasmoid: One added to the panel and another one to the desktop.

… and everything else

Please don’t be fooled into thinking that the above is a complete list of all improvements. For example, we also have a new sample review tool called Afaras, integration with the Sequitur grapheme-to-phoneme framework, an Akonadi command plugin and many, many other noteworthy changes.
You’ll have to try out Simon to see for yourself!


To install Simon 0.4.0, you can either compile the official source tarball, install a binary package provided by your Linux distribution or use the installer for Windows.

If you are a packager and would like to package Simon 0.4, please do get in touch with us. Thank you.

from Hacker News 50:

Written by cwyalpha

December 31, 2012 at 11:23 am

Filed under Uncategorized