CWYAlpha

Just another WordPress.com site

Archive for August 2011

Thought this was cool: Onavo Lite, a mobile 3G data-plan money-saving tool that monitors and blocks data traffic



[Screenshots: Onavo Lite 01, 02]

Whether you're on an "unlimited" plan that is getting less and less unlimited, on a capped mobile data plan, or stuck with capped roaming while traveling abroad, most of us care about how much data our phone (or other mobile device) actually burns through when we go online.

Onavo is a mobile app that monitors data usage. The iPhone version drew a lot of attention when it launched, because it claims to compress your traffic and genuinely save you money. Today I noticed that Onavo Lite for Android has been released. It does not yet include packet compression, but it already lets Android users monitor their data usage and block traffic in real time.

That's right: besides tracking and reporting your data usage, the more important feature of this Android version of Onavo Lite is that you can set a data cap and have 3G access blocked automatically when you are about to exceed it, or block 3G transfers for an individual app, which is how it saves you money.

 

  • Set your monthly mobile data plan and data cap

Below I'll walk through the latest Android version. Onavo Lite is very easy to use: after installing the app, the first thing to do is set up your own 3G data plan (this step is required).

In Settings, "Set data plan" lets you configure your base plan: the maximum monthly data allowance, the plan's cost, and so on.

[Screenshots: Onavo Lite 03, 04]

 

  • Set your roaming plan and roaming data cap

In "Set roaming plan" you can configure roaming rates. Onavo Lite automatically detects whether you are roaming and switches to your roaming counters, which is quite clever.

You can ignore the roaming settings for now; Onavo Lite only needs the base plan from the previous step before you can start using it.

[Screenshots: Onavo Lite 05, 06]

 

  • Automatically block 3G access when you exceed your data cap

The settings screen also offers a number of notification options, including warnings as you approach your monthly cap, and an option to automatically cut off 3G data once you reach 99% of your allowance, so you never pay overage fees.

You can also enable warnings about data-hungry apps: "App installation notifications" tells you at install time if an app is known to abuse data, while "App data-abuse notifications" alerts you when an app suddenly uses an unusually large amount of data.

[Screenshots: Onavo Lite 07, 08]

 

  • Counts 3G traffic automatically and excludes Wi-Fi traffic

Although the settings screens offer many options, the defaults are fine for almost everything; the only thing you really have to configure is your own 3G data plan at the start.

Once that is done, Onavo Lite monitors your data usage in the background, sends alerts as needed, and automatically shuts off 3G mobile data when you reach your cap.

You can tap the Onavo Lite icon at any time to open the main screen and view usage reports. The screenshot below shows my 3G data usage after one day.

A nice touch is that Onavo Lite is smart enough to count only 3G traffic; when I switch to Wi-Fi, it automatically stops counting.

[Screenshots: Onavo Lite 09, 10]

 

  • Restrict a data-hungry app to Wi-Fi only

Onavo Lite offers several kinds of reports. For example, "App Watch" shows how much 3G data each app has used.

In the screenshot below, Pulse has used a surprisingly large amount of data (I actually suspect this is a bug). So how do I stop this single app from eating my data allowance?

Simple: open the app's entry and tap [Restrict to Wi-Fi], and that app will only be allowed to go online over Wi-Fi.

[Screenshots: Onavo Lite 11, 12]

 

  • Summary:

Since the Android version of Onavo Lite has only just been released, I have only used it for a day. So far it seems to work: it tracks data usage, genuinely warns you when an app is abusing data, and can even cut off 3G access automatically.

If you need to keep your mobile data bill down, this app may be worth a try.

[Screenshots: Onavo Lite 13, 14]

from 電腦玩物: http://playpcesor.blogspot.com/2011/08/onavo-lite-3g.html

Written by cwyalpha

August 31, 2011 at 1:08 pm

Posted in Uncategorized

Thought this was cool: SIGKDD 2011 Conference — Days 2/3/4 Summary


<< My review of Day 1.

I am summarizing all of the days together since each talk was short, and I was too exhausted to write a post after each day. Due to the broken-up schedule of the KDD sessions, I group everything together instead of switching back and forth among a dozen different topics. By far the most enjoyable and interesting aspects of the conference were the breakout sessions.

Keynotes

KDD 2011 featured several keynote speeches spread across three days. This year’s conference had a few big names.


Stephen Boyd, Convex Optimization: From Embedded Real-Time to Large-Scale Distributed. The first keynote, by Stephen Boyd, discussed convex optimization. The goal of convex optimization is to minimize an objective function subject to constraints, with the caveat that the objective and all of the constraints must be convex (“non-negative curvature,” as Boyd put it). A common strategy is then to reduce the problem to a standard form, such as a linear program, that solvers handle well. We should care about convex optimization because it comes with beautiful and complete theory, such as duality and optimality conditions. I must say that whenever I am chastising statisticians, I often say that all they care about is “beautiful theory,” so his comment was humorous to me. Convex optimization is a very intuitive way to think about regression and techniques such as the lasso, and it has tons of use cases in parameter estimation (MLE, MAP, least squares, lasso, logistic regression, SVMs and modern L1-regularized methods). Boyd showed an example of convex optimization applied to disk head scheduling.

For more information about convex optimization, see the website for Convex Optimization by Boyd and Vandenberghe. The book is available there for free, along with lecture slides and other materials. Even better, the second author is at UCLA! I did not realize that.
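As a small illustration of how naturally regression fits this framework (my own example, not from the talk), the lasso can be written down directly as a convex program; here is a sketch using the cvxpy library, with made-up data and a made-up regularization weight:

import numpy as np
import cvxpy as cp

# Toy data: 100 samples, 20 features, sparse true coefficients.
rng = np.random.default_rng(0)
X = rng.standard_normal((100, 20))
beta_true = np.zeros(20)
beta_true[:3] = [1.5, -2.0, 0.7]
y = X @ beta_true + 0.1 * rng.standard_normal(100)

# Lasso as a convex program: least squares plus an L1 penalty.
lam = 0.5
beta = cp.Variable(20)
objective = cp.Minimize(cp.sum_squares(X @ beta - y) + lam * cp.norm1(beta))
cp.Problem(objective).solve()

print(np.round(beta.value, 2))   # most coefficients come out (near) zero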


Peter Norvig, Internet Scale Data Analysis. It is always great to hear from Peter Norvig. At the very least, you may have seen his name on your introductory artificial intelligence textbook, Artificial Intelligence: A Modern Approach. Norvig is also well known as the Director of Research at Google. He also spoke at SciPy 2009 and was wearing a similarly flashy shirt. Norvig discussed how to get around long latencies in a large-scale system. Interestingly, his talk began with a discussion of Google’s interest in its carbon footprint, since all of Google’s massive systems require a lot of power; the carbon output of 2,500 queries is approximately equal to that of a beer. Norvig noted that most of Google’s most successful engineers are well versed in distributed systems, and this should come as no surprise. He then introduced MapReduce and showed an example of how Google uses MapReduce to process map tiles for Google Maps. Norvig concluded by mentioning a variety of large systems used by Google, including BigTable (a column-oriented store) and Pregel for graph processing. Pregel is vertex based, so programs “think like a vertex”: each vertex responds to messages transmitted over edges.
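To make the MapReduce idea concrete (my own toy sketch, nothing to do with Google’s actual code): the programmer writes a map function that emits key/value pairs and a reduce function that folds all values for a key, and the framework takes care of distribution, shuffling, and fault tolerance. A single-process word-count sketch in Python:

from collections import defaultdict

def map_words(doc):
    # map: emit (word, 1) for every word in the document
    for word in doc.split():
        yield word.lower(), 1

def reduce_counts(word, counts):
    # reduce: fold all values emitted for one key
    return word, sum(counts)

def run_mapreduce(docs):
    shuffled = defaultdict(list)           # the "shuffle" phase, grouped by key
    for doc in docs:
        for key, value in map_words(doc):
            shuffled[key].append(value)
    return dict(reduce_counts(k, v) for k, v in shuffled.items())

print(run_mapreduce(["the quick brown fox", "the lazy dog", "The fox"]))
# {'the': 3, 'quick': 1, 'brown': 1, 'fox': 2, 'lazy': 1, 'dog': 1}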

(There was a keynote by a fellow named David Haussler about cancer genomics. After an exhausting first two days, I skipped this talk as I needed to sleep…and I was not incredibly interested in the topic.)

Judea Pearl, The Mathematics of Causal Inference. Go Bruins! Judea Pearl is a professor at the UCLA Department of Computer Science and teaches a course on his field, Causality, each spring. His talk was essentially the same talk he gives at UCLA at the beginning of the quarter. I attempted to take his course in 2009, but quite frankly, I don’t get it and my mind cannot bend into that realm. I remember sitting in his class and wondering “what is wrong with me?” I love listening to Dr. Pearl speak only because of his sense of humor. Despite his age and the fact that he is slowing down, he had the crowd in hysterics as he struggled with the presentation technology and made intelligent jokes at every chance.

Pearl believes that humans do not communicate with probability, but with causality (I do not agree with this entirely). I appreciated that he mentioned that it takes work to overcome the difference in thinking between probability and causality. In statistics, we use some data and a joint distribution P to make inferences about some quantity or variable. In causal inference, an intentional intervention changes the joint distribution P into another joint distribution P’. Causality requires a new language and new mathematics (I do not see it). In order to use causality, one must introduce some untestable hypothesis. Pearl mentioned that some of the non-standard mathematical tools include counterfactuals and structural equation modeling. I do not know how I feel about any of this. For more information about Pearl’s Causality, check out his book.
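As a toy illustration of the P versus P’ distinction (my own example, not Pearl’s), consider a linear structural model with a confounder: regressing Y on X under the observational distribution P gives a different slope than intervening on X, which produces the post-intervention distribution P’:

import numpy as np

rng = np.random.default_rng(0)
n = 200_000

# Structural model with a confounder Z:
#   Z -> X, Z -> Y, X -> Y (the true causal effect of X on Y is 2)
z = rng.standard_normal(n)
x = z + rng.standard_normal(n)
y = 2 * x + z + rng.standard_normal(n)

# Observational: slope of the regression of Y on X under P
obs_slope = np.cov(x, y)[0, 1] / np.var(x)

# Interventional: set X by fiat (do(X = x0)), which cuts the Z -> X edge,
# and measure how Y responds; this recovers the causal effect under P'
x_do = rng.standard_normal(n)                     # X no longer depends on Z
y_do = 2 * x_do + z + rng.standard_normal(n)
do_slope = np.cov(x_do, y_do)[0, 1] / np.var(x_do)

print(round(obs_slope, 2), round(do_slope, 2))    # ~2.5 (biased) vs ~2.0 (causal)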

Data Mining Competitions

One interesting event during KDD 2011 was the panel Lessons Learned from Contests in Data Mining. The panel featured Jeremy Howard (Kaggle), Yehuda Koren (Yahoo!), Tie-Yan Liu (Microsoft), and Claudia Perlich (Media6Degrees). Both Kaggle and Yahoo run data mining competitions: Kaggle has its own series of competitions and Yahoo is a major sponsor of the KDD Cup. Perlich has participated in and won many data mining competitions, while Liu offered a different perspective on data mining competitions as an industry observer.

Jeremy Howard gave some insight into the history of data mining competitions, crediting KDD 97 with the first one. He announced to the crowd that companies spend 100 billion dollars every year on data mining products and services (not including in-house costs such as employment) and that there are approximately 2 million Data Scientists. The estimate of the number of Data Scientists was based on the number of times R has been downloaded, following a blog post by David Smith (Revolution Computing). I love R, and every Data Scientist should use it, but there are several problems with this estimate. First, not everyone who uses R is a Data Scientist; a large portion of R users are statisticians (“beautiful theory”), teachers, miscellaneous students, and so on. Second, not all Data Scientists use R: some are even more creative and write their own tools or use little-adopted software packages, and plenty use Python instead. Howard also announced that over the next year Kaggle will be starting thousands of “invitation only” competitions. Personally, I do not care for this type of exclusion, even though their intentions are good.

Yehuda Koren introduced the crowd to Yahoo’s involvement in data mining competitions. Yahoo is a major force behind the KDD Cup and the Heritage Health Prize, and Yahoo also won a progress award in the Netflix challenge. Koren then described how data mining competitions help the community: they raise awareness of and attract research to a field, usually involve the release of a cool dataset, encourage contribution and education, and provide publicity for participants and winners. Contestants are attracted to competitions for various reasons, including fun, competitiveness, fame, the desire to learn more, peer pressure and of course the monetary reward. As with every competition, data mining competitions have rules, and Koren stated that rules are very difficult to enforce. I believe that data mining is vague as it is, so competition rules would be just as vague. It is important to maximize participation while preserving fairness and encouraging innovation. Such “rules” include discouraging huge ensembles (which probably overfit anyway) and limits on submission frequency, duplicate teams, and team size (the winning KDD Cup team had 25 members). Some obvious keys to success in data mining competitions are ensembles, hard work, team size, innovation over fancy models, quick coding and patience.

I felt that Tie-Yan Liu from Microsoft served as the Simon Cowell of the panel, and I feel that his role was necessary. He provided industry insight that gave a bit of a reality check as to what data mining competitions do and do not accomplish. Liu questioned whether the problems being solved in data mining competitions are really the important problems. Part of the problem is that many datasets are censored to protect privacy. Additionally, the really interesting problems cannot be opened to the public because they involve trade secrets. I consider myself an inclusive guy – I do not like the concept of winners and losers – so I was elated that Liu brought up the point: “what about the losers?” Is it bad publicity to “lose” several (or all) competitions? The answer varies from person to person. I honestly believe that the goal of these competitions is of the open-source nature (fun, share, learn, solve) and not so much to cure cancer. They are great for college students and for people who are interested in data science but do not have access to great data. For the rest of us, learning on our own using interesting data is probably better.

Claudia Perlich (Media6Degrees) discussed her experience participating in data mining competitions. She has won several contests. She commented on the distinction between sterile/cleaned data and real data, as competitions can include either type. The concept of Occam’s Razor applies to data mining competitions: Perlich won most of her competitions using a linear model, but with more complex and creative features. Perlich emphasizes that complex features are better than complex models.

Considering the Netflix Prize has been one of the biggest data mining competitions, I was disappointed that they were not represented on the panel since there were several researchers from Netflix at the conference.

Rather than write a few sentences for each topic, I will just bullet the goals of the research discussed in the sessions. Descriptions with a star (*) denote my favorite papers and are cited later.

Text Mining

I attended two of the three text mining sessions. I must say that I am quite topic-modeled and LDAed out! Latent Dirichlet Allocation (LDA) and several variations were part of every talk I heard. That was very exciting and reaffirms that I am in a hot field. Still, nobody has taken my dissertation topic yet (which I have remained quiet about).
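Since LDA came up in essentially every talk, here is a minimal topic-model sketch for anyone who has not played with it, using the gensim library (my own toy example and corpus, unrelated to any of the papers presented):

from gensim import corpora, models

docs = [
    "convex optimization duality lasso regression".split(),
    "mapreduce distributed systems bigtable pregel".split(),
    "topic model lda dirichlet text mining".split(),
    "lasso regression sparse optimization".split(),
    "text mining topic lda corpus".split(),
]

dictionary = corpora.Dictionary(docs)                 # word <-> id mapping
corpus = [dictionary.doc2bow(doc) for doc in docs]    # bag-of-words vectors
lda = models.LdaModel(corpus, num_topics=2, id2word=dictionary,
                      passes=20, random_state=0)

for topic_id, words in lda.print_topics():
    print(topic_id, words)    # top words and weights for each latent topic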

  • Using explicit user feedback to improve LDA and display topics appropriately by combining topic labels, topic n-grams, and capitalization/entity detection.* This talk was presented by David Andrzejewski (@davidandrzej). I finally got to meet him and discuss my dissertation topic with him. I am always entertained by the fact that we all look much different than our Twitter avatars portray. :)
  • Using external metadata and topics (LDA) to predict user ratings on items using localized factor models.
  • Using preferences and relative emphasis of each factor (i.e. how important to you is free wireless Internet in a hotel room?) to predict rating scores.*
  • Determining the network process that created a piece of text: who copied from whom?
  • Using a topic model (LDA) with other features such as part-of-speech tag (noun, verb etc.), WordNet features, sentiment/polarity etc.*
  • Modeling how topics and interests grow over time and understanding the correlations between terms over time.*

Social Network Analysis and Graph Analysis

The Social Networks session conflicted with one of the Text Mining sessions, but since I knew there would be two more, I decided to attend this one instead. I also combined the two Graph Analysis sessions into this section since they are so related. The goals of the research presented in these talks were as follows:

  • To label venue types (restaurant, bar, park, etc., for Foursquare venues and the like) using label propagation, based on several attributes of the user such as the user’s friends and the user’s weekly and daily schedule.
  • To determine the connections/edges in a social network that are the most critical for propagation of data (an idea, tweet, viral marketing etc.)*
  • To use tagging (items on Amazon can be tagged with keywords by users) and reviews to predict the success of a new item.
  • To find a better metric for ranking search engine results by starting with a relevant subgraph rather than a random-surfer model; the model also accounts for the user’s attention span.*
  • Classification of nodes, labeling of nodes and node link prediction using one unified algorithm (C3).*
  • Ranking using large graphs using a priori information about good/bad nodes and edges.*
  • The importance of bias in sampling from networks.*

User Modeling

This session, I suspect, was similar to the Web User Modeling session; it focused on recommendation engines and rating prediction.

  • Using endorsements (retweets, likes, etc.) to measure user bias in order to perform real-time sentiment analysis.
  • Estimating user reputation using thumbs-up vote rates on Yahoo News comments.
  • Selecting a set of reviews that encapsulates the most information about a product with the most diverse viewpoints.

Frequent Sets

I did some work with itemset mining at my last job, and I was not particularly interested in the Online Data and Streams session at the time, so I attended this one instead.

  • Using background knowledge about transactions to minimize redundancy.
  • Studying the effects of order on itemset mining.
  • Mining graphs as frequent itemsets from streams.

Classification

I got stuck in this session because the session I really wanted to attend, Web User Modeling, was full and there was nowhere to sit or stand. This session was more technical and theoretical. The only talk that I really enjoyed was about a classifier called CHIRP. I did not follow the details, but this is a paper I am interested in reading. The authors used a classifier based on Composite Hypercubes on Iterated Random Projections to classify spaces that have complex topology (think of classifying items that appear in a bullseye/dartboard pattern).*

Unsupervised Learning

This session was similar to the Classification session but, in my opinion, more practical.

  • Using decision trees for density estimation classifiers.
  • Clustering cell phone user behavior using “Earth Mover” distance.
  • Clustering of multidimensional data using mixture modeling with components of different distributions and copulas.*

Favorite Papers

Below is a short bibliography of my favorite papers. A few of them (the first four) I saw at the poster session and include here as well.

Wrapping Up

I had an awesome time at KDD and wish I could go next year, but it will be held in Beijing. I got to meet a lot of different people in the field that have the same passion for data and that was really cool. I got to meet with recruiters from a few different companies and get some swag from Yahoo and Google.

It was awesome being around such greatness. I ran into Peter Norvig several times, ran into Judea Pearl in the restroom (I already know him), as well as Christos Faloutsos (I am a huge fan) and Ross Quinlan. I stopped at the Springer booth and found a cool book about link prediction with Faloutsos as one of the authors. I went to buy it, handed the lady my credit card, and learned that it was $206 (AFTER conference discount)! Interestingly… Amazon has the same book for $165. I will probably order it anyway.

Here’s hoping that KDD returns to California (or the US) real soon!


Candid Shots

Ross Quinlan enjoying a beer during the poster session. What a cool guy!
Christos Faloutsos talking with a student during the poster session.

from Byte Mining: http://www.bytemining.com/2011/08/sigkdd-2011-conference-days-234-summary-3/

Written by cwyalpha

August 31, 2011 at 11:08 am

Posted in Uncategorized

Thought this was cool: Stanford CS221 has gone online


The course covers:

Overview of AI, Search

Statistics, Uncertainty, and Bayes networks

Machine Learning

Hidden Markov models and Bayes filters

Markov Decision Processes and Reinforcement Learning

Adversarial planning (games) and belief space planning (POMDPs)

Logic and Logical Problem Solving

Image Processing and Computer Vision

Robotics and robot motion planning

Natural Language Processing and Information Retrieval

The topics I'm most interested in are machine learning, Bayes networks, hidden Markov models, and image processing.

Anyone want to study it together?

Official site: http://robots.stanford.edu/cs221/online.html

HN: http://news.ycombinator.com/item?id=2941185

Reddit discussion: http://www.reddit.com/r/aiclass

from hUrR DuRr: http://blog.est.im/archives/4185

Written by cwyalpha

August 31, 2011 at 1:53 am

Posted in Uncategorized

Thought this was cool: Augmented reality (AR) and its applications


This is a popular-science piece written mainly for myself.

Before talking about augmented reality (AR), we need to talk about virtual reality (VR).

Virtual reality, translated from the English term Virtual Reality and abbreviated VR, uses computer technology at its core to generate a realistic, integrated virtual environment of sight, sound, touch and so on. With the necessary equipment, the user interacts with objects in the virtual world in a natural way, each influencing the other, which produces the feeling and experience of being present in a real environment.

A typical VR system consists of a computer, application software, input/output devices, the user, and a database. The computer generates the virtual world and implements human–computer interaction; the input/output devices recognize the user's various forms of input and generate corresponding feedback in real time; the application software builds the geometric, physical, and behavioral models of objects in the virtual world, generates 3D spatial audio, and manages and renders the models in real time; the database stores all of the information about every object in the virtual world.

The essential difference between VR and 3D animation is interactivity. 3D animation plays back a sequence of still frames rendered along a path the computer has processed in advance, with no interactivity at all: the user cannot look wherever they want, but can only follow a route fixed by the designer, so the user is passive. VR, by contrast, computes the scene in real time and provides the user with all the information in the space as needed; the user walks along their own route and the computer produces the corresponding scene, truly achieving "if you can think of it, you can see it."

Based on the degree of immersion and interaction, VR systems are divided into four types: immersive VR, desktop VR, augmented VR, and distributed VR.

Augmented VR, better known as augmented reality (Augmented Reality), is the AR we usually talk about in the context of mobile phones. It lets the user see the real world while also seeing virtual objects superimposed on it; it is a system that combines the real environment with a virtual one. In AR, real and virtual objects must blend seamlessly with the user's environment, and real and virtual objects must also be able to interact with each other; only then is true fusion of the real and the virtual achieved. An augmented reality system is therefore characterized by combining the real and the virtual, real-time interaction, and 3D registration.

Good. Now let's focus on augmented reality (AR).

An AR system is built from display technology, tracking and positioning technology, interface and visualization technology, and calibration technology.

Tracking/positioning and calibration together detect position and orientation and report the data to the AR system, unifying the tracked object's coordinates in the real world with its coordinates in the virtual world, so that virtual objects blend seamlessly into the user's environment. To register objects accurately, an AR system needs extensive calibration; the measured quantities include camera parameters, field of view, sensor offsets, object positions, deformations, and so on.

For a smartphone, AR boils down to taking the current position (GPS), the viewing direction (compass), and the phone's orientation (orientation sensor/gyroscope), projecting the relevant information into the live scene (camera), and showing it on the display (screen). The key to implementing this is obtaining the projection matrix.

In practice, Android already wraps the projection matrix fairly well: you can fetch it directly through the API and then convert the relevant coordinates into the corresponding screen coordinates.

A mobile AR system must track the phone's position and attitude in the real scene in real time, and use this information to compute the virtual object's coordinates in the camera frame so that the rendered virtual object lines up precisely with the real scene. The performance of registration (that is, the phone's position and attitude in space) is therefore the key to augmented reality. In short, mobile AR works by tracking the pose, computing the projection, and drawing the overlay.
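Here is a rough numerical sketch of that projection step (my own simplification, not the Android API): given the phone's pose (a rotation plus a position) and a pinhole projection model, a point of interest in world coordinates is mapped to pixel coordinates and can then be drawn over the camera preview. All the numbers are invented:

import numpy as np

def project_to_screen(p_world, cam_pos, R, f, cx, cy):
    """Map a 3D world point to pixel coordinates with a pinhole camera model.
    R: 3x3 rotation from world to camera frame (from compass/gyro/accelerometer),
    cam_pos: camera position in world coordinates (e.g. derived from GPS),
    f, cx, cy: focal length and principal point in pixels."""
    p_cam = R @ (p_world - cam_pos)          # world -> camera coordinates
    if p_cam[2] <= 0:
        return None                           # point is behind the camera
    u = f * p_cam[0] / p_cam[2] + cx          # perspective divide
    v = f * p_cam[1] / p_cam[2] + cy
    return u, v

# Invented example: a landmark 40 m ahead and 10 m to the right of the phone.
landmark = np.array([10.0, 0.0, 40.0])        # x right, y up, z forward (metres)
phone = np.array([0.0, 0.0, 0.0])
R = np.eye(3)                                  # phone held level, facing the landmark
print(project_to_screen(landmark, phone, R, f=800, cx=360, cy=640))
# -> (560.0, 640.0): drawn right of centre, on the horizon line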

OK, that's the background out of the way. Now, what can you actually do with AR?

1. Finding the place you want to go

1.1 Finding the shop I want to visit

Yelp Monocle
It uses the iPhone's camera and digital compass to combine Yelp ratings with the live street view, so you can find the five-star bar instead of stumbling into some place that merely thinks it's cool.

1.2 Finding my car

When you park at the curb and launch Car Finder, the app records your GPS position. Later, when you want to find the car in a sea of vehicles, Car Finder uses the camera, GPS, compass, gyroscope and other sensors to guide you back to the right spot.

1.3 Guiding you along the way

Wikitude Drive is an augmented reality navigation app: instead of a map, the user sees a live view of the street ahead with the navigation data overlaid on the video. It is already available in Europe, Australia, and North America.

2. "Touching" virtual things

AR SOCCER is virtual keep-ups: point the screen at a clean patch of floor, a football appears on the screen, and you can kick it with your foot.

3. Making reality change the way I want

3.1 Live-view translation

Word Lens is a live-view translation app, although for now it only translates between English and Spanish.

3.2 Virtual fitting rooms

Zugara's virtual fitting room is very easy to use. You need a computer with a webcam and a bit of space: step back four to five feet from the camera and wave your hand, and the garment you selected is automatically "put on" you. If it doesn't look quite right, you can nudge the garment's position so it fits you better.

3.3 Photos

With Farrago AR, users can rotate, resize, and fine-tune objects that appear in a picture directly on the device's touchscreen. Its friendly interface also makes it easy to add 2D or 3D objects that are not part of the original photo.

4. Overlaying the virtual on the real

The Museum of London released an augmented reality app it calls a "time machine": point your phone at your current location, and the system matches it with what that spot looked like decades ago.

Layar Reality Browser: point the camera at your surroundings and Layar digs up all kinds of data, such as bus stops, skate parks, and property prices.

Wikitude has been called a "world browser." It helps you explore your surroundings and look up information about landmarks: hold up your phone with the camera on, and markers appear on the screen, including Wikipedia entries, geo-tagged tweets, and ATM locations.

Star Walk is an augmented reality "interactive astronomy guide" that uses GPS, the compass, and the gyroscope to teach you the constellations. There are many similar games, such as a Star Wars AR edition, AR Invaders (shooting down UFOs), AR Missile, and ARBasketball (live-view basketball).

Finally, a few companies building platforms on AR:

1. Qualcomm

Qualcomm has released an augmented reality development kit for Android and iOS. With this SDK, developers can more easily use the cameras on smart devices to build applications that blend real content with virtual content.

2. Layar

Layar aims to be an open augmented reality platform: any third party can use Layar's APIs to build its own AR applications on top of Layar. Its official site currently lists 2,029 applications, in categories including education, games, architecture, art, and transportation. You can also think of Layar as an App Store built specifically for AR apps; some of the apps are free and some are paid.

Finally, some loose thoughts

Advances in mobile hardware have made breakthrough interaction possible on mobile devices. A wave of new application forms, typified by NFC, AR, and glasses-free 3D, has brought plenty of novelty and momentum to the field and created entirely new user experiences. In mobile product design, the ability to exploit the phone's hardware to create breakthrough products will be what determines a mobile designer's worth and a mobile product's success.

Related articles:
Unafraid of pop-ups blocking the view
Defeated by Google Chrome
[Excerpt] Bloggers are like pawns on a chessboard
This is how Wang Xiaoya falls apart


from IT牛人博客聚合网站: http://www.udpwork.com/item/5728.html

Written by cwyalpha

August 30, 2011 at 3:23 pm

Posted in Uncategorized

Thought this was cool: Programming Computer Vision with Python


A book recommendation:

Programming Computer Vision with Python

It isn't finished yet, but the electronic draft can be downloaded directly.



from 增强视觉 | 计算机视觉 增强现实: http://www.cvchina.info/2011/08/30/programming-computer-vision-with-python/

Written by cwyalpha

August 30, 2011 at 6:24 am

Posted in Uncategorized

Thought this was cool: Comparison-based descriptors


Local feature descriptors can, in my view (criticism welcome), be divided into two classes: those based on "absolute" values, and those based on comparisons.

Value-based descriptors are ones such as SIFT, SURF, and GLOH. The usual recipe is to quantize intensities, gradients, and so on, and build histograms. These descriptors are discriminative and intuitive, but they share a common weakness: high computational cost.

Comparison-based descriptors are ones such as Ferns, BRIEF, ORB, OSID, and BRISK. The usual recipe is to build the descriptor by comparing the feature values of pre-trained or randomly chosen point pairs. These descriptors are generally designed for speed: they do not care about the absolute magnitude of the underlying features, only their ranking. (It is worth explaining why I put Ferns in this class: Ferns has no explicit feature description, not even a distance metric, but I believe the reason Ferns works is still the discriminative power of pairwise pixel comparisons. See note 1.)
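To make the "compare point pairs, keep only the ranking" idea concrete, here is a toy BRIEF-style binary descriptor (my own sketch, not any published implementation); a real descriptor would smooth the patch and choose the point pairs more carefully:

import numpy as np

def binary_descriptor(patch, pairs):
    """Each bit records one pairwise intensity comparison inside the patch;
    only the ordering of intensities matters, not their absolute values."""
    bits = [1 if patch[a] < patch[b] else 0 for a, b in pairs]
    return np.packbits(bits)                  # compact bit string

rng = np.random.default_rng(0)
S, n_bits = 31, 256                           # patch size and descriptor length
pairs = [((rng.integers(S), rng.integers(S)),
          (rng.integers(S), rng.integers(S))) for _ in range(n_bits)]

patch = rng.random((S, S))
desc_a = binary_descriptor(patch, pairs)
desc_b = binary_descriptor(patch * 3.0 + 0.2, pairs)   # monotonic intensity change

# Matching uses Hamming distance; a monotonic change leaves every bit untouched.
hamming = np.count_nonzero(np.unpackbits(desc_a) != np.unpackbits(desc_b))
print(hamming)   # 0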

My reason for classifying descriptors this way comes from this ICCV 2011 paper:

The Power of Comparative Reasoning

The paper proposes a very simple feature compression method called WTA (not WTF ^_^):


The idea is to randomly permute the feature vector and then take, as the hash value, the position of the maximum among the first K dimensions.

Repeating this m times compresses the original feature into a hash code (signature) of length m. (Does this sound like min-hashing? In fact, the authors also claim that min-hashing is a special case of WTA, namely when the feature is a 0/1 string and K = 2.) WTA even has a polynomial-kernel extension.
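A minimal sketch of WTA hashing as I read it (my own code, not the authors'): each hash value is the argmax position within the first K entries of a randomly permuted copy of the feature vector, so the code depends only on the ranking of the feature values:

import numpy as np

def wta_hash(x, permutations, K):
    """Winner-take-all hash: for each stored permutation, keep the index of the
    largest value among the first K permuted dimensions (a value in [0, K))."""
    return [int(np.argmax(x[perm[:K]])) for perm in permutations]

rng = np.random.default_rng(0)
d, m, K = 128, 16, 4                               # feature dim, code length, window
perms = [rng.permutation(d) for _ in range(m)]     # fixed once, shared by all features

feat = rng.random(d)
code_a = wta_hash(feat, perms, K)
code_b = wta_hash(np.sqrt(feat), perms, K)         # monotonic transform of the feature

# The codes match, because only the ordering of the feature values matters.
print(code_a == code_b)   # True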

Given this classification, a natural question is: which are better, value-based or comparison-based descriptors?

There is basically no settled answer. My own feeling is that if you are after high discriminability (high precision and recall), use value-based descriptors; if you are after speed, use comparison-based ones.

Many people would surely disagree with that. For example:

Feng Tang claims in OSID [CVPR09] that value-based descriptors are invariant only to linear illumination changes, whereas comparison-based descriptors are invariant to the much broader class of monotonic illumination changes, without any strict requirement of linearity.

We shall show later that our local feature descriptor is invariant to any brightness change if the brightness change functions for all the local patches are monotonically increasing.

Moreover, there is a clear trend: in recent years, descriptors of the second class (comparison-based) have grown more and more numerous, and their performance looks set to overtake the first class (at least judging from the papers).

As for how things develop from here, let's wait and see.

Note 1: Several papers talk about using pairwise pixel comparisons or related quantities for pattern matching. [10] builds an approximation to cosine distance based on concomitant rank orders. The basic strength of these methods comes from using pixel pair representations as features. Although there have been several papers using these features, they are often not emphasized as the core part of the system or there hasn’t been any theoretical justification on why they should be used.  — From The Power of Comparative Reasoning



from 增强视觉 | 计算机视觉 增强现实: http://www.cvchina.info/2011/08/30/%e6%af%94%e8%be%83%e6%8f%8f%e8%bf%b0%e5%ad%90/

Written by cwyalpha

August 30, 2011 at 6:24 am

Posted in Uncategorized

Thought this was cool: A few changes to reculike


Thanks to everyone for the support since reculike.com went live; the site has picked up a fair amount of traffic. The current level of activity is still not enough to compute good recommendations, though, so please keep the feedback coming.

I have recently made a few changes to reculike, summarized below.

1. There are two main kinds of explicit user feedback. Under each paper, a user can bookmark it, meaning the user finds the paper interesting and wants to note it down to study carefully later. On a paper's page, a user can also recommend the paper, meaning the user knows the paper well, thinks it is good, and wants to recommend it to others. At the moment, a user who recommends a paper must write a recommendation blurb.

These two actions represent an interaction between experts and ordinary users, and more features along these lines are planned. For example, if an ordinary user bookmarks a paper, signaling interest, and an expert later recommends that same paper, the system will tell the user on the home page that an expert has recommended it; if the user has questions about the paper, they can then ask that expert. In this way, papers connect users and enable interaction between them.

2. The home page now shows the papers a user has bookmarked, the papers the user has recommended, and the papers the system recommends to the user. The recommended papers are shown by default, but the user can switch between the lists by clicking the links at the top.

The system is still quite rough; everyone is welcome to try it. If you run into any problems, you can @xlvector on Sina Weibo.

You might also like:

RecULike, a paper recommendation system, soft-launches

Some well-known Web 2.0 sites

On the claim that algorithms are only 10% of a recommender system

HTML DOM

The N-shortest-paths word segmentation algorithm


from xlvector – Recommender System: http://xlvector.net/blog/?p=796

Written by cwyalpha

August 30, 2011 at 6:23 am

Posted in Uncategorized