CWYAlpha


Thought this was cool: A sample Python program for loading and processing MovieLens data



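MovieLens is an openly available dataset for training and testing recommender systems.

Before you can test your own recommender system on it, you need to load the data into your program and convert it into the format your program expects.

Here is the Python source code. The paths are file paths on my computer; change them to your own.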

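```python
#!/usr/bin/env python3

# This function first extracts the movie ids and titles from u.item in the
# 100k dataset into a dictionary called movies, then reads the user id,
# movie id and rating from u.data to build a dictionary of each user's ratings.
def loadMovieLens_100k(path='C:/Users/Administrator/Desktop/ci_code/ml-100k'):
    # Get movie titles (u.item is '|'-separated; some titles contain
    # non-UTF-8 characters, so decode the file as Latin-1)
    movies = {}
    with open(path + '/u.item', encoding='latin-1') as f:
        for line in f:
            (movie_id, title) = line.split('|')[0:2]
            movies[movie_id] = title

    # Load data (u.data is tab-separated: user id, movie id, rating, timestamp)
    prefs = {}
    with open(path + '/u.data') as f:
        for line in f:
            (user, movie_id, rating, ts) = line.split('\t')
            prefs.setdefault(user, {})
            prefs[user][movies[movie_id]] = float(rating)
    return prefs
```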
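Because loadMovieLens_100k keys each user's ratings by title rather than by movie id, any two ids that happen to share a title would be collapsed into a single entry. A minimal sketch for checking an id → title dictionary (like the one built from u.item) for such collisions; the helper name `find_duplicate_titles` and the sample dictionary are hypothetical, not part of the original program:

```python
from collections import defaultdict


def find_duplicate_titles(movies):
    """Return {title: [ids]} for every title that more than one id maps to.

    `movies` is assumed to be an id -> title dict like the one built from u.item.
    """
    ids_by_title = defaultdict(list)
    for movie_id, title in movies.items():
        ids_by_title[title].append(movie_id)
    # Keep only titles shared by two or more ids
    return {title: ids for title, ids in ids_by_title.items() if len(ids) > 1}


# Hypothetical sample data, not taken from the real u.item file
sample = {'1': 'Movie A (1995)', '2': 'Movie B (1996)', '3': 'Movie A (1995)'}
print(find_duplicate_titles(sample))  # {'Movie A (1995)': ['1', '3']}
```

If the check finds anything in the real data, keying the ratings by movie id (as the 1M loader does) avoids merging them.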

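```python
# This function builds the same kind of rating dictionary from the 1M dataset.
# Unlike the 100k version, it is keyed by movie id rather than by title.
def loadMovieLens_1M(path='C:/Users/Administrator/Desktop/ci_code/ml-1m'):
    # Get ratings (ratings.dat lines look like UserID::MovieID::Rating::Timestamp)
    prefs = {}
    with open(path + '/ratings.dat') as f:
        for line in f:
            (user, movie, rating, ts) = line.split('::')
            prefs.setdefault(user, {})
            prefs[user][movie] = float(rating)
    return prefs
```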
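The two rating loaders differ mainly in the field separator: u.data uses a tab, ratings.dat uses `::`. A sketch of a single parser that takes the separator as a parameter and accepts any iterable of lines (an open file or a plain list of strings); `parse_ratings` is a hypothetical helper name, and the sample rows are made up:

```python
def parse_ratings(lines, sep):
    """Build {user: {movie: rating}} from 'user<sep>movie<sep>rating<sep>timestamp' lines."""
    prefs = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines, e.g. an empty last line
        user, movie, rating, _ts = line.split(sep)
        prefs.setdefault(user, {})[movie] = float(rating)
    return prefs


# The same rating written in both file formats (made-up rows)
print(parse_ratings(['1\t10\t4\t881250949\n'], '\t'))   # {'1': {'10': 4.0}}
print(parse_ratings(['1::10::4::978300760\n'], '::'))  # {'1': {'10': 4.0}}
```

With a real file this would be called as `with open(path + '/ratings.dat') as f: prefs = parse_ratings(f, '::')`.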

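```python
# Extract each movie and its genres into a movie -> genre dictionary
def loadMovieTag(path='C:/Users/Administrator/Desktop/ci_code/ml-1m'):
    # Get the movie-genre mapping (movies.dat lines look like
    # MovieID::Title::Genres, with the genres separated by '|')
    movies = {}
    with open(path + '/movies.dat', encoding='latin-1') as f:
        for line in f:
            (movie_id, title, tag) = line.split('::')
            movies[movie_id] = tag.strip()  # drop the trailing newline
    # movies = sorted(movies.items(), key=lambda d: d[0])  # optional: sort by movie id
    return movies
```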
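loadMovieTag keeps each movie's genres as a single `|`-separated string such as `Action|Comedy`. If an actual movie × genre matrix is wanted, a minimal sketch that turns such a dictionary into 0/1 rows, with the columns in sorted genre order, could look like this; `genre_matrix` is a hypothetical name and the sample ids and genres are illustrative only:

```python
def genre_matrix(movie_tags):
    """Turn {movie_id: 'G1|G2'} into (genres, {movie_id: [0/1 per genre]})."""
    # Collect every genre that occurs, in a fixed (sorted) column order
    genres = sorted({g for tags in movie_tags.values() for g in tags.strip().split('|')})
    matrix = {}
    for movie_id, tags in movie_tags.items():
        present = set(tags.strip().split('|'))
        # One column per genre: 1 if the movie has that genre, else 0
        matrix[movie_id] = [1 if g in present else 0 for g in genres]
    return genres, matrix


genres, matrix = genre_matrix({'1': 'Animation|Comedy', '2': 'Action'})
print(genres)       # ['Action', 'Animation', 'Comedy']
print(matrix['1'])  # [0, 1, 1]
```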

```python
# Extract movies and genres into a user -> {movie: genres} dictionary
def loadUserTag(path='C:/Users/Administrator/Desktop/ci_code/ml-1m'):
    # Get the movie-genre mapping
    movies = {}
    with open(path + '/movies.dat', encoding='latin-1') as f:
        for line in f:
            (movie, title, tag) = line.split('::')
            movies[movie] = tag.strip()  # drop the trailing newline

    # Attach each rated movie's genres to the user who rated it
    users = {}
    with open(path + '/ratings.dat') as f:
        for line in f:
            (userid, movieid) = line.split('::')[0:2]
            users.setdefault(userid, {})
            users[userid][movieid] = movies[movieid]
    # users = sorted(users.items(), key=lambda d: d[0])  # optional: sort the users dict by key
    return users
```
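The dictionary from loadUserTag maps each user to {movie id: genre string}. As one example of using it, a sketch that counts how many of a single user's rated movies fall into each genre (a movie with several genres counts once per genre) using `collections.Counter`; the helper name `genre_counts` and the sample entry are hypothetical:

```python
from collections import Counter


def genre_counts(user_movies):
    """Count genres over one user's {movie_id: 'G1|G2'} entries."""
    counts = Counter()
    for tags in user_movies.values():
        # Each '|'-separated genre adds one to its count
        counts.update(tags.strip().split('|'))
    return counts


# Hypothetical entry shaped like loadUserTag()['1']
counts = genre_counts({'10': 'Action|Comedy', '20': 'Comedy', '30': 'Drama'})
print(counts.most_common(1))  # [('Comedy', 2)]
```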

```python
# Write the movie-genre dictionary to a file, one "MovieID::Genres" line per movie
def outputMovieTag():
    movie_tag = loadMovieTag()
    with open('tag.dat', 'w') as f:
        for num in movie_tag:
            f.write('%s::%s\n' % (num, movie_tag[num].strip()))

outputMovieTag()
```
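Since tag.dat holds one `MovieID::Genres` line per movie, reading it back is just the inverse split. A sketch, assuming that line format; `read_movie_tags` is a hypothetical helper that takes any iterable of lines, so it can be tried without the file:

```python
def read_movie_tags(lines):
    """Parse 'MovieID::Genres' lines back into {movie_id: genres}."""
    movie_tags = {}
    for line in lines:
        line = line.rstrip('\n')
        if not line:
            continue  # ignore empty lines
        # Split only on the first '::' so the genre string stays intact
        movie_id, tags = line.split('::', 1)
        movie_tags[movie_id] = tags
    return movie_tags


print(read_movie_tags(['1::Animation|Comedy\n', '2::Action\n']))
# {'1': 'Animation|Comedy', '2': 'Action'}
```

On the real output this would be `with open('tag.dat') as f: movie_tags = read_movie_tags(f)`.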

```python
# Write the user-movie-genre dictionary to a file
def outputUserTag():
    user_tag = loadUserTag()
    with open('user_tag.dat', 'w') as f:
        for user in user_tag:
            f.write('userid:%s\n' % user)  # user id
            for movie in user_tag[user]:
                f.write('\t%s: %s\n' % (movie, user_tag[user][movie].strip()))
            f.write('\n')  # blank line between users

outputUserTag()
```

 

From A-Jun's blog (阿俊的博客): http://somemory.com/myblog/?post=11

Written by cwyalpha

May 22, 2012 at 3:24 AM

Posted in Uncategorized
