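Abstract: In the era of big data, user-behavior research no longer relies on random samples to stand in for the whole population, as it once did; it must work with the data of all users. This shift brings challenges in data storage, data processing, and computation.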

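This thesis studies the storage and mining of user-behavior data on the Hadoop cloud platform. A distributed, highly reliable, highly available data storage module is designed and implemented to address the difficulty of storing large volumes of data. A distributed parallel word-segmentation algorithm based on MapReduce is proposed; it uses all compute nodes in the cluster to segment massive Chinese text, improves segmentation efficiency by more than a factor of three compared with traditional Chinese word segmentation, and makes segmentation of massive text practical. The Hadoop cloud platform is then applied to microblog user-behavior data: microblog posts from the Chongqing area are first segmented, and the daily occurrences of the words "感冒" (cold), "肺炎" (pneumonia), "发热" (fever), and "咳嗽" (cough) are then counted for each district and county of Chongqing. This addresses the sparsity of microblog content, its deeply hidden value, and the difficulty of mining it, and enables the relevant Chongqing authorities to monitor local health conditions and issue early warnings. A module for presenting the mining results is also designed; built on MySQL, JDBC, HTTP, and Ajax, it displays the results of the microblog user-behavior analysis along multiple dimensions.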

Keywords: HDFS; Hadoop; MapReduce; user behavior analysis; microblog users

Research on Microblog User Behavior Based on Hadoop

Abstract: With the arrival of the big data era, the study of user behavior no longer uses random sampling to represent the whole population, as was done previously; it needs to study the data of all users. This brings challenges to the research process in data storage, data processing, and computation.

This paper studies user-behavior data storage and user-behavior mining on the Hadoop cloud platform. A distributed, highly reliable, highly available data storage module is designed and implemented to solve the problem of storing large amounts of data. A distributed parallel word-segmentation algorithm based on MapReduce is proposed, which calls on all computing nodes of the cluster to segment massive Chinese text. Compared with traditional Chinese word segmentation, it improves segmentation efficiency by more than three times and resolves the current difficulty of segmenting massive text. The Hadoop cloud platform is combined with microblog user-behavior data for analysis: microblog posts from the Chongqing area are first segmented, and daily word statistics for "cold", "pneumonia", "fever", and "cough" are then mined for each district and county of Chongqing. This handles the sparse content, deeply hidden value, and mining difficulty of microblog data, and lets the relevant departments of Chongqing carry out local medical monitoring and early warning. A display module for the data mining results is designed, based on MySQL, JDBC, HTTP, and Ajax, to present the microblog user-behavior analysis results comprehensively and from multiple dimensions.

Keywords: user behavior analysis; HDFS; Hadoop; MapReduce; microblog users
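
The segment-then-count pipeline described in the abstract can be illustrated with a small in-process sketch. This is not the thesis's implementation: it runs in a single Python process rather than on a Hadoop cluster, it assumes forward maximum matching as the segmenter (the abstract does not say which segmentation method is used), and the dictionary, the record fields `date`, `district`, and `text`, and the sample posts are all hypothetical. Only the four symptom words come from the abstract.

```python
from collections import defaultdict

# The four symptom words named in the abstract.
SYMPTOMS = {"感冒", "肺炎", "发热", "咳嗽"}
# Hypothetical toy dictionary; a real segmenter would use a far larger one.
DICTIONARY = SYMPTOMS | {"今天", "有点", "医院"}
MAX_WORD_LEN = max(len(word) for word in DICTIONARY)


def segment(text):
    """Forward maximum matching: take the longest dictionary word at each position.

    Characters that start no dictionary word are emitted as single-character tokens.
    """
    tokens = []
    i = 0
    while i < len(text):
        for size in range(min(MAX_WORD_LEN, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in DICTIONARY:
                tokens.append(piece)
                i += size
                break
    return tokens


def map_post(post):
    """Map phase: emit ((date, district, word), 1) for each symptom word in one post."""
    for word in segment(post["text"]):
        if word in SYMPTOMS:
            yield (post["date"], post["district"], word), 1


def reduce_counts(pairs):
    """Shuffle and reduce phases collapsed into one step: sum the values per key."""
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return dict(totals)


def run_job(posts):
    """Run the map phase over every post, then reduce the emitted pairs."""
    pairs = (pair for post in posts for pair in map_post(post))
    return reduce_counts(pairs)


if __name__ == "__main__":
    # Made-up sample posts, for illustration only.
    sample_posts = [
        {"date": "2015-03-01", "district": "渝中区", "text": "今天有点咳嗽"},
        {"date": "2015-03-01", "district": "渝中区", "text": "咳嗽发热去医院"},
        {"date": "2015-03-01", "district": "江北区", "text": "感冒了"},
    ]
    for key, count in sorted(run_job(sample_posts).items()):
        print(key, count)
```

Because each post is mapped independently and the reduce step only sums values that share a key, the same two functions could in principle be distributed across cluster nodes, which is the property a MapReduce formulation relies on.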

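Contents

Abstract (Chinese)

Abstract (English)

Contents

1 Introduction

1.1 Research background

1.2 Research status in China and abroad

1.2.1 Research status of big data in China and abroad

1.2.2 Research status of user behavior analysis

1.3 Main work

1.4 Organization of the thesis

2 Research on Hadoop big data technology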
