Agenda for Opening Ceremony of

 The Center for Cloud Computing and Big Data & Workshop

 Room 201, The Math Building,

 Software Engineering Institute, East China Normal University

 3663 Zhongshan Road (N), Shanghai, China 200062

 April 15 – 16, 2013


 Monday, April 15, 2013

 

Time

Event/Topic

08:30-09:00

Check-in

09:00-10:00

1. Guest Introduction

2. Research Center Introduction – Prof. Aoying Zhou, East China Normal University

3. Plaque Unveiling – Prof. Ruqian Lu, Mr. Haomin Guo

4. Address by the Academic Committee Chair – Prof. Xuemin Lin, The University of New South Wales / East China Normal University

5. Address by the University President – Prof. Ziqiang Zhu, Vice President of ECNU

6. Address by the Director of the 085 Platform – Prof. Jifeng He

10:00-10:30

Group Photo & Tea/Coffee Break

10:30-11:15

Big Data and Capital Market - Shuo Bai, CTO of Shanghai Stock Exchange

11:15-12:00

The Berkeley Lab Model: A Collaborative Approach to Computer Science Research - Michael J. Franklin, Thomas M. Siebel Professor of Computer Science and Director of the Algorithms, Machines and People Laboratory (AMPLab), University of California, Berkeley

12:00-14:00

Lunch

14:00-14:45

Cloud Computing and Big Data: The Challenges - Ming-Chien Shan, SAP Fellow and Vice President at SAP

14:45-15:30

Lifelong Learning in Big Data - Nicholas Zhang, Leader of the HCI&SYS Group, Noah's Ark Lab, Huawei

15:30-16:00

Break

16:00-16:45

OceanBase: A Distributed Shared-nothing Relational Database – Zhenkun Yang, Senior Researcher, Alibaba

16:45-17:30

Location Based Service and Big Data - Hai Cui, Chief Technology Officer, AutoNavi

18:00

Banquet

Tuesday, April 16, 2013

Time

Event/Topic

08:30-09:15

Scalable Mixed-membership Network Modeling with Triangular Representations – Junming Yin, Carnegie Mellon University

09:15-10:00

Management and Analysis of Social Media Data: A Case Study Based on Sina Weibo – Weining Qian, East China Normal University

10:00-10:30

Tea/Coffee Break

10:30-11:15

Big Data in Healthcare - Feng Cao, Senior Research Staff Member, IBM Research - China

11:15-12:00

In-memory Data Management and Analytics – Minqi Zhou, East China Normal University

12:00-14:00

Lunch

14:00-17:30

Panel Discussion

 Introduction to the Center for Cloud Computing and Big Data @ECNU

The establishment of the Center for Cloud Computing and Big Data was approved by East China Normal University (ECNU) on June 19, 2012. Its mission is to build an international platform for big data-related research and development. The center will continue its tradition of collaboration among partners, practice the notion of collaborative innovation, focus on applications specific to China, and conduct world-class research. By unifying the mechanisms of academic partnership, industrial sponsorship and visiting professorship, the center strives to comprehensively boost its capabilities in R&D and application promotion, developing big data technologies and systems for customers in China.

The center is supported by the Shanghai 085 Platform at ECNU and the affiliated Institute of Massive Computing. The center will engage in academic partnership development, including the sharing of academic resources, personnel exchange and research collaboration. It will be devoted to developing strategic collaborations with industry to investigate market demands, conduct joint research projects, and transfer scientific and technological achievements. It will invite prestigious scholars from universities and research institutes around the world to visit the campus for research exchange, and it will organize seminars and workshops for faculty and students from its partners.

 Big Data and Capital Market

 Shuo Bai

 CTO of Shanghai Stock Exchange,

 President of the Shanghai Stock Communication Co. Ltd

  Abstract

New big data processing techniques, which originated from the volume limitations of traditional systems, bring new opportunities to traditional data processing in the capital market. High-speed data processing and multi-source integration provide the technical foundation for a revolution in the business forms of the capital market. The new data processing techniques create new value, but they also carry new risks when exploited in data games and confrontation. We expect: a great platform for processing structured big data; a knowledge-oriented big data processing technology for unstructured data that improves user experience; and a domain-characteristic model, grounded in the real capital market, for integrating structured and unstructured data. The capital market can make a unique contribution to the development of big data: for the big data industry, it provides a platform for technology incubation, financing, and data valuation and pricing.

 Bio

Shuo Bai is the CTO of the Shanghai Stock Exchange and president of the Shanghai Stock Communication Co. Ltd. He received his Ph.D. in Computer Science and Theory from Peking University. He acted as the principal investigator for developing the core business systems of the Shanghai Stock Exchange, such as the third-generation surveillance system, the next-generation trading system, the enterprise data warehouse, and so on. Currently, he is responsible for running the IT infrastructure, developing the IT systems, and long-term IT planning at the Shanghai Stock Exchange. He served as the project leader for two projects of the National Science and Technology Support Program. He previously worked at the Institute of Computing Technology, Chinese Academy of Sciences (CAS), where he managed more than ten national projects, and at the Ministry of Information Industry, where he was responsible for planning and managing projects related to information security. Several of his high-tech achievements were successfully transferred to industry. He participated in the planning, guidance and review of several national extracurricular science and technology activities for adolescents, and was the first technology advisor to serve with the competition team at the Intel International Science and Engineering Fair. As one of the originators of the China Computer Network Emergency Response Technical Coordination Center (CNCERT/CC), he made great contributions to its establishment and initial operations. He now serves as an executive director of the Chinese Information Processing Society, a member of the leading group on the informatization of stocks and futures, vice chairman of the stock sub-committee of the China Financial Standardization Technical Committee, and a Ph.D. supervisor at the Institute of Computing Technology, CAS, the Institute of Information Engineering, CAS, and the University of Chinese Academy of Sciences.

 The Berkeley Lab Model - A Collaborative Approach to Computer Science Research

 Prof. Michael Franklin

 Thomas M. Siebel Professor of Computer Science

 Director of Algorithms, Machines and People Laboratory (AMPLab)

 University of California, Berkeley

  Abstract: The Berkeley AMPLab is developing a new Big Data analysis software stack by deeply integrating machine learning and data analytics at scale (Algorithms), cloud and cluster computing (Machines) and crowdsourcing (People) to make sense of massive data. In just a bit over two years, the lab has already had substantial impact on the industry, through the release of increasingly popular systems software such as Mesos, Spark, and Shark, and applications such as the Carat smartphone collaborative energy debugging app. At the same time, the lab has garnered Best Paper and Best Demo awards at major academic conferences and has placed graduating students in top university and industry positions. While timing has certainly played a role in this success - AMPLab was getting up to speed just as "Big Data" was becoming a hot topic - the lab also owes a huge debt to a unique model of research developed over many years in the EECS department at Berkeley. Earlier projects following variants of this model include impactful efforts such as the RISC and RAID projects, the SPICE CAD software project, the RADLab effort on cloud computing and the PARLab project on parallel computation. This model turns out to be very effective for attacking large-scale, multi-faceted computing problems such as Big Data analytics. The key aspects of the model include: organizing for collaboration, industry/academia partnerships, setting ambitious goals, having driving applications, and open source software and development. In this talk I will give a bit of history on the development of this model, and then describe how the AMPLab implements the key aspects listed above. While the model has aspects that may be difficult to replicate outside of the Berkeley context, I hope that it can provide some inspiration and lessons that can be useful in the development of a new center such as the one being created at ECNU.
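 To give a flavor of the AMPLab systems software mentioned above, here is a minimal PySpark sketch of the RDD-style data-parallel analysis that Spark popularized. This is my own generic word-count illustration, not AMPLab code; it assumes a local PySpark installation, and the input strings are made up for the example.

 # Minimal Spark (PySpark) sketch: distribute data, transform it in
 # parallel, and aggregate by key. Assumes `pip install pyspark`.
 from pyspark import SparkContext

 sc = SparkContext("local", "wordcount-sketch")
 lines = sc.parallelize(["big data at berkeley", "big data at ecnu"])
 counts = (lines.flatMap(lambda line: line.split())   # split lines into words
                .map(lambda word: (word, 1))          # pair each word with 1
                .reduceByKey(lambda a, b: a + b))     # sum counts per word
 print(sorted(counts.collect()))                      # e.g. [('at', 2), ('big', 2), ...]
 sc.stop()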

 Bio: Michael J. Franklin is the Thomas M. Siebel Professor of Computer Science and Director of the Algorithms, Machines and People Lab (AMPLab) at UC Berkeley. His research focuses on new approaches for data management and data analysis, including data stream processing and continuous analytics, scalable query processing, large-scale sensing environments, data integration, and hybrid human/computer data processing systems. He was the founder and CTO of Truviso, Inc., a real-time data analytics company recently acquired by Cisco Systems. He is an ACM Fellow and winner of the ACM SIGMOD Test of Time Award. His recent awards include Best Paper awards at ICDE 2013 and NSDI 2012, a "Best of VLDB 2012" selection, Best Demo awards at VLDB 2011 and SIGMOD 2012, and an Outstanding Advisor Award from the Computer Science Graduate Student Association at Berkeley. He is currently serving as a committee member on the U.S. National Academy of Sciences study on Analysis of Massive Data. Prof. Franklin is currently on sabbatical at the Center for Cloud Computing and Big Data at East China Normal University in Shanghai.

 Cloud Computing and Big Data: The Challenges

 Ming-Chien Shan

 Fellow and Vice President at SAP

 Abstract: Four pillars jointly form the base of the future computing system: cloud computing, big data computing, in-memory computing and smart computing. In this talk, we will briefly review each of them, highlighting their challenges, business value, market status and current landscape. We will also discuss how they can leverage each other to provide even more effective and efficient solutions.

  Bio: Ming-Chien Shan received his Ph.D. degree in computer science from the University of California, Berkeley in 1980. He is a Vice President at SAP, and a visiting professor and co-director of the Center for Cloud Computing and Big Data at East China Normal University. He served as one of the directors of AIS/SIGPAM (Special Interest Group on Process Automation and Management), associate editor-in-chief of IEEE Transactions on Services Computing, and a member of the editorial boards of the International Journal for Informatics, the Journal of Database Management, and the International Journal of Business Process Integration and Management. In recent years, he has acted as chair or program committee member of many conferences, including conference chair of VLDB-TES'2002-2005, RIDE-2EC'2002 and WEC'2004, program committee chair of GRC'2006, APSCC'2006, ICWS'2008 and SERVICES'2010, and program committee member of VLDB'2002, ICWS'2004, SCC'2006, SERVICES'2010-2012, etc. He serves as an audit committee member for the U.S. National Science Foundation on university research scholarship and for the Research Grants Council of the Hong Kong government. His research interests mainly include database technology and services computing. He has published more than 100 research papers and has been granted more than 80 U.S. software patents.

 Lifelong Learning in Big Data

 Nicholas Zhang

 Leader of the HCI&SYS Group, Noah's Ark Lab, Huawei

  Abstract: A major challenge in today's world is the Big Data problem, which manifests itself in the Web and mobile domains as rapidly changing and heterogeneous data streams. A data-mining system must be able to cope with this influx of changing data in a continual manner. This calls for lifelong machine learning, which, in contrast to traditional one-shot learning, should be able to identify the learning tasks at hand and adapt to the learning problems in a sustainable manner. A foundation for lifelong machine learning is transfer learning, whereby knowledge gained in a related but different domain is transferred to benefit learning for the current task. To make transfer learning effective, it is important to maintain a continual and sustainable channel, over the lifetime of a user, in which the data are annotated. We will address:

 1. Human factors in lifelong learning

 2. Transfer learning, with a case study in understanding human activities (a toy sketch of the transfer step follows below)
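 As promised above, here is a toy Python sketch of the transfer-learning idea, my own illustration and not Huawei's system: parameters learned on a data-rich source task warm-start learning on a related, slightly shifted target task that has only a few labels. All data is synthetic.

 # Transfer learning sketch: warm-start a target model from source-task weights.
 import numpy as np

 def train_logreg(X, y, w=None, lr=0.1, steps=500):
     """Gradient-descent logistic regression; w is an optional warm start."""
     w = np.zeros(X.shape[1]) if w is None else w.copy()
     for _ in range(steps):
         p = 1.0 / (1.0 + np.exp(-X @ w))      # predicted probabilities
         w -= lr * X.T @ (p - y) / len(y)      # logistic-loss gradient step
     return w

 rng = np.random.default_rng(0)
 w_true = np.array([2.0, -1.0])
 X_src = rng.normal(size=(500, 2))             # data-rich source domain
 y_src = (X_src @ w_true > 0).astype(float)
 X_tgt = rng.normal(size=(10, 2)) + 0.2        # small, shifted target domain
 y_tgt = (X_tgt @ w_true > 0).astype(float)

 w_src = train_logreg(X_src, y_src)            # one-shot learning on the source
 w_tgt = train_logreg(X_tgt, y_tgt, w=w_src, steps=50)  # transfer + adaptation
 print(w_src, w_tgt)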

  Bio: Nicholas Zhang has over 16 years of research and system development experience in network, distributed system and communication system architectures. He has contributed more than 90 patents. In 2002, he led a smart-device product development team to pioneer a new consumer business for Huawei. In 2005, he started to lead research on the future Internet and cooperative communication. In 2009, he took charge of the advanced network technology research department, leading research on future networks, distributed computing, database systems, and data analysis. His recent research focuses on human-machine interaction and systems for data analysis and intelligent systems.

 OceanBase: A Distributed Shared-nothing Relational Database

 Zhenkun Yang

 Senior Researcher, Alibaba

 Abstract: While petabytes of big data are attracting more and more eyeballs, the relational database is still the BASE of our society today. But the scalability of traditional RDBMSs has relied exclusively on a single big server, which is highly complex, proprietary and disproportionately expensive. In this talk, I will share with you OceanBase (https://github.com/alibaba/oceanbase ), an open source, distributed shared-nothing storage system for structured data built at Alibaba. OceanBase supports transactions (ACID) as well as many features of the relational model and SQL. It can easily scale to trillions of records and hundreds of terabytes. It also enables continuous availability while running on hundreds of inexpensive commodity servers. By eliminating random disk writes, it matches modern solid state disks (SSDs) perfectly. OceanBase has provided relational database services for dozens of projects in Alibaba's product system.
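 As a rough illustration of the shared-nothing idea in the abstract (my own sketch, not OceanBase's actual design), the snippet below hash-partitions row keys across a set of hypothetical servers, so capacity grows by adding commodity machines rather than by buying one big box. The server names and keys are made up for the example.

 # Shared-nothing sketch: each row has exactly one owning server,
 # determined by a hash of its key; no shared storage is needed.
 import hashlib

 SERVERS = ["node-1", "node-2", "node-3"]  # hypothetical cluster members

 def owner(key: str) -> str:
     """Route a row to the server owning its hash partition."""
     digest = int(hashlib.md5(key.encode()).hexdigest(), 16)
     return SERVERS[digest % len(SERVERS)]

 for order_id in ("order:1001", "order:1002", "order:1003"):
     print(order_id, "->", owner(order_id))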

  Bio: YANG Zhenkun is a Senior Researcher with Alibaba. In recent years, his research interests have been distributed storage and computing systems. He is now the chief architect of OceanBase (https://github.com/alibaba/oceanbase), an open source distributed shared-nothing relational database at Alibaba. Before joining Alibaba, he was a Senior Scientist with Baidu.com and a Lead Researcher with Microsoft Research Asia, among others. He received his bachelor's and master's degrees from the Department of Mathematics, Peking University. After receiving his Ph.D. from the Department of Computer Science, Peking University, he joined the faculty of the Institute of Computer Science and Technology, Peking University, and became a full professor of computer science in 1997. He received the Cheung Kong Scholar Award at Peking University in 1999. He was awarded the First Class Award of the National Science and Technology Progress of China in 1995 (as the fourth contributor), the First Class Award of Science and Technology Progress of Beijing Municipality in 1996, the National Youth Science and Technology Award of China in 1998, the Qiushi Eminent Award of the Chinese Academy of Science and Technology in 1998, and the Wusi Youth Award of Beijing Municipality in 2000.

 Location Based Service and Big Data

 Hai Cui

 Chief Technology Officer, AutoNavi

 Abstract: With the rapid evolution of computing technology and the ubiquitous availability of smartphones and connected vehicles with map and navigation capabilities, LBS (Location Based Services) are undergoing a technology revolution. From traditional mapping and survey technologies, to Google's street-mapping cars, to crowdsourcing and terabytes of UGC (User Generated Content) data from hundreds of millions of smartphone users, core LBS technologies are deeply involved with the concepts and best practices of big data acquisition and computing. This talk will discuss how new big data technology can help drive innovation in LBS.

  Bio: Hai Cui is a senior executive in the information technology industry and a researcher in computer science. He currently serves as Chief Technology Officer of AutoNavi, a NASDAQ-listed public company providing digital maps, navigation and location based services (LBS). In this role, Hai Cui oversees company technology strategy, runs the technical committee for the research and development groups, and contributes to the success of the company's expansion into the Mobile Internet space. AutoNavi holds the No. 1 position among LBS providers for both the automotive and Mobile Internet market segments in China. Previously, he was head of the NHN China Technology Development Center, focusing on building platform technologies for large-scale Internet services and managing all lines of R&D organizations in China. NHN is a globally leading Internet service provider whose Naver is the world's No. 4 search engine; the company operates the largest search engine, web portal and Internet game services in South Korea. Prior to this, Hai Cui held various technical and management positions at Microsoft Corp., both at the company headquarters in the United States and at its subsidiaries in China. For many years he was devoted to core technology research and development in the Internet and Mobile Internet space, in areas including mobile phone operating systems, web browsers, cloud computing, large-scale Internet services, and mobile and location services. He holds multiple worldwide patents. Hai Cui received his Bachelor of Science degree in Computer Science from Fudan University in China and his Master of Science degree in Computer Science and Engineering from Michigan State University in the United States.

 Scalable Mixed-membership Network Modeling with Triangular Representations

 Junming Yin

 Lane Fellow in the Lane Center for Computational Biology at Carnegie Mellon University

 Abstract: In this work, we argue for representing networks as a bag of triangular motifs, particularly for important network problems that current model-based approaches handle poorly due to computational bottlenecks incurred by using edge representations. Such approaches require both 1-edges and 0-edges (missing edges) to be provided as input, and as a consequence, approximate inference algorithms for these models usually require Ω(N^2) time per iteration, precluding their application to larger real-world networks. In contrast, triangular modeling requires less computation, while providing equivalent or better inference quality. Using this representation, we develop a novel mixed-membership network model and approximate inference algorithm suitable for large networks. Empirically, we demonstrate that our approach, when compared to that of an edge-based model, has faster runtime and improved accuracy for mixed-membership community detection. We conclude with a large-scale demonstration on an N ≈ 280,000-node network, which is infeasible for network models with Ω(N^2) inference cost. This is joint work with Qirong Ho and Eric P. Xing.
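 To make the "bag of triangular motifs" representation concrete, here is a minimal Python sketch of my own (not the authors' code): it enumerates closed triangles from an adjacency list built only from observed 1-edges and their neighborhoods, so 0-edges never need to be materialized, which is the key contrast with edge-based inputs that the abstract draws.

 # Bag-of-triangles sketch: enumerate closed triangles from observed edges.
 from itertools import combinations

 def triangle_motifs(edges):
     """Return the set of node triples that form closed triangles."""
     adj = {}
     for u, v in edges:                       # undirected adjacency list
         adj.setdefault(u, set()).add(v)
         adj.setdefault(v, set()).add(u)
     triangles = set()
     for u, neighbors in adj.items():
         for v, w in combinations(sorted(neighbors), 2):
             if w in adj.get(v, set()):       # neighbors of u that are linked
                 triangles.add(tuple(sorted((u, v, w))))
     return triangles

 # Toy graph: one triangle (0, 1, 2) plus a dangling edge.
 print(triangle_motifs([(0, 1), (1, 2), (0, 2), (2, 3)]))  # -> {(0, 1, 2)}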

 Bio: Junming Yin is currently a Lane Fellow in the Lane Center for Computational Biology at Carnegie Mellon University. He received his Bachelor's degree in CS from Fudan University, and his M.A. degree in Statistics and Ph.D. degree in EECS from UC Berkeley. His research interests lie in the area of statistical machine learning, with an emphasis on the development of scalable modeling and efficient algorithms, and their applications to large-scale and high-dimensional complex data.

 Management and Analysis of Social Media Data: A Case Study Based on Sina Weibo

 Weining Qian

 Professor

 Software Engineering Institute, East China Normal University

  Abstract: Social media contains structured data, text data, and network-structured data. Managing and mining social media data efficiently is a key issue for collective behavior sensing and analysis, personalized recommendation, opinion mining and mood analysis. Based on a dataset collected from Sina Weibo, we studied the problems of modeling information diffusion, the dynamics of events' popularity, and spam detection. A benchmark for social media data analytics is designed. The challenges of managing and visually analyzing social media data in real time are also discussed in this talk.

 Bio: Weining QIAN is currently a professor of Computer Science at East China Normal University, Shanghai, P.R. China. He received his BS, MS and PhD in Computer Science from Fudan University in 1998, 2001 and 2004, respectively. He serves as a Committee Member of CCF-TCDB and an Editorial Board Member of the International Journal of Semantic and Infrastructure Services. He served as Co-Chair of the WISE 2012 Challenge. His research interests include Web data management and the management and mining of massive data sets.

 Big Data in Healthcare

 Feng Cao

 Senior Research Staff Member, IBM Research - China

  Abstract: The healthcare system is undergoing a seismic change, in which providers are moving from a reactive model, one focused on treatment, to a model focused on prevention, in which outcomes are expected and rewarded. Healthcare data is diverse (structured/unstructured), resides in many places (EMRs, call details, sensor data, social media), and is growing in volume (petabytes, exabytes). It is difficult to access, highly sensitive, and restricted by privacy laws. In this talk, we will introduce big data challenges and opportunities in healthcare and share some use cases of big data in healthcare. We will also discuss Watson's DeepQA technique, the system architecture that supports its high-performance computing, and their potential applications in the healthcare domain.

 Bio: Dr. Feng Cao, Senior Research Staff Member, is the manager of Business Intelligence & Semantic Search at IBM Research - China, co-chair of the patent review board, and now the leader of the next grand challenge project. He and his team focus on Watson and its innovative applications. Their research results have been successfully applied in the Chinese mainland, Taiwan, Korea, the US and elsewhere; selected as a finalist for the Wall Street Asian Best Innovation Award 2010; included in the IBM Century Celebration Global Reference Cases 2011, shown in 15 countries simultaneously; and demonstrated at CeBIT (the No. 1 business technology exhibition, global conferences and networking events) in Germany, at Asian HIMSS, at MIE in Europe, and at IOD in the US. He has published over 20 rank-one papers; one of his papers has been cited over 240 times. He holds multiple patents filed in the US and applied in products. He is the author of Smarter City: Vision and Practice, was the chair of IBM Medical Informatics Day 2012, and gave a keynote at IPDPS 2012.

 In-memory Data Management and Analytics

 Minqi Zhou

 Associate Professor

 Software Engineering Institute, East China Normal University

 Abstract: As data processing requirements in most applications (e.g., business intelligence and public opinion analysis) have shifted from batch processing to human real-time interactive analysis, and as computer architecture has evolved (e.g., high-capacity memory, multi-processor and many-core chips, high-speed networks), data in large volumes can now reside in memory and be processed in human real time. Knowing the history of in-memory data management systems, as well as the pros and cons of the current massive data processing stack (e.g., Hadoop), will help us improve our new in-memory data management system. Considering the volatility of in-memory data and the possibility of cluster node crashes, the first issue is to achieve fault tolerance, high availability and recoverability of in-memory data (a toy replication sketch follows below). Furthermore, the massive computational resources across a cluster (e.g., multiple computers, multiple processors, many cores, hyper-threading) make the communication wall a real problem; multi-granularity parallel processing and scheduling is a promising way to solve it, and eventually to make human real-time interactive analysis a reality.
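 As a toy illustration of the fault-tolerance point above (my own sketch, not the speaker's system), the snippet below keeps each in-memory key on a primary node plus one replica, so a single node crash loses no data. The node names and replication factor are made up for the example; a real system would also use a stable hash and handle re-replication after failures.

 # In-memory replication sketch: every key lives on two of the nodes.
 class Cluster:
     def __init__(self, nodes, replicas=2):
         self.nodes = {n: {} for n in nodes}   # node -> its in-memory store
         self.names = list(nodes)
         self.replicas = replicas

     def put(self, key, value):
         home = hash(key) % len(self.names)
         for i in range(self.replicas):        # write primary + replica
             node = self.names[(home + i) % len(self.names)]
             self.nodes[node][key] = value

     def get(self, key, down=()):
         home = hash(key) % len(self.names)
         for i in range(self.replicas):        # read from first live copy
             node = self.names[(home + i) % len(self.names)]
             if node not in down and key in self.nodes[node]:
                 return self.nodes[node][key]
         raise KeyError(key)

 c = Cluster(["a", "b", "c"])
 c.put("x", 42)
 print(c.get("x", down=("a",)))  # still readable if node "a" crashes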

 Bio: Minqi Zhou is an Associate Professor at East China Normal University. He received his bachelor's and master's degrees in Thermal Engineering and Control Theory from Nanjing University of Science & Technology in 2003 and 2005, respectively, and his Ph.D. in Computer Science from Fudan University in 2009. In 2003 and 2007, he visited The University of Queensland, Australia, and the SAP China Lab, respectively. He has served as a PC member for several conferences, including ICDE'2011, IUCS'2011 and WISA'2011. His main research interests include data management in distributed systems, data-intensive computing, in-memory computing and computational advertising.