Google data archives not compatible with Crystal Reports … oh noz!

Date: 2008-01-19

MapReduce (the algorithm that Google uses for massively parallel computation) … is:

1. A giant step backward in the programming paradigm for large-scale data intensive applications

2. A sub-optimal implementation, in that it uses brute force instead of indexing

3. Not novel at all — it represents a specific implementation of well known techniques developed nearly 25 years ago

4. Missing most of the features that are routinely included in current DBMS

5. Incompatible with all of the tools DBMS users have come to depend on

In related news, cars that run off of nothing but sunlight and air are

1. A giant step backward

2. A sub-optimal implementation, in that they don’t use gasoline

3. Not novel at all — we had solar powered and compressed air powered cars 25 years ago

4. Missing most of the features that are routinely included in current gasoline powered cars

5. Incompatible with all of the tools that gasoline-engine mechanics use

Holy !@#$, these guys are dense.

The database community has learned the following three lessons from the 40 years that have unfolded since IBM first released IMS in 1968.

* Schemas are good.

* Separation of the schema from the application is good.

* High-level access languages are good.

MapReduce has learned none of these lessons and represents a throwback to the 1960s, before modern DBMSs were invented.

Look, I get their points. I like relational databases myself (or, rather, SQL-style databases, which a true database theorist will point out are not “true” relational databases).

…but arguing with success is kind of hard. I assert that it is objective truth that no relational database can possibly do the things that MapReduce does.

To say that MapReduce stinks because it “learned none of these lessons” is bunk. The Google guys are not dimwits. They clearly made a decision to trade off some features for others.

The feature of winning the search engine wars and making people into billionaires is a pretty good one, IMO.

The MapReduce community seems to feel that they have discovered an entirely new paradigm for processing large data sets. In actuality, the techniques employed by MapReduce are more than 20 years old.

Huh? They do?

MapReduce obviously has tons of ancestors, including vector processing.

Who says that it’s a new concept?
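
For anyone who hasn't seen it, the map-then-reduce idea fits in a few lines of Python. What follows is a toy, single-process sketch of the general pattern, not Google's implementation; the names `map_reduce`, `word_mapper`, and `count_reducer` are made up for illustration.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Toy single-process pass: map each record, group by key, reduce each group."""
    groups = defaultdict(list)
    for record in records:
        # The mapper emits zero or more (key, value) pairs per input record.
        for key, value in mapper(record):
            groups[key].append(value)
    # The reducer folds all values that share a key into one result.
    return {key: reducer(key, values) for key, values in groups.items()}


def word_mapper(line):
    """Hypothetical mapper: emit (word, 1) for every word in a line of text."""
    for word in line.lower().split():
        yield word, 1


def count_reducer(word, counts):
    """Hypothetical reducer: add up the 1s emitted for a single word."""
    return sum(counts)


lines = ["the quick brown fox", "the lazy dog", "The end"]
word_counts = map_reduce(lines, word_mapper, count_reducer)
print(word_counts["the"])
```

Note that the mapper reads every input line and nothing consults an index. That is the "brute force" the article complains about, and it is also why the records can be handed out to many machines independently.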

4. MapReduce is missing features

All of the following features are routinely provided by modern DBMSs, and all are missing from MapReduce:

* Bulk loader — to transform input data in files into a desired format and load it into a DBMS

* Indexing — as noted above

* Updates — to change the data in the data base

* Transactions — to support parallel update and recovery from failures during update

* Integrity constraints — to help keep garbage out of the data base

* Referential integrity — again, to help keep garbage out of the data base

* Views — so the schema can change without having to rewrite the application program

In summary, MapReduce provides only a sliver of the functionality found in modern DBMSs.

Oh noz!

A clever five-year-old (?) tool is less polished and complete than some dusty, hidebound, thirty-year-old alternative concept.

In related news, few of the kids getting admitted to MIT and Caltech this year have 401(k)s as well funded as those of typical fifty-year-old engineers.

5. MapReduce is incompatible with the DBMS tools

A modern SQL DBMS has available all of the following classes of tools:

* Report writers (e.g., Crystal Reports) to prepare reports for human visualization

* Business intelligence tools (e.g., Business Objects or Cognos) to enable ad-hoc querying of large data warehouses

* Data mining tools (e.g., Oracle Data Mining or IBM DB2 Intelligent Miner) to allow a user to discover structure in large data sets

* Replication tools (e.g., Golden Gate) to allow a user to replicate data from one DBMS to another

* Database design tools (e.g., Embarcadero) to assist the user in constructing a data base.

True.

On the other hand, modern SQL DBMSs are incompatible with Google, Google Maps, Orkut, etc.

I’m sure that the Google execs are ** so ** upset that Crystal Reports doesn’t run on their data.

An “interesting” article by David J. DeWitt and Michael Stonebraker.

If you wondered what getting put out to pasture by a bunch of young turks sounds like, this is it.
