
There’s a good chance that you’re not using Hadoop right now. Hopefully you’ve moved past your fear of the unknown and you’re ready to embrace something new. If not, you should go read some Marmaduke and come back when you’re ready.

We’re a Windows Shop

That excuse doesn’t fly anymore. Earlier this year, Hortonworks released version 1.1 of their Hortonworks Data Platform for Windows. Microsoft have also released HDInsight (Hadoop as a service) in Azure, as well as a developer preview of a local version of HDInsight.

If Windows was the reason you weren’t using Hadoop, you need to come up with a new reason.

I Don’t Know Java

Fair enough – you don’t know Java. Hadoop is written in Java, and MapReduce jobs, the core of Hadoop, are typically written in Java as well. I don’t know C++, but I can still use SQL Server. What’s up with that?

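There are a number of high-level abstractions on top of Hadoop that make it easier to work with big data – Hive and Pig provide data manipulation languages (the SQL-like HiveQL and the dataflow language Pig Latin). When you need them, MapReduce jobs can be written in many languages and frameworks; Hadoop Streaming, for example, runs any executable that reads standard input and writes standard output as a mapper or reducer. Besides, Java looks enough like C# that you should be able to pick up the broader points in a weekend.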
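To make the "not just Java" point concrete, here is a minimal word-count sketch of the mapper/reducer pair you might write for Hadoop Streaming in Python. It assumes Streaming's default convention of tab-separated key/value lines, with the reducer's input already sorted by key. The function names are illustrative, and the `__main__` block only simulates the shuffle locally with `sorted()`; on a real cluster each function would live in its own script reading `sys.stdin`, passed to the Streaming jar through its `-mapper` and `-reducer` options.

```python
from itertools import groupby
from typing import Iterable, Iterator


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line per word, like a Streaming mapper writing to stdout."""
    for line in lines:
        for word in line.strip().lower().split():
            yield f"{word}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word; relies on the input being sorted by key."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__":
    # Local stand-in for map -> shuffle/sort -> reduce; on a cluster,
    # Hadoop performs the sort between the two scripts.
    sample = ["the quick brown fox", "the lazy dog"]
    for out in reducer(sorted(mapper(sample))):
        print(out)
```

Keeping the logic in plain functions over iterables means the same code can be tested locally and then wired to `sys.stdin` for the cluster.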

Heck, I’ve even put together an introduction to Hadoop to help get you started.

We Don’t Have Petabytes of Data

Neither did Lands of America, but they were able to take advantage of Hadoop to solve their data processing problems and gain additional insight into their business. Although the data volume was under 1 terabyte, the processing problems were very real; they were solved by scaling out the processing across a Hadoop cluster.

What’s the Real Reason?

So what’s the real reason? Sound off in the comments and let me know why you’re not using Hadoop.

  1. SQL Server seems to work for what we’re doing. Switching to something else would be very expensive (in terms of time). How would I justify spending that much time when we aren’t having any problems?

    Not that I have anything against Hadoop, but I am not using SAP, Oracle, or Azure for similar reasons.

    • Jeff – that’s a great question. Have you asked the business data users if there are any questions they’d like to ask the data?

      When I was a DBA, I used to think everything was perfect – until I asked the users a question: “If our sales system could tell you anything about our clients, what would you want to know?” At the time, I thought I was going to go into BI, so I wanted to create reports that would answer those kinds of questions. In that moment, as they started freeing their minds and thinking outside the box, they suddenly saw the data as an asset that wasn’t being used enough.

  2. In its current form Hadoop takes a lot of looking after.

    Impala and Parquet look interesting, but we’d play it cautious and let someone else prove the technology. Our job is to provide business benefit, and trailblazing tech is great for a techie’s CV but not necessarily for the business.

    Hadoop is basically a distributed file system plus the MapReduce framework. The stuff that makes it really useful to the masses is the plethora of other stuff around it: Pig, Hive, Beeswax, Hue, Avro, Sqoop, etc., etc., etc. There’s a lot of it and it’s a wee bit fragile at present.

    It’s not really something that likes a lot of users. You could let people used to writing SQL loose on Hive, but they’d soon get cheesed off with the performance, or rather the lack thereof.

    If I had a task to scrape content from unstructured sources that SQL Server couldn’t handle, such as PDFs and image files, then I think that would be a valid use case for the technology. We did consider using it to store web and infrastructure logs, but there are specific tools designed with this use case in mind.

    • I hear what you’re saying – a reasonable part of working with Hadoop is about finding the right tools for the job and setting user expectations.

      As IT professionals, isn’t it our job to find new ways to derive business value? Sometimes that’s through trailblazing tech and sometimes it’s through tried and true methods. Waiting does mean that you know you’re working with a rock-solid, dull solution. Waiting also means that someone else probably didn’t wait and is ready to eat your breakfast.

      Have you run into specific problems keeping Hadoop up and running, or have you just heard this through the “ops via HackerNews” grapevine?

      • I found out the hard way that Hadoop is quite labour intensive and initially thought it was just me stumbling around Linux. I later spoke to people in my network who have been using it for some time (2-3 years) and they confirmed that it can be a bit of a pig.

        The one that floored me was trying to update one particular part of the ecosystem to get Sqoop working properly, i.e. for both import from and export to SQL Server. Jeez, that blew everything to bits!

        As far as delivering business value goes, I’m finding more and more that it’s not about the tech. It’s about delivering a convincing argument for a particular course of action. Sometimes that course of action is laughably simple, but its benefits are massive. The chance to throw in lots of tech is obviously good for my CV and my inner geek, but for my company... not so sure.

  3. Pingback: (SFTW) SQL Server Links 16/08/13 • John Sansom

  4. Highlights in our training:

    * Very comprehensive course material with real-time scenarios.

    * We provide classes with highly qualified trainers.

    * We offer classes and demo sessions at times that are flexible for students.

    * Case studies and real-time scenarios are covered during training.

    * We provide 24/7 technical support.

    * Every topic is covered with real-time solutions.

    * We offer regular-track and weekend classes, among other formats.

    * We provide a recording of every session for later playback.

    * 123 Trainings Hadoop online training

    * We provide placement support through multiple consultancies in India, the USA, Australia, the UK, etc.

    * We provide certification-oriented training with a 100 percent pass guarantee.

    * We provide full support while you attend interviews, and you can contact me any time after completing the course.

    BIGDATA: Every day we create 2.5 quintillion bytes of data, so much that 90% of the data in the world today has been created in the last two years alone. This data comes from everywhere: sensors used to gather climate information, posts to social media sites, digital pictures and videos, purchase transaction records, and cell phone GPS signals, to name a few. This data is big data.

    HADOOP: is a leading framework for processing petabytes of data quickly and economically. Hadoop supports both structured and unstructured data.

    Traditional data warehouses and the currently popular BI systems, by contrast, support only structured data, and digging data out of huge volumes of information causes high latency in a traditional data warehouse.

    HDFS: is the distributed file system in the Hadoop framework.

    The HDFS design allows organizations to store bulk volumes of structured and unstructured data.

    Examples of unstructured data: email messages, email server logs, Facebook messages, blog data, images, videos, audio, etc.
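How HDFS copes with bulk volumes can be illustrated with a little arithmetic: a file is split into fixed-size blocks, and each block is replicated across DataNodes. The sketch below assumes the Hadoop 2.x defaults of a 128 MB block size (`dfs.blocksize`) and a replication factor of 3 (`dfs.replication`); both are configurable. The round-robin placement and the node names are hypothetical simplifications, since real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List

BLOCK_SIZE = 128 * 1024 * 1024  # assumed Hadoop 2.x default dfs.blocksize
REPLICATION = 3                 # assumed default dfs.replication


def split_into_blocks(file_size: int, block_size: int = BLOCK_SIZE) -> List[int]:
    """Return the size of each block; only the last block may be smaller."""
    if file_size == 0:
        return []
    full, remainder = divmod(file_size, block_size)
    return [block_size] * full + ([remainder] if remainder else [])


def place_replicas(num_blocks: int, nodes: List[str],
                   replication: int = REPLICATION) -> Dict[int, List[str]]:
    """Naive round-robin placement on distinct nodes (real HDFS is rack-aware)."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        b: [nodes[(b + r) % len(nodes)] for r in range(replication)]
        for b in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(1024 ** 3 + 1)  # 1 GiB plus one byte
    print(len(blocks), "blocks")
    placement = place_replicas(len(blocks), ["dn1", "dn2", "dn3", "dn4"])
    print(placement[0])
    print(sum(blocks) * REPLICATION, "bytes of raw storage used")
```

With these assumptions, a file one byte over 1 GiB needs nine blocks (eight full ones plus a one-byte tail), and replication triples the raw storage it consumes.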

    MAPREDUCE: MapReduce is a framework that distributes work as tasks across multiple nodes, allowing the system to process all tasks in parallel and collect the results at high speed.
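The map, shuffle, and reduce phases described above can be sketched in a single process. In this minimal illustration, threads from `concurrent.futures` stand in for cluster nodes running map and reduce tasks in parallel, each input "split" is just a list of lines, and the function names are hypothetical; it shows the shape of the model, not Hadoop's implementation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple


def map_task(split: List[str]) -> List[Tuple[str, int]]:
    """Map phase: turn one input split into (word, 1) pairs."""
    return [(word, 1) for line in split for word in line.lower().split()]


def shuffle(mapped: List[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Group every intermediate value by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_task(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce phase: combine all values for one key."""
    return key, sum(values)


def run_job(splits: List[List[str]], workers: int = 4) -> Dict[str, int]:
    # Threads stand in for the nodes that would run tasks in parallel.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_task, splits))
        groups = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_task(*kv), groups.items())
        return dict(reduced)


if __name__ == "__main__":
    splits = [["the quick brown fox"], ["the lazy dog", "the end"]]
    print(run_job(splits))
```

Because each map task only sees its own split and each reduce task only sees one key's values, the tasks are independent, which is what lets the framework spread them across machines.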

    PIG: is a dataflow language in the Hadoop environment; when a Pig script is compiled, it generates MapReduce code behind the scenes. (For example, instead of writing a hundred lines of Java MapReduce code, you can do the same job with a simple ten-line Pig script.)
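To give a feel for what "dataflow" means here, the sketch below mirrors a typical Pig Latin pipeline (LOAD, FILTER, GROUP, FOREACH ... GENERATE) as chained Python generator stages, with the corresponding Pig statement in a comment above each stage. It is an analogy only, not Pig: Pig would compile these steps into MapReduce jobs, while this runs in memory. The comma-separated log format, relation names, and field names are hypothetical.

```python
from itertools import groupby
from typing import Iterable, Iterator, List, NamedTuple, Tuple


class LogRecord(NamedTuple):
    user: str
    url: str
    size: int


def load(lines: Iterable[str]) -> Iterator[LogRecord]:
    # Pig: logs = LOAD 'input' USING PigStorage(',') AS (user:chararray, url:chararray, size:long);
    for line in lines:
        user, url, size = line.split(",")
        yield LogRecord(user, url, int(size))


def filter_nonempty(records: Iterable[LogRecord]) -> Iterator[LogRecord]:
    # Pig: hits = FILTER logs BY size > 0;
    return (r for r in records if r.size > 0)


def group_by_user(records: Iterable[LogRecord]) -> Iterator[Tuple[str, List[LogRecord]]]:
    # Pig: by_user = GROUP hits BY user;
    ordered = sorted(records, key=lambda r: r.user)
    for user, group in groupby(ordered, key=lambda r: r.user):
        yield user, list(group)


def total_size(groups: Iterable[Tuple[str, List[LogRecord]]]) -> Iterator[Tuple[str, int]]:
    # Pig: totals = FOREACH by_user GENERATE group, SUM(hits.size);
    for user, records in groups:
        yield user, sum(r.size for r in records)


if __name__ == "__main__":
    raw = ["alice,/a,100", "bob,/b,0", "alice,/c,50", "bob,/d,20"]
    for user, total in total_size(group_by_user(filter_nonempty(load(raw)))):
        print(user, total)
```

Each stage names one transformation of a whole relation, which is why a short Pig script can replace a much longer hand-written MapReduce job.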

    HIVE: is a data warehouse system in the Hadoop framework.

    HiveQL (Hive Query Language) is used to query it; it is very similar to the SQL of an RDBMS, with a few slight differences.
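Because HiveQL stays close to standard SQL, a simple aggregate reads the same in both. The sketch below runs such a query against an in-memory SQLite table purely so it executes anywhere; the table and column names are hypothetical. In Hive, the same `SELECT ... GROUP BY` text would be compiled into MapReduce jobs rather than executed by SQLite, and statements where the dialects diverge (for example Hive's `INSERT OVERWRITE TABLE`) would need adjusting.

```python
import sqlite3
from typing import List, Tuple

# Written to be valid in both SQLite and HiveQL (names are hypothetical).
PAGE_VIEWS_BY_COUNTRY = """
SELECT country, COUNT(*) AS view_count
FROM page_views
GROUP BY country
ORDER BY view_count DESC
"""


def views_by_country(rows: List[Tuple[str, str]]) -> List[Tuple[str, int]]:
    """Load (url, country) rows into a scratch table and run the aggregate."""
    conn = sqlite3.connect(":memory:")
    try:
        conn.execute("CREATE TABLE page_views (url TEXT, country TEXT)")
        conn.executemany("INSERT INTO page_views VALUES (?, ?)", rows)
        return conn.execute(PAGE_VIEWS_BY_COUNTRY).fetchall()
    finally:
        conn.close()


if __name__ == "__main__":
    print(views_by_country([("/a", "UK"), ("/b", "US"), ("/c", "UK")]))
```

This familiarity is a large part of Hive's appeal to SQL users, even though, as noted in the comments above, the execution model and latency are very different.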

    HBASE: is a column-family-oriented NoSQL database in the Hadoop framework, running on top of HDFS.
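HBase's data model is easier to picture than to describe: each cell is addressed by row key, column family, column qualifier, and timestamp, rows are sparse (only written columns exist), and a cell can hold several timestamped versions. The nested-dictionary sketch below models that addressing scheme under simplifying assumptions; it is not the HBase client API, it keeps every version rather than a configured maximum, and the row, family, and qualifier names are hypothetical.

```python
import time
from collections import defaultdict
from typing import Dict, Optional

# row key -> "family:qualifier" -> {timestamp: value}
Table = Dict[str, Dict[str, Dict[int, bytes]]]


def new_table() -> Table:
    return defaultdict(lambda: defaultdict(dict))


def put(table: Table, row: str, family: str, qualifier: str,
        value: bytes, ts: Optional[int] = None) -> None:
    """Store a new version of a cell; older versions are kept."""
    ts = ts if ts is not None else time.time_ns()
    table[row][f"{family}:{qualifier}"][ts] = value


def get(table: Table, row: str, family: str, qualifier: str) -> Optional[bytes]:
    """Return the newest version of a cell, or None if it was never written."""
    versions = table.get(row, {}).get(f"{family}:{qualifier}")
    if not versions:
        return None
    return versions[max(versions)]


if __name__ == "__main__":
    t = new_table()
    put(t, "user#42", "info", "email", b"old@example.com", ts=1)
    put(t, "user#42", "info", "email", b"new@example.com", ts=2)
    print(get(t, "user#42", "info", "email"))
```

Using `.get()` for reads avoids creating empty rows as a side effect, which keeps the sparse shape of the table intact.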

    SQOOP: is used for database connections: it imports data from relational databases into Hadoop and, in the same way, exports data from Hadoop back to databases.

    NOSQL: is a great idea for working with bulk data aggregations, because column-oriented NoSQL stores keep data grouped by column rather than storing whole rows together.
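The aggregation argument is easiest to see by comparing layouts. In the sketch below (hypothetical sales data), totalling one field from a row-oriented layout visits every record, whereas a column-oriented layout keeps each column's values together, so the sum reads a single list. This illustrates the general idea behind column-oriented storage only; real systems add compression, encoding, and on-disk formats that the sketch ignores.

```python
from typing import Dict, List, Tuple

Record = Tuple[str, str, float]  # (date, region, amount)

# Row-oriented: each record's fields are stored together.
rows: List[Record] = [
    ("2013-08-01", "north", 120.0),
    ("2013-08-01", "south", 80.0),
    ("2013-08-02", "north", 95.5),
]


def to_columns(records: List[Record]) -> Dict[str, list]:
    """Pivot row-oriented records into one list per column."""
    if not records:
        return {"date": [], "region": [], "amount": []}
    dates, regions, amounts = (list(col) for col in zip(*records))
    return {"date": dates, "region": regions, "amount": amounts}


def total_from_rows(records: List[Record]) -> float:
    # Every record is visited even though only 'amount' is needed.
    return sum(record[2] for record in records)


def total_from_columns(columns: Dict[str, list]) -> float:
    # Only the 'amount' column is read.
    return sum(columns["amount"])


if __name__ == "__main__":
    print(total_from_rows(rows), total_from_columns(to_columns(rows)))
```

Both functions return the same total; the difference is how much data each one has to touch, which is what matters once the table no longer fits in memory.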
