IBM Supercomputer, Watson
Watson uses 2,880 processor cores to perform its calculations. This is made possible by ninety IBM Power 750 Express servers, each of which contains four eight-core POWER7 processors. Each server therefore contains thirty-two processor cores, and the ninety servers combine to provide the 2,880 processor cores that make up Watson’s brain.
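The core count can be checked with one line of arithmetic. The short sketch below uses only the server count and the per-server core count given above; the variable names are illustrative.

```python
# Figures taken from the text above; the names are illustrative only.
SERVERS = 90            # IBM Power 750 Express servers
CORES_PER_SERVER = 32   # processor cores in each server

total_cores = SERVERS * CORES_PER_SERVER
print(f"Total processor cores: {total_cores:,}")  # Total processor cores: 2,880
```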
So how would you classify Watson: as a midrange computer, a mainframe, or a supercomputer? I would define Watson as a supercomputer: it was designed for one purpose, yet it can do more than rapid mathematical computation. Watson has thousands of processor cores, it can run computation-intensive applications, and it has a massive amount of online and offline storage. Coordinating all of these processors as one functioning logical unit required a group of IBM engineers to develop a specialized kernel-based virtual machine implementation capable of eighty teraflops (eighty trillion floating-point operations per second).
The software that allowed all of this to occur is called Apache Hadoop. Hadoop is an open source software framework used to organize and manage grid computing environments. Because the clock speed of a single central processing unit (CPU) with current technology is limited in practice to a few gigahertz, additional performance has to come from more processors, so a software model that exploits parallel processing on supercomputers had to be developed. With Hadoop, the programmers at IBM could more easily write applications for Watson that took advantage of parallel processing to increase the speed at which problems could be solved and questions answered. The main reason this makes things faster is that one question can be researched along multiple paths at the same time.
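The idea of researching one question along several paths at once can be illustrated with a short Python sketch. This is neither Hadoop nor IBM's code: it uses Python's standard `concurrent.futures` module, and the three search functions, their candidate answers, and their confidence values are hypothetical stand-ins.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent research paths. Each returns
# (candidate answer, confidence); a real system would query real sources.
def search_encyclopedia(question):
    return ("candidate A", 0.14)

def search_books(question):
    return ("candidate B", 0.62)

def search_lyrics(question):
    return ("candidate B", 0.35)

PATHS = [search_encyclopedia, search_books, search_lyrics]

def research_in_parallel(question):
    """Run every path concurrently and keep the most confident result."""
    with ThreadPoolExecutor(max_workers=len(PATHS)) as pool:
        futures = [pool.submit(path, question) for path in PATHS]
        results = [future.result() for future in futures]
    return max(results, key=lambda result: result[1])

best = research_in_parallel("example question")
print(best)  # ('candidate B', 0.62)
```

Threads are used here only to show the structure. For work that is truly CPU-bound, Python programs would normally spread the paths across processes or machines instead, which is the role a framework like Hadoop plays at cluster scale.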
This large number of processors lets Watson answer a question of the kind it has been programmed for in about three seconds. If you doubled the number of processors, you could cut that time to about a second and a half, and tripling the number could make it possible to answer in about a second, assuming the work continues to divide evenly across the processors. Would this be necessary in practice? Most likely not, but it is a demonstration of what the IBM Watson supercomputer is capable of.
The Watson supercomputer used ninety IBM Power 750 Express servers that cost around $35,000 each. This means that Watson cost over $3.15 million before the data storage, I/O networking, and installation were factored in. Is this affordable for many large corporations, hospitals, and research institutions? Most likely yes, but what if smaller companies, clinics, or hospitals wanted to take advantage of this technology? Reports showed that the software that runs Watson can be used with as few as one IBM Power 750 Express server; you only have to wait longer for the answer. With one server it could take up to two hours, but with nine servers you could have an answer in 30 seconds. This scalability makes the Watson supercomputer concept a viable alternative for many companies’ problem-solving needs.
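The doubling, tripling, and cost figures above follow from simple arithmetic if the work is assumed to divide evenly across processors. That assumption is an idealization, since real systems lose some time to coordination, and the function name below is hypothetical.

```python
BASE_CORES = 2880          # cores in the ninety-server configuration
BASE_SECONDS = 3.0         # answer time given for that configuration
SERVERS = 90
SERVER_PRICE_USD = 35_000  # approximate price per server from the text

def ideal_answer_time(cores):
    """Answer time under perfectly linear scaling (an idealized assumption)."""
    return BASE_SECONDS * BASE_CORES / cores

doubled = ideal_answer_time(2 * BASE_CORES)   # 1.5 seconds
tripled = ideal_answer_time(3 * BASE_CORES)   # 1.0 second
server_cost = SERVERS * SERVER_PRICE_USD      # 3,150,000 dollars

print(doubled, tripled, server_cost)  # 1.5 1.0 3150000
```

The two-hour and thirty-second figures quoted for one and nine servers are reported numbers, not outputs of this idealized model.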
Watson was built with sixteen terabytes of random access memory (RAM). This RAM is spread across the ninety servers that compose Watson, which means that each server was equipped with around 180 gigabytes of RAM. But only specialized software could make all of this RAM, spread across ninety different servers, work together. Again, the software that allowed this is Apache Hadoop, the open source framework used to organize and manage grid computing environments. Hadoop was designed with an integrated distributed file system that allowed all operating data to be loaded into memory. This massive amount of RAM gives Watson the ability to have the data required to answer questions at its fingertips.
If the computer is pulling data from RAM, the time to find and transfer the data is measured in microseconds (millionths of a second) or less. If the computer has to pull data from a hard drive, the access time is measured in milliseconds, or thousandths of a second. The access time for a network drive can be measured in seconds, depending on the amount of data being requested.
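Two quick calculations make these numbers concrete: the share of the sixteen terabytes that falls to each of the ninety servers, and how far apart the access-time scales are. The per-server figure assumes the memory is divided evenly, and the access times are order-of-magnitude placeholders based on the units named above, not measurements.

```python
TOTAL_RAM_TB = 16
SERVERS = 90
GB_PER_TB = 1024  # binary convention; with 1000 the result is about 178

ram_per_server_gb = TOTAL_RAM_TB * GB_PER_TB / SERVERS
print(f"RAM per server: about {ram_per_server_gb:.0f} GB")  # about 182 GB

# Order-of-magnitude access times in microseconds (illustrative assumptions).
ACCESS_TIME_US = {
    "RAM": 1,                    # microseconds
    "hard drive": 1_000,         # milliseconds
    "network drive": 1_000_000,  # can reach seconds for large requests
}
for medium, microseconds in ACCESS_TIME_US.items():
    ratio = microseconds // ACCESS_TIME_US["RAM"]
    print(f"{medium}: {ratio:,}x the RAM access time")
```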
If you look at these figures, having access to data in microseconds rather than milliseconds lets a computational process finish in a fraction of the time it would otherwise require. This is what allowed Watson to demolish the two greatest Jeopardy champions ever. Where does Watson store the data that it loads at boot? It is designed with a data storage solution that is a modified IBM SONAS cluster with a total of 21.6 TB of raw capacity. At boot, Watson loads the stored data into its 16 terabytes of RAM. For the Jeopardy application, this data consisted of the Wikipedia database (it was a valid source for IBM), millions of books, song lyrics, and other writings. Reports are that all of this data totaled only one terabyte, which IBM claims you could fit on a universal serial bus (USB) drive bought at your local electronics store. This data is what Watson was programmed to load for this particular application; the commercial uses being pushed for Watson are in the financial industry and hospitals. So would doubling the amount of RAM allow Watson to operate even more efficiently? That depends on the application the system is being used for.
Whether all of the required data can be loaded into memory is the one question you need to be able to answer yes to every time. This is because any outside request for data slows Watson down and makes it less efficient. IBM Watson used Juniper switches running at ten gigabit per second Ethernet (10GbE) speeds. During the Jeopardy experiment Watson was not connected to the internet. Instead, Watson used the Ethernet links for the IBM POWER7 servers to talk to each other and to access files over the Network File System (NFS) protocol on the internal customized SONAS storage I/O nodes.
What does the Juniper switching consist of? It consists of one IBM J16E (EX8216) switch populated with fifteen ten gigabit per second line cards and one one gigabit per second line card, as well as three IBM J48E (EX4200) switches installed in a virtual chassis configuration. These switches all run Juniper’s Junos network operating system. This operating system enables up to ten IBM J48E switches to be configured as a single virtual chassis. A virtual chassis is a flexible, highly scalable switch solution that allows several switches to form one unit as if they were within a single chassis. The switches operate together over a 128 gigabit per second backplane, with scalability of up to 480 access ports and a 2.4 terabit per second fabric, transferring over 2 billion packets per second (pps) to as many as 6,000 servers in a single domain. If the CPU cores of Watson are the calculating part of the brain and the RAM is its memory, then the Juniper switches are its nervous system, moving data from one place to another in times measured in nanoseconds, or billionths of a second.
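To get a feel for why the data is loaded into RAM once rather than fetched over the network for each question, the sketch below estimates how long a one-terabyte data set would take to cross a single ten gigabit per second link. It ignores protocol overhead and assumes a decimal terabyte and one fully used link, so it is a lower bound, not a measurement. It also checks the virtual chassis port count, assuming forty-eight access ports on each J48E switch.

```python
DATA_BITS = 1 * 10**12 * 8   # one decimal terabyte, expressed in bits
LINK_BPS = 10 * 10**9        # one 10GbE link, in bits per second

# Idealized transfer time: no protocol overhead, link fully used.
transfer_seconds = DATA_BITS / LINK_BPS
print(f"1 TB over one 10GbE link: at least {transfer_seconds:.0f} s")  # 800 s

SWITCHES_PER_CHASSIS = 10    # maximum J48E switches in one virtual chassis
PORTS_PER_SWITCH = 48        # assumption: 48 access ports per J48E
access_ports = SWITCHES_PER_CHASSIS * PORTS_PER_SWITCH
print(f"Access ports per virtual chassis: {access_ports}")  # 480
```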
What is SONAS? Scale-Out Network-Attached Storage (SONAS) is a clustered NAS system that has separate interface and storage nodes. SONAS serves files over an IP network using NFS, the Common Internet File System (CIFS), and the Secure Copy Protocol (SCP). SONAS clusters are designed to be built and marketed in configurations that range from twenty-seven to four hundred and eighty terabytes, using either Serial Advanced Technology Attachment (SATA) or Serial Attached SCSI (SAS) drives.
The main application that takes a human question, evaluates its meaning, and produces an answer for Watson is DeepQA. This is a large-scale problem solver that uses parallel processing to look for answers along multiple paths. To produce a solution, DeepQA uses analysis, hypothesis, filtering, and scoring techniques to eliminate false data and produce the most relevant answer to a question. The things that DeepQA cannot handle are listed below:
1. Questions that are not self-explanatory and require an explanation of how to solve them.
2. Questions that contain a video or audio clue.
3. Questions that contain multiple clues that must each be solved before the initial question can be answered.
4. Questions whose answer is a combined interpretation of two separate clues, or whose answers rhyme, where that analysis is required to solve the question.
5. Questions that have multiple answers, one of which is more correct than the others.
The inability to perform these functions shows that the artificial intelligence IBM developed for the Watson supercomputer cannot function the same way as a human being. Each function must be programmed for the specific environment that Watson is going to operate in, and Watson cannot interpret questions that are not forthright, that contain audio or video, or that require interpretation or reading between the lines, so to speak. The things that Watson can do are built on the foundations of content acquisition, question analysis, hypothesis generation, hypothesis and evidence scoring, and final ranking and confidence estimation. Without all of these components, Watson could not do what it does.
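The stages named above can be pictured as a small pipeline. The sketch below is a toy illustration only, not IBM's DeepQA code: the corpus, the evidence weights, the confidence threshold, and every function name are hypothetical, and a real system would learn its scores from training data.

```python
from dataclasses import dataclass

@dataclass
class Candidate:
    answer: str
    score: float = 0.0

# Tiny in-memory corpus standing in for acquired content (hypothetical).
CORPUS = {
    "largest planet": ["Jupiter", "Saturn"],
    "red planet": ["Mars"],
}
# Hypothetical evidence weights; a real system learns these from training data.
EVIDENCE = {"Jupiter": 0.9, "Saturn": 0.3, "Mars": 0.8}

def analyze_question(question):
    """Question analysis: reduce the question to a lookup key, if any."""
    text = question.lower()
    return next((key for key in CORPUS if key in text), None)

def generate_hypotheses(key):
    """Hypothesis generation: propose candidate answers for the key."""
    return [Candidate(answer) for answer in CORPUS.get(key, [])]

def score_evidence(candidates):
    """Evidence scoring: attach a score to each candidate."""
    for candidate in candidates:
        candidate.score = EVIDENCE.get(candidate.answer, 0.0)
    return candidates

def rank(candidates, threshold=0.5):
    """Filtering and ranking: drop weak candidates, best first."""
    kept = [c for c in candidates if c.score >= threshold]
    return sorted(kept, key=lambda c: c.score, reverse=True)

def answer(question):
    """Run all stages; return the top candidate, or None if none is confident."""
    key = analyze_question(question)
    ranked = rank(score_evidence(generate_hypotheses(key)))
    return ranked[0] if ranked else None

best = answer("What is the largest planet in the solar system?")
print(best)  # Candidate(answer='Jupiter', score=0.9)
```

Returning `None` when no candidate clears the threshold mirrors the role of confidence estimation: declining to answer is better than offering a weak guess.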
Content acquisition allows Watson to identify sources and to scope out nuggets of information that lead to more information. Question analysis attempts to understand what is being asked and directs the other components to perform an initial analysis that helps determine how to process the question. Hypothesis generation produces candidate answers by searching on the results of the question analysis and extracting nuggets of data, which are then reanalyzed to validate each one as an answer and to produce a score that is used in the evidence and ranking process. The final ranking and confidence estimation is performed by an algorithm that bases the score on previously learned probabilities of a candidate answer being correct or incorrect.
The Watson supercomputer runs on the SUSE Linux Enterprise Server 11 operating system (OS), which provides advanced memory management, support for multiple processor types, strong performance on systems with multicore processors, the Native POSIX Thread Library (NPTL; POSIX is the Portable Operating System Interface), and advanced multipathing and I/O capabilities. According to SUSE, benchmarks have shown that SUSE Linux outperforms any other operating system on the POWER7 servers built by IBM. Another thing that makes SUSE such a valuable choice for Watson is that this is a commercial venture that needs to interoperate with the other operating systems often used in today’s data centers. SUSE is able to operate with both Windows and UNIX seamlessly in a mixed infrastructure. The ability to run many different applications would be critical for Watson to become a commercial success. With SUSE Linux Enterprise Server installed, you have the ability to run mission-critical databases, e-commerce applications, enterprise resource planning (ERP) systems, e-mail, file and print servers, and web servers. This would make Watson more than an expensive question and answer machine.
What are Unstructured Information Management Architecture (UIMA) applications? These are software systems that examine large volumes of unstructured information to extract the data that is relevant to the end user. This is important in the Watson supercomputer because UIMA breaks down the parts of the question that is submitted to the DeepQA processing application. This is done by breaking the data into metadata components linked to Extensible Markup Language (XML) descriptor files.
References
Burd, S. D. (2011). Systems Architecture. Mason: Cengage Learning.
Ferrucci, D., Brown, E., Chu-Carroll, J., Fan, J., Gondek, D., Kalyanpur, A. A., . . . Welty, C. (2010). Building Watson: An overview of the DeepQA project. AI Magazine, 31(3), 59-79.
Get more done with the most versatile Linux platform. (2013, September 07). Retrieved from SUSE.com: https://www.suse.com/products/server/features/server-os.html
Heaton, J. (2011, March 12). The Free and Open Software Behind IBM’s Jeopardy Champion Watson. Retrieved from Heaton Research: http://www.heatonresearch.com/content/free-and-open-software-behind-ibm%E2%80%99s-jeopardy-champion-watson
IBM Watson runs SUSE Linux Enterprise Server. (2013, September 09). Retrieved from SUSE.com: https://www.suse.com/promo/ibm-watson.html
King, L. (2011, February 11). LizKing. Retrieved from Juniper.net: http://forums.juniper.net/t5/reTHINKing-the-Network/What-is-The-IBM-Watson-Supercomputer/ba-p/76364
Mearian, L. (2011, February 21). Can anyone afford an IBM Watson supercomputer? (Yes). Retrieved from ComputerWorld.com: http://www.computerworld.com/s/article/9210381/Can_anyone_afford_an_IBM_Watson_supercomputer_Yes_?taxonomyId=67&pageNumber=2
Mellor, C. (2010, February 10). SONAS offering from IBM covers much ground. Retrieved from TheRegister.co.uk: http://www.theregister.co.uk/2010/02/10/ibm_sonas/
Pearson, T. (2011, February 18). Inside System Storage. Retrieved from IBM.com: https://www.ibm.com/developerworks/community/blogs/InsideSystemStorage/entry/ibm_watson_how_to_build_your_own_watson_jr_in_your_basement7?lang=en
Webster, J. (2011, February 15). What IBM’s Watson says to storage systems developers. Retrieved from CNET.com: http://news.cnet.com/8301-21546_3-20032014-10253464.html
Welcome to the Apache UIMA project. (2013, September 07). Retrieved from Apache.org: http://uima.apache.org/