The Algorithm Design Manual
Second Edition
Steven S. Skiena
Steven S. Skiena
Department of Computer Science
State University of New York at Stony Brook
New York, USA
skiena@cs.sunysb.edu
ISBN: 978-1-84800-069-8 e-ISBN: 978-1-84800-070-4 DOI: 10.1007/978-1-84800-070-4
British Library Cataloguing in Publication Data A catalogue record for this book is available from the British Library
Library of Congress Control Number: 2008931136
© Springer-Verlag London Limited 2008, Corrected printing 2012
Apart from any fair dealing for the purposes of research or private study, or criticism or review, as permitted under the Copyright, Designs and Patents Act 1988, this publication may only be reproduced, stored or transmitted, in any form or by any means, with the prior permission in writing of the publishers, or in the case of reprographic reproduction in accordance with the terms of licenses issued by the Copyright Licensing Agency. Enquiries concerning reproduction outside those terms should be sent to the publishers.
The use of registered names, trademarks, etc., in this publication does not imply, even in the absence of a specific statement, that such names are exempt from the relevant laws and regulations and therefore free for general use.
The publisher makes no representation, express or implied, with regard to the accuracy of the information contained in this book and cannot accept any legal responsibility or liability for any errors or omissions that may be made.
Printed on acid-free paper
Springer Science+Business Media springer.com
Preface
Most professional programmers that I’ve encountered are not well prepared to tackle algorithm design problems. This is a pity, because the techniques of algorithm design form one of the core practical technologies of computer science. Designing correct, efficient, and implementable algorithms for real-world problems requires access to two distinct bodies of knowledge:
• Techniques – Good algorithm designers understand several fundamental algorithm design techniques, including data structures, dynamic programming, depth-first search, backtracking, and heuristics. Perhaps the single most important design technique is modeling, the art of abstracting a messy real-world application into a clean problem suitable for algorithmic attack.
• Resources – Good algorithm designers stand on the shoulders of giants. Rather than laboring from scratch to produce a new algorithm for every task, they can figure out what is known about a particular problem. Rather than re-implementing popular algorithms from scratch, they seek existing implementations to serve as a starting point. They are familiar with many classic algorithmic problems, which provide sufficient source material to model most any application.
This book is intended as a manual on algorithm design, providing access to combinatorial algorithm technology for both students and computer professionals. It is divided into two parts: Techniques and Resources. The former is a general guide to techniques for the design and analysis of computer algorithms. The Resources section is intended for browsing and reference, and comprises the catalog of algorithmic resources, implementations, and an extensive bibliography.
To the Reader
I have been gratified by the warm reception the first edition of The Algorithm Design Manual has received since its initial publication in 1997. It has been recognized as a unique guide to using algorithmic techniques to solve problems that often arise in practice. But much has changed in the world since The Algorithm Design Manual was first published over ten years ago. Indeed, if we date the origins of modern algorithm design and analysis to about 1970, then roughly 30% of modern algorithmic history has happened since the first coming of The Algorithm Design Manual.

Three aspects of The Algorithm Design Manual have been particularly beloved: (1) the catalog of algorithmic problems, (2) the war stories, and (3) the electronic component of the book. These features have been preserved and strengthened in this edition:
• The Catalog of Algorithmic Problems – Since finding out what is known about an algorithmic problem can be a difficult task, I provide a catalog of the 75 most important problems arising in practice. By browsing through this catalog, the student or practitioner can quickly identify what their problem is called, what is known about it, and how they should proceed to solve it. To aid in problem identification, we include a pair of “before” and “after” pictures for each problem, illustrating the required input and output specifications. One perceptive reviewer called my book “The Hitchhiker’s Guide to Algorithms” on the strength of this catalog.
The catalog is the most important part of this book. To update the catalog for this edition, I have solicited feedback from the world’s leading experts on each associated problem. Particular attention has been paid to updating the discussion of available software implementations for each problem.
• War Stories – In practice, algorithm problems do not arise at the beginning of a large project. Rather, they typically arise as subproblems when it becomes clear that the programmer does not know how to proceed or that the current solution is inadequate.
To provide a better perspective on how algorithm problems arise in the real world, we include a collection of “war stories,” or tales from our experience with real problems. The moral of these stories is that algorithm design and analysis is not just theory, but an important tool to be pulled out and used as needed.
This edition retains all the original war stories (with updates as appropriate) plus additional new war stories covering external sorting, graph algorithms, simulated annealing, and other topics.
• Electronic Component – Since the practical person is usually looking for a program more than an algorithm, we provide pointers to solid implementations whenever they are available. We have collected these implementations
at one central website (http://www.cs.sunysb.edu/∼algorith) for easy retrieval. We have been the number one “Algorithm” site on Google pretty much since the initial publication of the book.
Further, we provide recommendations to make it easier to identify the correct code for the job. With these implementations available, the critical issue in algorithm design becomes properly modeling your application, more so than becoming intimate with the details of the actual algorithm. This focus permeates the entire book.
Equally important is what we do not do in this book. We do not stress the mathematical analysis of algorithms, leaving most of the analysis as informal arguments. You will not find a single theorem anywhere in this book. When more details are needed, the reader should study the cited programs or references. The goal of this manual is to get you going in the right direction as quickly as possible.
To the Instructor
This book covers enough material for a standard Introduction to Algorithms course. We assume the reader has completed the equivalent of a second programming course, typically titled Data Structures or Computer Science II. A full set of lecture slides for teaching this course is available online at http://www.algorist.com. Further, I make available online audio and video lectures using these slides to teach a full-semester algorithm course. Let me help teach your course, by the magic of the Internet!

This book stresses design over analysis. It is suitable for both traditional lecture courses and the new “active learning” method, where the professor does not lecture but instead guides student groups to solve real problems. The “war stories” provide an appropriate introduction to the active learning method.

I have made several pedagogical improvements throughout the book. Textbook-oriented features include:
• More Leisurely Discussion – The tutorial material in the first part of the book has been doubled over the previous edition. The pages have been devoted to more thorough and careful exposition of fundamental material, instead of adding more specialized topics.
• False Starts – Algorithms textbooks generally present important algorithms as a fait accompli, obscuring the ideas involved in designing them and the subtle reasons why other approaches fail. The war stories illustrate such development on certain applied problems, but I have expanded such coverage into classical algorithm design material as well.
• Stop and Think – Here I illustrate my thought process as I solve a topic-specific homework problem—false starts and all. I have interspersed such
problem blocks throughout the text to increase the problem-solving activity of my readers. Answers appear immediately following each problem.
• More and Improved Homework Problems – This edition of The Algorithm Design Manual has twice as many homework exercises as the previous one. Exercises that proved confusing or ambiguous have been improved or replaced. Degree of difficulty ratings (from 1 to 10) have been assigned to all problems.
• Self-Motivating Exam Design – In my algorithms courses, I promise the students that all midterm and final exam questions will be taken directly from homework problems in this book. This provides a “student-motivated exam,” so students know exactly how to study to do well on the exam. I have carefully picked the quantity, variety, and difficulty of homework exercises to make this work; ensuring there are neither too few nor too many candidate problems.
• Take-Home Lessons – Highlighted “take-home” lesson boxes scattered throughout the text emphasize the big-picture concepts to be gained from the chapter.
• Links to Programming Challenge Problems – Each chapter’s exercises will contain links to 3-5 relevant “Programming Challenge” problems from http://www.programming-challenges.com. These can be used to add a programming component to paper-and-pencil algorithms courses.
• More Code, Less Pseudo-code – More algorithms in this book appear as code (written in C) instead of pseudo-code. I believe the concreteness and reliability of actual tested implementations provides a big win over less formal presentations for simple algorithms. Full implementations are available for study at http://www.algorist.com.
• Chapter Notes – Each tutorial chapter concludes with a brief notes section, pointing readers to primary sources and additional references.
Acknowledgments
Updating a book dedication after ten years focuses attention on the effects of time. Since the first edition, Renee has become my wife and then the mother of our two children, Bonnie and Abby. My father has left this world, but Mom and my brothers Len and Rob remain a vital presence in my life. I dedicate this book to my family, new and old, here and departed.

I would like to thank several people for their concrete contributions to this new edition. Andrew Gaun and Betson Thomas helped in many ways, particularly in building the infrastructure for the new http://www.cs.sunysb.edu/∼algorith and dealing with a variety of manuscript preparation issues. David Gries offered valuable feedback well beyond the call of duty. Himanshu Gupta and Bin Tang bravely
taught courses using a manuscript version of this edition. Thanks also to my Springer-Verlag editors, Wayne Wheeler and Allan Wylde. A select group of algorithmic sages reviewed sections of the Hitchhiker’s guide, sharing their wisdom and alerting me to new developments. Thanks to:
Ami Amir, Herve Bronnimann, Bernard Chazelle, Chris Chu, Scott Cotton, Yefim Dinitz, Komei Fukuda, Michael Goodrich, Lenny Heath, Cihat Imamoglu, Tao Jiang, David Karger, Giuseppe Liotta, Albert Mao, Silvano Martello, Catherine McGeoch, Kurt Mehlhorn, Scott A. Mitchell, Naceur Meskini, Gene Myers, Gonzalo Navarro, Stephen North, Joe O’Rourke, Mike Paterson, Theo Pavlidis, Seth Pettie, Michel Pocchiola, Bart Preneel, Tomasz Radzik, Edward Reingold, Frank Ruskey, Peter Sanders, Joao Setubal, Jonathan Shewchuk, Robert Skeel, Jens Stoye, Torsten Suel, Bruce Watson, and Uri Zwick.
Several exercises were originated by colleagues or inspired by other texts. Reconstructing the original sources years later can be challenging, but credits for each problem (to the best of my recollection) appear on the website.

It would be rude not to thank important contributors to the original edition. Ricky Bradley and Dario Vlah built up the substantial infrastructure required for the WWW site in a logical and extensible manner. Zhong Li drew most of the catalog figures using xfig. Richard Crandall, Ron Danielson, Takis Metaxas, Dave Miller, Giri Narasimhan, and Joe Zachary all reviewed preliminary versions of the first edition; their thoughtful feedback helped to shape what you see here.

Much of what I know about algorithms I learned along with my graduate students. Several of them (Yaw-Ling Lin, Sundaram Gopalakrishnan, Ting Chen, Francine Evans, Harald Rau, Ricky Bradley, and Dimitris Margaritis) are the real heroes of the war stories related within. My Stony Brook friends and algorithm colleagues Estie Arkin, Michael Bender, Jie Gao, and Joe Mitchell have always been a pleasure to work and be with. Finally, thanks to Michael Brochstein and the rest of the city contingent for revealing a proper life well beyond Stony Brook.
Caveat
It is traditional for the author to magnanimously accept the blame for whatever deficiencies remain. I don’t. Any errors, deficiencies, or problems in this book are somebody else’s fault, but I would appreciate knowing about them so as to deter- mine who is to blame.
Steven S. Skiena
Department of Computer Science
Stony Brook University
Stony Brook, NY 11794-4400
http://www.cs.sunysb.edu/∼skiena
April 2008
Contents
I Practical Algorithm Design 1
1 Introduction to Algorithm Design 3
1.1 Robot Tour Optimization . . . . . . . . . . . . . . . . . . . . . . 5
1.2 Selecting the Right Jobs . . . . . . . . . . . . . . . . . . . . . . . 9
1.3 Reasoning about Correctness . . . . . . . . . . . . . . . . . . . . 11
1.4 Modeling the Problem . . . . . . . . . . . . . . . . . . . . . . . . 19
1.5 About the War Stories . . . . . . . . . . . . . . . . . . . . . . . . 22
1.6 War Story: Psychic Modeling . . . . . . . . . . . . . . . . . . . . 23
1.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Algorithm Analysis 31
2.1 The RAM Model of Computation . . . . . . . . . . . . . . . . . . 31
2.2 The Big Oh Notation . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.3 Growth Rates and Dominance Relations . . . . . . . . . . . . . . 37
2.4 Working with the Big Oh . . . . . . . . . . . . . . . . . . . . . . 40
2.5 Reasoning About Efficiency . . . . . . . . . . . . . . . . . . . . . 41
2.6 Logarithms and Their Applications . . . . . . . . . . . . . . . . . 46
2.7 Properties of Logarithms . . . . . . . . . . . . . . . . . . . . . . . 50
2.8 War Story: Mystery of the Pyramids . . . . . . . . . . . . . . . . 51
2.9 Advanced Analysis (*) . . . . . . . . . . . . . . . . . . . . . . . . 54
2.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3 Data Structures 65
3.1 Contiguous vs. Linked Data Structures . . . . . . . . . . . . . . . 66
3.2 Stacks and Queues . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.3 Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 72
3.4 Binary Search Trees . . . . . . . . . . . . . . . . . . . . . . . . . . 77
3.5 Priority Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 83
3.6 War Story: Stripping Triangulations . . . . . . . . . . . . . . . . 85
3.7 Hashing and Strings . . . . . . . . . . . . . . . . . . . . . . . . . 89
3.8 Specialized Data Structures . . . . . . . . . . . . . . . . . . . . . 93
3.9 War Story: String ’em Up . . . . . . . . . . . . . . . . . . . . . . 94
3.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4 Sorting and Searching 103
4.1 Applications of Sorting . . . . . . . . . . . . . . . . . . . . . . . . 104
4.2 Pragmatics of Sorting . . . . . . . . . . . . . . . . . . . . . . . . . 107
4.3 Heapsort: Fast Sorting via Data Structures . . . . . . . . . . . . 108
4.4 War Story: Give me a Ticket on an Airplane . . . . . . . . . . . 118
4.5 Mergesort: Sorting by Divide-and-Conquer . . . . . . . . . . . . 120
4.6 Quicksort: Sorting by Randomization . . . . . . . . . . . . . . . 123
4.7 Distribution Sort: Sorting via Bucketing . . . . . . . . . . . . . . 129
4.8 War Story: Skiena for the Defense . . . . . . . . . . . . . . . . . 131
4.9 Binary Search and Related Algorithms . . . . . . . . . . . . . . . 132
4.10 Divide-and-Conquer . . . . . . . . . . . . . . . . . . . . . . . . . . 135
4.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
5 Graph Traversal 145
5.1 Flavors of Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . . 146
5.2 Data Structures for Graphs . . . . . . . . . . . . . . . . . . . . . 151
5.3 War Story: I was a Victim of Moore’s Law . . . . . . . . . . . . 155
5.4 War Story: Getting the Graph . . . . . . . . . . . . . . . . . . . . 158
5.5 Traversing a Graph . . . . . . . . . . . . . . . . . . . . . . . . . . 161
5.6 Breadth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . 162
5.7 Applications of Breadth-First Search . . . . . . . . . . . . . . . . 166
5.8 Depth-First Search . . . . . . . . . . . . . . . . . . . . . . . . . . 169
5.9 Applications of Depth-First Search . . . . . . . . . . . . . . . . . 172
5.10 Depth-First Search on Directed Graphs . . . . . . . . . . . . . . 178
5.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 184
6 Weighted Graph Algorithms 191
6.1 Minimum Spanning Trees . . . . . . . . . . . . . . . . . . . . . . 192
6.2 War Story: Nothing but Nets . . . . . . . . . . . . . . . . . . . . 202
6.3 Shortest Paths . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 205
6.4 War Story: Dialing for Documents . . . . . . . . . . . . . . . . . 212
6.5 Network Flows and Bipartite Matching . . . . . . . . . . . . . . 217
6.6 Design Graphs, Not Algorithms . . . . . . . . . . . . . . . . . . . 222
6.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
7 Combinatorial Search and Heuristic Methods 230
7.1 Backtracking . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 231
7.2 Search Pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . . 238
7.3 Sudoku . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 239
7.4 War Story: Covering Chessboards . . . . . . . . . . . . . . . . . . 244
7.5 Heuristic Search Methods . . . . . . . . . . . . . . . . . . . . . . 247
7.6 War Story: Only it is Not a Radio . . . . . . . . . . . . . . . . . 260
7.7 War Story: Annealing Arrays . . . . . . . . . . . . . . . . . . . . 263
7.8 Other Heuristic Search Methods . . . . . . . . . . . . . . . . . . 266
7.9 Parallel Algorithms . . . . . . . . . . . . . . . . . . . . . . . . . . 267
7.10 War Story: Going Nowhere Fast . . . . . . . . . . . . . . . . . . . 268
7.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 270
8 Dynamic Programming 273
8.1 Caching vs. Computation . . . . . . . . . . . . . . . . . . . . . . 274
8.2 Approximate String Matching . . . . . . . . . . . . . . . . . . . . 280
8.3 Longest Increasing Sequence . . . . . . . . . . . . . . . . . . . . . 289
8.4 War Story: Evolution of the Lobster . . . . . . . . . . . . . . . . 291
8.5 The Partition Problem . . . . . . . . . . . . . . . . . . . . . . . . 294
8.6 Parsing Context-Free Grammars . . . . . . . . . . . . . . . . . . 298
8.7 Limitations of Dynamic Programming: TSP . . . . . . . . . . . . 301
8.8 War Story: What’s Past is Prolog . . . . . . . . . . . . . . . . . . 304
8.9 War Story: Text Compression for Bar Codes . . . . . . . . . . . 307
8.10 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 310
9 Intractable Problems and Approximation Algorithms 316
9.1 Problems and Reductions . . . . . . . . . . . . . . . . . . . . . . 317
9.2 Reductions for Algorithms . . . . . . . . . . . . . . . . . . . . . . 319
9.3 Elementary Hardness Reductions . . . . . . . . . . . . . . . . . . 323
9.4 Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 328
9.5 Creative Reductions . . . . . . . . . . . . . . . . . . . . . . . . . 330
9.6 The Art of Proving Hardness . . . . . . . . . . . . . . . . . . . . 334
9.7 War Story: Hard Against the Clock . . . . . . . . . . . . . . . . . 337
9.8 War Story: And Then I Failed . . . . . . . . . . . . . . . . . . . . 339
9.9 P vs. NP . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 341
9.10 Dealing with NP-complete Problems . . . . . . . . . . . . . . . . 344
9.11 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 350
10 How to Design Algorithms 356
II The Hitchhiker’s Guide to Algorithms 361
11 A Catalog of Algorithmic Problems 363
12 Data Structures 366
12.1 Dictionaries . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 367
12.2 Priority Queues . . . . . . . . . . . . . . . . . . . . . . . . . . . . 373
12.3 Suffix Trees and Arrays . . . . . . . . . . . . . . . . . . . . . . . . 377
12.4 Graph Data Structures . . . . . . . . . . . . . . . . . . . . . . . . 381
12.5 Set Data Structures . . . . . . . . . . . . . . . . . . . . . . . . . . 385
12.6 Kd-Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 389
13 Numerical Problems 393
13.1 Solving Linear Equations . . . . . . . . . . . . . . . . . . . . . . . 395
13.2 Bandwidth Reduction . . . . . . . . . . . . . . . . . . . . . . . . 398
13.3 Matrix Multiplication . . . . . . . . . . . . . . . . . . . . . . . . . 401
13.4 Determinants and Permanents . . . . . . . . . . . . . . . . . . . . 404
13.5 Constrained and Unconstrained Optimization . . . . . . . . . . . 407
13.6 Linear Programming . . . . . . . . . . . . . . . . . . . . . . . . . 411
13.7 Random Number Generation . . . . . . . . . . . . . . . . . . . . 415
13.8 Factoring and Primality Testing . . . . . . . . . . . . . . . . . . . 420
13.9 Arbitrary-Precision Arithmetic . . . . . . . . . . . . . . . . . . . 423
13.10 Knapsack Problem . . . . . . . . . . . . . . . . . . . . . . . . . . 427
13.11 Discrete Fourier Transform . . . . . . . . . . . . . . . . . . . . . 431
14 Combinatorial Problems 434
14.1 Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 436
14.2 Searching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 441
14.3 Median and Selection . . . . . . . . . . . . . . . . . . . . . . . . . 445
14.4 Generating Permutations . . . . . . . . . . . . . . . . . . . . . . . 448
14.5 Generating Subsets . . . . . . . . . . . . . . . . . . . . . . . . . . 452
14.6 Generating Partitions . . . . . . . . . . . . . . . . . . . . . . . . . 456
14.7 Generating Graphs . . . . . . . . . . . . . . . . . . . . . . . . . . 460
14.8 Calendrical Calculations . . . . . . . . . . . . . . . . . . . . . . . 465
14.9 Job Scheduling . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 468
14.10 Satisfiability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 472
15 Graph Problems: Polynomial-Time 475
15.1 Connected Components . . . . . . . . . . . . . . . . . . . . . . . 477
15.2 Topological Sorting . . . . . . . . . . . . . . . . . . . . . . . . . . 481
15.3 Minimum Spanning Tree . . . . . . . . . . . . . . . . . . . . . . . 484
15.4 Shortest Path . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 489
15.5 Transitive Closure and Reduction . . . . . . . . . . . . . . . . . . 495
15.6 Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 498
15.7 Eulerian Cycle/Chinese Postman . . . . . . . . . . . . . . . . . . 502
15.8 Edge and Vertex Connectivity . . . . . . . . . . . . . . . . . . . . 505
15.9 Network Flow . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 509
15.10 Drawing Graphs Nicely . . . . . . . . . . . . . . . . . . . . . . . . 513
15.11 Drawing Trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 517
15.12 Planarity Detection and Embedding . . . . . . . . . . . . . . . . 520
16 Graph Problems: Hard Problems 523
16.1 Clique . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 525
16.2 Independent Set . . . . . . . . . . . . . . . . . . . . . . . . . . . . 528
16.3 Vertex Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 530
16.4 Traveling Salesman Problem . . . . . . . . . . . . . . . . . . . . . 533
16.5 Hamiltonian Cycle . . . . . . . . . . . . . . . . . . . . . . . . . . 538
16.6 Graph Partition . . . . . . . . . . . . . . . . . . . . . . . . . . . . 541
16.7 Vertex Coloring . . . . . . . . . . . . . . . . . . . . . . . . . . . . 544
16.8 Edge Coloring . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 548
16.9 Graph Isomorphism . . . . . . . . . . . . . . . . . . . . . . . . . . 550
16.10 Steiner Tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 555
16.11 Feedback Edge/Vertex Set . . . . . . . . . . . . . . . . . . . . . . 559
17 Computational Geometry 562
17.1 Robust Geometric Primitives . . . . . . . . . . . . . . . . . . . . 564
17.2 Convex Hull . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 568
17.3 Triangulation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 572
17.4 Voronoi Diagrams . . . . . . . . . . . . . . . . . . . . . . . . . . . 576
17.5 Nearest Neighbor Search . . . . . . . . . . . . . . . . . . . . . . . 580
17.6 Range Search . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 584
17.7 Point Location . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 587
17.8 Intersection Detection . . . . . . . . . . . . . . . . . . . . . . . . 591
17.9 Bin Packing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 595
17.10 Medial-Axis Transform . . . . . . . . . . . . . . . . . . . . . . . . 598
17.11 Polygon Partitioning . . . . . . . . . . . . . . . . . . . . . . . . . 601
17.12 Simplifying Polygons . . . . . . . . . . . . . . . . . . . . . . . . . 604
17.13 Shape Similarity . . . . . . . . . . . . . . . . . . . . . . . . . . . . 607
17.14 Motion Planning . . . . . . . . . . . . . . . . . . . . . . . . . . . 610
17.15 Maintaining Line Arrangements . . . . . . . . . . . . . . . . . . . 614
17.16 Minkowski Sum . . . . . . . . . . . . . . . . . . . . . . . . . . . . 617
18 Set and String Problems 620
18.1 Set Cover . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 621
18.2 Set Packing . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 625
18.3 String Matching . . . . . . . . . . . . . . . . . . . . . . . . . . . . 628
18.4 Approximate String Matching . . . . . . . . . . . . . . . . . . . . 631
18.5 Text Compression . . . . . . . . . . . . . . . . . . . . . . . . . . . 637
18.6 Cryptography . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 641
18.7 Finite State Machine Minimization . . . . . . . . . . . . . . . . . 646
18.8 Longest Common Substring/Subsequence . . . . . . . . . . . . . 650
18.9 Shortest Common Superstring . . . . . . . . . . . . . . . . . . . . 654
19 Algorithmic Resources 657
19.1 Software Systems . . . . . . . . . . . . . . . . . . . . . . . . . . . 657
19.2 Data Sources . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 663
19.3 Online Bibliographic Resources . . . . . . . . . . . . . . . . . . . 663
19.4 Professional Consulting Services . . . . . . . . . . . . . . . . . . . 664
Bibliography 665
Index 709
Part I
Practical Algorithm Design
1
Introduction to Algorithm Design
What is an algorithm? An algorithm is a procedure to accomplish a specific task. An algorithm is the idea behind any reasonable computer program. To be interesting, an algorithm must solve a general, well-specified problem. An algorithmic problem is specified by describing the complete set of instances it must work on and of its output after running on one of these instances. This distinction, between a problem and an instance of a problem, is fundamental. For example, the algorithmic problem known as sorting is defined as follows:
Problem: Sorting
Input: A sequence of n keys $a_1, \ldots, a_n$.
Output: The permutation (reordering) of the input sequence such that $a'_1 \le a'_2 \le \cdots \le a'_{n-1} \le a'_n$.
An instance of sorting might be an array of names, like {Mike, Bob, Sally, Jill, Jan}, or a list of numbers like {154, 245, 568, 324, 654, 324}. Determining that you are dealing with a general problem is your first step towards solving it. An algorithm is a procedure that takes any of the possible input instances and transforms it to the desired output. There are many different algorithms for solving the problem of sorting. For example, insertion sort is a method for sorting that starts with a single element (thus forming a trivially sorted list) and then incrementally inserts the remaining elements so that the list stays sorted. This algorithm, implemented in C, is described below:
Figure 1.1: Animation of insertion sort in action (time flows down)
void insertion_sort(item s[], int n)
{
    int i, j;        /* counters */

    for (i = 1; i < n; i++) {
        j = i;
        while ((j > 0) && (s[j] < s[j-1])) {
            swap(&s[j], &s[j-1]);
            j = j - 1;
        }
    }
}
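The routine above leans on an item type and a swap helper defined elsewhere in the book’s code base. As a hedged sketch (assuming integer keys and a hypothetical three-line swap, neither of which is specified here), a self-contained program exercising it might look like this:

#include <stdio.h>

typedef int item;                    /* assume integer keys for this sketch */

void swap(item *a, item *b)          /* exchange two items in place */
{
    item t = *a;
    *a = *b;
    *b = t;
}

void insertion_sort(item s[], int n)
{
    int i, j;                        /* counters */

    for (i = 1; i < n; i++) {
        j = i;
        while ((j > 0) && (s[j] < s[j-1])) {
            swap(&s[j], &s[j-1]);
            j = j - 1;
        }
    }
}

int main(void)
{
    item a[] = {154, 245, 568, 324, 654, 324};     /* the list from the text */
    int i, n = sizeof(a) / sizeof(a[0]);

    insertion_sort(a, n);
    for (i = 0; i < n; i++) printf("%d ", a[i]);   /* 154 245 324 324 568 654 */
    printf("\n");
    return 0;
}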
An animation of the logical flow of this algorithm on a particular instance (the letters in the word “INSERTIONSORT”) is given in Figure 1.1.
Note the generality of this algorithm. It works just as well on names as it does on numbers, given the appropriate comparison operation (<) to test which of the two keys should appear first in sorted order. It can be readily verified that this algorithm correctly orders every possible input instance according to our definition of the sorting problem. There are three desirable properties for a good algorithm. We seek algorithms that are correct and efficient, while being easy to implement. These goals may not be simultaneously achievable. In industrial settings, any program that seems to give good enough answers without slowing the application down is often acceptable, regardless of whether a better algorithm exists. The issue of finding the best possible answer or achieving maximum efficiency usually arises in industry only after serious performance or legal troubles. In this chapter, we will focus on the issues of algorithm correctness, and defer a discussion of efficiency concerns to Chapter 2. It is seldom obvious whether a given
Figure 1.2: A good instance for the nearest-neighbor heuristic
algorithm correctly solves a given problem. Correct algorithms usually come with a proof of correctness, which is an explanation of why we know that the algorithm must take every instance of the problem to the desired result. However, before we go further we demonstrate why “it’s obvious” never suffices as a proof of correctness, and is usually flat-out wrong.
1.1 Robot Tour Optimization
Let’s consider a problem that arises often in manufacturing, transportation, and testing applications. Suppose we are given a robot arm equipped with a tool, say a soldering iron. In manufacturing circuit boards, all the chips and other components must be fastened onto the substrate. More specifically, each chip has a set of contact points (or wires) that must be soldered to the board. To program the robot arm for this job, we must first construct an ordering of the contact points so the robot visits (and solders) the first contact point, then the second point, third, and so forth until the job is done. The robot arm then proceeds back to the first contact point to prepare for the next board, thus turning the tool-path into a closed tour, or cycle. Robots are expensive devices, so we want the tour that minimizes the time it takes to assemble the circuit board. A reasonable assumption is that the robot arm moves with fixed speed, so the time to travel between two points is proportional to their distance. In short, we must solve the following algorithm problem:
Problem: Robot Tour Optimization
Input: A set S of n points in the plane.
Output: What is the shortest cycle tour that visits each point in the set S?
You are given the job of programming the robot arm. Stop right now and think up an algorithm to solve this problem. I’ll be happy to wait until you find one. . .
6 1. INTRODUCTION TO ALGORITHM DESIGN
Several algorithms might come to mind to solve this problem. Perhaps the most popular idea is the nearest-neighbor heuristic. Starting from some point p0, we walk first to its nearest neighbor p1. From p1, we walk to its nearest unvisited neighbor, thus excluding only p0 as a candidate. We now repeat this process until we run out of unvisited points, after which we return to p0 to close off the tour. Written in pseudo-code, the nearest-neighbor heuristic looks like this:
NearestNeighbor(P)
    Pick and visit an initial point p0 from P
    p = p0
    i = 0
    While there are still unvisited points
        i = i + 1
        Select pi to be the closest unvisited point to pi−1
        Visit pi
    Return to p0 from pn−1
This algorithm has a lot to recommend it. It is simple to understand and implement. It makes sense to visit nearby points before we visit faraway points to reduce the total travel time. The algorithm works perfectly on the example in Figure 1.2. The nearest-neighbor rule is reasonably efficient, for it looks at each pair of points (pi, pj) at most twice: once when adding pi to the tour, the other when adding pj. Against all these positives there is only one problem. This algorithm is completely wrong.

Wrong? How can it be wrong? The algorithm always finds a tour, but it doesn’t necessarily find the shortest possible tour. It doesn’t necessarily even come close. Consider the set of points in Figure 1.3, all of which lie spaced along a line. The numbers describe the distance that each point lies to the left or right of the point labeled ‘0’. When we start from the point ‘0’ and repeatedly walk to the nearest unvisited neighbor, we might keep jumping left-right-left-right over ‘0’ as the algorithm offers no advice on how to break ties. A much better (indeed optimal) tour for these points starts from the leftmost point and visits each point as we walk right before returning at the rightmost point. Try now to imagine your boss’s delight as she watches a demo of your robot arm hopscotching left-right-left-right during the assembly of such a simple board.

“But wait,” you might be saying. “The problem was in starting at point ‘0’. Instead, why don’t we start the nearest-neighbor rule using the leftmost point as the initial point p0? By doing this, we will find the optimal solution on this instance.”

That is 100% true, at least until we rotate our example 90 degrees. Now all points are equally leftmost. If the point ‘0’ were moved just slightly to the left, it would be picked as the starting point. Now the robot arm will hopscotch up-down-up-down instead of left-right-left-right, but the travel time will be just as bad as before. No matter what you do to pick the first point, the nearest-neighbor rule is doomed to work incorrectly on certain point sets.
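To see the heuristic’s behavior concretely, here is one way it might be coded in C. Everything about the representation is an assumption of this sketch (parallel coordinate arrays, squared-distance comparisons, ties broken by array order, at most 1000 points); the driver runs it on the line instance of Figure 1.3.

#include <stdio.h>
#include <stdbool.h>

/* squared Euclidean distance between points i and j */
static double dist2(const double x[], const double y[], int i, int j)
{
    double dx = x[i] - x[j], dy = y[i] - y[j];
    return dx * dx + dy * dy;
}

/* Fill tour[0..n-1] with a visiting order: start at `start`, then greedily
   move to the closest unvisited point at every step. */
void nearest_neighbor(const double x[], const double y[], int n,
                      int start, int tour[])
{
    bool visited[1000] = {false};     /* assumes n <= 1000 for this sketch */
    int i, step;

    tour[0] = start;
    visited[start] = true;
    for (step = 1; step < n; step++) {
        int prev = tour[step - 1], best = -1;
        for (i = 0; i < n; i++)
            if (!visited[i] && (best == -1 ||
                dist2(x, y, prev, i) < dist2(x, y, prev, best)))
                best = i;
        tour[step] = best;
        visited[best] = true;
    }
    /* the robot then returns from tour[n-1] to tour[0] to close the cycle */
}

int main(void)
{
    double x[] = {0, -1, 1, 3, 11, -21, -5};   /* the line instance of Figure 1.3 */
    double y[] = {0,  0, 0, 0,  0,   0,  0};
    int tour[7], i;

    nearest_neighbor(x, y, 7, 0, tour);
    for (i = 0; i < 7; i++)
        printf("%g ", x[tour[i]]);             /* 0 -1 1 3 11 -5 -21 */
    printf("\n");                              /* not the left-to-right sweep */
    return 0;
}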
Figure 1.3: A bad instance for the nearest-neighbor heuristic, with the optimal solution (the points lie on a line at −21, −5, −1, 0, 1, 3, and 11)
Maybe what we need is a different approach. Always walking to the closest point is too restrictive, since it seems to trap us into making moves we didn’t want. A different idea might be to repeatedly connect the closest pair of endpoints whose connection will not create a problem, such as premature termination of the cycle. Each vertex begins as its own single vertex chain. After merging everything together, we will end up with a single chain containing all the points in it. Connecting the final two endpoints gives us a cycle. At any step during the execution of this closest-pair heuristic, we will have a set of single vertices and vertex-disjoint chains available to merge. In pseudocode:
ClosestPair(P)
    Let n be the number of points in set P.
    For i = 1 to n − 1 do
        d = ∞
        For each pair of endpoints (s, t) from distinct vertex chains
            if dist(s, t) ≤ d then sm = s, tm = t, and d = dist(s, t)
        Connect (sm, tm) by an edge
    Connect the two endpoints by an edge
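To make the bookkeeping concrete, here is one possible C rendering of this heuristic. It is a sketch under assumptions not stated in the text: points live in parallel coordinate arrays, chains are tracked with a small union-find structure, squared distances are compared, and n is at least three and at most a fixed limit.

#define MAXN 1000                     /* sketch limit on the number of points */

static int parent[MAXN];              /* union-find: which chain a point is in */

static int find(int i)                /* find a chain's representative */
{
    while (parent[i] != i)
        i = parent[i] = parent[parent[i]];
    return i;
}

static double dist2(const double x[], const double y[], int i, int j)
{
    double dx = x[i] - x[j], dy = y[i] - y[j];
    return dx * dx + dy * dy;         /* squared distance is enough to compare */
}

/* Build a cycle on points 0..n-1; edge k runs between from[k] and to[k]. */
void closest_pair_tour(const double x[], const double y[], int n,
                       int from[], int to[])
{
    int degree[MAXN] = {0};           /* tour edges already incident (0..2) */
    int i, s, t, k, e, ends[2];

    for (i = 0; i < n; i++) parent[i] = i;

    for (k = 0; k < n - 1; k++) {     /* merge chains n-1 times */
        int sm = -1, tm = -1;
        double best = -1.0;
        for (s = 0; s < n; s++)
            for (t = s + 1; t < n; t++)
                if (degree[s] < 2 && degree[t] < 2 && find(s) != find(t) &&
                    (best < 0.0 || dist2(x, y, s, t) < best)) {
                    best = dist2(x, y, s, t);
                    sm = s;
                    tm = t;
                }
        from[k] = sm;
        to[k] = tm;
        degree[sm]++;
        degree[tm]++;
        parent[find(sm)] = find(tm);  /* the two chains become one */
    }

    e = 0;                            /* close the cycle with the last two endpoints */
    for (i = 0; i < n; i++)
        if (degree[i] < 2)
            ends[e++] = i;
    from[n - 1] = ends[0];
    to[n - 1] = ends[1];
}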
This closest-pair rule does the right thing in the example in Figure 1.3. It starts by connecting ‘0’ to its immediate neighbors, the points 1 and −1. Subsequently, the next closest pair will alternate left-right, growing the central path by one link at a time. The closest-pair heuristic is somewhat more complicated and less efficient than the previous one, but at least it gives the right answer in this example. But this is not true in all examples. Consider what this algorithm does on the point set in Figure 1.4(l). It consists of two rows of equally spaced points, with the rows slightly closer together (distance 1 − e) than the neighboring points are spaced within each row (distance 1 + e). Thus the closest pairs of points stretch across the gap, not around the boundary. After we pair off these points, the closest
Figure 1.4: A bad instance for the closest-pair heuristic, with the optimal solution
remaining pairs will connect these pairs alternately around the boundary. The total path length of the closest-pair tour is $3(1-e) + 2(1+e) + \sqrt{(1-e)^2 + (2+2e)^2}$. Compared to the tour shown in Figure 1.4(r), we travel over 20% farther than necessary when e ≈ 0. Examples exist where the penalty is considerably worse than this.

Thus this second algorithm is also wrong. Which one of these algorithms performs better? You can’t tell just by looking at them. Clearly, both heuristics can end up with very bad tours on very innocent-looking input.

At this point, you might wonder what a correct algorithm for our problem looks like. Well, we could try enumerating all possible orderings of the set of points, and then select the ordering that minimizes the total length:
OptimalTSP(P)
    d = ∞
    For each of the n! permutations Pi of point set P
        If (cost(Pi) ≤ d) then d = cost(Pi) and Pmin = Pi
    Return Pmin
Since all possible orderings are considered, we are guaranteed to end up with the shortest possible tour. This algorithm is correct, since we pick the best of all the possibilities. But it is also extremely slow. The fastest computer in the world couldn’t hope to enumerate all the 20! = 2,432,902,008,176,640,000 orderings of 20 points within a day. For real circuit boards, where n ≈ 1,000, forget about it. All of the world’s computers working full time wouldn’t come close to finishing the problem before the end of the universe, at which point it presumably becomes moot.

The quest for an efficient algorithm to solve this problem, called the traveling salesman problem (TSP), will take us through much of this book. If you need to know how the story ends, check out the catalog entry for the traveling salesman problem in Section 16.4 (page 533).
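For very small instances, this brute-force search can actually be run. The following C sketch (the point storage, cost function, and recursive permutation generator are illustrative assumptions, not code from the book) finds the optimal tour by trying every permutation; compile with -lm.

#include <stdio.h>
#include <math.h>
#include <float.h>

#define MAXN 12                              /* n! search is hopeless beyond this */

static double px[MAXN], py[MAXN];
static int n, perm[MAXN], best_perm[MAXN];
static double best_cost;

static double dist(int i, int j)
{
    double dx = px[i] - px[j], dy = py[i] - py[j];
    return sqrt(dx * dx + dy * dy);
}

static double tour_cost(void)                /* length of the closed tour perm[] */
{
    double c = 0.0;
    int i;
    for (i = 0; i < n; i++)
        c += dist(perm[i], perm[(i + 1) % n]);
    return c;
}

static void search(int k)                    /* try every ordering of positions k..n-1 */
{
    int i, t;
    if (k == n) {
        if (tour_cost() < best_cost) {
            best_cost = tour_cost();
            for (i = 0; i < n; i++) best_perm[i] = perm[i];
        }
        return;
    }
    for (i = k; i < n; i++) {
        t = perm[k]; perm[k] = perm[i]; perm[i] = t;    /* place point i next */
        search(k + 1);
        t = perm[k]; perm[k] = perm[i]; perm[i] = t;    /* undo the swap */
    }
}

int main(void)
{
    double xs[] = {-21, -5, -1, 0, 1, 3, 11};    /* the line instance of Figure 1.3 */
    int i;

    n = 7;
    best_cost = DBL_MAX;
    for (i = 0; i < n; i++) {
        px[i] = xs[i];
        py[i] = 0.0;
        perm[i] = i;
    }
    search(0);
    printf("optimal tour length = %g\n", best_cost);    /* 64 for this instance */
    for (i = 0; i < n; i++)
        printf("%g ", xs[best_perm[i]]);                /* -21 -5 -1 0 1 3 11 */
    printf("\n");
    return 0;
}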
Figure 1.5: An instance of the non-overlapping movie scheduling problem. The intervals shown are “The President’s Algorist,” Halting State, “Discrete” Mathematics, Calculated Bets, Programming Challenges, Steiner’s Tree, Process Terminated, Tarjan of the Jungle, and The Four Volume Problem.
Take-Home Lesson: There is a fundamental difference between algorithms, which always produce a correct result, and heuristics, which may usually do a good job but without providing any guarantee.
1.2 Selecting the Right Jobs
Now consider the following scheduling problem. Imagine you are a highly in-demand actor, who has been presented with offers to star in n different movie projects under development. Each offer comes specified with the first and last day of filming. To take the job, you must commit to being available throughout this entire period. Thus you cannot simultaneously accept two jobs whose intervals overlap.

For an artist such as yourself, the criterion for job acceptance is clear: you want to make as much money as possible. Because each of these films pays the same fee per film, this implies you seek the largest possible set of jobs (intervals) such that no two of them conflict with each other. For example, consider the available projects in Figure 1.5. We can star in at most four films, namely “Discrete” Mathematics, Programming Challenges, Calculated Bets, and one of either Halting State or Steiner’s Tree.

You (or your agent) must solve the following algorithmic scheduling problem:
Problem: Movie Scheduling Problem
Input: A set I of n intervals on the line.
Output: What is the largest subset of mutually non-overlapping intervals which can be selected from I?
You are given the job of developing a scheduling algorithm for this task. Stop right now and try to find one. Again, I’ll be happy to wait.
There are several ideas that may come to mind. One is based on the notion that it is best to work whenever work is available. This implies that you should start with the job with the earliest start date – after all, there is no other job you can work on then, at least during the beginning of this period.
Figure 1.6: Bad instances for the (l) earliest job first and (r) shortest job first heuristics.
EarliestJobFirst(I)
    Accept the earliest starting job j from I which does not overlap any
    previously accepted job, and repeat until no more such jobs remain.
This idea makes sense, at least until we realize that accepting the earliest job might block us from taking many other jobs if that first job is long. Check out Figure 1.6(l), where the epic “War and Peace” is both the first job available and long enough to kill off all other prospects. This bad example naturally suggests another idea. The problem with “War and Peace” is that it is too long. Perhaps we should start by taking the shortest job, and keep seeking the shortest available job at every turn. Maximizing the number of jobs we do in a given period is clearly connected to banging them out as quickly as possible. This yields the heuristic:
ShortestJobFirst(I)
    While (I ≠ ∅) do
        Accept the shortest possible job j from I.
        Delete j, and any interval which intersects j, from I.
Again this idea makes sense, at least until we realize that accepting the shortest job might block us from taking two other jobs, as shown in Figure 1.6(r). While the potential loss here seems smaller than with the previous heuristic, it can readily limit us to half the optimal payoff. At this point, an algorithm where we try all possibilities may start to look good, because we can be certain it is correct. If we ignore the details of testing whether a set of intervals are in fact disjoint, it looks something like this:
ExhaustiveScheduling(I)
    j = 0
    Smax = ∅
    For each of the 2^n subsets Si of intervals I
        If (Si is mutually non-overlapping) and (size(Si) > j)
            then j = size(Si) and Smax = Si
    Return Smax

But how slow is it? The key limitation is enumerating the 2^n subsets of n things. The good news is that this is much better than enumerating all n! orders
of n things, as proposed for the robot tour optimization problem. There are only about one million subsets when n = 20, which could be exhaustively counted within seconds on a decent computer. However, when fed n = 100 movies, 2^100 is much, much greater than the 20! which made our robot cry “uncle” in the previous problem.

The difference between our scheduling and robotics problems is that there is an algorithm which solves movie scheduling both correctly and efficiently. Think about the first job to terminate—i.e., the interval x which contains the rightmost point which is leftmost among all intervals. This role is played by “Discrete” Mathematics in Figure 1.5. Other jobs may well have started before x, but all of these must at least partially overlap each other, so we can select at most one from the group. The first of these jobs to terminate is x, so any of the overlapping jobs potentially block out other opportunities to the right of it. Clearly we can never lose by picking x. This suggests the following correct, efficient algorithm:
OptimalScheduling(I)
    While (I ≠ ∅) do
        Accept the job j from I with the earliest completion date.
        Delete j, and any interval which intersects j, from I.
Ensuring the optimal answer over all possible inputs is a difficult but often achievable goal. Seeking counterexamples that break pretender algorithms is an important part of the algorithm design process. Efficient algorithms are often lurking out there; this book seeks to develop your skills to help you find them.
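Rendered in C, the earliest-completion-date rule might look like the sketch below. The interval struct, the convention that two jobs sharing a day conflict, and the small test instance are all assumptions of this illustration rather than code from the book.

#include <stdio.h>
#include <stdlib.h>

typedef struct {
    int start, end;                   /* first and last day of filming */
} interval;

static int by_end(const void *a, const void *b)
{
    return ((const interval *)a)->end - ((const interval *)b)->end;
}

/* Accept jobs in order of earliest completion date, skipping any job that
   overlaps one already accepted.  Returns the number of jobs accepted. */
int optimal_scheduling(interval jobs[], int n)
{
    int i, accepted = 0, have_last = 0, last_end = 0;

    qsort(jobs, n, sizeof(interval), by_end);
    for (i = 0; i < n; i++)
        if (!have_last || jobs[i].start > last_end) {
            accepted++;
            last_end = jobs[i].end;
            have_last = 1;
        }
    return accepted;
}

int main(void)
{
    /* shortest-job-first would grab {4,6} and get stuck with a single film */
    interval jobs[] = { {1, 4}, {6, 9}, {4, 6} };

    printf("%d films\n", optimal_scheduling(jobs, 3));   /* prints 2 films */
    return 0;
}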
Take-Home Lesson: Reasonable-looking algorithms can easily be incorrect. Al- gorithm correctness is a property that must be carefully demonstrated.
1.3 Reasoning about Correctness
Hopefully, the previous examples have opened your eyes to the subtleties of algorithm correctness. We need tools to distinguish correct algorithms from incorrect ones, the primary one of which is called a proof.

A proper mathematical proof consists of several parts. First, there is a clear, precise statement of what you are trying to prove. Second, there is a set of assumptions of things which are taken to be true and hence used as part of the proof. Third, there is a chain of reasoning which takes you from these assumptions to the statement you are trying to prove. Finally, there is a little square (□) or QED at the bottom to denote that you have finished, representing the Latin phrase for “thus it is demonstrated.”

This book is not going to emphasize formal proofs of correctness, because they are very difficult to do right and quite misleading when you do them wrong. A proof is indeed a demonstration. Proofs are useful only when they are honest; crisp arguments explaining why an algorithm satisfies a nontrivial correctness property.
Correct algorithms require careful exposition, and efforts to show both correctness and not incorrectness. We develop tools for doing so in the subsections below.
1.3.1 Expressing Algorithms
Reasoning about an algorithm is impossible without a careful description of the sequence of steps to be performed. The three most common forms of algorithmic notation are (1) English, (2) pseudocode, or (3) a real programming language. We will use all three in this book. Pseudocode is perhaps the most mysterious of the bunch, but it is best defined as a programming language that never complains about syntax errors. All three methods are useful because there is a natural tradeoff between greater ease of expression and precision. English is the most natural but least precise programming language, while Java and C/C++ are precise but difficult to write and understand.

The choice of which notation is best depends upon which method you are most comfortable with. I usually prefer to describe the ideas of an algorithm in English, moving to a more formal, programming-language-like pseudocode or even real code to clarify sufficiently tricky details.

A common mistake my students make is to use pseudocode to dress up an ill-defined idea so that it looks more formal. Clarity should be the goal. For example, the ExhaustiveScheduling algorithm on page 10 could have been better written in English as:
ExhaustiveScheduling(I)
    Test all 2^n subsets of intervals from I, and return the largest
    subset consisting of mutually non-overlapping intervals.
Take-Home Lesson: The heart of any algorithm is an idea. If your idea is not clearly revealed when you express an algorithm, then you are using too low-level a notation to describe it.
1.3.2 Problems and Properties
We need more than just an algorithm description in order to demonstrate correctness. We also need a careful description of the problem that it is intended to solve. Problem specifications have two parts: (1) the set of allowed input instances, and (2) the required properties of the algorithm’s output. It is impossible to prove the correctness of an algorithm for a fuzzily-stated problem. Put another way, ask the wrong problem and you will get the wrong answer.

Some problem specifications allow too broad a class of input instances. Suppose we had allowed film projects in our movie scheduling problem to have gaps in production (i.e., filming in September and November but a hiatus in October).
Then the schedule associated with any particular film would consist of a given set of intervals. Our star would be free to take on two interleaving but not overlapping projects (such as the film above nested with one filming in August and October). The earliest completion algorithm would not work for such a generalized scheduling problem. Indeed, no efficient algorithm exists for this generalized problem.
Take-Home Lesson: An important and honorable technique in algorithm design is to narrow the set of allowable instances until there is a correct and efficient algorithm. For example, we can restrict a graph problem from general graphs down to trees, or a geometric problem from two dimensions down to one.
There are two common traps in specifying the output requirements of a problem. One is asking an ill-defined question. Asking for the best route between two places on a map is a silly question unless you define what best means. Do you mean the shortest route in total distance, or the fastest route, or the one minimizing the number of turns?

The second trap is creating compound goals. The three path-planning criteria mentioned above are all well-defined goals that lead to correct, efficient optimization algorithms. However, you must pick a single criterion. A goal like “Find the shortest path from a to b that doesn’t use more than twice as many turns as necessary” is perfectly well defined, but complicated to reason and solve.

I encourage you to check out the problem statements for each of the 75 catalog problems in the second part of this book. Finding the right formulation for your problem is an important part of solving it. And studying the definition of all these classic algorithm problems will help you recognize when someone else has thought about similar problems before you.
1.3.3 Demonstrating Incorrectness
The best way to prove that an algorithm is incorrect is to produce an instance in which it yields an incorrect answer. Such instances are called counter-examples. No rational person will ever leap to the defense of an algorithm after a counter-example has been identified. Very simple instances can instantly kill reasonable-looking heuristics with a quick touché.

Good counter-examples have two important properties:
• Verifiability – To demonstrate that a particular instance is a counter-example to a particular algorithm, you must be able to (1) calculate what answer your algorithm will give in this instance, and (2) display a better answer so as to prove the algorithm didn’t find it.
Since you must hold the given instance in your head to reason about it, an important part of verifiability is. . .
• Simplicity – Good counter-examples have all unnecessary details boiled away. They make clear exactly why the proposed algorithm fails. Once a counter-example has been found, it is worth simplifying it down to its essence. For example, the counter-example of Figure 1.6(l) could be made simpler and better by reducing the number of overlapped segments from five to two.
Hunting for counter-examples is a skill worth developing. It bears some similarity to the task of developing test sets for computer programs, but relies more on inspiration than exhaustion. Here are some techniques to aid your quest:
• Think small – Note that the robot tour counter-examples I presented boiled down to six points or less, and the scheduling counter-examples to only three intervals. This is indicative of the fact that when algorithms fail, there is usually a very simple example on which they fail. Amateur algorists tend to draw a big messy instance and then stare at it helplessly. The pros look carefully at several small examples, because they are easier to verify and reason about.
• Think exhaustively – There are only a small number of possibilities for the smallest nontrivial value of n. For example, there are only three interesting ways two intervals on the line can occur: (1) as disjoint intervals, (2) as overlapping intervals, and (3) as properly nesting intervals, one within the other. All cases of three intervals (including counter-examples to both movie heuristics) can be systematically constructed by adding a third segment in each possible way to these three instances.
• Hunt for the weakness – If a proposed algorithm is of the form “always take the biggest” (better known as the greedy algorithm), think about why that might prove to be the wrong thing to do. In particular, . . .
• Go for a tie – A devious way to break a greedy heuristic is to provide instances where everything is the same size. Suddenly the heuristic has nothing to base its decision on, and perhaps has the freedom to return something suboptimal as the answer.
• Seek extremes – Many counter-examples are mixtures of huge and tiny, left and right, few and many, near and far. It is usually easier to verify or reason about extreme examples than more muddled ones. Consider two tightly bunched clouds of points separated by a much larger distance d. The optimal TSP tour will be essentially 2d regardless of the number of points, because what happens within each cloud doesn’t really matter.
Take-Home Lesson: Searching for counterexamples is the best way to disprove the correctness of a heuristic.
1.3.4 Induction and Recursion
Failure to find a counterexample to a given algorithm does not mean “it is obvious” that the algorithm is correct. A proof or demonstration of correctness is needed. Often mathematical induction is the method of choice.

When I first learned about mathematical induction it seemed like complete magic. You proved a formula like $\sum_{i=1}^{n} i = n(n+1)/2$ for some basis case like 1 or 2, then assumed it was true all the way to n − 1 before proving it was true for general n using the assumption. That was a proof? Ridiculous!

When I first learned the programming technique of recursion it also seemed like complete magic. The program tested whether the input argument was some basis case like 1 or 2. If not, you solved the bigger case by breaking it into pieces and calling the subprogram itself to solve these pieces. That was a program? Ridiculous!

The reason both seemed like magic is because recursion is mathematical induction. In both, we have general and boundary conditions, with the general condition breaking the problem into smaller and smaller pieces. The initial or boundary condition terminates the recursion. Once you understand either recursion or induction, you should be able to see why the other one also works.

I’ve heard it said that a computer scientist is a mathematician who only knows how to prove things by induction. This is partially true because computer scientists are lousy at proving things, but primarily because so many of the algorithms we study are either recursive or incremental.

Consider the correctness of insertion sort, which we introduced at the beginning of this chapter. The reason it is correct can be shown inductively:
• The basis case consists of a single element, and by definition a one-element array is completely sorted.
• In general, we can assume that the first n − 1 elements of array A are completely sorted after n − 1 iterations of insertion sort.
• To insert one last element x to A, we find where it goes, namely the unique spot between the biggest element less than or equal to x and the smallest element greater than x. This is done by moving all the greater elements back by one position, creating room for x in the desired location.
One must be suspicious of inductive proofs, however, because very subtle reasoning errors can creep in. The first are boundary errors. For example, our insertion sort correctness proof above boldly stated that there was a unique place to insert x between two elements, when our basis case was a single-element array. Greater care is needed to properly deal with the special cases of inserting the minimum or maximum elements.

The second and more common class of inductive proof errors concerns cavalier extension claims. Adding one extra item to a given problem instance might cause the entire optimal solution to change. This was the case in our scheduling problem (see Figure 1.7). The optimal schedule after inserting a new segment may contain
Figure 1.7: Large-scale changes in the optimal solution (boxes) after inserting a single interval (dashed) into the instance
none of the segments of any particular optimal solution prior to insertion. Boldly ignoring such difficulties can lead to very convincing inductive proofs of incorrect algorithms.
Take-Home Lesson: Mathematical induction is usually the right way to verify the correctness of a recursive or incremental insertion algorithm.
Stop and Think: Incremental Correctness
Problem: Prove the correctness of the following recursive algorithm for incrementing natural numbers, i.e. y → y + 1:
Increment(y)
    if y = 0 then return(1)
    else if (y mod 2) = 1 then
        return(2 · Increment(⌊y/2⌋))
    else return(y + 1)
Solution: The correctness of this algorithm is certainly not obvious to me. But as it is recursive and I am a computer scientist, my natural instinct is to try to prove it by induction. The basis case of y = 0 is obviously correctly handled. Clearly the value 1 is returned, and 0 + 1 = 1.

Now assume the function works correctly for the general case of y = n − 1. Given this, we must demonstrate the truth for the case of y = n. Half of the cases are easy, namely the even numbers (for which (y mod 2) = 0), since y + 1 is explicitly returned.

For the odd numbers, the answer depends upon what is returned by Increment(⌊y/2⌋). Here we want to use our inductive assumption, but it isn’t quite right. We have assumed that increment worked correctly for y = n − 1, but not for a value which is about half of it. We can fix this problem by strengthening our assumption to declare that the general case holds for all y ≤ n − 1. This costs us nothing in principle, but is necessary to establish the correctness of the algorithm.
Now, the case of odd y (i.e. y = 2m + 1 for some integer m) can be dealt with as:
2 · Increment(⌊(2m + 1)/2⌋) = 2 · Increment(⌊m + 1/2⌋)
                            = 2 · Increment(m)
                            = 2(m + 1)
                            = 2m + 2 = y + 1
and the general case is resolved.
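Readers who like to test such proofs empirically can run the routine directly. The following C transcription is a minimal sketch of my own (the function name, the types, and the small driver loop are illustrative choices, not part of the original pseudocode); integer division by 2 plays the role of ⌊y/2⌋.

#include <stdio.h>

unsigned long increment(unsigned long y)
{
    if (y == 0) return 1;              /* basis case: 0 + 1 = 1 */
    if (y % 2 == 1)
        return 2 * increment(y / 2);   /* odd y = 2m+1: 2*(m+1) = y + 1 */
    return y + 1;                      /* even y: just bump the last bit */
}

int main(void)
{
    for (unsigned long y = 0; y < 10; y++)
        printf("increment(%lu) = %lu\n", y, increment(y));
    return 0;
}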
1.3.5 Summations
Mathematical summation formulae arise often in algorithm analysis, which we will study in Chapter 2. Further, proving the correctness of summation formulae is a classic application of induction. Several exercises on inductive proofs of summations appear at the end of this chapter. To make these more accessible, I review the basics of summations here.

Summation formulae are concise expressions describing the addition of an arbitrarily large set of numbers, in particular the formula

∑_{i=1}^{n} f(i) = f(1) + f(2) + . . . + f(n)

There are simple closed forms for summations of many algebraic functions. For example, since the sum of n ones is n,

∑_{i=1}^{n} 1 = n

The sum of the first n integers can be seen by pairing up the ith and (n − i + 1)th integers:

∑_{i=1}^{n} i = ∑_{i=1}^{n/2} (i + (n − i + 1)) = n(n + 1)/2

Recognizing two basic classes of summation formulae will get you a long way in algorithm analysis:

• Arithmetic progressions – We will encounter the arithmetic progression S(n) = ∑_{i=1}^{n} i = n(n + 1)/2 in the analysis of selection sort. From the big picture perspective, the important thing is that the sum is quadratic, not that the constant is 1/2. In general,

S(n, p) = ∑_{i=1}^{n} i^p = Θ(n^{p+1})
for p ≥ 1. Thus the sum of squares is cubic, and the sum of cubes is quartic (if you use such a word). The “big Theta” notation (Θ(x)) will be properly explained in Section 2.2.
For p < −1, this sum always converges to a constant, even as n → ∞. The interesting case lies in between: for p = −1 we get the harmonic numbers, H(n) = ∑_{i=1}^{n} 1/i = Θ(log n).
• Geometric series – In geometric progressions, the index of the loop affects the exponent, i.e.
G(n, a) = ∑_{i=0}^{n} a^i = (a^{n+1} − 1)/(a − 1)
How we interpret this sum depends upon the base of the progression, i.e. a. When a < 1, this converges to a constant even as n → ∞.
This series convergence proves to be the great “free lunch” of algorithm analysis. It means that the sum of a linear number of things can be constant, not linear. For example, 1+1/2+1/4+1/8+. . . ≤ 2 no matter how many terms we add up.
When a > 1, the sum grows rapidly with each new term, as in 1 + 2 + 4 + 8 + 16 + 32 = 63. Indeed, G(n, a) = Θ(an+1) for a > 1.
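To see the free lunch in closed form, it may help to plug a = 1/2 into the expression for G(n, a) above; the algebra below is mine, but it follows directly from that closed form.

G(n, 1/2) = ∑_{i=0}^{n} (1/2)^i = ((1/2)^{n+1} − 1)/((1/2) − 1) = 2 − (1/2)^n ≤ 2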
Stop and Think: Factorial Formulae
Problem: Prove that ∑_{i=1}^{n} i × i! = (n + 1)! − 1 by induction.
Solution: The inductive paradigm is straightforward. First verify the basis case (here we do n = 1, although n = 0 would be even more general):
∑_{i=1}^{1} i × i! = 1 = (1 + 1)! − 1 = 2 − 1 = 1
Now assume the statement is true up to n. To prove the general case of n + 1, observe that rolling out the largest term
∑_{i=1}^{n+1} i × i! = (n + 1) × (n + 1)! + ∑_{i=1}^{n} i × i!

reveals the left side of our inductive assumption. Substituting the right side gives us

∑_{i=1}^{n+1} i × i! = (n + 1) × (n + 1)! + (n + 1)! − 1
                    = (n + 1)! × ((n + 1) + 1) − 1
                    = (n + 2)! − 1
This general trick of separating out the largest term from the summation to reveal an instance of the inductive assumption lies at the heart of all such proofs.
1.4 Modeling the Problem
Modeling is the art of formulating your application in terms of precisely described, well-understood problems. Proper modeling is the key to applying algorithmic design techniques to real-world problems. Indeed, proper modeling can eliminate the need to design or even implement algorithms, by relating your application to what has been done before. Proper modeling is the key to effectively using the “Hitchhiker’s Guide” in Part II of this book.

Real-world applications involve real-world objects. You might be working on a system to route traffic in a network, to find the best way to schedule classrooms in a university, or to search for patterns in a corporate database. Most algorithms, however, are designed to work on rigorously defined abstract structures such as permutations, graphs, and sets. To exploit the algorithms literature, you must learn to describe your problem abstractly, in terms of procedures on fundamental structures.
1.4.1 Combinatorial Objects
Odds are very good that others have stumbled upon your algorithmic problem before you, perhaps in substantially different contexts. But to find out what is known about your particular “widget optimization problem,” you can’t hope to look in a book under widget. You must formulate widget optimization in terms of computing properties of common structures such as:
• Permutations – which are arrangements, or orderings, of items. For example, {1, 4, 3, 2} and {4, 3, 2, 1} are two distinct permutations of the same set of four integers. We have already seen permutations in the robot optimization problem, and in sorting. Permutations are likely the object in question whenever your problem seeks an “arrangement,” “tour,” “ordering,” or “sequence.”
• Subsets – which represent selections from a set of items. For example, {1, 3, 4} and {2} are two distinct subsets of the first four integers. Order does not matter in subsets the way it does with permutations, so the subsets {1, 3, 4} and {4, 3, 1} would be considered identical. We saw subsets arise in the movie scheduling problem. Subsets are likely the object in question whenever your problem seeks a “cluster,” “collection,” “committee,” “group,” “packaging,” or “selection.”
Figure 1.8: Modeling real-world structures with trees and graphs
• Trees – which represent hierarchical relationships between items. Figure 1.8(a) shows part of the family tree of the Skiena clan. Trees are likely the object in question whenever your problem seeks a “hierarchy,” “dominance relationship,” “ancestor/descendant relationship,” or “taxonomy.”
• Graphs – which represent relationships between arbitrary pairs of objects. Figure 1.8(b) models a network of roads as a graph, where the vertices are cities and the edges are roads connecting pairs of cities. Graphs are likely the object in question whenever you seek a “network,” “circuit,” “web,” or “relationship.”
• Points – which represent locations in some geometric space. For example, the locations of McDonald’s restaurants can be described by points on a map/plane. Points are likely the object in question whenever your problems work on “sites,” “positions,” “data records,” or “locations.”
• Polygons – which represent regions in some geometric spaces. For example, the borders of a country can be described by a polygon on a map/plane. Polygons and polyhedra are likely the object in question whenever you are working on “shapes,” “regions,” “configurations,” or “boundaries.”
• Strings – which represent sequences of characters or patterns. For example, the names of students in a class can be represented by strings. Strings are likely the object in question whenever you are dealing with “text,” “characters,” “patterns,” or “labels.”
These fundamental structures all have associated algorithm problems, which are presented in the catalog of Part II. Familiarity with these problems is important, because they provide the language we use to model applications. To become fluent in this vocabulary, browse through the catalog and study the input and output pictures for each problem. Understanding these problems, even at a cartoon/definition level, will enable you to know where to look later when the problem arises in your application.
Figure 1.9: Recursive decompositions of combinatorial objects. (left column) Permutations, subsets, trees, and graphs. (right column) Point sets, polygons, and strings
Examples of successful application modeling will be presented in the war stories spaced throughout this book. However, some words of caution are in order. The act of modeling reduces your application to one of a small number of existing problems and structures. Such a process is inherently constraining, and certain details might not fit easily into the given target problem. Also, certain problems can be modeled in several different ways, some much better than others. Modeling is only the first step in designing an algorithm for a problem. Be alert for how the details of your applications differ from a candidate model, but don’t be too quick to say that your problem is unique and special. Temporarily ignoring details that don’t fit can free the mind to ask whether they really were fundamental in the first place.
Take-Home Lesson: Modeling your application in terms of well-defined struc- tures and algorithms is the most important single step towards a solution.
1.4.2 Recursive Objects
Learning to think recursively is learning to look for big things that are made from smaller things of exactly the same type as the big thing. If you think of houses as sets of rooms, then adding or deleting a room still leaves a house behind. Recursive structures occur everywhere in the algorithmic world. Indeed, each of the abstract structures described above can be thought about recursively. You just have to see how you can break them down, as shown in Figure 1.9:
• Permutations – Delete the first element of a permutation of {1, . . . , n} things and you get a permutation of the remaining n − 1 things. Permutations are recursive objects.
• Subsets – Every subset of the elements {1, . . . , n} contains a subset of {1, . . . , n − 1} made visible by deleting element n if it is present. Subsets are recursive objects.
• Trees – Delete the root of a tree and what do you get? A collection of smaller trees. Delete any leaf of a tree and what do you get? A slightly smaller tree. Trees are recursive objects.
• Graphs – Delete any vertex from a graph, and you get a smaller graph. Now divide the vertices of a graph into two groups, left and right. Cut through all edges which span from left to right, and what do you get? Two smaller graphs, and a bunch of broken edges. Graphs are recursive objects.
• Points – Take a cloud of points, and separate them into two groups by drawing a line. Now you have two smaller clouds of points. Point sets are recursive objects.
• Polygons – Inserting any internal chord between two nonadjacent vertices of a simple polygon on n vertices cuts it into two smaller polygons. Polygons are recursive objects.
• Strings – Delete the first character from a string, and what do you get? A shorter string. Strings are recursive objects.
Recursive descriptions of objects require both decomposition rules and basis cases, namely the specification of the smallest and simplest objects where the decomposition stops. These basis cases are usually easily defined. Permutations and subsets of zero things presumably look like {}. The smallest interesting tree or graph consists of a single vertex, while the smallest interesting point cloud consists of a single point. Polygons are a little trickier; the smallest genuine simple polygon is a triangle. Finally, the empty string has zero characters in it. The decision of whether the basis case contains zero or one element is more a question of taste and convenience than any fundamental principle.

Such recursive decompositions will come to define many of the algorithms we will see in this book. Keep your eyes open for them.
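As a tiny concrete illustration of this decomposition-plus-basis-case pattern, here is a recursive C function that treats a string as a first character followed by a shorter string; the function name and the driver are my own illustrative choices, not from the book.

#include <stdio.h>

int string_length(const char *s)
{
    if (*s == '\0') return 0;           /* basis case: the empty string */
    return 1 + string_length(s + 1);    /* decomposition: drop the first character */
}

int main(void)
{
    printf("%d\n", string_length("recursion"));   /* prints 9 */
    return 0;
}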
1.5 About the War Stories
The best way to learn how careful algorithm design can have a huge impact on performance is to look at real-world case studies. By carefully studying other people’s experiences, we learn how they might apply to our work. Scattered throughout this text are several of my own algorithmic war stories, presenting our successful (and occasionally unsuccessful) algorithm design efforts on real applications. I hope that you will be able to internalize these experiences so that they will serve as models for your own attacks on problems.
Every one of the war stories is true. Of course, the stories improve somewhat in the retelling, and the dialogue has been punched up to make them more interesting to read. However, I have tried to honestly trace the process of going from a raw problem to a solution, so you can watch how this process unfolded. The Oxford English Dictionary defines an algorist as “one skillful in reckonings or figuring.” In these stories, I have tried to capture some of the mindset of the algorist in action as they attack a problem. The various war stories usually involve at least one, and often several, problems from the problem catalog in Part II. I reference the appropriate section of the catalog when such a problem occurs. This emphasizes the benefits of modeling your application in terms of standard algorithm problems. By using the catalog, you will be able to pull out what is known about any given problem whenever it is needed.
1.6 War Story: Psychic Modeling
The call came for me out of the blue as I sat in my office.

“Professor Skiena, I hope you can help me. I’m the President of Lotto Systems Group Inc., and we need an algorithm for a problem arising in our latest product.”

“Sure,” I replied. After all, the dean of my engineering school is always encouraging our faculty to interact more with industry.

“At Lotto Systems Group, we market a program designed to improve our customers’ psychic ability to predict winning lottery numbers.1 In a standard lottery, each ticket consists of six numbers selected from, say, 1 to 44. Thus, any given ticket has only a very small chance of winning. However, after proper training, our clients can visualize, say, 15 numbers out of the 44 and be certain that at least four of them will be on the winning ticket. Are you with me so far?”

“Probably not,” I replied. But then I recalled how my dean encourages us to interact with industry.

“Our problem is this. After the psychic has narrowed the choices down to 15 numbers and is certain that at least 4 of them will be on the winning ticket, we must find the most efficient way to exploit this information. Suppose a cash prize is awarded whenever you pick at least three of the correct numbers on your ticket. We need an algorithm to construct the smallest set of tickets that we must buy in order to guarantee that we win at least one prize.”

“Assuming the psychic is correct?”

“Yes, assuming the psychic is correct. We need a program that prints out a list of all the tickets that the psychic should buy in order to minimize their investment. Can you help us?”

Maybe they did have psychic ability, for they had come to the right place. Identifying the best subset of tickets to buy was very much a combinatorial algorithm
1Yes, this is a true story.
Figure 1.10: Covering all pairs of {1, 2, 3, 4, 5} with tickets {1, 2, 3}, {1, 4, 5}, {2, 4, 5}, {3, 4, 5}
problem. It was going to be some type of covering problem, where each ticket we buy was going to “cover” some of the possible 4-element subsets of the psychic’s set. Finding the absolute smallest set of tickets to cover everything was a special instance of the NP-complete problem set cover (discussed in Section 18.1 (page 621)), and presumably computationally intractable.

It was indeed a special instance of set cover, completely specified by only four numbers: the size n of the candidate set S (typically n ≈ 15), the number of slots k for numbers on each ticket (typically k ≈ 6), the number of psychically-promised correct numbers j from S (say j = 4), and finally, the number of matching numbers l necessary to win a prize (say l = 3). Figure 1.10 illustrates a covering of a smaller instance, where n = 5, j = k = 3, and l = 2.

“Although it will be hard to find the exact minimum set of tickets to buy, with heuristics I should be able to get you pretty close to the cheapest covering ticket set,” I told him. “Will that be good enough?”

“So long as it generates better ticket sets than my competitor’s program, that will be fine. His system doesn’t always guarantee a win. I really appreciate your help on this, Professor Skiena.”

“One last thing. If your program can train people to pick lottery winners, why don’t you use it to win the lottery yourself?”

“I look forward to talking to you again real soon, Professor Skiena. Thanks for the help.”

I hung up the phone and got back to thinking. It seemed like the perfect project to give to a bright undergraduate. After modeling it in terms of sets and subsets, the basic components of a solution seemed fairly straightforward:
• We needed the ability to generate all subsets of k numbers from the candidate set S. Algorithms for generating and ranking/unranking subsets of sets are presented in Section 14.5 (page 452).
• We needed the right formulation of what it meant to have a covering set of purchased tickets. The obvious criteria would be to pick a small set of tickets such that we have purchased at least one ticket containing each of the (n choose l) l-subsets of S that might pay off with the prize.
• We needed to keep track of which prize combinations we have thus far covered. We seek tickets to cover as many thus-far-uncovered prize combinations as possible. The currently covered combinations are a subset of all possible combinations. Data structures for subsets are discussed in Section 12.5 (page 385). The best candidate seemed to be a bit vector, which would answer in constant time “is this combination already covered?”
• We needed a search mechanism to decide which ticket to buy next. For small enough set sizes, we could do an exhaustive search over all possible subsets of tickets and pick the smallest one. For larger problems, a randomized search process like simulated annealing (see Section 7.5.3 (page 254)) would select tickets-to-buy to cover as many uncovered combinations as possible. By repeating this randomized procedure several times and picking the best solution, we would be likely to come up with a good set of tickets.
Excluding the details of the search mechanism, the pseudocode for the bookkeeping looked something like this:
LottoTicketSet(n, k, l)
    Initialize the (n choose l)-element bit-vector V to all false
    While there exists a false entry in V
        Select a k-subset T of {1, . . . , n} as the next ticket to buy
        For each of the l-subsets Ti of T, V[rank(Ti)] = true
    Report the set of tickets bought
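A sketch of the bookkeeping this pseudocode assumes is given below. It is not the implementation we built; the ranking scheme (the combinatorial number system), the helper names, and the fixed array size are my own illustrative choices. The point is simply that every l-subset of {1, . . . , n} maps to a distinct slot of the bit vector V, so the test “is this combination already covered?” becomes a single array lookup.

#include <stdio.h>
#include <stdbool.h>

#define MAXCOMB 1000   /* must be at least (n choose l) for the instance at hand */

long binomial(int n, int k)
{
    long b = 1;
    if (k < 0 || k > n) return 0;
    for (int i = 1; i <= k; i++)
        b = b * (n - k + i) / i;     /* exact at every step */
    return b;
}

/* Rank of the l-subset c[0] < c[1] < ... < c[l-1], drawn from {1,...,n}:
   a unique value in 0 .. binomial(n,l)-1 (colexicographic order). */
long rank_subset(int c[], int l)
{
    long r = 0;
    for (int i = 0; i < l; i++)
        r += binomial(c[i] - 1, i + 1);
    return r;
}

int main(void)
{
    int n = 5, l = 2;
    bool V[MAXCOMB];                      /* the bit vector of covered combinations */
    for (long i = 0; i < binomial(n, l); i++) V[i] = false;

    int pair[2] = {2, 4};                 /* mark the prize combination {2,4} as covered */
    V[rank_subset(pair, l)] = true;

    printf("{2,4} covered? %d\n", V[rank_subset(pair, l)]);   /* prints 1 */
    return 0;
}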
The bright undergraduate, Fayyaz Younas, rose to the challenge. Based on this framework, he implemented a brute-force search algorithm and found optimal solutions for problems with n ≤ 5 in a reasonable time. He implemented a random search procedure to solve larger problems, tweaking it for a while before settling on the best variant. Finally, the day arrived when we could call Lotto Systems Group and announce that we had solved the problem.

“Our program found that the optimal solution for n = 15, k = 6, j = 6, l = 3 meant buying 28 tickets.”

“Twenty-eight tickets!” complained the president. “You must have a bug. Look, these five tickets will suffice to cover everything twice over: {2, 4, 8, 10, 13, 14}, {4, 5, 7, 8, 12, 15}, {1, 2, 3, 6, 11, 13}, {3, 5, 6, 9, 10, 15}, {1, 7, 9, 11, 12, 14}.”
Figure 1.11: Guaranteeing a winning pair from {1, 2, 3, 4, 5} using only tickets {1, 2, 3} and {1, 4, 5}
We fiddled with this example for a while before admitting that he was right. We hadn’t modeled the problem correctly! In fact, we didn’t need to explicitly cover all possible winning combinations. Figure 1.11 illustrates the principle by giving a two-ticket solution to our previous four-ticket example. Such unpromising outcomes as {2, 3, 4} and {3, 4, 5} each agree in one matching pair with tickets from Figure 1.11. We were trying to cover too many combinations, and the penny-pinching psychics were unwilling to pay for such extravagance.

Fortunately, this story has a happy ending. The general outline of our search-based solution still holds for the real problem. All we must fix is which subsets we get credit for covering with a given set of tickets. After this modification, we obtained the kind of results they were hoping for. Lotto Systems Group gratefully accepted our program to incorporate into their product, and hopefully hit the jackpot with it.

The moral of this story is to make sure that you model the problem correctly before trying to solve it. In our case, we came up with a reasonable model, but didn’t work hard enough to validate it before we started to program. Our misinterpretation would have become obvious had we worked out a small example by hand and bounced it off our sponsor before beginning work. Our success in recovering from this error is a tribute to the basic correctness of our initial formulation, and our use of well-defined abstractions for such tasks as (1) ranking/unranking k-subsets, (2) the set data structure, and (3) combinatorial search.
Chapter Notes
Every decent algorithm book reflects the design philosophy of its author. For students seeking alternative presentations and viewpoints, we particularly recommend the books of Cormen et al. [CLRS01], Kleinberg/Tardos [KT06], and Manber [Man89].

Formal proofs of algorithm correctness are important, and deserve a fuller discussion than we are able to provide in this chapter. See Gries [Gri89] for a thorough introduction to the techniques of program verification.

The movie scheduling problem represents a very special case of the general independent set problem, which is discussed in Section 16.2 (page 528). The restriction limits the allowable input instances to interval graphs, where the vertices of the graph G can be represented by intervals on the line and (i, j) is an edge of G iff the intervals overlap. Golumbic [Gol04] provides a full treatment of this interesting and important class of graphs.

Jon Bentley’s Programming Pearls columns are probably the best known collection of algorithmic “war stories.” Originally published in the Communications of the ACM, they have been collected in two books [Ben90, Ben99]. Brooks’s The Mythical Man Month [Bro95] is another wonderful collection of war stories, focused more on software engineering than algorithm design, but they remain a source of considerable wisdom. Every programmer should read all these books, for pleasure as well as insight.

Our solution to the lotto ticket set covering problem is presented in more detail in [YS96].
1.7 Exercises
Finding Counterexamples
1-1. [3] Show that a + b can be less than min(a, b).
1-2. [3] Show that a × b can be less than min(a, b).
1-3. [5] Design/draw a road network with two points a and b such that the fastest route between a and b is not the shortest route.
1-4. [5] Design/draw a road network with two points a and b such that the shortest route between a and b is not the route with the fewest turns.
1-5. [4] The knapsack problem is as follows: given a set of integers S = {s1, s2, . . . , sn}, and a target number T, find a subset of S which adds up exactly to T. For example, there exists a subset within S = {1, 2, 5, 9, 10} that adds up to T = 22 but not T = 23.
Find counterexamples to each of the following algorithms for the knapsack problem.
That is, give an S and T where the algorithm does not find a solution which leaves the knapsack completely full, even though a full-knapsack solution exists.
(a) Put the elements of S in the knapsack in left to right order if they fit, i.e. the first-fit algorithm.
(b) Put the elements of S in the knapsack from smallest to largest, i.e. the best-fit algorithm.
(c) Put the elements of S in the knapsack from largest to smallest.
1-6. [5] The set cover problem is as follows: given a set of subsets S1, ..., Sm of the universal set U = {1, ..., n}, find the smallest subset of subsets T ⊂ S such that ∪_{ti∈T} ti = U. For example, consider the subsets S1 = {1, 3, 5}, S2 = {2, 4}, S3 = {1, 4}, and S4 = {2, 5}. The set cover would then be S1 and S2. Find a counterexample for the following algorithm: Select the largest subset for the cover, and then delete all its elements from the universal set. Repeat by adding the subset containing the largest number of uncovered elements until all are covered.
Proofs of Correctness
1-7. [3] Prove the correctness of the following recursive algorithm to multiply two natural numbers, for all integer constants c ≥ 2.

    function multiply(y, z)
    comment Return the product yz.
    1. if z = 0 then return(0) else
    2. return(multiply(cy, ⌊z/c⌋) + y · (z mod c))
1-8. [3] Prove the correctness of the following algorithm for evaluating a polynomial.
P(x) = a_n x^n + a_{n−1} x^{n−1} + . . . + a_1 x + a_0

    function horner(A, x)
    p = A_n
    for i from n − 1 to 0
        p = p ∗ x + A_i
    return p
1-9. [3] Prove the correctness of the following sorting algorithm.

    function bubblesort(A : list[1 . . . n])
    var int i, j
    for i from n to 1
        for j from 1 to i − 1
            if (A[j] > A[j + 1])
                swap the values of A[j] and A[j + 1]
Induction
1-10. [3] Prove that ∑_{i=1}^{n} i = n(n + 1)/2 for n ≥ 0, by induction.
1-11. [3] Prove that ∑_{i=1}^{n} i^2 = n(n + 1)(2n + 1)/6 for n ≥ 0, by induction.
1-12. [3] Prove that ∑_{i=1}^{n} i^3 = n^2(n + 1)^2/4 for n ≥ 0, by induction.
1-13. [3] Prove that ∑_{i=1}^{n} i(i + 1)(i + 2) = n(n + 1)(n + 2)(n + 3)/4
1-14. [5] Prove by induction on n ≥ 1 that for every a ≠ 1,

    ∑_{i=0}^{n} a^i = (a^{n+1} − 1)/(a − 1)
1-15. [3] Prove by induction that for n ≥ 1,

    ∑_{i=1}^{n} 1/(i(i + 1)) = n/(n + 1)
1-16. [3] Prove by induction that n3 + 2n is divisible by 3 for all n ≥ 0.
1-17. [3] Prove by induction that a tree with n vertices has exactly n − 1 edges.
1-18. [3] Prove by mathematical induction that the sum of the cubes of the first n positive integers is equal to the square of the sum of these integers, i.e.

    ∑_{i=1}^{n} i^3 = (∑_{i=1}^{n} i)^2
Estimation
1-19. [3] Do all the books you own total at least one million pages? How many total pages are stored in your school library?
1-20. [3] How many words are there in this textbook?
1-21. [3] How many hours are one million seconds? How many days? Answer these questions by doing all arithmetic in your head.
1-22. [3] Estimate how many cities and towns there are in the United States.
1-23. [3] Estimate how many cubic miles of water flow out of the mouth of the Mississippi River each day. Do not look up any supplemental facts. Describe all assumptions you made in arriving at your answer.
1-24. [3] Is disk drive access time normally measured in milliseconds (thousandths of a second) or microseconds (millionths of a second)? Does your RAM memory access a word in more or less than a microsecond? How many instructions can your CPU execute in one year if the machine is left running all the time?
1-25. [4] A sorting algorithm takes 1 second to sort 1,000 items on your local machine. How long will it take to sort 10,000 items. . .
(a) if you believe that the algorithm takes time proportional to n2, and
(b) if you believe that the algorithm takes time roughly proportional to n log n?
Implementation Projects
1-26. [5] Implement the two TSP heuristics of Section 1.1 (page 5). Which of them gives better-quality solutions in practice? Can you devise a heuristic that works better than both of them?
1-27. [5] Describe how to test whether a given set of tickets establishes sufficient coverage in the Lotto problem of Section 1.6 (page 23). Write a program to find good ticket sets.
Interview Problems
1-28. [5] Write a function to perform integer division without using either the / or * operators. Find a fast way to do it.
1-29. [5] There are 25 horses. At most, 5 horses can race together at a time. You must determine the fastest, second fastest, and third fastest horses. Find the minimum number of races in which this can be done.
1-30. [3] How many piano tuners are there in the entire world?
1-31. [3] How many gas stations are there in the United States?
1-32. [3] How much does the ice in a hockey rink weigh?
1-33. [3] How many miles of road are there in the United States?
1-34. [3] On average, how many times would you have to flip open the Manhattan phone book at random in order to find a specific name?
Programming Challenges
These programming challenge problems with robot judging are available at http://www.programming-challenges.com or http://online-judge.uva.es.
1-1. “The 3n + 1 Problem” – Programming Challenges 110101, UVA Judge 100.
1-2. “The Trip” – Programming Challenges 110103, UVA Judge 10137.
1-3. “Australian Voting” – Programming Challenges 110108, UVA Judge 10142.
2
Algorithm Analysis
Algorithms are the most important and durable part of computer science because they can be studied in a language- and machine-independent way. This means that we need techniques that enable us to compare the efficiency of algorithms without implementing them. Our two most important tools are (1) the RAM model of computation and (2) the asymptotic analysis of worst-case complexity.

Assessing algorithmic performance makes use of the “big Oh” notation that proves essential to compare algorithms and design more efficient ones. While the hopelessly practical person may blanch at the notion of theoretical analysis, we present the material because it really is useful in thinking about algorithms.

This method of keeping score will be the most mathematically demanding part of this book. But once you understand the intuition behind these ideas, the formalism becomes a lot easier to deal with.
2.1 The RAM Model of Computation
Machine-independent algorithm design depends upon a hypothetical computer called the Random Access Machine or RAM. Under this model of computation, we are confronted with a computer where:
• Each simple operation (+, *, –, =, if, call) takes exactly one time step.
• Loops and subroutines are not considered simple operations. Instead, they are the composition of many single-step operations. It makes no sense for sort to be a single-step operation, since sorting 1,000,000 items will certainly take much longer than sorting 10 items. The time it takes to run through a loop or execute a subprogram depends upon the number of loop iterations or the specific nature of the subprogram.
• Each memory access takes exactly one time step. Further, we have as much memory as we need. The RAM model takes no notice of whether an item is in cache or on the disk.
Under the RAM model, we measure run time by counting up the number of steps an algorithm takes on a given problem instance. If we assume that our RAM executes a given number of steps per second, this operation count converts naturally to the actual running time.

The RAM is a simple model of how computers perform. Perhaps it sounds too simple. After all, multiplying two numbers takes more time than adding two numbers on most processors, which violates the first assumption of the model. Fancy compiler loop unrolling and hyperthreading may well violate the second assumption. And certainly memory access times differ greatly depending on whether data sits in cache or on the disk. This makes us zero for three on the truth of our basic assumptions.

And yet, despite these complaints, the RAM proves an excellent model for understanding how an algorithm will perform on a real computer. It strikes a fine balance by capturing the essential behavior of computers while being simple to work with. We use the RAM model because it is useful in practice.

Every model has a size range over which it is useful. Take, for example, the model that the Earth is flat. You might argue that this is a bad model, since it has been fairly well established that the Earth is in fact round. But, when laying the foundation of a house, the flat Earth model is sufficiently accurate that it can be reliably used. It is so much easier to manipulate a flat-Earth model that it is inconceivable that you would try to think spherically when you don’t have to.1
The same situation is true with the RAM model of computation. We make an abstraction that is generally very useful. It is quite difficult to design an algorithm such that the RAM model gives you substantially misleading results. The robustness of the RAM enables us to analyze algorithms in a machine-independent way.
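As a toy illustration of RAM-style step counting (my own example, not from the text), consider summing an n-element array. Charging one unit per simple operation gives roughly 2n plus a few steps; the exact constant is unimportant, only the linear growth in n.

#include <stdio.h>

int sum_array(int a[], int n)
{
    int total = 0;                  /* one assignment */
    for (int i = 0; i < n; i++)     /* about n comparisons and n increments */
        total += a[i];              /* n additions, each with one memory access */
    return total;                   /* one step */
}

int main(void)
{
    int a[] = {3, 1, 4, 1, 5};
    printf("%d\n", sum_array(a, 5));   /* prints 14 */
    return 0;
}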
Take-Home Lesson: Algorithms can be understood and studied in a language- and machine-independent manner.
2.1.1 Best, Worst, and Average-Case Complexity
Using the RAM model of computation, we can count how many steps our algorithm takes on any given input instance by executing it. However, to understand how good or bad an algorithm is in general, we must know how it works over all instances. To understand the notions of the best, worst, and average-case complexity, think about running an algorithm over all possible instances of data that can be
1The Earth is not completely spherical either, but a spherical Earth provides a useful model for such things as longitude and latitude.
Figure 2.1: Best, worst, and average-case complexity
fed to it. For the problem of sorting, the set of possible input instances consists of all possible arrangements of n keys, over all possible values of n. We can represent each input instance as a point on a graph (shown in Figure 2.1) where the x-axis represents the size of the input problem (for sorting, the number of items to sort), and the y-axis denotes the number of steps taken by the algorithm in this instance. These points naturally align themselves into columns, because only integers represent possible input size (e.g., it makes no sense to sort 10.57 items). We can define three interesting functions over the plot of these points:
• The worst-case complexity of the algorithm is the function defined by the maximum number of steps taken in any instance of size n. This represents the curve passing through the highest point in each column.
• The best-case complexity of the algorithm is the function defined by the minimum number of steps taken in any instance of size n. This represents the curve passing through the lowest point of each column.
• The average-case complexity of the algorithm, which is the function defined by the average number of steps over all instances of size n.
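A standard small example (mine, not from the text) may make the three notions concrete: sequential search for a key known to sit at a uniformly random position of an n-element array, counting key comparisons.

best case = 1,   worst case = n,   average case = (1/n) ∑_{i=1}^{n} i = (n + 1)/2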
The worst-case complexity proves to be the most useful of these three measures in practice. Many people find this counterintuitive. To illustrate why, try to project what will happen if you bring n dollars into a casino to gamble. The best case, that you walk out owning the place, is possible but so unlikely that you should not even think about it. The worst case, that you lose all n dollars, is easy to calculate and distressingly likely to happen. The average case, that the typical bettor loses 87.32% of the money that he brings to the casino, is difficult to establish and its meaning subject to debate. What exactly does average mean? Stupid people lose
more than smart people, so are you smarter or stupider than the average person, and by how much? Card counters at blackjack do better on average than customers who accept three or more free drinks. We avoid all these complexities and obtain a very useful result by just considering the worst case.

The important thing to realize is that each of these time complexities define a numerical function, representing time versus problem size. These functions are as well defined as any other numerical function, be it y = x^2 − 2x + 1 or the price of Google stock as a function of time. But time complexities are such complicated functions that we must simplify them to work with them. For this, we need the “Big Oh” notation.
2.2 The Big Oh Notation
The best, worst, and average-case time complexities for any given algorithm are numerical functions over the size of possible problem instances. However, it is very difficult to work precisely with these functions, because they tend to:
• Have too many bumps – An algorithm such as binary search typically runs a bit faster for arrays of size exactly n = 2^k − 1 (where k is an integer), because the array partitions work out nicely. This detail is not particularly significant, but it warns us that the exact time complexity function for any algorithm is liable to be very complicated, with little up and down bumps as shown in Figure 2.2.
• Require too much detail to specify precisely – Counting the exact number of RAM instructions executed in the worst case requires the algorithm be specified to the detail of a complete computer program. Further, the precise answer depends upon uninteresting coding details (e.g., did he use a case statement or nested ifs?). Performing a precise worst-case analysis like
T(n) = 12754n^2 + 4353n + 834 lg^2 n + 13546
would clearly be very difficult work, but provides us little extra information than the observation that “the time grows quadratically with n.”
It proves to be much easier to talk in terms of simple upper and lower bounds of time-complexity functions using the Big Oh notation. The Big Oh simplifies our analysis by ignoring levels of detail that do not impact our comparison of algorithms. The Big Oh notation ignores the difference between multiplicative constants. The functions f(n) = 2n and g(n) = n are identical in Big Oh analysis. This makes sense given our application. Suppose a given algorithm in (say) C language ran twice as fast as one with the same algorithm written in Java. This multiplicative
Figure 2.2: Upper and lower bounds valid for n > n0 smooth out the behavior of complex functions
factor of two tells us nothing about the algorithm itself, since both programs implement exactly the same algorithm. We ignore such constant factors when comparing two algorithms.

The formal definitions associated with the Big Oh notation are as follows:
• f(n) = O(g(n)) means c · g(n) is an upper bound on f(n). Thus there exists some constant c such that f(n) is always ≤ c · g(n), for large enough n (i.e., n ≥ n0 for some constant n0).

• f(n) = Ω(g(n)) means c · g(n) is a lower bound on f(n). Thus there exists some constant c such that f(n) is always ≥ c · g(n), for all n ≥ n0.

• f(n) = Θ(g(n)) means c1 · g(n) is an upper bound on f(n) and c2 · g(n) is a lower bound on f(n), for all n ≥ n0. Thus there exist constants c1 and c2 such that f(n) ≤ c1 · g(n) and f(n) ≥ c2 · g(n). This means that g(n) provides a nice, tight bound on f(n).

Got it? These definitions are illustrated in Figure 2.3. Each of these definitions assumes a constant n0 beyond which they are always satisfied. We are not concerned about small values of n (i.e., anything to the left of n0). After all, we don’t really care whether one sorting algorithm sorts six items faster than another, but seek which algorithm proves faster when sorting 10,000 or 1,000,000 items. The Big Oh notation enables us to ignore details and focus on the big picture.

Take-Home Lesson: The Big Oh notation and worst-case analysis are tools that greatly simplify our ability to compare the efficiency of algorithms.
Figure 2.3: Illustrating the big (a) O, (b) Ω, and (c) Θ notations
Make sure you understand this notation by working through the following examples. We choose certain constants (c and n0) in the explanations below because they work and make a point, but other pairs of constants will do exactly the same job. You are free to choose any constants that maintain the same inequality—ideally constants that make it obvious that the inequality holds:
3n^2 − 100n + 6 = O(n^2), because I choose c = 3 and 3n^2 > 3n^2 − 100n + 6;

3n^2 − 100n + 6 = O(n^3), because I choose c = 1 and n^3 > 3n^2 − 100n + 6 when n > 3;

3n^2 − 100n + 6 ≠ O(n), because for any c I choose c × n < 3n^2 when n > c;

3n^2 − 100n + 6 = Ω(n^2), because I choose c = 2 and 2n^2 < 3n^2 − 100n + 6 when n > 100;

3n^2 − 100n + 6 ≠ Ω(n^3), because I choose c = 1 and 3n^2 − 100n + 6 < n^3 when n > 3;

3n^2 − 100n + 6 = Ω(n), because for any c I choose cn < 3n^2 − 100n + 6 when n > 100c;

3n^2 − 100n + 6 = Θ(n^2), because both O and Ω apply;

3n^2 − 100n + 6 ≠ Θ(n^3), because only O applies;

3n^2 − 100n + 6 ≠ Θ(n), because only Ω applies.
The Big Oh notation provides for a rough notion of equality when comparing functions. It is somewhat jarring to see an expression like n^2 = O(n^3), but its meaning can always be resolved by going back to the definitions in terms of upper and lower bounds. It is perhaps most instructive to read the “=” here as meaning one of the functions that are. Clearly, n^2 is one of the functions that are O(n^3).
Stop and Think: Back to the Definition
Problem: Is 2^{n+1} = Θ(2^n)?
Solution: Designing novel algorithms requires cleverness and inspiration. However, applying the Big Oh notation is best done by swallowing any creative instincts you may have. All Big Oh problems can be correctly solved by going back to the definition and working with that.
• Is 2^{n+1} = O(2^n)? Well, f(n) = O(g(n)) iff (if and only if) there exists a constant c such that for all sufficiently large n, f(n) ≤ c · g(n). Is there? The key observation is that 2^{n+1} = 2 · 2^n, so 2 · 2^n ≤ c · 2^n for any c ≥ 2.

• Is 2^{n+1} = Ω(2^n)? Go back to the definition. f(n) = Ω(g(n)) iff there exists a constant c > 0 such that for all sufficiently large n, f(n) ≥ c · g(n). This would be satisfied for any 0 < c ≤ 2. Together the Big Oh and Ω bounds imply 2^{n+1} = Θ(2^n).
Stop and Think: Hip to the Squares?
Problem: Is (x + y)^2 = O(x^2 + y^2)?
Solution: Working with the Big Oh means going back to the definition at the slightest sign of confusion. By definition, this expression is valid iff we can find some c such that (x + y)^2 ≤ c(x^2 + y^2).

My first move would be to expand the left side of the equation, i.e. (x + y)^2 = x^2 + 2xy + y^2. If the middle 2xy term wasn’t there, the inequality would clearly hold for any c > 1. But it is there, so we need to relate the 2xy to x^2 + y^2. What if x ≤ y? Then 2xy ≤ 2y^2 ≤ 2(x^2 + y^2). What if x ≥ y? Then 2xy ≤ 2x^2 ≤ 2(x^2 + y^2). Either way, we now can bound this middle term by two times the right-side function. This means that (x + y)^2 ≤ 3(x^2 + y^2), and so the result holds.
2.3 Growth Rates and Dominance Relations
With the Big Oh notation, we cavalierly discard the multiplicative constants. Thus, the functions f(n) = 0.001n^2 and g(n) = 1000n^2 are treated identically, even though g(n) is a million times larger than f(n) for all values of n.
n             | lg n     | n        | n lg n    | n^2        | 2^n           | n!
--------------|----------|----------|-----------|------------|---------------|------------
10            | 0.003 μs | 0.01 μs  | 0.033 μs  | 0.1 μs     | 1 μs          | 3.63 ms
20            | 0.004 μs | 0.02 μs  | 0.086 μs  | 0.4 μs     | 1 ms          | 77.1 years
30            | 0.005 μs | 0.03 μs  | 0.147 μs  | 0.9 μs     | 1 sec         | 8.4 × 10^15 yrs
40            | 0.005 μs | 0.04 μs  | 0.213 μs  | 1.6 μs     | 18.3 min      |
50            | 0.006 μs | 0.05 μs  | 0.282 μs  | 2.5 μs     | 13 days       |
100           | 0.007 μs | 0.1 μs   | 0.644 μs  | 10 μs      | 4 × 10^13 yrs |
1,000         | 0.010 μs | 1.00 μs  | 9.966 μs  | 1 ms       |               |
10,000        | 0.013 μs | 10 μs    | 130 μs    | 100 ms     |               |
100,000       | 0.017 μs | 0.10 ms  | 1.67 ms   | 10 sec     |               |
1,000,000     | 0.020 μs | 1 ms     | 19.93 ms  | 16.7 min   |               |
10,000,000    | 0.023 μs | 0.01 sec | 0.23 sec  | 1.16 days  |               |
100,000,000   | 0.027 μs | 0.10 sec | 2.66 sec  | 115.7 days |               |
1,000,000,000 | 0.030 μs | 1 sec    | 29.90 sec | 31.7 years |               |
Figure 2.4: Growth rates of common functions measured in nanoseconds
The reason why we are content with coarse Big Oh analysis is provided by Figure 2.4, which shows the growth rate of several common time analysis functions. In particular, it shows how long algorithms that use f(n) operations take to run on a fast computer, where each operation takes one nanosecond (10^−9 seconds). The following conclusions can be drawn from this table:
• All such algorithms take roughly the same time for n = 10.
• Any algorithm with n! running time becomes useless for n ≥ 20.
• Algorithms whose running time is 2^n have a greater operating range, but become impractical for n > 40.

• Quadratic-time algorithms whose running time is n^2 remain usable up to about n = 10,000, but quickly deteriorate with larger inputs. They are likely to be hopeless for n > 1,000,000.
• Linear-time and n lg n algorithms remain practical on inputs of one billion items.
• An O(lg n) algorithm hardly breaks a sweat for any imaginable value of n.
The bottom line is that even ignoring constant factors, we get an excellent idea of whether a given algorithm is appropriate for a problem of a given size. An algorithm whose running time is f(n) = n^3 seconds will beat one whose running time is g(n) = 1,000,000 · n^2 seconds only when n < 1,000,000. Such enormous differences in constant factors between algorithms occur far less frequently in practice than large problems do.
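The crossover point quoted above follows from one line of algebra (my own arithmetic):

n^3 < 1,000,000 · n^2 exactly when n < 1,000,000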
2.3.1 Dominance Relations
The Big Oh notation groups functions into a set of classes, such that all the functions in a particular class are equivalent with respect to the Big Oh. Functions f(n) = 0.34n and g(n) = 234,234n belong in the same class, namely those that are order Θ(n). Further, when two functions f and g belong to different classes, they are different with respect to our notation. Either f(n) = O(g(n)) or g(n) = O(f(n)), but not both.

We say that a faster-growing function dominates a slower-growing one, just as a faster-growing country eventually comes to dominate the laggard. When f and g belong to different classes (i.e., f(n) ≠ Θ(g(n))), we say g dominates f when f(n) = O(g(n)), sometimes written g ≫ f.

The good news is that only a few function classes tend to occur in the course of basic algorithm analysis. These suffice to cover almost all the algorithms we will discuss in this text, and are listed in order of increasing dominance:
• Constant functions, f(n) = 1 – Such functions might measure the cost of adding two numbers, printing out “The Star Spangled Banner,” or the growth realized by functions such as f(n) = min(n, 100). In the big picture, there is no dependence on the parameter n.
• Logarithmic functions, f(n) = log n – Logarithmic time-complexity shows up in algorithms such as binary search. Such functions grow quite slowly as n gets big, but faster than the constant function (which is standing still, after all). Logarithms will be discussed in more detail in Section 2.6 (page 46).
• Linear functions, f(n) = n – Such functions measure the cost of looking at each item once (or twice, or ten times) in an n-element array, say to identify the biggest item, the smallest item, or compute the average value.
• Superlinear functions, f(n) = n lg n – This important class of functions arises in such algorithms as Quicksort and Mergesort. They grow just a little faster than linear (see Figure 2.4), just enough to be a different dominance class.
• Quadratic functions, f(n) = n^2 – Such functions measure the cost of looking at most or all pairs of items in an n-element universe. This arises in algorithms such as insertion sort and selection sort.

• Cubic functions, f(n) = n^3 – Such functions enumerate through all triples of items in an n-element universe. These also arise in certain dynamic programming algorithms developed in Chapter 8.

• Exponential functions, f(n) = c^n for a given constant c > 1 – Functions like 2^n arise when enumerating all subsets of n items. As we have seen, exponential algorithms become useless fast, but not as fast as. . .
• Factorial functions, f(n) = n! – Functions like n! arise when generating all permutations or orderings of n items.
The intricacies of dominance relations will be further discussed in Section 2.9.2 (page 56). However, all you really need to understand is that:
n! ≫ 2^n ≫ n^3 ≫ n^2 ≫ n log n ≫ n ≫ log n ≫ 1
Take-Home Lesson: Although esoteric functions arise in advanced algorithm analysis, a small variety of time complexities suffice and account for most algorithms that are widely used in practice.
2.4 Working with the Big Oh
You learned how to do simplifications of algebraic expressions back in high school. Working with the Big Oh requires dusting off these tools. Most of what you learned there still holds in working with the Big Oh, but not everything.
2.4.1 Adding Functions
The sum of two functions is governed by the dominant one, namely:
O(f(n)) + O(g(n)) → O(max(f(n), g(n)))
Ω(f(n)) + Ω(g(n)) → Ω(max(f(n), g(n)))
Θ(f(n)) + Θ(g(n)) → Θ(max(f(n), g(n)))
This is very useful in simplifying expressions, since it implies that n^3 + n^2 + n + 1 = O(n^3). Everything is small potatoes besides the dominant term.

The intuition is as follows. At least half the bulk of f(n) + g(n) must come from the larger value. The dominant function will, by definition, provide the larger value as n → ∞. Thus, dropping the smaller function from consideration reduces the value by at most a factor of 1/2, which is just a multiplicative constant. Suppose f(n) = O(n^2) and g(n) = O(n^2). This implies that f(n) + g(n) = O(n^2) as well.
2.4.2 Multiplying Functions
Multiplication is like repeated addition. Consider multiplication by any constant c > 0, be it 1.02 or 1,000,000. Multiplying a function by a constant can not affect its asymptotic behavior, because we can multiply the bounding constants in the Big Oh analysis of c · f(n) by 1/c to give appropriate constants for the Big Oh analysis of f(n). Thus:
O(c · f(n)) → O(f(n))
Ω(c · f(n)) → Ω(f(n))
Θ(c · f(n)) → Θ(f(n))
Of course, c must be strictly positive (i.e., c > 0) to avoid any funny business, since we can wipe out even the fastest growing function by multiplying it by zero. On the other hand, when two functions in a product are increasing, both are important. The function O(n! log n) dominates n! just as much as log n dominates 1. In general,
O(f(n)) ∗ O(g(n)) → O(f(n) ∗ g(n))
Ω(f(n)) ∗ Ω(g(n)) → Ω(f(n) ∗ g(n))
Θ(f(n)) ∗ Θ(g(n)) → Θ(f(n) ∗ g(n))
Stop and Think: Transitive Experience
Problem: Show that Big Oh relationships are transitive. That is, if f(n) = O(g(n)) and g(n) = O(h(n)), then f(n) = O(h(n)).
Solution: We always go back to the definition when working with the Big Oh. What we need to show here is that f(n) ≤ c3 · h(n) for n > n3, given that f(n) ≤ c1 · g(n) and g(n) ≤ c2 · h(n), for n > n1 and n > n2, respectively. Cascading these inequalities, we get that

f(n) ≤ c1 · g(n) ≤ c1c2 · h(n)

for n > n3 = max(n1, n2), so the claim holds with c3 = c1c2.
2.5 Reasoning About Efficiency
Gross reasoning about an algorithm’s running time is usually easy given a precise written description of the algorithm. In this section, I will work through several examples, perhaps in greater detail than necessary.
2.5.1 Selection Sort
Here we analyze the selection sort algorithm, which repeatedly identifies the smallest remaining unsorted element and puts it at the end of the sorted portion of the array. An animation of selection sort in action appears in Figure 2.5, and the code is shown below:
Figure 2.5: Animation of selection sort in action
selection_sort(int s[], int n)
{
    int i, j;       /* counters */
    int min;        /* index of minimum */

    for (i = 0; i < n; i++) {
        min = i;
        for (j = i + 1; j < n; j++)
            if (s[j] < s[min]) min = j;
        swap(&s[i], &s[min]);
    }
}
The outer loop goes around n times. The nested inner loop goes around n−i−1 times, where i is the index of the outer loop. The exact number of times the if statement is executed is given by:
S(n) = ∑_{i=0}^{n−1} ∑_{j=i+1}^{n−1} 1 = ∑_{i=0}^{n−1} (n − i − 1)
What this sum is doing is adding up the integers in decreasing order starting from n − 1, i.e.
S(n) = (n − 1) + (n − 2) + (n − 3) + . . . + 2 + 1
How can we reason about such a formula? We must solve the summation formula using the techniques of Section 1.3.5 (page 17) to get an exact value. But, with the Big Oh we are only interested in the order of the expression. One way to think about it is that we are adding up n − 1 terms, whose average value is about n/2. This yields S(n) ≈ n(n − 1)/2.

Another way to think about it is in terms of upper and lower bounds. We have n terms at most, each of which is at most n − 1. Thus, S(n) ≤ n(n − 1) = O(n^2). We have n/2 terms each that are bigger than n/2. Thus S(n) ≥ (n/2) × (n/2) = Ω(n^2). Together, this tells us that the running time is Θ(n^2), meaning that selection sort is quadratic.
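For completeness, the same sum can also be closed exactly with the arithmetic-series formula of Section 1.3.5, substituting k = n − i − 1; the algebra is mine but routine:

S(n) = ∑_{i=0}^{n−1} (n − i − 1) = ∑_{k=0}^{n−1} k = n(n − 1)/2 = Θ(n^2)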
2.5.2 Insertion Sort
A basic rule of thumb in Big Oh analysis is that worst-case running time follows from multiplying the largest number of times each nested loop can iterate. Consider the insertion sort algorithm presented on page 4, whose inner loops are repeated here:
for (i = 1; i < n; i++) {
    j = i;
    while ((j > 0) && (s[j] < s[j-1])) {
        swap(&s[j], &s[j-1]);
        j = j - 1;
    }
}
How often does the inner while loop iterate? This is tricky because there are two different stopping conditions: one to prevent us from running off the bounds of the array (j > 0) and the other to mark when the element finds its proper place in sorted order (s[j] < s[j − 1]). Since worst-case analysis seeks an upper bound on the running time, we ignore the early termination and assume that this loop always goes around i times. In fact, we can assume it always goes around n times since i < n. Since the outer loop goes around n times, insertion sort must be a quadratic-time algorithm, i.e. O(n^2).

This crude “round it up” analysis always does the job, in that the Big Oh running time bound you get will always be correct. Occasionally, it might be too generous, meaning the actual worst case time might be of a lower order than implied by such analysis. Still, I strongly encourage this kind of reasoning as a basis for simple algorithm analysis.
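To see why the round-up is safe here, compare the exact worst-case count of inner-loop iterations with the bound it is rounded up to (my own arithmetic, not from the text):

∑_{i=1}^{n−1} i = n(n − 1)/2 ≤ n · n = n^2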
2.5.3 String Pattern Matching
Pattern matching is the most fundamental algorithmic operation on text strings. This algorithm implements the find command available in any web browser or text editor:
Problem: Substring Pattern Matching
Input: A text string t and a pattern string p.
Output: Does t contain the pattern p as a substring, and if so where?
Figure 2.6: Searching for the substring abba in the text aababba.
Perhaps you are interested in finding where “Skiena” appears in a given news article (well, I would be interested in such a thing). This is an instance of string pattern matching with t as the news article and p = “Skiena.” There is a fairly straightforward algorithm for string pattern matching that considers the possibility that p may start at each possible position in t and then tests if this is so.
int findmatch(char *p, char *t)
{
        int i,j;                /* counters */
        int m, n;               /* string lengths */

        m = strlen(p);
        n = strlen(t);

        for (i=0; i<=(n-m); i=i+1) {
                j=0;
                while ((j<m) && (t[i+j]==p[j]))
                        j = j+1;
                if (j == m) return(i);
        }

        return(-1);
}
What is the worst-case running time of these two nested loops? The inner while loop goes around at most m times, and potentially far less when the pattern match fails. This, plus two other statements, lies within the outer for loop. The outer loop goes around at most n − m times, since no complete alignment is possible once we get too far to the right of the text. The time complexity of nested loops multiplies, so this gives a worst-case running time of O((n − m)(m + 2)). We did not count the time it takes to compute the length of the strings using the function strlen. Since the implementation of strlen is not given, we must guess how long it should take. If we explicitly count the number of characters until we
hit the end of the string, this would take time linear in the length of the string. This suggests that the running time should be O(n + m + (n − m)(m + 2)).

Let's use our knowledge of the Big Oh to simplify things. Since m + 2 = Θ(m), the “+2” isn't interesting, so we are left with O(n + m + (n − m)m). Multiplying this out yields O(n + m + nm − m^2), which still seems kind of ugly. However, in any interesting problem we know that n ≥ m, since it is impossible to have p as a substring of t for any pattern longer than the text itself. One consequence of this is that n + m ≤ 2n = Θ(n). Thus our worst-case running time simplifies further to O(n + nm − m^2).

Two more observations and we are done. First, note that n ≤ nm, since m ≥ 1 in any interesting pattern. Thus n + nm = Θ(nm), and we can drop the additive n, simplifying our analysis to O(nm − m^2). Finally, observe that the −m^2 term is negative, and thus only serves to lower the value within. Since the Big Oh gives an upper bound, we can drop any negative term without invalidating the upper bound. That n ≥ m implies that mn ≥ m^2, so the negative term is not big enough to cancel any other term which is left. Thus we can simply express the worst-case running time of this algorithm as O(nm).

After you get enough experience, you will be able to do such an algorithm analysis in your head without even writing the algorithm down. After all, algorithm design for a given task involves mentally rifling through different possibilities and selecting the best approach. This kind of fluency comes with practice, but if you are confused about why a given algorithm runs in O(f(n)) time, start by writing it out carefully and then employ the reasoning we used in this section.
2.5.4 Matrix Multiplication
Nested summations often arise in the analysis of algorithms with nested loops. Consider the problem of matrix multiplication:
Problem: Matrix Multiplication
Input: Two matrices, A (of dimension x × y) and B (dimension y × z).
Output: An x × z matrix C where C[i][j] is the dot product of the ith row of A and the jth column of B.
Matrix multiplication is a fundamental operation in linear algebra, presented with an example in the catalog in Section 13.3 (page 401). That said, the elementary algorithm for matrix multiplication is implemented as a tight product of three nested loops:
        for (i=1; i<=x; i++)
                for (j=1; j<=z; j++) {
                        C[i][j] = 0;
                        for (k=1; k<=y; k++)
                                C[i][j] += A[i][k] * B[k][j];
                }
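For concreteness, the three loops can be packaged as a complete C function. The structure type, dimension bound, and names below are illustrative choices of mine (using zero-indexed arrays, unlike the one-indexed fragment above), not code taken from the text:

        #define MAXDIM  100

        typedef struct {
                int rows, cols;                 /* dimensions actually in use */
                double m[MAXDIM][MAXDIM];       /* element storage */
        } matrix;

        /* C = A x B, assuming A->cols == B->rows */
        void matrix_multiply(matrix *A, matrix *B, matrix *C)
        {
                int i, j, k;                    /* loop counters */

                C->rows = A->rows;
                C->cols = B->cols;

                for (i = 0; i < A->rows; i++)
                        for (j = 0; j < B->cols; j++) {
                                C->m[i][j] = 0;
                                for (k = 0; k < A->cols; k++)
                                        C->m[i][j] += A->m[i][k] * B->m[k][j];
                        }
        }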
How can we analyze the time complexity of this algorithm? The number of multiplications M(x, y, z) is given by the following summation:
M(x, y, z) = Σ_{i=1}^{x} Σ_{j=1}^{y} Σ_{k=1}^{z} 1
Sums get evaluated from the right inward. The sum of z ones is z, so
M(x, y, z) = Σ_{i=1}^{x} Σ_{j=1}^{y} z
The sum of y zs is just as simple, yz, so
M(x, y, z) = Σ_{i=1}^{x} yz
Finally, the sum of x yzs is xyz. Thus the running time of this matrix multiplication algorithm is O(xyz). If we consider the common case where all three dimensions are the same, this becomes O(n^3)—i.e., a cubic algorithm.
2.6 Logarithms and Their Applications
Logarithm is an anagram of algorithm, but that's not why we need to know what logarithms are. You've seen the button on your calculator but may have forgotten why it is there. A logarithm is simply an inverse exponential function. Saying b^x = y is equivalent to saying that x = log_b y. Further, this definition is the same as saying b^{log_b y} = y.

Exponential functions grow at a distressingly fast rate, as anyone who has ever tried to pay off a credit card balance understands. Thus, inverse exponential functions—i.e. logarithms—grow refreshingly slowly. Logarithms arise in any process where things are repeatedly halved. We now look at several examples.
2.6.1 Logarithms and Binary Search
Binary search is a good example of an O(log n) algorithm. To locate a particular person p in a telephone book containing n names, you start by comparing p against the middle, or (n/2)nd name, say Monroe, Marilyn. Regardless of whether p belongs before this middle name (Dean, James) or after it (Presley, Elvis), after only one comparison you can discard one half of all the names in the book. The number of steps the algorithm takes equals the number of times we can halve n until only one name is left. By definition, this is exactly log2 n. Thus, twenty comparisons suffice to find any name in the million-name Manhattan phone book! Binary search is one of the most powerful ideas in algorithm design. This power becomes apparent if we imagine being forced to live in a world with only unsorted
telephone books. Figure 2.4 shows that O(log n) algorithms are fast enough to be used on problem instances of essentially unlimited size.
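A sketch of binary search on a sorted integer array, in the spirit of the phone book discussion (my own illustrative code, not an implementation taken from this book):

        /* Return the index of key in the sorted array s[0..n-1], or -1 if absent. */
        int binary_search(int s[], int n, int key)
        {
                int low = 0, high = n - 1;

                while (low <= high) {
                        int middle = low + (high - low) / 2;    /* halve the range */

                        if (s[middle] == key)
                                return middle;
                        if (s[middle] < key)
                                low = middle + 1;       /* key lies in the upper half */
                        else
                                high = middle - 1;      /* key lies in the lower half */
                }
                return -1;                              /* not found */
        }

Each iteration discards half of the remaining candidates, so at most about lg n comparisons are made.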
2.6.2 Logarithms and Trees
A binary tree of height 1 can have up to 2 leaf nodes, while a tree of height two can have up to four leaves. What is the height h of a rooted binary tree with n leaf nodes? Note that the number of leaves doubles every time we increase the height by one. To account for n leaves, n = 2^h, which implies that h = log_2 n.

What if we generalize to trees that have d children, where d = 2 for the case of binary trees? A tree of height 1 can have up to d leaf nodes, while one of height two can have up to d^2 leaves. The number of possible leaves multiplies by d every time we increase the height by one, so to account for n leaves, n = d^h, which implies that h = log_d n, as shown in Figure 2.7.
The punch line is that very short trees can have very many leaves, which is the main reason why binary trees prove fundamental to the design of fast data structures.
2.6.3 Logarithms and Bits
There are two bit patterns of length 1 (0 and 1) and four of length 2 (00, 01, 10, and 11). How many bits w do we need to represent any one of n different possibilities, be it one of n items or the integers from 1 to n? The key observation is that there must be at least n different bit patterns of length w. Since the number of different bit patterns doubles as you add each bit, we need at least w bits where 2^w = n—i.e., we need w = log_2 n bits.
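As a quick sanity check, the smallest such w can be computed by repeated doubling; this throwaway helper (mine, for illustration) rounds up when n is not an exact power of two:

        /* Smallest w such that 2^w >= n, for n >= 1. */
        int bits_needed(unsigned long n)
        {
                int w = 0;
                unsigned long patterns = 1;     /* patterns representable in w bits */

                while (patterns < n) {
                        patterns *= 2;          /* each extra bit doubles the count */
                        w++;
                }
                return w;
        }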
2.6.4 Logarithms and Multiplication
Logarithms were particularly important in the days before pocket calculators. They provided the easiest way to multiply big numbers by hand, either implicitly using a slide rule or explicitly by using a book of logarithms.
Figure 2.7: A height h tree with d children per node has d^h leaves. Here h = 2 and d = 3
Logarithms are still useful for multiplication, particularly for exponentiation.
Recall that log_a(xy) = log_a(x) + log_a(y); i.e., the log of a product is the sum of the logs. A direct consequence of this is
log_a n^b = b · log_a n
So how can we compute a^b for any a and b using the exp(x) and ln(x) functions on your calculator, where exp(x) = e^x and ln(x) = log_e(x)? We know
a^b = exp(ln(a^b)) = exp(b ln a)
so the problem is reduced to one multiplication plus one call to each of these functions.
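In C, this reduction corresponds directly to one call each to the standard library functions log (natural logarithm) and exp. A sketch, valid for a > 0 and subject to the usual floating-point precision caveats:

        #include <math.h>

        /* Compute a^b for a > 0 using the identity a^b = exp(b ln a). */
        double power_via_logs(double a, double b)
        {
                return exp(b * log(a));         /* one multiplication, two library calls */
        }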
2.6.5 Fast Exponentiation
Suppose that we need to exactly compute the value of a^n for some reasonably large n. Such problems occur in primality testing for cryptography, as discussed in Section 13.8 (page 420). Issues of numerical precision prevent us from applying the formula above.

The simplest algorithm performs n − 1 multiplications, by computing a × a × . . . × a. However, we can do better by observing that n = ⌊n/2⌋ + ⌈n/2⌉. If n is even, then a^n = (a^{n/2})^2. If n is odd, then a^n = a(a^{⌊n/2⌋})^2. In either case, we have halved the size of our exponent at the cost of, at most, two multiplications, so O(lg n) multiplications suffice to compute the final value.
        function power(a, n)
                if (n = 0) return(1)
                x = power(a, ⌊n/2⌋)
                if (n is even) then return(x^2)
                else return(a × x^2)
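A direct C rendering of this pseudocode for integer exponents might look as follows. This is my translation, and it ignores overflow; real cryptographic code would perform the multiplications modulo some value:

        /* Compute a^n by repeated squaring, using O(lg n) multiplications. */
        long long power(long long a, unsigned int n)
        {
                long long x;

                if (n == 0)
                        return 1;               /* a^0 = 1 */

                x = power(a, n / 2);            /* a^(floor(n/2)) */

                if (n % 2 == 0)
                        return x * x;           /* n even: a^n = (a^(n/2))^2 */
                else
                        return a * x * x;       /* n odd: one extra factor of a */
        }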
This simple algorithm illustrates an important principle of divide and conquer. It always pays to divide a job as evenly as possible. This principle applies to real life as well. When n is not a power of two, the problem cannot always be divided perfectly evenly, but a difference of one element between the two sides cannot cause any serious imbalance.
2.6.6 Logarithms and Summations
The Harmonic numbers arise as a special case of arithmetic progression, namely H(n) = S(n, −1). They reflect the sum of the progression of simple reciprocals, namely,
H(n) = Σ_{i=1}^{n} 1/i ∼ ln n
Loss (apply the greatest)          Increase in level
(A)  $2,000 or less                no increase
(B)  More than $2,000              add 1
(C)  More than $5,000              add 2
(D)  More than $10,000             add 3
(E)  More than $20,000             add 4
(F)  More than $40,000             add 5
(G)  More than $70,000             add 6
(H)  More than $120,000            add 7
(I)  More than $200,000            add 8
(J)  More than $350,000            add 9
(K)  More than $500,000            add 10
(L)  More than $800,000            add 11
(M)  More than $1,500,000          add 12
(N)  More than $2,500,000          add 13
(O)  More than $5,000,000          add 14
(P)  More than $10,000,000         add 15
(Q)  More than $20,000,000         add 16
(R)  More than $40,000,000         add 17
(S)  More than $80,000,000         add 18

Figure 2.8: The Federal Sentencing Guidelines for fraud
The Harmonic numbers prove important because they usually explain “where the log comes from” when one magically pops out from algebraic manipulation. For example, the key to analyzing the average case complexity of Quicksort is the summation S(n) = n Σ_{i=1}^{n} 1/i. Employing the Harmonic number identity immediately reduces this to Θ(n log n).
2.6.7 Logarithms and Criminal Justice
Figure 2.8 will be our final example of logarithms in action. This table appears in the Federal Sentencing Guidelines, used by courts throughout the United States. These guidelines are an attempt to standardize criminal sentences, so that a felon convicted of a crime before one judge receives the same sentence that they would before a different judge. To accomplish this, the judges have prepared an intricate point function to score the depravity of each crime and map it to time-to-serve. Figure 2.8 gives the actual point function for fraud—a table mapping dollars stolen to points. Notice that the punishment increases by one level each time the amount of money stolen roughly doubles. That means that the level of punishment (which maps roughly linearly to the amount of time served) grows logarithmically with the amount of money stolen.
Think for a moment about the consequences of this. Many a corrupt CEO certainly has. It means that your total sentence grows extremely slowly with the amount of money you steal. Knocking off five liquor stores for $10,000 each will get you more time than embezzling $1,000,000 once. The corresponding benefit of stealing really large amounts of money is even greater. The moral of logarithmic growth is clear: “If you are gonna do the crime, make it worth the time!”
Take-Home Lesson: Logarithms arise whenever things are repeatedly halved or doubled.
2.7 Properties of Logarithms
As we have seen, stating b^x = y is equivalent to saying that x = log_b y. The b term is known as the base of the logarithm. Three bases are of particular importance for mathematical and historical reasons:
• Base b = 2 – The binary logarithm, usually denoted lg x, is a base 2 logarithm. We have seen how this base arises whenever repeated halving (i.e., binary search) or doubling (i.e., nodes in trees) occurs. Most algorithmic applications of logarithms imply binary logarithms.
• Base b = e – The natural log, usually denoted ln x, is a base e = 2.71828 . . . logarithm. The inverse of ln x is the exponential function exp(x) = e^x on your calculator. Thus, composing these functions gives us
exp(ln x) = x
• Base b = 10 – Less common today is the base-10 or common logarithm, usually denoted as log x. This base was employed in slide rules and logarithm books in the days before pocket calculators.
We have already seen one important property of logarithms, namely that
log_a(xy) = log_a(x) + log_a(y)
The other important fact to remember is that it is easy to convert a logarithm from one base to another. This is a consequence of the formula:
log_a b = log_c b / log_c a
Thus, changing the base of log b from base-a to base-c simply involves multiplying by log_c a. It is easy to convert a common log function to a natural log function, and vice versa.

Two implications of these properties of logarithms are important to appreciate from an algorithmic perspective:
• The base of the logarithm has no real impact on the growth rate – Compare the following three values: log_2(1,000,000) = 19.9316, log_3(1,000,000) = 12.5754, and log_100(1,000,000) = 3. A big change in the base of the logarithm produces little difference in the value of the log. Changing the base of the log from a to c involves dividing by log_c a. This conversion factor is lost to the Big Oh notation whenever a and c are constants. Thus we are usually justified in ignoring the base of the logarithm when analyzing algorithms.
• Logarithms cut any function down to size – The growth rate of the logarithm of any polynomial function is O(lg n). This follows because
log_a n^b = b · log_a n
The power of binary search on a wide range of problems is a consequence of this observation. Note that doing a binary search on a sorted array of n^2 things requires only twice as many comparisons as a binary search on n things.
Logarithms efficiently cut any function down to size. It is hard to do arithmetic on factorials except via logarithms, since
n! = Π_{i=1}^{n} i  →  log n! = Σ_{i=1}^{n} log i = Θ(n log n)
provides another way for logarithms to pop up in algorithm analysis.
Stop and Think: Importance of an Even Split
Problem: How many queries does binary search take on the million-name Manhat- tan phone book if each split was 1/3 to 2/3 instead of 1/2 to 1/2?
Solution: When performing binary searches in a telephone book, how important is it that each query split the book exactly in half? Not much. For the Manhattan telephone book, we now use log3/2(1, 000, 000) ≈ 35 queries in the worst case, not a significant change from log2(1, 000, 000) ≈ 20. The power of binary search comes from its logarithmic complexity, not the base of the log.
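For the record, the arithmetic behind those counts is log_{3/2}(1,000,000) = ln(1,000,000)/ln(3/2) ≈ 13.82/0.405 ≈ 34.1, which rounds up to 35 queries, versus log_2(1,000,000) ≈ 19.93, or 20 queries.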
2.8 War Story: Mystery of the Pyramids
That look in his eyes should have warned me even before he started talking. “We want to use a parallel supercomputer for a numerical calculation up to 1,000,000,000, but we need a faster algorithm to do it.”
I'd seen that distant look before. Eyes dulled from too much exposure to the raw horsepower of supercomputers—machines so fast that brute force seemed to eliminate the need for clever algorithms; at least until the problems got hard enough.

“I am working with a Nobel prize winner to use a computer on a famous problem in number theory. Are you familiar with Waring's problem?”

I knew some number theory. “Sure. Waring's problem asks whether every integer can be expressed at least one way as the sum of at most four integer squares. For example, 78 = 8^2 + 3^2 + 2^2 + 1^2 = 7^2 + 5^2 + 2^2. I remember proving that four squares suffice to represent any integer in my undergraduate number theory class. Yes, it's a famous problem but one that got solved about 200 years ago.”

“No, we are interested in a different version of Waring's problem. A pyramidal number is a number of the form (m^3 − m)/6, for m ≥ 2. Thus the first several pyramidal numbers are 1, 4, 10, 20, 35, 56, 84, 120, and 165. The conjecture since 1928 is that every integer can be represented by the sum of at most five such pyramidal numbers. We want to use a supercomputer to prove this conjecture on all numbers from 1 to 1,000,000,000.”

“Doing a billion of anything will take a substantial amount of time,” I warned. “The time you spend to compute the minimum representation of each number will be critical, because you are going to do it one billion times. Have you thought about what kind of an algorithm you are going to use?”

“We have already written our program and run it on a parallel supercomputer. It works very fast on smaller numbers. Still, it takes much too much time as soon as we get to 100,000 or so.”

Terrific, I thought. Our supercomputer junkie had discovered asymptotic growth. No doubt his algorithm ran in something like quadratic time, and he got burned as soon as n got large.

“We need a faster program in order to get to one billion. Can you help us? Of course, we can run it on our parallel supercomputer when you are ready.”

I am a sucker for this kind of challenge, finding fast algorithms to speed up programs. I agreed to think about it and got down to work.

I started by looking at the program that the other guy had written. He had built an array of all the Θ(n^{1/3}) pyramidal numbers from 1 to n inclusive.2 To test each number k in this range, he did a brute force test to establish whether it was the sum of two pyramidal numbers. If not, the program tested whether it was the sum of three of them, then four, and finally five, until it first got an answer. About 45% of the integers are expressible as the sum of three pyramidal numbers. Most of the remaining 55% require the sum of four, and usually each of these can be represented in many different ways. Only 241 integers are known to require the sum of five pyramidal numbers, the largest being 343,867. For about half of the n numbers, this algorithm presumably went through all of the three-tests and at least
2 Why n^{1/3}? Recall that pyramidal numbers are of the form (m^3 − m)/6. The largest m such that the resulting number is at most n is roughly ∛(6n), so there are Θ(n^{1/3}) such numbers.
some of the four-tests before terminating. Thus, the total time for this algorithm would be at least O(n × (n^{1/3})^3) = O(n^2) time, where n = 1,000,000,000. No wonder his program cried “Uncle.”

Anything that was going to do significantly better on a problem this large had to avoid explicitly testing all triples. For each value of k, we were seeking the smallest set of pyramidal numbers that add up exactly to k. This problem is called the knapsack problem, and is discussed in Section 13.10 (page 427). In our case, the weights are the set of pyramidal numbers no greater than n, with an additional constraint that the knapsack holds exactly k items.

A standard approach to solving knapsack precomputes the sum of smaller subsets of the items for use in computing larger subsets. If we have a table of all sums of two numbers and want to know whether k is expressible as the sum of three numbers, we can ask whether k is expressible as the sum of a single number plus a number in this two-table.

Therefore I needed a table of all integers less than n that can be expressed as the sum of two of the 1,818 pyramidal numbers less than 1,000,000,000. There can be at most 1,818^2 = 3,305,124 of them. Actually, there are only about half this many after we eliminate duplicates and any sum bigger than our target. Building a sorted array storing these numbers would be no big deal. Let's call this sorted data structure of all pair-sums the two-table.

To find the minimum decomposition for a given k, I would first check whether it was one of the 1,818 pyramidal numbers. If not, I would then check whether k was in the sorted table of the sums of two pyramidal numbers. To see whether k was expressible as the sum of three such numbers, all I had to do was check whether k − p[i] was in the two-table for 1 ≤ i ≤ 1,818. This could be done quickly using binary search. To see whether k was expressible as the sum of four pyramidal numbers, I had to check whether k − two[i] was in the two-table for any 1 ≤ i ≤ |two|. However, since almost every k was expressible in many ways as the sum of four pyramidal numbers, this test would terminate quickly, and the total time taken would be dominated by the cost of the threes. Testing whether k was the sum of three pyramidal numbers would take O(n^{1/3} lg n). Running this on each of the n integers gives an O(n^{4/3} lg n) algorithm for the complete job.

Comparing this to his O(n^2) algorithm for n = 1,000,000,000 suggested that my algorithm was a cool 30,000 times faster than his original! My first attempt to code this solved up to n = 1,000,000 on my ancient Sparc ELC in about 20 minutes. From here, I experimented with different data structures to represent the sets of numbers and different algorithms to search these tables. I tried using hash tables and bit vectors instead of sorted arrays, and experimented with variants of binary search such as interpolation search (see Section 14.2 (page 441)). My reward for this work was solving up to n = 1,000,000 in under three minutes, a factor of six improvement over my original program.

With the real thinking done, I worked to tweak a little more performance out of the program. I avoided doing a sum-of-four computation on any k when k − 1 was
the sum-of-three, since 1 is a pyramidal number, saving about 10% of the total run time using this trick alone. Finally, I got out my profiler and tried some low-level tricks to squeeze a little more performance out of the code. For example, I saved another 10% by replacing a single procedure call with inline code. At this point, I turned the code over to the supercomputer guy. What he did with it is a depressing tale, which is reported in Section 7.10 (page 268).

In writing up this war story, I went back to rerun my program more than ten years later. On my desktop SunBlade 150, getting to 1,000,000 now took 27.0 seconds using the gcc compiler without turning on any compiler optimization. With Level 4 optimization, the job ran in just 14.0 seconds—quite a tribute to the quality of the optimizer. The run time on my desktop machine improved by a factor of about three over the four-year period prior to my first edition of this book, with an additional 5.3 times over the last 11 years. These speedups are probably typical for most desktops.

The primary lesson of this war story is to show the enormous potential for algorithmic speedups, as opposed to the fairly limited speedup obtainable via more expensive hardware. I sped his program up by about 30,000 times. His million-dollar computer had 16 processors, each reportedly five times faster on integer computations than the $3,000 machine on my desk. That gave a maximum potential speedup of less than 100 times. Clearly, the algorithmic improvement was the big winner here, as it is certain to be in any sufficiently large computation.
2.9 Advanced Analysis (*)
Ideally, each of us would be fluent in working with the mathematical techniques of asymptotic analysis. And ideally, each of us would be rich and good looking as well. In this section I will survey the major techniques and functions employed in advanced algorithm analysis. I consider this optional material—it will not be used elsewhere in the textbook section of this book. That said, it will make some of the complexity functions reported in the Hitchhiker’s Guide far less mysterious.
2.9.1 Esoteric Functions
The bread-and-butter classes of complexity functions were presented in Section 2.3.1 (page 39). More esoteric functions also make appearances in advanced algo- rithm analysis. Although we will not see them much in this book, it is still good business to know what they mean and where they come from:
• Inverse Ackermann's function f(n) = α(n) – This function arises in the detailed analysis of several algorithms, most notably the Union-Find data structure discussed in Section 6.1.3 (page 198).

The exact definition of this function and why it arises will not be discussed further. It is sufficient to think of it as geek talk for the slowest-growing
complexity function. Unlike the constant function f(n) = 1, it eventually gets to infinity as n → ∞, but it certainly takes its time about it. The value of α(n) < 5 for any value of n that can be written in this physical universe.
• f(n) = log log n – The “log log” function is just that—the logarithm of the logarithm of n. One natural example of how it might arise is doing a binary search on a sorted array of only lg n items.
• f(n) = log n/ log log n – This function grows a little slower than log n because it is divided by an even slower growing function.
To see where this arises, consider an n-leaf rooted tree of degree d. For binary trees, i.e. when d = 2, the height h is given

        n = 2^h → h = lg n

by taking the logarithm of both sides of the equation. Now consider the height of such a tree when the degree d = log n. Then

        n = (log n)^h → h = log n / log log n
• f(n) = log^2 n – This is the product of log functions—i.e., (log n) × (log n). It might arise if we wanted to count the bits looked at in doing a binary search on n items, each of which was an integer from 1 to (say) n^2. Each such integer requires a lg(n^2) = 2 lg n bit representation, and we look at lg n of them, for a total of 2 lg^2 n bits.
The “log squared” function typically arises in the design of intricate nested data structures, where each node in (say) a binary tree represents another data structure, perhaps ordered on a different key.
• f(n) = √n – The square root is not so esoteric, but represents the class of “sublinear polynomials” since √n = n^{1/2}. Such functions arise in building d-dimensional grids that contain n points. A √n × √n square has area n, and an n^{1/3} × n^{1/3} × n^{1/3} cube has volume n. In general, a d-dimensional hypercube of length n^{1/d} on each side has volume n.
• f(n) = n^{(1+ϵ)} – Epsilon (ϵ) is the mathematical symbol to denote a constant that can be made arbitrarily small but never quite goes away.
It arises in the following way. Suppose I design an algorithm that runs in 2^c · n^{(1+1/c)} time, and I get to pick whichever c I want. For c = 2, this is 4n^{3/2} or O(n^{3/2}). For c = 3, this is 8n^{4/3} or O(n^{4/3}), which is better. Indeed, the exponent keeps getting better the larger I make c.
The problem is that I cannot make c arbitrarily large before the 2^c term begins to dominate. Instead, we report this algorithm as running in O(n^{1+ϵ}), and leave the best value of ϵ to the beholder.
2.9.2 Limits and Dominance Relations
The dominance relation between functions is a consequence of the theory of limits, which you may recall from Calculus. We say that f(n) dominates g(n) if lim_{n→∞} g(n)/f(n) = 0.

Let's see this definition in action. Suppose f(n) = 2n^2 and g(n) = n^2. Clearly f(n) > g(n) for all n, but it does not dominate since
lim_{n→∞} g(n)/f(n) = lim_{n→∞} n^2/2n^2 = lim_{n→∞} 1/2 ≠ 0
This is to be expected because both functions are in the class Θ(n^2). What about f(n) = n^3 and g(n) = n^2? Since
lim_{n→∞} g(n)/f(n) = lim_{n→∞} n^2/n^3 = lim_{n→∞} 1/n = 0
the higher-degree polynomial dominates. This is true for any two polynomials, namely that n^a dominates n^b if a > b since
lim_{n→∞} n^b/n^a = lim_{n→∞} n^{b−a} → 0
Thus n^{1.2} dominates n^{1.1999999}. Now consider two exponential functions, say f(n) = 3^n and g(n) = 2^n. Since
lim_{n→∞} g(n)/f(n) = lim_{n→∞} 2^n/3^n = lim_{n→∞} (2/3)^n = 0
the exponential with the higher base dominates.

Our ability to prove dominance relations from scratch depends upon our ability to prove limits. Let's look at one important pair of functions. Any polynomial (say f(n) = n^ϵ) dominates logarithmic functions (say g(n) = lg n). Since n = 2^{lg n},
f(n) = (2^{lg n})^ϵ = 2^{ϵ lg n}
Now consider
lim_{n→∞} g(n)/f(n) = lim_{n→∞} lg n / 2^{ϵ lg n} = lim_{n→∞} lg n / n^ϵ
In fact, this does go to 0 as n → ∞.
Take-Home Lesson: By interleaving the functions here with those of Section 2.3.1 (page 39), we see where everything fits into the dominance pecking order:
n! ≫ c^n ≫ n^3 ≫ n^2 ≫ n^{1+ϵ} ≫ n log n ≫ n ≫ √n ≫
log^2 n ≫ log n ≫ log n/log log n ≫ log log n ≫ α(n) ≫ 1
Chapter Notes
Most other algorithm texts devote considerably more effort to the formal analysis of algorithms than we have here, and so we refer the theoretically-inclined reader
elsewhere for more depth. Algorithm texts more heavily stressing analysis include [CLRS01, KT06]. The book Concrete Mathematics by Knuth, Graham, and Patashnik [GKP89]
offers an interesting and thorough presentation of mathematics for the analysis of algorithms. Niven and Zuckerman [NZ80] is a nice introduction to number theory, including Waring's problem, discussed in the war story.

The notion of dominance also gives rise to the “Little Oh” notation. We say that f(n) = o(g(n)) iff g(n) dominates f(n). Among other things, the Little Oh proves useful for asking questions. Asking for an o(n^2) algorithm means you want one that is better than quadratic in the worst case—and means you would be willing to settle for O(n^{1.999} log^2 n).
2.10 Exercises
Program Analysis
2-1. [3] What value is returned by the following function? Express your answer as a function of n. Give the worst-case running time using the Big Oh notation.

        function mystery(n)
            r := 0
            for i := 1 to n − 1 do
                for j := i + 1 to n do
                    for k := 1 to j do
                        r := r + 1
            return(r)
2-2. [3] What value is returned by the following function? Express your answer as a function of n. Give the worst-case running time using Big Oh notation.

        function pesky(n)
            r := 0
            for i := 1 to n do
                for j := 1 to i do
                    for k := j to i + j do
                        r := r + 1
            return(r)
2-3. [5] What value is returned by the following function? Express your answer as a function of n. Give the worst-case running time using Big Oh notation.

        function prestiferous(n)
            r := 0
            for i := 1 to n do
                for j := 1 to i do
                    for k := j to i + j do
                        for l := 1 to i + j − k do
                            r := r + 1
            return(r)
2-4. [8] What value is returned by the following function? Express your answer as a function of n. Give the worst-case running time using Big Oh notation.

        function conundrum(n)
            r := 0
            for i := 1 to n do
                for j := i + 1 to n do
                    for k := i + j − 1 to n do
                        r := r + 1
            return(r)
2-5. [5] Suppose the following algorithm is used to evaluate the polynomial

        p(x) = a_n x^n + a_{n−1} x^{n−1} + . . . + a_1 x + a_0

        p := a_0;
        xpower := 1;
        for i := 1 to n do
            xpower := x ∗ xpower;
            p := p + a_i ∗ xpower
        end
(a) How many multiplications are done in the worst-case? How many additions?
(b) How many multiplications are done on the average?
(c) Can you improve this algorithm?
2-6. [3] Prove that the following algorithm for computing the maximum value in an array A[1..n] is correct.

        function max(A)
            m := A[1]
            for i := 2 to n do
                if A[i] > m then m := A[i]
            return (m)
Big Oh
2-7. [3] True or False?
(a) Is 2^{n+1} = O(2^n)?

(b) Is 2^{2n} = O(2^n)?
2-8. [3] For each of the following pairs of functions, either f(n) is in O(g(n)), f(n) is in Ω(g(n)), or f(n) = Θ(g(n)). Determine which relationship is correct and briefly explain why.
(a) f(n) = log n^2; g(n) = log n + 5
(b) f(n) = √n; g(n) = log n^2

(c) f(n) = log^2 n; g(n) = log n

(d) f(n) = n; g(n) = log^2 n

(e) f(n) = n log n + n; g(n) = log n

(f) f(n) = 10; g(n) = log 10

(g) f(n) = 2^n; g(n) = 10n^2

(h) f(n) = 2^n; g(n) = 3^n
2-9. [3] For each of the following pairs of functions f(n) and g(n), determine whether f(n) = O(g(n)), g(n) = O(f(n)), or both.
(a) f(n) = (n^2 − n)/2, g(n) = 6n

(b) f(n) = n + 2√n, g(n) = n^2

(c) f(n) = n log n, g(n) = n√n/2

(d) f(n) = n + log n, g(n) = √n

(e) f(n) = 2(log n)^2, g(n) = log n + 1

(f) f(n) = 4n log n + n, g(n) = (n^2 − n)/2
2-10. [3] Prove that n^3 − 3n^2 − n + 1 = Θ(n^3).

2-11. [3] Prove that n^2 = O(2^n).
2-12. [3] For each of the following pairs of functions f(n) and g(n), give an appropriate positive constant c such that f(n) ≤ c · g(n) for all n > 1.
(a) f(n) = n^2 + n + 1, g(n) = 2n^3

(b) f(n) = n√n + n^2, g(n) = n^2

(c) f(n) = n^2 − n + 1, g(n) = n^2/2
2-13. [3] Prove that if f1(n) = O(g1(n)) and f2(n) = O(g2(n)), then f1(n) + f2(n) = O(g1(n) + g2(n)).
2-14. [3] Prove that if f1(n) = Ω(g1(n)) and f2(n) = Ω(g2(n)), then f1(n) + f2(n) = Ω(g1(n) + g2(n)).
2-15. [3] Prove that if f1(n) = O(g1(n)) and f2(n) = O(g2(n)), then f1(n) · f2(n) = O(g1(n) · g2(n))
2-16. [5] Prove for all k ≥ 1 and all sets of constants {a_k, a_{k−1}, . . . , a_1, a_0} ∈ R,

        a_k n^k + a_{k−1} n^{k−1} + . . . + a_1 n + a_0 = O(n^k)
2-17. [5] Show that for any real constants a and b, b > 0,

        (n + a)^b = Θ(n^b)
2-18. [5] List the functions below from the lowest to the highest order. If any two or more are of the same order, indicate which.
n, 2^n, n lg n, ln n, n − n^3 + 7n^5, lg n, √n, e^n, n^2 + lg n, n^2, 2^{n−1}, lg lg n, n^3, (lg n)^2, n!, n^{1+ε} where 0 < ε < 1
2-19. [5] List the functions below from the lowest to the highest order. If any two or more are of the same order, indicate which.

√n, n, 2^n, n log n, n − n^3 + 7n^5, n^2 + log n, n^2, n^3, log n, n^{1/3} + log n, (log n)^2, n!, ln n, n/log n, log log n, (1/3)^n, (3/2)^n, 6
2-20. [5] Find two functions f(n) and g(n) that satisfy the following relationship. If no such f and g exist, write “None.”
(a) f(n) = o(g(n)) and f(n) ̸= Θ(g(n))
(b) f(n) = Θ(g(n)) and f(n) = o(g(n))
(c) f(n) = Θ(g(n)) and f(n) ̸= O(g(n))
(d) f(n) = Ω(g(n)) and f(n) ̸= O(g(n))
2-21. [5] True or False?
(a) 2n^2 + 1 = O(n^2)

(b) √n = O(log n)

(c) log n = O(√n)

(d) n^2(1 + √n) = O(n^2 log n)

(e) 3n^2 + √n = O(n^2)

(f) √n log n = O(n)

(g) log n = O(n^{−1/2})
2-22. [5] For each of the following pairs of functions f(n) and g(n), state whether f(n) = O(g(n)), f(n) = Ω(g(n)), f(n) = Θ(g(n)), or none of the above.
(a) f(n) = n^2 + 3n + 4, g(n) = 6n + 7

(b) f(n) = n√n, g(n) = n^2 − n

(c) f(n) = 2^n − n^2, g(n) = n^4 + n^2
2-23. [3] For each of these questions, briefly explain your answer.

(a) If I prove that an algorithm takes O(n^2) worst-case time, is it possible that it takes O(n) on some inputs?

(b) If I prove that an algorithm takes O(n^2) worst-case time, is it possible that it takes O(n) on all inputs?

(c) If I prove that an algorithm takes Θ(n^2) worst-case time, is it possible that it takes O(n) on some inputs?
(d) If I prove that an algorithm takes Θ(n^2) worst-case time, is it possible that it takes O(n) on all inputs?

(e) Is the function f(n) = Θ(n^2), where f(n) = 100n^2 for even n and f(n) = 20n^2 − n log_2 n for odd n?
2-24. [3] For each of the following, answer yes, no, or can't tell. Explain your reasoning.

(a) Is 3^n = O(2^n)?

(b) Is log 3^n = O(log 2^n)?

(c) Is 3^n = Ω(2^n)?

(d) Is log 3^n = Ω(log 2^n)?
2-25. [5] For each of the following expressions f(n) find a simple g(n) such that f(n) = Θ(g(n)).
(a) f(n) = Σ_{i=1}^{n} 1/i.

(b) f(n) = Σ_{i=1}^{n} ⌈1/i⌉.

(c) f(n) = Σ_{i=1}^{n} log i.

(d) f(n) = log(n!).
2-26. [5] Place the following functions into increasing asymptotic order: f_1(n) = n^2 log_2 n, f_2(n) = n(log_2 n)^2, f_3(n) = Σ_{i=0}^{n} 2^i, f_4(n) = log_2(Σ_{i=0}^{n} 2^i).
2-27. [5] Place the following functions into increasing asymptotic order. If two or more of the functions are of the same asymptotic order then indicate this.
f_1(n) = Σ_{i=1}^{n} √i,  f_2(n) = (√n) log n,  f_3(n) = n √(log n),  f_4(n) = 12n^{3/2} + 4n.
2-28. [5] For each of the following expressions f(n) find a simple g(n) such that f(n) = Θ(g(n)). (You should be able to prove your result by exhibiting the rel- evant parameters, but this is not required for the homework.)
(a) f(n) = Σ_{i=1}^{n} (3i^4 + 2i^3 − 19i + 20).

(b) f(n) = Σ_{i=1}^{n} (3(4^i) + 2(3^i) − i^{19} + 20).

(c) f(n) = Σ_{i=1}^{n} (5^i + 3^{2i}).
2-29. [5] Which of the following are true?
(a) Σ_{i=1}^{n} 3^i = Θ(3^{n−1}).

(b) Σ_{i=1}^{n} 3^i = Θ(3^n).

(c) Σ_{i=1}^{n} 3^i = Θ(3^{n+1}).
2-30. [5] For each of the following functions f find a simple function g such that f(n) = Θ(g(n)).
(a) f_1(n) = (1000)2^n + 4^n.

(b) f_2(n) = n + n log n + √n.

(c) f_3(n) = log(n^{20}) + (log n)^{10}.

(d) f_4(n) = (0.99)^n + n^{100}.
2-31. [5] For each pair of expressions (A, B) below, indicate whether A is O, o, Ω, ω, or Θ of B. Note that zero, one or more of these relations may hold for a given pair; list all correct ones.

        A               B
(a)     n^100           2^n
(b)     (lg n)^12       √n
(c)     √n              n^{cos(πn/8)}
(d)     10^n            100^n
(e)     n^{lg n}        (lg n)^n
(f)     lg(n!)          n lg n
Summations
2-32. [5] Prove that:
1^2 − 2^2 + 3^2 − 4^2 + . . . + (−1)^{k−1} k^2 = (−1)^{k−1} k(k + 1)/2
2-33. [5] Find an expression for the sum of the ith row of the following triangle, and prove its correctness. Each entry is the sum of the three entries directly above it. All nonexistent entries are considered 0.

                        1
                      1 1 1
                    1 2 3 2 1
                  1 3 6 7 6 3 1
              1 4 10 16 19 16 10 4 1
2-34. [3] Assume that Christmas has n days. Exactly how many presents did my “true love” send me? (Do some research if you do not understand this question.)
2-35. [5] Consider the following code fragment.
        for i=1 to n do
            for j=i to 2*i do
                output "foobar"
Let T(n) denote the number of times ‘foobar’ is printed as a function of n.
a. Express T(n) as a summation (actually two nested summations).
b. Simplify the summation. Show your work.
2-36. [5] Consider the following code fragment.
        for i=1 to n/2 do
            for j=i to n-i do
                for k=1 to j do
                    output "foobar"
Assume n is even. Let T(n) denote the number of times ‘foobar’ is printed as a function of n.
(a) Express T(n) as three nested summations.
(b) Simplify the summation. Show your work.
2-37. [6] When you first learned to multiply numbers, you were told that x × y means adding x a total of y times, so 5 × 4 = 5 + 5 + 5 + 5 = 20. What is the time complexity of multiplying two n-digit numbers in base b (people work in base 10, of course, while computers work in base 2) using the repeated addition method, as a function of n and b? Assume that single-digit by single-digit addition or multiplication takes O(1) time. (Hint: how big can y be as a function of n and b?)
2-38. [6] In grade school, you learned to multiply long numbers on a digit-by-digit basis, so that 127 × 211 = 127 × 1 + 127 × 10 + 127 × 200 = 26,797. Analyze the time complexity of multiplying two n-digit numbers with this method as a function of n (assume constant base size). Assume that single-digit by single-digit addition or multiplication takes O(1) time.
Logarithms
2-39. [5] Prove the following identities on logarithms:
(a) Prove that log_a(xy) = log_a x + log_a y.

(b) Prove that log_a x^y = y log_a x.

(c) Prove that log_a x = (log_b x)/(log_b a).

(d) Prove that x^{log_b y} = y^{log_b x}.
2-40. [3] Show that ⌈lg(n + 1)⌉ = ⌊lg n⌋ + 1
2-41. [3] Prove that the binary representation of n ≥ 1 has ⌊lg_2 n⌋ + 1 bits.
2-42. [5] In one of my research papers I give a comparison-based sorting algorithm that runs in O(n log(√n)). Given the existence of an Ω(n log n) lower bound for sorting, how can this be possible?
Interview Problems
2-43. [5] You are given a set S of n numbers. You must pick a subset S′ of k numbers from S such that the probability of each element of S occurring in S′ is equal (i.e., each is selected with probability k/n). You may make only one pass over the numbers. What if n is unknown?
2-44. [5] We have 1,000 data items to store on 1,000 nodes. Each node can store copies of exactly three different items. Propose a replication scheme to minimize data loss as nodes fail. What is the expected number of data entries that get lost when three random nodes fail?
2-45. [5] Consider the following algorithm to find the minimum element in an array of numbers A[0, . . . , n]. One extra variable tmp is allocated to hold the current minimum value. Start from A[0]; tmp is compared against A[1], A[2], . . . , A[n] in order. When A[i] < tmp, we set tmp = A[i]. What is the expected number of times that the assignment operation tmp = A[i] is performed?
2-46. [5] You have a 100-story building and a couple of marbles. You must identify the lowest floor for which a marble will break if you drop it from this floor. How fast can you find this floor if you are given an infinite supply of marbles? What if you have only two marbles?
2-47. [5] You are given 10 bags of gold coins. Nine bags contain coins that each weigh 10 grams. One bag contains all false coins that weigh one gram less. You must identify this bag in just one weighing. You have a digital balance that reports the weight of what is placed on it.
2-48. [5] You have eight balls all of the same size. Seven of them weigh the same, and one of them weighs slightly more. How can you find the ball that is heavier by using a balance and only two weighings?
2-49. [5] Suppose we start with n companies that eventually merge into one big company. How many different ways are there for them to merge?
2-50. [5] A Ramanujan-Hardy number can be written two different ways as the sum of two cubes—i.e., there exist distinct a, b, c, and d such that a^3 + b^3 = c^3 + d^3. Generate all Ramanujan numbers where a, b, c, d < n.

2-51. [7] Six pirates must divide $300 among themselves. The division is to proceed as follows. The senior pirate proposes a way to divide the money. Then the pirates vote. If the senior pirate gets at least half the votes he wins, and that division remains. If he doesn't, he is killed and then the next senior-most pirate gets a chance to do the division. Now you have to tell what will happen and why (i.e., how many pirates survive and how the division is done)? All the pirates are intelligent and the first priority is to stay alive and the next priority is to get as much money as possible.

2-52. [7] Reconsider the pirate problem above, where only one indivisible dollar is to be divided. Who gets the dollar and how many are killed?
Programming Challenges
These programming challenge problems with robot judging are available at http://www.programming-challenges.com or http://online-judge.uva.es.
2-1. “Primary Arithmetic” – Programming Challenges 110501, UVA Judge 10035.
2-2. “A Multiplication Game” – Programming Challenges 110505, UVA Judge 847.
2-3. “Light, More Light” – Programming Challenges 110701, UVA Judge 10110.
3
Data Structures
Changing a data structure in a slow program can work the same way an organ transplant does in a sick patient. Important classes of abstract data types such as containers, dictionaries, and priority queues, have many different but functionally equivalent data structures that implement them. Changing the data structure does not change the correctness of the program, since we presumably replace a correct implementation with a different correct implementation. However, the new implementation of the data type realizes different tradeoffs in the time to execute various operations, so the total performance can improve dramatically. Like a patient in need of a transplant, only one part might need to be replaced in order to fix the problem.

But it is better to be born with a good heart than have to wait for a replacement. The maximum benefit from good data structures results from designing your program around them in the first place. We assume that the reader has had some previous exposure to elementary data structures and pointer manipulation. Still, data structure (CS II) courses these days focus more on data abstraction and object orientation than the nitty-gritty of how structures should be represented in memory. We will review this material to make sure you have it down.

In data structures, as with most subjects, it is more important to really understand the basic material than have exposure to more advanced concepts. We will focus on each of the three fundamental abstract data types (containers, dictionaries, and priority queues) and see how they can be implemented with arrays and lists. Detailed discussion of the tradeoffs between more sophisticated implementations is deferred to the relevant catalog entry for each of these data types.
3.1 Contiguous vs. Linked Data Structures
Data structures can be neatly classified as either contiguous or linked, depending upon whether they are based on arrays or pointers:
• Contiguously-allocated structures are composed of single slabs of memory, and include arrays, matrices, heaps, and hash tables.
• Linked data structures are composed of distinct chunks of memory bound together by pointers, and include lists, trees, and graph adjacency lists.
In this section, we review the relative advantages of contiguous and linked data structures. These tradeoffs are more subtle than they appear at first glance, so I encourage readers to stick with me here even if you may be familiar with both types of structures.
3.1.1 Arrays
The array is the fundamental contiguously-allocated data structure. Arrays are structures of fixed-size data records such that each element can be efficiently located by its index or (equivalently) address. A good analogy likens an array to a street full of houses, where each array element is equivalent to a house, and the index is equivalent to the house number. Assuming all the houses are equal size and numbered sequentially from 1 to n, we can compute the exact position of each house immediately from its address.1
Advantages of contiguously-allocated arrays include:
• Constant-time access given the index – Because the index of each element maps directly to a particular memory address, we can access arbitrary data items instantly provided we know the index.
• Space efficiency – Arrays consist purely of data, so no space is wasted with links or other formatting information. Further, end-of-record information is not needed because arrays are built from fixed-size records.
• Memory locality – A common programming idiom involves iterating through all the elements of a data structure. Arrays are good for this because they exhibit excellent memory locality. Physical continuity between successive data accesses helps exploit the high-speed cache memory on modern computer architectures.
The downside of arrays is that we cannot adjust their size in the middle of a program's execution. Our program will fail as soon as we try to add the (n + 1)st customer, if we only allocate room for n records.
1Houses in Japanese cities are traditionally numbered in the order they were built, not by their physical location. This makes it extremely difficult to locate a Japanese address without a detailed map.
We can compensate by allocating extremely large arrays, but this can waste space, again restricting what our programs can do.

Actually, we can efficiently enlarge arrays as we need them, through the miracle of dynamic arrays. Suppose we start with an array of size 1, and double its size from m to 2m each time we run out of space. This doubling process involves allocating a new contiguous array of size 2m, copying the contents of the old array to the lower half of the new one, and returning the space used by the old array to the storage allocation system. The apparent waste in this procedure involves the recopying of the old contents on each expansion.

How many times might an element have to be recopied after a total of n insertions? Well, the first inserted element will have been recopied when the array expands after the first, second, fourth, eighth, . . . insertions. It will take log_2 n doublings until the array gets to have n positions. However, most elements do not suffer much upheaval. Indeed, the (n/2 + 1)st through nth elements will move at most once and might never have to move at all. If half the elements move once, a quarter of the elements twice, and so on, the total number of movements M is given by
M = Σ_{i=1}^{lg n} i · n/2^i = n Σ_{i=1}^{lg n} i/2^i ≤ n Σ_{i=1}^{∞} i/2^i = 2n
Thus, each of the n elements moves only two times on average, and the total work of managing the dynamic array is the same O(n) as it would have been if a single array of sufficient size had been allocated in advance!

The primary thing lost using dynamic arrays is the guarantee that each array access takes constant time in the worst case. Now all the queries will be fast, except for those relatively few queries triggering array doubling. What we get instead is a promise that the nth array access will be completed quickly enough that the total effort expended so far will still be O(n). Such amortized guarantees arise frequently in the analysis of data structures.
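A bare-bones dynamic array that doubles on demand might look like the sketch below. The names are mine and error handling is minimal; in particular, a production version would check the results of malloc and realloc (which performs the copying that the analysis above charges for):

        #include <stdlib.h>

        typedef struct {
                int *items;             /* contiguous storage */
                int size;               /* slots allocated */
                int count;              /* slots in use */
        } dynamic_array;

        void da_init(dynamic_array *d)
        {
                d->size = 1;
                d->count = 0;
                d->items = malloc(sizeof(int) * d->size);
        }

        void da_append(dynamic_array *d, int x)
        {
                if (d->count == d->size) {      /* out of room: double the array */
                        d->size *= 2;
                        d->items = realloc(d->items, sizeof(int) * d->size);
                }
                d->items[d->count++] = x;       /* amortized O(1) per append */
        }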
3.1.2 Pointers and Linked Structures
Pointers are the connections that hold the pieces of linked structures together. Pointers represent the address of a location in memory. A variable storing a pointer to a given data item can provide more freedom than storing a copy of the item itself. A cell-phone number can be thought of as a pointer to its owner as they move about the planet.

Pointer syntax and power differ significantly across programming languages, so we begin with a quick review of pointers in the C language. A pointer p is assumed to give the address in memory where a particular chunk of data is located.2 Pointers in C have types declared at compile time, denoting the data type of the items they can point to. We use *p to denote the item that is pointed to by pointer p, and &x to denote the address (i.e., pointer to) of a particular variable x. A special NULL pointer value is used to denote structure-terminating or unassigned pointers.

2 C permits direct manipulation of memory addresses in ways which may horrify Java programmers, but we will avoid doing any such tricks.
Figure 3.1: Linked list example showing data and pointer fields
All linked data structures share certain properties, as revealed by the following linked list type declaration:
typedef struct list {
        item_type item;                 /* data item */
        struct list *next;              /* point to successor */
} list;
In particular:
• Each node in our data structure (here list) contains one or more data fields (here item) that retain the data that we need to store.
• Each node contains a pointer field to at least one other node (here next). This means that much of the space used in linked data structures has to be devoted to pointers, not data.
• Finally, we need a pointer to the head of the structure, so we know where to access it.
The list is the simplest linked structure. The three basic operations supported by lists are searching, insertion, and deletion. In doubly-linked lists, each node points both to its predecessor and its successor element. This simplifies certain operations at a cost of an extra pointer field per node.
Searching a List
Searching for item x in a linked list can be done iteratively or recursively. We opt for recursively in the implementation below. If x is in the list, it is either the first element or located in the smaller rest of the list. Eventually, we reduce the problem to searching in an empty list, which clearly cannot contain x.
list *search_list(list *l, item_type x)
{
        if (l == NULL) return(NULL);

        if (l->item == x)
                return(l);
        else
                return( search_list(l->next, x) );
}
Insertion into a List
Insertion into a singly-linked list is a nice exercise in pointer manipulation, as shown below. Since we have no need to maintain the list in any particular order, we might as well insert each new item in the simplest place. Insertion at the beginning of the list avoids any need to traverse the list, but does require us to update the pointer (denoted l) to the head of the data structure.
void insert_list(list **l, item_type x)
{
        list *p;                        /* temporary pointer */

        p = malloc( sizeof(list) );
        p->item = x;
        p->next = *l;
        *l = p;
}
Two C-isms to note. First, the malloc function allocates a chunk of memory of sufficient size for a new node to contain x. Second, the funny double star (**l) denotes that l is a pointer to a pointer to a list node. Thus the last line, *l=p; copies p to the place pointed to by l, which is the external variable maintaining access to the head of the list.
Deletion From a List
Deletion from a linked list is somewhat more complicated. First, we must find a pointer to the predecessor of the item to be deleted. We do this recursively:
list *predecessor_list(list *l, item_type x)
{
        if ((l == NULL) || (l->next == NULL)) {
                /* predecessor sought on null list */
                return(NULL);
        }

        if ((l->next)->item == x)
                return(l);
        else
                return( predecessor_list(l->next, x) );
}
The predecessor is needed because it points to the doomed node, so its next pointer must be changed. The actual deletion operation is simple, once we rule out the case that the to-be-deleted element does not exist. Special care must be taken to reset the pointer to the head of the list (l) when the first element is deleted:
delete_list(list **l, item_type x)
{
        list *p;                        /* item pointer */
        list *pred;                     /* predecessor pointer */
        list *search_list(), *predecessor_list();

        p = search_list(*l,x);
        if (p != NULL) {
                pred = predecessor_list(*l,x);
                if (pred == NULL)       /* splice out of list */
                        *l = p->next;
                else
                        pred->next = p->next;

                free(p);                /* free memory used by node */
        }
}
C language requires explicit deallocation of memory, so we must free the deleted node after we are finished with it to return the memory to the system.
3.1.3 Comparison
The relative advantages of linked lists over static arrays include:
• Overflow on linked structures can never occur unless the memory is actually full.
• Insertions and deletions are simpler than for contiguous (array) lists.
• With large records, moving pointers is easier and faster than moving the items themselves.
while the relative advantages of arrays include:
• Linked structures require extra space for storing pointer fields.
• Linked lists do not allow efficient random access to items.
• Arrays allow better memory locality and cache performance than random pointer jumping.
Take-Home Lesson: Dynamic memory allocation provides us with flexibility on how and where we use our limited storage resources.
One final thought about these fundamental structures is that they can be thought of as recursive objects:
• Lists – Chopping the first element off a linked list leaves a smaller linked list. This same argument works for strings, since removing characters from a string leaves a string. Lists are recursive objects.
• Arrays – Splitting the first k elements off of an n element array gives two smaller arrays, of size k and n − k, respectively. Arrays are recursive objects.
This insight leads to simpler list processing, and efficient divide-and-conquer algorithms such as quicksort and binary search.
3.2 Stacks and Queues
We use the term container to denote a data structure that permits storage and retrieval of data items independent of content. By contrast, dictionaries are abstract data types that retrieve based on key values or content, and will be discussed in Section 3.3 (page 72). Containers are distinguished by the particular retrieval order they support. In the two most important types of containers, this retrieval order depends on the insertion order:
• Stacks – Support retrieval by last-in, first-out (LIFO) order. Stacks are simple to implement and very efficient. For this reason, stacks are probably the right container to use when retrieval order doesn’t matter at all, such as when processing batch jobs. The put and get operations for stacks are usually called push and pop:
– Push(x,s): Insert item x at the top of stack s.
– Pop(s): Return (and remove) the top item of stack s.
LIFO order arises in many real-world contexts. People crammed into a subway car exit in LIFO order. Food inserted into my refrigerator usually exits the same way, despite the incentive of expiration dates. Algorithmically, LIFO tends to happen in the course of executing recursive algorithms.
• Queues – Support retrieval in first in, first out (FIFO) order. This is surely the fairest way to control waiting times for services. You want the container holding jobs to be processed in FIFO order to minimize the maximum time spent waiting. Note that the average waiting time will be the same regardless of whether FIFO or LIFO is used. Many computing applications involve data items with infinite patience, which renders the question of maximum waiting time moot.
Queues are somewhat trickier to implement than stacks and thus are most appropriate for applications (like certain simulations) where the order is im- portant. The put and get operations for queues are usually called enqueue and dequeue.
– Enqueue(x,q): Insert item x at the back of queue q.
– Dequeue(q): Return (and remove) the front item from queue q.
We will see queues later as the fundamental data structure controlling breadth-first searches in graphs.
Stacks and queues can be effectively implemented using either arrays or linked lists. The key issue is whether an upper bound on the size of the container is known in advance, thus permitting the use of a statically-allocated array.
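To make the array option concrete, here is a minimal sketch of a bounded stack. The type name, field names, and capacity constant are choices made for this sketch (they do not appear in the text), and error checking is omitted for brevity.

#define STACK_CAPACITY 1000             /* assumed upper bound on stack size */

typedef struct {
        item_type s[STACK_CAPACITY];    /* array holding the stacked items */
        int top;                        /* index of the top item; -1 when empty */
} bounded_stack;

void init_stack(bounded_stack *st)
{
        st->top = -1;                   /* start with an empty stack */
}

void push(bounded_stack *st, item_type x)
{
        st->s[++(st->top)] = x;         /* caller must respect STACK_CAPACITY */
}

item_type pop(bounded_stack *st)
{
        return(st->s[(st->top)--]);     /* caller must ensure stack is non-empty */
}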
3.3 Dictionaries
The dictionary data type permits access to data items by content. You stick an item into a dictionary so you can find it when you need it. The primary operations of dictionary support are:
• Search(D,k) – Given a search key k, return a pointer to the element in dic- tionary D whose key value is k, if one exists.
• Insert(D,x) – Given a data item x, add it to the set in the dictionary D.
• Delete(D,x) – Given a pointer to a given data item x in the dictionary D, remove it from D.
Certain dictionary data structures also efficiently support other useful opera- tions:
• Max(D) or Min(D) – Retrieve the item with the largest (or smallest) key from D. This enables the dictionary to serve as a priority queue, to be discussed in Section 3.5 (page 83).
• Predecessor(D,x) or Successor(D,x) – Retrieve the item from D whose key is immediately before (or after) x in sorted order. These enable us to iterate through the elements of the data structure.
Many common data processing tasks can be handled using these dictionary operations. For example, suppose we want to remove all duplicate names from a mailing list, and print the results in sorted order. Initialize an empty dictionary D, whose search key will be the record name. Now read through the mailing list, and for each record search to see if the name is already in D. If not, insert it into D. Once finished, we must extract the remaining names out of the dictionary. By starting from the first item Min(D) and repeatedly calling Successor until we obtain Max(D), we traverse all elements in sorted order.

By defining such problems in terms of abstract dictionary operations, we avoid the details of the data structure's representation and focus on the task at hand.

In the rest of this section, we will carefully investigate simple dictionary implementations based on arrays and linked lists. More powerful dictionary implementations such as binary search trees (see Section 3.4 (page 77)) and hash tables (see Section 3.7 (page 89)) are also attractive options in practice. A complete discussion of different dictionary data structures is presented in the catalog in Section 12.1 (page 367). We encourage the reader to browse through the data structures section of the catalog to better learn what your options are.
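Written as code, the duplicate-removal task touches nothing but these abstract operations. The sketch below is programmed against a hypothetical dictionary interface (the dictionary type and the new_dict, search_dict, insert_dict, min_dict, and successor_dict routines are placeholders, not code from the text); any implementation of the operations above could sit behind these declarations.

#include <stdio.h>

typedef struct dictionary dictionary;                 /* opaque type, assumed for the sketch */

dictionary *new_dict(void);                           /* create an empty dictionary */
char *search_dict(dictionary *d, char *key);          /* NULL if key is absent */
void insert_dict(dictionary *d, char *key);
char *min_dict(dictionary *d);                        /* smallest key, NULL if empty */
char *successor_dict(dictionary *d, char *key);       /* next larger key, NULL at the end */

void print_unique_sorted(char *names[], int n)
{
        dictionary *d = new_dict();
        char *p;
        int i;

        for (i = 0; i < n; i++)
                if (search_dict(d, names[i]) == NULL)
                        insert_dict(d, names[i]);     /* keep only the first copy */

        for (p = min_dict(d); p != NULL; p = successor_dict(d, p))
                printf("%s\n", p);                    /* names emerge in sorted order */
}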
Stop and Think: Comparing Dictionary Implementations (I)
Problem: What are the asymptotic worst-case running times for each of the seven fundamental dictionary operations (search, insert, delete, successor, predecessor, minimum, and maximum) when the data structure is implemented as:
• An unsorted array.
• A sorted array.
Solution: This problem (and the one following it) reveals some of the inherent tradeoffs of data structure design. A given data representation may permit effi- cient implementation of certain operations at the cost that other operations are expensive.
In addition to the array in question, we will assume access to a few extra variables such as n—the number of elements currently in the array. Note that we must maintain the value of these variables in the operations where they change (e.g., insert and delete), and charge these operations the cost of this maintenance. The basic dictionary operations can be implemented with the following costs on unsorted and sorted arrays, respectively:
Dictionary operation     Unsorted array    Sorted array
Search(L, k)             O(n)              O(log n)
Insert(L, x)             O(1)              O(n)
Delete(L, x)             O(1)∗             O(n)
Successor(L, x)          O(n)              O(1)
Predecessor(L, x)        O(n)              O(1)
Minimum(L)               O(n)              O(1)
Maximum(L)               O(n)              O(1)
We must understand the implementation of each operation to see why. First, we discuss the operations when maintaining an unsorted array A.
• Search is implemented by testing the search key k against (potentially) each element of an unsorted array. Thus, search takes linear time in the worst case, which is when key k is not found in A.
• Insertion is implemented by incrementing n and then copying item x to the nth cell in the array, A[n]. The bulk of the array is untouched, so this operation takes constant time.
• Deletion is somewhat trickier, hence the superscript (∗) in the table. The definition states that we are given a pointer x to the element to delete, so we need not spend any time searching for the element. But removing the xth element from the array A leaves a hole that must be filled. We could fill the hole by moving each of the elements A[x + 1] to A[n] up one position, but this requires Θ(n) time when the first element is deleted. The following idea is better: just write over A[x] with A[n], and decrement n. This only takes constant time; a code sketch follows this list.
• The definition of the traversal operations, Predecessor and Successor, refer to the item appearing before/after x in sorted order. Thus, the answer is not simply A[x − 1] (or A[x + 1]), because in an unsorted array an element’s physical predecessor (successor) is not necessarily its logical predecessor (suc- cessor). Instead, the predecessor of A[x] is the biggest element smaller than A[x]. Similarly, the successor of A[x] is the smallest element larger than A[x]. Both require a sweep through all n elements of A to determine the winner.
• Minimum and Maximum are similarly defined with respect to sorted order, and so require linear sweeps to identify in an unsorted array.
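The constant-time deletion trick referred to above is short enough to show in full. The record type, field names, and capacity constant below are assumptions of the sketch; items are stored in A[1..n] to match the text's indexing convention.

#define MAXN 1000                       /* assumed capacity of the array */

typedef struct {
        item_type A[MAXN+1];            /* items live in A[1..n], as in the text */
        int n;                          /* number of elements currently stored */
} unsorted_array;

void delete_unsorted(unsorted_array *d, int x)  /* x indexes the doomed element */
{
        d->A[x] = d->A[d->n];           /* overwrite the hole with the last element */
        d->n = d->n - 1;                /* one fewer element to maintain */
}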
Implementing a dictionary using a sorted array completely reverses our notions of what is easy and what is hard. Searches can now be done in O(log n) time, using binary search, because we know the median element sits in A[n/2]. Since the upper and lower portions of the array are also sorted, the search can continue recursively on the appropriate portion. The number of halvings of n until we get to a single element is ⌈lg n⌉.

The sorted order also benefits us with respect to the other dictionary retrieval operations. The minimum and maximum elements sit in A[1] and A[n], while the predecessor and successor to A[x] are A[x − 1] and A[x + 1], respectively.

Insertion and deletion become more expensive, however, because making room for a new item or filling a hole may require moving many items arbitrarily. Thus both become linear-time operations.
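For concreteness, a standard recursive binary search over such a sorted array appears below; the function name and the convention of searching A[low..high] (so the first call is on A[1..n]) are choices made for this sketch.

int binary_search(item_type A[], item_type k, int low, int high)
{
        int mid;                        /* index of the middle element */

        if (low > high) return(-1);     /* key k does not appear in A[low..high] */

        mid = (low + high) / 2;

        if (A[mid] == k) return(mid);

        if (A[mid] > k)
                return( binary_search(A, k, low, mid-1) );      /* search lower half */
        else
                return( binary_search(A, k, mid+1, high) );     /* search upper half */
}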
Take-Home Lesson: Data structure design must balance all the different op- erations it supports. The fastest data structure to support both operations A and B may well not be the fastest structure to support either operation A or B.
Stop and Think: Comparing Dictionary Implementations (II)
Problem: What are the asymptotic worst-case running times for each of the seven fundamental dictionary operations when the data structure is implemented as:
• A singly-linked unsorted list.
• A doubly-linked unsorted list.
• A singly-linked sorted list.
• A doubly-linked sorted list.
Solution: Two different issues must be considered in evaluating these implementa- tions: singly- vs. doubly-linked lists and sorted vs. unsorted order. Subtle operations are denoted with a superscript:
Dictionary operation     Singly unsorted   Doubly unsorted   Singly sorted   Doubly sorted
Search(L, k)             O(n)              O(n)              O(n)            O(n)
Insert(L, x)             O(1)              O(1)              O(n)            O(n)
Delete(L, x)             O(n)∗             O(1)              O(n)∗           O(1)
Successor(L, x)          O(n)              O(n)              O(1)            O(1)
Predecessor(L, x)        O(n)              O(n)              O(n)∗           O(1)
Minimum(L)               O(n)              O(n)              O(1)            O(1)
Maximum(L)               O(n)              O(n)              O(1)∗           O(1)
As with unsorted arrays, search operations are destined to be slow while main- tenance operations are fast.
• Insertion/Deletion – The complication here is deletion from a singly-linked list. The definition of the Delete operation states we are given a pointer x to the item to be deleted. But what we really need is a pointer to the element pointing to x in the list, because that is the node that needs to be changed. We can do nothing without this list predecessor, and so must spend linear time searching for it on a singly-linked list. Doubly-linked lists avoid this problem, since we can immediately retrieve the list predecessor of x.
Deletion is faster for sorted doubly-linked lists than sorted arrays, because splicing out the deleted element from the list is more efficient than filling the hole by moving array elements. The predecessor pointer problem again complicates deletion from singly-linked sorted lists.
• Search – Sorting provides less benefit for linked lists than it did for arrays. Bi- nary search is no longer possible, because we can’t access the median element without traversing all the elements before it. What sorted lists do provide is quick termination of unsuccessful searches, for if we have not found Abbott by the time we hit Costello we can deduce that he doesn’t exist. Still, searching takes linear time in the worst case.
• Traversal operations – The predecessor pointer problem again complicates implementing Predecessor. The logical successor is equivalent to the node successor for both types of sorted lists, and hence can be implemented in constant time.
• Maximum – The maximum element sits at the tail of the list, which would normally require Θ(n) time to reach in either singly- or doubly-linked lists.
However, we can maintain a separate pointer to the list tail, provided we pay the maintenance costs for this pointer on every insertion and deletion. The tail pointer can be updated in constant time on doubly-linked lists: on insertion check whether last->next still equals NULL, and on deletion set last to point to the list predecessor of last if the last element is deleted.
We have no efficient way to find this predecessor for singly-linked lists. So why can we implement maximum in Θ(1) on singly-linked lists? The trick is to charge the cost to each deletion, which already took linear time. Adding an extra linear sweep to update the pointer does not harm the asymptotic complexity of Delete, while gaining us Maximum in constant time as a reward for clear thinking.
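A sketch of this bookkeeping on a singly-linked list follows. The helper name is an assumption of the sketch: after deleting the last node, one extra linear sweep (charged to the already linear Delete) recomputes the tail pointer, so Maximum remains constant time.

list *find_new_tail(list *head)
{
        list *p = head;

        if (p == NULL) return(NULL);    /* the list has become empty */

        while (p->next != NULL)
                p = p->next;            /* walk to the last remaining node */

        return(p);                      /* this becomes the new tail pointer */
}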
Figure 3.2: The five distinct binary search trees on three nodes
3.4 Binary Search Trees
We have seen data structures that allow fast search or flexible update, but not fast search and flexible update. Unsorted, doubly-linked lists supported insertion and deletion in O(1) time but search took linear time in the worst case. Sorted arrays support binary search and logarithmic query times, but at the cost of linear-time update.

Binary search requires that we have fast access to two elements, specifically the median elements above and below the given node. To combine these ideas, we need a "linked list" with two pointers per node. This is the basic idea behind binary search trees.

A rooted binary tree is recursively defined as either being (1) empty, or (2) consisting of a node called the root, together with two rooted binary trees called the left and right subtrees, respectively. The order among "brother" nodes matters in rooted trees, so left is different from right. Figure 3.2 gives the shapes of the five distinct binary trees that can be formed on three nodes.

A binary search tree labels each node in a binary tree with a single key such that for any node labeled x, all nodes in the left subtree of x have keys < x while all nodes in the right subtree of x have keys > x. This search tree labeling scheme is very special. For any binary tree on n nodes, and any set of n keys, there is exactly one labeling that makes it a binary search tree. The allowable labelings for three-node trees are given in Figure 3.2.
3.4.1 Implementing Binary Search Trees
Binary tree nodes have left and right pointer fields, an (optional) parent pointer, and a data field. These relationships are shown in Figure 3.3; a type declaration for the tree structure is given below:
Figure 3.3: Relationships in a binary search tree (left). Finding the minimum (center) and maximum (right) elements in a binary search tree
typedef struct tree {
        item_type item;                 /* data item */
        struct tree *parent;            /* pointer to parent */
        struct tree *left;              /* pointer to left child */
        struct tree *right;             /* pointer to right child */
} tree;
The basic operations supported by binary trees are searching, traversal, inser- tion, and deletion.
Searching in a Tree
The binary search tree labeling uniquely identifies where each key is located. Start at the root. Unless it contains the query key x, proceed either left or right depending upon whether x occurs before or after the root key. This algorithm works because both the left and right subtrees of a binary search tree are themselves binary search trees. This recursive structure yields the recursive search algorithm below:
tree *search_tree(tree *l, item_type x)
{
        if (l == NULL) return(NULL);

        if (l->item == x) return(l);

        if (x < l->item)
                return( search_tree(l->left, x) );
        else
                return( search_tree(l->right, x) );
}
This search algorithm runs in O(h) time, where h denotes the height of the tree.
Finding Minimum and Maximum Elements in a Tree
Implementing the find-minimum operation requires knowing where the minimum element is in the tree. By definition, the smallest key must reside in the left subtree of the root, since all keys in the left subtree have values less than that of the root. Therefore, as shown in Figure 3.3, the minimum element must be the leftmost descendant of the root. Similarly, the maximum element must be the rightmost descendant of the root.
tree *find_minimum(tree *t)
{
        tree *min;                      /* pointer to minimum */

        if (t == NULL) return(NULL);

        min = t;
        while (min->left != NULL)
                min = min->left;
        return(min);
}
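The maximum is found by the mirror image of this routine, following right pointers instead of left ones. This symmetric sketch is not printed here in the text but follows directly from the discussion above:

tree *find_maximum(tree *t)
{
        tree *max;                      /* pointer to maximum */

        if (t == NULL) return(NULL);

        max = t;
        while (max->right != NULL)
                max = max->right;       /* keep descending to the right child */
        return(max);
}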
Traversal in a Tree
Visiting all the nodes in a rooted binary tree proves to be an important component of many algorithms. It is a special case of traversing all the nodes and edges in a graph, which will be the foundation of Chapter 5.

A prime application of tree traversal is listing the labels of the tree nodes. Binary search trees make it easy to report the labels in sorted order. By definition, all the keys smaller than the root must lie in the left subtree of the root, and all keys bigger than the root in the right subtree. Thus, visiting the nodes recursively in accord with such a policy produces an in-order traversal of the search tree:
void traverse_tree(tree *l)
{
        if (l != NULL) {
                traverse_tree(l->left);
                process_item(l->item);
                traverse_tree(l->right);
        }
}
Each item is processed once during the course of traversal, which runs in O(n) time, where n denotes the number of nodes in the tree.

Alternate traversal orders come from changing the position of process_item relative to the traversals of the left and right subtrees. Processing the item first yields a pre-order traversal, while processing it last gives a post-order traversal. These make relatively little sense with search trees, but prove useful when the rooted tree represents arithmetic or logical expressions.
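For comparison, a pre-order variant simply moves the call to process_item ahead of the two recursive calls (the post-order version would move it after both); this variant is a straightforward rearrangement of the routine above rather than code from the text:

void traverse_tree_preorder(tree *l)
{
        if (l != NULL) {
                process_item(l->item);          /* process the node before its subtrees */
                traverse_tree_preorder(l->left);
                traverse_tree_preorder(l->right);
        }
}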
Insertion in a Tree
There is only one place to insert an item x into a binary search tree T where we know we can find it again. We must replace the NULL pointer found in T after an unsuccessful query for the key x.

This implementation uses recursion to combine the search and node insertion stages of key insertion. The three arguments to insert_tree are (1) a pointer l to the pointer linking the search subtree to the rest of the tree, (2) the key x to be inserted, and (3) a parent pointer to the parent node containing l. The node is allocated and linked in on hitting the NULL pointer. Note that we pass the pointer to the appropriate left/right pointer in the node during the search, so the assignment *l = p; links the new node into the tree:
void insert_tree(tree **l, item_type x, tree *parent)
{
        tree *p;                        /* temporary pointer */

        if (*l == NULL) {
                p = malloc(sizeof(tree));       /* allocate new node */
                p->item = x;
                p->left = p->right = NULL;
                p->parent = parent;
                *l = p;                         /* link into parent's record */
                return;
        }

        if (x < (*l)->item)
                insert_tree(&((*l)->left), x, *l);
        else
                insert_tree(&((*l)->right), x, *l);
}
Allocating the node and linking it in to the tree is a constant-time operation after the search has been performed in O(h) time.
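As a usage sketch, repeatedly calling insert_tree on an initially empty tree builds a binary search tree one key at a time. The driver below is illustrative only; it assumes item_type is an integer type and that process_item prints its argument, so the traversal prints the keys in sorted order.

int main(void)
{
        tree *root = NULL;              /* an empty tree is a NULL pointer */
        item_type keys[] = {5, 2, 8, 1, 3, 7, 6, 4};
        int i;

        for (i = 0; i < 8; i++)
                insert_tree(&root, keys[i], NULL);      /* the root has no parent */

        traverse_tree(root);            /* in-order traversal: 1 2 3 ... 8 */
        return 0;
}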
[Figure: panels showing the initial tree and deletion of a node with zero children (3)]
delete node with