3. Plotting the classics
In this example, we will explore statistics for two classic novels: The Adventures of Huckleberry Finn by Mark Twain, and Little Women by Louisa May Alcott. The text of any book can be read by a computer at great speed. Old books eventually enter the public domain, meaning that everyone has the right to copy or use the text in any way. In the United States, this covered books published before 1923 when this text was written, and the cutoff year now advances every January. Project Gutenberg is a website that publishes public domain books online. Using Python, we can load the text of these books directly from the web.
This example is meant to illustrate some of the broad themes of this text. Don’t worry if the details of the program don’t yet make sense. Instead, focus on interpreting the images generated below. Later sections of the text will describe most of the features of the Python programming language used below.
First, we read the text of both books into lists of chapters, called huck_finn_chapters and little_women_chapters. In Python, a name cannot contain any spaces, so we often use an underscore _ to stand in for a space. The = in the lines below gives the name on the left to the result of the computation described on the right. A uniform resource locator (URL) is an address on the Internet for some content; in this case, the text of a book. The # symbol starts a comment, which is ignored by the computer but helpful for people reading the code.
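```python
# Read two books, fast!
import re
from urllib.request import urlopen

def read_url(url):
    # Assumed helper: download the text at url and collapse runs of whitespace into single spaces
    return re.sub(r'\s+', ' ', urlopen(url).read().decode())

huck_finn_url = 'https://www.inferentialthinking.com/data/huck_finn.txt'
huck_finn_text = read_url(huck_finn_url)
huck_finn_chapters = huck_finn_text.split('CHAPTER ')[44:]  # skip the text before chapter I
little_women_url = 'https://www.inferentialthinking.com/data/little_women.txt'
little_women_text = read_url(little_women_url)
little_women_chapters = little_women_text.split('CHAPTER ')[1:]  # skip the text before chapter one
```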
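To see what split and a slice such as [1:] do in the code above, here is a small sketch on a made-up string (toy_text is invented for illustration and does not come from either book). split cuts the text at every occurrence of 'CHAPTER ' and removes that separator, so the first piece is whatever came before the first heading; slicing from index 1 drops it.

```python
# toy_text is a made-up stand-in for the full text of a book
toy_text = 'Preface text. CHAPTER I. First words. CHAPTER II. More words.'

# split removes each 'CHAPTER ' separator and returns the pieces between them
pieces = toy_text.split('CHAPTER ')
print(pieces)  # ['Preface text. ', 'I. First words. ', 'II. More words.']

# Slicing from index 1 keeps every piece except the front matter before chapter I
chapters = pieces[1:]
print(chapters)  # ['I. First words. ', 'II. More words.']
```

The Huckleberry Finn line slices from 44 rather than 1 because more pieces precede its first chapter, presumably front matter such as a table of contents that also uses the word CHAPTER.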
While a computer cannot understand the text of a book, it can provide us with some insight into the structure of the text. The name huck_finn_chapters is currently bound to a list of all the chapters in the book. We can place them into a table to see how each chapter begins.
```python
import pandas as pd

# Display the chapters of Huckleberry Finn in a dataframe.
pd.DataFrame({'Chapters': huck_finn_chapters})
```
| | Chapters |
|---|---|
| 0 | I. YOU don't know about me without you have re... | 
| 1 | II. WE went tiptoeing along a path amongst the... | 
| 2 | III. WELL, I got a good going-over in the morn... | 
| 3 | IV. WELL, three or four months run along, and ... | 
| 4 | V. I had shut the door to. Then I turned aroun... | 
| 5 | VI. WELL, pretty soon the old man was up and a... | 
| 6 | VII. "GIT up! What you 'bout?" I opened my eye... | 
| 7 | VIII. THE sun was up so high when I waked that... | 
| 8 | IX. I wanted to go and look at a place right a... | 
| 9 | X. AFTER breakfast I wanted to talk about the ... | 
| 10 | XI. "COME in," says the woman, and I did. She ... | 
| 11 | XII. IT must a been close on to one o'clock wh... | 
| 12 | XIII. WELL, I catched my breath and most faint... | 
| 13 | XIV. BY and by, when we got up, we turned over... | 
| 14 | XV. WE judged that three nights more would fet... | 
| 15 | XVI. WE slept most all day, and started out at... | 
| 16 | XVII. IN about a minute somebody spoke out of ... | 
| 17 | XVIII. COL. Grangerford was a gentleman, you s... | 
| 18 | XIX. TWO or three days and nights went by; I r... | 
| 19 | XX. THEY asked us considerable many questions;... | 
| 20 | XXI. IT was after sun-up now, but we went righ... | 
| 21 | XXII. THEY swarmed up towards Sherburn's house... | 
| 22 | XXIII. WELL, all day him and the king was hard... | 
| 23 | XXIV. NEXT day, towards night, we laid up unde... | 
| 24 | XXV. THE news was all over town in two minutes... | 
| 25 | XXVI. WELL, when they was all gone the king he... | 
| 26 | XXVII. I crept to their doors and listened; th... | 
| 27 | XXVIII. BY and by it was getting-up time. So I... | 
| 28 | XXIX. THEY was fetching a very nice-looking ol... | 
| 29 | XXX. WHEN they got aboard the king went for me... | 
| 30 | XXXI. WE dasn't stop again at any town for day... | 
| 31 | XXXII. WHEN I got there it was all still and S... | 
| 32 | XXXIII. SO I started for town in the wagon, an... | 
| 33 | XXXIV. WE stopped talking, and got to thinking... | 
| 34 | XXXV. IT would be most an hour yet till breakf... | 
| 35 | XXXVI. AS soon as we reckoned everybody was as... | 
| 36 | XXXVII. THAT was all fixed. So then we went aw... | 
| 37 | XXXVIII. MAKING them pens was a distressid tou... | 
| 38 | XXXIX. IN the morning we went up to the villag... | 
| 39 | XL. WE was feeling pretty good after breakfast... | 
| 40 | XLI. THE doctor was an old man; a very nice, k... | 
| 41 | XLII. THE old man was uptown again before brea... | 
| 42 | THE LAST THE first time I catched Tom private ... | 
Each chapter except the last, which is titled THE LAST, begins with a chapter number in Roman numerals, followed by the first sentence of the chapter. Project Gutenberg has printed the first word of each chapter in upper case.
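Because every chapter label except the last is a Roman numeral, a short function can turn those labels back into ordinary numbers, for instance to check that the chapters appear in order. The sketch below is a hypothetical helper, not part of the original program: roman_to_int and chapter_number are names invented here, and the sample strings are chapter beginnings copied from the table above.

```python
def roman_to_int(numeral):
    """Convert a Roman numeral such as 'XLII' to an integer."""
    values = {'I': 1, 'V': 5, 'X': 10, 'L': 50, 'C': 100, 'D': 500, 'M': 1000}
    total = 0
    for i, ch in enumerate(numeral):
        value = values[ch]
        # A smaller value written before a larger one is subtracted (the I in IV)
        if i + 1 < len(numeral) and value < values[numeral[i + 1]]:
            total -= value
        else:
            total += value
    return total

def chapter_number(chapter):
    """Return the number of a chapter string like 'XLII. THE old man...', or None."""
    label = chapter.split('. ', 1)[0]  # text before the first '. '
    if label and all(ch in 'IVXLCDM' for ch in label):
        return roman_to_int(label)
    return None  # no Roman numeral label, as in THE LAST

# Beginnings of three chapters, copied from the table above
samples = [
    "I. YOU don't know about me without you have re",
    "XLII. THE old man was uptown again before brea",
    "THE LAST THE first time I catched Tom private ",
]
print([chapter_number(c) for c in samples])  # [1, 42, None]
```

The subtractive rule is what makes XLII mean 42 rather than 62. Applied as [chapter_number(c) for c in huck_finn_chapters], this should give 1 through 42 followed by None for the final chapter, assuming every label is followed by '. ' as in the table.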
