[Big Data Systems] Word Count / Link Prediction

이상현 · November 4, 2020
from pyspark import SparkContext

sc = SparkContext.getOrCreate()

TextFile

text_file = sc.textFile("./4.pyspark/test1.txt")
text_file.collect()
['ROMEO AND JULIET',
 '',
 '',
 'ACT I',
 '',
 '',
 '',
 'SCENE I\tVerona. A public place.',
 '',
 '',
 '\t[Enter SAMPSON and GREGORY, of the house of Capulet,',
 '\tarmed with swords and bucklers]',
 '',
 "SAMPSON\tGregory, o' my word, we'll not carry coals.",
 '',
 'GREGORY\tNo, for then we should be colliers.',
 '',
 "SAMPSON\tI mean, an we be in choler, we'll draw.",
 '',
 "GREGORY\tAy, while you live, draw your neck out o' the collar.",
 '',
 'SAMPSON\tI strike quickly, being moved.',
 '',
 'GREGORY\tBut thou art not quickly moved to strike.',
 '',
 'SAMPSON\tA dog of the house of Montague moves me.',
 '',
 'GREGORY\tTo move is to stir; and to be valiant is to stand:',
 "\ttherefore, if thou art moved, thou runn'st away.",
 '',
 'SAMPSON\tA dog of that house shall move me to stand: I will',
 "\ttake the wall of any man or maid of Montague's.",
 '',
 'GREGORY\tThat shows thee a weak slave; for the weakest goes',
 '\tto the wall.',
 '',
 'SAMPSON\tTrue; and therefore women, being the weaker vessels,',
 '\tare ever thrust to the wall: therefore I will push',
 "\tMontague's men from the wall, and thrust his maids",
 '\tto the wall.',
 '',
 'GREGORY\tThe quarrel is between our masters and us their men.',
 '',
 "SAMPSON\t'Tis all one, I will show myself a tyrant: when I",
 '\thave fought with the men, I will be cruel with the',
 '\tmaids, and cut off their heads.',
 '',
 'GREGORY\tThe heads of the maids?',
 '',
 'SAMPSON\tAy, the heads of the maids, or their maidenheads;',
 '\ttake it in what sense thou wilt.',
 '',
 'GREGORY\tThey must take it in sense that feel it.',
 '',
 'SAMPSON\tMe they shall feel while I am able to stand: and',
 "\t'tis known I am a pretty piece of flesh.",
 '',
 "GREGORY\t'Tis well thou art not fish; if thou hadst, thou",
 '\thadst been poor John. Draw thy tool! here comes',
 '\ttwo of the house of the Montagues.',
 '',
 'SAMPSON\tMy naked weapon is out: quarrel, I will back thee.',
 '',
 'GREGORY\tHow! turn thy back and run?',
 '',
 'SAMPSON\tFear me not.',
 '',
 'GREGORY\tNo, marry; I fear thee!',
 '',
 'SAMPSON\tLet us take the law of our sides; let them begin.',
 '',
 'GREGORY\tI will frown as I pass by, and let them take it as',
 '\tthey list.',
 '',
 'SAMPSON\tNay, as they dare. I will bite my thumb at them;',
 '\twhich is a disgrace to them, if they bear it.',
 '',
 '\t[Enter ABRAHAM and BALTHASAR]',
 '',
 'ABRAHAM\tDo you bite your thumb at us, sir?',
 '',
 'SAMPSON\tI do bite my thumb, sir.',
 '',
 'ABRAHAM\tDo you bite your thumb at us, sir?',
 '',
 'SAMPSON\t[Aside to GREGORY]  Is the law of our side, if I say',
 '\tay?',
 '',
 'GREGORY\tNo.',
 '',
 'SAMPSON\tNo, sir, I do not bite my thumb at you, sir, but I',
 '\tbite my thumb, sir.',
 '',
 'GREGORY\tDo you quarrel, sir?',
 '',
 'ABRAHAM\tQuarrel sir! no, sir.',
 '',
 'SAMPSON\tIf you do, sir, I am for you: I serve as good a man as you.',
 '',
 'ABRAHAM\tNo better.',
 '',
 'SAMPSON\tWell, sir.',
 '',
 "GREGORY\tSay 'better:' here comes one of my master's kinsmen.",
 '',
 'SAMPSON\tYes, better, sir.',
 '',
 'ABRAHAM\tYou lie.',
 '',
 'SAMPSON\tDraw, if you be men. Gregory, remember thy swashing blow.',
 '',
 '\t[They fight]',
 '',
 '\t[Enter BENVOLIO]',
 '',
 'BENVOLIO\tPart, fools!',
 '\tPut up your swords; you know not what you do.',
 '',
 '\t[Beats down their swords]',
 '',
 '\t[Enter TYBALT]',
 '',
 'TYBALT\tWhat, art thou drawn among these heartless hinds?',
 '\tTurn thee, Benvolio, look upon thy death.',
 '',
 'BENVOLIO\tI do but keep the peace: put up thy sword,',
 '\tOr manage it to part these men with me.',
 '',
 'TYBALT\tWhat, drawn, and talk of peace! I hate the word,',
 '\tAs I hate hell, all Montagues, and thee:',
 '\tHave at thee, coward!',
 '',
 '\t[They fight]',
 '',
 '\t[Enter, several of both houses, who join the fray;',
 '\tthen enter Citizens, with clubs]',
 '',
 'First Citizen\tClubs, bills, and partisans! strike! beat them down!',
 '\tDown with the Capulets! down with the Montagues!',
 '',
 '\t[Enter CAPULET in his gown, and LADY CAPULET]',
 '',
 'CAPULET\tWhat noise is this? Give me my long sword, ho!',
 '',
 'LADY CAPULET\tA crutch, a crutch! why call you for a sword?',
 '',
 'CAPULET\tMy sword, I say! Old Montague is come,',
 '\tAnd flourishes his blade in spite of me.',
 '',
 '\t[Enter MONTAGUE and LADY MONTAGUE]',
 '',
 'MONTAGUE\tThou villain Capulet,--Hold me not, let me go.',
 '',
 'LADY MONTAGUE\tThou shalt not stir a foot to seek a foe.',
 '',
 '\t[Enter PRINCE, with Attendants]',
 '',
 'PRINCE\tRebellious subjects, enemies to peace,',
 '\tProfaners of this neighbour-stained steel,--',
 '\tWill they not hear? What, ho! you men, you beasts,',
 '\tThat quench the fire of your pernicious rage',
 '\tWith purple fountains issuing from your veins,',
 '\tOn pain of torture, from those bloody hands',
 "\tThrow your mistemper'd weapons to the ground,",
 '\tAnd hear the sentence of your moved prince.',
 '\tThree civil brawls, bred of an airy word,',
 '\tBy thee, old Capulet, and Montague,',
 "\tHave thrice disturb'd the quiet of our streets,",
 "\tAnd made Verona's ancient citizens",
 '\tCast by their grave beseeming ornaments,',
 '\tTo wield old partisans, in hands as old,',
 "\tCanker'd with peace, to part your canker'd hate:",
 '\tIf ever you disturb our streets again,',
 '\tYour lives shall pay the forfeit of the peace.',
 '\tFor this time, all the rest depart away:',
 '\tYou Capulet; shall go along with me:',
 '\tAnd, Montague, come you this afternoon,',
 '\tTo know our further pleasure in this case,',
 '\tTo old Free-town, our common judgment-place.',
 '\tOnce more, on pain of death, all men depart.',
 '',
 '\t[Exeunt all but MONTAGUE, LADY MONTAGUE, and BENVOLIO]',
 '',
 'MONTAGUE\tWho set this ancient quarrel new abroach?',
 '\tSpeak, nephew, were you by when it began?',
 '',
 'BENVOLIO\tHere were the servants of your adversary,',
 '\tAnd yours, close fighting ere I did approach:',
 '\tI drew to part them: in the instant came',
 '\tThe fiery Tybalt, with his sword prepared,',
 '\tWhich, as he breathed defiance to my ears,',
 '\tHe swung about his head and cut the winds,',
 "\tWho nothing hurt withal hiss'd him in scorn:",
 '\tWhile we were interchanging thrusts and blows,',
 '\tCame more and more and fought on part and part,',
 '\tTill the prince came, who parted either part.',
 '',
 'LADY MONTAGUE\tO, where is Romeo? saw you him to-day?',
 '\tRight glad I am he was not at this fray.',
 '',
 "BENVOLIO\tMadam, an hour before the worshipp'd sun",
 "\tPeer'd forth the golden window of the east,",
 '\tA troubled mind drave me to walk abroad;',
 '\tWhere, underneath the grove of sycamore',
 "\tThat westward rooteth from the city's side,",
 '\tSo early walking did I see your son:',
 '\tTowards him I made, but he was ware of me',
 '\tAnd stole into the covert of the wood:',
 '\tI, measuring his affections by my own,',
 "\tThat most are busied when they're most alone,",
 '\tPursued my humour not pursuing his,',
 "\tAnd gladly shunn'd who gladly fled from me.",
 '',
 'MONTAGUE\tMany a morning hath he there been seen,',
 '\tWith tears augmenting the fresh morning dew.',
 '\tAdding to clouds more clouds with his deep sighs;',
 '\tBut all so soon as the all-cheering sun',
 '\tShould in the furthest east begin to draw',
 "\tThe shady curtains from Aurora's bed,",
 '\tAway from the light steals home my heavy son,',
 '\tAnd private in his chamber pens himself,',
 '\tShuts up his windows, locks far daylight out',
 '\tAnd makes himself an artificial night:',
 '\tBlack and portentous must this humour prove,',
 '\tUnless good counsel may the cause remove.',
 '',
 'BENVOLIO\tMy noble uncle, do you know the cause?',
 '',
 'MONTAGUE\tI neither know it nor can learn of him.',
 '',
 'BENVOLIO\tHave you importuned him by any means?',
 '',
 'MONTAGUE\tBoth by myself and many other friends:',
 "\tBut he, his own affections' counsellor,",
 '\tIs to himself--I will not say how true--',
 '\tBut to himself so secret and so close,',
 '\tSo far from sounding and discovery,',
 '\tAs is the bud bit with an envious worm,',
 '\tEre he can spread his sweet leaves to the air,',
 '\tOr dedicate his beauty to the sun.',
 '\tCould we but learn from whence his sorrows grow.',
 '\tWe would as willingly give cure as know.',
 '',
 '\t[Enter ROMEO]',
 '',
 'BENVOLIO\tSee, where he comes: so please you, step aside;',
 "\tI'll know his grievance, or be much denied.",
 '',
 'MONTAGUE\tI would thou wert so happy by thy stay,',
 "\tTo hear true shrift. Come, madam, let's away.",
 '',
 '\t[Exeunt MONTAGUE and LADY MONTAGUE]',
 '',
 'BENVOLIO\tGood-morrow, cousin.',
 '',
 'ROMEO\tIs the day so young?',
 '',
 'BENVOLIO\tBut new struck nine.',
 '',
 'ROMEO\tAy me! sad hours seem long.',
 '\tWas that my father that went hence so fast?',
 '',
 "BENVOLIO\tIt was. What sadness lengthens Romeo's hours?",
 '',
 'ROMEO\tNot having that, which, having, makes them short.',
 '',
 'BENVOLIO\tIn love?',
 '',
 'ROMEO\tOut--',
 '',
 'BENVOLIO\tOf love?',
 '',
 'ROMEO\tOut of her favour, where I am in love.',
 '',
 'BENVOLIO\tAlas, that love, so gentle in his view,',
 '\tShould be so tyrannous and rough in proof!',
 '',
 'ROMEO\tAlas, that love, whose view is muffled still,',
 '\tShould, without eyes, see pathways to his will!',
 '\tWhere shall we dine? O me! What fray was here?',
 '\tYet tell me not, for I have heard it all.',
 "\tHere's much to do with hate, but more with love.",
 '\tWhy, then, O brawling love! O loving hate!',
 '\tO any thing, of nothing first create!',
 '\tO heavy lightness! serious vanity!',
 '\tMis-shapen chaos of well-seeming forms!',
 '\tFeather of lead, bright smoke, cold fire,',
 '\tsick health!',
 '\tStill-waking sleep, that is not what it is!',
 '\tThis love feel I, that feel no love in this.',
 '\tDost thou not laugh?',
 '',
 'BENVOLIO\tNo, coz, I rather weep.',
 '',
 'ROMEO\tGood heart, at what?',
 '',
 "BENVOLIO\tAt thy good heart's oppression.",
 '',
 "ROMEO\tWhy, such is love's transgression.",
 '\tGriefs of mine own lie heavy in my breast,',
 '\tWhich thou wilt propagate, to have it prest',
 '\tWith more of thine: this love that thou hast shown',
 '\tDoth add more grief to too much of mine own.',
 '\tLove is a smoke raised with the fume of sighs;',
 "\tBeing purged, a fire sparkling in lovers' eyes;",
 "\tBeing vex'd a sea nourish'd with lovers' tears:",
 '\tWhat is it else? a madness most discreet,',
 '\tA choking gall and a preserving sweet.',
 '\tFarewell, my coz.',
 '',
 'BENVOLIO\t                  Soft! I will go along;',
 '\tAn if you leave me so, you do me wrong.',
 '',
 'ROMEO\tTut, I have lost myself; I am not here;',
 "\tThis is not Romeo, he's some other where.",
 '',
 'BENVOLIO\tTell me in sadness, who is that you love.',
 '',
 'ROMEO\tWhat, shall I groan and tell thee?',
 '',
 'BENVOLIO\tGroan! why, no.',
 '\tBut sadly tell me who.',
 '',
 'ROMEO\tBid a sick man in sadness make his will:',
 '\tAh, word ill urged to one that is so ill!',
 '\tIn sadness, cousin, I do love a woman.',
 '',
 "BENVOLIO\tI aim'd so near, when I supposed you loved.",
 '',
 "ROMEO\tA right good mark-man! And she's fair I love.",
 '',
 'BENVOLIO\tA right fair mark, fair coz, is soonest hit.',
 '',
 "ROMEO\tWell, in that hit you miss: she'll not be hit",
 "\tWith Cupid's arrow; she hath Dian's wit;",
 "\tAnd, in strong proof of chastity well arm'd,",
 "\tFrom love's weak childish bow she lives unharm'd.",
 '\tShe will not stay the siege of loving terms,',
 '\tNor bide the encounter of assailing eyes,',
 '\tNor ope her lap to saint-seducing gold:',
 '\tO, she is rich in beauty, only poor,',
 '\tThat when she dies with beauty dies her store.',
 '',
 'BENVOLIO\tThen she hath sworn that she will still live chaste?',
 '',
 'ROMEO\tShe hath, and in that sparing makes huge waste,',
 '\tFor beauty starved with her severity',
 '\tCuts beauty off from all posterity.',
 '\tShe is too fair, too wise, wisely too fair,',
 '\tTo merit bliss by making me despair:',
 '\tShe hath forsworn to love, and in that vow',
 '\tDo I live dead that live to tell it now.',
 '',
 'BENVOLIO\tBe ruled by me, forget to think of her.',
 '',
 'ROMEO\tO, teach me how I should forget to think.',
 '',
 'BENVOLIO\tBy giving liberty unto thine eyes;',
 '\tExamine other beauties.',
 '',
 "ROMEO\t'Tis the way",
 '\tTo call hers exquisite, in question more:',
 "\tThese happy masks that kiss fair ladies' brows",
 '\tBeing black put us in mind they hide the fair;',
 '\tHe that is strucken blind cannot forget',
 '\tThe precious treasure of his eyesight lost:',
 '\tShow me a mistress that is passing fair,',
 '\tWhat doth her beauty serve, but as a note',
 "\tWhere I may read who pass'd that passing fair?",
 '\tFarewell: thou canst not teach me to forget.',
 '',
 "BENVOLIO\tI'll pay that doctrine, or else die in debt.",
 '',
 '\t[Exeunt]']

Split, map

string = "this is a test"
string.split(" ")
['this', 'is', 'a', 'test']
data1 = text_file.map(lambda line : line.split(" "))
data1.take(2)  # returns the first two lines, each as its own list of words
[['ROMEO', 'AND', 'JULIET'], ['']]

With map, each input line comes back as its own list of words.
Word count does not need a separate list per line, so flatMap is used to remove the per-line boundaries.
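
The difference between the two shapes can be reproduced without Spark. The following is a minimal sketch in plain Python that only mimics the result shapes; the sample lines and the variable names are illustrative assumptions, not part of the original notebook.

```python
from itertools import chain

# A few sample lines, including a blank one as in the input file.
lines = ["ROMEO AND JULIET", "", "ACT I"]

# map-style: one output element per input line, so the result is a list of lists.
mapped = [line.split(" ") for line in lines]

# flatMap-style: the per-line lists are concatenated into one flat list of words.
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)
print(flat_mapped)
```

The nested result keeps one entry per line, while the flat result is a single sequence of words, which is the shape the later counting steps expect.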

Split, flatMap

data2 = text_file.flatMap(lambda line:line.split(" "))
data2.take(10)  # returns ten words -> flatMap is the better fit for word count
['ROMEO', 'AND', 'JULIET', '', '', 'ACT', 'I', '', '', '']
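
The empty strings in this output come from blank lines: splitting an empty string on a single space returns `['']`. The input also uses tabs, which `split(" ")` does not treat as separators, so tokens such as `'I\tVerona.'` show up in the later results. The sketch below compares the two behaviours in plain Python, assuming that splitting on any whitespace is acceptable for this exercise; the variable names are illustrative.

```python
# One line taken from the input file, plus a blank line.
line = "SCENE I\tVerona. A public place."
blank = ""

# split(" ") cuts only at single spaces: the tab stays inside a token,
# and a blank line produces a list holding one empty string.
by_space = line.split(" ")
blank_by_space = blank.split(" ")

# split() with no argument cuts at any run of whitespace (spaces, tabs)
# and never returns empty strings.
by_whitespace = line.split()
blank_by_whitespace = blank.split()

print(by_space, blank_by_space)
print(by_whitespace, blank_by_whitespace)
```

Passing `lambda line: line.split()` to flatMap would apply the same rule inside the pipeline. The counts would then differ from the ones shown in this post, since the `''` entry disappears and speaker names are separated from the first word of their line.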

Assign a count of 1 to each word

data3 = data2.map(lambda word: (word,1))
data3.take(10)
[('ROMEO', 1),
 ('AND', 1),
 ('JULIET', 1),
 ('', 1),
 ('', 1),
 ('ACT', 1),
 ('I', 1),
 ('', 1),
 ('', 1),
 ('', 1)]

reduceByKey

Use reduceByKey to sum the values that share the same key.

data4 = data3.reduceByKey(lambda a,b: a+b)
data4.take(10)
[('', 134),
 ('ACT', 1),
 ('SCENE', 1),
 ('I\tVerona.', 1),
 ('public', 1),
 ('place.', 1),
 ('SAMPSON', 1),
 ('GREGORY,', 1),
 ('of', 45),
 ('house', 4)]
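
To make the merging step concrete, here is a minimal single-machine sketch of what reduceByKey computes. `reduce_by_key` is a hypothetical helper written for illustration, not a Spark API, and it ignores partitioning entirely.

```python
def reduce_by_key(pairs, func):
    """Fold the values of each key pairwise with func (illustrative helper)."""
    result = {}
    for key, value in pairs:
        if key in result:
            # Key seen before: combine the running value with the new one.
            result[key] = func(result[key], value)
        else:
            # First occurrence of this key: start from its value.
            result[key] = value
    return result


# (word, 1) pairs in the same shape as data3.
pairs = [("of", 1), ("house", 1), ("of", 1), ("", 1), ("of", 1)]
totals = reduce_by_key(pairs, lambda a, b: a + b)
print(totals)
```

In Spark the values are first combined within each partition and then across partitions, so the function passed to reduceByKey should be associative and commutative. Addition satisfies both.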

To find out which word appears most often, sort the results by count.

Sorting

data5 = data4.map(lambda pair : (pair[1], pair[0]))
data5.take(10)
[(134, ''),
 (1, 'ACT'),
 (1, 'SCENE'),
 (1, 'I\tVerona.'),
 (1, 'public'),
 (1, 'place.'),
 (1, 'SAMPSON'),
 (1, 'GREGORY,'),
 (45, 'of'),
 (4, 'house')]
data6 = data5.sortByKey(ascending=False)
data6.collect()
[(134, ''),
 (57, 'the'),
 (45, 'of'),
 (37, 'I'),
 (34, 'to'),
 (33, 'and'),
 (27, 'in'),
 (22, 'is'),
 (22, 'you'),
 (21, 'that'),
 (19, 'with'),
 (19, 'a'),
 (18, 'his'),
 (17, 'not'),
 (16, 'me'),
 (14, 'my'),
 (13, 'thou'),
 (11, 'as'),
 (11, 'your'),
 (11, 'will'),
 (11, 'it'),
 (11, 'so'),
 (10, 'from'),
 (8, 'this'),
 (8, '\t[Enter'),
 (8, 'be'),
 (8, '\tAnd'),
 (8, 'by'),
 (7, 'thy'),
 (7, 'at'),
 (7, 'do'),
 (7, 'but'),
 (7, 'our'),
 (7, 'all'),
 (7, 'LADY'),
 (6, 'am'),
 (6, '\tTo'),
 (6, 'he'),
 (6, 'more'),
 (6, 'if'),
 (6, 'shall'),
 (6, 'she'),
 (5, 'we'),
 (5, 'an'),
 (5, 'when'),
 (5, 'know'),
 (5, 'beauty'),
 (5, 'her'),
 (5, 'for'),
 (5, 'their'),
 (5, 'they'),
 (5, 'bite'),
 (5, 'sir.'),
 (5, 'who'),
 (4, 'house'),
 (4, 'feel'),
 (4, 'them'),
 (4, 'thumb'),
 (4, 'good'),
 (4, '\tBut'),
 (4, 'love.'),
 (4, 'tell'),
 (4, 'love'),
 (4, 'art'),
 (4, 'me.'),
 (4, 'or'),
 (4, 'part'),
 (4, '\tThat'),
 (4, '\tWith'),
 (4, 'him'),
 (4, 'hath'),
 (4, 'too'),
 (4, 'fair'),
 (3, 'us'),
 (3, 'take'),
 (3, 'let'),
 (3, 'sir?'),
 (3, 'sir,'),
 (3, 'thee,'),
 (3, '\tThe'),
 (3, 'where'),
 (3, 'was'),
 (3, 'heavy'),
 (3, 'other'),
 (3, 'O'),
 (3, 'have'),
 (3, 'live'),
 (3, 'fair,'),
 (3, 'word,'),
 (3, 'SAMPSON\tI'),
 (3, 'stand:'),
 (3, 'any'),
 (3, 'man'),
 (3, 'men'),
 (3, 'what'),
 (3, 'up'),
 (3, 'sword,'),
 (3, 'old'),
 (3, 'were'),
 (3, 'most'),
 (3, 'makes'),
 (3, 'much'),
 (3, 'love,'),
 (3, '\tBeing'),
 (3, '\tShe'),
 (3, 'forget'),
 (2, "o'"),
 (2, 'GREGORY\tNo,'),
 (2, 'draw'),
 (2, 'out'),
 (2, 'SAMPSON\tA'),
 (2, 'move'),
 (2, '\ttake'),
 (2, 'wall.'),
 (2, 'ever'),
 (2, 'quarrel'),
 (2, 'myself'),
 (2, 'men,'),
 (2, 'cut'),
 (2, 'must'),
 (2, 'it.'),
 (2, 'quarrel,'),
 (2, 'law'),
 (2, 'ABRAHAM\tDo'),
 (2, 'thumb,'),
 (2, 'say'),
 (2, 'fight]'),
 (2, 'down'),
 (2, 'TYBALT\tWhat,'),
 (2, 'these'),
 (2, 'put'),
 (2, '\tOr'),
 (2, 'hate'),
 (2, '\tAs'),
 (2, '\tHave'),
 (2, 'ho!'),
 (2, 'call'),
 (2, 'MONTAGUE'),
 (2, 'fire'),
 (2, 'pain'),
 (2, 'hear'),
 (2, 'Montague,'),
 (2, 'ancient'),
 (2, 'pay'),
 (2, '\tFor'),
 (2, 'go'),
 (2, 'new'),
 (2, '\tHe'),
 (2, 'sun'),
 (2, 'mind'),
 (2, '\tSo'),
 (2, 'humour'),
 (2, 'sighs;'),
 (2, 'far'),
 (2, 'may'),
 (2, 'MONTAGUE\tI'),
 (2, 'own'),
 (2, 'would'),
 (2, 'What'),
 (2, 'sadness'),
 (2, 'love?'),
 (2, '\tWhere'),
 (2, 'loving'),
 (2, 'eyes;'),
 (2, 'sadness,'),
 (2, 'right'),
 (2, 'dies'),
 (2, 'Capulet,'),
 (2, "we'll"),
 (2, 'should'),
 (2, 'while'),
 (2, 'being'),
 (2, 'moved'),
 (2, 'dog'),
 (2, 'Montague'),
 (2, 'away.'),
 (2, 'weak'),
 (2, '\tto'),
 (2, 'therefore'),
 (2, 'thrust'),
 (2, 'GREGORY\tThe'),
 (2, 'men.'),
 (2, 'fought'),
 (2, 'off'),
 (2, 'heads'),
 (2, 'sense'),
 (2, 'well'),
 (2, 'been'),
 (2, 'here'),
 (2, 'comes'),
 (2, 'back'),
 (2, 'us,'),
 (2, 'side,'),
 (2, 'you,'),
 (2, 'one'),
 (2, '\t[They'),
 (2, 'BENVOLIO]'),
 (2, 'BENVOLIO\tI'),
 (2, 'MONTAGUE]'),
 (2, 'MONTAGUE\tThou'),
 (2, 'not,'),
 (2, 'peace,'),
 (2, 'hands'),
 (2, 'lives'),
 (2, '\tAnd,'),
 (2, 'on'),
 (2, '\t[Exeunt'),
 (2, 'MONTAGUE,'),
 (2, 'did'),
 (2, 'nothing'),
 (2, '\tA'),
 (2, 'see'),
 (2, 'gladly'),
 (2, 'morning'),
 (2, 'clouds'),
 (2, '\tShould'),
 (2, 'himself'),
 (2, 'can'),
 (2, 'learn'),
 (2, 'how'),
 (2, 'happy'),
 (2, 'me!'),
 (2, 'eyes,'),
 (2, '\tO'),
 (2, '\tThis'),
 (2, 'coz,'),
 (2, "love's"),
 (2, 'mine'),
 (2, "lovers'"),
 (2, '\tWhat'),
 (2, 'hit'),
 (2, '\tNor'),
 (2, 'teach'),
 (2, 'passing'),
 (1, 'ACT'),
 (1, 'SCENE'),
 (1, 'I\tVerona.'),
 (1, 'public'),
 (1, 'place.'),
 (1, 'SAMPSON'),
 (1, 'GREGORY,'),
 (1, 'swords'),
 (1, 'bucklers]'),
 (1, 'SAMPSON\tGregory,'),
 (1, 'carry'),
 (1, 'colliers.'),
 (1, 'mean,'),
 (1, 'choler,'),
 (1, 'draw.'),
 (1, 'GREGORY\tAy,'),
 (1, 'live,'),
 (1, 'neck'),
 (1, 'collar.'),
 (1, 'moved.'),
 (1, 'quickly'),
 (1, 'strike.'),
 (1, 'moves'),
 (1, 'valiant'),
 (1, '\ttherefore,'),
 (1, 'moved,'),
 (1, 'maid'),
 (1, "Montague's."),
 (1, 'goes'),
 (1, 'women,'),
 (1, 'weaker'),
 (1, 'vessels,'),
 (1, 'wall:'),
 (1, 'push'),
 (1, 'maids'),
 (1, 'one,'),
 (1, 'tyrant:'),
 (1, '\thave'),
 (1, 'heads.'),
 (1, 'maids?'),
 (1, 'SAMPSON\tAy,'),
 (1, 'maids,'),
 (1, 'maidenheads;'),
 (1, 'wilt.'),
 (1, 'SAMPSON\tMe'),
 (1, 'able'),
 (1, 'known'),
 (1, 'pretty'),
 (1, "GREGORY\t'Tis"),
 (1, 'fish;'),
 (1, 'poor'),
 (1, 'tool!'),
 (1, '\ttwo'),
 (1, 'Montagues.'),
 (1, 'SAMPSON\tMy'),
 (1, 'naked'),
 (1, 'weapon'),
 (1, 'GREGORY\tHow!'),
 (1, 'turn'),
 (1, 'not.'),
 (1, 'marry;'),
 (1, 'fear'),
 (1, 'thee!'),
 (1, 'begin.'),
 (1, 'frown'),
 (1, 'pass'),
 (1, 'by,'),
 (1, 'dare.'),
 (1, '\twhich'),
 (1, 'disgrace'),
 (1, 'them,'),
 (1, 'bear'),
 (1, '\tay?'),
 (1, 'GREGORY\tNo.'),
 (1, 'SAMPSON\tNo,'),
 (1, 'ABRAHAM\tQuarrel'),
 (1, 'sir!'),
 (1, 'no,'),
 (1, 'do,'),
 (1, 'serve'),
 (1, 'ABRAHAM\tNo'),
 (1, 'GREGORY\tSay'),
 (1, "'better:'"),
 (1, 'kinsmen.'),
 (1, 'better,'),
 (1, 'ABRAHAM\tYou'),
 (1, 'lie.'),
 (1, 'SAMPSON\tDraw,'),
 (1, 'blow.'),
 (1, 'swords;'),
 (1, 'TYBALT]'),
 (1, 'hinds?'),
 (1, '\tTurn'),
 (1, 'look'),
 (1, 'upon'),
 (1, 'death.'),
 (1, 'peace:'),
 (1, 'hell,'),
 (1, 'thee:'),
 (1, 'coward!'),
 (1, 'several'),
 (1, 'both'),
 (1, 'join'),
 (1, 'fray;'),
 (1, '\tthen'),
 (1, 'enter'),
 (1, 'Citizens,'),
 (1, 'Citizen\tClubs,'),
 (1, 'partisans!'),
 (1, 'strike!'),
 (1, 'beat'),
 (1, 'down!'),
 (1, '\tDown'),
 (1, 'CAPULET'),
 (1, 'CAPULET]'),
 (1, 'CAPULET\tWhat'),
 (1, 'Give'),
 (1, 'long'),
 (1, 'CAPULET\tA'),
 (1, 'crutch!'),
 (1, 'why'),
 (1, 'Old'),
 (1, 'come,'),
 (1, 'villain'),
 (1, 'go.'),
 (1, 'shalt'),
 (1, 'stir'),
 (1, 'foe.'),
 (1, 'Attendants]'),
 (1, 'PRINCE\tRebellious'),
 (1, '\tProfaners'),
 (1, 'neighbour-stained'),
 (1, '\tWill'),
 (1, 'hear?'),
 (1, 'beasts,'),
 (1, 'rage'),
 (1, 'fountains'),
 (1, 'issuing'),
 (1, '\tOn'),
 (1, 'bloody'),
 (1, "mistemper'd"),
 (1, 'weapons'),
 (1, 'ground,'),
 (1, 'prince.'),
 (1, '\tThree'),
 (1, 'bred'),
 (1, 'thrice'),
 (1, "disturb'd"),
 (1, 'quiet'),
 (1, 'streets,'),
 (1, "Verona's"),
 (1, 'citizens'),
 (1, '\tCast'),
 (1, 'grave'),
 (1, 'wield'),
 (1, 'partisans,'),
 (1, 'old,'),
 (1, "canker'd"),
 (1, '\tIf'),
 (1, 'streets'),
 (1, 'again,'),
 (1, '\tYour'),
 (1, 'forfeit'),
 (1, 'peace.'),
 (1, 'rest'),
 (1, 'depart'),
 (1, 'away:'),
 (1, 'further'),
 (1, 'case,'),
 (1, 'common'),
 (1, 'death,'),
 (1, 'depart.'),
 (1, 'MONTAGUE\tWho'),
 (1, 'set'),
 (1, 'abroach?'),
 (1, '\tSpeak,'),
 (1, 'BENVOLIO\tHere'),
 (1, 'adversary,'),
 (1, 'close'),
 (1, 'fighting'),
 (1, '\tI'),
 (1, 'instant'),
 (1, 'came'),
 (1, 'prepared,'),
 (1, 'swung'),
 (1, 'head'),
 (1, 'winds,'),
 (1, 'hurt'),
 (1, 'withal'),
 (1, '\tWhile'),
 (1, 'thrusts'),
 (1, '\tCame'),
 (1, 'part,'),
 (1, 'prince'),
 (1, 'part.'),
 (1, 'to-day?'),
 (1, 'before'),
 (1, "\tPeer'd"),
 (1, 'east,'),
 (1, 'walk'),
 (1, 'abroad;'),
 (1, 'underneath'),
 (1, 'sycamore'),
 (1, "city's"),
 (1, 'ware'),
 (1, 'into'),
 (1, 'wood:'),
 (1, 'are'),
 (1, 'busied'),
 (1, 'alone,'),
 (1, "shunn'd"),
 (1, 'MONTAGUE\tMany'),
 (1, 'there'),
 (1, 'tears'),
 (1, 'augmenting'),
 (1, 'dew.'),
 (1, 'all-cheering'),
 (1, 'furthest'),
 (1, 'east'),
 (1, 'shady'),
 (1, 'home'),
 (1, 'private'),
 (1, 'pens'),
 (1, 'daylight'),
 (1, 'artificial'),
 (1, '\tUnless'),
 (1, 'cause'),
 (1, 'noble'),
 (1, 'neither'),
 (1, 'nor'),
 (1, 'friends:'),
 (1, 'he,'),
 (1, "affections'"),
 (1, 'secret'),
 (1, 'close,'),
 (1, 'sounding'),
 (1, '\tEre'),
 (1, 'sweet'),
 (1, 'leaves'),
 (1, 'air,'),
 (1, 'dedicate'),
 (1, 'sun.'),
 (1, '\tCould'),
 (1, 'grow.'),
 (1, '\tWe'),
 (1, 'willingly'),
 (1, 'give'),
 (1, 'cure'),
 (1, 'ROMEO]'),
 (1, 'BENVOLIO\tSee,'),
 (1, 'comes:'),
 (1, 'step'),
 (1, "\tI'll"),
 (1, 'stay,'),
 (1, 'true'),
 (1, 'shrift.'),
 (1, "let's"),
 (1, 'ROMEO\tIs'),
 (1, 'nine.'),
 (1, 'ROMEO\tAy'),
 (1, 'sad'),
 (1, 'hours'),
 (1, 'long.'),
 (1, 'father'),
 (1, 'fast?'),
 (1, 'lengthens'),
 (1, "Romeo's"),
 (1, 'ROMEO\tNot'),
 (1, 'that,'),
 (1, 'having,'),
 (1, 'BENVOLIO\tIn'),
 (1, 'ROMEO\tOut--'),
 (1, 'ROMEO\tOut'),
 (1, 'gentle'),
 (1, 'proof!'),
 (1, 'whose'),
 (1, 'muffled'),
 (1, '\tShould,'),
 (1, 'pathways'),
 (1, 'dine?'),
 (1, '\tYet'),
 (1, 'heard'),
 (1, "\tHere's"),
 (1, 'hate,'),
 (1, 'then,'),
 (1, 'love!'),
 (1, 'hate!'),
 (1, 'lightness!'),
 (1, 'vanity!'),
 (1, '\tMis-shapen'),
 (1, 'chaos'),
 (1, '\tFeather'),
 (1, 'bright'),
 (1, 'cold'),
 (1, 'fire,'),
 (1, '\tsick'),
 (1, 'health!'),
 (1, '\tStill-waking'),
 (1, 'sleep,'),
 (1, 'is!'),
 (1, 'no'),
 (1, 'laugh?'),
 (1, 'BENVOLIO\tNo,'),
 (1, 'rather'),
 (1, 'what?'),
 (1, 'BENVOLIO\tAt'),
 (1, "heart's"),
 (1, 'oppression.'),
 (1, 'ROMEO\tWhy,'),
 (1, '\tGriefs'),
 (1, 'lie'),
 (1, 'breast,'),
 (1, '\tWhich'),
 (1, 'propagate,'),
 (1, 'prest'),
 (1, 'thine:'),
 (1, 'hast'),
 (1, 'own.'),
 (1, 'raised'),
 (1, 'purged,'),
 (1, 'sea'),
 (1, "nourish'd"),
 (1, 'discreet,'),
 (1, 'preserving'),
 (1, 'sweet.'),
 (1, 'BENVOLIO\t'),
 (1, '\tAn'),
 (1, 'leave'),
 (1, 'wrong.'),
 (1, 'lost'),
 (1, 'Romeo,'),
 (1, 'where.'),
 (1, 'BENVOLIO\tTell'),
 (1, 'groan'),
 (1, 'thee?'),
 (1, 'no.'),
 (1, 'ROMEO\tBid'),
 (1, 'make'),
 (1, 'will:'),
 (1, 'ill'),
 (1, 'urged'),
 (1, 'ill!'),
 (1, 'cousin,'),
 (1, "aim'd"),
 (1, 'near,'),
 (1, 'mark-man!'),
 (1, 'And'),
 (1, 'mark,'),
 (1, 'hit.'),
 (1, 'ROMEO\tWell,'),
 (1, "she'll"),
 (1, "Dian's"),
 (1, 'wit;'),
 (1, 'proof'),
 (1, 'chastity'),
 (1, "arm'd,"),
 (1, 'childish'),
 (1, "unharm'd."),
 (1, 'siege'),
 (1, 'terms,'),
 (1, 'encounter'),
 (1, 'lap'),
 (1, 'gold:'),
 (1, 'only'),
 (1, 'poor,'),
 (1, 'BENVOLIO\tThen'),
 (1, 'sworn'),
 (1, 'hath,'),
 (1, 'sparing'),
 (1, 'huge'),
 (1, 'starved'),
 (1, 'severity'),
 (1, '\tCuts'),
 (1, 'posterity.'),
 (1, 'wise,'),
 (1, 'wisely'),
 (1, 'making'),
 (1, 'forsworn'),
 (1, 'vow'),
 (1, 'now.'),
 (1, 'ruled'),
 (1, 'think'),
 (1, 'think.'),
 (1, 'BENVOLIO\tBy'),
 (1, 'thine'),
 (1, '\tExamine'),
 (1, 'beauties.'),
 (1, 'way'),
 (1, 'hers'),
 (1, 'exquisite,'),
 (1, 'question'),
 (1, 'more:'),
 (1, '\tThese'),
 (1, 'masks'),
 (1, "ladies'"),
 (1, 'brows'),
 (1, 'hide'),
 (1, 'fair;'),
 (1, 'cannot'),
 (1, 'precious'),
 (1, 'treasure'),
 (1, 'doth'),
 (1, 'serve,'),
 (1, 'read'),
 (1, 'fair?'),
 (1, '\tFarewell:'),
 (1, 'doctrine,'),
 (1, 'die'),
 (1, 'ROMEO'),
 (1, 'AND'),
 (1, 'JULIET'),
 (1, 'A'),
 (1, '\tarmed'),
 (1, 'coals.'),
 (1, 'then'),
 (1, 'strike'),
 (1, 'quickly,'),
 (1, 'GREGORY\tBut'),
 (1, 'GREGORY\tTo'),
 (1, 'stir;'),
 (1, "runn'st"),
 (1, 'wall'),
 (1, 'GREGORY\tThat'),
 (1, 'shows'),
 (1, 'thee'),
 (1, 'slave;'),
 (1, 'weakest'),
 (1, 'SAMPSON\tTrue;'),
 (1, '\tare'),
 (1, "\tMontague's"),
 (1, 'wall,'),
 (1, 'between'),
 (1, 'masters'),
 (1, "SAMPSON\t'Tis"),
 (1, 'show'),
 (1, 'cruel'),
 (1, '\tmaids,'),
 (1, 'GREGORY\tThey'),
 (1, "\t'tis"),
 (1, 'piece'),
 (1, 'flesh.'),
 (1, 'hadst,'),
 (1, '\thadst'),
 (1, 'John.'),
 (1, 'Draw'),
 (1, 'out:'),
 (1, 'thee.'),
 (1, 'run?'),
 (1, 'SAMPSON\tFear'),
 (1, 'SAMPSON\tLet'),
 (1, 'sides;'),
 (1, 'GREGORY\tI'),
 (1, '\tthey'),
 (1, 'list.'),
 (1, 'SAMPSON\tNay,'),
 (1, 'them;'),
 (1, 'ABRAHAM'),
 (1, 'BALTHASAR]'),
 (1, 'SAMPSON\t[Aside'),
 (1, 'GREGORY]'),
 (1, 'Is'),
 (1, '\tbite'),
 (1, 'GREGORY\tDo'),
 (1, 'SAMPSON\tIf'),
 (1, 'you:'),
 (1, 'you.'),
 (1, 'better.'),
 (1, 'SAMPSON\tWell,'),
 (1, "master's"),
 (1, 'SAMPSON\tYes,'),
 (1, 'Gregory,'),
 (1, 'remember'),
 (1, 'swashing'),
 (1, 'BENVOLIO\tPart,'),
 (1, 'fools!'),
 (1, '\tPut'),
 (1, 'do.'),
 (1, '\t[Beats'),
 (1, 'swords]'),
 (1, 'drawn'),
 (1, 'among'),
 (1, 'heartless'),
 (1, 'Benvolio,'),
 (1, 'keep'),
 (1, 'manage'),
 (1, 'drawn,'),
 (1, 'talk'),
 (1, 'peace!'),
 (1, 'Montagues,'),
 (1, '\t[Enter,'),
 (1, 'houses,'),
 (1, 'clubs]'),
 (1, 'First'),
 (1, 'bills,'),
 (1, 'Capulets!'),
 (1, 'Montagues!'),
 (1, 'gown,'),
 (1, 'noise'),
 (1, 'this?'),
 (1, 'crutch,'),
 (1, 'sword?'),
 (1, 'CAPULET\tMy'),
 (1, 'say!'),
 (1, 'flourishes'),
 (1, 'blade'),
 (1, 'spite'),
 (1, 'Capulet,--Hold'),
 (1, 'foot'),
 (1, 'seek'),
 (1, 'PRINCE,'),
 (1, 'subjects,'),
 (1, 'enemies'),
 (1, 'steel,--'),
 (1, 'What,'),
 (1, 'quench'),
 (1, 'pernicious'),
 (1, 'purple'),
 (1, 'veins,'),
 (1, 'torture,'),
 (1, 'those'),
 (1, '\tThrow'),
 (1, 'sentence'),
 (1, 'civil'),
 (1, 'brawls,'),
 (1, 'airy'),
 (1, '\tBy'),
 (1, 'made'),
 (1, 'beseeming'),
 (1, 'ornaments,'),
 (1, "\tCanker'd"),
 (1, 'hate:'),
 (1, 'disturb'),
 (1, 'time,'),
 (1, '\tYou'),
 (1, 'Capulet;'),
 (1, 'along'),
 (1, 'me:'),
 (1, 'come'),
 (1, 'afternoon,'),
 (1, 'pleasure'),
 (1, 'Free-town,'),
 (1, 'judgment-place.'),
 (1, '\tOnce'),
 (1, 'more,'),
 (1, 'nephew,'),
 (1, 'began?'),
 (1, 'servants'),
 (1, 'yours,'),
 (1, 'ere'),
 (1, 'approach:'),
 (1, 'drew'),
 (1, 'them:'),
 (1, 'fiery'),
 (1, 'Tybalt,'),
 (1, 'sword'),
 (1, '\tWhich,'),
 (1, 'breathed'),
 (1, 'defiance'),
 (1, 'ears,'),
 (1, 'about'),
 (1, '\tWho'),
 (1, "hiss'd"),
 (1, 'scorn:'),
 (1, 'interchanging'),
 (1, 'blows,'),
 (1, '\tTill'),
 (1, 'came,'),
 (1, 'parted'),
 (1, 'either'),
 (1, 'MONTAGUE\tO,'),
 (1, 'Romeo?'),
 (1, 'saw'),
 (1, '\tRight'),
 (1, 'glad'),
 (1, 'fray.'),
 (1, 'BENVOLIO\tMadam,'),
 (1, 'hour'),
 (1, "worshipp'd"),
 (1, 'forth'),
 (1, 'golden'),
 (1, 'window'),
 (1, 'troubled'),
 (1, 'drave'),
 (1, '\tWhere,'),
 (1, 'grove'),
 (1, 'westward'),
 (1, 'rooteth'),
 (1, 'early'),
 (1, 'walking'),
 (1, 'son:'),
 (1, '\tTowards'),
 (1, 'made,'),
 (1, 'stole'),
 (1, 'covert'),
 (1, '\tI,'),
 (1, 'measuring'),
 (1, 'affections'),
 (1, 'own,'),
 (1, "they're"),
 (1, '\tPursued'),
 (1, 'pursuing'),
 (1, 'his,'),
 (1, 'fled'),
 (1, 'seen,'),
 (1, 'fresh'),
 (1, '\tAdding'),
 (1, 'deep'),
 (1, 'soon'),
 (1, 'begin'),
 (1, 'curtains'),
 (1, "Aurora's"),
 (1, 'bed,'),
 (1, '\tAway'),
 (1, 'light'),
 (1, 'steals'),
 (1, 'son,'),
 (1, 'chamber'),
 (1, 'himself,'),
 (1, '\tShuts'),
 (1, 'windows,'),
 (1, 'locks'),
 (1, 'night:'),
 (1, '\tBlack'),
 (1, 'portentous'),
 (1, 'prove,'),
 (1, 'counsel'),
 (1, 'remove.'),
 (1, 'BENVOLIO\tMy'),
 (1, 'uncle,'),
 (1, 'cause?'),
 (1, 'him.'),
 (1, 'BENVOLIO\tHave'),
 (1, 'importuned'),
 (1, 'means?'),
 (1, 'MONTAGUE\tBoth'),
 (1, 'many'),
 (1, 'counsellor,'),
 (1, '\tIs'),
 (1, 'himself--I'),
 (1, 'true--'),
 (1, 'discovery,'),
 (1, 'bud'),
 (1, 'bit'),
 (1, 'envious'),
 (1, 'worm,'),
 (1, 'spread'),
 (1, 'whence'),
 (1, 'sorrows'),
 (1, 'know.'),
 (1, 'please'),
 (1, 'aside;'),
 (1, 'grievance,'),
 (1, 'denied.'),
 (1, 'wert'),
 (1, 'Come,'),
 (1, 'madam,'),
 (1, 'BENVOLIO\tGood-morrow,'),
 (1, 'cousin.'),
 (1, 'day'),
 (1, 'young?'),
 (1, 'BENVOLIO\tBut'),
 (1, 'struck'),
 (1, 'seem'),
 (1, '\tWas'),
 (1, 'went'),
 (1, 'hence'),
 (1, 'BENVOLIO\tIt'),
 (1, 'was.'),
 (1, 'hours?'),
 (1, 'having'),
 (1, 'which,'),
 (1, 'short.'),
 (1, 'BENVOLIO\tOf'),
 (1, 'favour,'),
 (1, 'BENVOLIO\tAlas,'),
 (1, 'view,'),
 (1, 'tyrannous'),
 (1, 'rough'),
 (1, 'ROMEO\tAlas,'),
 (1, 'view'),
 (1, 'still,'),
 (1, 'without'),
 (1, 'will!'),
 (1, 'fray'),
 (1, 'here?'),
 (1, 'all.'),
 (1, '\tWhy,'),
 (1, 'brawling'),
 (1, 'thing,'),
 (1, 'first'),
 (1, 'create!'),
 (1, 'serious'),
 (1, 'well-seeming'),
 (1, 'forms!'),
 (1, 'lead,'),
 (1, 'smoke,'),
 (1, 'I,'),
 (1, 'this.'),
 (1, '\tDost'),
 (1, 'weep.'),
 (1, 'ROMEO\tGood'),
 (1, 'heart,'),
 (1, 'such'),
 (1, 'transgression.'),
 (1, 'wilt'),
 (1, 'shown'),
 (1, '\tDoth'),
 (1, 'add'),
 (1, 'grief'),
 (1, '\tLove'),
 (1, 'smoke'),
 (1, 'fume'),
 (1, 'sparkling'),
 (1, "vex'd"),
 (1, 'tears:'),
 (1, 'else?'),
 (1, 'madness'),
 (1, 'choking'),
 (1, 'gall'),
 (1, '\tFarewell,'),
 (1, 'coz.'),
 (1, 'Soft!'),
 (1, 'along;'),
 (1, 'so,'),
 (1, 'ROMEO\tTut,'),
 (1, 'myself;'),
 (1, 'here;'),
 (1, "he's"),
 (1, 'some'),
 (1, 'ROMEO\tWhat,'),
 (1, 'BENVOLIO\tGroan!'),
 (1, 'why,'),
 (1, 'sadly'),
 (1, 'who.'),
 (1, 'sick'),
 (1, '\tAh,'),
 (1, 'word'),
 (1, '\tIn'),
 (1, 'woman.'),
 (1, 'supposed'),
 (1, 'loved.'),
 (1, 'ROMEO\tA'),
 (1, "she's"),
 (1, 'BENVOLIO\tA'),
 (1, 'soonest'),
 (1, 'miss:'),
 (1, "Cupid's"),
 (1, 'arrow;'),
 (1, 'strong'),
 (1, '\tFrom'),
 (1, 'bow'),
 (1, 'stay'),
 (1, 'bide'),
 (1, 'assailing'),
 (1, 'ope'),
 (1, 'saint-seducing'),
 (1, '\tO,'),
 (1, 'rich'),
 (1, 'beauty,'),
 (1, 'store.'),
 (1, 'still'),
 (1, 'chaste?'),
 (1, 'ROMEO\tShe'),
 (1, 'waste,'),
 (1, 'merit'),
 (1, 'bliss'),
 ...]
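
The swap to (count, word) is needed because sortByKey orders by the first element of each pair. The plain-Python sketch below mirrors that step and shows an alternative that sorts on the count directly with a key function. The five sample pairs are taken from the results in this post; the variable names are illustrative.

```python
# (word, count) pairs in the same shape as data4.
counts = [("", 134), ("ACT", 1), ("of", 45), ("house", 4), ("the", 57)]

# Approach used in this post: swap to (count, word), then sort on the first field.
swapped = [(count, word) for word, count in counts]
by_swap = sorted(swapped, key=lambda pair: pair[0], reverse=True)

# Alternative: keep (word, count) and sort on the second field directly.
by_count = sorted(counts, key=lambda pair: pair[1], reverse=True)

print(by_swap)
print(by_count)
```

PySpark's RDD API also offers sortBy, which accepts a key function, so something like `data4.sortBy(lambda pair: pair[1], ascending=False)` should give the same ordering without the intermediate map. Treat that call as a suggestion to try rather than as the method used in this post.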

Complete Program

text_file = sc.textFile("./4.pyspark/test1.txt")
counts = text_file.flatMap(lambda line: line.split(" "))\
                .map(lambda word: (word, 1))\
                .reduceByKey(lambda a, b: a+b)\
                .map(lambda pair : (pair[1], pair[0]))\
                .sortByKey(ascending=False)

# counts.saveAsTextFile("count")  # creates a directory named "count" and saves the results there, one file per partition
counts.collect()
[(134, ''),
 (57, 'the'),
 (45, 'of'),
 (37, 'I'),
 (34, 'to'),
 (33, 'and'),
 (27, 'in'),
 (22, 'is'),
 (22, 'you'),
 (21, 'that'),
 (19, 'with'),
 (19, 'a'),
 (18, 'his'),
 (17, 'not'),
 (16, 'me'),
 (14, 'my'),
 (13, 'thou'),
 (11, 'as'),
 (11, 'your'),
 (11, 'will'),
 (11, 'it'),
 (11, 'so'),
 (10, 'from'),
 (8, 'this'),
 (8, '\t[Enter'),
 (8, 'be'),
 (8, '\tAnd'),
 (8, 'by'),
 (7, 'thy'),
 (7, 'at'),
 (7, 'do'),
 (7, 'but'),
 (7, 'our'),
 (7, 'all'),
 (7, 'LADY'),
 (6, 'am'),
 (6, '\tTo'),
 (6, 'he'),
 (6, 'more'),
 (6, 'if'),
 (6, 'shall'),
 (6, 'she'),
 (5, 'we'),
 (5, 'an'),
 (5, 'when'),
 (5, 'know'),
 (5, 'beauty'),
 (5, 'her'),
 (5, 'for'),
 (5, 'their'),
 (5, 'they'),
 (5, 'bite'),
 (5, 'sir.'),
 (5, 'who'),
 (4, 'house'),
 (4, 'feel'),
 (4, 'them'),
 (4, 'thumb'),
 (4, 'good'),
 (4, '\tBut'),
 (4, 'love.'),
 (4, 'tell'),
 (4, 'love'),
 (4, 'art'),
 (4, 'me.'),
 (4, 'or'),
 (4, 'part'),
 (4, '\tThat'),
 (4, '\tWith'),
 (4, 'him'),
 (4, 'hath'),
 (4, 'too'),
 (4, 'fair'),
 (3, 'us'),
 (3, 'take'),
 (3, 'let'),
 (3, 'sir?'),
 (3, 'sir,'),
 (3, 'thee,'),
 (3, '\tThe'),
 (3, 'where'),
 (3, 'was'),
 (3, 'heavy'),
 (3, 'other'),
 (3, 'O'),
 (3, 'have'),
 (3, 'live'),
 (3, 'fair,'),
 (3, 'word,'),
 (3, 'SAMPSON\tI'),
 (3, 'stand:'),
 (3, 'any'),
 (3, 'man'),
 (3, 'men'),
 (3, 'what'),
 (3, 'up'),
 (3, 'sword,'),
 (3, 'old'),
 (3, 'were'),
 (3, 'most'),
 (3, 'makes'),
 (3, 'much'),
 (3, 'love,'),
 (3, '\tBeing'),
 (3, '\tShe'),
 (3, 'forget'),
 (2, "o'"),
 (2, 'GREGORY\tNo,'),
 (2, 'draw'),
 (2, 'out'),
 (2, 'SAMPSON\tA'),
 (2, 'move'),
 (2, '\ttake'),
 (2, 'wall.'),
 (2, 'ever'),
 (2, 'quarrel'),
 (2, 'myself'),
 (2, 'men,'),
 (2, 'cut'),
 (2, 'must'),
 (2, 'it.'),
 (2, 'quarrel,'),
 (2, 'law'),
 (2, 'ABRAHAM\tDo'),
 (2, 'thumb,'),
 (2, 'say'),
 (2, 'fight]'),
 (2, 'down'),
 (2, 'TYBALT\tWhat,'),
 (2, 'these'),
 (2, 'put'),
 (2, '\tOr'),
 (2, 'hate'),
 (2, '\tAs'),
 (2, '\tHave'),
 (2, 'ho!'),
 (2, 'call'),
 (2, 'MONTAGUE'),
 (2, 'fire'),
 (2, 'pain'),
 (2, 'hear'),
 (2, 'Montague,'),
 (2, 'ancient'),
 (2, 'pay'),
 (2, '\tFor'),
 (2, 'go'),
 (2, 'new'),
 (2, '\tHe'),
 (2, 'sun'),
 (2, 'mind'),
 (2, '\tSo'),
 (2, 'humour'),
 (2, 'sighs;'),
 (2, 'far'),
 (2, 'may'),
 (2, 'MONTAGUE\tI'),
 (2, 'own'),
 (2, 'would'),
 (2, 'What'),
 (2, 'sadness'),
 (2, 'love?'),
 (2, '\tWhere'),
 (2, 'loving'),
 (2, 'eyes;'),
 (2, 'sadness,'),
 (2, 'right'),
 (2, 'dies'),
 (2, 'Capulet,'),
 (2, "we'll"),
 (2, 'should'),
 (2, 'while'),
 (2, 'being'),
 (2, 'moved'),
 (2, 'dog'),
 (2, 'Montague'),
 (2, 'away.'),
 (2, 'weak'),
 (2, '\tto'),
 (2, 'therefore'),
 (2, 'thrust'),
 (2, 'GREGORY\tThe'),
 (2, 'men.'),
 (2, 'fought'),
 (2, 'off'),
 (2, 'heads'),
 (2, 'sense'),
 (2, 'well'),
 (2, 'been'),
 (2, 'here'),
 (2, 'comes'),
 (2, 'back'),
 (2, 'us,'),
 (2, 'side,'),
 (2, 'you,'),
 (2, 'one'),
 (2, '\t[They'),
 (2, 'BENVOLIO]'),
 (2, 'BENVOLIO\tI'),
 (2, 'MONTAGUE]'),
 (2, 'MONTAGUE\tThou'),
 (2, 'not,'),
 (2, 'peace,'),
 (2, 'hands'),
 (2, 'lives'),
 (2, '\tAnd,'),
 (2, 'on'),
 (2, '\t[Exeunt'),
 (2, 'MONTAGUE,'),
 (2, 'did'),
 (2, 'nothing'),
 (2, '\tA'),
 (2, 'see'),
 (2, 'gladly'),
 (2, 'morning'),
 (2, 'clouds'),
 (2, '\tShould'),
 (2, 'himself'),
 (2, 'can'),
 (2, 'learn'),
 (2, 'how'),
 (2, 'happy'),
 (2, 'me!'),
 (2, 'eyes,'),
 (2, '\tO'),
 (2, '\tThis'),
 (2, 'coz,'),
 (2, "love's"),
 (2, 'mine'),
 (2, "lovers'"),
 (2, '\tWhat'),
 (2, 'hit'),
 (2, '\tNor'),
 (2, 'teach'),
 (2, 'passing'),
 (1, 'ACT'),
 (1, 'SCENE'),
 (1, 'I\tVerona.'),
 (1, 'public'),
 (1, 'place.'),
 (1, 'SAMPSON'),
 (1, 'GREGORY,'),
 (1, 'swords'),
 (1, 'bucklers]'),
 (1, 'SAMPSON\tGregory,'),
 (1, 'carry'),
 (1, 'colliers.'),
 (1, 'mean,'),
 (1, 'choler,'),
 (1, 'draw.'),
 (1, 'GREGORY\tAy,'),
 (1, 'live,'),
 (1, 'neck'),
 (1, 'collar.'),
 (1, 'moved.'),
 (1, 'quickly'),
 (1, 'strike.'),
 (1, 'moves'),
 (1, 'valiant'),
 (1, '\ttherefore,'),
 (1, 'moved,'),
 (1, 'maid'),
 (1, "Montague's."),
 (1, 'goes'),
 (1, 'women,'),
 (1, 'weaker'),
 (1, 'vessels,'),
 (1, 'wall:'),
 (1, 'push'),
 (1, 'maids'),
 (1, 'one,'),
 (1, 'tyrant:'),
 (1, '\thave'),
 (1, 'heads.'),
 (1, 'maids?'),
 (1, 'SAMPSON\tAy,'),
 (1, 'maids,'),
 (1, 'maidenheads;'),
 (1, 'wilt.'),
 (1, 'SAMPSON\tMe'),
 (1, 'able'),
 (1, 'known'),
 (1, 'pretty'),
 (1, "GREGORY\t'Tis"),
 (1, 'fish;'),
 (1, 'poor'),
 (1, 'tool!'),
 (1, '\ttwo'),
 (1, 'Montagues.'),
 (1, 'SAMPSON\tMy'),
 (1, 'naked'),
 (1, 'weapon'),
 (1, 'GREGORY\tHow!'),
 (1, 'turn'),
 (1, 'not.'),
 (1, 'marry;'),
 (1, 'fear'),
 (1, 'thee!'),
 (1, 'begin.'),
 (1, 'frown'),
 (1, 'pass'),
 (1, 'by,'),
 (1, 'dare.'),
 (1, '\twhich'),
 (1, 'disgrace'),
 (1, 'them,'),
 (1, 'bear'),
 (1, '\tay?'),
 (1, 'GREGORY\tNo.'),
 (1, 'SAMPSON\tNo,'),
 (1, 'ABRAHAM\tQuarrel'),
 (1, 'sir!'),
 (1, 'no,'),
 (1, 'do,'),
 (1, 'serve'),
 (1, 'ABRAHAM\tNo'),
 (1, 'GREGORY\tSay'),
 (1, "'better:'"),
 (1, 'kinsmen.'),
 (1, 'better,'),
 (1, 'ABRAHAM\tYou'),
 (1, 'lie.'),
 (1, 'SAMPSON\tDraw,'),
 (1, 'blow.'),
 (1, 'swords;'),
 (1, 'TYBALT]'),
 (1, 'hinds?'),
 (1, '\tTurn'),
 (1, 'look'),
 (1, 'upon'),
 (1, 'death.'),
 (1, 'peace:'),
 (1, 'hell,'),
 (1, 'thee:'),
 (1, 'coward!'),
 (1, 'several'),
 (1, 'both'),
 (1, 'join'),
 (1, 'fray;'),
 (1, '\tthen'),
 (1, 'enter'),
 (1, 'Citizens,'),
 (1, 'Citizen\tClubs,'),
 (1, 'partisans!'),
 (1, 'strike!'),
 (1, 'beat'),
 (1, 'down!'),
 (1, '\tDown'),
 (1, 'CAPULET'),
 (1, 'CAPULET]'),
 (1, 'CAPULET\tWhat'),
 (1, 'Give'),
 (1, 'long'),
 (1, 'CAPULET\tA'),
 (1, 'crutch!'),
 (1, 'why'),
 (1, 'Old'),
 (1, 'come,'),
 (1, 'villain'),
 (1, 'go.'),
 (1, 'shalt'),
 (1, 'stir'),
 (1, 'foe.'),
 (1, 'Attendants]'),
 (1, 'PRINCE\tRebellious'),
 (1, '\tProfaners'),
 (1, 'neighbour-stained'),
 (1, '\tWill'),
 (1, 'hear?'),
 (1, 'beasts,'),
 (1, 'rage'),
 (1, 'fountains'),
 (1, 'issuing'),
 (1, '\tOn'),
 (1, 'bloody'),
 (1, "mistemper'd"),
 (1, 'weapons'),
 (1, 'ground,'),
 (1, 'prince.'),
 (1, '\tThree'),
 (1, 'bred'),
 (1, 'thrice'),
 (1, "disturb'd"),
 (1, 'quiet'),
 (1, 'streets,'),
 (1, "Verona's"),
 (1, 'citizens'),
 (1, '\tCast'),
 (1, 'grave'),
 (1, 'wield'),
 (1, 'partisans,'),
 (1, 'old,'),
 (1, "canker'd"),
 (1, '\tIf'),
 (1, 'streets'),
 (1, 'again,'),
 (1, '\tYour'),
 (1, 'forfeit'),
 (1, 'peace.'),
 (1, 'rest'),
 (1, 'depart'),
 (1, 'away:'),
 (1, 'further'),
 (1, 'case,'),
 (1, 'common'),
 (1, 'death,'),
 (1, 'depart.'),
 (1, 'MONTAGUE\tWho'),
 (1, 'set'),
 (1, 'abroach?'),
 (1, '\tSpeak,'),
 (1, 'BENVOLIO\tHere'),
 (1, 'adversary,'),
 (1, 'close'),
 (1, 'fighting'),
 (1, '\tI'),
 (1, 'instant'),
 (1, 'came'),
 (1, 'prepared,'),
 (1, 'swung'),
 (1, 'head'),
 (1, 'winds,'),
 (1, 'hurt'),
 (1, 'withal'),
 (1, '\tWhile'),
 (1, 'thrusts'),
 (1, '\tCame'),
 (1, 'part,'),
 (1, 'prince'),
 (1, 'part.'),
 (1, 'to-day?'),
 (1, 'before'),
 (1, "\tPeer'd"),
 (1, 'east,'),
 (1, 'walk'),
 (1, 'abroad;'),
 (1, 'underneath'),
 (1, 'sycamore'),
 (1, "city's"),
 (1, 'ware'),
 (1, 'into'),
 (1, 'wood:'),
 (1, 'are'),
 (1, 'busied'),
 (1, 'alone,'),
 (1, "shunn'd"),
 (1, 'MONTAGUE\tMany'),
 (1, 'there'),
 (1, 'tears'),
 (1, 'augmenting'),
 (1, 'dew.'),
 (1, 'all-cheering'),
 (1, 'furthest'),
 (1, 'east'),
 (1, 'shady'),
 (1, 'home'),
 (1, 'private'),
 (1, 'pens'),
 (1, 'daylight'),
 (1, 'artificial'),
 (1, '\tUnless'),
 (1, 'cause'),
 (1, 'noble'),
 (1, 'neither'),
 (1, 'nor'),
 (1, 'friends:'),
 (1, 'he,'),
 (1, "affections'"),
 (1, 'secret'),
 (1, 'close,'),
 (1, 'sounding'),
 (1, '\tEre'),
 (1, 'sweet'),
 (1, 'leaves'),
 (1, 'air,'),
 (1, 'dedicate'),
 (1, 'sun.'),
 (1, '\tCould'),
 (1, 'grow.'),
 (1, '\tWe'),
 (1, 'willingly'),
 (1, 'give'),
 (1, 'cure'),
 (1, 'ROMEO]'),
 (1, 'BENVOLIO\tSee,'),
 (1, 'comes:'),
 (1, 'step'),
 (1, "\tI'll"),
 (1, 'stay,'),
 (1, 'true'),
 (1, 'shrift.'),
 (1, "let's"),
 (1, 'ROMEO\tIs'),
 (1, 'nine.'),
 (1, 'ROMEO\tAy'),
 (1, 'sad'),
 (1, 'hours'),
 (1, 'long.'),
 (1, 'father'),
 (1, 'fast?'),
 (1, 'lengthens'),
 (1, "Romeo's"),
 (1, 'ROMEO\tNot'),
 (1, 'that,'),
 (1, 'having,'),
 (1, 'BENVOLIO\tIn'),
 (1, 'ROMEO\tOut--'),
 (1, 'ROMEO\tOut'),
 (1, 'gentle'),
 (1, 'proof!'),
 (1, 'whose'),
 (1, 'muffled'),
 (1, '\tShould,'),
 (1, 'pathways'),
 (1, 'dine?'),
 (1, '\tYet'),
 (1, 'heard'),
 (1, "\tHere's"),
 (1, 'hate,'),
 (1, 'then,'),
 (1, 'love!'),
 (1, 'hate!'),
 (1, 'lightness!'),
 (1, 'vanity!'),
 (1, '\tMis-shapen'),
 (1, 'chaos'),
 (1, '\tFeather'),
 (1, 'bright'),
 (1, 'cold'),
 (1, 'fire,'),
 (1, '\tsick'),
 (1, 'health!'),
 (1, '\tStill-waking'),
 (1, 'sleep,'),
 (1, 'is!'),
 (1, 'no'),
 (1, 'laugh?'),
 (1, 'BENVOLIO\tNo,'),
 (1, 'rather'),
 (1, 'what?'),
 (1, 'BENVOLIO\tAt'),
 (1, "heart's"),
 (1, 'oppression.'),
 (1, 'ROMEO\tWhy,'),
 (1, '\tGriefs'),
 (1, 'lie'),
 (1, 'breast,'),
 (1, '\tWhich'),
 (1, 'propagate,'),
 (1, 'prest'),
 (1, 'thine:'),
 (1, 'hast'),
 (1, 'own.'),
 (1, 'raised'),
 (1, 'purged,'),
 (1, 'sea'),
 (1, "nourish'd"),
 (1, 'discreet,'),
 (1, 'preserving'),
 (1, 'sweet.'),
 (1, 'BENVOLIO\t'),
 (1, '\tAn'),
 (1, 'leave'),
 (1, 'wrong.'),
 (1, 'lost'),
 (1, 'Romeo,'),
 (1, 'where.'),
 (1, 'BENVOLIO\tTell'),
 (1, 'groan'),
 (1, 'thee?'),
 (1, 'no.'),
 (1, 'ROMEO\tBid'),
 (1, 'make'),
 (1, 'will:'),
 (1, 'ill'),
 (1, 'urged'),
 (1, 'ill!'),
 (1, 'cousin,'),
 (1, "aim'd"),
 (1, 'near,'),
 (1, 'mark-man!'),
 (1, 'And'),
 (1, 'mark,'),
 (1, 'hit.'),
 (1, 'ROMEO\tWell,'),
 (1, "she'll"),
 (1, "Dian's"),
 (1, 'wit;'),
 (1, 'proof'),
 (1, 'chastity'),
 (1, "arm'd,"),
 (1, 'childish'),
 (1, "unharm'd."),
 (1, 'siege'),
 (1, 'terms,'),
 (1, 'encounter'),
 (1, 'lap'),
 (1, 'gold:'),
 (1, 'only'),
 (1, 'poor,'),
 (1, 'BENVOLIO\tThen'),
 (1, 'sworn'),
 (1, 'hath,'),
 (1, 'sparing'),
 (1, 'huge'),
 (1, 'starved'),
 (1, 'severity'),
 (1, '\tCuts'),
 (1, 'posterity.'),
 (1, 'wise,'),
 (1, 'wisely'),
 (1, 'making'),
 (1, 'forsworn'),
 (1, 'vow'),
 (1, 'now.'),
 (1, 'ruled'),
 (1, 'think'),
 (1, 'think.'),
 (1, 'BENVOLIO\tBy'),
 (1, 'thine'),
 (1, '\tExamine'),
 (1, 'beauties.'),
 (1, 'way'),
 (1, 'hers'),
 (1, 'exquisite,'),
 (1, 'question'),
 (1, 'more:'),
 (1, '\tThese'),
 (1, 'masks'),
 (1, "ladies'"),
 (1, 'brows'),
 (1, 'hide'),
 (1, 'fair;'),
 (1, 'cannot'),
 (1, 'precious'),
 (1, 'treasure'),
 (1, 'doth'),
 (1, 'serve,'),
 (1, 'read'),
 (1, 'fair?'),
 (1, '\tFarewell:'),
 (1, 'doctrine,'),
 (1, 'die'),
 (1, 'ROMEO'),
 (1, 'AND'),
 (1, 'JULIET'),
 (1, 'A'),
 (1, '\tarmed'),
 (1, 'coals.'),
 (1, 'then'),
 (1, 'strike'),
 (1, 'quickly,'),
 (1, 'GREGORY\tBut'),
 (1, 'GREGORY\tTo'),
 (1, 'stir;'),
 (1, "runn'st"),
 (1, 'wall'),
 (1, 'GREGORY\tThat'),
 (1, 'shows'),
 (1, 'thee'),
 (1, 'slave;'),
 (1, 'weakest'),
 (1, 'SAMPSON\tTrue;'),
 (1, '\tare'),
 (1, "\tMontague's"),
 (1, 'wall,'),
 (1, 'between'),
 (1, 'masters'),
 (1, "SAMPSON\t'Tis"),
 (1, 'show'),
 (1, 'cruel'),
 (1, '\tmaids,'),
 (1, 'GREGORY\tThey'),
 (1, "\t'tis"),
 (1, 'piece'),
 (1, 'flesh.'),
 (1, 'hadst,'),
 (1, '\thadst'),
 (1, 'John.'),
 (1, 'Draw'),
 (1, 'out:'),
 (1, 'thee.'),
 (1, 'run?'),
 (1, 'SAMPSON\tFear'),
 (1, 'SAMPSON\tLet'),
 (1, 'sides;'),
 (1, 'GREGORY\tI'),
 (1, '\tthey'),
 (1, 'list.'),
 (1, 'SAMPSON\tNay,'),
 (1, 'them;'),
 (1, 'ABRAHAM'),
 (1, 'BALTHASAR]'),
 (1, 'SAMPSON\t[Aside'),
 (1, 'GREGORY]'),
 (1, 'Is'),
 (1, '\tbite'),
 (1, 'GREGORY\tDo'),
 (1, 'SAMPSON\tIf'),
 (1, 'you:'),
 (1, 'you.'),
 (1, 'better.'),
 (1, 'SAMPSON\tWell,'),
 (1, "master's"),
 (1, 'SAMPSON\tYes,'),
 (1, 'Gregory,'),
 (1, 'remember'),
 (1, 'swashing'),
 (1, 'BENVOLIO\tPart,'),
 (1, 'fools!'),
 (1, '\tPut'),
 (1, 'do.'),
 (1, '\t[Beats'),
 (1, 'swords]'),
 (1, 'drawn'),
 (1, 'among'),
 (1, 'heartless'),
 (1, 'Benvolio,'),
 (1, 'keep'),
 (1, 'manage'),
 (1, 'drawn,'),
 (1, 'talk'),
 (1, 'peace!'),
 (1, 'Montagues,'),
 (1, '\t[Enter,'),
 (1, 'houses,'),
 (1, 'clubs]'),
 (1, 'First'),
 (1, 'bills,'),
 (1, 'Capulets!'),
 (1, 'Montagues!'),
 (1, 'gown,'),
 (1, 'noise'),
 (1, 'this?'),
 (1, 'crutch,'),
 (1, 'sword?'),
 (1, 'CAPULET\tMy'),
 (1, 'say!'),
 (1, 'flourishes'),
 (1, 'blade'),
 (1, 'spite'),
 (1, 'Capulet,--Hold'),
 (1, 'foot'),
 (1, 'seek'),
 (1, 'PRINCE,'),
 (1, 'subjects,'),
 (1, 'enemies'),
 (1, 'steel,--'),
 (1, 'What,'),
 (1, 'quench'),
 (1, 'pernicious'),
 (1, 'purple'),
 (1, 'veins,'),
 (1, 'torture,'),
 (1, 'those'),
 (1, '\tThrow'),
 (1, 'sentence'),
 (1, 'civil'),
 (1, 'brawls,'),
 (1, 'airy'),
 (1, '\tBy'),
 (1, 'made'),
 (1, 'beseeming'),
 (1, 'ornaments,'),
 (1, "\tCanker'd"),
 (1, 'hate:'),
 (1, 'disturb'),
 (1, 'time,'),
 (1, '\tYou'),
 (1, 'Capulet;'),
 (1, 'along'),
 (1, 'me:'),
 (1, 'come'),
 (1, 'afternoon,'),
 (1, 'pleasure'),
 (1, 'Free-town,'),
 (1, 'judgment-place.'),
 (1, '\tOnce'),
 (1, 'more,'),
 (1, 'nephew,'),
 (1, 'began?'),
 (1, 'servants'),
 (1, 'yours,'),
 (1, 'ere'),
 (1, 'approach:'),
 (1, 'drew'),
 (1, 'them:'),
 (1, 'fiery'),
 (1, 'Tybalt,'),
 (1, 'sword'),
 (1, '\tWhich,'),
 (1, 'breathed'),
 (1, 'defiance'),
 (1, 'ears,'),
 (1, 'about'),
 (1, '\tWho'),
 (1, "hiss'd"),
 (1, 'scorn:'),
 (1, 'interchanging'),
 (1, 'blows,'),
 (1, '\tTill'),
 (1, 'came,'),
 (1, 'parted'),
 (1, 'either'),
 (1, 'MONTAGUE\tO,'),
 (1, 'Romeo?'),
 (1, 'saw'),
 (1, '\tRight'),
 (1, 'glad'),
 (1, 'fray.'),
 (1, 'BENVOLIO\tMadam,'),
 (1, 'hour'),
 (1, "worshipp'd"),
 (1, 'forth'),
 (1, 'golden'),
 (1, 'window'),
 (1, 'troubled'),
 (1, 'drave'),
 (1, '\tWhere,'),
 (1, 'grove'),
 (1, 'westward'),
 (1, 'rooteth'),
 (1, 'early'),
 (1, 'walking'),
 (1, 'son:'),
 (1, '\tTowards'),
 (1, 'made,'),
 (1, 'stole'),
 (1, 'covert'),
 (1, '\tI,'),
 (1, 'measuring'),
 (1, 'affections'),
 (1, 'own,'),
 (1, "they're"),
 (1, '\tPursued'),
 (1, 'pursuing'),
 (1, 'his,'),
 (1, 'fled'),
 (1, 'seen,'),
 (1, 'fresh'),
 (1, '\tAdding'),
 (1, 'deep'),
 (1, 'soon'),
 (1, 'begin'),
 (1, 'curtains'),
 (1, "Aurora's"),
 (1, 'bed,'),
 (1, '\tAway'),
 (1, 'light'),
 (1, 'steals'),
 (1, 'son,'),
 (1, 'chamber'),
 (1, 'himself,'),
 (1, '\tShuts'),
 (1, 'windows,'),
 (1, 'locks'),
 (1, 'night:'),
 (1, '\tBlack'),
 (1, 'portentous'),
 (1, 'prove,'),
 (1, 'counsel'),
 (1, 'remove.'),
 (1, 'BENVOLIO\tMy'),
 (1, 'uncle,'),
 (1, 'cause?'),
 (1, 'him.'),
 (1, 'BENVOLIO\tHave'),
 (1, 'importuned'),
 (1, 'means?'),
 (1, 'MONTAGUE\tBoth'),
 (1, 'many'),
 (1, 'counsellor,'),
 (1, '\tIs'),
 (1, 'himself--I'),
 (1, 'true--'),
 (1, 'discovery,'),
 (1, 'bud'),
 (1, 'bit'),
 (1, 'envious'),
 (1, 'worm,'),
 (1, 'spread'),
 (1, 'whence'),
 (1, 'sorrows'),
 (1, 'know.'),
 (1, 'please'),
 (1, 'aside;'),
 (1, 'grievance,'),
 (1, 'denied.'),
 (1, 'wert'),
 (1, 'Come,'),
 (1, 'madam,'),
 (1, 'BENVOLIO\tGood-morrow,'),
 (1, 'cousin.'),
 (1, 'day'),
 (1, 'young?'),
 (1, 'BENVOLIO\tBut'),
 (1, 'struck'),
 (1, 'seem'),
 (1, '\tWas'),
 (1, 'went'),
 (1, 'hence'),
 (1, 'BENVOLIO\tIt'),
 (1, 'was.'),
 (1, 'hours?'),
 (1, 'having'),
 (1, 'which,'),
 (1, 'short.'),
 (1, 'BENVOLIO\tOf'),
 (1, 'favour,'),
 (1, 'BENVOLIO\tAlas,'),
 (1, 'view,'),
 (1, 'tyrannous'),
 (1, 'rough'),
 (1, 'ROMEO\tAlas,'),
 (1, 'view'),
 (1, 'still,'),
 (1, 'without'),
 (1, 'will!'),
 (1, 'fray'),
 (1, 'here?'),
 (1, 'all.'),
 (1, '\tWhy,'),
 (1, 'brawling'),
 (1, 'thing,'),
 (1, 'first'),
 (1, 'create!'),
 (1, 'serious'),
 (1, 'well-seeming'),
 (1, 'forms!'),
 (1, 'lead,'),
 (1, 'smoke,'),
 (1, 'I,'),
 (1, 'this.'),
 (1, '\tDost'),
 (1, 'weep.'),
 (1, 'ROMEO\tGood'),
 (1, 'heart,'),
 (1, 'such'),
 (1, 'transgression.'),
 (1, 'wilt'),
 (1, 'shown'),
 (1, '\tDoth'),
 (1, 'add'),
 (1, 'grief'),
 (1, '\tLove'),
 (1, 'smoke'),
 (1, 'fume'),
 (1, 'sparkling'),
 (1, "vex'd"),
 (1, 'tears:'),
 (1, 'else?'),
 (1, 'madness'),
 (1, 'choking'),
 (1, 'gall'),
 (1, '\tFarewell,'),
 (1, 'coz.'),
 (1, 'Soft!'),
 (1, 'along;'),
 (1, 'so,'),
 (1, 'ROMEO\tTut,'),
 (1, 'myself;'),
 (1, 'here;'),
 (1, "he's"),
 (1, 'some'),
 (1, 'ROMEO\tWhat,'),
 (1, 'BENVOLIO\tGroan!'),
 (1, 'why,'),
 (1, 'sadly'),
 (1, 'who.'),
 (1, 'sick'),
 (1, '\tAh,'),
 (1, 'word'),
 (1, '\tIn'),
 (1, 'woman.'),
 (1, 'supposed'),
 (1, 'loved.'),
 (1, 'ROMEO\tA'),
 (1, "she's"),
 (1, 'BENVOLIO\tA'),
 (1, 'soonest'),
 (1, 'miss:'),
 (1, "Cupid's"),
 (1, 'arrow;'),
 (1, 'strong'),
 (1, '\tFrom'),
 (1, 'bow'),
 (1, 'stay'),
 (1, 'bide'),
 (1, 'assailing'),
 (1, 'ope'),
 (1, 'saint-seducing'),
 (1, '\tO,'),
 (1, 'rich'),
 (1, 'beauty,'),
 (1, 'store.'),
 (1, 'still'),
 (1, 'chaste?'),
 (1, 'ROMEO\tShe'),
 (1, 'waste,'),
 (1, 'merit'),
 (1, 'bliss'),
 ...]
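
The output above counts the empty string 134 times and treats tokens such as 'ROMEO\tA' or 'love.' as separate words, because split(" ") splits only on single spaces. The following is a minimal sketch in plain Python (no Spark) of one possible normalization step. The helper name `tokenize`, the regular expression, and the choice to lowercase are assumptions made for illustration and are not part of the original program.

```python
import re
from collections import Counter

# Hypothetical helper (not from the original post): lowercase the line and
# keep only runs of letters and apostrophes, so tabs, punctuation and empty
# strings never become tokens.
TOKEN_RE = re.compile(r"[a-z']+")

def tokenize(line):
    return TOKEN_RE.findall(line.lower())

# A few lines taken from the sample text shown earlier in the post.
sample_lines = [
    "SAMPSON\tGregory, o' my word, we'll not carry coals.",
    "",
    "GREGORY\tNo, for then we should be colliers.",
]

# Same idea as flatMap + map + reduceByKey, done locally with a Counter.
word_counts = Counter(word for line in sample_lines for word in tokenize(line))
print(word_counts.most_common(3))
```

Under these assumptions, a function like this could probably be passed to flatMap in place of the split(" ") lambda, since flatMap only needs a function that returns an iterable for each line.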

Text file manipulation

text_file.collect()
text_file.count()
374
lineLengths = text_file.map(lambda s: len(s))
lineLengths.collect() # number of characters in each line
[16,
 0,
 0,
 5,
 0,
 0,
 0,
 31,
 0,
 0,
 53,
 32,
 0,
 51,
 0,
 43,
 0,
 47,
 0,
 61,
 0,
 38,
 0,
 49,
 0,
 48,
 0,
 58,
 49,
 0,
 58,
 48,
 0,
 58,
 13,
 0,
 60,
 51,
 51,
 13,
 0,
 60,
 0,
 57,
 51,
 32,
 0,
 31,
 0,
 57,
 33,
 0,
 48,
 0,
 56,
 41,
 0,
 56,
 48,
 35,
 0,
 58,
 0,
 35,
 0,
 20,
 0,
 31,
 0,
 57,
 0,
 58,
 11,
 0,
 56,
 46,
 0,
 30,
 0,
 42,
 0,
 32,
 0,
 42,
 0,
 60,
 4,
 0,
 11,
 0,
 58,
 20,
 0,
 28,
 0,
 29,
 0,
 67,
 0,
 18,
 0,
 18,
 0,
 60,
 0,
 25,
 0,
 16,
 0,
 65,
 0,
 13,
 0,
 17,
 0,
 21,
 46,
 0,
 26,
 0,
 15,
 0,
 56,
 42,
 0,
 51,
 40,
 0,
 55,
 41,
 22,
 0,
 13,
 0,
 51,
 33,
 0,
 66,
 49,
 0,
 46,
 0,
 54,
 0,
 58,
 0,
 46,
 41,
 0,
 35,
 0,
 55,
 0,
 55,
 0,
 32,
 0,
 45,
 45,
 51,
 45,
 47,
 44,
 46,
 44,
 42,
 36,
 48,
 35,
 41,
 41,
 49,
 39,
 47,
 41,
 37,
 40,
 43,
 45,
 45,
 0,
 55,
 0,
 50,
 42,
 0,
 50,
 46,
 41,
 43,
 43,
 43,
 45,
 47,
 48,
 46,
 0,
 52,
 41,
 0,
 49,
 44,
 41,
 40,
 44,
 37,
 42,
 39,
 39,
 46,
 36,
 44,
 0,
 48,
 45,
 50,
 40,
 42,
 38,
 46,
 41,
 45,
 39,
 45,
 42,
 0,
 47,
 0,
 48,
 0,
 46,
 0,
 47,
 40,
 41,
 39,
 36,
 40,
 47,
 35,
 49,
 41,
 0,
 14,
 0,
 56,
 44,
 0,
 48,
 46,
 0,
 36,
 0,
 29,
 0,
 26,
 0,
 29,
 0,
 33,
 44,
 0,
 54,
 0,
 55,
 0,
 17,
 0,
 11,
 0,
 17,
 0,
 44,
 0,
 48,
 43,
 0,
 51,
 48,
 47,
 42,
 49,
 43,
 38,
 35,
 40,
 42,
 13,
 44,
 45,
 21,
 0,
 32,
 0,
 26,
 0,
 40,
 0,
 40,
 43,
 44,
 51,
 45,
 47,
 48,
 48,
 42,
 39,
 18,
 0,
 49,
 40,
 0,
 45,
 42,
 0,
 50,
 0,
 40,
 0,
 24,
 23,
 0,
 46,
 42,
 39,
 0,
 52,
 0,
 51,
 0,
 53,
 0,
 51,
 41,
 45,
 50,
 45,
 42,
 40,
 37,
 47,
 0,
 61,
 0,
 53,
 37,
 36,
 44,
 37,
 43,
 41,
 0,
 48,
 0,
 47,
 0,
 43,
 24,
 0,
 18,
 42,
 47,
 47,
 40,
 44,
 41,
 42,
 47,
 45,
 0,
 53,
 0,
 9]
totalLength = lineLengths.reduce(lambda a, b : a+b)
totalLength # total number of characters across all lines
10678
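
As a rough local analogy, the map and reduce steps above behave like Python's built-in `map` and `functools.reduce` applied to an ordinary list. The sketch below uses only the first four lines of the text, so its total is not the 10678 reported for the whole file.

```python
from functools import reduce

# First four lines of the sample text, as shown at the top of the post.
lines = ["ROMEO AND JULIET", "", "", "ACT I"]

# map step: one length per line
line_lengths = list(map(len, lines))

# reduce step: combine the lengths pairwise with the same lambda as above
total_length = reduce(lambda a, b: a + b, line_lengths)

print(line_lengths, total_length)
```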

filter

text_file = sc.textFile("test2.txt")
lines = text_file.filter(lambda line : "line" in line) # keep only the lines that contain "line"
lines.collect()
['This is first line', 'This is second line', 'This is last line']
lines = text_file.filter(lambda line : "last" in line) # keep only the lines that contain "last"
lines.collect()
['This is last line']
lines.first()
'This is last line'

Split

data = sc.textFile("test3.txt")
data.collect()
['Carlo,5,3,3,4',
 'Mokhtar,2,5,5,3',
 'Jacques,4,2,4,5',
 'Bradaen,5,3,2,5',
 'Chris,5,4,5,1']
data1 = data.map(lambda line : line.split(",")) # each line becomes a list, so the result is a list of lists
data1.collect()
[['Carlo', '5', '3', '3', '4'],
 ['Mokhtar', '2', '5', '5', '3'],
 ['Jacques', '4', '2', '4', '5'],
 ['Bradaen', '5', '3', '2', '5'],
 ['Chris', '5', '4', '5', '1']]

Average

data2 = data1.map(lambda item : (item[0], item[1]+item[2]+item[3]+item[4]))
data2.collect()

# The fields are strings, so + concatenates them instead of adding them -> not the intended result
[('Carlo', '5334'),
 ('Mokhtar', '2553'),
 ('Jacques', '4245'),
 ('Bradaen', '5325'),
 ('Chris', '5451')]
data2 = data1.map(lambda item: (item[0], int(item[1])+int(item[2])+int(item[3])+int(item[4])))
data2.collect()
[('Carlo', 15),
 ('Mokhtar', 15),
 ('Jacques', 15),
 ('Bradaen', 15),
 ('Chris', 15)]
data3 = data2.map(lambda item: (item[0], item[1], item[1]/4))
data3.collect() # output the sum and the average
[('Carlo', 15, 3.75),
 ('Mokhtar', 15, 3.75),
 ('Jacques', 15, 3.75),
 ('Bradaen', 15, 3.75),
 ('Chris', 15, 3.75)]
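
The lambdas above hard-code four score columns and the divisor 4. A minimal sketch of a version that adapts to any number of score columns is shown below in plain Python. The function name `total_and_average` is hypothetical, and the sketch assumes every field after the name is an integer.

```python
# Two rows copied from test3.txt as printed above.
rows = [
    "Carlo,5,3,3,4",
    "Mokhtar,2,5,5,3",
]

def total_and_average(line):
    # First field is the name, the rest are scores (assumed to be integers).
    name, *scores = line.split(",")
    values = [int(s) for s in scores]
    total = sum(values)
    return (name, total, total / len(values))

results = [total_and_average(r) for r in rows]
print(results)
```

A function of this shape could presumably be handed to `data.map(...)` directly, which would replace the three separate map steps with one.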

mapValues

inputrdd = sc.parallelize([ ["maths", 50], ["maths", 60], ["english", 65], ["english", 85]])
inputrdd.collect()
[['maths', 50], ['maths', 60], ['english', 65], ['english', 85]]
mapped = inputrdd.mapValues(lambda mark: (mark,1)) # transforms only the value and leaves the key untouched
mapped.collect()
[('maths', (50, 1)),
 ('maths', (60, 1)),
 ('english', (65, 1)),
 ('english', (85, 1))]
reduced = mapped.reduceByKey(lambda x,y : (x[0]+y[0], x[1]+y[1]))
 # records are grouped by key, so the lambda receives only the values, i.e. the (mark, 1) tuples
reduced.collect()
[('maths', (110, 2)), ('english', (150, 2))]
average = reduced.map(lambda x : (x[0], x[1][0]/x[1][1]))
 # computes the average for each element; x[1] is the (sum, count) tuple, so it is indexed with [0] and [1]
average.collect()
[('maths', 55.0), ('english', 75.0)]
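
The reason for carrying a (sum, count) pair is that two averages cannot be merged directly, whereas sums and counts can be added in any order. The sketch below reproduces the same three steps with a plain Python dictionary. It is an illustration of the pattern, not Spark code.

```python
marks = [("maths", 50), ("maths", 60), ("english", 65), ("english", 85)]

# Equivalent of mapValues + reduceByKey: keep a running (sum, count) per key.
totals = {}
for subject, mark in marks:
    running_sum, running_count = totals.get(subject, (0, 0))
    totals[subject] = (running_sum + mark, running_count + 1)

# Equivalent of the final map: divide the sum by the count.
averages = {subject: s / c for subject, (s, c) in totals.items()}
print(averages)
```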

PageRank

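The more pages that link to a page, the more important that page can be considered.

By counting the links across all pages, we can assign each page a rank.

Algorithm

  • Assign an initial rank of 1 to every page

  • Each page splits its rank evenly among the pages it links to

  • Each page sums the shares it receives, multiplies the sum by 0.85, and adds 0.15 (a base importance that every page receives)

  • Repeat these steps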
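
One iteration of these steps can be written as a single update rule. The notation is an assumption introduced here for illustration: r(q) is the current rank of page q, L(q) is the number of pages q links to, and In(p) is the set of pages that link to p.

```latex
r_{\text{new}}(p) = 0.15 + 0.85 \sum_{q \in \mathrm{In}(p)} \frac{r(q)}{L(q)}
```

For example, a page that is linked from two pages, each with rank 1 and a single outgoing link, would receive 0.15 + 0.85 × (1 + 1) = 1.85 after one iteration.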

PageRank example

Let's compute the rank of each page for the link graph defined in the code below.

Representation

mapLink = sc.parallelize([ ["MapR", "Baidu"], ["MapR", "Blogger"], ["Baidu", "MapR"],\
                         ["Blogger","Google"], ["Blogger", "Baidu"], ["Google", "MapR"]])
links = mapLink.groupByKey()
links.collect() # the values are not shown, only ResultIterable objects
[('Baidu', <pyspark.resultiterable.ResultIterable at 0x7fc12856c470>),
 ('Google', <pyspark.resultiterable.ResultIterable at 0x7fc12873b400>),
 ('MapR', <pyspark.resultiterable.ResultIterable at 0x7fc12873b358>),
 ('Blogger', <pyspark.resultiterable.ResultIterable at 0x7fc12873b4e0>)]
print(list((k, list(v)) for (k,v) in links.collect()))
[('Baidu', ['MapR']), ('Google', ['MapR']), ('MapR', ['Baidu', 'Blogger']), ('Blogger', ['Google', 'Baidu'])]
ranks = links.map(lambda pairs : (pairs[0], 1))
ranks.collect()

# set the initial rank of every page to 1
[('Baidu', 1), ('Google', 1), ('MapR', 1), ('Blogger', 1)]
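
To make the grouping step easier to picture, here is a minimal plain-Python sketch of what groupByKey produces for this edge list: an adjacency list per source page, plus an initial rank of 1 for every key. The variable names `edges`, `adjacency`, and `initial_ranks` are hypothetical.

```python
from collections import defaultdict

# Same (source, target) pairs as mapLink above.
edges = [("MapR", "Baidu"), ("MapR", "Blogger"), ("Baidu", "MapR"),
         ("Blogger", "Google"), ("Blogger", "Baidu"), ("Google", "MapR")]

# Local equivalent of groupByKey: collect all targets under their source page.
adjacency = defaultdict(list)
for source, target in edges:
    adjacency[source].append(target)

# Local equivalent of links.map(lambda pairs: (pairs[0], 1)).
initial_ranks = {page: 1 for page in adjacency}

print(dict(adjacency))
print(initial_ranks)
```

On the Spark side, `links.mapValues(list).collect()` is probably a shorter way to display readable values than the list comprehension used above.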

join

print(list(( k, list(v)) for (k,v) in links.collect()))
[('Baidu', ['MapR']), ('Google', ['MapR']), ('MapR', ['Baidu', 'Blogger']), ('Blogger', ['Google', 'Baidu'])]
ranks.collect()
[('Baidu', 1), ('Google', 1), ('MapR', 1), ('Blogger', 1)]
cvalues = links.join(ranks)
cvalues.collect()
[('Baidu', (<pyspark.resultiterable.ResultIterable at 0x7fc12874b1d0>, 1)),
 ('Google', (<pyspark.resultiterable.ResultIterable at 0x7fc12874b208>, 1)),
 ('MapR', (<pyspark.resultiterable.ResultIterable at 0x7fc12874b2b0>, 1)),
 ('Blogger', (<pyspark.resultiterable.ResultIterable at 0x7fc12874b358>, 1))]
cvalues = links.join(ranks)

"""
for (k, v) in cvalues.collect():
    print(k)
    print(list(v[0]))
    print(v[1])
"""

print(list((k, list(v)) for (k,v) in cvalues.collect()))
print(list((k, (list(v[0]), v[1])) for (k,v) in cvalues.collect()))
[('Baidu', [<pyspark.resultiterable.ResultIterable object at 0x7fc12874f908>, 1]), ('Google', [<pyspark.resultiterable.ResultIterable object at 0x7fc12874f940>, 1]), ('MapR', [<pyspark.resultiterable.ResultIterable object at 0x7fc12874f9e8>, 1]), ('Blogger', [<pyspark.resultiterable.ResultIterable object at 0x7fc12874fa90>, 1])]
[('Baidu', (['MapR'], 1)), ('Google', (['MapR'], 1)), ('MapR', (['Baidu', 'Blogger'], 1)), ('Blogger', (['Google', 'Baidu'], 1))]

cvalues

def computeContribs(urls, rank):
    """Calculates URL contributions to the rank of other URLs."""
    num_urls = len(urls) # number of outgoing links
    for url in urls:
        yield (url, rank / num_urls) # each linked URL receives an equal share of the rank
a = computeContribs([1,2,3], 1)
for i in a :
    print(i)
(1, 0.3333333333333333)
(2, 0.3333333333333333)
(3, 0.3333333333333333)
contribs = links.join(ranks).flatMap(
            lambda url_urls_rank: computeContribs(url_urls_rank[1][0], url_urls_rank[1][1]))
## [1][0] is the list of linked URLs and [1][1] is the rank value
contribs.collect()
[('MapR', 1.0),
 ('MapR', 1.0),
 ('Baidu', 0.5),
 ('Blogger', 0.5),
 ('Google', 0.5),
 ('Baidu', 0.5)]

newRank calculation

# sum the values produced by contribs, grouped by key
new_rank = contribs.reduceByKey(lambda x, y: x+y).collect()
print(new_rank)
new_rank1 = contribs.reduceByKey(lambda x,y : x+y).mapValues(lambda rank:0.15 + 0.85 * rank).collect()
print(new_rank1)
[('Baidu', 1.0), ('Google', 0.5), ('MapR', 2.0), ('Blogger', 0.5)]
[('Baidu', 1.0), ('Google', 0.575), ('MapR', 1.8499999999999999), ('Blogger', 0.575)]
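
The code above performs a single iteration, while the algorithm calls for repeating it. Below is a minimal plain-Python sketch of that loop on the same graph. The function name `pagerank_step` and the choice of ten iterations in total are assumptions made for illustration, and the sketch relies on every page in this graph having at least one inbound link.

```python
links = {
    "MapR": ["Baidu", "Blogger"],
    "Baidu": ["MapR"],
    "Blogger": ["Google", "Baidu"],
    "Google": ["MapR"],
}

def pagerank_step(links, ranks):
    """One iteration: split each rank over its out-links, then apply 0.15 + 0.85 * sum."""
    contribs = {}
    for page, targets in links.items():
        share = ranks[page] / len(targets)
        for target in targets:
            contribs[target] = contribs.get(target, 0.0) + share
    return {page: 0.15 + 0.85 * total for page, total in contribs.items()}

ranks = {page: 1.0 for page in links}
first = pagerank_step(links, ranks)   # corresponds to new_rank1 above

# Nine more iterations (ten in total); the count is an arbitrary choice.
ranks = first
for _ in range(9):
    ranks = pagerank_step(links, ranks)

print(first)
print(ranks)
```

In this particular graph the ranks always sum to 4, because every page both links out and is linked to, so no rank is lost between iterations.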

Assignment 7

#7-1

text_file = sc.textFile("spark.txt")
data = text_file.flatMap(lambda l : l.split(" "))\
                .map(lambda w: (w, 1))\
                .reduceByKey(lambda x, y: x+y)\
                .filter(lambda x : x[1] >= 2)

data.collect()
[('Spark', 4), ('is', 3), ('a', 2), ('This', 2)]
#7-2

data = data.map(lambda p : (p[1],p[0]))\
                .sortByKey(ascending=True)

data.collect()
[('Spark', 4), ('This', 2), ('a', 2), ('is', 3)]
#7
mapLink = sc.parallelize([["PageA", "PageB"], ["PageA", "PageC"],\
                         ["PageB", "PageC"], ["PageC", "PageA"],\
                         ["PageD", "PageC"]])

links = mapLink.groupByKey()
print(list((k, list(v)) for (k,v) in links.collect()))
ranks = links.map(lambda pairs : (pairs[0], 1))
print(ranks.collect())
[('PageA', ['PageB', 'PageC']), ('PageB', ['PageC']), ('PageC', ['PageA']), ('PageD', ['PageC'])]
[('PageA', 1), ('PageB', 1), ('PageC', 1), ('PageD', 1)]
#7-3
def computeContribs(urls, rank):
    """Calculates URL contributions to the rank of other URLs."""
    num_urls = len(urls) # number of outgoing links
    for url in urls:
        yield (url, rank / num_urls) # each linked URL receives an equal share of the rank

contribs = links.join(ranks).flatMap(
            lambda url_urls_rank: computeContribs(url_urls_rank[1][0], url_urls_rank[1][1]))

contribs.collect()
[('PageA', 1.0),
 ('PageB', 0.5),
 ('PageC', 0.5),
 ('PageC', 1.0),
 ('PageC', 1.0)]
#7-4
new_rank = contribs.reduceByKey(lambda x,y : x+y)\
                    .mapValues(lambda rank: 0.15+0.85*rank).collect()

print(new_rank)
[('PageC', 2.275), ('PageA', 1.0), ('PageB', 0.575)]
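
PageD is missing from this result because no page links to it, so reduceByKey never sees a PageD key. If a second iteration joined links with these ranks, PageD's outgoing link would likely be dropped by the inner join. The plain-Python sketch below shows one possible way to keep such pages: start every known page with a contribution of 0.0 before summing. This is an illustrative variation, not what the original code does.

```python
links = {
    "PageA": ["PageB", "PageC"],
    "PageB": ["PageC"],
    "PageC": ["PageA"],
    "PageD": ["PageC"],
}
ranks = {page: 1.0 for page in links}

# Seed every source page with 0.0 so pages without inbound links are kept.
# Assumes every link target also appears as a key in `links`, as it does here.
totals = {page: 0.0 for page in links}
for page, targets in links.items():
    share = ranks[page] / len(targets)
    for target in targets:
        totals[target] += share

new_ranks = {page: 0.15 + 0.85 * total for page, total in totals.items()}
print(new_ranks)
```

With this change PageD keeps the base value 0.15, and the other three pages get the same values as in the output above.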