File Processing

damjaeng-i · August 7, 2022

2022 PY4E

File Processing

  • A text file can be thought of as a sequence of lines

Opening a File

  • Before we can read the contents of the file, we must tell Python which file we are going to work with and what we will be doing with the file
  • This is done with the open() function
  • open() returns a “file handle” - a variable used to perform operations on the file
  • Similar to “File → Open” in a Word Processor

Using open()

  • handle = open(filename, mode)
  • fhand = open('mbox.txt', 'r')
  • returns a handle used to manipulate the file
  • filename is a string
  • mode is optional and should be 'r' if we are planning to read the file and 'w' if we are going to write to the file; if it is omitted, Python uses 'r'
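A minimal sketch of both modes, assuming a throwaway file named `demo.txt` in a temporary directory (the name is hypothetical, not from the course). Mode `'w'` creates the file, or empties it if it already exists, and calling `open()` with no mode falls back to `'r'`.

```python
import os
import tempfile

# Hypothetical scratch file inside a fresh temporary directory
folder = tempfile.mkdtemp()
path = os.path.join(folder, 'demo.txt')

# 'w' creates the file (or empties an existing one) for writing
fout = open(path, 'w')
fout.write('first line\n')
fout.write('second line\n')
fout.close()

# No mode given: open() defaults to 'r' (read text)
fhand = open(path)
text = fhand.read()
fhand.close()

print(fhand.mode)   # r
print(repr(text))   # 'first line\nsecond line\n'
```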

What is a Handle?

>>> fhand = open('mbox.txt')
>>> print(fhand)
<_io.TextIOWrapper name='mbox.txt' mode='r' encoding='UTF-8'>
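The handle is not the file's contents; it is an object that remembers which file is open and how. A small sketch, assuming a hypothetical one-line `mbox.txt` created in a temporary directory so the example runs anywhere:

```python
import os
import tempfile

# Hypothetical sample file so the example does not need the real mbox.txt
path = os.path.join(tempfile.mkdtemp(), 'mbox.txt')
with open(path, 'w') as fout:
    fout.write('From: someone@example.com\n')

fhand = open(path)
print(fhand.name)    # the path string that was passed to open()
print(fhand.mode)    # r
print(fhand.closed)  # False: the handle is still open
fhand.close()
print(fhand.closed)  # True: no more reads are possible
```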

When Files are Missing

  • Opening a file that does not exist raises a FileNotFoundError, and the program stops with a traceback
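A sketch of catching that error instead of crashing, assuming a hypothetical `stuff.txt` inside a brand-new empty temporary directory (so it is guaranteed to be missing):

```python
import os
import tempfile

# A path inside a fresh, empty temporary directory cannot exist yet
missing = os.path.join(tempfile.mkdtemp(), 'stuff.txt')

error_name = None
try:
    fhand = open(missing)
except FileNotFoundError as err:
    # Without this except clause, Python would print a traceback and stop
    error_name = type(err).__name__
    print('Cannot open:', err.filename)

print(error_name)  # FileNotFoundError
```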

The Newline Character

  • We use a special character called the “newline” to indicate when a line ends
  • We represent it as \n in strings
  • Newline is still one character - not two
>>> stuff = 'Hello\nWorld!'
>>> stuff
'Hello\nWorld!'

>>> print(stuff)
Hello
World!

>>> stuff = 'X\nY'
>>> print(stuff)
X
Y

>>> len(stuff)
3

The \n counts as a single character, which is why len('X\nY') is 3.
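The same rule applies to lines read from a file: each one keeps its trailing \n, and that newline counts as one character. A sketch with a hypothetical two-line file `xy.txt`:

```python
import os
import tempfile

# Hypothetical file holding the same 'X' / 'Y' text as above
path = os.path.join(tempfile.mkdtemp(), 'xy.txt')
with open(path, 'w') as fout:
    fout.write('X\nY\n')

fhand = open(path)
lines = list(fhand)   # each element is one line, newline included
fhand.close()

print(lines)                          # ['X\n', 'Y\n']
print([len(line) for line in lines])  # [2, 2]: one letter plus one '\n'
```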

File Handle as a Sequence

  • A file handle open for read can be treated as a sequence of strings where each line in the file is a string in the sequence
  • We can use the for statement to iterate through a sequence
  • Remember - a sequence is an ordered set
xfile = open('mbox.txt')
for cheese in xfile:
    print(cheese)
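One consequence worth knowing: a file handle is read front to back only once. A sketch with a hypothetical three-line file: the first loop gets the lines in order, and a second loop over the same handle gets nothing until the file is opened again.

```python
import os
import tempfile

# Hypothetical three-line file
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as fout:
    fout.write('line one\nline two\nline three\n')

xfile = open(path)
first_pass = [cheese for cheese in xfile]    # every line, in file order
second_pass = [cheese for cheese in xfile]   # nothing left to read
xfile.close()

print(first_pass)   # ['line one\n', 'line two\n', 'line three\n']
print(second_pass)  # []
```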

Counting Lines in a File

  • Open a file read-only
  • Use a for loop to read each line
  • Count the lines and print out the number of lines
fhand = open('mbox.txt')
count = 0
for line in fhand:
    count = count + 1
print('Line Count:', count)

$ python open.py
Line Count: 132045
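The same counting loop can be wrapped in a reusable function. This is a sketch under assumptions: the helper name `count_lines` is hypothetical, and it uses a `with` block so the file is closed automatically; a small generated file stands in for `mbox.txt`.

```python
import os
import tempfile

def count_lines(path):
    """Return the number of lines in the text file at path (hypothetical helper)."""
    count = 0
    with open(path) as fhand:   # the with block closes the file when done
        for line in fhand:
            count = count + 1
    return count

# Hypothetical three-line sample instead of the real mbox.txt
sample = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(sample, 'w') as fout:
    fout.write('a\nb\nc\n')

print('Line Count:', count_lines(sample))  # Line Count: 3
```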

Reading the Whole File

We can read the whole file (newlines and all) into a single string with read()

>>> fhand = open('mbox-short.txt')
>>> inp = fhand.read()
>>> print(len(inp))
94626
>>> print(inp[:20])
From stephen.marquar
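read() returns exactly the characters the line-by-line loop would see, just joined into one string. A sketch with a hypothetical two-line file comparing the two approaches; read() is convenient for small files, while the loop avoids holding a very large file in memory all at once.

```python
import os
import tempfile

# Hypothetical stand-in for mbox-short.txt
path = os.path.join(tempfile.mkdtemp(), 'short.txt')
with open(path, 'w') as fout:
    fout.write('From: a@example.com\nSubject: hi\n')

fhand = open(path)
inp = fhand.read()       # one string holding the whole file, newlines included
fhand.close()

fhand = open(path)
total = sum(len(line) for line in fhand)   # same characters, counted per line
fhand.close()

print(len(inp), total)   # 32 32
print(inp[:20])          # slicing works on the string like any other
```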

Searching Through a File

We can put an if statement in our for loop to only print lines that meet some criteria

fhand = open('mbox-short.txt')
for line in fhand:
    if line.startswith('From:'):
        print(line)

OOPS!

What are all these blank lines doing here?

  • Each line from the file has a newline at the end
  • The print() function adds its own newline after each line, so every line ends with two newlines and shows up followed by a blank line
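To see the doubled newline directly, a sketch can send print() output into an in-memory buffer (`io.StringIO`) instead of the screen. The line content is hypothetical:

```python
import io

line = 'From: someone@example.com\n'   # hypothetical line as the file loop delivers it

buffer = io.StringIO()
print(line, file=buffer)   # print() appends its own '\n' after the text
shown = buffer.getvalue()

print(repr(shown))        # ends in two newlines: the source of the blank line
print(shown.count('\n'))  # 2
```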

Searching Through a File (fixed)

  • We can strip the whitespace from the right-hand side of the string using the string method rstrip()
  • The newline is considered “white space” and is stripped
fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if line.startswith('From:'):
        print(line)
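A quick check of what rstrip() does and does not remove, using a hypothetical line with trailing spaces. Passing a character argument, as in `rstrip('\n')`, strips only that character.

```python
raw = 'From: someone@example.com  \n'   # hypothetical line with trailing spaces

cleaned = raw.rstrip()             # removes spaces, tabs and '\n' from the right end
only_newline = raw.rstrip('\n')    # removes only trailing newlines
kept_left = '  indented\n'.rstrip()  # leading whitespace is left alone

print(repr(cleaned))       # 'From: someone@example.com'
print(repr(only_newline))  # 'From: someone@example.com  '
print(repr(kept_left))     # '  indented'
```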

Skipping with continue

We can conveniently skip a line by using the continue statement

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if not line.startswith('From:'):
        continue
    print(line)
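The skip-with-continue loop can be packaged as a function that works on any open text file. In this sketch the name `from_lines` is hypothetical, and `io.StringIO` stands in for a real file because it can be looped over line by line the same way:

```python
import io

def from_lines(fhand):
    """Collect the lines that start with 'From:', skipping all others (hypothetical helper)."""
    found = []
    for line in fhand:
        line = line.rstrip()
        if not line.startswith('From:'):
            continue            # jump straight to the next line
        found.append(line)
    return found

# io.StringIO behaves like an open text file (hypothetical contents)
fake_file = io.StringIO('From: a@example.com\nSubject: hi\n\nFrom: b@example.com\n')
result = from_lines(fake_file)
print(result)   # ['From: a@example.com', 'From: b@example.com']
```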

Using in to select lines

We can look for a string anywhere in a line as our selection criteria

fhand = open('mbox-short.txt')
for line in fhand:
    line = line.rstrip()
    if '@uct.ac.za' not in line:
        continue
    print(line)
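A sketch of the same `in` test as a function with the search text as a parameter. The helper name `lines_containing` and the addresses are hypothetical. Note that `in` is case-sensitive, so the upper-case address on the last line is not matched.

```python
import io

def lines_containing(fhand, needle):
    """Return the stripped lines in which needle appears anywhere (hypothetical helper)."""
    matches = []
    for line in fhand:
        line = line.rstrip()
        if needle not in line:
            continue
        matches.append(line)
    return matches

# io.StringIO stands in for an open file (hypothetical contents)
fake_file = io.StringIO(
    'From: stephen@uct.ac.za\n'
    'To: someone@example.com\n'
    'Author: louis@UCT.AC.ZA\n'
)
result = lines_containing(fake_file, '@uct.ac.za')
print(result)   # ['From: stephen@uct.ac.za']
```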

Prompt for File Name

import sys

fname = input('Enter the file name:  ')
try:
    fhand = open(fname)
except OSError:
    print('File cannot be opened:', fname)
    sys.exit()

count = 0
for line in fhand:
    if line.startswith('Subject:') :
        count = count + 1
print('There were', count, 'subject lines in', fname)
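For testing without typing at a prompt, the same logic can take the file name as an argument. This is a sketch under assumptions: the function name `count_subjects` is hypothetical, it returns None instead of exiting when the file cannot be opened, and it catches OSError (which covers FileNotFoundError and PermissionError) rather than every exception.

```python
import os
import tempfile

def count_subjects(fname):
    """Return the number of 'Subject:' lines in fname, or None if it cannot be opened."""
    try:
        fhand = open(fname)
    except OSError:          # FileNotFoundError, PermissionError, ...
        return None
    count = 0
    with fhand:              # closes the file when the block ends
        for line in fhand:
            if line.startswith('Subject:'):
                count = count + 1
    return count

folder = tempfile.mkdtemp()
good = os.path.join(folder, 'mbox-sample.txt')   # hypothetical test file
with open(good, 'w') as fout:
    fout.write('Subject: one\nFrom: x\nSubject: two\n')

print(count_subjects(good))                                  # 2
print(count_subjects(os.path.join(folder, 'missing.txt')))  # None
```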