- Python - Text Processing
- Python - Text Processing Introduction
- Python - Text Processing Environment
- Python - String Immutability
- Python - Sorting Lines
- Python - Reformatting Paragraphs
- Python - Counting Token in Paragraphs
- Python - Binary ASCII Conversion
- Python - Strings as Files
- Python - Backward File Reading
- Python - Filter Duplicate Words
- Python - Extract Emails from Text
- Python - Extract URL from Text
- Python - Pretty Print
- Python - Text Processing State Machine
- Python - Capitalize and Translate
- Python - Tokenization
- Python - Remove Stopwords
- Python - Synonyms and Antonyms
- Python - Text Translation
- Python - Word Replacement
- Python - Spelling Check
- Python - WordNet Interface
- Python - Corpora Access
- Python - Tagging Words
- Python - Chunks and Chinks
- Python - Chunk Classification
- Python - Text Classification
- Python - Bigrams
- Python - Process PDF
- Python - Process Word Document
- Python - Reading RSS feed
- Python - Sentiment Analysis
- Python - Search and Match
- Python - Text Munging
- Python - Text wrapping
- Python - Frequency Distribution
- Python - Text Summarization
- Python - Stemming Algorithms
- Python - Constrained Search
- Selected Reading
- UPSC IAS Exams Notes
- Developer's Best Practices
- Questions and Answers
- Effective Resume Writing
- HR Interview Questions
- Computer Glossary
- Who is Who
Python - Text Processing Environment
To successfully create and run the example code in this tutorial we will need an environment set up which will have both general-purpose python as well as the special packages required for Data science. We will first look as installing the general-purpose python which can be python 2 or python 3. But we will prefer python 2 for this tutorial mainly because of its maturity and wider support of external packages.
The most up-to-date and current source code, binaries, documentation, news, etc., is available on the official website of Python https://www.python.org/
You can download Python documentation from https://www.python.org/doc/. The documentation is available in HTML, PDF, and PostScript formats.
Python distribution is available for a wide variety of platforms. You need to download only the binary code applicable for your platform and install Python.
If the binary code for your platform is not available, you need a C compiler to compile the source code manually. Compiling the source code offers more flexibility in terms of choice of features that you require in your installation.
Here is a quick overview of installing Python on various platforms −
Unix and Linux Installation
Here are the simple steps to install Python on Unix/Linux machine.
Open a Web browser and go to https://www.python.org/downloads/.
Follow the link to download zipped source code available for Unix/Linux.
Download and extract files.
Editing the Modules/Setup file if you want to customize some options.
run ./configure script
This installs Python at standard location /usr/local/bin and its libraries at /usr/local/lib/pythonXX where XX is the version of Python.
Here are the steps to install Python on Windows machine.
Open a Web browser and go to https://www.python.org/downloads/.
Follow the link for the Windows installer python-XYZ.msi file where XYZ is the version you need to install.
To use this installer python-XYZ.msi, the Windows system must support Microsoft Installer 2.0. Save the installer file to your local machine and then run it to find out if your machine supports MSI.
Run the downloaded file. This brings up the Python install wizard, which is really easy to use. Just accept the default settings, wait until the install is finished, and you are done.
Recent Macs come with Python installed, but it may be several years out of date. See http://www.python.org/download/mac/ for instructions on getting the current version along with extra tools to support development on the Mac. For older Mac OS's before Mac OS X 10.3 (released in 2003), MacPython is available.
Jack Jansen maintains it and you can have full access to the entire documentation at his website − http://www.cwi.nl/~jack/macpython.html. You can find complete installation details for Mac OS installation.
Setting up PATH
Programs and other executable files can be in many directories, so operating systems provide a search path that lists the directories that the OS searches for executables.
The path is stored in an environment variable, which is a named string maintained by the operating system. This variable contains information available to the command shell and other programs.
The path variable is named as PATH in Unix or Path in Windows (Unix is case sensitive; Windows is not).
In Mac OS, the installer handles the path details. To invoke the Python interpreter from any particular directory, you must add the Python directory to your path.
Setting path at Unix/Linux
To add the Python directory to the path for a particular session in Unix −
In the csh shell − type setenv PATH "$PATH:/usr/local/bin/python" and press Enter.
In the bash shell (Linux) − type export ATH="$PATH:/usr/local/bin/python" and press Enter.
In the sh or ksh shell − type PATH="$PATH:/usr/local/bin/python" and press Enter.
Note − /usr/local/bin/python is the path of the Python directory
Setting path at Windows
To add the Python directory to the path for a particular session in Windows −
At the command prompt − type path %path%;C:\Python and press Enter.
Note − C:\Python is the path of the Python directory
Python Environment Variables
Here are important environment variables, which can be recognized by Python −
|Sr.No.||Variable & Description|
It has a role similar to PATH. This variable tells the Python interpreter where to locate the module files imported into a program. It should include the Python source library directory and the directories containing Python source code. PYTHONPATH is sometimes preset by the Python installer.
It contains the path of an initialization file containing Python source code. It is executed every time you start the interpreter. It is named as .pythonrc.py in Unix and it contains commands that load utilities or modify PYTHONPATH.
It is used in Windows to instruct Python to find the first case-insensitive match in an import statement. Set this variable to any value to activate it.
It is an alternative module search path. It is usually embedded in the PYTHONSTARTUP or PYTHONPATH directories to make switching module libraries easy.
There are three different ways to start Python −
You can start Python from Unix, DOS, or any other system that provides you a command-line interpreter or shell window.
Enter python the command line.
Start coding right away in the interactive interpreter.
$python # Unix/Linux or python% # Unix/Linux or C:> python # Windows/DOS
Here is the list of all the available command line options −
|Sr.No.||Option & Description|
It provides debug output.
It generates optimized bytecode (resulting in .pyo files).
Do not run import site to look for Python paths on startup.
verbose output (detailed trace on import statements).
disable class-based built-in exceptions (just use strings); obsolete starting with version 1.6.
run Python script sent in as cmd string
run Python script from given file
Script from the Command-line
A Python script can be executed at command line by invoking the interpreter on your application, as in the following −
$python script.py # Unix/Linux or python% script.py # Unix/Linux or C: >python script.py # Windows/DOS
Note − Be sure the file permission mode allows execution.
Integrated Development Environment
You can run Python from a Graphical User Interface (GUI) environment as well, if you have a GUI application on your system that supports Python.
Unix − IDLE is the very first Unix IDE for Python.
Windows − PythonWin is the first Windows interface for Python and is an IDE with a GUI.
Macintosh − The Macintosh version of Python along with the IDLE IDE is available from the main website, downloadable as either MacBinary or BinHex'd files.
Installing NLTK Pack
NLTK is very straight forward to integrate into the python environment. Use the below command to add NLTK to the environment.
sudo pip install -U nltk
The addition of other libraries will be discussed in each chapter as and when we need for their use in the python program.