Created on 01 Aug 2018 ; Modified on 19 Sep 2018 ; Translation: italian
Beginnig to investigate how to implement full text search in a WEB based application developed with Python/Django, I read that the modern versions of SQLite 3, the DBMS library [1] distributed with Python, support it.
They do this using a FTS support library [2], to be used in versions 3 and later (FTS3, FTS4 and FTS5, at the time of writing; versions 1 and 2 must be avoided because bugged).
Also, always at the time of writing, at least FTS3 is supported by SQLite 3 distributed with Python 3.6 on MS Windows 10 system (64 bit) [3].
I often work with SQLite on Python using Django. So I decided to try this feature to check how to use it. And I observed that it is definitely simple.
For example, without being scared of the length, let's take the program sqlite_fts.py of which I quote the listing at the end of the article: Listing of sqlite_fts.py.
The create_table (db, tname) function (lines 102-117) creates two tables with fixed structure (for simplicity of coding):
The table v_{tname} is special: it feeds the appropriate managed indexes from SQLite 3/FTS3 to perform free text searches. Its primary purpose is not to record information, but only to feed the indexes. If necessary you can also use it to record information. But in our test program we do it in {tname} table.
The insert (db, tname, persons) function (lines 64-75) inserts some people in the DB. In particular, it performs two insertions:
The fts (db, tname, what) function (lines 90-99) does the full text search, made with a SELECT on the virtual table v_ {tname} using the MATCH operator. Line 95 asks to match the string what in the profile field. The string what can be a word, as a sentence.
The rest of the code is routine. In particular, the main () executes the opening of the DB [5] in line 193; and in lines 200-212 executes an infinite loop to interactively execute operator commands.
The execution of the commands is done due the cmd_exe (db, cmd) function (lines 142-172).
The other functions are support: drop tables, list of records, and so on.
The data entered in the tables for testing, are embedded in the variable PERSONS (source lines 41-60).
Let's take an example of a trial session.
Suppose we have created the sqlite_fts.py program in one temporary test directory: test. We perform the following steps:
>cd test # go to test directory >mkdir data # create data dir for DB >python sqlite_fts.py # execute program data/db.sqlite3 bd opened ('c', 'd', 'fts', 'h', 'i', 'l', 'q') cmd? c # create tables persons table name to create? (^C to stop) persons creating table(s) persons and v_persons ('c', 'd', 'fts', 'h', 'i', 'l', 'q') cmd? i # insert 5 records in tables (first 5 USA Presidents) inserting persons ('c', 'd', 'fts', 'h', 'i', 'l', 'q') cmd? l # list persons (output cutted) listing persons George Washington: George Washington (February 22, 1732[b][c] – December 14, 1799) was the first President ... ... CUT ... ('c', 'd', 'fts', 'h', 'i', 'l', 'q') cmd? fts # query for "virginia": 4 Presidents full text search of what? (^C to stop) virginia searching virginia in v_persons George Washington Thomas Jefferson James Madison Jr. James Monroe ('c', 'd', 'fts', 'h', 'i', 'l', 'q') cmd? fts # query for "virginia AND congress": 3 Presidents full text search of what? (^C to stop) virginia congress searching virginia congress in v_persons Thomas Jefferson James Madison Jr. James Monroe ('c', 'd', 'fts', 'h', 'i', 'l', 'q') cmd? fts # query for "virginia AND national AND congress": 2 Presidents full text search of what? (^C to stop) virginia national congress searching virginia national congress in v_persons Thomas Jefferson James Monroe
1 # sqlite_fts.py 2 # VERY BASIC: how manage sqlite3 using python AND do Full Text Search on it 3 # 4 # author: luciano de falco alfano 5 # license CC BY-SA 4.0: https://creativecommons.org/licenses/by-sa/4.0/ 6 # date: 2018.08.31 7 # 8 # overall logic: 9 # - main() open DB and execute an infinite loop to process user commands 10 # - cmd_exe() execute user command: 11 # - create_table() create a table with fixed record structure, and its 12 # counterpart virtual table even with fixed structure 13 # - drop_table() drop a single table 14 # - insert() insert some record 15 # - list() list all records 16 # - fts() full text search on records, list result keys 17 # 18 # excerpts from: 19 # - pure sqlite example: https://www.pythoncentral.io/introduction-to-sqlite-in-python/ 20 # https://www.pythoncentral.io/advanced-sqlite-usage-in-python/ 21 # - fts3 example: http://www.rkblog.rk.edu.pl/w/p/fulltext-search-sqlite-and-django-app/ 22 # - fts3 doc: https://www.sqlite.org/fts3.html 23 24 import pdb 25 import sqlite3 26 27 DB = 'data/db.sqlite3' 28 29 # dictionary of commands 30 DoCMDS = {'c': 'create table', 31 'd': 'drop table', 32 'fts': 'full text search', 33 'h': 'help', 34 'i': 'insert persons', 35 'l': 'list persons', 36 'q': 'quit', 37 } 38 CMDS = tuple([cmd for cmd in DoCMDS.keys()]) 39 40 # the first 5 presidents of USA, profile incipit from wikipedia 41 PERSONS = [ ('George Washington', 42 '''George Washington (February 22, 1732[b][c] – December 14, 1799) was the first President of the United States, and was among the nation's Founding Fathers. As commander-in-chief of the Continental Army during the American Revolutionary War, he was the principal force in victory, effected with the surrender of the British at Yorktown. In 1787 he presided over the Constitutional Convention which formed the new federal government. Since the late 1780s, Washington has been known as the "Father of His Country" by compatriots. Scholarly and public polling ranks him among the top three Presidents in history. 43 44 Washington was born to a moderately prosperous family of planters, who owned slaves in colonial Virginia. He had early opportunities in education, learned mathematics and quickly launched a successful career as a surveyor, which in turn enabled him to make considerable land investments. He then joined the Virginia militia and fought in the French and Indian War. During the war with the British he also fought against one of their allies, the Iroquois nation. His devotion to American Republicanism impelled him to decline further power after victory, and he resigned as commander-in-chief in 1783. ''', ), 45 ('John Adams', 46 '''John Adams (October 30 [O.S. October 19] 1735 – July 4, 1826) was an American statesman and Founding Father who served as the first Vice President (1789–1797) and second President of the United States (1797–1801). He was a lawyer, diplomat, political theorist, and a leader of the movement for American independence from Great Britain. He was also a dedicated diarist and correspondent, particularly with his wife and closest advisor, Abigail. 47 48 Adams collaborated with his cousin, revolutionary leader Samuel Adams, but established his own prominence prior to the American Revolution. Driven by his devotion to the right to counsel and the presumption of innocence, he defied local anti-British sentiment and provided a successful legal defense of the accused British soldiers involved in the Boston Massacre. Adams was sent as a delegate from colonial Massachusetts to the Continental Congress, where he played a leading role in persuading Congress to declare independence. He assisted in drafting the Declaration of Independence in 1776, and was its foremost advocate in Congress. As a diplomat in Europe, he helped negotiate the peace treaty with Great Britain and acquired vital governmental loans from Amsterdam bankers. Adams was the primary author of the Massachusetts Constitution in 1780, which influenced the development of America's own constitution, as did his earlier Thoughts on Government. ''',), 49 ('Thomas Jefferson', 50 '''Thomas Jefferson (April 13, [O.S. April 2] 1743 – July 4, 1826) was an American Founding Father who was the principal author of the Declaration of Independence and later served as the third president of the United States from 1801 to 1809. Previously, he had been elected the second vice president of the United States, serving under John Adams from 1797 to 1801. He was a proponent of democracy, republicanism, and individual rights motivating American colonists to break from Great Britain and form a new nation; he produced formative documents and decisions at both the state and national level. 51 52 Jefferson was mainly of English ancestry, born and educated in colonial Virginia. He graduated from the College of William & Mary and briefly practiced law, with the largest number of his cases concerning land ownership claims.[1] During the American Revolution, he represented Virginia in the Continental Congress that adopted the Declaration, drafted the law for religious freedom as a Virginia legislator, and served as a wartime governor (1779–1781). He became the United States Minister to France in May 1785, and subsequently the nation's first Secretary of State in 1790–1793 under President George Washington. Jefferson and James Madison organized the Democratic-Republican Party to oppose the Federalist Party during the formation of the First Party System. With Madison, he anonymously wrote the controversial Kentucky and Virginia Resolutions in 1798–1799, which sought to strengthen states' rights by nullifying the federal Alien and Sedition Acts. ''',), 53 ('James Madison Jr.', 54 '''James Madison Jr. (March 16, 1751 – June 28, 1836)[2] was an American statesman and Founding Father who served as the fourth President of the United States from 1809 to 1817. He is hailed as the "Father of the Constitution" for his pivotal role in drafting and promoting the United States Constitution and the Bill of Rights. 55 56 Born into a prominent Virginia planting family, Madison served as a member of the Virginia House of Delegates and the Continental Congress during and after the American Revolutionary War. In the late 1780s, he helped organize the Constitutional Convention, which produced a new constitution to supplant the ineffective Articles of Confederation. After the Convention, Madison became one of the leaders in the movement to ratify the Constitution, and his collaboration with Alexander Hamilton produced The Federalist Papers, among the most important treatises in support of the Constitution. ''',), 57 ('James Monroe', 58 '''James Monroe (/mənˈroʊ/; April 28, 1758 – July 4, 1831) was an American statesman and Founding Father who served as the fifth President of the United States from 1817 to 1825. Monroe was the last president of the Virginia dynasty, and his presidency ushered in what is known as the Era of Good Feelings. Born in Westmoreland County, Virginia, Monroe was of the planter class and fought in the American Revolutionary War. He was wounded in the Battle of Trenton with a musket ball to the shoulder. After studying law under Thomas Jefferson from 1780 to 1783, he served as a delegate in the Continental Congress.[1] 59 60 As an anti-federalist delegate to the Virginia convention that considered ratification of the United States Constitution, Monroe opposed ratification, claiming it gave too much power to the central government. Nonetheless, he took an active part in the new government, and in 1790 was elected to the Senate of the first United States Congress, where he joined the Democratic-Republicans. He gained experience as an executive as the Governor of Virginia and rose to national prominence as a diplomat in France when he helped negotiate the Louisiana Purchase in 1803. During the War of 1812, Monroe served in critical roles as Secretary of State and the Secretary of War under President James Madison.[2] ''',), ] 61 62 TABLE = '' 63 64 def insert(db, tname, persons): 65 '''insert a bunch of persons in a (empty) table''' 66 67 try: 68 cursor = db.cursor() 69 for person in persons: 70 cursor.execute(f'INSERT INTO {tname}(name, profile) VALUES(?,?)', person) 71 cursor.execute(f'INSERT INTO v_{tname}(name, profile) VALUES(?,?)', person) 72 db.commit() 73 except Exception as ex: 74 db.rollback() 75 raise ex 76 77 78 def list(db, tname): 79 '''list of all persons in table''' 80 81 try: 82 cursor = db.cursor() 83 cursor.execute(f'SELECT name, profile FROM {tname}') 84 for row in cursor: 85 print(f' {row[0]}: {row[1]}') 86 except Exception as ex: 87 raise ex 88 89 90 def fts(db, tname, what): 91 '''full text search of something in virtual table v_{tname}''' 92 93 try: 94 cursor = db.cursor() 95 cursor.execute(f'SELECT name FROM v_{tname} WHERE profile MATCH ?', (what,)) 96 for row in cursor: 97 print(f' {row[0]}') 98 except Exception as ex: 99 raise ex 100 101 102 def create_table(db, tname): 103 '''create a table with id/name/profile record 104 105 note: create even a virtual table named v_{tname}''' 106 107 try: 108 cursor = db.cursor() 109 cursor.execute(f'CREATE TABLE IF NOT EXISTS {tname}(\ 110 id INTEGER PRIMARY KEY, \ 111 name TEXT, \ 112 profile TEXT)') 113 cursor.execute(f'CREATE VIRTUAL TABLE v_{tname} USING fts3(name, profile)') 114 db.commit() 115 except Exception as ex: 116 db.rollback() 117 raise ex 118 119 120 def drop_table(db, tname): 121 '''drop A SINGLE table 122 123 note: tname and v_{tname} must be dropped separately 124 this is because in case of partial drop (one of two tables) 125 program could become stuck''' 126 127 try: 128 cursor = db.cursor() 129 cursor.execute(f'DROP TABLE {tname}') 130 db.commit() 131 except Exception as ex: 132 db.rollback() 133 raise ex 134 135 def help(): 136 '''list commands''' 137 138 for cmd in CMDS: 139 print(f' {cmd}: {DoCMDS[cmd]}') 140 print() 141 142 def cmd_exe(db, cmd): 143 ''' execute command, BUT not 'quit' ''' 144 145 global TABLE 146 try: 147 #pdb.set_trace() 148 if not cmd in CMDS: 149 raise ValueError(f'ERROR: cmd {cmd} unknown') 150 elif cmd == 'c': # create table 151 tname = input(f'table name to create? (^C to stop) ') 152 print(f' creating table(s) {tname} and v_{tname}') 153 create_table(db, tname) 154 TABLE = tname[:] 155 elif cmd == 'd': # drop table 156 tname = input(f'table name to drop? (^C to stop) ') 157 print(f' deleting table {tname}') 158 drop_table(db, tname) 159 elif cmd == 'fts': # full text search 160 what = input(f'full text search of what? (^C to stop) ') 161 print(f' searching {what} in v_{TABLE}') 162 fts(db, TABLE, what) 163 elif cmd == 'i': # insert persons 164 print(f' inserting persons') 165 insert(db, TABLE, PERSONS) 166 elif cmd == 'l': # list persons 167 print(f' listing persons') 168 list(db, TABLE) 169 else: 170 raise ValueError(f'WARNING: cmd {cmd} unsupported ') 171 except Exception as ex: 172 raise ex 173 174 def get_table(db): 175 '''return 1st table name not starting with v_ prefix''' 176 177 try: 178 cursor = db.cursor() 179 cursor.execute("SELECT name FROM sqlite_master WHERE type = 'table'") 180 tables = [] 181 for row in cursor: 182 if row[0].split('_')[0]=='v': 183 continue 184 else: 185 return row[0] 186 except Exception as ex: 187 raise ex 188 189 190 def main(): 191 global TABLE 192 # Creates or opens a SQLite3 DB 193 with sqlite3.connect( DB ) as db: 194 TABLE = get_table(db) 195 print(f'{ DB } bd opened') 196 if TABLE: 197 print(f'{TABLE} table exists') 198 199 #command loop 200 while True: 201 try: 202 cmd = input(f'{ CMDS } cmd? ') 203 if not cmd in CMDS: 204 print(f'cmd { cmd } unknown; h for help') 205 continue 206 elif cmd == 'q': # quit 207 print(f'quitting') 208 break 209 else: 210 cmd_exe(db, cmd) 211 except Exception as ex: 212 raise 213 214 215 if __name__=='__main__': 216 main()
This code is downloadable from github.
What else to do in order to use full text search with Python and SQLite 3? Provided we use an updated version of Python, there aren't other needs: all included in Python batteries! :-)
Attention to the fact that virtual tables are powered with pure text. This means that if we use a markup language, as is normally the case, we have to delete it before passing the contents of the record to the virtual table. If we use html directly, it is possible to entrust the task to Python library beautifulsoup4:
from BeautifulSoup import BeautifulSoup html = '<div> <p>my content</p></div>' text = ''.join(BeautifulSoup(html).findAll(text=True))
If we use another markup language, check if it's possible convert it to and then to use beautifulsoup4 [6]:
from BeautifulSoup import BeautifulSoup from markdown import markdown html = markdown(some_markdown_string) text = ''.join(BeautifulSoup(html).findAll(text=True))
Almost I forgot: what about documentation? Here we find the FTS3 e FTS4 documentation; while here we find the FTS5 documentation.
And then?. And then, with this article I touched only the surface of the problem. Limiting myself to observing how to create, in a simple way, the virtual table of search support, and how to insert, from one source only, the records to be searched. But it is necessary deepen various aspects. For example: how does commit behave? How to make changes or cancellations of records? What happens in case of concurrent requests? ...
An environment worth exploring in depth. Moreover also because it allows to have data and indexes enclosed in a single file, in addition to being a library that requires few resources.
The obvious disadvantage is the fact that it binds the application to the DBMS: SQLite 3. Which usually isn't appreciated by those who develop in Django. One of the strengths of Django is the fact that its data modeling language, used with a minimum of attention, make the application DBMS unaware, allowing to use it with SQLite rather than PostgreSQL rather than MySQL (and others).
Enjoy by ldfa
[1] | Data Base Management System. |
[2] | Full Text Search. |
[3] | I read previous versions of SQLite 3 on MS Windows didn't support FTS3. It was necessary to compile SQLite 3 to obtain this feature under Windows. |
[4] | In the sense that if the table is called people, the virtual table is will call v_persone (plus other support tables managed by SQLite 3, which we do not need to explicitly manage). |
[5] | Opening the DB, if SQLite 3 doesn't find it, it creates a new empty instance. |
[6] | As suggested in stackoverflow. |