I want to know how the term frequency factor, i.e. tf, is calculated. Specifically, I want to know the tf of the content field. The results for the following query:
curl -g 'http://localhost:8983/solr/nutch/select?indent=on&q=python&wt=json&fl=title,score,[features%20efi.query=python%20store=myfeature_store],content'
is:
...
{
"title":"Raspberry Pi Stack Exchange",
"content":"Raspberry Pi Stack Exchange\nStack Exchange Network\nStack Exchange network consists of 175 Q&A communities including Stack Overflow , the largest, most trusted online community for developers to learn, share their knowledge, and build their careers.\nVisit Stack Exchange\nLoading…\n0\n+0\nTour Start here for a quick overview of the site\nHelp Center Detailed answers to any questions you might have\nMeta Discuss the workings and policies of this site\nAbout Us Learn more about Stack Overflow the company\nBusiness Learn more about hiring developers or posting ads with us\nLog in\nSign up\ncurrent community\nRaspberry Pi\nhelp\nchat\nRaspberry Pi Meta\nyour communities\nSign up or log in to customize your list.\nmore stack exchange communities\ncompany blog\nBy using our site, you acknowledge that you have read and understand our Cookie Policy , Privacy Policy , and our Terms of Service .\nRaspberry Pi Stack Exchange is a question and answer site for users and developers of hardware and software for Raspberry Pi. 
It only takes a minute to sign up.\nSign up to join this community\nAnybody can ask a question\nAnybody can answer\nThe best answers are voted up and rise to the top\nHome\nQuestions\nTags\nUsers\nUnanswered\nExplore our Questions\nAsk Question\nraspbian pi-3 gpio python networking wifi pi-2 usb boot ssh\nmore tags\nActive\nHot\nWeek\nMonth\n0\nvotes\n0\nanswers\n3\nviews\nHostname on router and pi do not match\nheadless\nasked 4 mins ago\nJoseph\n1\n2\nvotes\n0\nanswers\n49\nviews\nAndroid won't connect to RasPi access point\nandroid\naccess-point\nsystemd-networkd\nwpa-supplicant\nmodified 6 mins ago\nThePunisher\n121\n2\nvotes\n3\nanswers\n53\nviews\napt-get update errors after copying Raspbian to new SD card\nraspbian\napt\nmodified 17 mins ago\nifschleife\n121\n1\nvote\n5\nanswers\n444\nviews\nWifi cuts out after a few hours, have to restart Pi\nraspbian\nnetworking\nwifi\nssh\nminecraft\nmodified 53 mins ago\nCommunity ♦\n1\n2\nvotes\n2\nanswers\n369\nviews\nCan't SSH by name on stretch; can on jessie\nssh\nraspbian-stretch\nputty\nmodified 1 hour ago\nCommunity ♦\n1\n0\nvotes\n0\nanswers\n8\nviews\nHow to use only 3 GPIO pins for a JSN-SR04T waterproof ultrasonic sensor\ngpio\nsensor\nasked 2 hours ago\nPeter bill\n191\n1\nvote\n2\nanswers\n52\nviews\nGPIO Not changing its value in a particular code section\ngpio\npython\nrelay\nmodified 2 hours ago\ntlfong01\n2,465\n0\nvotes\n0\nanswers\n1\nview\nMakes OpenVPN a local Apache Webserver accessable from outside?\nweb-server\nvpn\napache-httpd\nweb-browsers\nweb\nasked 2 hours ago\nJakob\n113\n0\nvotes\n1\nanswer\n15\nviews\nsainsmart relay - switches on when pi shuts down\npi-3\nboot-issues\nanswered 2 hours ago\npir8ped\n79\n0\nvotes\n1\nanswer\n301\nviews\nRaspberry Pi Matchbox virtual keyboard missing colon\ndisplay\nmodified 2 hours ago\nCommunity ♦\n1\n-1\nvotes\n0\nanswers\n27\nviews\nHow to fix ssh connection that's been broken by dhcpcd service\nlinux\nnetworking\nssh\ndhcp\nmodified 3 hours 
ago\nBelserich\n1\n4\nvotes\n2\nanswers\n8k\nviews\nHow can I use OpenCV with Python 3 on a Raspberry Pi?\nopencv\npython-3\nanswered 3 hours ago\nIngo\n19.1k\n2\nvotes\n0\nanswers\n14\nviews\nRPi-Zero, HID keyboard gadget for BIOS keyboard\nusb\nkeyboard\nhid\nlibcomposite\nmodified 3 hours ago\nEphemeral\n1,561\n0\nvotes\n0\nanswers\n13\nviews\nHow do I go about auto-mounting my NTFS hard drive at boot?\nboot\nmount\nfstab\nntfs\nasked 3 hours ago\nHasake\n11\nBrowse more Questions\nHot Network Questions\nTriple Approx Symbol\nBest ways to invest for a planned house purchase in 1 year?\nVariable selection in logistic regression model\nShould rooms be designed to minimize waste of sheet goods?\nWhy is Perihelion and Shortest day in North Hemisphere different?\nHow can I estimate the speed of this code section for this microcontroller?\nShell - Navigate up 'n' directories\nLooking for an effective pattern to cope with switch statements in C#\n",
"score":0.00982895,
"[features]":"tf=2.0"},
...
How is the value 2.0 obtained? The word python occurs 4 times and there are 330 words in the content.
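To make the two quantities the question expects concrete (the raw count and the count divided by the total number of words), here is a minimal sketch. It assumes a hypothetical analyzer that lowercases the text and splits on non-alphanumeric characters; the real Solr analyzer depends on the content field's type in the schema, so actual token counts may differ. The sample string is a short excerpt modeled on the content above, not the full field.

```python
import re

def tokenize(text: str) -> list[str]:
    """Hypothetical analyzer: lowercase, then split on runs of non-alphanumerics."""
    return [t for t in re.split(r"[^0-9a-z]+", text.lower()) if t]

def raw_tf(term: str, text: str) -> int:
    """Raw term frequency: how many tokens equal the (lowercased) term."""
    return tokenize(text).count(term.lower())

def tf_ratio(term: str, text: str) -> float:
    """Raw count divided by the total number of tokens (the 'ratio' view of tf)."""
    tokens = tokenize(text)
    return tokens.count(term.lower()) / len(tokens) if tokens else 0.0

sample = "gpio python relay\nHow can I use OpenCV with Python 3?\nopencv python-3"
print(raw_tf("python", sample))    # 3 (python, Python, python-3)
print(tf_ratio("python", sample))  # 3 of 14 tokens
```

With the question's own figures (4 occurrences in 330 words), the raw count is 4 and the ratio is 4 / 330 ≈ 0.0121; neither of them is 2.0.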
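Solr's default scorer is now BM25 rather than classic TF/IDF, but in neither model is the tf value the exact count of how many times the term occurs. The value reported here matches the tf factor of Lucene's classic TF/IDF similarity, which is sqrt(TF):
sqrt(4) == 2.0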
Raw TF   TF score
1        1.0
2        1.414
4        2.0
8        2.828
16       4.0
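The table can be reproduced with a few lines of Python. This is a sketch of the formula only; the function name classic_tf is illustrative and is not a Lucene or Solr API.

```python
import math

def classic_tf(freq: float) -> float:
    """tf factor of Lucene's classic TF/IDF similarity: square root of the raw count."""
    return math.sqrt(freq)

# Print each raw count next to its tf factor, rounded to three decimals.
for raw in (1, 2, 4, 8, 16):
    print(f"{raw:>6}  {classic_tf(raw):.3f}")
```

Because of the square root, doubling the raw count only multiplies the tf factor by about 1.414, which dampens the effect of a term repeated many times in one document.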
If I want to calculate the term frequency ratio (the number of covered query terms divided by the number of query terms), how do I do that?
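Outside Solr, the ratio as defined in this comment could be computed as below. This is a minimal sketch assuming the query and the document are already tokenized, and assuming a repeated query term counts only once; the function name is hypothetical, and the snippet does not show how to expose the value as a Solr feature.

```python
from collections.abc import Iterable

def covered_query_term_ratio(query_terms: Iterable[str], doc_tokens: Iterable[str]) -> float:
    """Fraction of distinct query terms that occur at least once in the document."""
    distinct = set(query_terms)
    if not distinct:
        return 0.0
    present = set(doc_tokens)
    covered = sum(1 for term in distinct if term in present)
    return covered / len(distinct)

print(covered_query_term_ratio(["python", "gpio", "arduino"], ["gpio", "python", "relay"]))
```

Here two of the three query terms appear in the document, so the ratio is 2/3.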
stackoverflow.com/a/34614215/4582711 points out that tf is calculated as a ratio rather than as the raw term count. So why is it just calculating sqrt(tf)?

What you're referring to seems to be the textbook TF/IDF definition, where tf is a ratio. Lucene's classic TF/IDF similarity uses the square root of the raw term count instead, to get better relevancy than a straight count. For the idf factor it uses log(numDocs / (docFreq + 1)) + 1 instead of the pure idf value (since a term that appears in 100 times as many documents isn't really 100 times less relevant). Term frequency is, as the name indicates, how often the term occurs.

Actually, I'm trying to use the Microsoft MSLR dataset (microsoft.com/en-us/research/project/mslr). It seems that features 6-9 are not TF/IDF; they seem to be simply term_no/total_terms.

I'm not sure what to tell you. The answer to your question, why the tf of python in your example is 2.0 and not 4, is that the tf factor reported here is sqrt(raw TF) rather than the raw count.
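The formulas discussed in these comments can be compared side by side. This sketch implements the idf formula exactly as quoted in the answer (with the natural log), an unsmoothed numDocs / docFreq ratio as the "pure" value for comparison, and the term_no/total_terms ratio the asker believes the MSLR features use. All function names are hypothetical, and the MSLR interpretation is the asker's impression, not a verified definition of those features.

```python
import math

def classic_idf(num_docs: int, doc_freq: int) -> float:
    """idf as quoted in the answer: log(numDocs / (docFreq + 1)) + 1, natural log."""
    return math.log(num_docs / (doc_freq + 1)) + 1

def plain_idf(num_docs: int, doc_freq: int) -> float:
    """Unsmoothed ratio numDocs / docFreq, used here only for comparison."""
    return num_docs / doc_freq

def length_normalized_tf(term_count: int, total_terms: int) -> float:
    """term_no / total_terms: raw count divided by document length."""
    return term_count / total_terms

# A term found in 100 times as many documents: the plain ratio drops 100x,
# while the log-based idf drops by less than a factor of 2.
n = 100_000
print(plain_idf(n, 10) / plain_idf(n, 1000))
print(classic_idf(n, 10) / classic_idf(n, 1000))
print(length_normalized_tf(4, 330))
```

The log keeps idf differences between common and rare terms moderate, just as the square root keeps tf differences between low and high counts moderate.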