|
|
|
|
| View previous topic :: View next topic |
| Author |
Message |
Antrax
ESL Student
|
Posted: Sun Jun 25, 2006 6:36 am Post subject: 1 |
|
|
A lecturer mentioned there exist sorting algorithms, not based on comparisons of course, that sort in better complexity than nlogn, without making any assumptions on the input (other than possibly the fact it's n numbers and not just entities with some order relation defined on them).
Is anyone familiar with these? Can point me in the direction of one, or even better, explain how such a thing would work? _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Samadhi
+1
|
Posted: Sun Jun 25, 2006 6:55 am Post subject: 2 |
|
|
| Quote: |
| not based on comparisons of course |
Then based on what? _________________ And he lived happily ever after. Except for the dieing at the end and the heartbreak in between. |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Sun Jun 25, 2006 12:26 pm Post subject: 3 |
|
|
Some sort of bit computations. I'm not entirely sure myself, which is why I'm asking, because I was surprised to find out such algorithms exist. _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Talzor
Daedalian Member
|
Posted: Sun Jun 25, 2006 1:15 pm Post subject: 4 |
|
|
| He is probably thinking about Radix sort. |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Sun Jun 25, 2006 1:54 pm Post subject: 5 |
|
|
No, Radix Sort (and Bucket/Counting sort) assumes the input is bounded. Otherwise the complexity is not better than nlogn. _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Lepton*
Guest
|
Posted: Sun Jun 25, 2006 5:38 pm Post subject: 6 |
|
|
| I don't understand the field, but it seems like the Radix Sort could be implemented in such a way as to deal with data with an unknown range. I do something similar in an complicated n-dimensional numerical integrator on which I am working. |
|
| Back to top |
|
 |
Bicho the Inhaler
Daedalian Member
|
Posted: Sun Jun 25, 2006 6:11 pm Post subject: 7 |
|
|
| Radix sorting doesn't require the input to be bounded. This is clearly true, e. g., for LSB algorithms. (Sort on the least significant bit or character using a stable sorting algorithm. Then sort on the second-least significant bit or character using a stable algorithm. Keep going until you run out of bits or characters.) |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Sun Jun 25, 2006 6:24 pm Post subject: 8 |
|
|
The complexity of Radix Sort is O(k*n) where k is the "size" of the input elements. So, when sorting n numbers, each 2^(2^(2^n)), you get a run time that is far worse than nlogn, so this algorithm's complexity depends on the input being bounded to something reasonable. So, this is not what I was asking about.
extro? _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Bicho the Inhaler
Daedalian Member
|
Posted: Sun Jun 25, 2006 7:25 pm Post subject: 9 |
|
|
| Antrax, while that's true, that point is unfair unless you scrutinize other sorting algorithms in the same light. When analyzing a comparison-based sorting algorithm, we usually abstract away the time taken to compare elements. A comparison-based algorithm that is O(n*log(n)) is therefore actually O(n*log(n)*k), where k is the length of the input numbers (since comparison is an O(k) operation). Your point therefore isn't a sensible objection. |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Sun Jun 25, 2006 8:00 pm Post subject: 10 |
|
|
Bicho, I disagree, though mostly because that's what I was taught. Also because I know for a fact he didn't mean Radix sort and friends, but rather something even stronger (I asked the professor). _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Nsof
Daedalian Member
|
Posted: Sun Jun 25, 2006 10:26 pm Post subject: 11 |
|
|
There are some sorting algs here http://en.wikipedia.org/wiki/Sort_algorithm some of which if run on a specialized hardware will run at O(n) time. _________________ Will sell this place for beer |
|
| Back to top |
|
 |
extro...*
Guest
|
Posted: Mon Jun 26, 2006 12:19 am Post subject: 12 |
|
|
Er, ok.
Not sure.
But O(n*k), where k is the average size of the element, is going to be the same as O(n * lg(n)) if we're going to allow at n distinct elements - it will take O(lg(n)) bits to represent each of n distinct values.
Bicho: While O(k) may be the worst case cost of comparison of two arbitrary strings (or numbers) of size k, I think the average cost is constant (in fact, in comparing two bit strings, on average we'll compare 2 bits, I think). |
|
| Back to top |
|
 |
extro...*
Guest
|
Posted: Mon Jun 26, 2006 2:15 am Post subject: 13 |
|
|
I don't think there can be a sorting algorithm that does better than O(n lg n) average or worst case without some significant restrictions on the input it can handle.
For comparison based sorts, the proof of that is based on the fact that a binary tree with n! leaf nodes has O(n lg n) depth. Any sorting algorithm must decide which of the n! possible permutations of the n element list it is sorting is the correct one (the sorted one). Whether they're comparisons or some other operations, I think you can always build a tree representing all possible sequences of operations that will be performed by that algorithm for all possible input sequences of size n, that tree then having at least n! leaf nodes, and thus a depth bounded by O(n lg n). That's an informal argument - not a proof - but I think the principle for a proof is there. |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Mon Jun 26, 2006 6:42 am Post subject: 14 |
|
|
Yeah, that's the proof we've seen in class a while back.
In any case, now my interest is really piqued, so I'll find that professor tomorrow and ask him for a name. He specifically said better than nlogn for worst case without having to assume anything on the input apart from it being numbers. _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Bicho the Inhaler
Daedalian Member
|
Posted: Mon Jun 26, 2006 9:12 am Post subject: 15 |
|
|
| extro...* wrote: |
| Bicho: While O(k) may be the worst case cost of comparison of two arbitrary strings (or numbers) of size k, I think the average cost is constant (in fact, in comparing two bit strings, on average we'll compare 2 bits, I think). |
Touché...yes, comparing two random numbers of a given length, you expect to compare exactly two bits. |
|
| Back to top |
|
 |
Nsof
Daedalian Member
|
|
| Back to top |
|
 |
Bicho the Inhaler
Daedalian Member
|
Posted: Mon Jun 26, 2006 6:48 pm Post subject: 17 |
|
|
I also thought about MSB radix algorithms. This will assume all the numbers are the same length. First sort everything by the most significant bit. The array now has two sections: numbers beginning with 0 and numbers beginning with 1. Recurse on each section, ignoring the first bit. The basis case is a subarray with one element.
Its complexity is Theta(n*j), where j is the average number of bit comparisons needed to distinguish adjacent elements of the sorted list. (The sorting at each level of recursion can be implemented in such a way that each bit at that level is visited exactly once. Overall, any given bit of a given input number is visited at most once, and it is visited only if necessary to distinguish that number from one of the numbers next to it.) However, if the numbers are long, then I don't think this can be better than Omega(n*log(n)). If the numbers are extremely long and some of them are equal or nearly equal, then it could be a lot worse than n*log(n), but then I think the same would apply to all sorting algorithms, since it takes a lot of work just to tell the numbers apart. |
|
| Back to top |
|
 |
Jack_Ian
Big Endian
|
Posted: Tue Jun 27, 2006 5:19 pm Post subject: 18 |
|
|
Assuming we're talking about integers, comparing individual bits of numbers takes longer than comparing the actual numbers since the bit must first be extracted before it's compared and the comparison is performed on two full words.
In real situations, the problem involves looking at the broader picture.
Sometimes, the fastest solutions involve minimising the "swaps" or minimising memory paging or caching chunks of the data to sort in memory and then merging the chunks.
I remember one case where I was forced to slow down my sorting as it was so efficient that it hogged computer resources and caused a noticeable drop in performance for other users on the system. I was particularly proud of my solution at the time, as it was written in COBOL and just screamed along (no mean feat for a COBOL program). It almost killed me to introduce the delay in my code. I remember it was encased by a large comment describing the reasons why it was done and who had forced me to do it and how the whole thing just sucked. |
|
| Back to top |
|
 |
Bicho the Inhaler
Daedalian Member
|
Posted: Tue Jun 27, 2006 10:51 pm Post subject: 19 |
|
|
| Jack_Ian wrote: |
| I remember one case where I was forced to slow down my sorting as it was so efficient that it hogged computer resources and caused a noticeable drop in performance for other users on the system. I was particularly proud of my solution at the time, as it was written in COBOL and just screamed along (no mean feat for a COBOL program). It almost killed me to introduce the delay in my code. I remember it was encased by a large comment describing the reasons why it was done and who had forced me to do it and how the whole thing just sucked. |
Allocating resources effectively seems more like a job for the operating system than for individual users' programs, but then operating systems have come a long way in the past couple decades. |
|
| Back to top |
|
 |
extro...*
Guest
|
Posted: Wed Jun 28, 2006 4:39 am Post subject: 20 |
|
|
| Jack_Ian wrote: |
| ... it was so efficient that it hogged computer resources and caused a noticeable drop in performance for other users on the system. |
I'm struggling to imagine a context in which that could be possible (i.e. that some measure of "efficiency" would have that negative effect). |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Wed Jun 28, 2006 7:01 am Post subject: 21 |
|
|
| Quote: |
| comparing individual bits of numbers takes longer than comparing the actual numbers since the bit must first be extracted before it's compared and the comparison is performed on two full words. |
Well, the context in which it was brought up was after a lecture on circuit complexity, where he mentioned that so far no NP problem has a non-trivial lower bound on circuit complexity. That's where I asked "what about sorting?" and the whole thing started.
(so, in the context of circuit complexity, it does make sense to look at the input as a long bitstream)
Anyway, I asked him again, and he said he doesn't remember offhand, but I should email him. That was yesterday. Watch this space, I guess  _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Wed Jun 28, 2006 7:03 am Post subject: 22 |
|
|
And, typically, he just replied. Here's his email:
I haven't read it myself, yet, though. Off to class. _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Jack_Ian
Big Endian
|
Posted: Wed Jun 28, 2006 3:09 pm Post subject: 23 |
|
|
| extro...* wrote: |
| Jack_Ian wrote: |
| ... it was so efficient that it hogged computer resources and caused a noticeable drop in performance for other users on the system. |
I'm struggling to imagine a context in which that could be possible (i.e. that some measure of "efficiency" would have that negative effect). |
The program involved reading enormous amounts of historical data (many millions of records) stored sequentially in chronological order (essentially a log file) and producing another file, sorted in ProductClass/ProductCode/Date order for further analysis, looking for seasonal trends etc. It was to run on the primary processing computer for a VERY-large warehouse and was scheduled to run overnight, along with other processes, such as the printing of delivery dockets for the trucks that would begin arriving early the next morning. As the analysis progressed, the requirements for the sorted file would change and the program would need to be modified and run again.
A colleague bet me that I could not get the program to complete in a single night and I accepted.
Because I was very aware of the strengths of both the system and the hardware, I came up with a solution that created multiple sorted file-chunks of about 100 thousand records each, distributed over several different physical disks (there were about 20 in the disk bank) and then merged them 19 at a time (# disks - 1), forming larger and larger contiguous files, deleting the processed files as I went, until the final merge, resulting in the desired file.
The source file was approximately in PurchaseOrderNr/ProductClass/ProductCode order and I used this fact to merge several Purchase orders together at a time, creating one file, in the required order, for each date. This resulted in a few thousand files distributed over several physical disks.
I had arranged the file selection so that 19 disks would be sequentially reading while one disk was sequentially writing and had pre-allocated the required physical file chunk required for each disk over the duration of the program and ensured that it was contiguous.
Because the hardware prioritised disk access using proximity as a parameter, my program hogged the disks, even when my process was given a lower priority.
The end result was that other processes scheduled for overnight, would run sluggishly. Delivery dockets were printed in spurts and the warehouse staff became worried that they would not be ready in time to prepare deliveries for the following morning.
I had to introduce a delay to prevent the disk schedulers from giving me priority (essentially undo the work I had done to optimise the performance of my program) and my program would then run over several nights in order to complete instead of a single night.
Since the program did initially complete in a single night though, I won the bet.  |
|
| Back to top |
|
 |
Antrax
ESL Student
|
Posted: Wed Jun 28, 2006 4:50 pm Post subject: 24 |
|
|
So what'd you win? _________________ After years of disappointment with get rich quick schemes, I know I'm gonna get rich with this scheme. And quick! |
|
| Back to top |
|
 |
Jack_Ian
Big Endian
|
Posted: Thu Jun 29, 2006 5:35 am Post subject: 25 |
|
|
| Antrax wrote: |
| So what'd you win? |
Can't remember, but the standard bet was either lunch or a pint (sometimes they were one and the same).  |
|
| Back to top |
|
 |
|
|
You cannot post new topics in this forum You can reply to topics in this forum You cannot edit your posts in this forum You cannot delete your posts in this forum You cannot vote in polls in this forum
|
Powered by phpBB © 2001, 2005 phpBB Group
|
|